Knowledge Management

Why is there a discrepancy in data being pulled from DbConnect app vs the events in Splunk?

juhiacc
Explorer

We have configured DBConnect data from MySQL db under some index at hourly frequency.
Data is being pulled however we see that the count of Splunk events is much higher than the count of rows in its respective table.
This is due to the fact that the SQL table is real-time in nature and always have the entries updating, whereas, Splunk keeps storing the entries as per the hourly execution frequency. So as a result, Splunk will have historical events too which currently is not present in SQL table.

We need to counter this situation as we plan to build some analytics report on this data so it has to be true and updated in Splunk as well.

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Both sources are true in their own way - as you pointed out, one is the real-time state of the data and the other is the historic record of the data.

In order to "counter" this, what are you trying to achieve? Do you just need to find the latest historic event for each and every item?

| stats latest(*) as * by item_key
0 Karma

juhiacc
Explorer

Thanks for responding.  The stats latest command will still show us the entries which no longer are in table as it got removed from the main table. For example, if there is a record -X in table at 1pm , and it got removed at 4pm from table. Splunk will still be showing it as our splunk events will be fetching everything.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Can you identify the action which deleted the entry from the events?

| stats latest(*) as * by item_key
| where action != "delete"
0 Karma

juhiacc
Explorer

There is no such column/field as  'action'.

The problem is that the entry will be deleted from the main table if someone performs any DML, but splunk still will be having that entry as part of  events. 

In short if table has 100 entries, we only want to see 100 entries in Splunk too at that point of time. But if after sometime table has 99 entries (1 row gets deleted) then also we want splunk to show 99 entries to us.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

If you are retrieving the data every hour, have you timestamped the entries in splunk so you can identify which events come from which extract?

0 Karma

juhiacc
Explorer

Yes timestamp of splunk has been matched with the timestamp column of table and data is as old as 2021. So we have to use 'All Time' in Splunk while searching for entire data. 
Thus it brings in that data as well which now may no longer is been present in the table.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Have you checked _indextime to see if it represents the time the dbconnect was done? If so, you might be able to use that to filter the events.

0 Karma

juhiacc
Explorer

Sure, Let me try this out and revert.

Tags (1)
0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...