Knowledge Management

Why is there a discrepancy in data being pulled from DbConnect app vs the events in Splunk?

juhiacc
Explorer

We have configured DBConnect data from MySQL db under some index at hourly frequency.
Data is being pulled however we see that the count of Splunk events is much higher than the count of rows in its respective table.
This is due to the fact that the SQL table is real-time in nature and always have the entries updating, whereas, Splunk keeps storing the entries as per the hourly execution frequency. So as a result, Splunk will have historical events too which currently is not present in SQL table.

We need to counter this situation as we plan to build some analytics report on this data so it has to be true and updated in Splunk as well.

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Both sources are true in their own way - as you pointed out, one is the real-time state of the data and the other is the historic record of the data.

In order to "counter" this, what are you trying to achieve? Do you just need to find the latest historic event for each and every item?

| stats latest(*) as * by item_key
0 Karma

juhiacc
Explorer

Thanks for responding.  The stats latest command will still show us the entries which no longer are in table as it got removed from the main table. For example, if there is a record -X in table at 1pm , and it got removed at 4pm from table. Splunk will still be showing it as our splunk events will be fetching everything.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Can you identify the action which deleted the entry from the events?

| stats latest(*) as * by item_key
| where action != "delete"
0 Karma

juhiacc
Explorer

There is no such column/field as  'action'.

The problem is that the entry will be deleted from the main table if someone performs any DML, but splunk still will be having that entry as part of  events. 

In short if table has 100 entries, we only want to see 100 entries in Splunk too at that point of time. But if after sometime table has 99 entries (1 row gets deleted) then also we want splunk to show 99 entries to us.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

If you are retrieving the data every hour, have you timestamped the entries in splunk so you can identify which events come from which extract?

0 Karma

juhiacc
Explorer

Yes timestamp of splunk has been matched with the timestamp column of table and data is as old as 2021. So we have to use 'All Time' in Splunk while searching for entire data. 
Thus it brings in that data as well which now may no longer is been present in the table.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Have you checked _indextime to see if it represents the time the dbconnect was done? If so, you might be able to use that to filter the events.

0 Karma

juhiacc
Explorer

Sure, Let me try this out and revert.

Tags (1)
0 Karma
Get Updates on the Splunk Community!

Get ready to show some Splunk Certification swagger at .conf24!

Dive into the deep end of data by earning a Splunk Certification at .conf24. We're enticing you again this ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Now On-Demand Join us to learn more about how you can leverage Service Level Objectives (SLOs) and the new ...

Database Performance Sidebar Panel Now on APM Database Query Performance & Service ...

We’ve streamlined the troubleshooting experience for database-related service issues by adding a database ...