Knowledge Management

Why is there a discrepancy in data being pulled from DbConnect app vs the events in Splunk?

juhiacc
Explorer

We have configured DBConnect data from MySQL db under some index at hourly frequency.
Data is being pulled however we see that the count of Splunk events is much higher than the count of rows in its respective table.
This is due to the fact that the SQL table is real-time in nature and always have the entries updating, whereas, Splunk keeps storing the entries as per the hourly execution frequency. So as a result, Splunk will have historical events too which currently is not present in SQL table.

We need to counter this situation as we plan to build some analytics report on this data so it has to be true and updated in Splunk as well.

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Both sources are true in their own way - as you pointed out, one is the real-time state of the data and the other is the historic record of the data.

In order to "counter" this, what are you trying to achieve? Do you just need to find the latest historic event for each and every item?

| stats latest(*) as * by item_key
0 Karma

juhiacc
Explorer

Thanks for responding.  The stats latest command will still show us the entries which no longer are in table as it got removed from the main table. For example, if there is a record -X in table at 1pm , and it got removed at 4pm from table. Splunk will still be showing it as our splunk events will be fetching everything.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Can you identify the action which deleted the entry from the events?

| stats latest(*) as * by item_key
| where action != "delete"
0 Karma

juhiacc
Explorer

There is no such column/field as  'action'.

The problem is that the entry will be deleted from the main table if someone performs any DML, but splunk still will be having that entry as part of  events. 

In short if table has 100 entries, we only want to see 100 entries in Splunk too at that point of time. But if after sometime table has 99 entries (1 row gets deleted) then also we want splunk to show 99 entries to us.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

If you are retrieving the data every hour, have you timestamped the entries in splunk so you can identify which events come from which extract?

0 Karma

juhiacc
Explorer

Yes timestamp of splunk has been matched with the timestamp column of table and data is as old as 2021. So we have to use 'All Time' in Splunk while searching for entire data. 
Thus it brings in that data as well which now may no longer is been present in the table.

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Have you checked _indextime to see if it represents the time the dbconnect was done? If so, you might be able to use that to filter the events.

0 Karma

juhiacc
Explorer

Sure, Let me try this out and revert.

Tags (1)
0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...