We have configured a DB Connect input that pulls data from a MySQL database into an index at an hourly frequency.
Data is being pulled, however we see that the count of Splunk events is much higher than the count of rows in the corresponding table.
This is because the SQL table is real-time in nature and its entries are constantly updated, whereas Splunk stores a fresh copy of the entries on every hourly execution. As a result, Splunk also holds historical events that are no longer present in the SQL table.
We need to counter this situation because we plan to build analytics reports on this data, so it has to be accurate and up to date in Splunk as well.
Both sources are true in their own way - as you pointed out, one is the real-time state of the data and the other is the historic record of the data.
In order to "counter" this, what are you trying to achieve? Do you just need to find the latest historic event for each and every item?
| stats latest(*) as * by item_key
Thanks for responding. The stats latest command will still show us entries that are no longer in the table because they were removed from it. For example, if record X is in the table at 1pm and gets removed from the table at 4pm, Splunk will still show it, since our Splunk search fetches everything.
Can you identify the action which deleted the entry from the events?
| stats latest(*) as * by item_key | where action != "delete"
There is no such column/field as 'action'.
The problem is that an entry gets deleted from the main table whenever someone performs DML on it, but Splunk will still have that entry among its events.
In short, if the table has 100 entries, we want to see only 100 entries in Splunk at that point in time. If after some time the table has 99 entries (one row gets deleted), then we want Splunk to show us 99 entries as well.
If you are retrieving the data every hour, have you timestamped the entries in Splunk so you can identify which events came from which extract?
Yes, the Splunk timestamp is matched to the table's timestamp column, and the data goes back as far as 2021, so we have to search over 'All Time' in Splunk to cover the entire data set.
Thus the search also brings in data that may no longer be present in the table.
Have you checked _indextime to see whether it reflects the time the DB Connect pull ran? If so, you might be able to use it to filter the events.
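As a sketch of that idea: if every row from a given hourly pull is indexed at roughly the same time, you can keep only the events from the most recent pull. This assumes your index/sourcetype names (here the hypothetical yourindex/yoursourcetype), a hypothetical key field item_key, and that one pull finishes within about five minutes (the 300-second window) — adjust all of these to your environment.

```
index=yourindex sourcetype=yoursourcetype
| eval pull_time=_indextime
| eventstats max(pull_time) as latest_pull
| where pull_time >= latest_pull - 300
| stats latest(*) as * by item_key
```

The eval copies the internal _indextime field into a regular field so it can be used in eventstats and where; the final stats then deduplicates within the latest batch, so the event count should match the table's row count at the time of that pull.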