I want to set up a batch DB input because the table in the DB I'm pulling from deletes records instead of marking them with a soft-delete column. So what I want to do is pull in the whole table every 10 minutes with a batch DB input, but then when I query the indexed data I only want to look at the latest dataset pulled in by the batch process, so that I don't include records that were deleted in the DB.
Is this possible? Does Splunk have some kind of batch ID so that I can just use the latest batch?
If there's no way of differentiating the data between queries, here are a couple of ways I would approach this.
_indextime
When querying your data in Splunk, you could specify that you only want to see data that was indexed in the last 10 minutes. Example:
index=_internal earliest=-10m latest=now() | addinfo | where ((_indextime > info_min_time) AND (_indextime < info_max_time))
sysdate()
Or, when running your batch input SQL query, create a timestamp using sysdate(). Example (using MySQL):
SELECT
column1,
column2,
UNIX_TIMESTAMP(sysdate()) AS query_time,
column3
FROM table1
This means every indexed event contains a timestamp (query_time) showing when the query was executed on the database. This method relies on your database server's clock being in sync with your Splunk server's clock.
Then your Splunk search could be:
index=_internal earliest=-10m latest=now() | addinfo | where ((query_time > info_min_time) AND (query_time < info_max_time))
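If the goal is strictly "only the latest batch", another option is to let Splunk pick the newest query_time instead of filtering on a fixed time window. This is only a sketch: your_index and your_sourcetype are placeholders, and it assumes query_time is extracted as a numeric field and that every row in a batch carries the same value. In MySQL, SYSDATE() returns the time at which it executes, so a long-running query can produce more than one value. NOW() is fixed at the start of the statement and may be the safer choice for this approach.

```
index=your_index sourcetype=your_sourcetype earliest=-30m
| eventstats max(query_time) AS latest_query_time
| where query_time == latest_query_time
```

The 30 minute window is arbitrary. It just needs to be wide enough to always contain at least one complete batch.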
I hope this helps.
These are great tips. Thank you very much for the help.