I want to set up a batch DB input because the table in the DB I'm pulling from deletes records instead of marking them with a soft-delete column. So what I want to do is pull in the whole table every 10 minutes with a batch DB input, but then when I query the indexed data I want to only look at the latest dataset pulled in by the batch process, so that I don't include records that were deleted in the DB.
Is this possible? Does Splunk have some kind of batch ID where I can just use the latest batch?
If there's no way of differentiating the data between queries, here are a couple of ways I would approach this.
When querying your data in Splunk, you could specify that you only want to see data that was indexed in the last 10 minutes. Example (substitute your own index for _internal):
index=_internal earliest=-10m latest=now | addinfo | where ((_indextime > info_min_time) AND (_indextime < info_max_time))
Or, when running your batch input SQL query, create a timestamp using sysdate(). Example (using MySQL):
UNIX_TIMESTAMP(sysdate()) AS query_time,
This means every event that is indexed will contain the timestamp (query_time) of when the query was executed on the database. Note that this method relies on your database server time matching your Splunk server time.
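Put in context, the full batch-input query might look something like this (just a sketch; my_table and its columns are placeholders for your actual schema):

```sql
SELECT
    UNIX_TIMESTAMP(sysdate()) AS query_time, -- time this batch ran, attached to every row
    t.*
FROM my_table t;
```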
Then your Splunk search could be:
index=_internal earliest=-10m latest=now | addinfo | where ((query_time > info_min_time) AND (query_time < info_max_time))
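If you don't want to rely on a fixed 10-minute window or matching clocks, another option is to keep only the events carrying the highest query_time seen in the search window (a sketch; your_index is a placeholder for your own index):

```
index=your_index earliest=-30m
| eventstats max(query_time) AS latest_query_time
| where query_time = latest_query_time
```

This way, even if a batch run is delayed, the search still returns exactly one batch's worth of events.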