
Question on how to prevent an event from being reindexed

dsadowski
New Member

I have a web application that produces a fairly complicated log structure that looks something like the following:

{ "total": 6789, "data": [{e1}, {e2}, {e3}] }

I have a Python script that scrapes the application every few minutes to get the JSON out of the web app and onto the file system. Each file it writes has the same structure as the example above.

I've been able to break the events out of the "data" array so that Splunk can index the individual {e1}, {e2}, {e3} events. The problem I'm facing is that each time the scraping script runs, I get duplicate events: the same event is repeated n times until it rolls out of the log.
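The post doesn't show how the events are broken out, so here is a minimal sketch of one common way to do it, assuming the scraped response is valid JSON with a top-level "data" array of JSON objects. The function name split_events is hypothetical. It writes each element of "data" as its own line, so a line-based sourcetype can treat every line as one event.

```python
import json


def split_events(payload: str) -> list[str]:
    """Turn one scraped response into one JSON line per element of its "data" array."""
    doc = json.loads(payload)
    # sort_keys gives each event a stable textual form across scrapes,
    # which helps if the lines are later compared for duplicates.
    return [json.dumps(event, sort_keys=True) for event in doc.get("data", [])]


if __name__ == "__main__":
    sample = '{"total": 6789, "data": [{"id": 1}, {"id": 2}, {"id": 3}]}'
    for line in split_events(sample):
        print(line)  # one event per line
```

Note that this alone does not stop duplicates: if the script rewrites the whole file on every run, events that were already present get written out again.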

I think the problem is that the events 'move' through the log files over time, so to Splunk the file looks like it is always changing.

Over time, the files look something like the following:
{ "total": 6743, "data": [{e1}, {e2}, {e3}] }
{ "total": 6522, "data": [{e2}, {e3}, {e4}] }
{ "total": 6456, "data": [{e3}, {e4}, {e5}] }

This seems to make Splunk index e3 three times.
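The overlap in the three snapshots above can be reproduced in a few lines of Python. This is a minimal sketch with hypothetical names (event_key, new_events) and stand-in events {"e": n} for e1 through e5. Counting shows that e3 appears in all three snapshots, and keeping a set of already-seen keys is the kind of script-side filtering the question below hopes to avoid. In a real scraper the seen set would also have to persist between runs (for example in a state file), which this sketch omits.

```python
import json
from collections import Counter


def event_key(event: dict) -> str:
    # Hypothetical key: the canonical JSON of the whole event. If events carry
    # a unique ID field, keying on that would be cheaper and more robust.
    return json.dumps(event, sort_keys=True)


def new_events(snapshots: list[list[dict]]) -> list[dict]:
    """Return each event once, in first-seen order, across successive scrapes."""
    seen: set[str] = set()
    fresh: list[dict] = []
    for data in snapshots:
        for event in data:
            key = event_key(event)
            if key not in seen:
                seen.add(key)
                fresh.append(event)
    return fresh


if __name__ == "__main__":
    snapshots = [
        [{"e": 1}, {"e": 2}, {"e": 3}],
        [{"e": 2}, {"e": 3}, {"e": 4}],
        [{"e": 3}, {"e": 4}, {"e": 5}],
    ]
    counts = Counter(event_key(e) for data in snapshots for e in data)
    print(counts[event_key({"e": 3})])  # 3: e3 is in every snapshot
    print([e["e"] for e in new_events(snapshots)])  # [1, 2, 3, 4, 5]
```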

Is there an easy way to keep Splunk from reindexing events it has already seen, without having to do a bunch of diffing in the script to filter out the duplicate events?

Thanks,
Dan
