Splunk Search

Question on how to prevent an event from being reindexed

dsadowski
New Member

I have a web application that produces a fairly complicated log structure that looks something like the following.

{ "total":6789, data:[{e1}. {e2}, {e3}] }

I have a Python script that scrapes the application every few minutes to get the JSON out of the web app and onto the file system; the file it writes has the same structure shown above.

I've been able to break the events out of the data section in the array so that splunk can index the individual {e1}, {e2}, {e3} events. The problem that I am facing is that each time that scraping script runs, I am getting duplicate events. I seem to get the same event repeated n-times until it rolls out of the log.

I think the problem is that, over time, the events 'move' through the log file, so it looks to Splunk like the file is always changing.

Over time, the files look something like the following:
{ "total":6743, data:[{e1}. {e2}, {e3}] }
{ "total":6522, data:[{e2}. {e3}, {e4}] }
{ "total":6456, data:[{e3}. {e4}, {e5}] }

That seems to make Splunk index e3 three times.

Is there an easy way to keep Splunk from reindexing the events it has already seen, without having to do a bunch of diffing in a script to filter out the duplicate events?
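To be concrete about what I'm hoping to avoid, the scripted alternative would be to keep a state file of IDs already written and only append unseen events, something like the sketch below (it assumes each event carries a stable unique "id" field, which is made up here):

import json
import urllib.request

URL = "http://webapp.example.com/api/events"   # placeholder endpoint
LOG_PATH = "/var/log/webapp/events.log"        # file Splunk monitors
STATE_PATH = "/var/log/webapp/seen_ids.json"   # IDs already written

def load_seen():
    """Load the set of event IDs written on previous runs."""
    try:
        with open(STATE_PATH) as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()

def save_seen(seen):
    with open(STATE_PATH, "w") as f:
        json.dump(list(seen), f)

def main():
    seen = load_seen()
    with urllib.request.urlopen(URL) as resp:
        events = json.load(resp).get("data", [])
    # Append only events not written before; appending (rather than rewriting
    # the file) means Splunk only picks up the new bytes at the tail.
    with open(LOG_PATH, "a") as log:
        for event in events:
            event_id = event["id"]  # assumes a stable unique id per event
            if event_id not in seen:
                log.write(json.dumps(event) + "\n")
                seen.add(event_id)
    save_seen(seen)

if __name__ == "__main__":
    main()

That works, but I'd rather not maintain that state file if Splunk can handle the deduplication itself.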

Thanks,
Dan
