
Question on how to prevent an event from being reindexed

dsadowski
New Member

I have a web application that produces a fairly complicated log structure that looks something like the following:

{ "total":6789, data:[{e1}. {e2}, {e3}] }

I have a Python script that scrapes the application every few minutes to pull the JSON out of the web app and write it to the file system, so the file on disk ends up with the structure shown above.
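For context, the script is basically doing something like the following (a simplified sketch only; the real URL, output path, and schedule are different):

import json
import urllib.request

# Hypothetical endpoint and output path, just for illustration.
URL = "http://myapp.example.com/api/events"
OUT_FILE = "/var/log/myapp/events.json"

def scrape_once():
    # Pull the JSON payload ({"total": N, "data": [{...}, ...]}) from the web app.
    with urllib.request.urlopen(URL) as resp:
        payload = json.load(resp)

    # Append the whole response as one line to the file that Splunk is monitoring.
    with open(OUT_FILE, "a") as f:
        f.write(json.dumps(payload) + "\n")

if __name__ == "__main__":
    # Run from cron every few minutes.
    scrape_once()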

I've been able to break the events out of the data array so that Splunk can index the individual {e1}, {e2}, {e3} events. The problem I'm facing is that each time the scraping script runs, I get duplicate events: I seem to get the same event repeated n times until it rolls out of the log.

I think the problem is that over time the events 'move' through the log files, so it looks to Splunk like the file is always changing.

Over time, the files look something like the following:
{ "total":6743, data:[{e1}. {e2}, {e3}] }
{ "total":6522, data:[{e2}. {e3}, {e4}] }
{ "total":6456, data:[{e3}. {e4}, {e5}] }

This seems to make Splunk index e3 three times.

Is there an easy way to keep Splunk from reindexing events it has already seen, without having to do a bunch of diffing in the script to filter out the duplicate events?
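To be concrete, the kind of thing I'm hoping to avoid is keeping a "seen" list in the script and only writing out events it hasn't written before, roughly like this (sketch only; it assumes each event has a unique "id" field):

import json
import os

# Hypothetical paths and field names, just to illustrate the diffing I'd rather not maintain.
SEEN_FILE = "/var/log/myapp/seen_ids.txt"
OUT_FILE = "/var/log/myapp/events.json"

def load_seen():
    # IDs of events that have already been written out on previous runs.
    if not os.path.exists(SEEN_FILE):
        return set()
    with open(SEEN_FILE) as f:
        return set(line.strip() for line in f)

def write_new_events(payload):
    # Write only the events we haven't seen before, one JSON object per line.
    seen = load_seen()
    new_ids = []
    with open(OUT_FILE, "a") as out:
        for event in payload["data"]:
            event_id = str(event["id"])   # assumes a unique "id" field per event
            if event_id in seen:
                continue                  # skip the overlap from the previous scrape
            out.write(json.dumps(event) + "\n")
            new_ids.append(event_id)
    # Remember the new IDs for the next run.
    with open(SEEN_FILE, "a") as f:
        for event_id in new_ids:
            f.write(event_id + "\n")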

Thanks,
Dan
