Getting Data In

Event logs getting duplicated every collection

clintla
Contributor

I have a duplication problem w/ splunk- I know why but not how to fix it.

I have an event log that I have to extract every hour (entire log every time).

so if I get an error im searching for.

10:00am target event.

An hour later my script runs & I get a few more events it counts the previous
events with my time chart.

10:00am target event

11:05am target event.

Now my chart is incorrect due to now the previous error count -1 is now 2 since it has been re-indexed (3 events total). these logs go back a year (I cant clear them every time on the
device which would make it easy)
It should read 1 target event per hour but every hour- the previous events get doubled making
the chart incredibly inaccurate.

10:00am (2 events)

11:05am (1 event)

Seems like this would be something easy to fix- things like followtail dont seem to apply though.

If I could dedup on a chart that would be good. Dont think that works in a chart though. Anyone have another solution?

Tags (3)
0 Karma

kristian_kolb
Ultra Champion

I think you should rather try to fix your script so that it does not read the entire file each time it runs (if possible). By getting the input data correct, you don't have to worry about 'fixing' the output of your searches. Also, re-indexing the entire file each time consumes part of your license. That may not be a big issue if you read a small log file once an hour, but someday you might be asked to run the script once a minute....

/kristian

Takajian
Builder

Could you try following dedup command? I think this remove duplicated events. Please let me know if this help in your environment.

... | dedup _raw

Takajian
Builder

In your case, how it works? The dedup will be before timechart command.

sourcetype="getlog" | dedup _raw| timechart count by host useother="f"

But dedup command is a kinds of workaround. The ideal is to fix your script.

0 Karma

clintla
Contributor

sourcetype="getlog" | timechart count by host useother="f" | dedup _raw

add "| dedup _raw" & it goes from that sloped output to "No results"- that seems like it should have worked but I dont understand fully the command I guess.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Mile High Learning with Splunk University, Denver, Colorado

If Denver is known for its mile-high elevation, Splunk University is about to raise the bar on technical ...

IT Service Intelligence 5.0 Series: Your Guide to the June Launch

We are excited to announce the June release of Splunk IT Service Intelligence (ITSI) 5.0. This update ...

Agent Mode Engaged! Enchaining Agentic Operations with Splunk AI Assistant 2.0

    Are you ready to transform how your team handles complex data requests? We invite you to our upcoming ...