Getting Data In

How to exclude duplicate events from being summary indexed

desi
New Member

Hello, I have logs coming in, which I use to create a summary index.
Here is the flow:

I get some logs:

location=1 starttime=2011-09-26T05:10:00
location=1 starttime=2011-10-26T05:10:00
location=2 starttime=2011-09-26T05:20:00

I create a summary index from the above data, selecting events whose _indextime is in the last hour.

After a couple of days I get a duplicate of the first record:

location=1 starttime=2011-09-26T05:00:00

I want the above record to be excluded from the summary index.

Basically, what I need is: if a record has more than one copy in the last 30 days, then do not use it in the summary index query, even if its _indextime is in the last hour.
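One way to sketch that requirement in SPL (assuming the raw data lives in an index called mydata, and that location plus starttime together identify a record; both names are taken from the sample above, the index name is a placeholder): count the copies of each record over the last 30 days with eventstats, then keep only single-copy events whose _indextime falls in the last hour before feeding them to the summary-generating stats.

index=mydata earliest=-30d
| eventstats count AS copies BY location, starttime
| where copies = 1 AND _indextime >= relative_time(now(), "-1h")
| ... your summary-generating stats here ...

This keeps the 30-day duplicate check and the 1-hour _indextime window in a single search, rather than relying on a join.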

thanks


sroux
New Member

Four years later, in 2015, I use "dedup".
Example:

source=sourceA Hostname="myhost" | join Hostname [search source=sourceB] | dedup Application Hostname Produit | table Application Hostname Produit

Hope it helps someone.


RyanAdams
Engager

I'm not 100% clear on your question, but here is my best shot from what I understand.

You could edit the search that is generating the summary results to include a join with the existing events in the summary index (with earliest=-30d), then exclude the events that match. For example:

index=testindex | eval inSummary="F" | join type=left location starttime [search index=testindex_summary earliest=-30d | eval inSummary="T"] | where inSummary="F"

(A left join is needed here: events already in the summary get inSummary overwritten to "T" by the subsearch, while events not yet summarized keep "F" and pass the final filter.)

Hopefully that gives you a starting point.
