How can I remove a second log entry that seems to be a duplicate from my query results?

jkinny
New Member

I am using the transaction command to group several log entries by a claimID field. When I do this, I get an extra log entry for each claimID that is returned "outside" of the transaction. For all intents and purposes, this second entry is a duplicate of the event matched by the transaction's endswith parameter. I've tried using dedup on the claimID field, but it only seems to throw away the oldest result and keep the newest. In other words, it removes the transaction block that I want and keeps the duplicate. (In the attached screenshot, dedup would keep the top entry and remove the bottom, transaction-ified entry.)
Query: index=prd sourcetype=app source="/logs/app.log" ("editCode=CA010" OR "status=A") | transaction claimID endswith="status=A"

tom_frotscher
Builder

Hi,

You can still use dedup; just use its sortby clause. For your example, you could use | dedup claimID sortby +_time

You can also use sortby -_time if you need the opposite order for other searches.
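For reference, here is the suggestion appended to the search from the question. This is only a sketch: the index, sourcetype and source values are copied unchanged from the question. Keep in mind that dedup keeps the first result it encounters for each claimID, so which copy survives depends on the order in which the results reach dedup.

```
index=prd sourcetype=app source="/logs/app.log" ("editCode=CA010" OR "status=A")
| transaction claimID endswith="status=A"
| dedup claimID sortby +_time
```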

Overall, I think you can improve the performance of your search with a clever combination of stats or streamstats and some more Splunk SPL magic. If needed, you can find more examples here: https://answers.splunk.com/answers/103/transaction-vs-stats-commands.html and here: https://www.splunk.com/blog/2012/11/29/book-excerpt-when-to-use-transaction-and-when-to-use-stats.ht...
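Here is a minimal sketch of the stats route. It is not taken from the linked pages and assumes that all you need per claimID is its time span and event count. The names start_time, end_time, event_count and duration are illustrative. Because stats emits exactly one row per claimID, no duplicate row can appear. Unlike transaction, however, it ignores endswith and does not return the grouped raw events unless you ask for them (for example, with list(_raw)).

```
index=prd sourcetype=app source="/logs/app.log" ("editCode=CA010" OR "status=A")
| stats min(_time) AS start_time max(_time) AS end_time count AS event_count by claimID
| eval duration=end_time-start_time
```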

Greetings,

Tom

jkinny
New Member

Thanks for the response, Tom. I'll respond to your second point first: unless I'm misunderstanding, I think I may need to stick with transaction for what I'm trying to do. I'm basically trying to find instances where status=A exists but there was no prior log message for the same claimID showing editCode=CA010. The last part of the query should turn up any "orphans" (I don't know what you call an orphan that has a closing entry but no beginning entry).
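This goal can also be expressed without transaction. The sketch below does not come from the thread. It flags a claimID when its earliest status=A event has no editCode=CA010 event at or before it. The field names edit_time, close_time, first_edit and first_close are hypothetical. The like() checks are plain substring matches on _raw, so you may need to tighten them if other status values also begin with "A".

```
index=prd sourcetype=app source="/logs/app.log" ("editCode=CA010" OR "status=A")
| eval edit_time=if(like(_raw, "%editCode=CA010%"), _time, null())
| eval close_time=if(like(_raw, "%status=A%"), _time, null())
| stats min(edit_time) AS first_edit min(close_time) AS first_close by claimID
| where isnotnull(first_close) AND (isnull(first_edit) OR first_edit > first_close)
```

stats ignores null values, so first_edit stays empty for a claimID that never logged editCode=CA010. The where clause catches that case as well as the case where the first edit came only after the first close.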

So I tried adding both | dedup claimID sortby +_time and | dedup claimID sortby -_time to the end of my query. Unfortunately, both return the same result, and neither is the one I want. If you look at my screenshot, adding the dedup/sortby command returns only the top result in my log (@ .716), not the bottom one (@ .256).

tom_frotscher
Builder

That's strange. I tried it myself with a transaction search, and it worked as I expected.

However, you can still do some streamstats and search magic. Append something like this to your search:

| sort limit=0 _time | streamstats count by group | search count=1 | sort limit=0 -_time
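Applied to the search from the question, the snippet would read as follows. This assumes the generic group field in the snippet stands for claimID. The ascending sort puts the oldest result first, streamstats numbers each claimID's results in that order, search count=1 keeps only the earliest one per claimID, and the final sort restores newest-first order.

```
index=prd sourcetype=app source="/logs/app.log" ("editCode=CA010" OR "status=A")
| transaction claimID endswith="status=A"
| sort limit=0 _time
| streamstats count by claimID
| search count=1
| sort limit=0 -_time
```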

Does this help?

jkinny
New Member

Thanks!! It turned out to be a mix of both. I ended up doing this: | sort limit=0 _time | dedup claimID sortby +_time

The sort seems to have arranged the list in the right order for the sortby to work this time. I honestly don't know why your first answer didn't work, but I'm glad to have gotten it working anyway!
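Here is the working search assembled in full, reusing the base search from the question. dedup keeps the first result it encounters for each claimID. Sorting ascending by _time beforehand therefore makes the oldest result the one that survives, which here is the transaction (@ .256 in the screenshot) rather than the later duplicate (@ .716).

```
index=prd sourcetype=app source="/logs/app.log" ("editCode=CA010" OR "status=A")
| transaction claimID endswith="status=A"
| sort limit=0 _time
| dedup claimID sortby +_time
```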

tom_frotscher
Builder

Nice!

Be aware that sort is quite performance-heavy. That is why it returns at most 10,000 results by default; in this case, we lifted that cap with the limit=0 option.

But if your search is still fast enough for your use case, you now have a solution 🙂
