
Using dedup to keep the oldest events

acdevlin
Communicator

Hi all,

I know that the "dedup" command keeps the most recent event for each value by default. However, I'm in a situation where I want to use dedup to keep only the oldest events from my data (example below). I found a thread that asks exactly the same question, but the proposed solution (sorting on +_time) does not seem to work for me.

Specifically, I have a set of client requests to a web server. Each event has an associated req_time and a session_id, and many transactions can share the same session_id. I want to call '...|dedup session_id' and have only the OLDEST transaction for each session_id returned, rather than the NEWEST.

Any suggestions on how to accomplish this?


fli
Explorer

Maybe the correct approach is:

Your_search | reverse | dedup ...
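To see why reversing helps, here is a toy Python model (not Splunk code). It assumes, as the question describes, that results reach dedup newest first and that dedup keeps the first result it sees for each value; the sample events are made up.

```python
# Toy model in plain Python; this is NOT how Splunk implements dedup.
# Assumptions: results arrive newest first, and dedup keeps the first
# result it sees for each value of the key field.

def dedup(events, key):
    """Keep the first event seen for each value of `key`, in input order."""
    seen = set()
    kept = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            kept.append(event)
    return kept

# Hypothetical results in default search order (newest first).
results = [
    {"_time": 400, "session_id": "b"},
    {"_time": 300, "session_id": "a"},
    {"_time": 200, "session_id": "b"},
    {"_time": 100, "session_id": "a"},
]

# Plain dedup keeps the newest event per session.
newest = dedup(results, "session_id")
print([(e["session_id"], e["_time"]) for e in newest])  # [('b', 400), ('a', 300)]

# "| reverse" flips the stream to oldest first, so dedup keeps the oldest.
oldest = dedup(list(reversed(results)), "session_id")
print([(e["session_id"], e["_time"]) for e in oldest])  # [('a', 100), ('b', 200)]
```

In this model, the ordering of the stream is the only thing deciding which event survives, which is why flipping it before dedup changes the result.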

David
Splunk Employee

I think you'll find that the sortby parameter does this for you.

YourSearch | dedup session_id sortby +_time

Check out the docs for more ways you can tweak dedup:

http://www.splunk.com/base/Documentation/latest/SearchReference/Dedup

acdevlin
Communicator

Indeed it does! Thanks for the help David, and for confirming that I'm not going crazy.

David
Splunk Employee

Fortunately, if you need to grab the newest events after running concurrency (or simply want to keep concurrency from deciding which events dedup keeps), you can work around this by copying _time into another field. I was able to run:

MySearch | eval MyTime = _time | concurrency duration=duration output=concurrentevents | dedup MyField sortby -MyTime

without hitting the same issue. Likewise, sortby +MyTime works.

Does that get you where you need to be?
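The workaround can be illustrated with a toy Python model (not Splunk code). It assumes dedup keeps the first result it sees per value, models "sortby -MyTime" / "sortby +MyTime" as a sort on the copied field ahead of that, and stands in for whatever concurrency does to result order with an arbitrary reshuffle. MyTime and MyField mirror the search above; the sample data is made up.

```python
# Toy model in plain Python; this is NOT how Splunk implements dedup.
# Assumption: dedup keeps the first event it sees for each MyField value,
# and "sortby -MyTime" / "sortby +MyTime" are modeled as sorting first.

def dedup(events, key):
    """Keep the first event seen for each value of `key`, in input order."""
    seen = set()
    kept = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            kept.append(event)
    return kept

# Hypothetical events; MyTime is a copy of _time taken before the reordering step.
events = [{"_time": t, "MyTime": t, "MyField": f}
          for t, f in [(100, "x"), (200, "y"), (300, "x"), (400, "y")]]

# Arbitrary reshuffle standing in for an unexpected result order.
scrambled = [events[2], events[1], events[0], events[3]]

# Without an explicit sort, the kept events depend on the incoming order.
unsorted = dedup(scrambled, "MyField")
print([e["MyTime"] for e in unsorted])  # [300, 200]: newest x, but oldest y

# Sorting on the copied field first makes the result predictable.
newest = dedup(sorted(scrambled, key=lambda e: e["MyTime"], reverse=True), "MyField")
oldest = dedup(sorted(scrambled, key=lambda e: e["MyTime"]), "MyField")
print([e["MyTime"] for e in newest])  # [400, 300]
print([e["MyTime"] for e in oldest])  # [100, 200]
```

The point of the copy is that the selection then depends only on a field you control, not on whatever order earlier commands leave the results in.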

rashi83
Path Finder

Hi David,

I'm in a similar situation: I need to retrieve results for the latest time instead of old events.
I ran this search:
index=x | eval sorttime=strptime('_time',"%m/%d/%Y %H:%M:%S%p")| sort -sorttime |dedup hostname compName +_time keepempty=true | xyseries hostname compName status

This should retrieve the latest week's results, but instead it shows data from an older week.

David
Splunk Employee

I just tried that and can confirm what you found. If you put concurrency before the dedup, it returns the same results as if you had used sortby +_time. You should be able to override this with sortby -_time, but that search failed for me ("job ... is a zombie and is no longer with us"). This appears to be a bug: concurrency seems to do some work on _time that breaks dedup.

acdevlin
Communicator

Thanks for the reply, David.

I mentioned in my original question that I had tried this solution. For some reason it didn't work yesterday: the oldest events were the ones removed. However, to my pleasant surprise, it is working this morning.

Any idea as to why that happened?


EDIT: I answered my own question, but I'm still mystified by it. The query that successfully returned the oldest events included some concurrency information I had been playing around with.

... | eval timeout=1599 | ... | concurrency duration=timeout | dedup session_id

The above works. I have no idea why.
