Dedup lost data, a bug?

april_tao · ‎09-26-2014

For the search below:

eventtype=MYTYPE [search eventtype=MYTYPE | sort 0 _time desc | dedup fieldX | return 1000 source]

We expect it to return the latest source for each fieldX value.

In our data, the latest source for each of fieldX=A, fieldX=B, and fieldX=C contains over 10,000,000 events.
We therefore expect the search to return over 30,000,000 results.
However, it returns results for fieldX=A only.

Question: Is the search written correctly? If so, is this a bug in dedup? Does dedup have a limit on result size? With a smaller dataset, the same search works as expected.

lguinn2 · ‎09-27-2014

The problem is that the subsearch has a limit, and I don't see that you need the subsearch at all. You also do not need the sort: Splunk returns events in reverse time order (newest first) by default. Try this:

eventtype=MYTYPE | dedup fieldX

This will return the most recent event for each value of fieldX.
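
If what you want is only the latest source per fieldX value rather than the whole event, a stats-based variant may be worth trying. This is a minimal sketch, assuming the same eventtype and field names as in your search; latest_source is just an illustrative output field name:

```
eventtype=MYTYPE | stats latest(source) AS latest_source BY fieldX
```

This should produce one row per fieldX value, with latest_source taken from the event with the most recent _time for that value. It also avoids the subsearch.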
