Splunk Search

dedup gives different result if a 'table' command is used before it. A bug??

Explorer

I am running a search that uses the dedup command:

index=myindex earliest=-5d@d latest=@d | 
bin _time span=1d | 
dedup id, _time | stats count

The above query returns 794.

However, if I add a table command before dedup:

index=myindex earliest=-5d@d latest=@d | 
bin _time span=1d | 
table id, _time | dedup id, _time | stats count

The result is 798! This really puzzles me as I don't expect the table command will change my answer. Am I missing something? Or is it a bug?


Re: dedup gives different result if a 'table' command is used before it. A bug??

SplunkTrust

Is it possible the event list changed between the two queries? Try running them each with the same explicit start and end times (use the time selector and remove the earliest and latest keywords from the query).

---
If this reply helps you, an upvote would be appreciated.

Re: dedup gives different result if a 'table' command is used before it. A bug??

Explorer

Thanks, but I don't think the event list was changing because the events are updated once a day in a nightly batch job. And I switched between these two queries many times (to debug them) and I always got back 794 and 798 respectively.

And since I use the @d in both the earliest and latest, the time range was fixed while I was debugging it.


Re: dedup gives different result if a 'table' command is used before it. A bug??

SplunkTrust

What do these two queries return?

index=myindex earliest=-5d@d latest=@d | 
 bin _time span=1d | 
 fields id, _time | dedup id, _time | stats count

AND

index=myindex earliest=-5d@d latest=@d id=* | 
 bin _time span=1d | 
 table id, _time | dedup id, _time | stats count

Re: dedup gives different result if a 'table' command is used before it. A bug??

Explorer

Thanks. I had actually tried your first one as well, and it also gave 794, the same answer as the query without the fields id, _time part.


Re: dedup gives different result if a 'table' command is used before it. A bug??

Super Champion

Could it be because, with the table command, the only two fields available are id and _time, whereas without it all fields are still present? dedup keeps the first event for each combination of the specified fields and discards the rest.

Not sure if that's right but it's the best I've got.

https://docs.splunk.com/Documentation/Splunk/6.5.0/SearchReference/Dedup
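
To illustrate the dedup behavior described above, here is a minimal sketch that runs without any indexed data. It uses makeresults to generate four synthetic events; the field n and the id values "a" and "b" are made up for the example and are not from the original data.

```spl
| makeresults count=4
| streamstats count AS n
| eval id=if(n<=2, "a", "b")
| dedup id
```

This should return two rows, one per distinct id, and in each case the first event seen for that id (n=1 and n=3). Going by the documentation, only the fields listed after dedup should decide which events count as duplicates.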


Re: dedup gives different result if a 'table' command is used before it. A bug??

Explorer

To make sure there isn't any "missing values"-related problem, I changed my queries to:

index=myindex earliest=-5d@d latest=@d | 
bin _time span=1d | search id=* AND _time=* |
dedup id, _time | stats count

AND

index=myindex earliest=-5d@d latest=@d | 
bin _time span=1d | search id=* AND _time=* |
table id, _time | dedup id, _time | stats count

But once again, they gave different results (794 and 798 respectively). 😞


Re: dedup gives different result if a 'table' command is used before it. A bug??

Super Champion

What I meant was that table brings back only two fields and all other fields are lost, whereas when you run dedup without table, all other fields are still available (the same as when you use "fields" instead of table). That could be why. It is not that field values are missing, but that the fields themselves are gone.


Re: dedup gives different result if a 'table' command is used before it. A bug??

Super Champion

Try adding a dc(id) by _time to your searches and see how many ids you actually have per _time
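
Something along these lines, as a sketch based on the query in the question (unique_ids is just an illustrative alias, not a field in the data):

```spl
index=myindex earliest=-5d@d latest=@d
| bin _time span=1d
| stats dc(id) AS unique_ids BY _time
```

Appending | stats sum(unique_ids) to that should give the number of distinct id/_time pairs, which is the figure the dedup id, _time | stats count version would be expected to match.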


Re: dedup gives different result if a 'table' command is used before it. A bug??

Explorer

If I run this:

index=myindex earliest=-5d@d latest=@d | 
bin _time span=1d | search id=* AND _time=* |
table id, _time | stats dc(id) by _time

I get identical results with or without the table command. But again, if I replace the dc part with dedup id, _time | stats count, I get a different answer when the table command is present.

Regarding the worry that "table is only bringing back two fields": as a test, I changed that part to table id, _time, host, source, but the problem still isn't resolved. So I don't think it is about table only bringing back two fields.
