I am running a search that uses the dedup command:
index=myindex earliest=-5d@d latest=@d |
bin _time span=1d |
dedup id, _time | stats count
The above query returns 794.
However, if I add a table command before dedup:
index=myindex earliest=-5d@d latest=@d |
bin _time span=1d |
table id, _time | dedup id, _time | stats count
The result is 798! This really puzzles me, as I wouldn't expect the table command to change the answer. Am I missing something? Or is it a bug?
If you use rex to extract id from _raw, and then run the two searches on the results of rex, what do you get? If you don't know how to use rex, then post an example event.
Without rex, after you run these searches, what id count do you see in Interesting Fields?
Have you identified the 4 combinations of id and _time that are getting missed - is it a difference in id or _time?
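One way to hunt for those 4 combinations is to tag the output of each variant and keep only the (id, _time) pairs that show up in one of them. This is only a rough sketch: the field names variant, variants and n are made up for illustration, and it assumes the appended subsearch stays within the default subsearch result and runtime limits, which should be fine for roughly 800 rows but is worth checking.

```
index=myindex earliest=-5d@d latest=@d
| bin _time span=1d
| dedup id, _time
| eval variant="no_table"
| fields id, _time, variant
| append
    [ search index=myindex earliest=-5d@d latest=@d
    | bin _time span=1d
    | table id, _time
    | dedup id, _time
    | eval variant="with_table" ]
| stats values(variant) AS variants dc(variant) AS n BY id, _time
| where n < 2
```

Any rows that come back are pairs seen by only one of the two searches, and the variants column says which one. Note that stats ... BY id, _time groups rows by value, so if nothing comes back the extra 4 rows are probably not distinct (id, _time) values at all, which would itself be a useful clue.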
It may be because with the table command the only two fields available are id and _time, whereas without it all fields are still present? Dedup keeps the first event for each combination of values in the specified fields and discards the rest.
I'm not sure that's right, but it's the best I've got.
https://docs.splunk.com/Documentation/Splunk/6.5.0/SearchReference/Dedup
To make sure there isn't any "missing values"-related problem, I changed my queries to:
index=myindex earliest=-5d@d latest=@d |
bin _time span=1d | search id=* AND _time=* |
dedup id, _time | stats count
AND
index=myindex earliest=-5d@d latest=@d |
bin _time span=1d | search id=* AND _time=* |
table id, _time | dedup id, _time | stats count
But once again, they gave different results (794 and 798 respectively). 😞
What I meant was that table only brings back two fields and all other fields are lost, whereas when you run dedup without table, all other fields are still available (the same applies when you use "fields" instead of table). That could be why. Not that field values are missing, but that the fields themselves are gone.
Try adding a dc(id) by _time to your searches and see how many ids you actually have per _time
If I run this:
index=myindex earliest=-5d@d latest=@d |
bin _time span=1d | search id=* AND _time=* |
table id, _time | stats dc(id) by _time
I get identical results with or without the table command. But again, if I replace the dc part with dedup id, _time | stats count, then I get a different answer when the table command is present.
Regarding the worry that "table is only bringing back two fields": as a test I changed that part to table id, _time, host, source, but the problem still isn't resolved. So I don't think it's about "table is only bringing back two fields".
If you try |table *|stats count, are your results different? Unless the only fields in your index are id, _time, host and source.
Again, |table *|dedup id, _time|stats count and |fields *|dedup id, _time|stats count give different results, with the table answer always larger than the fields one.
And there are many other fields in the events.
So |table * gives you 798 and |fields * gives you 794?
this is quite the tricky widget...
How about this query's result?
index=myindex earliest=-5d@d latest=@d |
bin _time span=1d |
fields id, _time | dedup id, _time | stats count
AND
index=myindex earliest=-5d@d latest=@d id=* |
bin _time span=1d |
table id, _time | dedup id, _time | stats count
Thanks. Actually, I tried your first one as well and it also gave 794, the same answer as the query without the fields id, _time part.
Is it possible the event list changed between the two queries? Try running them each with the same explicit start and end times (use the time selector and remove the earliest and latest keywords from the query).
Thanks, but I don't think the event list was changing because the events are updated once a day in a nightly batch job. And I switched between these two queries many times (to debug them) and I always got back 794 and 798 respectively.
And since I use @d in both earliest and latest, the time range was fixed while I was debugging it.