My problem is that I cannot understand why I get a different statistics count depending on whether I place the dedup command before or after the sort command.
Query 1:
host="web_application" status=200 action=purchase* file=succ*
| table JSESSIONID action status
| rename JSESSIONID as "UserSessions"
| sort "UserSessions"
| dedup "UserSessions"
Results:
Statistics: (3569)
Query 2:
host="web_application" status=200 action=purchase* file=succ*
| table JSESSIONID action status
| rename JSESSIONID as "UserSessions"
| dedup "UserSessions"
| sort "UserSessions"
Results:
Statistics: (5726)
Why is there a difference between the two queries when the only difference is the position of dedup?
The sort command has a default limit of 10,000 events. Your first search is probably hitting that limit and then removing duplicates from the 10,000. The second search removes duplicates from (say) 20,000 events and so produces the larger number. You can verify this by examining the Job Inspector output for each search.
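As a quick check alongside the Job Inspector, a sketch like the following counts the events the base search returns and the number of distinct session IDs. The field aliases total_events and distinct_sessions are just illustrative names. If total_events is above 10,000, the first search is being truncated by sort, and distinct_sessions should be close to the count from the second search, assuming every event carries a JSESSIONID value.

```
host="web_application" status=200 action=purchase* file=succ*
| stats count AS total_events, dc(JSESSIONID) AS distinct_sessions
```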
omg, so the sort command just decides to remove results by itself.
How do I remove that limit from sort? Are there other commands with the same kind of behaviour that I need to be aware of?
I attached pictures of the Job Inspector for the two queries, but I am a bit confused about where to start debugging. What is the order of the steps in the Job Inspector?
Sort doesn't just randomly delete results. It has a limit to the number of results it can process. This limit is documented in the Search Reference manual, as is the limit=n option to change it.
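For illustration, here is the first query from the question with the limit lifted. A count of 0 tells sort to return all results, and it can be written either as a bare number or as limit=0. This is a sketch based on the original query, not a tested search.

```
host="web_application" status=200 action=purchase* file=succ*
| table JSESSIONID action status
| rename JSESSIONID as "UserSessions"
| sort 0 "UserSessions"
| dedup "UserSessions"
```

Running dedup before sort, as in the second query, also keeps the sorted set smaller, but it only avoids the truncation as long as fewer than 10,000 distinct sessions remain, so an explicit count is the safer choice.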
I don't see any attached pictures.
The Job Inspector is a bit of a challenge to read, especially for newcomers. Commands are listed in alphabetical rather than chronological order. I usually go by the "in" and "out" numbers in the two right-most columns. In your case, however, you need only look at command.sort and command.dedup to see how many results each command is processing.
Yes, you are right. Sort has a default limit of 10,000 results, and the rest are removed from the result set.
I fixed it with sort 0.
@net1993 If your problem is resolved, please accept the answer to help future readers.
OK. I think that is really stupid behaviour for the sort command. How can I remove the default limit?