Splunk Search

Why does the position of my dedup command in a query alter the results of my statistics?

net1993
Path Finder

My problem is that I cannot understand why I get a different statistics count depending on whether I place the dedup command before or after the sort command.

Query 1:

    host="web_application" status=200 action=purchase* file=succ*
    | table JSESSIONID action status
    | rename JSESSIONID as "UserSessions"
    | sort "UserSessions"
    | dedup "UserSessions"

Result: Statistics (3569)

Query 2:

    host="web_application" status=200 action=purchase* file=succ*
    | table JSESSIONID action status
    | rename JSESSIONID as "UserSessions"
    | dedup "UserSessions"
    | sort "UserSessions"

Result: Statistics (5726)

Why is there a difference between the two queries when the only difference is the position of dedup?

richgalloway
SplunkTrust

The sort command has a default limit of 10,000 events. Your first search is probably hitting that limit and then removing duplicates from the 10,000. The second search removes duplicates from (say) 20,000 events and so produces the larger number. You can verify this by examining the Job Inspector output for each search.
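
As a rough check, assuming the same base search as in the question, counting the events before any sort shows whether the search produces more than 10,000 results (the field name events_before_sort is only an illustrative label):

    host="web_application" status=200 action=purchase* file=succ*
    | stats count AS events_before_sort

If the count is above 10,000, a plain sort will pass only the first 10,000 sorted results on to dedup.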

---
If this reply helps you, an upvote would be appreciated.

net1993
Path Finder

OMG, so the sort command just decides to remove results by itself.

How do I remove that function/limit of sort? Are there other commands with that kind of behaviour that I need to be aware of?

I have attached pictures of the Job Inspector for the two queries, but I am a bit confused about where to start debugging. What is the order of the steps in the Job Inspector?

richgalloway
SplunkTrust

Sort doesn't just randomly delete results. It has a limit on the number of results it returns (10,000 by default). This limit is documented in the Search Reference manual, as is the limit=n option to change it.
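
For illustration, the count can be written with the limit= label or as a bare integer. The value 50000 is an arbitrary example, and a count of 0 returns all results:

    ... | sort limit=50000 "UserSessions"
    ... | sort 0 "UserSessions"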

I don't see any attached pictures.

The Job Inspector is a bit of a challenge to read, especially for newcomers. Commands are listed in alphabetical rather than chronological order. I usually go by the "in" and "out" numbers in the two right-most columns. In your case, however, you need only look at command.sort and command.dedup to see how many results each command is processing.

net1993
Path Finder

Yes, you are right. Sort has a default limit of 10k results, and the rest are removed from the result set.
I fixed it with sort 0.
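
A sketch of the first query from the question with that change applied, assuming nothing else is altered:

    host="web_application" status=200 action=purchase* file=succ*
    | table JSESSIONID action status
    | rename JSESSIONID as "UserSessions"
    | sort 0 "UserSessions"
    | dedup "UserSessions"

With the limit lifted, sort passes every result on to dedup, so both orderings should deduplicate the same full set of sessions.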

richgalloway
SplunkTrust

@net1993 If your problem is resolved, please accept the answer to help future readers.

net1993
Path Finder

OK. I think that is really stupid behaviour for the sort command. How can I change or remove the default limit?
