Here are two searches that I think are logically equivalent, yet they return different results in Splunk.
Option 1: with a subsearch
index=web sourcetype=access_combined status<400
[ search index=web sourcetype=access_combined status>=400
| dedup clientip | fields clientip ]
| stats sum(bytes) as bytes by clientip
| stats avg(bytes) as avg_bytes, median(bytes) as median_bytes
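To make the intended logic of Option 1 explicit, here is a minimal Python model of it (not Splunk code). The subsearch becomes a set of client IPs with at least one status>=400 event. The outer search keeps status<400 events from those IPs. The two `stats` commands become a per-IP sum followed by the mean and median across IPs. The sample events and the function name `option1` are hypothetical. `statistics.median` stands in for Splunk's `median()`; the sample uses an odd number of IPs so the two cannot disagree on how to pick the middle value.

```python
from statistics import mean, median

# Hypothetical sample events: (clientip, status, bytes).
EVENTS = [
    ("10.0.0.1", 200, 1000),
    ("10.0.0.1", 404, 300),
    ("10.0.0.2", 200, 5000),
    ("10.0.0.2", 500, 250),
    ("10.0.0.3", 200, 700),   # no errors, so excluded by the subsearch
    ("10.0.0.4", 403, 200),   # errors only, so no successful bytes to sum
    ("10.0.0.5", 200, 3000),
    ("10.0.0.5", 404, 150),
]


def option1(events):
    """Model of Option 1: subsearch filter, per-IP sum, then avg/median."""
    # Subsearch: distinct client IPs with at least one status >= 400 event.
    bad_ips = {ip for ip, status, _ in events if status >= 400}
    # Outer search + first stats: sum bytes of status < 400 events per IP.
    per_ip = {}
    for ip, status, nbytes in events:
        if ip in bad_ips and status < 400:
            per_ip[ip] = per_ip.get(ip, 0) + nbytes
    # Second stats: average and median across the per-IP totals.
    totals = list(per_ip.values())
    return mean(totals), median(totals)


print(option1(EVENTS))
```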
Option 2: using an eval to replace the subsearch
index=web sourcetype=access_combined
| eval check=if(status>=400,"Bad","Okay")
| chart sum(bytes) as bytes by clientip check
| where Bad > 0
| stats avg(Okay) as avg_bytes, median(Okay) as median_bytes
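A matching Python model of Option 2 (again not Splunk code, with hypothetical sample events and function name `option2`) pivots the events into one row per IP with "Bad" and "Okay" byte sums, as `chart ... by clientip check` does. It assumes that `chart` leaves a cell empty (null) when an IP has no events of that kind, and that the final `stats` skips empty cells. The `where Bad > 0` step is modeled literally: it keeps an IP only if the bytes of its error events sum to more than zero. On this sample, where every error event has a positive byte count, the result is the same as the Option 1 model.

```python
from statistics import mean, median

# Hypothetical sample events: (clientip, status, bytes).
EVENTS = [
    ("10.0.0.1", 200, 1000),
    ("10.0.0.1", 404, 300),
    ("10.0.0.2", 200, 5000),
    ("10.0.0.2", 500, 250),
    ("10.0.0.3", 200, 700),
    ("10.0.0.4", 403, 200),
    ("10.0.0.5", 200, 3000),
    ("10.0.0.5", 404, 150),
]


def option2(events):
    """Model of Option 2: eval + chart pivot, where filter, then avg/median."""
    # eval + chart: one row per IP, one column per check value, summing bytes.
    rows = {}
    for ip, status, nbytes in events:
        check = "Bad" if status >= 400 else "Okay"
        row = rows.setdefault(ip, {})
        row[check] = row.get(check, 0) + nbytes
    # where Bad > 0: keep a row only if its error bytes sum above zero.
    # A missing Bad cell (no error events) fails the test.
    kept = [row for row in rows.values() if row.get("Bad", 0) > 0]
    # Final stats: avg/median of Okay, skipping rows with no Okay cell
    # (assumed to be null in the chart output).
    okay = [row["Okay"] for row in kept if "Okay" in row]
    return mean(okay), median(okay)


print(option2(EVENTS))
```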
The idea behind both searches is the same: identify IPs that have had HTTP errors in the previous week, then report the average and median number of bytes of "successful" traffic per IP over that timeframe. (My real search is slightly different, but this illustrates the problem perfectly.) Successful traffic is defined as status<400, and HTTP errors as status>=400. I am using the standard access_combined sourcetype for this example, so clientip is the IP address connecting to the Apache server, status is the HTTP status code, and bytes is the size of the HTTP response in bytes.
But the searches give different results. For example, the first search returned an average of 42823.32638888889, while the second returned an average of 251923.11538461538.
I am sure this is something simple that I have overlooked, but I don't see it! I've even checked the Search Job Inspector, but nothing there explains the difference. The subsearch is not hitting any limits on execution time or number of results; the overall data set is fairly small.
What have I missed?
[Update: fixed a couple of typos, which of course I didn't see until I posted... and I had messed up the second search in a big way!]