Sumologic Query:
_source="VerizonCDN"
| json field=_raw "path"
| json field=_raw "client_ip"
| json field=_raw "referer" | where %referer = "" | where %status_code = 200
| json field=_raw "user_agent"
| count by %host,%path,%client_ip,%referer,%user_agent | where _count >= 100
| order by _count desc
And my conversion to Splunk:
source="http:Emerson_P1CDN" AND status_code=200 AND referer=""
| stats count by host,path,client_ip,referer,user_agent | where count >= 100 | sort - count
Do you think I converted it correctly? I ask because the Splunk results are different from the Sumo Logic results.
It depends on how Sumo Logic deals with null values. Splunk's stats command only counts an event if all of the by fields (host, path, client_ip, referer, user_agent) are non-null. To test this, you could use fillnull:
source="http:Emerson_P1CDN" AND status_code=200 AND referer=""
| fillnull value="NULL"
| stats count by host,path,client_ip,referer,user_agent | where count >= 100 | sort - count
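To see how many events stats would drop because of a null by field, a rough sketch like the one below may help. It reuses the source and field names from this thread, and it deliberately leaves out the referer="" filter (my assumption, so that events with no referer field at all stay visible). count(field) only counts events where that field has a value, so any column lower than total_events points at a field that is sometimes null.

```spl
source="http:Emerson_P1CDN" status_code=200
| stats count AS total_events,
        count(host) AS with_host,
        count(path) AS with_path,
        count(client_ip) AS with_client_ip,
        count(referer) AS with_referer,
        count(user_agent) AS with_user_agent
```

If all the columns match total_events, null handling is probably not the cause of the difference.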
Thanks for answering, but my problem is with the exact result of the query. It should be the same as in Sumo Logic, because our team is currently migrating to Splunk.
In what way are the results different?
The result rows are different, and so are the counts.
If you don't wish to provide more details, you will need to work out for yourself under what circumstances the results differ and how exactly they differ. Does it happen all the time? Can you reduce the set of events until the difference goes away, then increase it again to find out which events cause the difference, and what it is about those events that Splunk and Sumo Logic treat differently?
I reduced it to status_code=200 and looked at the total count of CDN 200s. It doubled the number in Sumo Logic; that is the first difference I noticed.
Have a look at the events (run the search in verbose mode). Have the events been duplicated in Splunk? Have unexpected events been included?
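One possible way to check for duplicated events is to group on the timestamp and the raw text. This is only a sketch using the source name from this thread; grouping by _raw can be slow on large result sets, so a short time range is advisable, and two genuinely separate requests could in principle produce identical raw text.

```spl
source="http:Emerson_P1CDN" status_code=200
| stats count AS copies BY _time, _raw
| where copies > 1
| sort - copies
```

Any rows returned are events that appear more than once with the same timestamp and content.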
Sorry for the confusion. What I mean is that the Splunk count is double the count from Sumo Logic. Sample counts:
status code (200) Sumo Logic = 100 count
status code (200) Splunk = 250 count
What could be the reason it doubles?
Timeframes could be different. Raw events could be different / include events from different sources. Field extraction could be different resulting in additional events being found.
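To check whether events from unexpected places are being picked up, a breakdown by Splunk's default metadata fields might help. This is a sketch that assumes the same source name as the search above; index, sourcetype, source and host are present on every Splunk event.

```spl
source="http:Emerson_P1CDN" status_code=200
| stats count BY index, sourcetype, source, host
| sort - count
```

More rows than expected (for example the same data in two indexes, or extra hosts) would suggest Splunk is searching more data than Sumo Logic is.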
You could try reducing the timeframe on both systems so you have a manageable number of results (100/250 sounds reasonably manageable, but you could try for fewer), then compare which events are in the Splunk data set but not in the Sumo Logic data set (and vice versa if appropriate).
They both use the same timeframe, which is why I'm wondering why Splunk has a higher count when it should be almost the same.
Are the differences restricted to particular hosts, paths, client_ips, referers, or user_agents, or are they across the board?
What about different times of the day, days of the week, etc., are the differences more pronounced at different times?
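For the time-based comparison, an hourly count is one simple starting point. The one-hour span is an arbitrary choice of mine, and the source name is the one used earlier in this thread.

```spl
source="http:Emerson_P1CDN" status_code=200
| timechart span=1h count
```

Comparing these hourly counts with the equivalent hourly counts from Sumo Logic should show whether the gap is constant or concentrated in particular hours. If the counts look shifted by a fixed number of hours, it may be worth checking the time zone settings on both sides.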
There does not appear to be anything amiss with the search you are doing, which means the difference is probably in the data Splunk is using compared to the data Sumo Logic is using.