I've below line in my logs:
[2013-01-15 20:06:51:641 GMT+00:00] INFO #new# userid=1234 chair_count=1 table_count=1 sofaAvailable=true
[2013-02-15 21:06:51:642 GMT+00:00] INFO userid=1234 chair_count=1 table_count=0 sofaAvailable=false
[2013-03-15 22:06:51:643 GMT+00:00] INFO #new# userid=3452 chair_count=1 table_count=1 sofaAvailable=true
[2013-04-15 23:06:51:644 GMT+00:00] INFO #new# userid=1234 chair_count=2 table_count=3 sofaAvailable=false
[2013-05-01 10:06:51:645 GMT+00:00] INFO #new# userid=3564 chair_count=1 table_count=2 sofaAvailable=true
[2013-05-05 11:06:51:646 GMT+00:00] INFO #new# userid=2443 chair_count=1 table_count=1 sofaAvailable=true
[2013-05-07 12:06:51:647 GMT+00:00] INFO #new# userid=2265 chair_count=1 table_count=1 sofaAvailable=false
[2013-05-01 10:06:51:645 GMT+00:00] INFO #new# userid=3564 chair_count=1 table_count=0 sofaAvailable=true
Can any one confirm if below query is correct to get the count of all users who are having at least 1 chair & who are new users (by looking at tag #new#) & whose sofaAvailable is true?
Note that if there are multiple entries for the same user in the log, then I need to take the chair count & table count of the latest entry only.
#new# chair_count > 0 | stats count
Here, the output should be 3
Also, can you confirm if below query is correct to get count of all users who are having at least 1 chair & who have at least 1 table & who are new users (by looking at tag #new#) & whose sofaAvailable is true?
#new# chair_count > 0 table_count > 0 | stats count
Here, the output should be 2
Thanks!
What about this search: #new# chair_count > 0 table_count > 0 sofaAvailable=true |stats latest(chair_count) as chair_count latest(table_count) as table_count latest(sofaAvailable) as sofaAvailable latest(_raw) by userid |stats count
I think this will be fast as we are getting all results in first search not in second(after |)and also search result will be less as #new# chair_count > 0 table_count > 0 sofaAvailable=true and specfic than only #new#, but please give suggestions then it will be good....
Quick question: The 2nd part which is
...| stats latest(chair_count) as chair_count latest(table_count) as table_count latest(sofaAvailable) as sofaAvailable latest(_raw) by userid
here, why u need to stats all 3 together? Why can't we just do as shown below?
...| stats latest(chair_count) by userid | stats count
try this also
...| dedup userid | stats count
What about this search: #new# chair_count > 0 table_count > 0 sofaAvailable=true |stats latest(chair_count) as chair_count latest(table_count) as table_count latest(sofaAvailable) as sofaAvailable latest(_raw) by userid |stats count
I think this will be fast as we are getting all results in first search not in second(after |)and also search result will be less as #new# chair_count > 0 table_count > 0 sofaAvailable=true and specfic than only #new#, but please give suggestions then it will be good....
Can any one answer my question?
Quick question: The 2nd part which is "...| stats latest(chair_count) as chair_count latest(table_count) as table_count latest(sofaAvailable) as sofaAvailable latest(_raw) by userid", here, why u need to stats all 3 together? Why can't we just do "...| stats latest(chair_count) by userid | stats count"?
For your first search, you need to search based on the latest information for each user.
You can adjust search terms order for optimization, but here is a sample search.
#new# | stats latest(chair_count) as chair_count latest(table_count) as table_count latest(sofaAvailable) as sofaAvailable latest(_raw) by userid | where chair_count>0 and sofaAvailable="true"
For your 2nd search,
#new# | stats latest(chair_count) as chair_count latest(table_count) as table_count latest(sofaAvailable) as sofaAvailable latest(_raw) by userid | where chair_count>0 and table_count>0 and sofaAvailable="true"
By adding "| stats" count after each earch, you can get the number of users with your criteria.
For example,
$ /opt/splunk/bin/splunk search '#new# | stats latest(chair_count) as chair_count latest(table_count) as table_count latest(sofaAvailable) as sofaAvailable latest(_raw) by userid | where chair_count>0 and sofaAvailable="true" | stats count'
count
-----
3
Hope this helps.
yes same concept I have as all events are filterd out in begining in my search 🙂
kml_uvce's search is faster as all unnecessary events are filtered out at the beginning. I was trying to show basic idea step by step in the search 🙂 also you can customize my search to do similar analysis.
Can any Splunk Expert comment about the performance among these 3 different answers & tell us which one is better?
Thanks for the answer. I would like to know if below query is also an alternate correct answer or not for the 2nd question:
#new# chair_count > 0 table_count > 0 sofaAvailable=true | stats first(table_count) as tcount, first(sofaAvailable) as sofa first(chair_count) as ccount by userid | where sofa = "true" and tcount > 0 and ccount > 0 | stats count
If above is also correct, then which one is best in terms of performance?