Have following data in the logfile
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC5"}
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC6","account":"verified"}
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC7","account":"unverified"}
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC8","account":"verified"}
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC9"}
Need Report like the following so that I should get the count of "verified" where it is explicitly mentioned otherwise it should show under "unverified" -
Type Count
Verified 2
Unverified 3
How can we achieve this. Will appreciate your inputs!
There is more than one way to do it but they all boil down to the same thing - categorize your data into two sets and then count them.
<your_initial_search>
| eval account=if(account=="verified","verified","unverified")
| stats count by account
Thanks Rick.
| eval account=if(account=="verified","verified","unverified")
| stats count by account
Although data is there for both "verified" and "unverified" but I am getting result only for "unverified" (whatever is in the ELSE).
Any reason that you can think of this behavior?
You need to accurately describe your raw data (anonymize as needed) and any relevant characteristics. (As a general rule, always describe data when asking data analytics questions.) Which field name gives you "account"? Based on your description, "account" is NOT the top level path in the JSON data; additionally, this path to "account" is inside an array according to your partial reveal. Is it second level? Third level?
Suppose your top level path is "events", i.e., raw data looks like
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC5"}]}
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC6","account":"verified"}]}
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC7","account":"unverified"}]}
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC8","account":"verified"}]}
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC9"}]}
Splunk would have given you flattened field names like events{}.GUID, events{}.account, etc. If you know that every array events{} contains only a single event{}.account, you can just substitute "account" in solutions with event{}.account. But as an array, events{}.account could be multivalued. In that case, you need to make them single-valued first, i.e.,
| spath path=events{} ``` events{} should be the actual path of that array ```
| mvexpand events{}
| spath input=events{}
| eval account=if(account=="verified","verified","unverified")
| stats count by account
Alternatively, use fillnull
| spath path=events{} ``` events{} should be the actual path of that array ```
| mvexpand events{}
| spath input=events{}
| fillnull account value="unverified"
| stats count by account
If "account" is not second level, or it is not really inside an array as your original description implied, you need to give accurate description of your data.
Yes, thought about fillnull myself. The difference is that fillnull only fills events where there is no value at all whereas the if-based eval can just sort the verified ones from all the rest (even if you have many other possible values like "unverified", "half-verified", "maybe verified but I really don't know" and so on ;-)). So depending on the use case either of the solutions can be appropriate.