How to search and report based on presence or lack...

runiyal · ‎10-02-2023

Have following data in the logfile

{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC5"}
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC6","account":"verified"}
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC7","account":"unverified"}
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC8","account":"verified"}
{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC9"}

Need Report like the following so that I should get the count of "verified" where it is explicitly mentioned otherwise it should show under "unverified" -

Type       Count
Verified     2
Unverified   3

How can we achieve this. Will appreciate your inputs!

PickleRick · ‎10-02-2023

There is more than one way to do it but they all boil down to the same thing - categorize your data into two sets and then count them.

<your_initial_search>
| eval account=if(account=="verified","verified","unverified")
| stats count by account

runiyal · ‎10-04-2023

Thanks Rick.

| eval account=if(account=="verified","verified","unverified")
| stats count by account

Although data is there for both "verified" and "unverified" but I am getting result only for "unverified" (whatever is in the ELSE).

Any reason that you can think of this behavior?

yuanliu · ‎10-04-2023

You need to accurately describe your raw data (anonymize as needed) and any relevant characteristics. (As a general rule, always describe data when asking data analytics questions.) Which field name gives you "account"? Based on your description, "account" is NOT the top level path in the JSON data; additionally, this path to "account" is inside an array according to your partial reveal. Is it second level? Third level?

Suppose your top level path is "events", i.e., raw data looks like

{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC5"}]}
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC6","account":"verified"}]}
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC7","account":"unverified"}]}
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC8","account":"verified"}]}
{"events" : [{xxxx},{"GUID":"5561859B8D624FFC8FF0B87219060DC9"}]}

Splunk would have given you flattened field names like events{}.GUID, events{}.account, etc. If you know that every array events{} contains only a single event{}.account, you can just substitute "account" in solutions with event{}.account. But as an array, events{}.account could be multivalued. In that case, you need to make them single-valued first, i.e.,

| spath path=events{} ``` events{} should be the actual path of that array ```
| mvexpand events{}
| spath input=events{}
| eval account=if(account=="verified","verified","unverified")
| stats count by account

Alternatively, use fillnull

| spath path=events{} ``` events{} should be the actual path of that array ```
| mvexpand events{}
| spath input=events{}
| fillnull account value="unverified"
| stats count by account

If "account" is not second level, or it is not really inside an array as your original description implied, you need to give accurate description of your data.

yuanliu · ‎10-03-2023

Alternatively, use fillnull.

<your_initial_search>
| fillnull account value="unverified"
| stats count by account

PickleRick · ‎10-03-2023

Yes, thought about fillnull myself. The difference is that fillnull only fills events where there is no value at all whereas the if-based eval can just sort the verified ones from all the rest (even if you have many other possible values like "unverified", "half-verified", "maybe verified but I really don't know" and so on ;-)). So depending on the use case either of the solutions can be appropriate.

How to search and report based on presence or lack of string?

count

search job inspector

subsearch

table

AppDynamics Summer Webinars

SOCin’ it to you at Splunk University

Credit Card Data Protection & PCI Compliance with Splunk Edge Processor

Are you a member of the Splunk Community?