We have a rule engine that assigns category codes to items; the category codes are assigned per location. We want to extract statistics from the log showing how many messages were published for each location.
For example, from the messages below we want to get a result like this:
Location code | count
------------- | -----
ABC | 2
XYZ | 1
DEF | 1
IJK | 2
message #1:
{"Item Id": "1", "locationCategoryCodes": [{"categoryCodes": [{"categoryCode": "CAT_1", "ruleID": ["138563"]}], "locationCode": "ABC"}, {"categoryCodes": [{"categoryCode": "CAT_1", "ruleID": ["138563"]}], "locationCode": "XYZ"}, {"categoryCodes": [{"categoryCode": "CAT_2", "ruleID": ["138561"]}], "locationCode": "DEF"}, {"categoryCodes": [{"categoryCode": "CAT_3", "ruleID": ["138614"]}], "locationCode": "IJK"}], "timestamp": "2023-01-27T00:10:32.367 +0000"}
message #2:
{"Item Id": "2", "locationCategoryCodes": [{"categoryCodes": [{"categoryCode": "CAT_1", "ruleID": ["138563"]}], "locationCode": "ABC"}, {"categoryCodes": [{"categoryCode": "CAT_3", "ruleID": ["138614"]}], "locationCode": "IJK"}], "timestamp": "2023-01-27T00:10:32.367 +0000"}
Thanks
Anirban
The sample messages you posted are conformant JSON, so I don't understand why you need rex to extract - or why you need manual extraction at all. If these are _raw, Splunk would have automatically recognized the structure and given you a field named locationCategoryCodes{}.locationCode. As I always say, do not treat structured data as text. If for whatever reason Splunk doesn't give you that field, you can use spath to extract all fields ("nodes" in JSON lingo) in these messages.
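Against your real data, that would look something like the sketch below. The index and sourcetype names are placeholders - substitute whatever matches your environment:

```
index=your_index sourcetype=your_sourcetype
| spath
| stats count by locationCategoryCodes{}.locationCode
| rename locationCategoryCodes{}.locationCode as "Location Code"
```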
Here is emulation using your sample data:
| makeresults
| fields - _time
| eval test=split("{\"Item Id\": \"1\", \"locationCategoryCodes\": [{\"categoryCodes\": [{\"categoryCode\": \"CAT_1\", \"ruleID\": [\"138563\"]}], \"locationCode\": \"ABC\"}, {\"categoryCodes\": [{\"categoryCode\": \"CAT_1\", \"ruleID\": [\"138563\"]}], \"locationCode\": \"XYZ\"}, {\"categoryCodes\": [{\"categoryCode\": \"CAT_2\", \"ruleID\": [\"138561\"]}], \"locationCode\": \"DEF\"}, {\"categoryCodes\": [{\"categoryCode\": \"CAT_3\", \"ruleID\": [\"138614\"]}], \"locationCode\": \"IJK\"}], \"timestamp\": \"2023-01-27T00:10:32.367 +0000\"}|||{\"Item Id\": \"2\", \"locationCategoryCodes\": [{\"categoryCodes\": [{\"categoryCode\": \"CAT_1\", \"ruleID\": [\"138563\"]}], \"locationCode\": \"ABC\"}, {\"categoryCodes\": [{\"categoryCode\": \"CAT_3\", \"ruleID\": [\"138614\"]}], \"locationCode\": \"IJK\"}], \"timestamp\": \"2023-01-27T00:10:32.367 +0000\"}", "|||")
| mvexpand test
| eval isgood = if(json_valid(test), "yes", "no") ``` quick test to see if message is conformant ```
| rename test as _raw
``` data emulation above ```
| spath ``` use this if Splunk doesn't give you locationCategoryCodes{}.locationCode ```
| stats count by locationCategoryCodes{}.locationCode
| rename locationCategoryCodes{}.locationCode as "Location Code"
Location Code | count
------------- | -----
ABC | 2
DEF | 1
IJK | 2
XYZ | 1
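If you only care about the location codes and not the other nodes, spath's path argument can extract just that one field instead of walking the whole message - a sketch against the same data, with the extracted field renamed to locationCode for convenience:

```
| spath path=locationCategoryCodes{}.locationCode output=locationCode
| stats count by locationCode
| rename locationCode as "Location Code"
```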
Here you go:
| makeresults
| eval test="( {\"Item Id\": \"1\", \"locationCategoryCodes\": [{\"categoryCodes\": [{\"categoryCode\": \"CAT_1\", \"ruleID\": [\"138563\"]}], \"locationCode\": \"ABC\"}, {\"categoryCodes\": [{\"categoryCode\": \"CAT_1\", \"ruleID\": [\"138563\"]}], \"locationCode\": \"XYZ\"}, {\"categoryCodes\": [{\"categoryCode\": \"CAT_2\", \"ruleID\": [\"138561\"]}], \"locationCode\": \"DEF\"}, {\"categoryCodes\": [{\"categoryCode\": \"CAT_3\", \"ruleID\": [\"138614\"]}], \"locationCode\": \"IJK\"}], \"timestamp\": \"2023-01-27T00:10:32.367 +0000\"} ¤ {\"Item Id\": \"2\", \"locationCategoryCodes\": [{\"categoryCodes\": [{\"categoryCode\": \"CAT_1\", \"ruleID\": [\"138563\"]}], \"locationCode\": \"ABC\"}, {\"categoryCodes\": [{\"categoryCode\": \"CAT_3\", \"ruleID\": [\"138614\"]}], \"locationCode\": \"IJK\"}], \"timestamp\": \"2023-01-27T00:10:32.367 +0000\"})"
| makemv test delim="¤"
| mvexpand test
| rename test as _raw
| rex max_match=0 "locationCode.: .(?<location>[^\"]+)"
| mvexpand location
| stats count by location
Everything before the rex command is just there to create sample data for the proof of concept.
What have you tried so far? How did that go?
Is the locationCode field already extracted? If so, then you can count them with the stats command.
| stats count by locationCode
| rename locationCode to "Location code"
| table "Location code" count
Thanks @richgalloway. Unfortunately I could not extract the locationCode fields dynamically. I tried to write some rex expressions but they did not work.
I need to find a way to extract the locationCodes (e.g. ABC, XYZ, DEF) from the log:
{"Item Id": "1", "locationCategoryCodes": [{"categoryCodes": [{"categoryCode": "CAT_1", "ruleID": ["138563"]}], "locationCode": "ABC"}, {"categoryCodes": [{"categoryCode": "CAT_1", "ruleID": ["138563"]}], "locationCode": "XYZ"}, {"categoryCodes": [{"categoryCode": "CAT_2", "ruleID": ["138561"]}], "locationCode": "DEF"}, {"categoryCodes": [{"categoryCode": "CAT_3", "ruleID": ["138614"]}], "locationCode": "IJK"}], "timestamp": "2023-01-27T00:10:32.367 +0000"}
What rex did you try?
Try this
| rex "locationCode\\\": \\\"(?<locationCode>[^\\\"]+)"
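One caveat: without max_match, rex captures only the first locationCode in each event, and your events contain several. A sketch that extracts all of them into a multivalue field, expands it, and counts per location (same counting step as in the earlier answers):

```
| rex max_match=0 "locationCode\\\": \\\"(?<locationCode>[^\\\"]+)"
| mvexpand locationCode
| stats count by locationCode
| rename locationCode as "Location code"
```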