Hello,
I am currently working on a use case that involves complex ingested data with nested JSON. The data I am trying to capture is the non-compliant entries. I am looking for guidance on how to break the JSON objects nested inside the array out into fields. Here is the redacted information I currently have, thank you!
Search I am using:
index=fsctcenter sourcetype=fsctcenter_json
| regex "Non Compliant[^\:]+\:\"\d+\"\,\"status\":\"Match"
| rex field=_raw "policy_name\":\"(?<policy_name>[a-zA-z1-9\.\s+]+Non\sCompliant[^\"]+)"
| rex field=_raw "rule_name\":\"(?<rule_name>[a-zA-z1-9\.\s+]+Non\sCompliant[^\"]+)"
Raw:
{"ctupdate":"policyinfo","ip":"X.X.X.X","policies":[{"rule_name":"XXXX","policy_name":"XXXX","since":"XXXX","status":"XXXX"},{"rule_name":"XXXX","policy_name":"XXXX","since":"XXXX","status":"XXXX"},{"rule_name":"XXXX","policy_name":"XXXX","since":"XXXX","status":"XXXX"},{"rule_name":"XXXX","policy_name":"XXXX","since":"XXXX","status":"XXXX"},...etc
List:
policies: [ [-]
{ [-]
policy_name: XXXX
rule_name: XXXX
since: XXXX
status: XXXX
}
{ [-]
policy_name: XXXX
rule_name: XXXX
since: XXXX
status: XXXX
}
Etc...
Currently Splunk ES is not itemizing the fields correctly for the nested json above. Any help or guidance would be greatly appreciated, thanks!
First of all, it is rarely a good idea to use regex on structured data like JSON; Splunk has far more robust tools for it. Second, anonymized data should still illustrate the outcome you want, so use distinct fake values rather than XXXX everywhere. Third, it always helps to show your desired outcome.
The following illustration is based on my speculation of your requirement.
Raw data:
{"ctupdate":"policyinfo","ip":"X.X.X.X","policies":[{"rule_name":"rule1","policy_name":"policy1","since":"2022-01-01","status":"Match"},{"rule_name":"rule2","policy_name":"policy2","since":"2022-02-01","status":"Match"},{"rule_name":"rule3","policy_name":"policy3","since":"2022-03-01","status":"expired"},{"rule_name":"rule4","policy_name":"policy4","since":"2022-04-01","status":"revoked"}]}
With this, Splunk should already have these fields extracted:
ctupdate | ip | policies{}.policy_name | policies{}.rule_name | policies{}.since | policies{}.status |
policyinfo | X.X.X.X | policy1 policy2 policy3 policy4 | rule1 rule2 rule3 rule4 | 2022-01-01 2022-02-01 2022-03-01 2022-04-01 | Match Match expired revoked |
(Even if these fields are not extracted automatically, they can be extracted with the spath command.)
I further speculate that you want to access the individual elements of policies{}. This can be achieved with the spath command and its path argument:
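As a minimal sketch of that fallback, assuming the index and sourcetype from the question and that automatic JSON extraction is not happening for this sourcetype, spath with no arguments reads _raw and extracts the JSON paths at search time:

```spl
index=fsctcenter sourcetype=fsctcenter_json
| spath
| table ctupdate ip policies{}.policy_name policies{}.rule_name policies{}.since policies{}.status
```

The table command is only there to display the result; the field names are the ones listed in the table above.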
| spath path=policies{}
This should give you an additional multivalue field, policies{}, in which each value is one JSON object from the array:
policies{} |
{"rule_name":"rule1","policy_name":"policy1","since":"2022-01-01","status":"Match"} {"rule_name":"rule2","policy_name":"policy2","since":"2022-02-01","status":"Match"} {"rule_name":"rule3","policy_name":"policy3","since":"2022-03-01","status":"expired"} {"rule_name":"rule4","policy_name":"policy4","since":"2022-04-01","status":"revoked"} |
To operate on individual elements in the array, use mvexpand followed by another spath, i.e.,
| spath path=policies{}
| mvexpand policies{}
| spath input=policies{}
This will generate one event per value of policies{}, like this:
policies{} | policy_name | rule_name | since | status |
{"rule_name":"rule1","policy_name":"policy1","since":"2022-01-01","status":"Match"} | policy1 | rule1 | 2022-01-01 | Match |
{"rule_name":"rule2","policy_name":"policy2","since":"2022-02-01","status":"Match"} | policy2 | rule2 | 2022-02-01 | Match |
{"rule_name":"rule3","policy_name":"policy3","since":"2022-03-01","status":"expired"} | policy3 | rule3 | 2022-03-01 | expired |
{"rule_name":"rule4","policy_name":"policy4","since":"2022-04-01","status":"revoked"} | policy4 | rule4 | 2022-04-01 | revoked |
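To try this without touching indexed data, the whole sequence can be run against a synthetic event. The following is a sketch rather than a definitive search: makeresults and eval recreate the sample event used in the tables above, and the final where clause is my assumption about the goal (keeping only entries whose status is Match), based on the regex in the question.

```spl
| makeresults
| eval _raw="{\"ctupdate\":\"policyinfo\",\"ip\":\"X.X.X.X\",\"policies\":[{\"rule_name\":\"rule1\",\"policy_name\":\"policy1\",\"since\":\"2022-01-01\",\"status\":\"Match\"},{\"rule_name\":\"rule2\",\"policy_name\":\"policy2\",\"since\":\"2022-02-01\",\"status\":\"Match\"},{\"rule_name\":\"rule3\",\"policy_name\":\"policy3\",\"since\":\"2022-03-01\",\"status\":\"expired\"},{\"rule_name\":\"rule4\",\"policy_name\":\"policy4\",\"since\":\"2022-04-01\",\"status\":\"revoked\"}]}"
| spath path=policies{}
| mvexpand policies{}
| spath input=policies{}
| where status="Match"
| table policy_name rule_name since status
```

With the sample event, this should leave only the policy1 and policy2 rows. If the real policy names contain the text Non Compliant, as the regex in the question suggests, the filter could be extended to: | where status="Match" AND like(policy_name, "%Non Compliant%")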
Hope this helps.