Hi,
I am having trouble generating a stats report based on JSON data containing an array. I want to produce the following report:
ErrorCode | ErrorMessage | Count |
212 | The image quality is poor | 1 |
680 | The image could not be found | 1 |
809 | Document not detected | 1 |
Result:
embedded.metadata.data.results{}.notifications.809 | count |
Document not detected | 1 |
Here example of the json data
{ "_embedded": { "metadata": { "environment": { "id": "6b3dc" }, "data": { "results": [ { "documentId": "f18a20f1", "notifications": { "212": "The image quality was poor" } }, { "documentId": "f0fdf5e8c", "notifications": { "680": "The image could not be found" } }, { "documentId": "95619532", "notifications": { "809": "Document not detected" } } ] } } } } |
Thanks in advance for any assistance!!
| spath _embedded.metadata.data.results{}.notifications output=results
| mvexpand results
| rex field=results "\"(?<ErrorCode>\d+)\":\s*\"(?<ErrorMessage>[^\"]*)\""
| stats count by ErrorCode ErrorMessage
This is an interesting challenge because of the leading underscore (_) in the root node key. spath can't seem to recognize the path as is. (Could be a subtle bug.) One potential way to work around this is to use text replacement to get rid of the underscore. But I prefer a more syntactic method. In Splunk 9, you can do fronjson with an arbitrary prefix so spath can do its job. (You can also use fromjson repeatedly. But with deep paths like this problem, that's undesirable.)
| fromjson jsondata prefix=my
| spath input=my_embedded path=metadata.data.results{}
| mvexpand metadata.data.results{}
| spath input=metadata.data.results{}
| foreach notifications.*
[eval ErrorCode = if(isnull(ErrorCode), "<<MATCHSTR>>", ErrorCode), ErrorMessage = mvappend(ErrorMessage, '<<FIELD>>')]
| stats count by ErrorCode ErrorMessage
Your sample data results in
ErrorCode | ErrorMessage | count |
212 | Document not detected | 1 |
212 | The image could not be found | 1 |
212 | The image quality was poor | 1 |
This is an emulation you can play with and compare with real data
| makeresults
| eval jsondata = "{
\"_embedded\": {
\"metadata\": {
\"environment\": {
\"id\": \"6b3dc\"
},
\"data\": {
\"results\": [
{
\"documentId\": \"f18a20f1\",
\"notifications\": {
\"212\": \"The image quality was poor\"
}
},
{
\"documentId\": \"f0fdf5e8c\",
\"notifications\": {
\"680\": \"The image could not be found\"
}
},
{
\"documentId\": \"95619532\",
\"notifications\": {
\"809\": \"Document not detected\"
}
}
]
}
}
}
}"
Note: Once you get past that leading underscore in JSON path, you can then use the text manipulation method proposed by @ITWhisperer (with a small path correction), like this
| fromjson jsondata prefix=my
| spath input=my_embedded path=metadata.data.results{}
| mvexpand metadata.data.results{} ``` not metadata.data.results{}.notifications ```
| rex field=metadata.data.results{} "\"(?<ErrorCode>\d+)\":\s*\"(?<ErrorMessage>[^\"]*)\""
| stats count by ErrorCode ErrorMessage
But text manipulation on structured data is not as robust because a developer/software can always change format in the future without altering semantics.
| spath _embedded.metadata.data.results{}.notifications output=results
| mvexpand results
| rex field=results "\"(?<ErrorCode>\d+)\":\s*\"(?<ErrorMessage>[^\"]*)\""
| stats count by ErrorCode ErrorMessage