Hi all.
Background is I have recently acquired a JSON feed via Kafka but the schema was developed with other uses in mind so it's not working particularly well for our logging requirements.
For some reason the event type is being used as a key name. It annoys me greatly as there's no way to run a search for a specific event type. Its also playing havoc with extractions as the full path to the required value needs to be known, but can not be in a search as the event type can change.
We can get the values if we know a specific event type (eg. Eventtype4.request.someKey), however it's very difficult to return the same value when the specific event type is unknown (eg. *.request.someKey)
The second issue is all event types are present in the event but all but one has a null value. This can get very messy when there could be several hundred types of events....
I have tried stripping the first level and extracting the name of a key that has a non-null value but nothing works particularly all that well. Any assistance will be appreciated.
{
Eventtype1: null
Eventtype2: null
Eventtype3: null
Eventtype4: {
request: {
"someKey": "helloworld"
"anotherKey": "anothervalue"
}
response: {
"datachunk": "a few values here"
}
}
header: {
"timestamp": 123456789012
"someid": "323abcd"
}
}
It's not a valid json if I see correctly.
But apart from that, can't you strip the null entries?
Something similar to (fixing your json ;-))
| makeresults
| eval _raw="{
\"Eventtype1\": null,
\"Eventtype2\": null,
\"Eventtype3\": null,
\"Eventtype4\": {
\"request\": {
\"someKey\": \"helloworld\",
\"anotherKey\": \"anothervalue\"
},
\"response\": {
\"datachunk\": \"a few values here\"
},
\"header\": {
\"timestamp\": 123456789012,
\"someid\": \"323abcd\"
}
}"
| rex "(*CR)\{[^{]*(?<eventtype>\"\S+\")\s*:(\s|\n)*(?<json>\{.*\})[^}]*\}"
| spath input=json
(this one was tested on the multiline version on regex101. If you're getting it all on one line would be a bit simpler)
Depending on how the JSON is fixed, it could be interpreted a different way. This is an alternative to @PickleRick
| makeresults
| eval _raw="{
\"Eventtype1\": null,
\"Eventtype2\": null,
\"Eventtype3\": null,
\"Eventtype4\": {
\"request\": {
\"someKey\": \"helloworld\",
\"anotherKey\": \"anothervalue\"
},
\"response\": {
\"datachunk\": \"a few values here\"
}
},
\"header\": {
\"timestamp\": 123456789012,
\"someid\": \"323abcd\"
}
}"
| spath
| foreach *.*
[| spath <<MATCHSEG1>>]
| fields - *.* _*
| table *
Thanks ITWhisperer,
I often think foreach is an underappreciated command. It can be quite helpful in many cases.
... | spath
| foreach *.*
[| spath <<MATCHSEG1>>]
| fields - *.* _*
| table *
This solution worked quite well (after getting rid of the nulls) to help determine which events (and how many) were allocated to which event types, especially with aggregated stats, but it required more massaging to make it usable at an event level.
It's not a valid json if I see correctly.
But apart from that, can't you strip the null entries?
Something similar to (fixing your json ;-))
| makeresults
| eval _raw="{
\"Eventtype1\": null,
\"Eventtype2\": null,
\"Eventtype3\": null,
\"Eventtype4\": {
\"request\": {
\"someKey\": \"helloworld\",
\"anotherKey\": \"anothervalue\"
},
\"response\": {
\"datachunk\": \"a few values here\"
},
\"header\": {
\"timestamp\": 123456789012,
\"someid\": \"323abcd\"
}
}"
| rex "(*CR)\{[^{]*(?<eventtype>\"\S+\")\s*:(\s|\n)*(?<json>\{.*\})[^}]*\}"
| spath input=json
(this one was tested on the multiline version on regex101. If you're getting it all on one line would be a bit simpler)
I combined the suggested solution with an additional configured extraction to get both the event name and the event data.
Put to the example, i noticed that "request" is consistently the first word in a node that contains data. This means a pretty simple regex can be used to match the node name before a non-null node (one with the data).
",\"(?P<eventname>[\w]+)\":\{\"request\":"
I combined that with a slightly tweaked version of your regex to grab the "payload" for the event.
"(*CR)\{[^{]*\":\"\S+\"\s*:(\s|\n)*(?<json>\{.*\})[^}]*\}"
Works rather well.
Thanks for you replies. Apologies for the not quite correct example, it was a quick rip from the formatted output which i sort of JSONafined when i stripped the values.
Firstly, I originally was using an index time sed to strip the nulls but was having issues keeping it contained to just the first level.... Also needed an extract prior to the strip to somehow intelligently only collect the name of the node when the node is non-null... Thanks PickleRick for the regex ill try it out.
Here is a more accurate version of the example.. which i have validated this time 🙂
In this case the event name would be "eventtype3".
{
"header": { "eId": "1234567890", "timestamp": "12345678","src": "10.0.0.0"},
"eventtype1": null,
"eventtype2": null,
"eventtype3": {
"context": {"opId": "abc123", "method": "GET"},
"request": {"payload": "value","params": "value"},
"response": {"payload": "value","params": "value"}
},
"eventtype4": null
}