Getting Data In

Extracting JSON keys as values

Urbanpope
Explorer

Hi all.
Background is I have recently acquired a JSON feed via Kafka but the schema was developed with other uses in mind so it's not working particularly well for our logging requirements.
For some reason the event type is being used as a key name. It annoys me greatly as there's no way to run a search for a specific event type. Its also playing havoc with extractions as the full path to the required value needs to be known, but can not be in a search as the event type can change.

We can get the values if we know a specific event type (eg. Eventtype4.request.someKey), however it's very difficult to return the same value when the specific event type is unknown (eg. *.request.someKey)

The second issue is all event types are present in the event but all but one has a null value. This can get very messy when there could be several hundred types of events....

I have tried stripping the first level and extracting the name of a key that has a non-null value but nothing works particularly all that well.  Any assistance will be appreciated.

{
Eventtype1: null
Eventtype2: null
Eventtype3: null
Eventtype4: {
   request: {
              "someKey": "helloworld"
              "anotherKey": "anothervalue"
   }
   response: {
            "datachunk": "a few values here"
   }
}
header: {
              "timestamp": 123456789012
              "someid": "323abcd"
}
}

Labels (1)
0 Karma
1 Solution

PickleRick
SplunkTrust
SplunkTrust

It's not a valid json if I see correctly.

But apart from that, can't you strip the null entries?

Something similar to (fixing your json ;-))

| makeresults 
| eval _raw="{
\"Eventtype1\": null,
\"Eventtype2\": null,
\"Eventtype3\": null,
\"Eventtype4\": {
\"request\": {
\"someKey\": \"helloworld\",
\"anotherKey\": \"anothervalue\"
},
\"response\": {
\"datachunk\": \"a few values here\"
},
\"header\": {
\"timestamp\": 123456789012,
\"someid\": \"323abcd\"
}
}"
| rex "(*CR)\{[^{]*(?<eventtype>\"\S+\")\s*:(\s|\n)*(?<json>\{.*\})[^}]*\}"
| spath input=json

(this one was tested on the multiline version on regex101. If you're getting it all on one line would be a bit simpler)

View solution in original post

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Depending on how the JSON is fixed, it could be interpreted a different way. This is an alternative to @PickleRick 

| makeresults 
| eval _raw="{
\"Eventtype1\": null,
\"Eventtype2\": null,
\"Eventtype3\": null,
\"Eventtype4\": {
\"request\": {
\"someKey\": \"helloworld\",
\"anotherKey\": \"anothervalue\"
},
\"response\": {
\"datachunk\": \"a few values here\"
}
},
\"header\": {
\"timestamp\": 123456789012,
\"someid\": \"323abcd\"
}
}" 
| spath
| foreach *.*
    [| spath <<MATCHSEG1>>]
| fields - *.* _*
| table *
0 Karma

Urbanpope
Explorer

Thanks ITWhisperer,

I often think foreach is an underappreciated command. It can be quite helpful in many cases.

 

0 Karma

Urbanpope
Explorer

... | spath
| foreach *.*
[| spath <<MATCHSEG1>>]
| fields - *.* _*
| table *

This solution worked quite well (after getting rid of the nulls) to help determine which events (and how many) were allocated to which event types, especially with aggregated stats, but it required more massaging to make it usable at an event level.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

It's not a valid json if I see correctly.

But apart from that, can't you strip the null entries?

Something similar to (fixing your json ;-))

| makeresults 
| eval _raw="{
\"Eventtype1\": null,
\"Eventtype2\": null,
\"Eventtype3\": null,
\"Eventtype4\": {
\"request\": {
\"someKey\": \"helloworld\",
\"anotherKey\": \"anothervalue\"
},
\"response\": {
\"datachunk\": \"a few values here\"
},
\"header\": {
\"timestamp\": 123456789012,
\"someid\": \"323abcd\"
}
}"
| rex "(*CR)\{[^{]*(?<eventtype>\"\S+\")\s*:(\s|\n)*(?<json>\{.*\})[^}]*\}"
| spath input=json

(this one was tested on the multiline version on regex101. If you're getting it all on one line would be a bit simpler)

0 Karma

Urbanpope
Explorer

I combined the suggested solution with an additional configured extraction to get both the event name and the event data.
Put to the example, i noticed that "request" is consistently the first word in a node that contains data. This means a pretty simple regex can be used to match the node name before a non-null node (one with the data).

",\"(?P<eventname>[\w]+)\":\{\"request\":"

 

 

I combined that with a slightly tweaked version of your regex to grab the "payload" for the event.

"(*CR)\{[^{]*\":\"\S+\"\s*:(\s|\n)*(?<json>\{.*\})[^}]*\}"

 

 

Works rather well.

0 Karma

Urbanpope
Explorer

Thanks for you replies. Apologies for the not quite correct example, it was a quick rip from the formatted output which i sort of JSONafined when i stripped the values.

Firstly, I originally was using an index time sed to strip the nulls but was having issues keeping it contained to just the first level.... Also needed an extract prior to the strip to somehow intelligently only collect the name of the node when the node is non-null... Thanks PickleRick for the regex ill try it out.

Here is a more accurate version of the example.. which i have validated this time 🙂
In this case the event name would be "eventtype3".

{
"header": { "eId": "1234567890", "timestamp": "12345678","src": "10.0.0.0"},
"eventtype1": null,
"eventtype2": null,
"eventtype3": {
"context": {"opId": "abc123", "method": "GET"},
"request": {"payload": "value","params": "value"},
"response": {"payload": "value","params": "value"}
},
"eventtype4": null
}

Tags (1)
0 Karma
Get Updates on the Splunk Community!

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...