Splunk Search

Flatten JSON associative array with variable key names that even containing dots

rikinet
Explorer

I have JSON event data like this (it is shown as a collapsable tree structure in the event view):

 

{
    "data": {
        "192.168.1.1": {
            "ip": "192.168.1.1",
            "number": 0,
            "list": [
                {
                    "msg": "msg1",
                    "time": "2023-07-01T01:00:00",
                },
                {
                    "msg": "msg2",
                    "time": "2023-07-01T02:00:00",
                },
                {
                    "msg": "msg3",
                    "time": "2023-07-01T03:00:00",
                }
            ]
        },
        "192.168.1.2": {
            "ip": "192.168.1.2",
            "number": 2,
            "list": [
                {
                    "msg": "msg1",
                    "time": "2023-07-02T01:00:00",
                },
                {
                    "msg": "msg2",
                    "time": "2023-07-02T02:00:00",
                }
            ]
        }
    }
}

 


Please note: 

  • The key names under "data" are not known beforehand, but they are guaranteed to be IP addresses. That means they contain dots, which makes direct addressing such as "data.192.168.1.1.ip" difficult.
  • The number of entries in "data.X.list" is variable.
  • "data.X.number" is just any number, it does not contain the length of the list or so.

I want to flatten the structure into a tabular form, like this:

ipnumbermsgtime
"192.168.1.1"0"msg1""2023-07-01T01:00:00"
"192.168.1.1"0"msg2""2023-07-01T02:00:00"
"192.168.1.1"0"msg3""2023-07-01T03:00:00"
"192.168.1.2"2"msg1""2023-07-02T01:00:00"
"192.168.1.2"2"msg1""2023-07-02T02:00:00"

 

My strategy so far was:

  1. In the raw data (_raw), replace all dots in IP addresses by underscores, to avoid the dot notation hassle
  2. Then use foreach to generate "iterator variables" for each ip entry
  3. Then iterate over all "iterator variables", use <<MATCHSTR>> as a placeholder for all spath operations within each IP address's sub-tree.

Something along the lines of

 

| rex field=_raw mode=sed "s/192.168.([0-9]{1,3}).([0-9]{1,3})/192_168_\1_\2/g"
| foreach data.*.ip [ eval iterator_<<MATCHSTR>>='<<FIELD>>']
| foreach iterator_* [
    spath path=data.<<MATCHSTR>>.list{} output=<<MATCHSTR>>_json
    | eval <<MATCHSTR>>_json=mvmap(<<MATCHSTR>>_json, <<MATCHSTR>>_json."##<<MATCHSTR>>")
    | eval messages=mvappend(messages, <<MATCHSTR>>_json)
    | fields - <<MATCHSTR>>_json ]

 

 

But the problems start with the fact that rex applied to _raw does not seem to have the desired effect. The closest I get are iterator variables still containing dotted IP addresses, such as "iterator_192.168.1.1". (This behaviour might be difficult to reproduce with makeresults sample data!)


What am I missing here?

Labels (1)
0 Karma
1 Solution

spavin
Path Finder

Hi @rikinet.,

I've had a crack at this and I think I've found a working solution.

Splunk has a few JSON commands (see docs) that come in handy.

| makeresults | eval json="{\"data\":{\"192.168.1.1\":{\"ip\":\"192.168.1.1\",\"number\":0,\"list\":[{\"msg\":\"msg1\",\"time\":\"2023-07-01T01:00:00\"},{\"msg\":\"msg2\",\"time\":\"2023-07-01T02:00:00\"},{\"msg\":\"msg3\",\"time\":\"2023-07-01T03:00:00\"}]},\"192.168.1.2\":{\"ip\":\"192.168.1.2\",\"number\":2,\"list\":[{\"msg\":\"msg1\",\"time\":\"2023-07-02T01:00:00\"},{\"msg\":\"msg2\",\"time\":\"2023-07-02T02:00:00\"}]}}}"
| fromjson json
``` Above is just creating the test data ```

``` Get the "data" field from the JSON```
| eval ips = json_extract(json, "data")
| fields ips

``` Extract all the IPs - we'll use these later ```
| eval keys = json_keys(ips)
| eval keys= json_array_to_mv(keys)
| mvexpand keys

``` Now we have one IP address per row. Next we get each IPs specific data ```
| foreach keys [ | eval values = json_extract_exact(ips, <<FIELD>>)]
| fields keys, values
| fromjson values
| mvexpand list
| fromjson list

``` Now table it out```
| table ip, number, msg, time

 

That results in a table like this:

spavin_0-1689318896673.png

As per your notes:

  • The key names are dynamically identified under "data". The fact that they have dots doesn't really matter for the json commands used.
  • We mvxpand the "data.X.list" variable to make sure we get them all.
  • "data.X.number" is just any number - it's just an extra field in the list.

 

Hopefully this query gets you closer to what you're looking for.

 

Cheers,
Daniel

View solution in original post

spavin
Path Finder

Hi @rikinet.,

I've had a crack at this and I think I've found a working solution.

Splunk has a few JSON commands (see docs) that come in handy.

| makeresults | eval json="{\"data\":{\"192.168.1.1\":{\"ip\":\"192.168.1.1\",\"number\":0,\"list\":[{\"msg\":\"msg1\",\"time\":\"2023-07-01T01:00:00\"},{\"msg\":\"msg2\",\"time\":\"2023-07-01T02:00:00\"},{\"msg\":\"msg3\",\"time\":\"2023-07-01T03:00:00\"}]},\"192.168.1.2\":{\"ip\":\"192.168.1.2\",\"number\":2,\"list\":[{\"msg\":\"msg1\",\"time\":\"2023-07-02T01:00:00\"},{\"msg\":\"msg2\",\"time\":\"2023-07-02T02:00:00\"}]}}}"
| fromjson json
``` Above is just creating the test data ```

``` Get the "data" field from the JSON```
| eval ips = json_extract(json, "data")
| fields ips

``` Extract all the IPs - we'll use these later ```
| eval keys = json_keys(ips)
| eval keys= json_array_to_mv(keys)
| mvexpand keys

``` Now we have one IP address per row. Next we get each IPs specific data ```
| foreach keys [ | eval values = json_extract_exact(ips, <<FIELD>>)]
| fields keys, values
| fromjson values
| mvexpand list
| fromjson list

``` Now table it out```
| table ip, number, msg, time

 

That results in a table like this:

spavin_0-1689318896673.png

As per your notes:

  • The key names are dynamically identified under "data". The fact that they have dots doesn't really matter for the json commands used.
  • We mvxpand the "data.X.list" variable to make sure we get them all.
  • "data.X.number" is just any number - it's just an extra field in the list.

 

Hopefully this query gets you closer to what you're looking for.

 

Cheers,
Daniel

rikinet
Explorer

Hi @spavin , thank you very much - that works! Great help!

Just a small addition: I think we can even skip the 'foreach' command and simply write

| eval values = json_extract_exact(ips, keys)


...because at this point after the 'mvexpand', 'keys' only contains a single key.

0 Karma

danspav
SplunkTrust
SplunkTrust

Well spotted! 

I had a couple of variations, and left this in but you're right, the foreach isn't needed here.

0 Karma
Get Updates on the Splunk Community!

Join Us for Splunk University and Get Your Bootcamp Game On!

If you know, you know! Splunk University is the vibe this summer so register today for bootcamps galore ...

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

.conf24 is taking place at The Venetian in Las Vegas from June 11 - 14. Continue reading to learn about the ...

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...