Splunk Search

Flatten JSON associative array with variable key names that even containing dots

rikinet
Path Finder

I have JSON event data like this (it is shown as a collapsable tree structure in the event view):

 

{
    "data": {
        "192.168.1.1": {
            "ip": "192.168.1.1",
            "number": 0,
            "list": [
                {
                    "msg": "msg1",
                    "time": "2023-07-01T01:00:00",
                },
                {
                    "msg": "msg2",
                    "time": "2023-07-01T02:00:00",
                },
                {
                    "msg": "msg3",
                    "time": "2023-07-01T03:00:00",
                }
            ]
        },
        "192.168.1.2": {
            "ip": "192.168.1.2",
            "number": 2,
            "list": [
                {
                    "msg": "msg1",
                    "time": "2023-07-02T01:00:00",
                },
                {
                    "msg": "msg2",
                    "time": "2023-07-02T02:00:00",
                }
            ]
        }
    }
}

 


Please note: 

  • The key names under "data" are not known beforehand, but they are guaranteed to be IP addresses. That means they contain dots, which makes direct addressing such as "data.192.168.1.1.ip" difficult.
  • The number of entries in "data.X.list" is variable.
  • "data.X.number" is just any number, it does not contain the length of the list or so.

I want to flatten the structure into a tabular form, like this:

ipnumbermsgtime
"192.168.1.1"0"msg1""2023-07-01T01:00:00"
"192.168.1.1"0"msg2""2023-07-01T02:00:00"
"192.168.1.1"0"msg3""2023-07-01T03:00:00"
"192.168.1.2"2"msg1""2023-07-02T01:00:00"
"192.168.1.2"2"msg1""2023-07-02T02:00:00"

 

My strategy so far was:

  1. In the raw data (_raw), replace all dots in IP addresses by underscores, to avoid the dot notation hassle
  2. Then use foreach to generate "iterator variables" for each ip entry
  3. Then iterate over all "iterator variables", use <<MATCHSTR>> as a placeholder for all spath operations within each IP address's sub-tree.

Something along the lines of

 

| rex field=_raw mode=sed "s/192.168.([0-9]{1,3}).([0-9]{1,3})/192_168_\1_\2/g"
| foreach data.*.ip [ eval iterator_<<MATCHSTR>>='<<FIELD>>']
| foreach iterator_* [
    spath path=data.<<MATCHSTR>>.list{} output=<<MATCHSTR>>_json
    | eval <<MATCHSTR>>_json=mvmap(<<MATCHSTR>>_json, <<MATCHSTR>>_json."##<<MATCHSTR>>")
    | eval messages=mvappend(messages, <<MATCHSTR>>_json)
    | fields - <<MATCHSTR>>_json ]

 

 

But the problems start with the fact that rex applied to _raw does not seem to have the desired effect. The closest I get are iterator variables still containing dotted IP addresses, such as "iterator_192.168.1.1". (This behaviour might be difficult to reproduce with makeresults sample data!)


What am I missing here?

Labels (1)
0 Karma
1 Solution

spavin
Path Finder

Hi @rikinet.,

I've had a crack at this and I think I've found a working solution.

Splunk has a few JSON commands (see docs) that come in handy.

| makeresults | eval json="{\"data\":{\"192.168.1.1\":{\"ip\":\"192.168.1.1\",\"number\":0,\"list\":[{\"msg\":\"msg1\",\"time\":\"2023-07-01T01:00:00\"},{\"msg\":\"msg2\",\"time\":\"2023-07-01T02:00:00\"},{\"msg\":\"msg3\",\"time\":\"2023-07-01T03:00:00\"}]},\"192.168.1.2\":{\"ip\":\"192.168.1.2\",\"number\":2,\"list\":[{\"msg\":\"msg1\",\"time\":\"2023-07-02T01:00:00\"},{\"msg\":\"msg2\",\"time\":\"2023-07-02T02:00:00\"}]}}}"
| fromjson json
``` Above is just creating the test data ```

``` Get the "data" field from the JSON```
| eval ips = json_extract(json, "data")
| fields ips

``` Extract all the IPs - we'll use these later ```
| eval keys = json_keys(ips)
| eval keys= json_array_to_mv(keys)
| mvexpand keys

``` Now we have one IP address per row. Next we get each IPs specific data ```
| foreach keys [ | eval values = json_extract_exact(ips, <<FIELD>>)]
| fields keys, values
| fromjson values
| mvexpand list
| fromjson list

``` Now table it out```
| table ip, number, msg, time

 

That results in a table like this:

spavin_0-1689318896673.png

As per your notes:

  • The key names are dynamically identified under "data". The fact that they have dots doesn't really matter for the json commands used.
  • We mvxpand the "data.X.list" variable to make sure we get them all.
  • "data.X.number" is just any number - it's just an extra field in the list.

 

Hopefully this query gets you closer to what you're looking for.

 

Cheers,
Daniel

View solution in original post

spavin
Path Finder

Hi @rikinet.,

I've had a crack at this and I think I've found a working solution.

Splunk has a few JSON commands (see docs) that come in handy.

| makeresults | eval json="{\"data\":{\"192.168.1.1\":{\"ip\":\"192.168.1.1\",\"number\":0,\"list\":[{\"msg\":\"msg1\",\"time\":\"2023-07-01T01:00:00\"},{\"msg\":\"msg2\",\"time\":\"2023-07-01T02:00:00\"},{\"msg\":\"msg3\",\"time\":\"2023-07-01T03:00:00\"}]},\"192.168.1.2\":{\"ip\":\"192.168.1.2\",\"number\":2,\"list\":[{\"msg\":\"msg1\",\"time\":\"2023-07-02T01:00:00\"},{\"msg\":\"msg2\",\"time\":\"2023-07-02T02:00:00\"}]}}}"
| fromjson json
``` Above is just creating the test data ```

``` Get the "data" field from the JSON```
| eval ips = json_extract(json, "data")
| fields ips

``` Extract all the IPs - we'll use these later ```
| eval keys = json_keys(ips)
| eval keys= json_array_to_mv(keys)
| mvexpand keys

``` Now we have one IP address per row. Next we get each IPs specific data ```
| foreach keys [ | eval values = json_extract_exact(ips, <<FIELD>>)]
| fields keys, values
| fromjson values
| mvexpand list
| fromjson list

``` Now table it out```
| table ip, number, msg, time

 

That results in a table like this:

spavin_0-1689318896673.png

As per your notes:

  • The key names are dynamically identified under "data". The fact that they have dots doesn't really matter for the json commands used.
  • We mvxpand the "data.X.list" variable to make sure we get them all.
  • "data.X.number" is just any number - it's just an extra field in the list.

 

Hopefully this query gets you closer to what you're looking for.

 

Cheers,
Daniel

rikinet
Path Finder

Hi @spavin , thank you very much - that works! Great help!

Just a small addition: I think we can even skip the 'foreach' command and simply write

| eval values = json_extract_exact(ips, keys)


...because at this point after the 'mvexpand', 'keys' only contains a single key.

0 Karma

danspav
SplunkTrust
SplunkTrust

Well spotted! 

I had a couple of variations, and left this in but you're right, the foreach isn't needed here.

0 Karma
Get Updates on the Splunk Community!

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...