Getting Data In

Expanding nested JSON events into multiple events for search

himynamesdave
Contributor

I have nested JSON events indexed in Splunk. Here are two examples (note that the confidence value in the filter field differs):

Event 1:

{
  "email": "hidden@hidden.com",
  "filter": "confidence >= 60",
  "id": 2087,
  "integrations": [
    {"name": "nitro", "product": "nitro", "product_version": "9.3"},
    {"name": "paloaltonetworks", "product": "paloaltonetworks", "product_version": "3020"}
  ],
  "last_intelligence": "2017-02-21T11:54:39.260329+00:00",
  "title": "hidden",
  "user_id": 8721,
  "username": "hidden@hidden.com"
}

Raw E1:

{"username": " hidden@hidden.com", "user_id": 8721, "title": "hidden", "integrations": [{"product": "nitro", "product_version": "9.3", "name": "nitro"}, {"product": "paloaltonetworks", "product_version": "3020", "name": "paloaltonetworks"}], "email": " hidden@hidden.com", "filter": "confidence >= 60", "id": 2087, "last_intelligence": "2017-02-21T11:54:39.260329+00:00"}

Event 2:

{
  "email": "hidden@hidden.com",
  "filter": "confidence >= 50",
  "id": 2087,
  "integrations": [
    {"name": "nitro", "product": "nitro", "product_version": "9.3"},
    {"name": "paloaltonetworks", "product": "paloaltonetworks", "product_version": "3020"}
  ],
  "last_intelligence": "2017-02-21T11:54:39.260329+00:00",
  "title": "hidden",
  "user_id": 8721,
  "username": "hidden@hidden.com"
}

Raw E2:

{"username": " hidden@hidden.com", "user_id": 8721, "title": "hidden", "integrations": [{"product": "nitro", "product_version": "9.3", "name": "nitro"}, {"product": "paloaltonetworks", "product_version": "3020", "name": "paloaltonetworks"}], "email": " hidden@hidden.com", "filter": "confidence >= 50", "id": 2087, "last_intelligence": "2017-02-21T11:54:39.260329+00:00"}

Fields are extracted into integrations{}.name, integrations{}.product, and integrations{}.product_version, i.e. each event carries multivalue fields such as integrations{}.product_version=9.3 and integrations{}.product_version=3020.

I want each element of the integrations array to become its own event, carrying the top-level fields along with it. Imagined as separate events:

Event 1A:

{
  "email": "hidden@hidden.com",
  "filter": "confidence >= 60",
  "id": 2087,
  "integrations": [
    {"name": "nitro", "product": "nitro", "product_version": "9.3"}
  ],
  "last_intelligence": "2017-02-21T11:54:39.260329+00:00",
  "title": "hidden",
  "user_id": 8721,
  "username": "hidden@hidden.com"
}

Event 1B:

{
  "email": "hidden@hidden.com",
  "filter": "confidence >= 60",
  "id": 2087,
  "integrations": [
    {"name": "paloaltonetworks", "product": "paloaltonetworks", "product_version": "3020"}
  ],
  "last_intelligence": "2017-02-21T11:54:39.260329+00:00",
  "title": "hidden",
  "user_id": 8721,
  "username": "hidden@hidden.com"
}

Event 2A:

{
  "email": "hidden@hidden.com",
  "filter": "confidence >= 50",
  "id": 2087,
  "integrations": [
    {"name": "nitro", "product": "nitro", "product_version": "9.3"}
  ],
  "last_intelligence": "2017-02-21T11:54:39.260329+00:00",
  "title": "hidden",
  "user_id": 8721,
  "username": "hidden@hidden.com"
}

Event 2B:

{
  "email": "hidden@hidden.com",
  "filter": "confidence >= 50",
  "id": 2087,
  "integrations": [
    {"name": "paloaltonetworks", "product": "paloaltonetworks", "product_version": "3020"}
  ],
  "last_intelligence": "2017-02-21T11:54:39.260329+00:00",
  "title": "hidden",
  "user_id": 8721,
  "username": "hidden@hidden.com"
}

I am experimenting with spath and mvexpand searches, but using examples from previous answer threads I am getting odd results and behaviour (lots of duplicated events, multivalue fields, etc.).

Ultimately I want to graph these events as tables like:

username, user_id, id, email,title,name,product,product_version,last_intelligence,filter
hidden@hidden.com, 8721, 2087, hidden@hidden.com, hidden, nitro, nitro, 9.3, 2017-02-21T11:54:39.260329+00:00, confidence >= 60
hidden@hidden.com, 8721, 2087, hidden@hidden.com, hidden, paloaltonetworks, paloaltonetworks, 3020, 2017-02-21T11:54:39.260329+00:00, confidence >= 60
hidden@hidden.com, 8721, 2087, hidden@hidden.com, hidden, nitro, nitro, 9.3, 2017-02-21T11:54:39.260329+00:00, confidence >= 50
hidden@hidden.com, 8721, 2087, hidden@hidden.com, hidden, paloaltonetworks, paloaltonetworks, 3020, 2017-02-21T11:54:39.260329+00:00, confidence >= 50
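Conceptually, the expansion I'm after is equivalent to this sketch (Python purely for illustration, using one of the raw events above; one output row per element of the integrations array):

```python
import json

# Raw event text, as indexed (values from the example above)
raw = ('{"username": "hidden@hidden.com", "user_id": 8721, "title": "hidden", '
       '"integrations": [{"product": "nitro", "product_version": "9.3", "name": "nitro"}, '
       '{"product": "paloaltonetworks", "product_version": "3020", "name": "paloaltonetworks"}], '
       '"email": "hidden@hidden.com", "filter": "confidence >= 60", "id": 2087, '
       '"last_intelligence": "2017-02-21T11:54:39.260329+00:00"}')

event = json.loads(raw)

# One row per array element, each carrying all the top-level fields
rows = []
for integration in event["integrations"]:
    row = {k: v for k, v in event.items() if k != "integrations"}
    row.update(integration)
    rows.append(row)
```
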

One note which might be pertinent: all my events have the same timestamp (using DATETIME_CONFIG=CURRENT)

Can anyone give me any pointers? Thanks!

1 Solution

martin_mueller
SplunkTrust

Here you go:

| makeresults | eval foo = "{\"username\": \" hidden@hidden.com\", \"user_id\": 8721, \"title\": \"hidden\", \"integrations\": [{\"product\": \"nitro\", \"product_version\": \"9.3\", \"name\": \"nitro\"}, {\"product\": \"paloaltonetworks\", \"product_version\": \"3020\", \"name\": \"paloaltonetworks\"}], \"email\": \" hidden@hidden.com\", \"filter\": \"confidence >= 60\", \"id\": 2087, \"last_intelligence\": \"2017-02-21T11:54:39.260329+00:00\"}
{\"username\": \" hidden@hidden.com\", \"user_id\": 8721, \"title\": \"hidden\", \"integrations\": [{\"product\": \"nitro\", \"product_version\": \"9.3\", \"name\": \"nitro\"}, {\"product\": \"paloaltonetworks\", \"product_version\": \"3020\", \"name\": \"paloaltonetworks\"}], \"email\": \" hidden@hidden.com\", \"filter\": \"confidence >= 50\", \"id\": 2087, \"last_intelligence\": \"2017-02-21T11:54:39.260329+00:00\"}" | makemv foo delim="
" | mvexpand foo | rename foo as _raw | spath
 | spath integrations{} | mvexpand integrations{}
 | spath input=integrations{}
 | table username, user_id, id, email,title,name,product,product_version,last_intelligence,filter

Depending on how your sourcetype is configured, you can probably leave off the first no-parameter spath. The key is spath integrations{} | mvexpand integrations{} | spath input=integrations{}: extract the objects inside the array into a multivalue field, expand that field into separate results, then extract each object's contents.



martin_mueller
SplunkTrust

If your events are already configured to do the basic JSON extractions, then leave off the first | spath. Only use lines 4-6 from my answer; the rest is preparation of the dummy data.
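Against already-indexed data, that reduced search would look something like this (index and sourcetype names are placeholders for your own):

```
index=your_index sourcetype=your_sourcetype
| spath integrations{} | mvexpand integrations{}
| spath input=integrations{}
| table username, user_id, id, email, title, name, product, product_version, last_intelligence, filter
```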


himynamesdave
Contributor

Thanks, Martin. Works perfectly.


himynamesdave
Contributor

Hi Martin, thanks for taking the time to answer my question. I think I have an issue with field extraction which might be causing the problem: one event (e.g. event 1 above) reports a count of 2 for each integrations{}.* value.

When I run spath it duplicates events:

[screenshot omitted]

Which results in 3 values in each row:

[screenshot omitted]

And now reports a count of 6 (vs 2):

[screenshot omitted]

Now I'm really confused!


JDukeSplunk
Builder

Oddly enough, I have had some success with | xmlkv on supposedly JSON REST inputs.

It's worth throwing it into a quick search to see whether some fields are generated and are more consistent.

index=yourindex sourcetype=yoursourcetype source=yourfilepath | xmlkv 

You may also want to look at the raw data and see whether Splunk is inserting line breaks in the wrong places (most likely at the embedded timestamp), giving you only partial events or lumping multiple events together. If so, you will need to adjust the line-breaking configuration for that sourcetype in props.conf and wrestle with the regex that determines where a new event starts.
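If the line breaking does turn out to be wrong, this is a sketch of the kind of props.conf stanza involved (the sourcetype name is a placeholder, and the exact regex depends on your actual raw data):

```
[your_sourcetype]
# Break events at newlines between JSON objects; never merge lines back together
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
# Per the question, timestamps are assigned at index time
DATETIME_CONFIG = CURRENT
```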


martin_mueller
SplunkTrust

It'd help if you added a pasteable or indexable example.


himynamesdave
Contributor

Of course, the question is now updated.
