Splunk Search

How to extract fields from nested json

jayita1989
Loves-to-Learn Lots

Hello,

Can someone please help me extract nested JSON fields without regex?

I have tried the following:

1. Setting KV_MODE = json in the search head TA props.conf

2. Setting INDEXED_EXTRACTIONS = JSON in the search head TA props.conf

3. Updating limits.conf with the spath stanza for the HF TA

[spath]

extraction_cutoff = 10000

4. Trying the mvexpand command as well.

Nothing worked. My raw logs look like this:

event": "{\"eventVersion\"😕"1.08\",\"userIdentity\":{\"type\"😕"AssumedRole\",\"principalId\"😕"AROAXYKJUXCU7M4FXD7ZZ:redlock\",\"arn\"😕"arn:aws:sts::533267265705:assumed-role/PrismaCloudRole-804603675133320192/redlock\",\"accountId\"😕"533267265705\",\"accessKeyId\"😕"ASIAXYKJUXCUSTP25SUE\",\"sessionContext\":{\"sessionIssuer\":{\"type\"😕"Role\",\"principalId\"😕"AROAXYKJUXCU7M4FXD7ZZ\",\"arn\"😕"arn:aws:iam::533267265705:role/PrismaCloudRole-804603675133320192\",\"accountId\"😕"533267265705\",\"userName\"😕"PrismaCloudRole-804603675133320192\"},\"webIdFederationData\":{},\"attributes\":{\"creationDate\"😕"2024-05-03T00:53:45Z\",\"mfaAuthenticated\"😕"false\"}}},\"eventTime\"😕"2024-05-03T04:09:07Z\",\"eventSource\"😕"autoscaling.amazonaws.com\",\"eventName\"😕"DescribeScalingPolicies\",\"awsRegion\"😕"us-west-2\",\"sourceIPAddress\"😕"13.52.105.217\",\"userAgent\"😕"Vert.x-WebClient/4.4.6\",\"requestParameters\":{\"maxResults\":10,\"serviceNamespace\"😕"cassandra\"},\"responseElements\":null,\"additionalEventData\":{\"service\"😕"application-autoscaling\"},\"requestID\"😕"ef12925d-0e9a-4913-8da5-1022cfd15964\",\"eventID\"😕"a1799eeb-1323-46b6-a964-efd9b2c30a8a\",\"readOnly\":true,\"eventType\"😕"AwsApiCall\",\"managementEvent\":true,\"recipientAccountId\"😕"533267265705\",\"eventCategory\"😕"Management\",\"tlsDetails\":{\"tlsVersion\"😕"TLSv1.3\",\"cipherSuite\"😕"TLS_AES_128_GCM_SHA256\",\"clientProvidedHostHeader\"😕"application-autoscaling.us-west-2.amazonaws.com\"}}"}


PickleRick
SplunkTrust

@yuanliu has many great points, but let me add one more thing: this way of ingesting data is really very "Splunk-unfriendly". The nested JSON payload is, for all intents and purposes, just a text blob for Splunk during automatic event processing.

True, you can extract the outer field (event, in your sample) using KV_MODE=json (or even have it as an indexed field with INDEXED_EXTRACTIONS=json, but that would be a horrible idea), but you can't make Splunk parse that field further automatically. If you need to do anything further with it, you need to explicitly call spath to parse the contents.

This matters because with auto-extracted JSON fields you can just search for key=value pairs, and the search will be relatively efficient: Splunk first searches for the values in the indexed data and then checks whether the event parses properly so that the key matches the value.

But if your whole payload sits inside that one outer field, you don't have any fields for the inner keys, so Splunk cannot search for field values. It first has to parse all events from the given time range only to match some of them against a condition, which is highly inefficient. This "envelope" is a very, very bad thing from Splunk's point of view.
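
To make the "text blob" point concrete outside Splunk, here is a minimal Python sketch. It is not Splunk code, and the shortened sample event and variable names are illustrative assumptions. Parsing the envelope yields the event value as a plain string; only a second, explicit parse exposes the inner keys, which is roughly the role spath input=event plays at search time.

```python
import json

# Shortened stand-in for the raw event (an assumption modeled on the sample
# in this thread): a JSON envelope whose "event" value is itself a JSON
# document serialized as a string.
raw = r'{"event": "{\"eventName\":\"DescribeScalingPolicies\",\"awsRegion\":\"us-west-2\"}"}'

outer = json.loads(raw)
# After the first parse, "event" is only a string, not a nested object.
print(type(outer["event"]).__name__)

# A second, explicit parse is needed to reach the nested keys.
inner = json.loads(outer["event"])
print(inner["eventName"], inner["awsRegion"])
```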


jayita1989
Loves-to-Learn Lots

Hello,

This is the configuration that we have in the search head TA props.conf

[ sourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ
TIME_PREFIX=eventTime\\\"\:\\\"
EVENT_BREAKER=([\r\n]+)
TRUNCATE=0
MAX_TIMESTAMP_LOOKAHEAD=30
EVENT_BREAKER_ENABLE=true
KV_MODE=json
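
As a rough check of the timestamp settings above, the Python sketch below applies the same TIME_PREFIX pattern and TIME_FORMAT string to a fragment of the sample event. It only approximates how these settings interact and is not Splunk's actual implementation; the fragment and variable names are assumptions made for illustration.

```python
import re
from datetime import datetime

# Fragment of the raw event as ingested, with backslash-escaped quotes.
raw_fragment = r'\"eventTime\":\"2024-05-03T04:09:07Z\",\"eventSource\":\"autoscaling.amazonaws.com\"'

# Same pattern and format string as in the props.conf stanza above.
time_prefix = re.compile(r'eventTime\\\"\:\\\"')
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
MAX_TIMESTAMP_LOOKAHEAD = 30

match = time_prefix.search(raw_fragment)
# Only the characters right after the prefix match are considered.
candidate = raw_fragment[match.end():match.end() + MAX_TIMESTAMP_LOOKAHEAD]
# A timestamp in this format is 20 characters long.
parsed = datetime.strptime(candidate[:20], TIME_FORMAT)
print(parsed.isoformat())
```

If the prefix pattern did not match the escaped quotes, match would be None and the lookup would fail, which is one way to spot a TIME_PREFIX that does not fit the data.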

yuanliu
SplunkTrust

First, thanks for posting data in text.  Second, it's a huge risk posting text data without a code box.  See how many smiley faces you sprinkled all over.  Let me clean it up for you here:

 

 

event": "{\"eventVersion\":\"1.08\",\"userIdentity\":{\"type\":\"AssumedRole\",\"principalId\":\"AROAXYKJUXCU7M4FXD7ZZ:redlock\",\"arn\":\"arn:aws:sts::533267265705:assumed-role/PrismaCloudRole-804603675133320192/redlock\",\"accountId\":\"533267265705\",\"accessKeyId\":\"ASIAXYKJUXCUSTP25SUE\",\"sessionContext\":{\"sessionIssuer\":{\"type\":\"Role\",\"principalId\":\"AROAXYKJUXCU7M4FXD7ZZ\",\"arn\":\"arn:aws:iam::533267265705:role/PrismaCloudRole-804603675133320192\",\"accountId\":\"533267265705\",\"userName\":\"PrismaCloudRole-804603675133320192\"},\"webIdFederationData\":{},\"attributes\":{\"creationDate\":\"2024-05-03T00:53:45Z\",\"mfaAuthenticated\":\"false\"}}},\"eventTime\":\"2024-05-03T04:09:07Z\",\"eventSource\":\"autoscaling.amazonaws.com\",\"eventName\":\"DescribeScalingPolicies\",\"awsRegion\":\"us-west-2\",\"sourceIPAddress\":\"13.52.105.217\",\"userAgent\":\"Vert.x-WebClient/4.4.6\",\"requestParameters\":{\"maxResults\":10,\"serviceNamespace\":\"cassandra\"},\"responseElements\":null,\"additionalEventData\":{\"service\":\"application-autoscaling\"},\"requestID\":\"ef12925d-0e9a-4913-8da5-1022cfd15964\",\"eventID\":\"a1799eeb-1323-46b6-a964-efd9b2c30a8a\",\"readOnly\":true,\"eventType\":\"AwsApiCall\",\"managementEvent\":true,\"recipientAccountId\":\"533267265705\",\"eventCategory\":\"Management\",\"tlsDetails\":{\"tlsVersion\":\"TLSv1.3\",\"cipherSuite\":\"TLS_AES_128_GCM_SHA256\",\"clientProvidedHostHeader\":\"application-autoscaling.us-west-2.amazonaws.com\"}}"}

 

 

Third, and this is key.  Are you sure that's the true form of a complete event?  For one thing, it seems that there is a missing opening curly bracket ({) and a missing double quotation mark (") before the entire snippet.  

If I am correct that you just forgot to include the opening bracket and opening quotation mark, i.e., your real events look like

 

 

{"event": "{\"eventVersion\":\"1.08\",\"userIdentity\":{\"type\":\"AssumedRole\",\"principalId\":\"AROAXYKJUXCU7M4FXD7ZZ:redlock\",\"arn\":\"arn:aws:sts::533267265705:assumed-role/PrismaCloudRole-804603675133320192/redlock\",\"accountId\":\"533267265705\",\"accessKeyId\":\"ASIAXYKJUXCUSTP25SUE\",\"sessionContext\":{\"sessionIssuer\":{\"type\":\"Role\",\"principalId\":\"AROAXYKJUXCU7M4FXD7ZZ\",\"arn\":\"arn:aws:iam::533267265705:role/PrismaCloudRole-804603675133320192\",\"accountId\":\"533267265705\",\"userName\":\"PrismaCloudRole-804603675133320192\"},\"webIdFederationData\":{},\"attributes\":{\"creationDate\":\"2024-05-03T00:53:45Z\",\"mfaAuthenticated\":\"false\"}}},\"eventTime\":\"2024-05-03T04:09:07Z\",\"eventSource\":\"autoscaling.amazonaws.com\",\"eventName\":\"DescribeScalingPolicies\",\"awsRegion\":\"us-west-2\",\"sourceIPAddress\":\"13.52.105.217\",\"userAgent\":\"Vert.x-WebClient/4.4.6\",\"requestParameters\":{\"maxResults\":10,\"serviceNamespace\":\"cassandra\"},\"responseElements\":null,\"additionalEventData\":{\"service\":\"application-autoscaling\"},\"requestID\":\"ef12925d-0e9a-4913-8da5-1022cfd15964\",\"eventID\":\"a1799eeb-1323-46b6-a964-efd9b2c30a8a\",\"readOnly\":true,\"eventType\":\"AwsApiCall\",\"managementEvent\":true,\"recipientAccountId\":\"533267265705\",\"eventCategory\":\"Management\",\"tlsDetails\":{\"tlsVersion\":\"TLSv1.3\",\"cipherSuite\":\"TLS_AES_128_GCM_SHA256\",\"clientProvidedHostHeader\":\"application-autoscaling.us-west-2.amazonaws.com\"}}"}

 

 

you would have gotten a field "event" containing the following value

 

 

{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AROAXYKJUXCU7M4FXD7ZZ:redlock","arn":"arn:aws:sts::533267265705:assumed-role/PrismaCloudRole-804603675133320192/redlock","accountId":"533267265705","accessKeyId":"ASIAXYKJUXCUSTP25SUE","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAXYKJUXCU7M4FXD7ZZ","arn":"arn:aws:iam::533267265705:role/PrismaCloudRole-804603675133320192","accountId":"533267265705","userName":"PrismaCloudRole-804603675133320192"},"webIdFederationData":{},"attributes":{"creationDate":"2024-05-03T00:53:45Z","mfaAuthenticated":"false"}}},"eventTime":"2024-05-03T04:09:07Z","eventSource":"autoscaling.amazonaws.com","eventName":"DescribeScalingPolicies","awsRegion":"us-west-2","sourceIPAddress":"13.52.105.217","userAgent":"Vert.x-WebClient/4.4.6","requestParameters":{"maxResults":10,"serviceNamespace":"cassandra"},"responseElements":null,"additionalEventData":{"service":"application-autoscaling"},"requestID":"ef12925d-0e9a-4913-8da5-1022cfd15964","eventID":"a1799eeb-1323-46b6-a964-efd9b2c30a8a","readOnly":true,"eventType":"AwsApiCall","managementEvent":true,"recipientAccountId":"533267265705","eventCategory":"Management","tlsDetails":{"tlsVersion":"TLSv1.3","cipherSuite":"TLS_AES_128_GCM_SHA256","clientProvidedHostHeader":"application-autoscaling.us-west-2.amazonaws.com"}}

 

 

(By the way, event should be available whether or not you have KV_MODE=json, and whether or not you have INDEXED_EXTRACTIONS=JSON.)  As you can see, this value is compliant JSON.  All you need to do is feed this field to spath.

 

 

| spath input=event

 

 

This way, if my speculation about missing bracket and quotation mark is correct, the sample you posted should give the following fields and values

field name | field value
additionalEventData.service | application-autoscaling
awsRegion | us-west-2
eventCategory | Management
eventID | a1799eeb-1323-46b6-a964-efd9b2c30a8a
eventName | DescribeScalingPolicies
eventSource | autoscaling.amazonaws.com
eventTime | 2024-05-03T04:09:07Z
eventType | AwsApiCall
eventVersion | 1.08
managementEvent | true
readOnly | true
recipientAccountId | 533267265705
requestID | ef12925d-0e9a-4913-8da5-1022cfd15964
requestParameters.maxResults | 10
requestParameters.serviceNamespace | cassandra
responseElements | null
sourceIPAddress | 13.52.105.217
tlsDetails.cipherSuite | TLS_AES_128_GCM_SHA256
tlsDetails.clientProvidedHostHeader | application-autoscaling.us-west-2.amazonaws.com
tlsDetails.tlsVersion | TLSv1.3
userAgent | Vert.x-WebClient/4.4.6
userIdentity.accessKeyId | ASIAXYKJUXCUSTP25SUE
userIdentity.accountId | 533267265705
userIdentity.arn | arn:aws:sts::533267265705:assumed-role/PrismaCloudRole-804603675133320192/redlock
userIdentity.principalId | AROAXYKJUXCU7M4FXD7ZZ:redlock
userIdentity.sessionContext.attributes.creationDate | 2024-05-03T00:53:45Z
userIdentity.sessionContext.attributes.mfaAuthenticated | false
userIdentity.sessionContext.sessionIssuer.accountId | 533267265705
userIdentity.sessionContext.sessionIssuer.arn | arn:aws:iam::533267265705:role/PrismaCloudRole-804603675133320192
userIdentity.sessionContext.sessionIssuer.principalId | AROAXYKJUXCU7M4FXD7ZZ
userIdentity.sessionContext.sessionIssuer.type | Role
userIdentity.sessionContext.sessionIssuer.userName | PrismaCloudRole-804603675133320192
userIdentity.type | AssumedRole
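
The dotted field names in this list follow the nesting of the JSON object. The Python sketch below imitates that naming on a shortened sample. It is an illustration only, not how spath is implemented: the flatten helper is hypothetical, and it handles nested objects only, whereas spath also has a {} notation for arrays, which this sample does not contain.

```python
import json

def flatten(node, prefix=""):
    """Flatten nested dicts into dotted field names (objects only, no arrays)."""
    fields = {}
    for key, value in node.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects; an empty object yields no field.
            fields.update(flatten(value, name))
        else:
            fields[name] = value
    return fields

# Shortened stand-in for the inner "event" value from the sample.
event = json.loads(
    '{"requestParameters":{"maxResults":10,"serviceNamespace":"cassandra"},'
    '"tlsDetails":{"tlsVersion":"TLSv1.3"},"webIdFederationData":{},"readOnly":true}'
)
for name, value in sorted(flatten(event).items()):
    print(name, value)
```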

However, if your raw events are truly missing the opening bracket and opening quotation mark, you need to examine your ingestion process and fix that.  No developer will knowingly omit those.  Temporarily, you can use SPL to "fix" the omission and extract data, like

 

 

| eval _raw = "{\"" . _raw
| spath
| spath input=event

 

 

But this is not a real solution.  Bad ingestion can do many other kinds of damage.
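
The same repair can be checked outside Splunk. The Python sketch below uses a shortened stand-in for the posted snippet (an assumption for illustration): the text fails to parse as posted, and parses once the missing {" is prepended, after which the inner event value still needs its own parse.

```python
import json

# Shortened stand-in for the snippet as posted: the leading {" is missing.
truncated = r'event": "{\"eventName\":\"DescribeScalingPolicies\"}"}'

try:
    json.loads(truncated)
    parsed_as_is = True
except json.JSONDecodeError:
    parsed_as_is = False
print("parses as posted:", parsed_as_is)

# Same repair as the eval above: prepend the missing {" and then parse twice,
# once for the envelope and once for the string held in "event".
repaired = '{"' + truncated
inner = json.loads(json.loads(repaired)["event"])
print(inner["eventName"])
```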

Lastly, here is an emulation you can play with and compare with real data

 

 

| makeresults
| eval _raw = "{\"event\": \"{\\\"eventVersion\\\":\\\"1.08\\\",\\\"userIdentity\\\":{\\\"type\\\":\\\"AssumedRole\\\",\\\"principalId\\\":\\\"AROAXYKJUXCU7M4FXD7ZZ:redlock\\\",\\\"arn\\\":\\\"arn:aws:sts::533267265705:assumed-role/PrismaCloudRole-804603675133320192/redlock\\\",\\\"accountId\\\":\\\"533267265705\\\",\\\"accessKeyId\\\":\\\"ASIAXYKJUXCUSTP25SUE\\\",\\\"sessionContext\\\":{\\\"sessionIssuer\\\":{\\\"type\\\":\\\"Role\\\",\\\"principalId\\\":\\\"AROAXYKJUXCU7M4FXD7ZZ\\\",\\\"arn\\\":\\\"arn:aws:iam::533267265705:role/PrismaCloudRole-804603675133320192\\\",\\\"accountId\\\":\\\"533267265705\\\",\\\"userName\\\":\\\"PrismaCloudRole-804603675133320192\\\"},\\\"webIdFederationData\\\":{},\\\"attributes\\\":{\\\"creationDate\\\":\\\"2024-05-03T00:53:45Z\\\",\\\"mfaAuthenticated\\\":\\\"false\\\"}}},\\\"eventTime\\\":\\\"2024-05-03T04:09:07Z\\\",\\\"eventSource\\\":\\\"autoscaling.amazonaws.com\\\",\\\"eventName\\\":\\\"DescribeScalingPolicies\\\",\\\"awsRegion\\\":\\\"us-west-2\\\",\\\"sourceIPAddress\\\":\\\"13.52.105.217\\\",\\\"userAgent\\\":\\\"Vert.x-WebClient/4.4.6\\\",\\\"requestParameters\\\":{\\\"maxResults\\\":10,\\\"serviceNamespace\\\":\\\"cassandra\\\"},\\\"responseElements\\\":null,\\\"additionalEventData\\\":{\\\"service\\\":\\\"application-autoscaling\\\"},\\\"requestID\\\":\\\"ef12925d-0e9a-4913-8da5-1022cfd15964\\\",\\\"eventID\\\":\\\"a1799eeb-1323-46b6-a964-efd9b2c30a8a\\\",\\\"readOnly\\\":true,\\\"eventType\\\":\\\"AwsApiCall\\\",\\\"managementEvent\\\":true,\\\"recipientAccountId\\\":\\\"533267265705\\\",\\\"eventCategory\\\":\\\"Management\\\",\\\"tlsDetails\\\":{\\\"tlsVersion\\\":\\\"TLSv1.3\\\",\\\"cipherSuite\\\":\\\"TLS_AES_128_GCM_SHA256\\\",\\\"clientProvidedHostHeader\\\":\\\"application-autoscaling.us-west-2.amazonaws.com\\\"}}\"}"
| spath
``` data emulation above ```
| spath input=event


jayita1989
Loves-to-Learn Lots

Hello,

Thanks for your response. I have tried your suggestion on the search head but unfortunately it did not extract the "event" field further.


yuanliu
SplunkTrust

You haven't answered my key questions about data.  Is there a data ingestion problem that causes the corrupt JSON snippet? (The data in your original illustration is NOT compliant.)  Do you have an "event" field from Splunk?  If yes, can you post an example? (Anonymize as needed.)  Can you post a corrected raw event? (Anonymize as needed.)

Without correct data, you cannot expect any good result.


deepakc
Builder

Regex and spath commands can be used to extract fields, but it's easier to use INDEXED_EXTRACTIONS = JSON or KV_MODE = json, especially since JSON data can change.

If no events are getting auto-extracted, then it sounds like your sourcetype may not be applied.

There are some steps for you to investigate.

  1. Check at the inputs level that the data is being assigned the sourcetype you have set in your TA props.conf, and verify this. (The data must be coming in from a JSON file or a HEC-type input somewhere.)
  2. Once you know the correct sourcetype, ensure that KV_MODE = json has been applied along with other settings such as those below.

Note: setting INDEXED_EXTRACTIONS = JSON and KV_MODE = json together for the same sourcetype causes Splunk software to extract the JSON fields twice: once at index time and again at search time. I advise against this; stick to KV_MODE = json for now.

Analyse the data and work out some of the settings (known as the "magic 6") for props.conf, as in the example below. Tip: ideally, you should always place new data into a test index, get the props working, and then move it into production once it's all working as expected.

Example props

 

[my:json:data:sourcetype]
KV_MODE = json
# Tune the settings below to make Splunk more efficient.
# Look no further than this many characters into the data for the timestamp:
MAX_TIMESTAMP_LOOKAHEAD = <number of characters>
# Set this to false explicitly (the shipped default is true):
SHOULD_LINEMERGE = false
# Regex matching the text just before the timestamp:
TIME_PREFIX = <regex>
# Check your timestamp and format it, for example %Y-%m-%d %H:%M:%S%:Z
TIME_FORMAT = <strptime format>
# 10000 is the default; may need tuning:
TRUNCATE = 10000
# Regex that works out where to break the line:
LINE_BREAKER = <regex>

 

 

Apply the above to your TA based on your specific data, then deploy, test, and adjust as required.

Also, if this is a common data source, there may already be a TA with props for it on Splunkbase. Have you checked that?


jayita1989
Loves-to-Learn Lots

Hello,

Thanks for your reply. We already have tested putting this in the props.conf of our search head TA, but this also did not extract the event fields further.

Regarding the Splunkbase TA, I am not sure about this. Maybe I can check.
