Splunk Search

How to extract a JSON field from another JSON?

Codie
Engager

Hi, I would like to extract a field from a JSON logs which is in a prettier format already.

I would like to extract a field named "clientTransactionId" from below sample data.

{ [-]
   @timestamp2024-09-05T10:59:34.826855417+10:00
   appNameTestApp
   environmentUAT
   ivUserAshish
   levelINFO
   loggercom.app.login
   messageNew user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, key_value: {"tamSessionIndex":"1d1ad722-XXXX-11ef-8a2b-005056b70cf5","devicePrint":"DDDDDDDDDDD","createdAt":"2099-09-05T00:59:34.734404799Z","updatedAt":"2099-09-05T00:59:34.734404799Z","clientSessionId":"ppppppppppppp","sessionId":"WWWWWWWWW","clientTransactionId":"8fd2353d-d609-XXXX-52i6-2e1dc12359m4","transactionId":"9285-:f18c10db191:XXXXXXXX_TRX","twoFaResult":"CHALLENGE","newDevice":true,"newLocation":false,"overseas":true} with TTL: 46825
   parentId:
   spanId14223cXXXX6d63d5
   tamSessionIndex1d1ad722-6b22-11ef-8a2b-XXXXXXX
   threadhttps-jsse-nio-XXXX-exec-6
   traceId66d90275ecc565aa61XXXXXXXX02f5815

}

Labels (1)
0 Karma
1 Solution

tscroggins
Influencer

Hi @Codie,

If your _raw value looks like this:

{
  "@timestamp": "2024-09-05T10:59:34.826855417+10:00",
  "appName": "TestApp",
  "environment": "UAT",
  "ivUser": "Ashish",
  "level": "INFO",
  "logger": "com.app.login",
  "message": "New user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, key_value: {\"tamSessionIndex\":\"1d1ad722-XXXX-11ef-8a2b-005056b70cf5\",\"devicePrint\":\"DDDDDDDDDDD\",\"createdAt\":\"2099-09-05T00:59:34.734404799Z\",\"updatedAt\":\"2099-09-05T00:59:34.734404799Z\",\"clientSessionId\":\"ppppppppppppp\",\"sessionId\":\"WWWWWWWWW\",\"clientTransactionId\":\"8fd2353d-d609-XXXX-52i6-2e1dc12359m4\",\"transactionId\":\"9285-:f18c10db191:XXXXXXXX_TRX\",\"twoFaResult\":\"CHALLENGE\",\"newDevice\":true,\"newLocation\":false,\"overseas\":true} with TTL: 46825",
  "parentId": "",
  "spanId": "14223cXXXX6d63d5",
  "tamSessionIndex": "1d1ad722-6b22-11ef-8a2b-XXXXXXX",
  "thread": "https-jsse-nio-XXXX-exec-6",
  "traceId": "66d90275ecc565aa61XXXXXXXX02f5815"
}

You should have a message field with value:

New user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, key_value: {"tamSessionIndex":"1d1ad722-XXXX-11ef-8a2b-005056b70cf5","devicePrint":"DDDDDDDDDDD","createdAt":"2099-09-05T00:59:34.734404799Z","updatedAt":"2099-09-05T00:59:34.734404799Z","clientSessionId":"ppppppppppppp","sessionId":"WWWWWWWWW","clientTransactionId":"8fd2353d-d609-XXXX-52i6-2e1dc12359m4","transactionId":"9285-:f18c10db191:XXXXXXXX_TRX","twoFaResult":"CHALLENGE","newDevice":true,"newLocation":false,"overseas":true} with TTL: 46825

The key_value data may vary, and you'll need to adjust the regular expression as needed, but as a starting point, you can extract key_value (as message_key_value) and clientTransactionId cleanly in SPL using:

| rex field=message "key_value: (?<message_key_value>\{.*\}) with TTL:"
| spath input=message_key_value

or

| rex field=message "key_value: (?<message_key_value>\{.*\}) with TTL:"
| eval clientTransactionId=json_extract(json(message_key_value), "clientTransactionId")

or

| eval clientTransactionId=json_extract(json(replace(message, ".* key_value: (\{.*\}) with TTL: .*", "\\1")), "clientTransactionId")

or other variations.

View solution in original post

0 Karma

PickleRick
SplunkTrust
SplunkTrust

First and foremost - this is not a json within a json. This is a json object embedded within something that resembles json  but is syntactically incorrect. I suspect you're getting that data with some filebeat, logstash or similar tool. I'd try to fix the format to be a proper well-formed json. Then it "just works".

EDIT: Ok, that's what you get when you're not posting raw data, but rather the preformatted output from webui. Still the key_value part should be a proper object containing key-value pairs, not an embedded string. That makes no sense. Fix your data.

0 Karma

tscroggins
Influencer

Still the key_value part should be a proper object containing key-value pairs, not an embedded string. That makes no sense. Fix your data.

This is incredibly common and in most cases, outside the control of the destination. In the Logstash/Elasticsearch world, I'd parse the message field with a grok filter/processor followed by a json filter/processor to parse key_value into a JSON object. ("Elastic" translates to "overhead," but it's really just a trade-off relative to how Lucene works.) In the Splunk world, I'd leave it as is and use search-time field extractions, field aliases, etc. and accelerated data models.

PickleRick
SplunkTrust
SplunkTrust

I know. You are receiving what they send you. But you can often just talk with the sending party 🙂

Anyway, since it looks like there is something ELK-like in the middle, it could be worthwhile to check the ingestion process architecture - why are there middle men? Are we ingesting into multiple desitnations from single source? Maybe we could drop the extra stuff and not only lower our license consumption but also make our data compatible with existing TAs?

So the short-time soluion is of course to extract the string from one field of the json and run spath on it (there is no way I know of to do it automatically unless you want to get messy with regexes on this - another reason for getting your data tidy). But long-term solution IMO is to get the data right.

0 Karma

tscroggins
Influencer

Hi @Codie,

If your _raw value looks like this:

{
  "@timestamp": "2024-09-05T10:59:34.826855417+10:00",
  "appName": "TestApp",
  "environment": "UAT",
  "ivUser": "Ashish",
  "level": "INFO",
  "logger": "com.app.login",
  "message": "New user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, key_value: {\"tamSessionIndex\":\"1d1ad722-XXXX-11ef-8a2b-005056b70cf5\",\"devicePrint\":\"DDDDDDDDDDD\",\"createdAt\":\"2099-09-05T00:59:34.734404799Z\",\"updatedAt\":\"2099-09-05T00:59:34.734404799Z\",\"clientSessionId\":\"ppppppppppppp\",\"sessionId\":\"WWWWWWWWW\",\"clientTransactionId\":\"8fd2353d-d609-XXXX-52i6-2e1dc12359m4\",\"transactionId\":\"9285-:f18c10db191:XXXXXXXX_TRX\",\"twoFaResult\":\"CHALLENGE\",\"newDevice\":true,\"newLocation\":false,\"overseas\":true} with TTL: 46825",
  "parentId": "",
  "spanId": "14223cXXXX6d63d5",
  "tamSessionIndex": "1d1ad722-6b22-11ef-8a2b-XXXXXXX",
  "thread": "https-jsse-nio-XXXX-exec-6",
  "traceId": "66d90275ecc565aa61XXXXXXXX02f5815"
}

You should have a message field with value:

New user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, key_value: {"tamSessionIndex":"1d1ad722-XXXX-11ef-8a2b-005056b70cf5","devicePrint":"DDDDDDDDDDD","createdAt":"2099-09-05T00:59:34.734404799Z","updatedAt":"2099-09-05T00:59:34.734404799Z","clientSessionId":"ppppppppppppp","sessionId":"WWWWWWWWW","clientTransactionId":"8fd2353d-d609-XXXX-52i6-2e1dc12359m4","transactionId":"9285-:f18c10db191:XXXXXXXX_TRX","twoFaResult":"CHALLENGE","newDevice":true,"newLocation":false,"overseas":true} with TTL: 46825

The key_value data may vary, and you'll need to adjust the regular expression as needed, but as a starting point, you can extract key_value (as message_key_value) and clientTransactionId cleanly in SPL using:

| rex field=message "key_value: (?<message_key_value>\{.*\}) with TTL:"
| spath input=message_key_value

or

| rex field=message "key_value: (?<message_key_value>\{.*\}) with TTL:"
| eval clientTransactionId=json_extract(json(message_key_value), "clientTransactionId")

or

| eval clientTransactionId=json_extract(json(replace(message, ".* key_value: (\{.*\}) with TTL: .*", "\\1")), "clientTransactionId")

or other variations.

0 Karma

Codie
Engager

@tscroggins Perfect. Worked smoothly.

I took a long way as follows:
| spath
| rename message as _raw
| extract
| rex "\"sessionId\"\:\"(?<SessionID>.*?)\"\,\"clientTransactionId\"\:\"(?<ClientTransactionId>.*?)\"\,\"transactionId\""

Get Updates on the Splunk Community!

New Case Study Shows the Value of Partnering with Splunk Academic Alliance

The University of Nevada, Las Vegas (UNLV) is another premier research institution helping to shape the next ...

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...