Hi, I would like to extract a field from JSON logs that are already in a pretty-printed format.
I would like to extract a field named "clientTransactionId" from the sample data below.
{
@timestamp: 2024-09-05T10:59:34.826855417+10:00
appName: TestApp
environment: UAT
ivUser: Ashish
level: INFO
logger: com.app.login
message: New user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, key_value: {"tamSessionIndex":"1d1ad722-XXXX-11ef-8a2b-005056b70cf5","devicePrint":"DDDDDDDDDDD","createdAt":"2099-09-05T00:59:34.734404799Z","updatedAt":"2099-09-05T00:59:34.734404799Z","clientSessionId":"ppppppppppppp","sessionId":"WWWWWWWWW","clientTransactionId":"8fd2353d-d609-XXXX-52i6-2e1dc12359m4","transactionId":"9285-:f18c10db191:XXXXXXXX_TRX","twoFaResult":"CHALLENGE","newDevice":true,"newLocation":false,"overseas":true} with TTL: 46825
parentId:
spanId: 14223cXXXX6d63d5
tamSessionIndex: 1d1ad722-6b22-11ef-8a2b-XXXXXXX
thread: https-jsse-nio-XXXX-exec-6
traceId: 66d90275ecc565aa61XXXXXXXX02f5815
}
Hi @Codie,
If your _raw value looks like this:
{
"@timestamp": "2024-09-05T10:59:34.826855417+10:00",
"appName": "TestApp",
"environment": "UAT",
"ivUser": "Ashish",
"level": "INFO",
"logger": "com.app.login",
"message": "New user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, key_value: {\"tamSessionIndex\":\"1d1ad722-XXXX-11ef-8a2b-005056b70cf5\",\"devicePrint\":\"DDDDDDDDDDD\",\"createdAt\":\"2099-09-05T00:59:34.734404799Z\",\"updatedAt\":\"2099-09-05T00:59:34.734404799Z\",\"clientSessionId\":\"ppppppppppppp\",\"sessionId\":\"WWWWWWWWW\",\"clientTransactionId\":\"8fd2353d-d609-XXXX-52i6-2e1dc12359m4\",\"transactionId\":\"9285-:f18c10db191:XXXXXXXX_TRX\",\"twoFaResult\":\"CHALLENGE\",\"newDevice\":true,\"newLocation\":false,\"overseas\":true} with TTL: 46825",
"parentId": "",
"spanId": "14223cXXXX6d63d5",
"tamSessionIndex": "1d1ad722-6b22-11ef-8a2b-XXXXXXX",
"thread": "https-jsse-nio-XXXX-exec-6",
"traceId": "66d90275ecc565aa61XXXXXXXX02f5815"
}
You should have a message field with value:
New user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, key_value: {"tamSessionIndex":"1d1ad722-XXXX-11ef-8a2b-005056b70cf5","devicePrint":"DDDDDDDDDDD","createdAt":"2099-09-05T00:59:34.734404799Z","updatedAt":"2099-09-05T00:59:34.734404799Z","clientSessionId":"ppppppppppppp","sessionId":"WWWWWWWWW","clientTransactionId":"8fd2353d-d609-XXXX-52i6-2e1dc12359m4","transactionId":"9285-:f18c10db191:XXXXXXXX_TRX","twoFaResult":"CHALLENGE","newDevice":true,"newLocation":false,"overseas":true} with TTL: 46825
The key_value data may vary, and you'll need to adjust the regular expression as needed, but as a starting point, you can extract key_value (as message_key_value) and clientTransactionId cleanly in SPL using:
| rex field=message "key_value: (?<message_key_value>\{.*\}) with TTL:"
| spath input=message_key_value
or
| rex field=message "key_value: (?<message_key_value>\{.*\}) with TTL:"
| eval clientTransactionId=json_extract(json(message_key_value), "clientTransactionId")
or
| eval clientTransactionId=json_extract(json(replace(message, ".* key_value: (\{.*\}) with TTL: .*", "\\1")), "clientTransactionId")
or other variations.
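Outside of SPL, the same two-step approach (regex capture, then JSON parse) can be sketched in Python for anyone testing the pattern locally. The sample message below is a trimmed version of the original event, and the field names follow the posted data:

```python
import json
import re

# Trimmed sample of the message field from the original event
message = (
    'New user state created - state_id: XXXX-YYYYYY, key_id: twoFactorAuth, '
    'key_value: {"clientTransactionId":"8fd2353d-d609-XXXX-52i6-2e1dc12359m4",'
    '"twoFaResult":"CHALLENGE","newDevice":true} with TTL: 46825'
)

# Capture the JSON object between "key_value: " and " with TTL:"
match = re.search(r"key_value: (\{.*\}) with TTL:", message)

# Parse the captured string into a dict and read the field of interest
key_value = json.loads(match.group(1))
print(key_value["clientTransactionId"])  # 8fd2353d-d609-XXXX-52i6-2e1dc12359m4
```

This mirrors what `rex` plus `spath` (or `json_extract`) does at search time.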
First and foremost - this is not JSON within JSON. This is a JSON object embedded within something that resembles JSON but is syntactically incorrect. I suspect you're getting that data from Filebeat, Logstash, or a similar tool. I'd try to fix the format so it's proper, well-formed JSON. Then it "just works".
EDIT: Ok, that's what you get when you're not posting raw data, but rather the preformatted output from the web UI. Still, the key_value part should be a proper object containing key-value pairs, not an embedded string. That makes no sense. Fix your data.
This is incredibly common and, in most cases, outside the control of the destination. In the Logstash/Elasticsearch world, I'd parse the message field with a grok filter/processor followed by a json filter/processor to parse key_value into a JSON object. ("Elastic" translates to "overhead," but it's really just a trade-off relative to how Lucene works.) In the Splunk world, I'd leave it as is and use search-time field extractions, field aliases, etc. and accelerated data models.
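For reference, that two-stage Logstash parse might look something like the sketch below. The grok and json filters are standard plugins; the intermediate field name `key_value_raw` is just an illustrative choice:

```
filter {
  # Pull the embedded JSON object out of message into key_value_raw
  grok {
    match => { "message" => "key_value: %{DATA:key_value_raw} with TTL: %{NUMBER:ttl}" }
  }
  # Parse the extracted string into a structured key_value object
  json {
    source => "key_value_raw"
    target => "key_value"
  }
}
```

After this, key_value.clientTransactionId is queryable as a regular field.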
I know. You are receiving what they send you. But you can often just talk with the sending party 🙂
Anyway, since it looks like there is something ELK-like in the middle, it could be worthwhile to check the ingestion process architecture - why are there middlemen? Are we ingesting into multiple destinations from a single source? Maybe we could drop the extra stuff and not only lower our license consumption but also make our data compatible with existing TAs?
So the short-term solution is of course to extract the string from one field of the JSON and run spath on it (there is no way I know of to do it automatically unless you want to get messy with regexes on this - another reason to get your data tidy). But the long-term solution, IMO, is to get the data right.
@tscroggins Perfect. Worked smoothly.
I took the long way, as follows:
| spath
| rename message as _raw
| extract
| rex "\"sessionId\":\"(?<SessionID>.*?)\",\"clientTransactionId\":\"(?<ClientTransactionId>.*?)\",\"transactionId\""