Splunk Search

Help with Regex extraction

siksaw33
Path Finder
2023-01-09T16:46:00.780076351Z app_name=default-java environment=e3 ns=one pod_container=default-java pod_name=default stream=stdout message={"name":"com","timestamp":"2023-01-09T16:46:00.779Z","level":"info","schemaVersion":"0.1","application":{"name":"com ","version":"1.2.5"},"request":{"address":{"uri":"Read/1.2.5"},"metadata":{"one-data-correlation-id":"d5d3 ","one-data-trace-id":"0be"}},"message":"Parent Function Address: Read, Request identifier: d5d35c6e-3661-4445-bbe4-f5a3f382d035, REQUEST-RECEIVED: {\"requestIdentifier\":\"d5 \",\"clientIdentifier\":\"CUST \",\"locale\":\"en-US\",\"userId\":\"lkapla\",\"accountNumber\":\"1234\",\"treatmentsFilter\":[\"targeted\",\"messages\"],\"callerType\":\"ADDTL\",\"cancelType\":\"\",\"handle\":\"gsp00a79e6b_b610_3407_90fa_11d5417c0b7f\",\"callTimeStamp\":\"1/9/2023 9:46:00 AM\",\"callIdentifier\":\"01091\",\"geoTelIdentifier\":\"04ba\"}, "}

 

How can I extract the time, userId, and clientIdentifier into a table?

 


yuanliu
SplunkTrust

As with your other question, please post JSON objects in code blocks, because some character combinations turn into smileys. As I said there, try not to treat JSON objects like text strings; use SPL's built-in capabilities to deal with structured data.

With your raw logs, Splunk should already have extracted the field "message". Inside that field there is a JSON node that is also named "message", and spath does not seem to work well with the duplicate name. So we rename the Splunk field "message" first.

 

 

| rename message AS data
| spath input=data
| eval REQUEST_RECEIVED = replace(message, ".*, REQUEST-RECEIVED: ", "")
| spath input=REQUEST_RECEIVED
| fields - REQUEST_RECEIVED data message

 

 

Your sample data, after correction for smileys, gives the output below, which contains multiple time fields as well as other data about the request.

Field                                      Value
accountNumber                              1234
app_name                                   default-java
application.name                           com
application.version                        1.2.5
callIdentifier                             01091
callTimeStamp                              1/9/2023 9:46:00 AM
callerType                                 ADDTL
cancelType                                 (empty)
clientIdentifier                           CUST
environment                                e3
geoTelIdentifier                           04ba
handle                                     gsp00a79e6b_b610_3407_90fa_11d5417c0b7f
level                                      info
locale                                     en-US
name                                       com
ns                                         one
pod_container                              default-java
pod_name                                   default
request.address.uri                        Read/1.2.5
request.metadata.one-data-correlation-id   d5d3
request.metadata.one-data-trace-id         0be
requestIdentifier                          d5
schemaVersion                              0.1
stream                                     stdout
timestamp                                  2023-01-09T16:46:00.779Z
treatmentsFilter{}                         targeted, messages
userId                                     lkapla
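
To show only the three fields asked about, a minimal sketch is to end the same search with table in place of the final fields command. This assumes "time" means the timestamp field from the JSON payload; if the event time or the caller's time is wanted, _time or callTimeStamp could be listed instead.

```spl
| rename message AS data
| spath input=data
| eval REQUEST_RECEIVED = replace(message, ".*, REQUEST-RECEIVED: ", "")
| spath input=REQUEST_RECEIVED
| table timestamp userId clientIdentifier
```

With the sample event, the values for these columns would be the ones shown in the output above for timestamp, userId, and clientIdentifier.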

 


siksaw33
Path Finder

Thank you so much @yuanliu 


gcusello
SplunkTrust

Hi @siksaw33,

This looks like JSON, so first try the spath command (https://docs.splunk.com/Documentation/Splunk/9.0.3/SearchReference/Spath), which automatically extracts all the fields.

Otherwise, you can use this regex:

| rex "^(?<time>[^ ]+).*clientIdentifier\\\":(?<clientIdentifier>[^,]+).*userId\\\":(?<userId>[^,]+)"

which you can test at https://regex101.com/r/Mb2Z3z/1

Ciao.

Giuseppe

siksaw33
Path Finder

FYI, I used rex field=_raw "userId\\\\\":\\\\\"(?<userId>[a-z]+)" for this.
