Splunk Search

Help with Regex extraction

siksaw33
Path Finder
2023-01-09T16:46:00.780076351Z app_name=default-java environment=e3 ns=one pod_container=default-java pod_name=default stream=stdout message={"name":"com","timestamp":"2023-01-09T16:46:00.779Z","level":"info","schemaVersion":"0.1","application":{"name":"com ","version":"1.2.5"},"request":{"address":{"uri":"Read/1.2.5"},"metadata":{"one-data-correlation-id":"d5d3 ","one-data-trace-id":"0be"}},"message":"Parent Function Address: Read, Request identifier: d5d35c6e-3661-4445-bbe4-f5a3f382d035, REQUEST-RECEIVED: {\"requestIdentifier\""d5 \",\"clientIdentifier\""CUST \",\"locale\""en-US\",\"userId\""lkapla\",\"accountNumber\""1234\",\"treatmentsFilter\":[\"targeted\",\"messages\"],\"callerType\""ADDTL\",\"cancelType\""\",\"handle\""gsp00a79e6b_b610_3407_90fa_11d5417c0b7f\",\"callTimeStamp\""1/9/2023 9:46:00 AM\",\"callIdentifier\""01091\",\"geoTelIdentifier\""04ba\"}, "}

 

I want to extract the time, userId, and clientIdentifier into a table.

 


yuanliu
SplunkTrust

Similar to your other question, please post JSON objects in code blocks because some combinations turn into smileys.  As I said there, try not to treat JSON objects like text strings.  Use SPL's built-in capabilities to deal with structured data.

With your raw logs, Splunk should have extracted the field "message". Inside that field, there's a JSON node that is also named "message". Somehow spath does not work well with duplicate names, so we'll rename the Splunk field "message" first.

 

 

| rename message AS data
| spath input=data
| eval REQUEST_RECEIVED = replace(message, ".*, REQUEST-RECEIVED: ", "")
| spath input=REQUEST_RECEIVED
| fields - REQUEST_RECEIVED data message

 

 

Your sample data, after correction for smileys, would give this output, which contains multiple time fields as well as other data about the request.

field                                       value
accountNumber                               1234
app_name                                    default-java
application.name                            com
application.version                         1.2.5
callIdentifier                              01091
callTimeStamp                               1/9/2023 9:46:00 AM
callerType                                  ADDTL
cancelType                                  (empty)
clientIdentifier                            CUST
environment                                 e3
geoTelIdentifier                            04ba
handle                                      gsp00a79e6b_b610_3407_90fa_11d5417c0b7f
level                                       info
locale                                      en-US
name                                        com
ns                                          one
pod_container                               default-java
pod_name                                    default
request.address.uri                         Read/1.2.5
request.metadata.one-data-correlation-id    d5d3
request.metadata.one-data-trace-id          0be
requestIdentifier                           d5
schemaVersion                               0.1
stream                                      stdout
timestamp                                   2023-01-09T16:46:00.779Z
treatmentsFilter{}                          targeted, messages
userId                                      lkapla
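
To get only the three columns the question asks for, one option is to end the same search with table instead of fields. This is a sketch, not something run against your data. The choice of timestamp as "the time" is an assumption; swap in callTimeStamp or _time if that is the one you need.

```spl
| rename message AS data
| spath input=data
| eval REQUEST_RECEIVED = replace(message, ".*, REQUEST-RECEIVED: ", "")
| spath input=REQUEST_RECEIVED
| table timestamp userId clientIdentifier
```

Because table keeps only the listed fields, the fields - ... cleanup step is no longer needed in this variant.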

 


siksaw33
Path Finder

Thank you so much @yuanliu 


gcusello
SplunkTrust

Hi @siksaw33,

This looks like JSON, so first try the spath command (https://docs.splunk.com/Documentation/Splunk/9.0.3/SearchReference/Spath), which automatically extracts all the fields.

Otherwise, you can use this regex:

| rex "^(?<time>[^ ]+).*clientIdentifier\\\":(?<clientIdentifier>[^,]+).*userId\\\":(?<userId>[^,]+)"

You can test it at https://regex101.com/r/Mb2Z3z/1

Ciao.

Giuseppe

siksaw33
Path Finder

FYI, I used this to extract userId:

| rex field=_raw "userId\\\\\":\\\\\"(?<userId>[a-z]+)"
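
If you stay with the regex route, the same escaping can probably be reused for clientIdentifier and the leading timestamp to build the table from the question. This is only a sketch under assumptions: it assumes the five-backslash form that worked for userId also applies to clientIdentifier, and the character class [A-Za-z0-9 ] is a guess based on the single sample value ("CUST " with a trailing space). The field name time follows the earlier rex suggestion in this thread.

```spl
| rex field=_raw "^(?<time>[^ ]+)"
| rex field=_raw "userId\\\\\":\\\\\"(?<userId>[a-z]+)"
| rex field=_raw "clientIdentifier\\\\\":\\\\\"(?<clientIdentifier>[A-Za-z0-9 ]+)"
| eval clientIdentifier = trim(clientIdentifier)
| table time userId clientIdentifier
```

The trim() call is there only to drop the trailing space seen in the sample value. If real user IDs can contain digits or uppercase letters, the [a-z]+ class would need widening as well.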
