Solved: Help with Regex extraction

siksaw33 · ‎01-09-2023

2023-01-09T16:46:00.780076351Z app_name=default-java environment=e3 ns=one pod_container=default-java pod_name=default stream=stdout message={"name":"com","timestamp":"2023-01-09T16:46:00.779Z","level":"info","schemaVersion":"0.1","application":{"name":"com ","version":"1.2.5"},"request":{"address":{"uri":"Read/1.2.5"},"metadata":{"one-data-correlation-id":"d5d3 ","one-data-trace-id":"0be"}},"message":"Parent Function Address: Read, Request identifier: d5d35c6e-3661-4445-bbe4-f5a3f382d035, REQUEST-RECEIVED: {\"requestIdentifier\""d5 \",\"clientIdentifier\""CUST \",\"locale\""en-US\",\"userId\""lkapla\",\"accountNumber\""1234\",\"treatmentsFilter\":[\"targeted\",\"messages\"],\"callerType\""ADDTL\",\"cancelType\""\",\"handle\""gsp00a79e6b_b610_3407_90fa_11d5417c0b7f\",\"callTimeStamp\""1/9/2023 9:46:00 AM\",\"callIdentifier\""01091\",\"geoTelIdentifier\""04ba\"}, "}

How can I extract the time, userId and clientIdentifier into a table?

yuanliu · ‎01-10-2023

As with your other question, please post JSON objects in code blocks, because some character combinations get rendered as smileys. As I said there, try not to treat JSON objects like text strings; use SPL's built-in capabilities for structured data.

With your raw logs, Splunk should already have extracted the field "message". The JSON inside that field also contains a node named "message", and spath does not seem to work well with duplicate names, so we rename the Splunk field "message" first.

| rename message AS data
| spath input=data
| eval REQUEST_RECEIVED = replace(message, ".*, REQUEST-RECEIVED: ", "")
| spath input=REQUEST_RECEIVED
| fields - REQUEST_RECEIVED data message

With your sample data (after correcting for the smileys), this gives the following output, which contains multiple time fields as well as other data about the request.

field                                     value
accountNumber                             1234
app_name                                  default-java
application.name                          com
application.version                       1.2.5
callIdentifier                            01091
callTimeStamp                             1/9/2023 9:46:00 AM
callerType                                ADDTL
cancelType                                (empty)
clientIdentifier                          CUST
environment                               e3
geoTelIdentifier                          04ba
handle                                    gsp00a79e6b_b610_3407_90fa_11d5417c0b7f
level                                     info
locale                                    en-US
name                                      com
ns                                        one
pod_container                             default-java
pod_name                                  default
request.address.uri                       Read/1.2.5
request.metadata.one-data-correlation-id  d5d3
request.metadata.one-data-trace-id        0be
requestIdentifier                         d5
schemaVersion                             0.1
stream                                    stdout
timestamp                                 2023-01-09T16:46:00.779Z
treatmentsFilter{}                        targeted, messages
userId                                    lkapla
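
If only the three requested fields are needed, one possible sketch is to end the same pipeline with the table command instead of removing fields. It assumes that "timestamp" is the time value wanted; "callTimeStamp" (from the REQUEST-RECEIVED payload) or Splunk's own _time may be a better fit depending on the use case.

```spl
| rename message AS data
| spath input=data
| eval REQUEST_RECEIVED = replace(message, ".*, REQUEST-RECEIVED: ", "")
| spath input=REQUEST_RECEIVED
| table timestamp userId clientIdentifier
```

With the sample event, this should leave a single row holding only those three columns.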

siksaw33 · ‎01-12-2023

Thank you so much @yuanliu

gcusello · ‎01-10-2023

Hi @siksaw33,

This looks like JSON, so first try the spath command (https://docs.splunk.com/Documentation/Splunk/9.0.3/SearchReference/Spath), which automatically extracts all the fields.

Otherwise, you can use this regex:

| rex "^(?<time>[^ ]+).*clientIdentifier\\\":(?<clientIdentifier>[^,]+).*userId\\\":(?<userId>[^,]+)"

You can test it at https://regex101.com/r/Mb2Z3z/1

Ciao.

Giuseppe

siksaw33 · ‎01-09-2023

FYI I used rex field=_raw "userId\\\\\":\\\\\"(?<userId>[a-z]+)" for this
