Splunk Search

How to extract data into a data model attribute regex?

robertpenberthy
Explorer

I'm trying to extract data into a Data Model Attribute using a regex. The data I'm trying to extract from the events gets logged in a couple of ways. I've been at this a while, trying to extract just the data I want, and have just about given up on figuring it out myself.

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - data I want

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - auto extracted fields | data I want

While the Java class information would be great to include, I want to come up with a regex that will get the data I want from both ways of logging, excluding the auto extracted fields and the pipe that are present in some of the events. I have come up with regexes that handle each situation separately, but not one that handles both situations and puts the result into the same field.

The latest idea that I was working on was to do a negative lookahead and extract the data that doesn't come before a pipe character.

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?<message>.*(?!|))
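A minimal sketch of why this attempt behaves the way it does, written in Python as an assumption for illustration (Python's `re` module spells named groups `(?P<name>...)`, whereas the expression above uses the `(?<name>...)` form). An unescaped `|` inside the lookahead is alternation between two empty patterns, so `(?!|)` can never succeed. Escaping the pipe makes the pattern match, but the greedy `.*` still runs to the end of the line, where "not followed by a pipe" is trivially true.

```python
import re

# Sample line in the shape given in the question.
line = ("2014-07-17 18:58:00,781 UTC [somedata] LEVEL "
        "java.class.information - auto extracted fields | data I want")

prefix = r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+"

# Unescaped pipe: the lookahead holds two empty alternatives, so it always fails.
unescaped = re.compile(prefix + r"(?P<message>.*(?!|))")
no_match = unescaped.search(line)

# Escaped pipe: the pattern matches, but greedy .* swallows the whole tail,
# including the part before the pipe.
escaped = re.compile(prefix + r"(?P<message>.*(?!\|))")
greedy = escaped.search(line).group("message")

print(no_match)  # the unescaped version finds nothing
print(greedy)    # the escaped version returns everything after the level
```

This lines up with the "matches everything or nothing" behaviour described below: a lookahead placed after a greedy `.*` does not stop the `.*` from consuming the pipe.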

Every combination that I try either matches everything, nothing, or half of what I want. So hopefully some regex master can assist me.

In response to the request for actual data from the log files...

2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14

2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

I would love it if the regular expression would return the following values from the two lines:

c.s.b.b.s.impl.HttpRequestLogFilter - FAILED: 500 POST /something/v1/something/v1/group elapsed:14

c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

Aside from my attempt at a single expression, these are the regexes I have used for each case individually:

Pulling everything after the pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+.*\|(?<message>.*)

Pulling everything without a pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(.*||(?<message>.*))

1 Solution

robertpenberthy
Explorer

I got an answer on Stack Overflow (or at least a satisfactory one). The provided regex got me everything I wanted except the part that appears between the log level and the hyphen (the Java class name). Here it is:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(\S+ - )(?:.+\| )?(?<message>.*)
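To check the expression against the two sample log lines from the question, here is a small sketch in Python. It is an illustration under assumptions, not Splunk configuration: Python needs the `(?P<name>...)` spelling for named groups, and the first group is given the hypothetical name `logger` (the original leaves it unnamed) so that the class name can be recovered alongside `message`.

```python
import re

# Python spelling of the regex above. "logger" is a hypothetical group name
# added for illustration; the original captures the class in an unnamed group.
PATTERN = re.compile(
    r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?P<logger>\S+) - (?:.+\| )?(?P<message>.*)"
)

def extract(line):
    """Return (logger, message) for a log line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("logger"), m.group("message")

with_pipe = (
    '2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR '
    'c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" '
    'ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14'
)
without_pipe = (
    '2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO '
    'c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 '
    '(Customer Information), customer 1, institution 1 is still indexing'
)

for sample in (with_pipe, without_pipe):
    logger, message = extract(sample)
    # Joining the two groups gives the "class - message" form asked for above.
    print(f"{logger} - {message}")
```

The optional non-capturing group `(?:.+\| )?` is what lets one expression cover both log formats: when a pipe is present it consumes everything up to and including it, and when there is none it matches nothing, leaving the whole remainder for `message`.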

robertpenberthy
Explorer

Hopefully I have provided the information you requested.

somesoni2
Revered Legend

Could you provide more sample data (actual data if possible, with the sensitive parts masked) and the regex you currently have for each case?
