Splunk Search

How to extract data into a data model attribute regex?

robertpenberthy
Explorer

I'm trying to extract data into a Data Model Attribute using a regex. The data I'm trying to extract from the events gets logged in a couple of ways. I've spent a while trying to extract just the data I want and have all but given up on figuring it out myself.

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - data I want

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - auto extracted fields | data I want

While the Java class information would be great to include, I want to come up with a regex that gets the data I want from both log formats, excluding the auto-extracted fields and the pipe that are present in some of the events. I have come up with regexes that handle each situation separately, but not one that handles both and puts the result into the same field.

The latest idea that I was working on was to do a negative lookahead and extract the data that doesn't come before a pipe character.

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?<message>.*(?!|))

Every combination that I try either matches everything, nothing, or half of what I want. So hopefully some regex master can assist me.
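
For anyone hitting the same wall, here is a minimal sketch of why the lookahead attempt swings between "nothing" and "everything". It uses Python's re module purely for illustration, so the named group is written (?P<message>...) instead of Splunk's (?<message>...). The sample line is the second placeholder event from above, and the variable names are made up for this sketch.

```python
import re

line = ("2014-07-17 18:58:00,781 UTC [somedata] LEVEL "
        "java.class.information - auto extracted fields | data I want")

prefix = r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+"

# Unescaped pipe: (?!|) is a negative lookahead over two empty alternatives.
# An empty alternative always matches, so the lookahead always fails
# and the whole pattern can never match.
unescaped = re.search(prefix + r"(?P<message>.*(?!|))", line)

# Escaped pipe: the greedy .* runs to the end of the line, where no pipe
# follows, so the lookahead succeeds and the group holds everything.
escaped = re.search(prefix + r"(?P<message>.*(?!\|))", line)

print(unescaped)                  # no match at all
print(escaped.group("message"))   # class name, auto fields, pipe, and data
```

A lookahead only checks the position where the match ends, so it cannot express "skip whatever precedes the pipe". An optional group that consumes everything up to the pipe is a better fit for that.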

In response to the request for actual data from the log files...

2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14

2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

I would love it if the regular expression would return the following values from the two lines:

c.s.b.b.s.impl.HttpRequestLogFilter - FAILED: 500 POST /something/v1/something/v1/group elapsed:14

c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

Aside from my attempt to make a single expression work, these are the regexes I have used for each case individually:

Pulling everything after the pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+.*\|(?<message>.*)

Pulling everything without a pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(.*||(?<message>.*))

1 Solution

robertpenberthy
Explorer

I got an answer on Stack Overflow (or at least a satisfactory one). The provided regex got me everything I wanted except what appears between the log level and the hyphen (the Java class information), which is matched but not included in the message field. Here it is:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(\S+ - )(?:.+\| )?(?<message>.*)
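
As a rough check, the pattern can be run against the two real log lines from the question. This is only a sketch in Python's re module, not Splunk itself, so (?<message>...) is rewritten as (?P<message>...). The group name "logger", the SAMPLES list, and the extract function are hypothetical names added for this sketch. The first group is also narrowed to capture just the class name so it can be rejoined with the message afterwards.

```python
import re

# The accepted pattern, adapted for Python: named groups use (?P<name>...),
# and the class name gets its own (hypothetical) "logger" group.
PATTERN = re.compile(
    r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?P<logger>\S+) - (?:.+\| )?(?P<message>.*)"
)

SAMPLES = [
    '2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR '
    'c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" '
    'ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14',
    '2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO '
    'c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 '
    '(Customer Information), customer 1, institution 1 is still indexing',
]


def extract(line):
    """Return (logger, message) for a log line, or None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group("logger"), m.group("message")


for sample in SAMPLES:
    logger, message = extract(sample)
    # Rejoin the two captures to get the "class - message" form asked for.
    print(f"{logger} - {message}")
```

The optional group (?:.+\| )? greedily consumes everything up to the last pipe when one is present and is skipped otherwise, which is what lets one expression cover both log formats. A single capture group cannot hold two non-adjacent pieces of text, so the class name and the message have to be captured separately and combined afterwards.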

robertpenberthy
Explorer

Hopefully I have provided the information you requested.

somesoni2
Revered Legend

Could you provide more sample data (actual data if possible, with the sensitive parts masked) and the regex you currently have for each case?
