Splunk Search

How to extract data into a data model attribute regex?

robertpenberthy
Explorer

I'm trying to extract data with a Data Model Attribute regex. The data I'm trying to extract from the events gets logged in a couple of ways. I've been at this for a while, trying to extract just the data I want, and have just about given up on figuring it out myself.

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - data I want

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - auto extracted fields | data I want

While the Java class information would be great to include, I want to come up with a regex that will get the data I want from both ways of logging, excluding the auto-extracted fields and the pipe that are present in some of the events. I have come up with regular expressions that handle each situation separately, but not one that handles both situations and puts the result into the same field.

The latest idea that I was working on was to do a negative lookahead and extract the data that doesn't come before a pipe character.

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?<message>.*(?!|))

Every combination that I try either matches everything, nothing, or half of what I want. So hopefully some regex master can assist me.
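For anyone reproducing the "everything or nothing" behavior outside Splunk, here is a minimal Python sketch of the lookahead idea. It assumes Python's `re` module (which needs `(?P<message>...)` rather than `(?<message>...)`), a shortened pattern, and a made-up sample line modeled on the second format above. With the pipe left unescaped, `(?!|)` is a negative lookahead over an empty alternation, which can never succeed; with the pipe escaped, the greedy `.*` simply runs to the end of the line, where no pipe follows.

```python
import re

# Hypothetical sample line, shortened from the second log format above.
line = 'LEVEL java.class.information - auto extracted fields | data I want'

# Unescaped pipe: (?!|) is "not followed by (empty OR empty)".
# The empty pattern always matches, so the negative lookahead always fails.
unescaped = re.search(r'[A-Z]+\s+(?P<message>.*(?!|))', line)

# Escaped pipe: .* is greedy and consumes the whole line; at the end of the
# line nothing follows, so the lookahead succeeds and everything is captured.
escaped = re.search(r'[A-Z]+\s+(?P<message>.*(?!\|))', line)

print(unescaped)                 # no match at all
print(escaped.group('message'))  # the whole remainder, pipe included
```

Neither variant can skip the part before the pipe, because a lookahead only checks what follows the current position and does not discard anything already matched.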

In response to the request for actual data from the log files...

2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14

2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

I would love it if the regular expression would return the following values from the two lines:

c.s.b.b.s.impl.HttpRequestLogFilter - FAILED: 500 POST /something/v1/something/v1/group elapsed:14

c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

Aside from my attempt to make one statement work, these are the regexes I have used for each case individually:

Pulling everything after the pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+.*\|(?<message>.*)

Pulling everything without a pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(.*||(?<message>.*))

1 Solution

robertpenberthy
Explorer

I got an answer on Stack Overflow (or at least a satisfactory one). The provided regex got me everything I wanted except what appears between the log level and the hyphen. Here it is:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(\S+ - )(?:.+\| )?(?<message>.*)
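As a quick check outside Splunk, the pattern can be run against the two sample events from the question. This is only a sketch under a few assumptions: it uses Python's `re` module, so the named group is written `(?P<message>...)` instead of `(?<message>...)`, and the helper name `extract` is made up for illustration. The optional non-capturing group `(?:.+\| )?` swallows everything up to the pipe when one is present, and the first (unnamed) capture group holds the class name and hyphen.

```python
import re

# The accepted pattern, with the named group in Python's (?P<name>...) syntax.
PATTERN = re.compile(
    r'\S* \S* \S* \[.*\]\s+[A-Z]+\s+(\S+ - )(?:.+\| )?(?P<message>.*)'
)

# The two sample events quoted in the question.
SAMPLES = [
    '2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR '
    'c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" '
    'ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14',
    '2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO '
    'c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 '
    '(Customer Information), customer 1, institution 1 is still indexing',
]


def extract(line):
    """Return (class_prefix, message), or None if the line does not match.

    Hypothetical helper, not part of Splunk.
    """
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group(1), m.group('message')


for sample in SAMPLES:
    prefix, message = extract(sample)
    # Joining the two groups reproduces the output asked for in the question.
    print(prefix + message)
```

In this sketch the `message` group alone holds only the text after the hyphen (or after the pipe); the class name is available separately in group 1.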

robertpenberthy
Explorer

Hopefully I've provided the information you requested.


somesoni2
SplunkTrust

Could you provide more sample data (actual data if possible, with the sensitive parts masked) and the regex you currently have for each case?
