Splunk Search

How to extract data into a data model attribute regex?

robertpenberthy
Explorer

I'm trying to extract data into a Data Model Attribute Regex. The data I'm trying to extract from the events gets logged in a couple of ways. I've been at this a while and have just about given up on figuring it out myself.

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - data I want

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - auto extracted fields | data I want

While the Java class information would be great to include, I want to come up with a regex that gets the data I want from both logging formats, excluding the auto-extracted fields and the pipe that are present in some of the events. I have come up with regexes that handle each situation separately, but not one that handles both and puts the result into the same field.

The latest idea that I was working on was to do a negative lookahead and extract the data that doesn't come before a pipe character.

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?<message>.*(?!|))

Every combination that I try either matches everything, nothing, or half of what I want. So hopefully some regex master can assist me.
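
For anyone reproducing this, here is a minimal sketch of why the lookahead attempt above behaves this way. It rests on two assumptions: Python's `re` module stands in for Splunk's PCRE engine (so the named group is spelled `(?P<message>...)`), and the pipe in the posted pattern may or may not have been escaped originally, so both variants are shown. The sample line is the second format from the post.

```python
import re

line = ("2014-07-17 18:58:00,781 UTC [somedata] LEVEL "
        "java.class.information - auto extracted fields | data I want")

# Shared prefix from the post: timestamp, zone, [thread], log level.
prefix = r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+"

# Unescaped pipe: (?!|) is a negative lookahead over two empty alternatives.
# An empty alternative always matches, so the negative lookahead always
# fails and the whole pattern can never match ("matches nothing").
unescaped = re.compile(prefix + r"(?P<message>.*(?!|))")
no_match = unescaped.search(line)

# Escaped pipe: the greedy .* runs to the end of the line, where no pipe
# follows, so the lookahead succeeds there ("matches everything").
escaped = re.compile(prefix + r"(?P<message>.*(?!\|))")
everything = escaped.search(line).group("message")

print(no_match)
print(everything)
```

In both cases the lookahead only tests the single position after `.*`, so it cannot express "skip everything up to the last pipe".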

In response to the request for actual data from the log files...

2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14

2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

I would love it if the regular expression would return the following values from the two lines:

c.s.b.b.s.impl.HttpRequestLogFilter - FAILED: 500 POST /something/v1/something/v1/group elapsed:14

c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

Aside from my attempt at a single expression, these are the regexes I have used for each case individually:

Pulling everything after the pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+.*\|(?<message>.*)

Pulling everything without a pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(.*||(?<message>.*))

1 Solution

robertpenberthy
Explorer

I got an answer on Stack Overflow (or at least a satisfactory one). The provided regex got me everything I wanted except what appears between the log level and the hyphen (the Java class information). Here it is:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(\S+ - )(?:.+\| )?(?<message>.*)
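
As a rough check outside Splunk, the pattern above can be run against the two sample events from the question. This is only a sketch under stated assumptions: it uses Python's `re` module rather than Splunk's PCRE engine, so `(?<message>...)` becomes `(?P<message>...)`, and it is a variant of the accepted pattern in which the unnamed class capture is given a name. The names `logger`, `PATTERN`, and `extract` are hypothetical and do not come from the original answer.

```python
import re

# Variant of the accepted pattern: the class capture (\S+ - ) is split into
# a named group plus the literal " - " so the class name can be reused.
PATTERN = re.compile(
    r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+"   # timestamp, zone, [thread], level
    r"(?P<logger>\S+) - "               # Java class information
    r"(?:.+\| )?"                       # optional auto-extracted fields and pipe
    r"(?P<message>.*)"                  # the text that should land in the field
)

def extract(line):
    """Return 'class - message' for a log line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return f"{m.group('logger')} - {m.group('message')}"

samples = [
    '2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR '
    'c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" '
    'ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14',
    '2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO '
    'c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 '
    '(Customer Information), customer 1, institution 1 is still indexing',
]

results = [extract(s) for s in samples]
for r in results:
    print(r)
```

One caveat: because `.+` in the optional group is greedy, everything up to the last `| ` on the line is discarded, so a message that itself contains a pipe followed by a space would be cut short.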

robertpenberthy
Explorer

Hopefully I have provided the information you requested.

somesoni2
Revered Legend

Could you provide more sample data, actual data if possible (just mask the sensitive parts), and the regex you currently have for each case?
