Splunk Search

How to extract data into a data model attribute regex?

robertpenberthy
Explorer

I'm trying to extract data with a Data Model Attribute Regex. The data I want from the events gets logged in a couple of ways. I've been at this for a while, trying to extract just the data I want, and have just about given up on figuring it out myself.

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - data I want

2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - auto extracted fields | data I want

While the Java class information would be great to include, I want to come up with a regex that gets the data I want from both ways of logging, excluding the auto-extracted fields and the pipe that are present in some of the events. I have come up with regular expressions that handle each situation separately, but not one that handles both situations and puts the result into the same field.

The latest idea that I was working on was to do a negative lookahead and extract the data that doesn't come before a pipe character.

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?<message>.*(?!|))

Every combination that I try either matches everything, nothing, or half of what I want. So hopefully some regex master can assist me.
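A likely reason for the "everything or nothing" behaviour, sketched in Python as an assumption about what is going on (Python's re module writes named groups as (?P<name>...) rather than (?<name>...), and the variable names here are only illustrative): an unescaped | inside the lookahead is an empty alternation that always matches, so (?!|) always fails and the whole pattern matches nothing. Once the pipe is escaped, the greedy .* runs to the end of the line, where no pipe follows, so the lookahead succeeds and everything is captured.

```python
import re

# Sample line in the second logging format from the post above.
line = ("2014-07-17 18:58:00,781 UTC [somedata] LEVEL "
        "java.class.information - auto extracted fields | data I want")

prefix = r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+"

# (?!|) is a negative lookahead over an empty alternation: it can never succeed.
unescaped = re.compile(prefix + r"(?P<message>.*(?!|))")

# (?!\|) only checks the character right after the greedy .*, i.e. end of line.
escaped = re.compile(prefix + r"(?P<message>.*(?!\|))")

no_match = unescaped.search(line)
greedy = escaped.search(line)

print(no_match)                 # None: the pattern matches nothing
print(greedy.group("message"))  # the whole tail of the line, pipe included
```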

In response to the request for actual data from the log files...

2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14

2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

I would love it if the regular expression would return the following values from the two lines:

c.s.b.b.s.impl.HttpRequestLogFilter - FAILED: 500 POST /something/v1/something/v1/group elapsed:14

c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing

Aside from my attempt to make a single expression work, these are the regexes I have used for each case individually:

Pulling everything after the pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+.*\|(?<message>.*)

Pulling everything without a pipe character:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(.*||(?<message>.*))

1 Solution

robertpenberthy
Explorer

I got an answer on Stack Overflow (or at least a satisfactory one). The provided regex got me everything I wanted except what appears between the log level and the hyphen (the Java class name). Here it is:

\S* \S* \S* \[.*\]\s+[A-Z]+\s+(\S+ - )(?:.+\| )?(?<message>.*)
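For anyone who wants to check the expression outside Splunk, here is a minimal Python sketch run against the two sample lines from the question. It is not a Splunk configuration. It assumes Python's (?P<name>...) spelling for named groups, and it gives the class part a hypothetical group name, logger, which is not in the original regex, so that the class and the message can be joined back together afterwards.

```python
import re

# Same structure as the regex above, with the class name in its own
# named group ("logger" is an illustrative name, not from the original).
PATTERN = re.compile(
    r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+"
    r"(?P<logger>\S+) - "   # Java class, then the hyphen separator
    r"(?:.+\| )?"           # optional auto-extracted fields up to the pipe
    r"(?P<message>.*)"      # the data we actually want
)

SAMPLES = [
    '2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR '
    'c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" '
    'ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14',
    '2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO '
    'c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 '
    '(Customer Information), customer 1, institution 1 is still indexing',
]


def extract(line):
    """Return (logger, message) for a log line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("logger"), m.group("message")


results = [extract(line) for line in SAMPLES]
for logger, message in results:
    # Joining the two groups gives the combined value asked for in the question.
    print(f"{logger} - {message}")
```

The optional non-capturing group (?:.+\| )? is what lets one expression cover both formats: when a pipe is present it consumes everything up to and including it, and when there is none it matches nothing and message starts right after the hyphen.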

robertpenberthy
Explorer

Hopefully I've provided the information you requested.

somesoni2
SplunkTrust

Could you provide more sample data (actual data if possible, with the sensitive parts masked) and the regex you currently have for each of the cases?
