I'm trying to extract data into a Data Model attribute using a regex. The data I want gets logged in a couple of different ways. I've been at this for a while, trying to extract just the data I want, and have just about given up on figuring it out myself.
2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - data I want
2014-07-17 18:58:00,781 UTC [somedata] LEVEL java.class.information - auto extracted fields | data I want
While the Java class information would be great to include, I want to come up with a regex that will get the data I want from both ways of logging, excluding the auto-extracted fields and the pipe that are present in some of the events. I have come up with regexes that handle each situation separately, but not one that handles both situations and puts the result into the same field.
The latest idea that I was working on was to do a negative lookahead and extract the data that doesn't come before a pipe character.
\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?<message>.*(?!|))
Every combination that I try either matches everything, nothing, or half of what I want. So hopefully some regex master can assist me.
In response to the request for actual data from the log files...
2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14
2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing
I would love it if the regular expression would return the following values from the two lines:
c.s.b.b.s.impl.HttpRequestLogFilter - FAILED: 500 POST /something/v1/something/v1/group elapsed:14
c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 (Customer Information), customer 1, institution 1 is still indexing
Aside from my attempt to make a single expression work, these are the regexes I have used for each case individually:
Pulling everything after the pipe character:
\S* \S* \S* \[.*\]\s+[A-Z]+\s+.*\|(?<message>.*)
Pulling everything without a pipe character:
\S* \S* \S* \[.*\]\s+[A-Z]+\s+(.*||(?<message>.*))
I got an answer on Stack Overflow (or at least a satisfactory one). The provided regex got me everything I wanted except what appears between the log level and the hyphen. Here it is:
\S* \S* \S* \[.*\]\s+[A-Z]+\s+(\S+ - )(?:.+\| )?(?<message>.*)
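For anyone who wants to check this outside Splunk, here is a minimal Python sketch of the same idea, run against the two sample events above. It is only an illustration, not what Splunk does internally. Two things differ from the regex as posted and are my own assumptions: Python's re module needs the (?P<name>...) spelling for named groups, and I added a second named group, logger (a hypothetical name), to also capture the class that sits between the log level and the hyphen.

```python
import re

# Same structure as the regex above, with two changes for illustration:
#  - (?P<name>...) is Python's spelling of a named group
#  - "logger" is a hypothetical extra group for the Java class name
PATTERN = re.compile(
    r"\S* \S* \S* \[.*\]\s+[A-Z]+\s+(?P<logger>\S+) - (?:.+\| )?(?P<message>.*)"
)

def extract(line):
    """Return (logger, message) for a log line, or None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group("logger"), m.group("message")

with_pipe = (
    '2014-07-17 21:29:43,620 UTC [http-apr-8080-exec-143] ERROR '
    'c.s.b.b.s.impl.HttpRequestLogFilter - Apps="UNKNOWN" ReqIP="1.1.1.1" '
    'ReqProt="https" | FAILED: 500 POST /something/v1/something/v1/group elapsed:14'
)
without_pipe = (
    '2014-07-17 21:29:42,797 UTC [persistentScheduler_Worker-6] INFO '
    'c.s.b.s.b.svc.impl.DocumentIndexJob - data source UNKNOWN_163_2 '
    '(Customer Information), customer 1, institution 1 is still indexing'
)

for line in (with_pipe, without_pipe):
    logger, message = extract(line)
    # Join the two captures to get the "class - message" form asked for above.
    print(f"{logger} - {message}")
```

The optional group (?:.+\| )? is greedy, so it swallows everything up to the last "| " in the event. If a message can itself contain a pipe followed by a space, the capture would start after that later pipe, so this would need adjusting for such events.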
Hopefully that provides the information you requested.
Could you provide more sample data, actual data if possible (just mask the sensitive parts), and the regex you currently have for each case?