
How do I simulate LINE_BREAKER at search time?

Jason
Motivator

I have some data that has been ingested quickly/badly, so there are multiple lines per event. Rather than reindex it, is there a way I can quickly break it up in a search, similar to LINE_BREAKER at parse time?

This is one event. I would like five:

20.30.40 this is an event -- The date is 6 January, 2017 --
20.30.41 this is another event
20.30.42 this is a multiline event?
  see?
  it's really a multiline event
  I promise.
20.30.43 this is a fourth event
20.30.44 and a final one

Jason
Motivator

Yes. You can use rex to extract each event's text into a multivalued field, then expand that field into separate events. The fun part is getting the regular expression right so that it behaves like LINE_BREAKER. Here is what the pattern uses:

(?sm) -- s ("dot-all") lets .*? match across newlines, and m ("multiline") makes ^ match at the start of any line rather than only at the start of the event
\d\d\.\d\d\.\d\d\s -- the timestamp pattern that marks the start of a new event. In a LINE_BREAKER, this is what would follow the () capturing group (the text between events).
.*? -- lazily collect the rest of that event's content
((?=[\r\n]+\d\d\.\d\d\.\d\d\s)|(?!.)) -- stop collecting at the first point where what follows is one or more newlines plus the start-of-event pattern above; that is where the current event ends. The alternative (?!.) is needed because the last event has no event after it: it succeeds only where no character follows (with the s flag, . also matches newlines), i.e. at the end of the original event.
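To see why both flags matter, here is a rough check in Python rather than Splunk. It is only a sketch: Python's re module is not Splunk's PCRE engine (it spells named groups (?P<raw>...) instead of (?<raw>...)), although it treats the flags, lookaheads, and lazy quantifier used here the same way. The events helper and the short test string are hypothetical.

```python
import re

# Pieces of the rex pattern, kept separate so each flag's effect can be tested.
START = r"\d\d\.\d\d\.\d\d\s"          # marks the beginning of a new event
END = rf"((?=[\r\n]+{START})|(?!.))"    # next line starts an event, or nothing follows


def events(flags, text):
    """Return the text captured for each event using the given inline flags."""
    pattern = flags + rf"^(?P<raw>{START}.*?){END}"
    return [m.group("raw") for m in re.finditer(pattern, text)]


text = "20.30.42 multiline?\n  see?\n20.30.43 next"

# Without m, ^ matches only at the very start, so only one event is found.
print(events("(?s)", text))   # ['20.30.42 multiline?\n  see?']
# Without s, . stops at newlines and (?!.) succeeds at every line end,
# so the continuation line "  see?" is silently dropped.
print(events("(?m)", text))   # ['20.30.42 multiline?', '20.30.43 next']
# With both flags, each event is captured whole.
print(events("(?sm)", text))  # ['20.30.42 multiline?\n  see?', '20.30.43 next']
```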

Here is the full search string:

| rex field=_raw max_match=99 "(?sm)^(?<raw>\d\d\.\d\d\.\d\d\s.*?)((?=[\r\n]+\d\d\.\d\d\.\d\d\s)|(?!.))" | mvexpand raw | rename raw as _raw
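The search above can be modeled roughly in Python to see what each stage produces. This is a sketch under stated assumptions, not Splunk's implementation: an event is represented as a dict, split_event is a hypothetical helper, and the _time value is a made-up epoch (2017-01-06 20:30:40 UTC) chosen only for illustration.

```python
import re

# The rex pattern from the search. Python's re spells the named group
# (?P<raw>...) where Splunk's PCRE accepts (?<raw>...).
EVENT_RE = re.compile(
    r"(?sm)^(?P<raw>\d\d\.\d\d\.\d\d\s.*?)((?=[\r\n]+\d\d\.\d\d\.\d\d\s)|(?!.))"
)


def split_event(event, max_match=99):
    """Rough model of: rex max_match=N ... | mvexpand raw | rename raw as _raw.

    As with rex, max_match=0 means no limit. Every other field (such as _time)
    is copied unchanged onto each new event. Assumes the pattern matches.
    """
    raws = [m.group("raw") for m in EVENT_RE.finditer(event["_raw"])]
    if max_match:
        raws = raws[:max_match]
    # mvexpand: one copy of the event per value; rename: the value becomes _raw
    return [{**event, "_raw": raw} for raw in raws]


SAMPLE = "\n".join([
    "20.30.40 this is an event -- The date is 6 January, 2017 --",
    "20.30.41 this is another event",
    "20.30.42 this is a multiline event?",
    "  see?",
    "  it's really a multiline event",
    "  I promise.",
    "20.30.43 this is a fourth event",
    "20.30.44 and a final one",
])

# Hypothetical original event; the _time value is made up.
original = {"_time": 1483734640, "_raw": SAMPLE}
for new in split_event(original):
    print(new["_time"], repr(new["_raw"]))
```

All five printed events share the original _time, which is why the new events are not re-timestamped.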

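Note: this only breaks up the raw text. At most 99 new events are extracted from each original event (the max_match value; max_match=0 removes the limit). Timestamps, field extractions, and event types are not recalculated for the "new" events; each one keeps the _time and fields of the event it came from.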
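For comparison with the parse-time setting the question asks about, here is a rough Python model of LINE_BREAKER: wherever the regex matches, the text in its first capturing group is discarded, the start of that group ends one event, and its end begins the next. The breaker ([\r\n]+)\d\d\.\d\d\.\d\d\s is an assumption (a props.conf value one might have used with SHOULD_LINEMERGE = false), not something from this thread, and line_breaker_split is a hypothetical helper. On the sample data it yields the same five events as the search.

```python
import re

# Assumed props.conf value: LINE_BREAKER = ([\r\n]+)\d\d\.\d\d\.\d\d\s
BREAKER = re.compile(r"([\r\n]+)\d\d\.\d\d\.\d\d\s")


def line_breaker_split(raw):
    """Model LINE_BREAKER: the text matched by group 1 is dropped; the text
    before it ends the previous event and the text after it starts the next."""
    events, start = [], 0
    for m in BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return [e for e in events if e]  # this sketch skips empty pieces


SAMPLE = "\n".join([
    "20.30.40 this is an event -- The date is 6 January, 2017 --",
    "20.30.41 this is another event",
    "20.30.42 this is a multiline event?",
    "  see?",
    "  it's really a multiline event",
    "  I promise.",
    "20.30.43 this is a fourth event",
    "20.30.44 and a final one",
])

for event in line_breaker_split(SAMPLE):
    print(repr(event))
```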
