Can you do event breaking at search time using a r...

Lowell · 04-29-2010

Is there a way to split the text of an event into multiple events (preferably using a regular expression) at search time? I guess I'm looking for a hybrid between multikv and rex.

Normally you want to get all the event-breaking logic set up in props.conf, but sometimes there is no perfect event-breaking logic that works all the time for a given sourcetype. Other times you just want to index the whole file at once, like with full events coming from an [fschange:] input or with a config_file sourcetype. In these cases it would be helpful to have a feature that allows individual events to be broken apart so that the smaller parts can be analyzed.

Here are two examples that I've run into where it would be helpful to break apart events:

Example 1: Breaking apart a CSV event

Say you have a CSV file indexed as a single event, like so:

65.54.81.126,col.stb.s-msn.com,0.0156030654907,1272415236.01
192.221.110.126,ads1.msn.com,0.0161349773407,1272415235.84
74.125.93.102,toolbar.google.com,0.0170810222626,1272415240.26
74.125.93.113,www.google-analytics.com,0.0155410766602,1272415240.86
74.125.101.36,safebrowsing-cache.google.com,0.000797033309937,1272415413.17

I would like to be able to run a search something like this to pull out individual values (I made up the name regexsplit):

| regexsplit "[\r\n]+" | rex "(?<ip>[^,]+),(?<hostname>[^,]+),[^,]+,(?<timestamp>\d+)"
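
Since regexsplit is a made-up command, here is a rough Python sketch of the behavior I have in mind. It is not a Splunk API: the function name regexsplit, the variable names, and the shortened sample data are only illustrative, and Python writes named groups as (?P<name>...) rather than (?<name>...).

```python
import re

def regexsplit(raw, pattern):
    """Hypothetical helper: break one event's raw text into pieces on a regex delimiter."""
    # Drop empty pieces, such as the one left after a trailing newline.
    return [piece for piece in re.split(pattern, raw) if piece.strip()]

# One indexed "event" holding several CSV lines (first three rows of the example).
raw_event = (
    "65.54.81.126,col.stb.s-msn.com,0.0156030654907,1272415236.01\n"
    "192.221.110.126,ads1.msn.com,0.0161349773407,1272415235.84\n"
    "74.125.93.102,toolbar.google.com,0.0170810222626,1272415240.26\n"
)

# Same extraction as the rex above; \d+ keeps only the integer part of the timestamp.
row_pattern = re.compile(r"(?P<ip>[^,]+),(?P<hostname>[^,]+),[^,]+,(?P<timestamp>\d+)")

# Each piece becomes its own "event", then the fields are extracted per piece.
rows = [row_pattern.match(line).groupdict() for line in regexsplit(raw_event, r"[\r\n]+")]

for row in rows:
    print(row["ip"], row["hostname"], row["timestamp"])
```

The point is only the ordering: split first, then run the field extraction on each piece independently.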

Example 2: Pulling key/value pairs from an INI file.

[057101]
LOG_LEVEL = 2

[105508]
LOG_LEVEL = 3

[992746]
REBOOT_FLAG = True
LOG_LEVEL = 2

If the event has predetermined stanza names or unique key (setting) names, then it is possible to use a regex to extract the values you are looking for. (For example, you can easily get the value for REBOOT_FLAG since it only occurs once, but getting the value for LOG_LEVEL is more difficult. If you are looking for a specific stanza name and named key, you could use a multiline regex, but that's a very limiting approach.) Without a way to split up this event, it's difficult, if not impossible, to extract the key/value pairs by stanza.

Let's say I wanted to look up the log level of id 105508. I would like to be able to do it like so:

| regexsplit "(\[\w+\])" | rex "^\[(?<stanza>\w+)\]" | extract kvdelim="=" | search stanza=105508 | fields LOG_LEVEL
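
To make the intent concrete, here is a minimal Python sketch of the same idea, assuming the split should happen just before each [stanza] header so the header stays with its settings. The names split_stanzas and parse_stanza are hypothetical, nothing here is a Splunk command, and splitting on a zero-width match needs Python 3.7 or later.

```python
import re

def split_stanzas(raw):
    """Hypothetical helper: break the text just before every [stanza] header."""
    # Zero-width split at line starts that are followed by a [name] header.
    return [chunk for chunk in re.split(r"(?m)^(?=\[\w+\])", raw) if chunk.strip()]

def parse_stanza(chunk):
    """Return (stanza_name, {key: value}) for one chunk."""
    stanza = re.match(r"\[(\w+)\]", chunk).group(1)
    pairs = re.findall(r"(?m)^(\w+)[ \t]*=[ \t]*(.*)$", chunk)
    return stanza, {key: value.strip() for key, value in pairs}

# The INI content from the example, indexed as a single event.
raw_event = (
    "[057101]\n"
    "LOG_LEVEL = 2\n"
    "\n"
    "[105508]\n"
    "LOG_LEVEL = 3\n"
    "\n"
    "[992746]\n"
    "REBOOT_FLAG = True\n"
    "LOG_LEVEL = 2\n"
)

settings = dict(parse_stanza(chunk) for chunk in split_stanzas(raw_event))

# Rough equivalent of: search stanza=105508 | fields LOG_LEVEL
print(settings["105508"]["LOG_LEVEL"])
```

Once each stanza is its own piece, LOG_LEVEL is no longer ambiguous, which is exactly what a single multiline regex over the whole event cannot give me in general.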

Does anyone know if there's an existing way to handle these kinds of search requirements in Splunk? Or am I looking at a custom search script?

If there isn't anything like this in Splunk, I would like to make the case that there should be. From my perspective, Splunk provides some really good ways to combine related events (such as transaction and stats), but there seem to be fewer options for breaking events into smaller pieces. I suspect this is mostly because, in the normal case, the right solution is to change the event-breaking logic. However, I think there are other situations where it would be really helpful to be able to re-split (or re-break) events at search time.

Lowell · 04-29-2010

Well, this works for the CSV example, even if it is a bit ugly:

source=/path/to/file.csv | rex mode=sed "s/\r?\n/--BREAKER--/g" | eval raw_lines=split(_raw, "--BREAKER--") | mvexpand raw_lines | rex field=raw_lines "(?<csv_1>[^,]+),(?<csv_2>[^,]+)" | fields + csv_1, csv_2
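
For anyone trying to follow the stages, here is the same pipeline mirrored in plain Python as a sketch. It is not Splunk code: expand_lines and the two-row sample are made up for illustration, and the marker step is there on the assumption that split() in eval takes a literal delimiter rather than a regex.

```python
import re

MARKER = "--BREAKER--"

def expand_lines(raw):
    """Illustrative stand-in for the sed + split + mvexpand stages."""
    # rex mode=sed "s/\r?\n/--BREAKER--/g": normalize every line break to one literal marker.
    normalized = re.sub(r"\r?\n", MARKER, raw)
    # split(_raw, "--BREAKER--") then mvexpand: one value per line.
    # Empty values are dropped here for simplicity.
    return [value for value in normalized.split(MARKER) if value]

# Two rows of the CSV example, with mixed line endings on purpose.
raw_event = (
    "65.54.81.126,col.stb.s-msn.com,0.0156030654907,1272415236.01\r\n"
    "192.221.110.126,ads1.msn.com,0.0161349773407,1272415235.84\n"
)

# Same two-column extraction as the final rex, in Python's named-group syntax.
column_pattern = re.compile(r"(?P<csv_1>[^,]+),(?P<csv_2>[^,]+)")
results = [column_pattern.search(value).groupdict() for value in expand_lines(raw_event)]

for result in results:
    print(result["csv_1"], result["csv_2"])
```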

gkanapathy · 04-29-2010

Well, you could almost do it with | multikv, but not with commas.