Is there a way to split the text of an event into multiple events (preferably using a regular expression) at search time? I guess I'm looking for a hybrid between multikv and rex.
Normally you want to get all the event-breaking logic set up in props.conf, but sometimes there is no perfect event-breaking logic that works all the time for a given sourcetype. Other times you just want to index the whole file at once, as with full events coming from an [fschange:] input or with a config_file sourcetype. In these cases it would be helpful to have a feature that allows individual events to be broken apart so that the smaller parts can be analyzed.
Here are two examples that I've run into where it would be helpful to break apart events:
Say you have a CSV file indexed as a single event, like so:
65.54.81.126,col.stb.s-msn.com,0.0156030654907,1272415236.01
192.221.110.126,ads1.msn.com,0.0161349773407,1272415235.84
74.125.93.102,toolbar.google.com,0.0170810222626,1272415240.26
74.125.93.113,www.google-analytics.com,0.0155410766602,1272415240.86
74.125.101.36,safebrowsing-cache.google.com,0.000797033309937,1272415413.17
I would like to be able to use a search something like this to pull out individual values (I made up the name regexsplit):
| regexsplit "[\r\n]+" | rex "(?<ip>[^,]+),(?<hostname>[^,]+),[^,]+,(?<timestamp>\d+)"
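To make the intended behavior concrete, here is a minimal Python sketch of what the made-up regexsplit followed by rex would do to the CSV event. It is plain Python, not a Splunk command or API; the helper names regexsplit and split_csv_event are hypothetical, and the sample assumes the original event has one record per line.

```python
import re

# Hypothetical helper mirroring the made-up `regexsplit` command:
# break one event's raw text into pieces on a delimiter regex.
def regexsplit(raw, delimiter=r"[\r\n]+"):
    return [piece for piece in re.split(delimiter, raw) if piece.strip()]

# Same pattern as the rex call above, written with Python's (?P<name>...) syntax.
ROW = re.compile(r"(?P<ip>[^,]+),(?P<hostname>[^,]+),[^,]+,(?P<timestamp>\d+)")

def split_csv_event(raw):
    """Return one dict of extracted fields per line of the event."""
    rows = []
    for line in regexsplit(raw):
        match = ROW.match(line)
        if match:
            rows.append(match.groupdict())
    return rows

# Assumed layout: one CSV record per line inside a single event.
sample = (
    "65.54.81.126,col.stb.s-msn.com,0.0156030654907,1272415236.01\n"
    "192.221.110.126,ads1.msn.com,0.0161349773407,1272415235.84\n"
)
rows = split_csv_event(sample)
```

Note that, as in the rex pattern, \d+ captures only the integer part of the timestamp.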
As a second example, say you have an event made up of several stanzas, like so:
[057101]
LOG_LEVEL = 2
[105508]
LOG_LEVEL = 3
[992746]
REBOOT_FLAG = True
LOG_LEVEL = 2
If the event has predetermined stanza names or unique key (setting) names, then it is possible to use a regex to extract the values you are looking for. (For example, you can easily get the value for REBOOT_FLAG since it only occurs once, but getting the value for LOG_LEVEL is more difficult because it appears in every stanza. If you are looking for a specific stanza name and named key, you could use a multiline regex, but that's a very limiting approach.) Without a way to split up this event, it's difficult, if not impossible, to extract the key/value pairs by stanza.
Let's say I wanted to look up the log level of id 105508. I would like to be able to do it like so:
| regexsplit "(\[\w+\])" | rex "^\[(?<stanza>\w+)\]" | extract kvdelim="=" | search stanza=105508 | fields LOG_LEVEL
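The same idea for the stanza example, again as a rough Python sketch rather than anything Splunk provides: split the event on stanza headers, extract key/value pairs per stanza, then look up one stanza. The function name split_stanzas is hypothetical, and the sample assumes one setting per line with values that contain no whitespace.

```python
import re

# Matches a stanza header such as "[105508]" and captures its name.
STANZA = re.compile(r"\[(\w+)\]")
# Matches "KEY = value" pairs; values are assumed to contain no whitespace.
PAIR = re.compile(r"(\w+)\s*=\s*(\S+)")

def split_stanzas(raw):
    """Return {stanza_name: {key: value}} for one multi-stanza event."""
    # re.split with a capture group yields
    # [preamble, name1, body1, name2, body2, ...]
    parts = STANZA.split(raw)
    stanzas = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        stanzas[name] = dict(PAIR.findall(body))
    return stanzas

# Assumed layout of the event: one header or setting per line.
sample = """[057101]
LOG_LEVEL = 2
[105508]
LOG_LEVEL = 3
[992746]
REBOOT_FLAG = True
LOG_LEVEL = 2
"""
stanzas = split_stanzas(sample)
log_level = stanzas["105508"]["LOG_LEVEL"]
```

Keeping the stanza name alongside its own key/value pairs is what makes the LOG_LEVEL lookup unambiguous, which is the part a single regex over the whole event struggles with.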
Does anyone know if there's an existing way to handle these kinds of search requirements in Splunk? Or am I looking at a custom search script?
If there isn't anything like this in Splunk, I would like to make the case that there should be. From my perspective, Splunk provides some really good ways to combine related events (such as transaction and stats), but there seem to be fewer options for breaking events into smaller pieces. I suspect this is mostly because, in the normal case, the right solution is to change the event-breaking logic. However, I think there are other situations where it would be really helpful to be able to re-split (or re-break) events at search time.
Well, this works for the CSV example, even if it is a bit ugly:
source=/path/to/file.csv | rex mode=sed "s/\r?\n/--BREAKER--/g" | eval raw_lines=split(_raw, "--BREAKER--") | mvexpand raw_lines | rex field=raw_lines "(?<csv_1>[^,]+),(?<csv_2>[^,]+)" | fields + csv_1, csv_2
Well, you could almost do it with | multikv, but not with commas.