Hopefully this is just a stupid regex error:
I'm using SplunkLightForwarder on AIX to send a few .sh_history logs to an indexer on Windows. Unfortunately, ksh uses nulls as delimiters between commands, and it sometimes throws in an extra null for no apparent reason. This makes the Splunk events look something like this:
Event 1
cd /etc
\x00\x00ls
Event 2
mkdir test
\x00cd test
In other words, multiple events are incorrectly merged, and nulls are sprinkled throughout the logs. I spent a good deal of time trying to solve this (line merge/break settings, transforms, etc.). I ended up with the following in props.conf on my indexer:
[sourcetype]
LINE_BREAKER=(\\x00+)
This works beautifully, except that when I exit the shell after testing it, what shows up in Splunk?
eit
I can't figure out how in the world my regex is matching the x in exit. I later changed it to
LINE_BREAKER=((?:\\x00)+)
but it still eats the first 'x' in every event (axbxcx becomes abxcx). I've verified that there are no nulls adjacent to the x in the source.
Thanks in advance for your help!
Example data, zipped: http://www.mediafire.com/file/wwckoeo36v8p0v6/ksh-history-example.zip
$ tr "\000" "@" < ksh-history-example
mkdir -p test1/test2/test3
@cd test1
@ls
@cd test2
@ls
@cd test3
@ls
@cd ..
@@cd ..
@@ls
@cd ..
@@pwd
@@
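For anyone who wants to check the expected event boundaries outside Splunk, here is a minimal Python sketch (not Splunk configuration) that splits history bytes on runs of nulls. The sample bytes are a shortened transcription of the tr output above, and the function name split_history is made up for illustration.

```python
import re

def split_history(raw: bytes) -> list:
    """Split ksh history bytes into commands on runs of NUL bytes."""
    parts = re.split(rb"\x00+", raw)
    # Drop the trailing newline on each command and skip empty fragments.
    return [p.decode("ascii", errors="replace").strip() for p in parts if p.strip()]

# Shortened sample: single and doubled nulls between commands, as in the real file.
sample = (
    b"mkdir -p test1/test2/test3\n\x00cd test1\n\x00ls\n"
    b"\x00cd ..\n\x00\x00cd ..\n\x00\x00pwd\n\x00\x00\n"
)

commands = split_history(sample)
print(commands)
```

Each command should come out as its own entry regardless of whether one or two nulls preceded it, which is the behaviour the props.conf settings are trying to reproduce.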
You can strip them using SEDCMD, instead of using LINE_BREAKER to break on the nulls:
[mysourcetype]
NO_BINARY_CHECK = true
SEDCMD-stripnull = s/\\x00//g
EDIT:
There's only limited room for comments. You can use this SEDCMD to replace with linebreaks:
[mysourcetype]
NO_BINARY_CHECK = true
SEDCMD-stripnull = s/\\x00/\n/g
Awesome, glad you were able to get it to work! Next time you need to use SEDCMD, keep in mind that you can use multiple sed's with a single SEDCMD. For instance:
[nulls]
NO_BINARY_CHECK = true
SEDCMD-stripnull = s/\\x00/\n/g s/\n{2,}/\n/g s/^[\n]*$//g
In addition to replacing nulls with \n's, this should strip any lines that contain all \n's, as well as convert any multiple \n's into singles. (Posted to illustrate 3 sed's in 1)
Cheers
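To see what the three chained expressions do in order, here is a rough Python sketch. It uses Python's re module as a stand-in for Splunk's regex engine, which is an assumption: the two may differ in details such as how ^ and $ are anchored. The function name apply_sed_chain is hypothetical.

```python
import re

def apply_sed_chain(text: str) -> str:
    """Apply the three substitutions one after another, left to right."""
    text = re.sub(r"\x00", "\n", text)    # s/\\x00/\n/g : each null becomes a newline
    text = re.sub(r"\n{2,}", "\n", text)  # s/\n{2,}/\n/g : collapse runs of newlines
    text = re.sub(r"^[\n]*$", "", text)   # s/^[\n]*$//g : blank out newline-only text
    return text

print(repr(apply_sed_chain("ls\n\x00\x00cd ..\n")))
print(repr(apply_sed_chain("\x00\x00")))
```

The order matters: the second expression only has runs of newlines to collapse because the first one has already turned the nulls into newlines.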
Posting here due to the limited comment space:
Thanks for all your help ron! Replacing with \n is ALMOST perfect. If that's all that's in the stanza, the events are still not split. All of the following setups DO split the events, but there are newlines at the start of some events, which throws a wrench into trying to match those events up later on. Any ideas on how to get rid of the newlines?
[mysourcetype]
NO_BINARY_CHECK=true
SEDCMD-stripnull=s/\\x00/\n/g
SHOULD_LINEMERGE=false
[mysourcetype]
NO_BINARY_CHECK=true
SEDCMD-stripnull=s/\\x00/\n/g
SHOULD_LINEMERGE=true
LINE_BREAKER=([\n]+)
BREAK_ONLY_BEFORE_DATE=false
[mysourcetype]
NO_BINARY_CHECK=true
SEDCMD-stripnewline=s/[\r\n]+//g
SEDCMD-stripnull=s/\\x00/\n/g
SHOULD_LINEMERGE=true
LINE_BREAKER=([\n]+)
BREAK_ONLY_BEFORE_DATE=false
Side note: After quite a bit of testing, I can say for certain that changing SEDCMD (and possibly other settings) in props.conf on the indexer shows up immediately in btool output, but it is not applied to forwarded input until Splunk is restarted! Frustrating.
Just had to change the sed to replace multiple nulls with one \n: SEDCMD-stripnull = s/(?:\x00)+/\n/g. Thanks again ron!
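The difference between replacing each null and replacing each run of nulls can be seen in a small Python sketch. Again, Python's re module stands in for Splunk's engine here (an assumption), and the input string is invented to mimic the doubled nulls in the sample data.

```python
import re

# Invented sample: a doubled null after the first command, a single null after the second.
raw = "cd ..\n\x00\x00ls\n\x00exit\n"

# One newline per null: a doubled null leaves an extra newline behind.
per_null = re.sub(r"\x00", "\n", raw)

# One newline per run of nulls, as in s/(?:\x00)+/\n/g.
collapsed = re.sub(r"(?:\x00)+", "\n", raw)

print(repr(per_null))   # 'cd ..\n\n\nls\n\nexit\n'
print(repr(collapsed))  # 'cd ..\n\nls\n\nexit\n'
```

In both cases the x in exit is left alone; the only difference is how many newlines a run of nulls leaves behind.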
(Posted reply as an answer.)
It appears that | extract reload=true is insufficient: the SEDCMD works as expected after restarting Splunk. I'll see if I can use it to set up a custom linebreak as you suggested.
Whether they're merged seems to depend on the timing: when lines are added to the source rapidly (1/sec or so) they're merged, otherwise each entry is a separate event.
I now have:
[sourcetype]
NO_BINARY_CHECK = true
SEDCMD-stripnull = s/\\x00/ZzZ/g
I've verified with btool that this is being applied to my sourcetype, but after creating new events, there are no ZzZ strings showing up... Am I misunderstanding something?
Interesting. I'm using a Mac, and stripping the nulls allows it to break the lines properly. At least with SEDCMD, you could substitute a newline or custom linebreak for the nulls.
Thanks for the help! Your solution both removes the nulls and doesn't touch 'x's, but multiple commands are now merged into one event, even when adding SHOULD_LINEMERGE=false...
I appended some nulls and 'exit' and some more nulls to your sample data. The SEDCMD seems to do the job.