Splunk Search

How do I remove \x00 characters from my log message?

Jason
Motivator

I have a log message which (thanks, M$) has been littered with \x00 text - originally null bytes. They appear every other character, making it almost impossible to read. Can Splunk automatically remove these for me?

1 Solution

Jason
Motivator

Yes, Splunk can. You can use SEDCMD- to rewrite the events to remove the \x00s, which by the time the data hits an indexer are already the text "\x00" - they're no longer the null byte.

On the search bar:

| rex mode=sed "s/\\\\x00//g"

Automatically at parsing ("indexing") time for any new data, in props.conf:

[yoursourcetype]
SEDCMD-remove_nulls = s/\\x00//g
LINE_BREAKER = ((?:[\r\n](?:\\x00)?)+)

The custom LINE_BREAKER was added because Splunk was treating the null bytes between \r and \n (the two halves of the Windows newline, in the file I was working on) as additional lines and adding them to the event. The pattern reads "(one newline character, optionally followed by the literal text \x00) one or more times". Whatever it matches is used as the breaker between events and thrown away.
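
For anyone who wants to try the two patterns outside Splunk first, here is a rough Python sketch. It makes some assumptions: Python's re module stands in for Splunk's regex engine, the sample string is made up, and the nulls are already the four-character text \x00 as described above. It only illustrates what the regexes match, not how Splunk's pipeline works internally.

```python
import re

# Hypothetical sample: "One\r\nTwo" from a UTF-16LE file, after the null
# bytes have become the literal four-character text \x00.
raw = r"O\x00n\x00e\x00" + "\r" + r"\x00" + "\n" + r"\x00" + r"T\x00w\x00o\x00"

# Same pattern as the LINE_BREAKER setting above.
line_breaker = re.compile(r"((?:[\r\n](?:\\x00)?)+)")

# Same pattern as the SEDCMD above: the literal text \x00.
strip_nulls = re.compile(r"\\x00")

def split_and_clean(text):
    # re.split returns event text and breaker matches alternately;
    # [::2] keeps the event text and discards the breakers.
    pieces = line_breaker.split(text)[::2]
    # Remove every literal \x00 from each non-empty event.
    return [strip_nulls.sub("", piece) for piece in pieces if piece]

events = split_and_clean(raw)
print(events)
```

Without the optional \x00 in the breaker, the stray \x00 between \r and \n would be left behind as its own piece, which is the extra-line effect described above.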

clintla
Contributor

What is the /g part? What if I just wanted to delete the characters and/or just swap them w/ nothing?

xxx-xxx-xxxx is now xxxxxxxxxx

Jason
Motivator

It is best to not post additional questions in the answer section. Post them as a question so they get proper visibility.

/g means globally - it will replace every instance of the subject that it finds, not just the first one.

s/-//g would swap all dashes with nothing, but if you did s/-// with no g, you would end up with xxxxxx-xxxx.
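
The same difference can be seen in a small Python sketch, using re.sub as a stand-in for sed (an assumption for illustration; the count argument plays the role of leaving off the g flag):

```python
import re

phone = "xxx-xxx-xxxx"

# Equivalent of s/-//g: every dash is replaced with nothing.
all_removed = re.sub("-", "", phone)

# Equivalent of s/-// with no g: only the first dash is replaced.
first_removed = re.sub("-", "", phone, count=1)

print(all_removed)    # xxxxxxxxxx
print(first_removed)  # xxxxxx-xxxx
```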

Jason
Motivator

I tried the UTF-16LE charset as mentioned here but it did not work. Now that I think about it, though, I might have put the config on the indexer, not the universal forwarder. Oops. The config in the accepted solution still works when put on the indexer.

JSapienza
Contributor

Have a look at SEDCMD - Admin Manual - Props.conf

Adding this to your props.conf should work:

SEDCMD-StripNULL= s/\x00//g

jonuwz
Influencer

This sounds like a character encoding problem to me.

If the log is encoded as UTF-16, only contains ASCII characters, and is being read as UTF-8, then there'll be an extra \x00 after each character.

Find out what character encoding the messages use, then set the charset in Splunk.
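
A short Python sketch of that effect, assuming a made-up ASCII-only message encoded as UTF-16LE (the variant mentioned elsewhere in this thread):

```python
# Hypothetical log text containing only ASCII characters.
text = "Hello"

# Written as UTF-16LE, each ASCII character becomes two bytes:
# the character itself followed by a null byte.
data = text.encode("utf-16-le")

# Reading those bytes as UTF-8 succeeds but keeps the null bytes.
misread = data.decode("utf-8")

# Reading them with the right charset gives back the original text.
correct = data.decode("utf-16-le")

print(repr(misread))
print(repr(correct))
```

If the reader is told the real encoding up front, there is nothing left to strip afterwards.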

saccam447
Explorer

this helped me out. thank you.
