
How do I remove \x00 characters from my log message?

Jason
Motivator

I have a log message which (thanks, M$) has been littered with \x00 text - originally null bytes. They appear every other character, making it almost impossible to read. Can Splunk automatically remove these for me?

Jason
Motivator

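Yes, Splunk can. You can use sed-style substitution (rex mode=sed at search time, or SEDCMD- in props.conf at index time) to rewrite the events and remove the \x00s. Note that by the time the data hits an indexer, these are already the literal text "\x00" (backslash, x, 0, 0); they're no longer the null byte.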

On the search bar:

| rex mode=sed "s/\\\\x00//g"
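To see what that substitution does, here is a minimal Python sketch that uses `re.sub` as a stand-in for Splunk's sed mode (an assumption: Python's regex syntax behaves like PCRE for this simple pattern). The regex `\\x00` is an escaped backslash followed by `x00`, so it matches the four-character text `\x00`, not a null byte. The search-bar version doubles the backslashes once more, presumably because SPL processes escapes in the quoted string before the regex sees it. The `strip_null_text` helper and sample event are hypothetical.

```python
import re

# Stand-in for s/\\x00//g: "\\" is one literal backslash, followed by "x00".
# Python's re.sub replaces every match by default, like sed's /g flag.
LITERAL_NULL_TEXT = re.compile(r"\\x00")


def strip_null_text(event: str) -> str:
    """Hypothetical helper: remove every literal '\\x00' (backslash, x, 0, 0)."""
    return LITERAL_NULL_TEXT.sub("", event)


# Hypothetical event: each character followed by the text "\x00".
event = "E\\x00r\\x00r\\x00o\\x00r\\x00"
print(strip_null_text(event))  # Error
```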

Automatically at parsing ("indexing") time for any new data, in props.conf:

[yoursourcetype]
SEDCMD-remove_nulls = s/\\x00//g
LINE_BREAKER = ((?:[\r\n](?:\\x00)?)+)

The special LINE_BREAKER was needed because Splunk was interpreting the null bytes between \r and \n (the two halves of the Windows newline, in the file I was working on) as additional lines and adding them to the event. The pattern means "one newline character (\r or \n), optionally followed by the text \x00, repeated one or more times"; whatever it matches is the breaker between events and is thrown away.
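The difference between the two breakers can be checked with a rough Python sketch. It assumes Python's `re` behaves like Splunk's PCRE for these patterns and models event breaking simply as "split wherever the pattern matches and discard the captured text"; it does not model line merging, which is presumably what appended the stray lines to events. `([\r\n]+)` is Splunk's default LINE_BREAKER; the sample data and the `to_escaped_utf16` and `break_events` helpers are hypothetical.

```python
import re

# Custom breaker from the answer, plus Splunk's default LINE_BREAKER for comparison.
CUSTOM_BREAKER = re.compile(r"((?:[\r\n](?:\\x00)?)+)")
DEFAULT_BREAKER = re.compile(r"([\r\n]+)")


def to_escaped_utf16(text: str) -> str:
    """Hypothetical helper: mimic UTF-16LE text with each NUL shown as '\\x00'."""
    return "".join(ch + "\\x00" for ch in text)


def break_events(raw: str, breaker: re.Pattern) -> list[str]:
    """Simplified model: split raw text, discarding what group 1 matched."""
    events, start = [], 0
    for m in breaker.finditer(raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return [e for e in events if e]  # drop empty leftovers


raw = to_escaped_utf16("line1\r\nline2")
print(len(break_events(raw, DEFAULT_BREAKER)))  # 3: the \x00 between \r and \n becomes its own line
print(len(break_events(raw, CUSTOM_BREAKER)))   # 2
```

Running the SEDCMD substitution over the two events from the custom breaker then leaves `line1` and `line2`.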

clintla
Contributor

What is the /g part? What if I just wanted to delete the characters, i.e. swap them with nothing?

For example, turning xxx-xxx-xxxx into xxxxxxxxxx.

Jason
Motivator

It's best not to post additional questions in the answer section. Post them as a new question so they get proper visibility.

/g means "global": it replaces every instance of the pattern that it finds, not just the first one.

s/-//g would replace every dash with nothing. But if you used s/-// without the g, only the first dash would be removed and you would end up with xxxxxx-xxxx.
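The same behavior can be reproduced in Python, used here only as an illustration: `re.sub` with `count=0` replaces every match (like /g), while `count=1` replaces only the first. The `sed_delete` helper is hypothetical.

```python
import re


def sed_delete(pattern: str, text: str, global_flag: bool) -> str:
    """Hypothetical helper mimicking s/pattern// with or without the g flag."""
    # count=0 replaces every match (like /g); count=1 replaces only the first.
    return re.sub(pattern, "", text, count=0 if global_flag else 1)


print(sed_delete("-", "xxx-xxx-xxxx", global_flag=True))   # xxxxxxxxxx
print(sed_delete("-", "xxx-xxx-xxxx", global_flag=False))  # xxxxxx-xxxx
```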

Jason
Motivator

I tried setting the charset to UTF-16LE as suggested here, but it did not work. Now that I think about it, I might have put that config on the indexer rather than the universal forwarder. Oops. The SEDCMD config in my answer still works when put on the indexer.

JSapienza
Contributor

Have a look at the SEDCMD setting in the props.conf section of the Admin Manual.

Adding this to your props.conf should work:

SEDCMD-StripNULL= s/\x00//g
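Whether this variant or the `\\x00` one from the accepted answer applies depends on what the pipeline actually sees. In PCRE, and in Python's `re` used here as a stand-in (an assumption), `\x00` matches a real null character, while `\\x00` matches the four-character text. The sample strings are hypothetical.

```python
import re

NUL_BYTE = re.compile(r"\x00")   # matches an actual null character
NUL_TEXT = re.compile(r"\\x00")  # matches the literal text \x00

raw_bytes_event = "a\x00b\x00c\x00"   # hypothetical: real NUL characters
escaped_event = "a\\x00b\\x00c\\x00"  # hypothetical: NULs already rendered as text

print(NUL_BYTE.sub("", raw_bytes_event))                 # abc
print(NUL_BYTE.sub("", escaped_event) == escaped_event)  # True: nothing matched
print(NUL_TEXT.sub("", escaped_event))                   # abc
```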

jonuwz
Influencer

This sounds like a character encoding problem to me.

If the log is encoded as UTF-16, contains only characters from the ASCII range, and is being read as UTF-8, then there'll be an extra \x00 byte after every character (before it, for big-endian UTF-16).

Find out what character encoding the messages use, then set the matching charset in Splunk (the CHARSET setting in props.conf).
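A quick Python check of this explanation, with a hypothetical sample string: encode ASCII text as UTF-16LE, decode it as UTF-8, and a null byte appears after every character; decoding with the correct charset makes them disappear.

```python
# Hypothetical sample: the text "Error" written as UTF-16LE.
data = "Error".encode("utf-16-le")

# Misread as UTF-8: every ASCII character is followed by a NUL.
misread = data.decode("utf-8")
print(repr(misread))  # 'E\x00r\x00r\x00o\x00r\x00'

# Read with the right charset: the NULs are gone.
print(data.decode("utf-16-le"))  # Error
```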

saccam447
Explorer

This helped me out. Thank you.
