How do I remove \x00 characters from my log message?

Jason
Motivator

I have a log message which (thanks, M$) has been littered with \x00 text - originally null bytes. They appear every other character, making it almost impossible to read. Can Splunk automatically remove these for me?

1 Solution

Jason
Motivator

Yes, Splunk can. You can use a SEDCMD- setting to rewrite the events and remove the \x00s. By the time the data hits an indexer they are already the literal text "\x00" - they're no longer the null byte.

In the search bar, at search time:

| rex mode=sed "s/\\\\x00//g"

Automatically at parsing ("indexing") time for any new data, in props.conf:

[yoursourcetype]
SEDCMD-remove_nulls = s/\\x00//g
LINE_BREAKER = ((?:[\r\n](?:\\x00)?)+)

The special LINE_BREAKER was added because Splunk was interpreting the null bytes between \r and \n (the two halves of the Windows newline, in the file I was working on) as additional lines and adding them to the event. It says: use "(one \r or \n character, optionally followed by the literal text \x00) one or more times" as the breaker between events. Whatever the breaker matches is thrown away.
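If you want to see what those two patterns do outside Splunk, here is a minimal Python sketch. It assumes Python's re module behaves closely enough to Splunk's regex engine for these two simple patterns. The sample stream and the helper names split_events and strip_escaped_nulls are made up for illustration; this is not how Splunk implements parsing.

```python
import re

# Same pattern as the SEDCMD: a literal backslash followed by "x00".
ESCAPED_NULL = re.compile(r"\\x00")

# Same pattern as the LINE_BREAKER: one or more of
# (\r or \n, optionally followed by the literal text \x00).
LINE_BREAKER = re.compile(r"((?:[\r\n](?:\\x00)?)+)")


def split_events(stream):
    """Split on the breaker and drop the matched breaker text."""
    parts = LINE_BREAKER.split(stream)
    # With one capture group, re.split alternates text and separator,
    # so the even positions hold the event text.
    return [p for p in parts[::2] if p]


def strip_escaped_nulls(event):
    """Remove every literal \\x00 from one event (the /g behaviour)."""
    return ESCAPED_NULL.sub("", event)


# Hypothetical sample: two lines whose null bytes were already turned
# into the text \x00, with Windows \r\n line endings.
stream = "a\\x00b\\x00\r\\x00\n\\x00c\\x00d\\x00\r\\x00\n\\x00"

events = [strip_escaped_nulls(e) for e in split_events(stream)]
print(events)  # ['ab', 'cd']
```

In this sketch, a plain ([\r\n]+) breaker would stop at the \r and leave the \x00 that sits between \r and \n behind as stray text, which matches the extra-lines problem described above.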


clintla
Contributor

What is the /g part? What if I just wanted to delete the characters and/or just swap them w/ nothing?

xxx-xxx-xxxx is now xxxxxxxxxx


Jason
Motivator

It is best to not post additional questions in the answer section. Post them as a question so they get proper visibility.

/g means globally - it will replace every instance of the subject that it finds, not just the first one.

s/-//g would swap every dash with nothing. But if you did s/-// with no g, you would end up with xxxxxx-xxxx.
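The same difference can be tried out in Python as a rough stand-in for sed (an assumption for illustration only; Splunk itself is not involved here). re.sub replaces every match by default, like /g, and count=1 limits it to the first match, like leaving the g off.

```python
import re

phone = "xxx-xxx-xxxx"

# Like s/-//g: replace every dash (the default is all matches).
all_removed = re.sub("-", "", phone)

# Like s/-//: replace only the first dash.
first_removed = re.sub("-", "", phone, count=1)

print(all_removed)    # xxxxxxxxxx
print(first_removed)  # xxxxxx-xxxx
```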


Jason
Motivator

I tried the UTF-16LE as mentioned here but it did not work. But now that I think about it, I might have put the config on the indexer, not the universal forwarder. Oops. The SEDCMD config in my answer still works when put on the indexer.

JSapienza
Contributor

Have a look at SEDCMD - Admin Manual - Props.conf

Adding this to your props.conf should work:

SEDCMD-StripNULL= s/\x00//g

jonuwz
Influencer

This sounds like a character encoding problem to me.

If the log is encoded as UTF-16, contains only ASCII characters, and is being read as UTF-8, then there'll be an extra \x00 between each character.

Find out what character encoding the messages use, then set the charset in Splunk (CHARSET in props.conf) to match.
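A quick way to convince yourself of this outside Splunk is a few lines of Python. This is only a sketch of the encoding mismatch, assuming the file is little-endian UTF-16 without a BOM; the sample text is made up.

```python
text = "Hello"

# Bytes as a Windows tool might write them: UTF-16, little-endian, no BOM.
raw = text.encode("utf-16-le")

# Reading those bytes as UTF-8 "works", but every other character is NUL.
misread = raw.decode("utf-8")

# Reading them with the right charset gives the original text back.
correct = raw.decode("utf-16-le")

print(repr(misread))  # 'H\x00e\x00l\x00l\x00o\x00'
print(correct)        # Hello
```

If the misread output looks like your events, the data is most likely UTF-16 and fixing the charset at input time should be cleaner than stripping the nulls afterwards.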

saccam447
Explorer

this helped me out. thank you.
