Splunk Search

Is it possible to convert the hex data into ascii without affecting the ascii data?

xiaoming
New Member

Hi all, 

I am trying to convert data extracted as a field that contains a combination of hex and ASCII data. Is it possible to convert the hex data into ASCII without affecting the existing ASCII data?

 

Thanks in advance 


ITWhisperer
SplunkTrust

It depends on whether the hex data is delimited in some way. Can you share some anonymised examples (preferably in code block </> format)?


xiaoming
New Member

Sample log:

The attachment name comes in as hex or ASCII. Is it possible to split the AttachmentDetails field into an ASCII field and a hex field?

Log1: 

sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'report notes.pdf': {'BodyScanner': {}}}

Log2:

sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}

 


bowesmana
SplunkTrust

Here is a stab at converting what appear to be UCS-2 big-endian Unicode CJK characters for 1, 2, 3 and 4: U+4E00, U+4E8C, U+4E09, U+56DB.
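As a quick cross-check of that reading outside Splunk, here is a minimal Python sketch. It assumes each pair of \x escapes is one 16-bit big-endian code unit, which is a guess from this single sample and not something the log format confirms.

```python
# The eight bytes that the \x escapes in Log2 represent, in order.
raw = bytes([0x4E, 0x00, 0x4E, 0x8C, 0x4E, 0x09, 0x56, 0xDB])

# Assumption: 16-bit big-endian code units (high byte first).
decoded = raw.decode("utf-16-be")

# List the resulting code points so they can be compared with the ones named above.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(code_points)  # ['U+4E00', 'U+4E8C', 'U+4E09', 'U+56DB']
```

If the bytes were little-endian instead, the same data would decode to different characters, so it is worth checking a few more samples before relying on this.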

You can run this example.

| makeresults 
| eval text="sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"
| rex field=text max_match=0 "\\\\x(?<c1>[0-9a-f]{2})\\\\x(?<c2>[0-9a-f]{2})"
| rex field=text max_match=0 "(?<unicode_hex>\\\\x[0-9a-f]{2}\\\\x[0-9a-f]{2})"
| eval c=mvzip(c1, c2, "")
| eval unicode_char=mvmap(c, printf("%c", tonumber(c, 16)))
| eval unicode_hex=mvmap(unicode_hex, replace(unicode_hex, "\\\\", "\\\\\\\\"))
| foreach 0 1 2 3 4 5 6 7 8 9 10 [ eval text_<<FIELD>>=replace(text, mvindex(unicode_hex, <<FIELD>>), mvindex(unicode_char, <<FIELD>>)),
                                        text=if(isnull(text_<<FIELD>>), text, text_<<FIELD>>) 
                                   | fields - text_<<FIELD>> ]
| fields - c c1 c2 unicode_*

This parses the \xx\yy pairs into 16-bit values (c) and then converts each one to its character representation (printf).

It then builds a replacement map from the original \xx\yy pairs. The \ characters have to be doubled because replace treats its second argument as a regular expression.

Then it processes up to 11 characters (the foreach loop), replacing each \xx\yy sequence with the appropriate character.

This is a real hack, but functional. You can extend the foreach numbers to allow for as many characters as you need.

It converts the sample to:

sender=test@test.com recipient=user@user.com subject='report 2023\r\n this is a\r\n test' AttachmentDetails={'一二三四.pdf': {'BodyScanner': {}}}
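To sanity-check the expected result outside Splunk, the same pairing logic can be sketched in Python. This is only an illustration, not part of the search: the function name decode_hex_pairs is made up here, and it assumes every escape comes in \xHH\xHH pairs forming one 16-bit big-endian value, as in the sample.

```python
import re

# Two consecutive literal "\xHH" escapes, treated as one 16-bit big-endian value.
PAIR = re.compile(r"\\x([0-9a-fA-F]{2})\\x([0-9a-fA-F]{2})")

def decode_hex_pairs(text: str) -> str:
    """Replace each literal \\xHH\\xHH pair with the character it encodes.

    Anything that is not part of such a pair (plain ASCII) is left untouched.
    """
    return PAIR.sub(lambda m: chr(int(m.group(1) + m.group(2), 16)), text)

# Raw string, so the backslashes stay literal as they do in the log event.
sample = r"AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"
result = decode_hex_pairs(sample)
print(result)
```

On the Log2 sample this gives the same 一二三四.pdf file name as the search, and a plain ASCII name such as the one in Log1 passes through unchanged. Characters outside the Basic Multilingual Plane (surrogate pairs) and a stray single \xHH are not handled by this sketch.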
