Splunk Search

Is it possible to convert the hex data into ASCII without affecting the ASCII data?

xiaoming
New Member

Hi all, 

I am attempting to convert an extracted field that contains a combination of hex and ASCII data. I was wondering whether it is possible to convert the hex data into ASCII without affecting the ASCII data.

 

Thanks in advance 


ITWhisperer
SplunkTrust

It depends on whether the hex data is delimited in some way. Can you share some anonymised examples (preferably in code block </> format)?


xiaoming
New Member

Sample logs:

The attachment names come in both hex and ASCII. I'm wondering if it is possible to split the AttachmentDetails field into an ASCII field and a hex field.

Log1: 

sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'report notes.pdf': {'BodyScanner': {}}}

Log2:

sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}
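As a rough illustration of the requested split, here is a Python sketch that runs outside Splunk. It assumes the attachment name is the first single-quoted key after `AttachmentDetails={` and that hex always appears as `\xHH` escapes. The subject is trimmed from the samples for brevity, and `split_attachment_name` is a hypothetical helper, not a Splunk function.

```python
import re

# Sample events modelled on Log1 and Log2 (subject trimmed); raw strings keep the literal backslashes.
LOG1 = r"sender=test@test.com AttachmentDetails={'report notes.pdf': {'BodyScanner': {}}}"
LOG2 = r"sender=test@test.com AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"

# Assumption: the attachment name is the first single-quoted key after "AttachmentDetails={".
NAME_RE = re.compile(r"AttachmentDetails=\{'(?P<name>[^']*)'")
# A run of hex escapes: each one is a literal backslash, "x", and two hex digits.
HEX_RUN_RE = re.compile(r"(?:\\x[0-9a-fA-F]{2})+")


def split_attachment_name(event: str) -> tuple[str, str]:
    """Return (hex_part, ascii_part) of the attachment name in one event."""
    match = NAME_RE.search(event)
    if match is None:
        return "", ""
    name = match.group("name")
    hex_part = "".join(HEX_RUN_RE.findall(name))  # every hex escape, in order
    ascii_part = HEX_RUN_RE.sub("", name)         # whatever is left over
    return hex_part, ascii_part


for event in (LOG1, LOG2):
    print(split_attachment_name(event))
```

Splitting only tells you where the hex is. Turning it back into readable characters is what the answer below does.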

 


bowesmana
SplunkTrust

Here is a stab at converting what appears to be UCS-2 big-endian Unicode: the CJK characters for 1, 2, 3, 4 (U+4E00, U+4E8C, U+4E09, U+56DB).

You can run this example.

| makeresults 
| eval text="sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"
| rex field=text max_match=0 "\\\\x(?<c1>[0-9a-fA-F]{2})\\\\x(?<c2>[0-9a-fA-F]{2})"
| rex field=text max_match=0 "(?<unicode_hex>\\\\x[0-9a-fA-F]{2}\\\\x[0-9a-fA-F]{2})"
| eval c=mvzip(c1, c2, "")
| eval unicode_char=mvmap(c, printf("%c", tonumber(c, 16)))
| eval unicode_hex=mvmap(unicode_hex, replace(unicode_hex, "\\\\", "\\\\\\\\"))
| foreach 0 1 2 3 4 5 6 7 8 9 10 [ eval text_<<FIELD>>=replace(text, mvindex(unicode_hex, <<FIELD>>), mvindex(unicode_char, <<FIELD>>)),
                                        text=if(isnull(text_<<FIELD>>), text, text_<<FIELD>>) 
                                   | fields - text_<<FIELD>> ]
| fields - c c1 c2 unicode_*

This parses each \xXX\xYY pair into a 16-bit code point (c, built by joining the two captured bytes c1 and c2) and then converts each code point to its character with printf("%c", ...).
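The same pair-to-character step, sketched in Python outside Splunk for comparison. The regex mirrors the two `rex` captures, list concatenation stands in for `mvzip`, and `chr(int(..., 16))` plays the role of `tonumber(c, 16)` plus `printf("%c", ...)`. The variable names are illustrative only.

```python
import re

attachment = r"\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf"  # literal backslashes, as in the event

# Like the c1/c2 rex: capture the high and low byte of each \xHH\xLL pair.
PAIR_RE = re.compile(r"\\x([0-9a-fA-F]{2})\\x([0-9a-fA-F]{2})")

# Like mvzip(c1, c2, ""): join the two bytes into one 4-digit hex string.
code_hex = [hi + lo for hi, lo in PAIR_RE.findall(attachment)]

# Like tonumber(c, 16) then printf("%c", ...): code point -> character.
chars = [chr(int(h, 16)) for h in code_hex]

print(code_hex)  # ['4e00', '4e8c', '4e09', '56db']
print(chars)
```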

It then builds a list of the original \xXX\xYY sequences to replace. Each \ has to be doubled, because replace() takes a regular expression and would otherwise read \x4e as a hex escape rather than as the literal text.
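Why the doubling matters can be shown with Python's `re` module, which, like the PCRE engine Splunk uses, treats `\x4e` in a pattern as a hex escape for the character `N`. This is a sketch outside Splunk. `re.escape` would do the same job as the manual doubling shown here.

```python
import re

text = r"name=\x4e\x00.pdf"  # literal backslashes, as in the event
seq = r"\x4e\x00"            # the sequence we want to replace

# Used as a pattern unchanged, "\x4e" means "N" and "\x00" means a NUL byte,
# so the literal backslash-x text is never matched.
print(re.search(seq, text))  # None

# Doubling each backslash makes it a literal backslash in the pattern,
# which is what replace(unicode_hex, "\\\\", "\\\\\\\\") achieves in the SPL.
doubled = seq.replace("\\", "\\\\")
print(re.sub(doubled, "\u4e00", text))  # name=一.pdf
```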

Then the foreach loop replaces each \xXX\xYY sequence with the matching character, handling up to 11 characters (indices 0 to 10).
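The foreach loop exists because the SPL approach needs one replace per sequence. Outside Splunk, a regex substitution with a callback does the whole job in a single pass with no upper limit. Here is a Python sketch of that alternative technique (it is not what the SPL does). `decode_pairs` is a hypothetical helper, and like the SPL it assumes the escapes always come in pairs forming UCS-2 big-endian code units.

```python
import re

# One pattern for a whole \xHH\xLL pair (assumed UCS-2 big-endian).
PAIR_RE = re.compile(r"\\x([0-9a-fA-F]{2})\\x([0-9a-fA-F]{2})")


def decode_pairs(text: str) -> str:
    """Replace every backslash-x byte pair with its character and leave all other text alone."""
    return PAIR_RE.sub(lambda m: chr(int(m.group(1) + m.group(2), 16)), text)


event = r"AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"
print(decode_pairs(event))
# AttachmentDetails={'一二三四.pdf': {'BodyScanner': {}}}
```

ASCII text such as `report notes.pdf` passes through unchanged. A stray single `\xHH` escape that has no partner is also left as-is.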

This is a real hack, but it works: you can extend the foreach list (11, 12, and so on) to handle as many characters as you need.

It converts the text to:

sender=test@test.com recipient=user@user.com subject='report 2023\r\n this is a\r\n test' AttachmentDetails={'一二三四.pdf': {'BodyScanner': {}}}
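To double-check the UCS-2 big-endian reading outside Splunk, here is a small Python sketch. Python is not part of the original answer. The byte string is just the eight `\x` escapes from Log2 concatenated.

```python
# The eight bytes from the \x escapes in Log2, in order.
raw = bytes.fromhex("4e004e8c4e0956db")

# Read them as big-endian 16-bit code units. UTF-16BE equals UCS-2 BE
# for characters in the Basic Multilingual Plane, which covers these four.
text = raw.decode("utf-16-be")

print(text)  # 一二三四
print([f"U+{ord(ch):04X}" for ch in text])  # ['U+4E00', 'U+4E8C', 'U+4E09', 'U+56DB']
```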
