Splunk Search

Is it possible to convert the hex data into ascii without affecting the ascii data?

xiaoming
New Member

Hi all, 

I am trying to convert a field that contains a combination of hex and ASCII data. Is it possible to convert the hex data to ASCII without affecting the data that is already ASCII?

 

Thanks in advance 


ITWhisperer
SplunkTrust

It depends if the hex data is delimited in some way. Can you share some anonymised examples (preferably in code block </> format)?


xiaoming
New Member

Sample log:

The attachment comes in hex and ASCII. Is it possible to split the AttachmentDetails field into an ASCII field and a hex field?

Log1: 

sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'report notes.pdf': {'BodyScanner': {}}}

Log2:

sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}

 


bowesmana
SplunkTrust

Here is a stab at converting what appear to be UCS-2 big-endian escapes. The four \xNN\xNN pairs in your sample decode to U+4E00, U+4E8C, U+4E09 and U+56DB, the CJK characters for 1, 2, 3 and 4.

You can run this example as is:

| makeresults 
| eval text="sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"
| rex field=text max_match=0 "\\\\x(?<c1>[0-9a-f]{2})\\\\x(?<c2>[0-9a-f]{2})"
| rex field=text max_match=0 "(?<unicode_hex>\\\\x[0-9a-f]{2}\\\\x[0-9a-f]{2})"
| eval c=mvzip(c1, c2, "")
| eval unicode_char=mvmap(c, printf("%c", tonumber(c, 16)))
| eval unicode_hex=mvmap(unicode_hex, replace(unicode_hex, "\\\\", "\\\\\\\\"))
| foreach 0 1 2 3 4 5 6 7 8 9 10 [ eval text_<<FIELD>>=replace(text, mvindex(unicode_hex, <<FIELD>>), mvindex(unicode_char, <<FIELD>>)),
                                        text=if(isnull(text_<<FIELD>>), text, text_<<FIELD>>) 
                                   | fields - text_<<FIELD>> ]
| fields - c c1 c2 unicode_*

This parses the \xNN\xNN pairs into 16-bit values (c) and then converts each one to its character with printf.

It then builds a replacement map from the original \xNN\xNN sequences (unicode_hex). The \ characters have to be doubled because replace treats its second argument as a regular expression.

Finally, the foreach loop processes up to 11 characters (indexes 0 to 10), replacing each \xNN\xNN sequence with the matching character.

This is a real hack, but functional. You can extend the foreach list to cover as many characters as you need. Note that the rex patterns only match lowercase hex digits, as in your sample.

It converts the sample to

sender=test@test.com recipient=user@user.com subject='report 2023\r\n this is a\r\n test' AttachmentDetails={'一二三四.pdf': {'BodyScanner': {}}}
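
If you want to cross-check the expected result outside Splunk, the same pairing logic can be sketched in a few lines of Python. This is only an illustration, not part of the SPL above: the function name decode_ucs2_be_escapes is made up for this example, and it assumes every escaped sequence is a \xNN\xNN pair forming one big-endian 16-bit character (no surrogate pairs, no single-byte escapes).

```python
import re

# Two consecutive literal \xNN escapes: backslash, 'x', two hex digits, twice.
PAIR = re.compile(r"\\x([0-9a-fA-F]{2})\\x([0-9a-fA-F]{2})")


def decode_ucs2_be_escapes(text: str) -> str:
    """Replace each \\xHH\\xLL pair with the character U+HHLL.

    Hypothetical helper for illustration; text without such pairs is
    returned unchanged.
    """
    def to_char(match: re.Match) -> str:
        # High byte first (big endian), then low byte.
        code_point = int(match.group(1) + match.group(2), 16)
        return chr(code_point)

    return PAIR.sub(to_char, text)


# Raw string so the backslashes stay literal, as they are in the log event.
sample = r"AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"
print(decode_ucs2_be_escapes(sample))
```

Plain ASCII names such as 'report notes.pdf' contain no \xNN\xNN pairs, so they pass through untouched, which is the behaviour the SPL version is aiming for as well.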
