Hi all,
I am attempting to convert data extracted as a field that contains a combination of hex and ASCII data. Is it possible to convert the hex data into ASCII without affecting the existing ASCII data?
Thanks in advance
It depends on whether the hex data is delimited in some way. Can you share some anonymised examples (preferably in code block </> format)?
Sample log:
The attachment name comes in as either hex or ASCII. I am wondering whether it is possible to split the AttachmentDetails field into an ASCII field and a hex field.
Log1:
sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'report notes.pdf': {'BodyScanner': {}}}
Log2:
sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}
Here is a stab at converting what appear to be big-endian UCS-2 Unicode CJK characters for the numerals 1, 2, 3 and 4: U+4E00, U+4E8C, U+4E09, U+56DB.
You can run this example.
| makeresults
| eval text="sender=test@test.com recipient=user@user.com subject='report 2023\\r\\n this is a\\r\\n test' AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"
| rex field=text max_match=0 "\\\\x(?<c1>[0-9a-f]{2})\\\\x(?<c2>[0-9a-f]{2})"
| rex field=text max_match=0 "(?<unicode_hex>\\\\x[0-9a-f]{2}\\\\x[0-9a-f]{2})"
| eval c=mvzip(c1, c2, "")
| eval unicode_char=mvmap(c, printf("%c", tonumber(c, 16)))
| eval unicode_hex=mvmap(unicode_hex, replace(unicode_hex, "\\\\", "\\\\\\\\"))
| foreach 0 1 2 3 4 5 6 7 8 9 10 [ eval text_<<FIELD>>=replace(text, mvindex(unicode_hex, <<FIELD>>), mvindex(unicode_char, <<FIELD>>)),
text=if(isnull(text_<<FIELD>>), text, text_<<FIELD>>)
| fields - text_<<FIELD>> ]
| fields - c c1 c2 unicode_*
This parses the \xx\yy pairs into 16-bit characters (c) and then produces their converted representation with printf.
It then builds a replacement map of the original \xx\yy pairs (it has to double each \ character to make the replace work, because replace treats its second argument as a regular expression).
Then it processes up to 11 characters (the foreach loop over 0 to 10), replacing each \xx\yy sequence with the appropriate character.
This is a real hack, but it is functional. You can extend the list of foreach numbers to allow for as many characters as you need.
It converts the sample event to:
sender=test@test.com recipient=user@user.com subject='report 2023\r\n this is a\r\n test' AttachmentDetails={'一二三四.pdf': {'BodyScanner': {}}}
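If you want to sanity-check the conversion outside Splunk, here is a rough sketch of the same idea in Python. It is not part of the search above, just an illustration under a few assumptions: the escapes always arrive as literal \xHH text, they come in big-endian pairs, and every character is in the Basic Multilingual Plane (no surrogate pairs). The names PAIR and decode_hex_pairs are made up for this example. Unlike the rex patterns above, this pattern also accepts uppercase hex digits.

```python
import re

# Two consecutive literal "\xHH" escapes, read as one big-endian 16-bit unit.
PAIR = re.compile(r"\\x([0-9a-fA-F]{2})\\x([0-9a-fA-F]{2})")


def decode_hex_pairs(text: str) -> str:
    # Replace each escaped pair with the character it encodes.
    # Everything else, including plain ASCII, is left untouched.
    def to_char(match: re.Match) -> str:
        code_point = int(match.group(1) + match.group(2), 16)
        # Assumption: surrogate halves are left as-is instead of decoded,
        # since this sketch does not handle characters outside the BMP.
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return PAIR.sub(to_char, text)


# Raw string, so the backslashes stay literal, as they do in the log.
raw = r"AttachmentDetails={'\x4e\x00\x4e\x8c\x4e\x09\x56\xdb.pdf': {'BodyScanner': {}}}"
decoded = decode_hex_pairs(raw)
```

Here decoded holds the same AttachmentDetails value as the converted event above. Because the substitution runs over every match, there is no fixed limit like the foreach list, and an event with a plain ASCII file name (Log1) passes through unchanged. A leftover single \xHH escape that has no partner is also left alone.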