Getting Data In

Extract and Transform Custom Event

jagadeeshm
Contributor

I have events with the following format -

[Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN] 2017-04-24 12:00:02 : T:0.047 secs S:XXXX-SSSS-SSSSSS A:Availability M:INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 CMD: [<?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5"><Type>4</Type><Id>460573</Id><Radius>0</Radius><Latitude>0.0</Latitude><Longitude>0.0</Longitude><CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut><Rooms>1</Rooms><Adults>2</Adults><Children>0</Children><Language>en-us</Language><Currency>000</Currency></AvailabilityRequestV2>]

Notice a couple of things: the event starts with a thread name in square brackets, followed by the date/time, followed by an unwanted ":", followed by several key/value pairs in which the key and value are separated by a ":", and finally the XML content inside square brackets.

I want to extract/transform this into the following, so that it is easy to search and to run spath against the XML field -

Thread = Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN
DateTime = 2017-04-24 12:00:02
ResponseTime = 0.047 secs 
S = XXXX-SSSS-SSSSSS 
A = Availability 
M = INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 
Action = CMD
XML = <?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5"><Type>4</Type><Id>460573</Id><Radius>0</Radius><Latitude>0.0</Latitude><Longitude>0.0</Longitude><CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut><Rooms>1</Rooms><Adults>2</Adults><Children>0</Children><Language>en-us</Language><Currency>000</Currency></AvailabilityRequestV2>

Notice that the last key, CMD, becomes the value of a new key "Action", and its bracketed content goes into XML. I was able to do some cleanup, such as removing the square brackets and the extra ":", with the following SEDCMD settings -

SEDCMD-remove-xml-header = s/\<\?xml[^\>]*\>//g
SEDCMD-remove-square-brackets = s/\[|\]//g
SEDCMD-remove-colon = s/ : / /g
SEDCMD-rename-threadname = s/Thread-/Thread:/1

How do I achieve the rest? Is there a key/value transformer for this?

0 Karma
1 Solution

beatus
Communicator

Here's a regex that should work for you:
\[(?<Thread>.*?)\]\s(?<TimeStamp>(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})?)\s:\s(?:T:(?<ResponseTime>\S+\s+\S+))?(.)*(?<Action>(\w{3})):\s\[(?<XML>.*?)\]

This should handle events with or without the "T:" response time, and it only creates the ResponseTime field when that value is present. Hope this helps!
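
For anyone who wants to try the pattern outside Splunk first, here is a rough Python sketch. It assumes a few things that are not in the thread: Python's `re` module needs named groups written as `(?P<name>...)`, the XML payload in `SAMPLE` is abbreviated from the event in the question, and `PAIR_RE` and `extract` are hypothetical helpers added here to pick up the S, A and M pairs, which the regex above does not capture.

```python
import re
import xml.etree.ElementTree as ET

# The pattern from the answer, with (?<name>...) rewritten as Python's (?P<name>...).
EVENT_RE = re.compile(
    r"\[(?P<Thread>.*?)\]\s"
    r"(?P<TimeStamp>(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})?)\s:\s"
    r"(?:T:(?P<ResponseTime>\S+\s+\S+))?"
    r"(.)*(?P<Action>(\w{3})):\s"
    r"\[(?P<XML>.*?)\]"
)

# Hypothetical helper pattern for the single-letter S:/A:/M: pairs.
PAIR_RE = re.compile(r"\s([SAM]):(\S+)")

# Sample event from the question; the XML body is shortened for readability.
SAMPLE = (
    "[Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-"
    "NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN] 2017-04-24 12:00:02 : "
    "T:0.047 secs S:XXXX-SSSS-SSSSSS A:Availability "
    "M:INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 "
    'CMD: [<?xml version="1.0" ?>'
    '<AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249">'
    "<CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut>"
    "</AvailabilityRequestV2>]"
)


def extract(event):
    """Return a dict of extracted fields, or None if the event does not match."""
    match = EVENT_RE.search(event)
    if match is None:
        return None
    names = ("Thread", "TimeStamp", "ResponseTime", "Action", "XML")
    fields = {name: match.group(name) for name in names}
    # Only scan the text before the XML payload for the S/A/M pairs.
    fields.update(PAIR_RE.findall(event[:match.start("XML")]))
    return fields


fields = extract(SAMPLE)
root = ET.fromstring(fields["XML"])
ns = {"g": "http://xml.google.com"}  # default namespace declared in the payload

print(fields["TimeStamp"], fields["ResponseTime"], fields["Action"])
print(fields["S"], fields["A"], fields["M"])
print(root.get("siteid"), root.findtext("g:CheckIn", namespaces=ns))
```

Note that the pattern anchors on the square brackets around the thread name and the XML, so it has to run against the original event text. If the bracket-stripping SEDCMD from the question has already been applied at index time, the pattern will no longer match.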

0 Karma

jagadeeshm
Contributor

Thanks beatus 🙂

0 Karma

jagadeeshm
Contributor

Any splunk transformers 🙂 around?

0 Karma