Getting Data In

Extract and Transform Custom Event

jagadeeshm
Contributor

I have events with the following format -

[Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN] 2017-04-24 12:00:02 : T:0.047 secs S:XXXX-SSSS-SSSSSS A:Availability M:INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 CMD: [<?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5"><Type>4</Type><Id>460573</Id><Radius>0</Radius><Latitude>0.0</Latitude><Longitude>0.0</Longitude><CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut><Rooms>1</Rooms><Adults>2</Adults><Children>0</Children><Language>en-us</Language><Currency>000</Currency></AvailabilityRequestV2>]

Notice a couple of things: the event starts with a thread name in square brackets, followed by the date/time, then an unwanted ":", then several space-separated key/value pairs where each key is separated from its value by a ":", and finally XML content inside square brackets.

I want to extract/transform this into the following so it is easy to search and run spath to extract fields from xml -

Thread = Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN
DateTime = 2017-04-24 12:00:02
ResponseTime = 0.047 secs 
S = XXXX-SSSS-SSSSSS 
A = Availability 
M = INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 
Action = CMD
XML = <?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5"><Type>4</Type><Id>460573</Id><Radius>0</Radius><Latitude>0.0</Latitude><Longitude>0.0</Longitude><CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut><Rooms>1</Rooms><Adults>2</Adults><Children>0</Children><Language>en-us</Language><Currency>000</Currency></AvailabilityRequestV2>

Notice that the last key, CMD, becomes the value of a new key "Action", and its bracketed content goes into a new key "XML". I was able to do some cleanup, such as removing the square brackets and the extra ":", with the following SEDCMD settings -

SEDCMD-remove-xml-header = s/\<\?xml[^\>]*\>//g
SEDCMD-remove-square-brackets = s/\[|\]//g
SEDCMD-remove-colon = s/ : / /g
SEDCMD-rename-threadname = s/Thread-/Thread:/1
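
To see what these substitutions do to an event before relying on them, here is a rough Python emulation using re.sub. It is only a sketch under a few assumptions: the event is shortened, the function name apply_sedcmds is made up for illustration, the rules are applied in the order they are listed, and Python's regex engine stands in for the one Splunk uses (these simple patterns should behave the same way in both).

```python
import re

# Shortened, hypothetical event in the same shape as the one above.
EVENT = (
    '[Thread-2505_GOOGLE,PCLN] 2017-04-24 12:00:02 : T:0.047 secs '
    'S:XXXX-SSSS-SSSSSS A:Availability M:INT_1 '
    'CMD: [<?xml version="1.0" ?><Type>4</Type>]'
)


def apply_sedcmds(raw: str) -> str:
    """Emulate the four substitutions, in the order they are listed."""
    raw = re.sub(r'<\?xml[^>]*>', '', raw)             # drop the XML declaration
    raw = re.sub(r'\[|\]', '', raw)                    # drop every square bracket
    raw = re.sub(r' : ', ' ', raw)                     # drop the stray " : "
    raw = re.sub(r'Thread-', 'Thread:', raw, count=1)  # first match only
    return raw


print(apply_sedcmds(EVENT))
```

One thing this makes visible: once the brackets are stripped, a later pattern that anchors on "[" or "]" would no longer match the cleaned event.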

How do I achieve the rest? Is there a key/value transformer for this?

beatus
Communicator

Here's a regex that should work for you:
\[(?<Thread>.*?)\]\s(?<TimeStamp>(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})?)\s:\s(?:T:(?<ResponseTime>\S+\s+\S+))?(.)*(?<Action>(\w{3})):\s\[(?<XML>.*?)\]

This should handle the "T:" value either existing in the log or not, and only create the "ResponseTime" field when it is there. Hope this helps!
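
If you want to sanity-check the pattern outside Splunk first, here is a minimal Python sketch. Assumptions: the pattern is adapted to Python's (?P<name>...) named-group syntax, the sample event is shortened, and extract_fields is just an illustrative helper name, not anything Splunk provides. It only shows which fields the regex would capture on an event that still has its square brackets.

```python
import re
from typing import Dict, Optional

# Same pattern as above, rewritten with Python's (?P<name>...) group syntax.
PATTERN = re.compile(
    r'\[(?P<Thread>.*?)\]\s'
    r'(?P<TimeStamp>(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})?)\s:\s'
    r'(?:T:(?P<ResponseTime>\S+\s+\S+))?'
    r'(.)*(?P<Action>(\w{3})):\s\[(?P<XML>.*?)\]'
)

# Shortened, hypothetical event with the original brackets intact.
EVENT = (
    '[Thread-2505_GOOGLE,PCLN] 2017-04-24 12:00:02 : T:0.047 secs '
    'S:XXXX-SSSS-SSSSSS A:Availability M:INT_1 '
    'CMD: [<?xml version="1.0" ?><Type>4</Type>]'
)


def extract_fields(raw: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named captures, or None when the event does not match."""
    match = PATTERN.search(raw)
    return match.groupdict() if match else None


fields = extract_fields(EVENT)
if fields is not None:
    for name, value in fields.items():
        print(f'{name} = {value}')
```

When an event has no "T:" part, the ResponseTime entry comes back as None here, which corresponds to the field simply not being created.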

jagadeeshm
Contributor

Thanks beatus 🙂

jagadeeshm
Contributor

Any Splunk transformers around? 🙂
