Getting Data In

Extract and Transform Custom Event

jagadeeshm
Contributor

I have events with the following format -

[Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN] 2017-04-24 12:00:02 : T:0.047 secs S:XXXX-SSSS-SSSSSS A:Availability M:INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 CMD: [<?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5"><Type>4</Type><Id>460573</Id><Radius>0</Radius><Latitude>0.0</Latitude><Longitude>0.0</Longitude><CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut><Rooms>1</Rooms><Adults>2</Adults><Children>0</Children><Language>en-us</Language><Currency>000</Currency></AvailabilityRequestV2>]

Notice couple of things - event starts with a thread name in square brackets, followed by date/time, followed by an unwanted ":", followed by several key/value pairs separated by a ":", and finally at the end you have content in XML inside square brackets.

I want to extract/transform this into the following so it is easy to search and run spath to extract fields from xml -

 Thread = Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN
DateTime = 2017-04-24 12:00:02
ResponseTime = 0.047 secs 
S = XXXX-SSSS-SSSSSS 
A = Availability 
M = INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 
Action = CMD
XML = <?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5"><Type>4</Type><Id>460573</Id><Radius>0</Radius><Latitude>0.0</Latitude><Longitude>0.0</Longitude><CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut><Rooms>1</Rooms><Adults>2</Adults><Children>0</Children><Language>en-us</Language><Currency>000</Currency></AvailabilityRequestV2>

Notice that the last key CMD is put into a new key "Action" and value is put inside XML. I was able to do some clean up like removing square brackets, removing extra ":" etc with the following SED Command -

SEDCMD-remove-xml-header = s/\<\?xml[^\>]*\>//g
SEDCMD-remove-square-brackets = s/\[|\]//g
SEDCMD-remove-colon = s/ : / /g
SECCMD-reame-threadname = s/Thread-/Thread:/1

How do I achieve the rest? Is a key value transformer?

0 Karma
1 Solution

beatus
Communicator

Here's a regex that should work for you:
\[(?<Thread>.*?)\]\s(?<TimeStamp>(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})?)\s:\s(?:T:(?< ResponseTime>\S+\s+\S+))?(.)*(?<Action>(\w{3})):\s\[(?<XML>.*?)\]

This should handle the timeouts either existing in the log or not and only create the "timeout" field when they are there. Hope this helps!

View solution in original post

0 Karma

beatus
Communicator

Here's a regex that should work for you:
\[(?<Thread>.*?)\]\s(?<TimeStamp>(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})?)\s:\s(?:T:(?< ResponseTime>\S+\s+\S+))?(.)*(?<Action>(\w{3})):\s\[(?<XML>.*?)\]

This should handle the timeouts either existing in the log or not and only create the "timeout" field when they are there. Hope this helps!

0 Karma

jagadeeshm
Contributor

Thanks beatus 🙂

0 Karma

jagadeeshm
Contributor

Any splunk transformers 🙂 around?

0 Karma
Get Updates on the Splunk Community!

Updated Team Landing Page in Splunk Observability

We’re making some changes to the team landing page in Splunk Observability, based on your feedback. The ...

New! Splunk Observability Search Enhancements for Splunk APM Services/Traces and ...

Regardless of where you are in Splunk Observability, you can search for relevant APM targets including service ...

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...