Getting Data In

Extract and Transform Custom Event

jagadeeshm
Contributor

I have events with the following format -

[Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN] 2017-04-24 12:00:02 : T:0.047 secs S:XXXX-SSSS-SSSSSS A:Availability M:INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 CMD: [<?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5"><Type>4</Type><Id>460573</Id><Radius>0</Radius><Latitude>0.0</Latitude><Longitude>0.0</Longitude><CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut><Rooms>1</Rooms><Adults>2</Adults><Children>0</Children><Language>en-us</Language><Currency>000</Currency></AvailabilityRequestV2>]

Notice a couple of things: the event starts with a thread name in square brackets, followed by the date/time, followed by an unwanted ":", followed by several key/value pairs whose key and value are separated by a ":", and finally the XML content inside square brackets at the end.

I want to extract/transform this into the following so it is easy to search and run spath to extract fields from xml -

Thread = Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN
DateTime = 2017-04-24 12:00:02
ResponseTime = 0.047 secs 
S = XXXX-SSSS-SSSSSS 
A = Availability 
M = INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 
Action = CMD
XML = <?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5"><Type>4</Type><Id>460573</Id><Radius>0</Radius><Latitude>0.0</Latitude><Longitude>0.0</Longitude><CheckIn>2017-10-01</CheckIn><CheckOut>2017-10-17</CheckOut><Rooms>1</Rooms><Adults>2</Adults><Children>0</Children><Language>en-us</Language><Currency>000</Currency></AvailabilityRequestV2>

Notice that the last key, CMD, becomes the value of a new key "Action", and its value goes into "XML". I was able to do some cleanup, such as removing the square brackets and the extra ":", with the following SEDCMD settings -

SEDCMD-remove-xml-header = s/\<\?xml[^\>]*\>//g
SEDCMD-remove-square-brackets = s/\[|\]//g
SEDCMD-remove-colon = s/ : / /g
SEDCMD-rename-threadname = s/Thread-/Thread:/1
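To see what the four substitutions above do to an event, here is a rough Python emulation. It is only a sketch: the function name `clean_event` and the shortened sample event are made up for illustration, and the order in which Splunk actually applies the SEDCMD classes is not modelled here.

```python
import re

def clean_event(raw: str) -> str:
    """Apply Python equivalents of the four sed-style substitutions."""
    cleaned = re.sub(r"<\?xml[^>]*>", "", raw)   # drop the XML declaration
    cleaned = re.sub(r"\[|\]", "", cleaned)      # drop every square bracket
    cleaned = cleaned.replace(" : ", " ")        # drop the stray " : " separator
    # Replace only the first "Thread-" (the trailing /1 in the sed expression).
    cleaned = re.sub(r"Thread-", "Thread:", cleaned, count=1)
    return cleaned

# Shortened, hypothetical version of the event shown earlier.
raw_event = (
    "[Thread-2505_X,PCLN] 2017-04-24 12:00:02 : T:0.047 secs "
    'A:Availability CMD: [<?xml version="1.0" ?><Type>4</Type>]'
)

print(clean_event(raw_event))
```

One thing this makes visible: once the square brackets are stripped, nothing marks where the thread name ends or where the XML begins, so any extraction that relies on those brackets would have to run against the original event.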

How do I achieve the rest? Is there a key/value transform for this?

0 Karma
1 Solution

beatus
Communicator

Here's a regex that should work for you:
\[(?<Thread>.*?)\]\s(?<TimeStamp>(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})?)\s:\s(?:T:(?<ResponseTime>\S+\s+\S+))?(.)*(?<Action>(\w{3})):\s\[(?<XML>.*?)\]

This should work whether or not the "T:" (response time) value is present in the event, and it only creates the ResponseTime field when the value is there. Hope this helps!
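If you want to try the pattern outside Splunk first, here is a small Python sketch. A few assumptions: Python's `re` module needs `(?P<name>...)` instead of `(?<name>...)`, the unnamed capture groups are dropped, the helper name `extract_fields` is made up, and the XML body of the sample event is shortened. It runs against the raw event, with the square brackets still in place.

```python
import re

# Python translation of the regex above; only the named groups are kept.
EVENT_PATTERN = re.compile(
    r"\[(?P<Thread>.*?)\]\s"
    r"(?P<TimeStamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:(?:\d{2})?)\s:\s"
    r"(?:T:(?P<ResponseTime>\S+\s+\S+))?"
    r".*(?P<Action>\w{3}):\s\[(?P<XML>.*?)\]"
)

def extract_fields(event: str) -> dict:
    """Return the named groups that matched, or {} if the event does not match."""
    match = EVENT_PATTERN.search(event)
    if match is None:
        return {}
    # Optional groups that did not take part in the match are None; skip them.
    return {k: v for k, v in match.groupdict().items() if v is not None}

# Sample event from the question, with the XML body shortened.
SAMPLE_EVENT = (
    "[Thread-2505_GOOGLE_INT_20170424155901301f9e61-1493049600619-"
    "NSRLM_2_1_RTL_39088504_2_R_PCLN,PCLN] 2017-04-24 12:00:02 : "
    "T:0.047 secs S:XXXX-SSSS-SSSSSS A:Availability "
    "M:INT_20170424155901301f9e61-1493049600619-NSRLM_2_1_RTL_39088504_2_R_1 "
    'CMD: [<?xml version="1.0" ?><AvailabilityRequestV2 xmlns="http://xml.google.com" '
    'siteid="1470249" apikey="SFGSDGSDFSDFGSFG" async="false" waittime="5">'
    "<Type>4</Type><Id>460573</Id><CheckIn>2017-10-01</CheckIn>"
    "<CheckOut>2017-10-17</CheckOut></AvailabilityRequestV2>]"
)

for name, value in extract_fields(SAMPLE_EVENT).items():
    print(f"{name} = {value}")
```

Note that this pattern only captures Thread, TimeStamp, ResponseTime, Action and XML. The S, A and M pairs are skipped over by the `.*` in the middle, so they would need their own extraction.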

0 Karma

jagadeeshm
Contributor

Thanks beatus 🙂

0 Karma

jagadeeshm
Contributor

Any splunk transformers 🙂 around?

0 Karma