XML Field extraction from Syslog messages

rsreese
Explorer

I am receiving XML-formatted messages via Logstash, which are then forwarded to Splunk over syslog. xmlkv parses all of the fields at search time, but I need all of the fields to always be parsed. Currently the source host is 172.16.0.100 and the sourcetype is syslog. The source host should be 172.16.0.150 or LAB-EPO, and the sourcetype should be whatever allows all of the fields in the XML schema starting with to be parsed. What is the ideal method to perform field extraction, as well as to change the source host and set the sourcetype to something like EPOEvents?

I was thinking the sourcetype could be associated with something unique such as ePO or Logstash and then spath or similar would pull out the XML portion for field extraction.

Aug  5 01:24:25 172.16.0.100 Aug 05 04:23:16 172.16.0.150 LOGSTASH[-]: <29>1 2017-08-05T04:24:40.0Z LAB-EPO EPOEvents - EventFwd [agentInfo@3401 tenantId="1"] <?xml version="1.0" encoding="UTF-8"?><EPOevent><MachineInfo><MachineName>LAB-WIN7-01</MachineName><AgentGUID>{dac2633a-7863-11e7-20ed-000c29ad4867}</AgentGUID><IPAddress>192.168.115.101</IPAddress><OSName>Windows 7</OSName><UserName>SYSTEM</UserName><TimeZoneBias>240</TimeZoneBias><RawMACAddress>000c29ad4867</RawMACAddress></MachineInfo><SoftwareInfo ProductName="McAfee Endpoint Security" ProductVersion="10.5.0" ProductFamily="TVD"><CommonFields><Analyzer>ENDP_AM_1050</Analyzer><AnalyzerName>McAfee Endpoint Security</AnalyzerName><AnalyzerVersion>10.5.0</AnalyzerVersion><AnalyzerHostName>LAB-WIN7-01</AnalyzerHostName><AnalyzerEngineVersion>5900.7806</AnalyzerEngineVersion><AnalyzerDetectionMethod>On-Access Scan</AnalyzerDetectionMethod><AnalyzerDATVersion>3062.0</AnalyzerDATVersion></CommonFields><Event><EventID>1278</EventID><Severity>3</Severity><GMTTime>2017-08-05T04:22:38</GMTTime><CommonFields><ThreatCategory>av.detect</ThreatCategory><ThreatEventID>1278</ThreatEventID><ThreatSeverity>2</ThreatSeverity><ThreatName>EICAR test file</ThreatName><ThreatType>test</ThreatType><DetectedUTC>2017-08-05T04:22:38Z</DetectedUTC><ThreatActionTaken>IDS_ALERT_ACT_TAK_DEL</ThreatActionTaken><ThreatHandled>True</ThreatHandled><SourceHostName>LAB-WIN7-01</SourceHostName><SourceProcessName>C:\Program Files (x86)\Google\Chrome\application\chrome.exe</SourceProcessName><TargetHostName>LAB-WIN7-01</TargetHostName><TargetUserName>LAB-WIN7-01\xadmin</TargetUserName><TargetFileName>C:\USERS\XADMIN\DOWNLOADS\Unconfirmed 888545.crdownload</TargetFileName></CommonFields><CustomFields 
target="EPExtendedEventMT"><BladeName>IDS_BLADE_NAME_SPB</BladeName><AnalyzerContentCreationDate>2017-08-03T12:35:00Z</AnalyzerContentCreationDate><AnalyzerGTIQuery>False</AnalyzerGTIQuery><ThreatDetectedOnCreation>False</ThreatDetectedOnCreation><TargetName>Unconfirmed 888545.crdownload</TargetName><TargetPath>C:\USERS\XADMIN\DOWNLOADS</TargetPath><TargetHash>44d88612fea8a8f36de82e1278abb02f</TargetHash><TargetFileSize>68</TargetFileSize><TargetModifyTime>2017-08-05T04:22:38Z</TargetModifyTime><TargetAccessTime>2017-08-05T04:22:38Z</TargetAccessTime><TargetCreateTime>2017-08-05T04:22:38Z</TargetCreateTime><Cleanable>False</Cleanable><TaskName>IDS_OAS_TASK_NAME</TaskName><FirstAttemptedAction>IDS_ALERT_THACT_ATT_CLE</FirstAttemptedAction><FirstActionStatus>False</FirstActionStatus><SecondAttemptedAction>IDS_ALERT_THACT_ATT_DEL</SecondAttemptedAction><SecondActionStatus>True</SecondActionStatus><AttackVectorType>4</AttackVectorType><DurationBeforeDetection>0</DurationBeforeDetection><NaturalLangDescription>IDS_NATURAL_LANG_OAS_DETECTION_DEL|TargetName=Unconfirmed 888545.crdownload|TargetPath=C:\USERS\XADMIN\DOWNLOADS|ThreatName=EICAR test file|SourceProcessName=C:\Program Files (x86)\Google\Chrome\application\chrome.exe|ThreatType=test|TargetUserName=LAB-WIN7-01\xadmin</NaturalLangDescription><AccessRequested></AccessRequested><DetectionMessage>IDS_OAS_DEFAULT_THREAT_MESSAGE</DetectionMessage><AMCoreContentVersion>3062.0</AMCoreContentVersion></CustomFields></Event></SoftwareInfo></EPOevent>

cpetterborg
SplunkTrust

You can filter out (remove) the header portion of the events in props.conf with a SEDCMD setting for your sourcetype, if the data is consistently formatted like this. If it is sometimes more complicated, you can still do it with props.conf and transforms.conf. Because timestamp extraction happens before SEDCMD and TRANSFORMS are applied, your timestamp information will still be used at index time for the event timestamp. Once the header is removed, you can set KV_MODE = xml and your data should be read in as XML just fine. There are other ways to do this, but this seems to me to be about the easiest.
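To make the SEDCMD idea concrete, here is a small Python sketch that imitates it: it removes everything before the XML declaration and then checks that what is left parses as XML. The props.conf stanza in the comments, including the sourcetype name epo:events and the class name strip_header, is a hypothetical example, and I am assuming SEDCMD accepts the same lazy `.*?` regex that Python does; check both against the props.conf documentation before relying on them.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical props.conf stanza this sketch imitates (names are assumptions):
#   [epo:events]
#   SEDCMD-strip_header = s/^.*?<\?xml/<?xml/
#   KV_MODE = xml

# Same pattern as the SEDCMD above: drop everything before the XML declaration.
# DOTALL lets it cope with an event that was broken across lines.
HEADER_RE = re.compile(r"^.*?<\?xml", re.DOTALL)


def strip_header(raw_event: str) -> str:
    """Remove the syslog/Logstash header so only the XML document remains."""
    return HEADER_RE.sub("<?xml", raw_event, count=1)


# Shortened version of the event from the question.
sample = (
    'Aug  5 01:24:25 172.16.0.100 Aug 05 04:23:16 172.16.0.150 LOGSTASH[-]: '
    '<29>1 2017-08-05T04:24:40.0Z LAB-EPO EPOEvents - EventFwd '
    '[agentInfo@3401 tenantId="1"] '
    '<?xml version="1.0" encoding="UTF-8"?><EPOevent><MachineInfo>'
    '<MachineName>LAB-WIN7-01</MachineName></MachineInfo></EPOevent>'
)

body = strip_header(sample)
# Parse as bytes so the encoding="UTF-8" declaration is honoured.
root = ET.fromstring(body.encode("utf-8"))
print(root.find("MachineInfo/MachineName").text)  # LAB-WIN7-01
```

If an event has no XML declaration, the pattern does not match and the event is left unchanged, which is the same behaviour you would want from the SEDCMD.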

rsreese
Explorer

Can you please promote your recommendation to use SEDCMD at the indexer to get the XML event parsed into an answer, so I can accept it?

rsreese
Explorer

This is what I am looking for. Instead, I am going to write the output to disk and have a universal forwarder push it to an indexer. Would I apply your recommendations on the universal forwarder or on the indexer?

cpetterborg
SplunkTrust

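In this case the props.conf configuration would be on the indexers, since a universal forwarder does not parse events and SEDCMD has to run where parsing happens. KV_MODE is a search-time setting, so it also needs to be on the search head if that is a separate instance.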

rsreese
Explorer

Can you promote this and your SEDCMD suggestion to an answer so I can accept it? That worked for extracting the XML message.

woodcock
Esteemed Legend

I would send the XML to HTTP Event Collector instead, which can easily be told to automatically process the XML and create fields at index-time.

rsreese
Explorer

How would I go about extracting the XML portion from the SYSLOG/LOGSTASH header and then sending that to the HTTP Event Collector and telling it to parse as XML?

woodcock
Esteemed Legend

Do not send it to a syslog collector, send it to the Indexer tier (probably through a load-balancer) using the HEC feature. If it is coherent/proper XML, you should not need to do much. Or are you saying that the XML payload is encrusted in a bunch of other packaging?
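A rough Python sketch of what sending the stripped XML to HEC could look like, using the raw endpoint and setting host and sourcetype as query parameters. The server URL, port 8088 and token are placeholders, and the request is only built here, not sent. If indexer acknowledgment is enabled on the token, HEC also expects a channel header, which this sketch does not add. This only covers delivery; how the XML becomes fields is left to the sourcetype configuration.

```python
import urllib.parse
import urllib.request


def build_hec_raw_request(base_url: str, token: str, payload: str,
                          sourcetype: str, host: str) -> urllib.request.Request:
    """Build (but do not send) a POST to HEC's raw endpoint."""
    # host and sourcetype are passed as query parameters on the raw endpoint.
    query = urllib.parse.urlencode({"sourcetype": sourcetype, "host": host})
    url = f"{base_url.rstrip('/')}/services/collector/raw?{query}"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


# Placeholder server and token (hypothetical values).
req = build_hec_raw_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    '<?xml version="1.0" encoding="UTF-8"?><EPOevent></EPOevent>',
    sourcetype="EPOEvents",
    host="LAB-EPO",
)
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Setting host and sourcetype per request is one way to get LAB-EPO and EPOEvents onto the events without a host-override transform.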

cpetterborg
SplunkTrust

I have used the instructions found within this article to do syslog-to-splunk through HEC:

https://www.rfaircloth.com/building-high-performance-low-latency-rsyslog-splunk/

There are several advantages to doing syslog collection this way, not the least of which would be to use the HEC in this manner as woodcock has suggested. The instructions in the article do have a few holes, but are mostly complete.

rsreese
Explorer

Are you saying that I should forward from Logstash to rsyslog, and then push to Splunk using this feature? I am failing to see how this will help parse the XML portion of the message for field extraction. Assume the XML still needs to be separated from the syslog header, which from what I have read would require spath or something similar, as well as defining a new sourcetype.
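spath (and KV_MODE = xml) name XML fields by their element path. The Python sketch below approximates that naming on an excerpt of the event, after the header is gone, so you can see roughly which field names to expect. The {@attribute} notation for attributes reflects my understanding of spath's output and should be checked against real search results; this is an illustration, not Splunk's implementation.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Yield (path, value) pairs named roughly the way spath names XML fields."""
    path = f"{prefix}.{elem.tag}" if prefix else elem.tag
    for name, value in elem.attrib.items():
        yield f"{path}{{@{name}}}", value  # e.g. EPOevent.SoftwareInfo{@ProductName}
    children = list(elem)
    if children:
        for child in children:
            yield from flatten(child, path)
    elif elem.text is not None:
        yield path, elem.text  # leaf element with text content


def extract_fields(xml_text: str) -> dict:
    """Collect values per path; repeated paths become multivalue lists."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    fields = {}
    for path, value in flatten(root):
        fields.setdefault(path, []).append(value)
    return fields


# Excerpt of the event from the question, with the header already stripped.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?><EPOevent>'
    '<SoftwareInfo ProductName="McAfee Endpoint Security"><Event>'
    '<EventID>1278</EventID><CommonFields><ThreatName>EICAR test file</ThreatName>'
    '</CommonFields></Event></SoftwareInfo></EPOevent>'
)
for path, values in extract_fields(doc).items():
    print(path, values)
```

Empty elements such as AccessRequested produce no field in this sketch, since they have no text to report.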

woodcock
Esteemed Legend

HEC eats both JSON and XML by design.

rsreese
Explorer

The message is not pure XML. At a minimum, the message contains the following header before the XML tag:

<29>1 2017-08-05T03:49:40.0Z LAB-EPO EPOEvents - EventFwd [agentInfo@3401 tenantId="1"]
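The header above follows the RFC 5424 syslog layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG. This Python sketch (not a full RFC 5424 parser; it only handles a single structured-data element) pulls those pieces apart. It shows that LAB-EPO sits in the HOSTNAME slot and EPOEvents in the APP-NAME slot, which are the values wanted for host and sourcetype, and that PRI 29 decodes to facility 3 (daemon) and severity 5 (notice).

```python
import re

# RFC 5424 layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG.
# Simplification: SD is either "-" or one [...] element without a "]" inside it.
RFC5424_RE = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) (?P<sd>\[.*?\]|-) ?(?P<msg>.*)",
    re.DOTALL,
)


def parse_rfc5424(line: str) -> dict:
    """Return the header fields; search() skips any relay prefix before <PRI>."""
    match = RFC5424_RE.search(line)
    if match is None:
        raise ValueError("no RFC 5424 header found")
    fields = match.groupdict()
    # PRI = facility * 8 + severity
    fields["facility"], fields["severity"] = divmod(int(fields["pri"]), 8)
    return fields


header = (
    '<29>1 2017-08-05T03:49:40.0Z LAB-EPO EPOEvents - EventFwd '
    '[agentInfo@3401 tenantId="1"] <?xml version="1.0"?><EPOevent/>'
)
fields = parse_rfc5424(header)
print(fields["hostname"], fields["app"], fields["facility"], fields["severity"])
# LAB-EPO EPOEvents 3 5
```

Because the pattern is searched rather than anchored, it also finds the header inside the full line from the question, after the "Aug  5 ... LOGSTASH[-]:" prefix added by the relay.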

rsreese
Explorer

The original message is sent from a system running McAfee ePO. ePO will only send events over syslog via SSL, and the only way I have been able to successfully terminate (receive) those events is by having Logstash act as the receiver. Logstash then forwards those messages to Splunk over syslog.

I was thinking the sourcetype could be associated with something unique such as ePO or Logstash and then spath or similar would pull out the XML portion for field extraction. I have tried a few different things but am quite new to Splunk.

FlorianScho
Path Finder

Hi rsreese, 

I know this post is a few years old already, but maybe it can help someone in the future. McAfee ePO, now called Trellix ePolicy Orchestrator, can only send data to TCP ports over SSL.

So switch the input from [tcp://514] to [tcp-ssl:514]. Be sure to meet the configuration requirements for tcp-ssl inputs, such as the server certificate settings in the [SSL] stanza of inputs.conf.
