Knowledge Management

How to achieve auto field extraction of nested JSON with xml in front using props.conf/transforms.conf on search head?

abhisplunk1
Explorer
 
I want to extract fields from events like the following one through props.conf, using regular expressions. The challenge is that the event is XML-formatted but has JSON data embedded in it.
 
 
I am trying to find a solution similar to the one in this post: https://community.splunk.com/t5/Getting-Data-In/Sed-command-Large-XML-values-in-JSON-events-makes-re...
 
 
This is what my events look like (example event):
 
 
 
<25>1 2023-04-03T13:12:32.0Z AH-1249259-001 EPOEvents - EventFwd [agentInfo@3401 tenantId="1" bpsId="1" tenantGUID="{00000000-0000-0000-0000-000000000000}" tenantNodePath="1\2"]
<?xml version="1.0" encoding="utf-8"?>
<EPOEvent><MachineInfo><AgentGUID>{8396cab6-ec77-11ea-2747-3448edc44e42}</AgentGUID><MachineName>KB89A2AEBECBD</MachineName>
<RawMACAddress>12345</RawMACAddress>
<IPAddress>12345</IPAddress>
<AgentVersion>5.7.5.504</AgentVersion>
<OSName>Windows 10</OSName>
<TimeZoneBias>300</TimeZoneBias>
<UserName>chill</UserName>
</MachineInfo>
<SoftwareInfo ProductName="BeyondTrust Privilege Management" ProductVersion="23.1.0.259" ProductFamily="Secure">
<Event>
<EventID>202256</EventID>
<Severity>0</Severity>
<GMTTime>2023-04-03T13:10:36</GMTTime>
<LocalTime>2023-04-03T08:10:36</LocalTime>
<CustomFields target="AvectoReportingEvents">
     <Data>{&quot;Header&quot; :
{&quot;AgentVersion&quot; : &quot;23.1.259.0&quot;,
&quot;Code&quot; : &quot;106&quot;,
&quot;EndpointType&quot; : &quot;MicrosoftWindows&quot;,
&quot;HostDomainName&quot;: &quot;my.com&quot;,
&quot;RuleScriptStatus&quot;: &quot;&quot;, &quot;AuthMethods&quot;: [], &quot;IdPAuthenticationUserName&quot;: &quot;&quot;,
&quot;ConfigurationID&quot;: &quot;be94d460-c4cb-4827-8f3b-5572727c54e6&quot;, &quot;UACTriggered&quot;: 0 }}
    </Data>
<EventId>106</EventId>
<SentTime>2023-04-03T13:10:36Z</SentTime>
<Version>23.1.0.259</Version></CustomFields></Event></SoftwareInfo></EPOEvent>
 

PickleRick
SplunkTrust

You can't do that. Automatic key-value extraction can be either JSON-based or XML-based, not both at once.
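
One workaround sometimes used for mixed events like this is a regex-based search-time extraction instead of a structured parser. The fragment below is a minimal sketch under stated assumptions, not a tested configuration: the sourcetype name `epo:events` and the transform name `epo_embedded_json_kv` are placeholders, and the regex assumes the quotes in the raw event are still encoded as `&quot;`.

```ini
# props.conf on the search head
# [epo:events] is a placeholder; use the sourcetype of these events
[epo:events]
REPORT-epo_embedded_json = epo_embedded_json_kv

# transforms.conf on the search head
# The stanza name is hypothetical; it only has to match the REPORT line above
[epo_embedded_json_kv]
# Match &quot;key&quot; : &quot;value&quot; pairs as they appear in _raw
REGEX = &quot;(\w+)&quot;\s*:\s*&quot;([^&]*)&quot;
# First capture group becomes the field name, second its value
FORMAT = $1::$2
# Keep every value if the same key occurs more than once
MV_ADD = true
```

This only picks up string-valued keys. Numeric values such as `UACTriggered`, arrays such as `AuthMethods`, and the nested `Header` object itself are not extracted, and the embedded `AgentVersion` would share its name with any `AgentVersion` field extracted from the XML part, so the result should be checked against real events.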


woodcock
Esteemed Legend

Like this:

|makeresults
| eval _raw="<25>1 2023-04-03T13:12:32.0Z AH-1249259-001 EPOEvents - EventFwd [agentInfo@3401 tenantId=\"1\" bpsId=\"1\" tenantGUID=\"{00000000-0000-0000-0000-000000000000}\" tenantNodePath=\"1\2\"] <?xml version=\"1.0\" encoding=\"utf-8\"?> <EPOEvent><MachineInfo><AgentGUID>{8396cab6-ec77-11ea-2747-3448edc44e42}</AgentGUID><MachineName>KB89A2AEBECBD</MachineName> <RawMACAddress>12345</RawMACAddress> <IPAddress>12345</IPAddress> <AgentVersion>5.7.5.504</AgentVersion> <OSName>Windows 10</OSName> <TimeZoneBias>300</TimeZoneBias> <UserName>chill</UserName> </MachineInfo> <SoftwareInfo ProductName=\"BeyondTrust Privilege Management\" ProductVersion=\"23.1.0.259\" ProductFamily=\"Secure\"> <Event> <EventID>202256</EventID> <Severity>0</Severity> <GMTTime>2023-04-03T13:10:36</GMTTime> <LocalTime>2023-04-03T08:10:36</LocalTime> <CustomFields target=\"AvectoReportingEvents\">      <Data>{&quot;Header&quot; : {&quot;AgentVersion&quot; : &quot;23.1.259.0&quot;, &quot;Code&quot; : &quot;106&quot;, &quot;EndpointType&quot; : &quot;MicrosoftWindows&quot;, &quot;HostDomainName&quot;: &quot;my.com&quot;, &quot;RuleScriptStatus&quot;: &quot;&quot;, &quot;AuthMethods&quot;: [], &quot;IdPAuthenticationUserName&quot;: &quot;&quot;, &quot;ConfigurationID&quot;: &quot;be94d460-c4cb-4827-8f3b-5572727c54e6&quot;, &quot;UACTriggered&quot;: 0 }}     </Data> <EventId>106</EventId> <SentTime>2023-04-03T13:10:36Z</SentTime> <Version>23.1.0.259</Version></CustomFields></Event></SoftwareInfo></EPOEvent>"
| xmlkv
| rename Data AS _raw
| rex mode=sed "s/&quot;/\"/g"
| kv

 


kamlesh_vaghela
SplunkTrust

@abhisplunk1 

Can you please try this?

YOUR_SEARCH
| rex field=_raw "<Data>(?<json_data>[\s\S]*?)<\/Data>" max_match=0
| eval json_data = replace(json_data,"&quot;","\""), json_data = replace(json_data,"\n","")
| table json_data
| spath input=json_data

 


I hope this will help you.

Thanks
KV
If any of my replies help you to solve the problem Or gain knowledge, an upvote would be appreciated.

 


abhisplunk1
Explorer

Is there a way to do this on the search head instead of in the search itself? The events are not getting parsed, and I want them to be parsed.

 


kamlesh_vaghela
SplunkTrust

@abhisplunk1 

Can you please try this?

YOUR_SEARCH
| rex field=_raw "<Data>(?<json_data>[\s\S]*?)<\/Data>" max_match=0
| eval json_data = replace(json_data,"&quot;","\""), json_data = replace(json_data,"\n","")
| table json_data
| spath input=json_data

 

My Sample Search :

| makeresults | eval _raw="<25>1 2023-04-03T13:12:32.0Z AH-1249259-001 EPOEvents - EventFwd [agentInfo@3401 tenantId=\"1\" bpsId=\"1\" tenantGUID=\"{00000000-0000-0000-0000-000000000000}\" tenantNodePath=\"1\2\"]
<?xml version=\"1.0\" encoding=\"utf-8\"?>
<EPOEvent><MachineInfo><AgentGUID>{8396cab6-ec77-11ea-2747-3448edc44e42}</AgentGUID><MachineName>KB89A2AEBECBD</MachineName>
<RawMACAddress>12345</RawMACAddress>
<IPAddress>12345</IPAddress>
<AgentVersion>5.7.5.504</AgentVersion>
<OSName>Windows 10</OSName>
<TimeZoneBias>300</TimeZoneBias>
<UserName>chill</UserName>
</MachineInfo>
<SoftwareInfo ProductName=\"BeyondTrust Privilege Management\" ProductVersion=\"23.1.0.259\" ProductFamily=\"Secure\">
<Event>
<EventID>202256</EventID>
<Severity>0</Severity>
<GMTTime>2023-04-03T13:10:36</GMTTime>
<LocalTime>2023-04-03T08:10:36</LocalTime>
<CustomFields target=\"AvectoReportingEvents\">
     <Data>{&quot;Header&quot; :
{&quot;AgentVersion&quot; : &quot;23.1.259.0&quot;,
&quot;Code&quot; : &quot;106&quot;,
&quot;EndpointType&quot; : &quot;MicrosoftWindows&quot;,
&quot;HostDomainName&quot;: &quot;my.com&quot;,
&quot;RuleScriptStatus&quot;: &quot;&quot;,
&quot;AuthMethods&quot;: [],
&quot;IdPAuthenticationUserName&quot;: &quot;&quot;,
&quot;ConfigurationID&quot;: &quot;be94d460-c4cb-4827-8f3b-5572727c54e6&quot;, &quot;UACTriggered&quot;: 0 }}
    </Data>
<EventId>106</EventId>
<SentTime>2023-04-03T13:10:36Z</SentTime>
<Version>23.1.0.259</Version></CustomFields></Event></SoftwareInfo></EPOEvent>"
| rex field=_raw "<Data>(?<json_data>[\s\S]*?)<\/Data>" max_match=0
| eval json_data = replace(json_data,"&quot;","\""), json_data = replace(json_data,"\n","")
| table json_data
| spath input=json_data

 


I hope this will help you.

 

Thanks
KV

abhisplunk1
Explorer

This is a single event in XML format, but it has JSON data between opening and closing XML tags, like: <Data> example JSON data </Data>. I need to extract all fields in this event. How do we do that?

 


kamlesh_vaghela
SplunkTrust

@abhisplunk1 

Can you please try this?

YOUR_SEARCH
| rex field=_raw "<Data>(?<json_data>[\s\S]*?)<\/Data>" max_match=0
| eval json_data = replace(json_data,"&quot;","\""), json_data = replace(json_data,"\n","")
| table json_data
| spath input=json_data

 


 

Thanks
KV