Splunk Search

How to convert nested xml block into json formatted nested block ?

damode1
Path Finder

I have the below sample botsv3 sample data set which is sysmon in xml format. I need to convert that into json formatted events. 

Sample Current XML format - code block :1

 

 

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
    <System>
        <Provider Name='Microsoft-Windows-Sysmon' Guid='{5770385F-C22A-43E0-BF4C-06F5698FFBD9}' />
        <EventID>1</EventID>
        <Version>5</Version>
        <Level>4</Level>
        <Task>1</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8000000000000000</Keywords>
        <TimeCreated SystemTime='2023-05-21T13:35:45.561534700Z' />
        <EventRecordID>36885</EventRecordID>
        <Correlation />
        <Execution ProcessID='3204' ThreadID='5508' />
        <Channel>Microsoft-Windows-Sysmon/Operational</Channel>
        <Computer>BGIST-L.froth.ly</Computer>
        <Security UserID='S-1-5-18' />
    </System>
    <EventData>
        <Data Name='UtcTime'>2023-05-21 15:17:59.931</Data>
        <Data Name='ProcessGuid'>{EBF7A186-1A1C-5B59-0000-0010732E0200}</Data>
        <Data Name='ProcessId'>2684</Data>
        <Data Name='Image'>C:\Windows\System32\svchost.exe</Data>
        <Data Name='FileVersion'>10.0.17134.1 (WinBuild.160101.0800)</Data>
        <Data Name='Description'>Host Process for Windows Services</Data>
        <Data Name='Product'>Microsoft® Windows® Operating System</Data>
        <Data Name='Company'>Microsoft Corporation</Data>
        <Data Name='CommandLine'>c:\windows\system32\svchost.exe -k networkservice -p -s CryptSvc</Data>
        <Data Name='CurrentDirectory'>C:\Windows\system32\</Data>
        <Data Name='User'>NT AUTHORITY\NETWORK SERVICE</Data>
        <Data Name='LogonGuid'>{EBF7A186-1A19-5B59-0000-0020E4030000}</Data>
        <Data Name='LogonId'>0x3e4</Data>
        <Data Name='TerminalSessionId'>0</Data>
        <Data Name='IntegrityLevel'>System</Data>
        <Data Name='Hashes'>
            MD5=32569E403279B3FD2EDB7EBD036273FA,SHA256=C9A28DC8004C3E043CBF8E3A194FDA2B756CE90740DF2175488337281B485F69</Data>
        <Data Name='ParentProcessGuid'>{EBF7A186-1A18-5B59-0000-0010CEA80000}</Data>
        <Data Name='ParentProcessId'>608</Data>
        <Data Name='ParentImage'>C:\Windows\System32\services.exe</Data>
        <Data Name='ParentCommandLine'>C:\Windows\system32\services.exe</Data>
    </EventData>
</Event>"

 

 

 

Expected in JSON format as below. I have only included fields that are actually needed. code block : 2

 

 

{
    "ID": 1,
    "Timestamp": "2023-05-18T05:07:59.940594300Z",
    "EventData": {
        "FileVersion": "10.0.17134.1 (WinBuild.160101.0800)",
        "Company": "Microsoft Corporation",
        "TerminalSessionId": 0,
        "UtcTime": "2018-08-20 15:18:59.929",
        "Product": "Microsoft® Windows® Operating System",
        "LogonId": "0x3e7",
        "Description": "Find String (QGREP) Utility",
        "OriginalFileName": "findstr.exe",
        "Hashes": "MD5=BCC8F29B929DABF5489C9BE6587FF66D,SHA256=40F83CE0B6E1C894AB766591574ABD5B6780028C874410F2EC224300DF443C81",
        "ParentProcessId": "5428",
        "ParentCommandLine": "C:\\Windows\\system32\\cmd.exe /c netstat -nao | findstr /r \"LISTENING\"",
        "ProcessGuid": "{EBF7A186-0B7B-5B59-0000-001044A3CD01}",
        "ProcessId": "6236",
        "Image": "C:\\Windows\\System32\\findstr.exe",
        "User": "NT AUTHORITY\\SYSTEM",
        "LogonGuid": "{EBF7A186-AB15-5B58-0000-0020E7030000}",
        "LogonGuid": "{EBF7A186-AB15-5B58-0000-0020E7030000}",
        "IntegrityLevel": "System",
        "ParentProcessGuid": "{EBF7A186-0B7B-5B59-0000-0010249FCD01}",
        "ParentImage": "C:\\Windows\\System32\\cmd.exe",
        "RuleName": "",
        "CommandLine": "findstr  /r \"LISTENING\"",
        "CurrentDirectory": "C:\\Windows\\system32\\"
    },
    "Hostname": "BGIST-L.froth.ly",
}

 

 

 

The key fields are EventID, Computer and entire EventData block

What I have tried so far

  1.  Used | tojson command but it didn't created the nested EventData block. It just extracted all field-value pairs as individual objects
  2. then tried the below spl but it has fields hardcoded which is not desirable. We want them to be dynamically added to EventData block. The below SPL also led to lot of fields with null values as not all Event IDs had the same fields for obvious reasons. 

 

index=main 
| rename EventID as ID Computer as Hostname eventtype as EventType    
| fillnull value=""  
| eval EventData=json_object("FileVersion", FileVersion, "Company", Company, "TerminalSessionId", TerminalSessionId, "UtcTime", UtcTime, "Product", Product,"LogonId",LogonId,"Description",Description,"Hashes",Hashes,"ParentProcessId",ParentProcessId,"ParentCommandLine",ParentCommandLine,"ProcessGuid",ProcessGuid,"ProcessId",ProcessId,"Image",Image,"User",User,"LogonGuid",LogonGuid,"IntegrityLevel",IntegrityLevel,"ParentProcessGuid",ParentProcessGuid,"ParentImage",ParentImage,"CommandLine",CommandLine,"CurrentDirectory",CurrentDirectory),_raw=json_set_exact(json_object(), "ID", ID, "Hostname", Hostname, "EventData", json_extract(EventData))

 

  •  I tried the below query as well

 

index=botsv3 sourcetype=xmlwineventlog EventID=1
|  spath
| tojson

 

Which gave the following result where Eventdata atleast is still in block but both the fields appear as values in Event.EventData.Data field and values in Event.EventData.Data{@Name}. I need the values in Event.EventData.Data{@Name} field to appear as values of corresponding fields which appear as values in the Event.EventData.Data field and all that as part of EventData nested json block. Basically as shown in code block : 2

Screenshot 2023-05-24 130814.png

 

Any help on this would be highly appreciated!

 

 

Labels (3)
Tags (1)
0 Karma
1 Solution

damode1
Path Finder

After spending hours doing trial and error, I finally was able to craft an SPL that gave me the expected result

 

index=botsv3 sourcetype=xmlwineventlog
| spath
| rename Event.EventData.Data as EventDataData, Event.EventData.Data{@Name} as EventDataName EventID as ID Computer as Hostname TimeCreated as Timestamp
| eval EventData=json_object("EventData",json_object(mvindex(EventDataName,0),mvindex(EventDataData,0)))
| eval count=0
| foreach mode=multivalue EventDataName
    [
| eval EventData=json_set(EventData,"EventData."+mvindex(EventDataName,mvcount(<<ITEM>>)+count),mvindex(EventDataData,mvcount(<<ITEM>>)+count)),count=count+1]
| foreach ID Hostname Timestamp
    [
| eval EventData=json_set(EventData,"<<FIELD>>",<<FIELD>>)]
| table EventData

 

View solution in original post

damode1
Path Finder

After spending hours doing trial and error, I finally was able to craft an SPL that gave me the expected result

 

index=botsv3 sourcetype=xmlwineventlog
| spath
| rename Event.EventData.Data as EventDataData, Event.EventData.Data{@Name} as EventDataName EventID as ID Computer as Hostname TimeCreated as Timestamp
| eval EventData=json_object("EventData",json_object(mvindex(EventDataName,0),mvindex(EventDataData,0)))
| eval count=0
| foreach mode=multivalue EventDataName
    [
| eval EventData=json_set(EventData,"EventData."+mvindex(EventDataName,mvcount(<<ITEM>>)+count),mvindex(EventDataData,mvcount(<<ITEM>>)+count)),count=count+1]
| foreach ID Hostname Timestamp
    [
| eval EventData=json_set(EventData,"<<FIELD>>",<<FIELD>>)]
| table EventData

 

yuanliu
SplunkTrust
SplunkTrust

As I commented in another thread, Splunk doesn't have nested notation.  XML and JSON are different enough that SPL has to use two similar but distinct flattened representations.  A lot of what you ask have to be crafted by hand, and like in your other post,  the key is json_object.  Try this.

| rename _raw as temp ``` use this if you want to preserve raw event ```
| eval _raw = mvzip('Event.EventData.Data{@Name}', 'Event.EventData.Data', "=")
| kv
| eval xml2JSON = json_object("ID", 'Event.System.EventID', "Timestamp", 'Event.System.TimeCreated{@SystemTime}',
  "EventData", json_object("FileVersion", FileVersion, "Company", Company, "TerminalSessionId", TerminalSessionId, "UtcTime", UtcTime, "Product", Product,"LogonId",LogonId,"Description",Description,"Hashes",Hashes,"ParentProcessId",ParentProcessId,"ParentCommandLine",ParentCommandLine,"ProcessGuid",ProcessGuid,"ProcessId",ProcessId,"Image",Image,"User",User,"LogonGuid",LogonGuid,"IntegrityLevel",IntegrityLevel,"ParentProcessGuid",ParentProcessGuid,"ParentImage",ParentImage,"CommandLine",CommandLine,"CurrentDirectory",CurrentDirectory),
  "Hostname", 'Event.System.Computer')

 

0 Karma

damode1
Path Finder

Your query is almost same as what I tried in my second attempt which I mentioned above. As mentioned there, the problem with that/your query is the field names are hardcoded which also leads to fields with null values.

We need SPL that will dynamically format the event based on what fields are available.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

If you are prepared to accept some hard-coding of field names, for the Data json you could do something like this

| rename "Event.EventData.Data{@Name}" as name
| rename "Event.EventData.Data" as value
| eval row = mvrange(0,mvcount(name))
| eval json="{".mvjoin(mvmap(row,"\"".mvindex(name,row)."\":\"".trim(mvindex(value,row))."\""),",")."}"
0 Karma

damode1
Path Finder

Thanks for sharing that, however, those fields still appear as individual json objects. I needed those fields nested in EventData json object

0 Karma
Get Updates on the Splunk Community!

Build Scalable Security While Moving to Cloud - Guide From Clayton Homes

 Clayton Homes faced the increased challenge of strengthening their security posture as they went through ...

Mission Control | Explore the latest release of Splunk Mission Control (2.3)

We’re happy to announce the release of Mission Control 2.3 which includes several new and exciting features ...

Cloud Platform | Migrating your Splunk Cloud deployment to Python 3.7

Python 2.7, the last release of Python 2, reached End of Life back on January 1, 2020. As part of our larger ...