I have the below sample botsv3 sample data set which is sysmon in xml format. I need to convert that into json formatted events.
Sample Current XML format - code block :1
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='Microsoft-Windows-Sysmon' Guid='{5770385F-C22A-43E0-BF4C-06F5698FFBD9}' />
<EventID>1</EventID>
<Version>5</Version>
<Level>4</Level>
<Task>1</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime='2023-05-21T13:35:45.561534700Z' />
<EventRecordID>36885</EventRecordID>
<Correlation />
<Execution ProcessID='3204' ThreadID='5508' />
<Channel>Microsoft-Windows-Sysmon/Operational</Channel>
<Computer>BGIST-L.froth.ly</Computer>
<Security UserID='S-1-5-18' />
</System>
<EventData>
<Data Name='UtcTime'>2023-05-21 15:17:59.931</Data>
<Data Name='ProcessGuid'>{EBF7A186-1A1C-5B59-0000-0010732E0200}</Data>
<Data Name='ProcessId'>2684</Data>
<Data Name='Image'>C:\Windows\System32\svchost.exe</Data>
<Data Name='FileVersion'>10.0.17134.1 (WinBuild.160101.0800)</Data>
<Data Name='Description'>Host Process for Windows Services</Data>
<Data Name='Product'>Microsoft® Windows® Operating System</Data>
<Data Name='Company'>Microsoft Corporation</Data>
<Data Name='CommandLine'>c:\windows\system32\svchost.exe -k networkservice -p -s CryptSvc</Data>
<Data Name='CurrentDirectory'>C:\Windows\system32\</Data>
<Data Name='User'>NT AUTHORITY\NETWORK SERVICE</Data>
<Data Name='LogonGuid'>{EBF7A186-1A19-5B59-0000-0020E4030000}</Data>
<Data Name='LogonId'>0x3e4</Data>
<Data Name='TerminalSessionId'>0</Data>
<Data Name='IntegrityLevel'>System</Data>
<Data Name='Hashes'>
MD5=32569E403279B3FD2EDB7EBD036273FA,SHA256=C9A28DC8004C3E043CBF8E3A194FDA2B756CE90740DF2175488337281B485F69</Data>
<Data Name='ParentProcessGuid'>{EBF7A186-1A18-5B59-0000-0010CEA80000}</Data>
<Data Name='ParentProcessId'>608</Data>
<Data Name='ParentImage'>C:\Windows\System32\services.exe</Data>
<Data Name='ParentCommandLine'>C:\Windows\system32\services.exe</Data>
</EventData>
</Event>"
Expected in JSON format as below. I have only included fields that are actually needed. code block : 2
{
"ID": 1,
"Timestamp": "2023-05-18T05:07:59.940594300Z",
"EventData": {
"FileVersion": "10.0.17134.1 (WinBuild.160101.0800)",
"Company": "Microsoft Corporation",
"TerminalSessionId": 0,
"UtcTime": "2018-08-20 15:18:59.929",
"Product": "Microsoft® Windows® Operating System",
"LogonId": "0x3e7",
"Description": "Find String (QGREP) Utility",
"OriginalFileName": "findstr.exe",
"Hashes": "MD5=BCC8F29B929DABF5489C9BE6587FF66D,SHA256=40F83CE0B6E1C894AB766591574ABD5B6780028C874410F2EC224300DF443C81",
"ParentProcessId": "5428",
"ParentCommandLine": "C:\\Windows\\system32\\cmd.exe /c netstat -nao | findstr /r \"LISTENING\"",
"ProcessGuid": "{EBF7A186-0B7B-5B59-0000-001044A3CD01}",
"ProcessId": "6236",
"Image": "C:\\Windows\\System32\\findstr.exe",
"User": "NT AUTHORITY\\SYSTEM",
"LogonGuid": "{EBF7A186-AB15-5B58-0000-0020E7030000}",
"LogonGuid": "{EBF7A186-AB15-5B58-0000-0020E7030000}",
"IntegrityLevel": "System",
"ParentProcessGuid": "{EBF7A186-0B7B-5B59-0000-0010249FCD01}",
"ParentImage": "C:\\Windows\\System32\\cmd.exe",
"RuleName": "",
"CommandLine": "findstr /r \"LISTENING\"",
"CurrentDirectory": "C:\\Windows\\system32\\"
},
"Hostname": "BGIST-L.froth.ly",
}
The key fields are EventID, Computer and entire EventData block
What I have tried so far
index=main
| rename EventID as ID Computer as Hostname eventtype as EventType
| fillnull value=""
| eval EventData=json_object("FileVersion", FileVersion, "Company", Company, "TerminalSessionId", TerminalSessionId, "UtcTime", UtcTime, "Product", Product,"LogonId",LogonId,"Description",Description,"Hashes",Hashes,"ParentProcessId",ParentProcessId,"ParentCommandLine",ParentCommandLine,"ProcessGuid",ProcessGuid,"ProcessId",ProcessId,"Image",Image,"User",User,"LogonGuid",LogonGuid,"IntegrityLevel",IntegrityLevel,"ParentProcessGuid",ParentProcessGuid,"ParentImage",ParentImage,"CommandLine",CommandLine,"CurrentDirectory",CurrentDirectory),_raw=json_set_exact(json_object(), "ID", ID, "Hostname", Hostname, "EventData", json_extract(EventData))
index=botsv3 sourcetype=xmlwineventlog EventID=1
| spath
| tojson
Which gave the following result where Eventdata atleast is still in block but both the fields appear as values in Event.EventData.Data field and values in Event.EventData.Data{@Name}. I need the values in Event.EventData.Data{@Name} field to appear as values of corresponding fields which appear as values in the Event.EventData.Data field and all that as part of EventData nested json block. Basically as shown in code block : 2
Any help on this would be highly appreciated!
After spending hours doing trial and error, I finally was able to craft an SPL that gave me the expected result
index=botsv3 sourcetype=xmlwineventlog
| spath
| rename Event.EventData.Data as EventDataData, Event.EventData.Data{@Name} as EventDataName EventID as ID Computer as Hostname TimeCreated as Timestamp
| eval EventData=json_object("EventData",json_object(mvindex(EventDataName,0),mvindex(EventDataData,0)))
| eval count=0
| foreach mode=multivalue EventDataName
[
| eval EventData=json_set(EventData,"EventData."+mvindex(EventDataName,mvcount(<<ITEM>>)+count),mvindex(EventDataData,mvcount(<<ITEM>>)+count)),count=count+1]
| foreach ID Hostname Timestamp
[
| eval EventData=json_set(EventData,"<<FIELD>>",<<FIELD>>)]
| table EventData
After spending hours doing trial and error, I finally was able to craft an SPL that gave me the expected result
index=botsv3 sourcetype=xmlwineventlog
| spath
| rename Event.EventData.Data as EventDataData, Event.EventData.Data{@Name} as EventDataName EventID as ID Computer as Hostname TimeCreated as Timestamp
| eval EventData=json_object("EventData",json_object(mvindex(EventDataName,0),mvindex(EventDataData,0)))
| eval count=0
| foreach mode=multivalue EventDataName
[
| eval EventData=json_set(EventData,"EventData."+mvindex(EventDataName,mvcount(<<ITEM>>)+count),mvindex(EventDataData,mvcount(<<ITEM>>)+count)),count=count+1]
| foreach ID Hostname Timestamp
[
| eval EventData=json_set(EventData,"<<FIELD>>",<<FIELD>>)]
| table EventData
As I commented in another thread, Splunk doesn't have nested notation. XML and JSON are different enough that SPL has to use two similar but distinct flattened representations. A lot of what you ask have to be crafted by hand, and like in your other post, the key is json_object. Try this.
| rename _raw as temp ``` use this if you want to preserve raw event ```
| eval _raw = mvzip('Event.EventData.Data{@Name}', 'Event.EventData.Data', "=")
| kv
| eval xml2JSON = json_object("ID", 'Event.System.EventID', "Timestamp", 'Event.System.TimeCreated{@SystemTime}',
"EventData", json_object("FileVersion", FileVersion, "Company", Company, "TerminalSessionId", TerminalSessionId, "UtcTime", UtcTime, "Product", Product,"LogonId",LogonId,"Description",Description,"Hashes",Hashes,"ParentProcessId",ParentProcessId,"ParentCommandLine",ParentCommandLine,"ProcessGuid",ProcessGuid,"ProcessId",ProcessId,"Image",Image,"User",User,"LogonGuid",LogonGuid,"IntegrityLevel",IntegrityLevel,"ParentProcessGuid",ParentProcessGuid,"ParentImage",ParentImage,"CommandLine",CommandLine,"CurrentDirectory",CurrentDirectory),
"Hostname", 'Event.System.Computer')
Your query is almost same as what I tried in my second attempt which I mentioned above. As mentioned there, the problem with that/your query is the field names are hardcoded which also leads to fields with null values.
We need SPL that will dynamically format the event based on what fields are available.
If you are prepared to accept some hard-coding of field names, for the Data json you could do something like this
| rename "Event.EventData.Data{@Name}" as name
| rename "Event.EventData.Data" as value
| eval row = mvrange(0,mvcount(name))
| eval json="{".mvjoin(mvmap(row,"\"".mvindex(name,row)."\":\"".trim(mvindex(value,row))."\""),",")."}"
Thanks for sharing that, however, those fields still appear as individual json objects. I needed those fields nested in EventData json object