I have a field that consists of data extracted from a JSON data field using this search:
index="test-99" sourcetype="csv" | eval AuditData_keys = json_keys(AuditData)
This works perfectly and creates a field called AuditData_keys.
The data in the field AuditData_keys is unique based on the values of a field called Operations. There are 39 unique values, each with its own unique set of fields. I'm trying to extract each value of the Operations field into distinct fields per value. My initial idea was to have an individual eventtype for each Operations value. The issue I'm having is deciding the best way to extract the fields, since the operations share common fields but each also has additional fields of its own.
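If the per-operation eventtype route is taken, each one is just a stanza in eventtypes.conf. A minimal sketch for one of the 39 values (the stanza name is my own invention, and the field name assumes Operations is already extracted); defining all 39 by hand would be tedious, which is part of the problem:

```ini
# eventtypes.conf -- a sketch; the stanza name is hypothetical
[o365_userloginfailed]
search = index="test-99" sourcetype="csv" Operations="UserLoginFailed"
```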
I came up with this search to list each value of the Operations field along with its relevant data fields.
index="test-99" sourcetype="csv" | eval AuditData_keys = json_keys(AuditData) | table Operations AuditData_keys | dedup AuditData_keys | outputcsv AuditData_extracted_fields_unique.csv
Here is a sample of one operation value and its fields.
Operation value | Fields from AuditData_keys
UserLoginFailed | ["CreationTime","Id","Operation","OrganizationId","RecordType","ResultStatus","UserKey","UserType","Version","Workload","ClientIP","ObjectId","UserId","AzureActiveDirectoryEventType","ExtendedProperties","ModifiedProperties","Actor","ActorContextId","ActorIpAddress","InterSystemsId","IntraSystemId","SupportTicketId","Target","TargetContextId","ApplicationId","DeviceProperties","ErrorNumber","LogonError"] |
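Rather than one outputcsv row per event, the distinct key sets per operation can be collected in a single pass with stats. A sketch using the same fields as above, assuming Operations is already an extracted field:

```spl
index="test-99" sourcetype="csv"
| eval AuditData_keys = json_keys(AuditData)
| stats values(AuditData_keys) AS key_sets BY Operations
```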
Short of manually typing the fields for each operation value and using the strings command, there has to be a more efficient way.
This is O365 audit data, extracted with PowerShell as a CSV file that has embedded JSON data.
Thanks in advance
Robert
I've created a regex on the regex101 site to detect all of the fields and put them in groups. Here is that regex:
\"(.[^\"]+)
This works: it detects all of the fields and puts them in automatically named groups. I'd like to use the extracted data as the field name without all of the typing. The regex above doesn't work in Splunk, though, or at least I can't make it work.
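One likely reason the pattern fails in Splunk is that the rex command only creates fields from named capture groups, not automatically numbered ones. A hedged sketch that at least pulls every quoted key into one multivalue field (the group name json_key is my own choice); note rex still can't turn the captured text into the field names themselves, which is what spath does natively:

```spl
index="test-99" sourcetype="csv"
| eval AuditData_keys = json_keys(AuditData)
| rex field=AuditData_keys max_match=0 "\"(?<json_key>[^\"]+)\""
```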
Ideally this would be a repeatable process that does what you said, where the JSON extractions from the main field, AuditData, and from the subsequent embedded JSON fields and their data can be performed on a scheduled basis, using an eventtype for each embedded JSON field.
This search breaks it out, but there are also JSON fields nested in the results of the first spath command that themselves contain fields of data, such as these:
source="AuditLog_2021-03-01_2021-05-07.csv" host="DESKTOP-U34OQHV" index="test-99" sourcetype="csv" | spath input=AuditData
DeviceProperties{}.Name
DeviceProperties{}.Value
ExtendedProperties{}.Name
ExtendedProperties{}.Value
ModifiedProperties{}.Name
ModifiedProperties{}.NewValue
ModifiedProperties{}.OldValue
Target{}.ID
Target{}.Type
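spath can also be pointed directly at one of those nested collections with its path and output options. A sketch for ExtendedProperties, assuming the same source file as above:

```spl
source="AuditLog_2021-03-01_2021-05-07.csv" index="test-99" sourcetype="csv"
| spath input=AuditData
| spath input=AuditData path="ExtendedProperties{}" output=ExtendedProperties
```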
On a side note, this is a process that will be repeated for quite a few O365 tenants on a scheduled basis.
Can you not use spath to extract the JSON fields?
That was unsuccessful, at least in my attempts. The majority of the data is CSV, except for a couple of fields which contain nested JSON.
Can you separate out the field that has the JSON in it, then run spath on that field?
You could break out the collections from the JSON field, mvexpand to create separate events, then spath the collection field. It is unclear exactly what you are trying to achieve, which makes it difficult to give more specific advice.
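As a concrete sketch of that suggestion for one collection (ExtendedProperties; the zipped field name and the "=" delimiter are my own choices, and the single quotes in eval are needed because the field names contain braces):

```spl
index="test-99" sourcetype="csv"
| spath input=AuditData
| eval ep = mvzip('ExtendedProperties{}.Name', 'ExtendedProperties{}.Value', "=")
| mvexpand ep
| rex field=ep "^(?<ep_name>[^=]+)=(?<ep_value>.*)"
```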
Ideally something that is repeatable, with options defined in a sourcetype while importing the downloaded CSV. The imported file is downloaded on a scheduled basis.