Input Event : [so much data exists in the same single line ] ,"Comments": "New alert", "Data": "{\"etype\":\"MalwareFamily\",\"at\":\"2024-06-21T11:34:07.0000000Z\",\"md\":\"2024-06-21T11:34:07.0000000Z\",\"Investigations\":[{\"$id\":\"1\",\"Id\":\"urn:ZappedUrlInvestigation:2cc87ae3\",\"InvestigationStatus\":\"Running\"}],\"InvestigationIds\":[\"urn:ZappedUrlInvestigation:2cc8782d063\"],\"Intent\":\"Probing\",\"ResourceIdentifiers\":[{\"$id\":\"2\",\"AadTenantId\":\"2dfb29-729c918\",\"Type\":\"AAD\"}],\"AzureResourceId\":null,\"WorkspaceId\":null,\"Metadata\":{\"CustomApps\":null,\"GenericInfo\":null},\"Entities\":[{\"$id\":\"3\",\"MailboxPrimaryAddress\":\"abc@gmail.com\",\"Upn\":\"abc@gmail.com\",\"AadId\":\"6eac3b76357\",\"RiskLevel\":\"None\",\"Type\":\"mailbox\",\"Urn\":\"urn:UserEntity:10338af2b6c\",\"Source\":\"TP\",\"FirstSeen\":\"0001-01-01T00:00:00\"}, \"StartTimeUtc\": \"2024-06-21T10:12:37\", \"Status\": \"Investigation Started\"}","EntityType": "MalwareFamily", [so much data exists in the same single line ]
In a single line, there exists so much data,
Ouptut Required : [so much data exists in the same single line ],"Comments": "New alert", "Data": [{"etype":"MalwareFamily", so on,"Status":"Investigation Started"}],"EntityType": "MalwareFamily", [so much data exists in the same single line ]
Trials :
[testing_logs]
SEDCMD-DataJson = s/\\\"/\"/g s/"Data": "{"/"Data": \[{"/g s/("Data": \[{".*})",/$1],/g
INDEXED_EXTRACTIONS = json
KV_MODE = json
I tried it in the multiple steps as mentioned in my above example, but In splunk sedcmd works on the entire _raw value. I shouldnt apply it globally
only issue with the first regex, it shouldnt be applied globally on entire event value, it should be applying only between data dictionary value.
In Splunk , sedcmd works on _raw. There is no option to apply it on a specific field.
Temporary solution : When a Field value is passed as string format instead of list in a json file
Search Time extraction :
| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
| extract pairdelim="\"{,}" kvdelim=":"
Index Time extraction :
SEDCMD-o365DataJsonRemoveBackSlash = s/(\\)+"/"/g s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g
1. Actual Data looks like below. Data in string format " { } "
2. From UI using the below worked to some extent. Data string to list [ { } ]
| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
Issue now is it is not automatically identifying the key value pairs inside the Data Dictionary, irrespective of the setting kv_mode =json.
| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
| extract pairdelim="\"{,}" kvdelim=":"
Thankyou for your help, the above worked, but I want it to be implemented at index time , not at search time.
1. https://regex101.com/r/jPZ4yy/1
2. https://regex101.com/r/PmwS2C/1
3. https://regex101.com/r/SBMRme/1
- first regex, I have provided sample of 3 events, ( EntityValue, Name, Ids, anything in json format comes)
- thrid regex, sed works on _raw but it should work only between Data dictionary value. Example see (\"Comments\": \"New alert\", ) is also changed, nothing else should be formated.
Try something like this
| rex mode=sed "s/(Data\": )\"/\1[/g s/}\"(, \"EntityType)/}]\1]/g s/\\\\\"/\"/g"
It is not that you will always have Entity Value next to data. It is random.
It is unlikely to be random, since it is generated by a system. There is likely to be some pattern to it, but if you do not share that information, it is unlikely that we will be able to guess it, and therefore would be wasting our time attempting to provide a solution until you provide sufficient relevant details.