
IndexTimeExtraction - Regex substitute only on a specific group - SEDCMD (Splunk Cloud) - props.conf

vn_g
Path Finder
Input Event : [so much data exists in the same single line ] ,"Comments": "New alert", "Data": "{\"etype\":\"MalwareFamily\",\"at\":\"2024-06-21T11:34:07.0000000Z\",\"md\":\"2024-06-21T11:34:07.0000000Z\",\"Investigations\":[{\"$id\":\"1\",\"Id\":\"urn:ZappedUrlInvestigation:2cc87ae3\",\"InvestigationStatus\":\"Running\"}],\"InvestigationIds\":[\"urn:ZappedUrlInvestigation:2cc8782d063\"],\"Intent\":\"Probing\",\"ResourceIdentifiers\":[{\"$id\":\"2\",\"AadTenantId\":\"2dfb29-729c918\",\"Type\":\"AAD\"}],\"AzureResourceId\":null,\"WorkspaceId\":null,\"Metadata\":{\"CustomApps\":null,\"GenericInfo\":null},\"Entities\":[{\"$id\":\"3\",\"MailboxPrimaryAddress\":\"abc@gmail.com\",\"Upn\":\"abc@gmail.com\",\"AadId\":\"6eac3b76357\",\"RiskLevel\":\"None\",\"Type\":\"mailbox\",\"Urn\":\"urn:UserEntity:10338af2b6c\",\"Source\":\"TP\",\"FirstSeen\":\"0001-01-01T00:00:00\"}, \"StartTimeUtc\": \"2024-06-21T10:12:37\", \"Status\": \"Investigation Started\"}","EntityType": "MalwareFamily", [so much data exists in the same single line ]

A single line contains a lot of data. Within it:

  1. I want to substitute (\") with (") only within the Data dictionary value, nothing before it and nothing after it. Sample regex: https://regex101.com/r/Gsfaay/1 (only the highlighted data in group 4 should be modified).
  2. The dictionary value is enclosed in quotes (as a string); I want those quotes replaced by [] brackets so that it becomes a list (groups 3 and 6).
    Required output: [so much data exists in the same single line ],"Comments": "New alert", "Data": [{"etype":"MalwareFamily", so on,"Status":"Investigation Started"}],"EntityType": "MalwareFamily", [so much data exists in the same single line ]
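The two requirements can be illustrated outside Splunk with Python's `re` module. This is a sketch of the regex logic only, not a SEDCMD, and it assumes the Data value is an ordinary JSON string literal: such a string ends at the first quote that is not backslash-escaped, whatever key comes after it. The helper name `data_string_to_list` and the sample event are hypothetical. It relies on a per-match callback, which a sed expression has no equivalent for.

```python
import json
import re

# Assumption: the Data value is a JSON string literal, so it ends at the first
# quote that is not backslash-escaped. (?:[^"\\]|\\.)* consumes either an
# ordinary character or a backslash plus the character it escapes.
DATA_STRING = re.compile(r'("Data":\s*)"((?:[^"\\]|\\.)*)"')


def data_string_to_list(raw):
    """Hypothetical helper: unescape only the Data value and wrap it in [ ]."""

    def rewrite(match):
        prefix, body = match.group(1), match.group(2)
        # Decode the JSON string escapes (\" -> ") inside the Data value only.
        inner = json.loads('"' + body + '"')
        # Replace the surrounding quotes with list brackets.
        return prefix + "[" + inner + "]"

    # count=1: only the first "Data" key in the event is rewritten.
    return DATA_STRING.sub(rewrite, raw, count=1)


if __name__ == "__main__":
    # Hypothetical event: the escaped quotes in Comments must stay untouched.
    event = r'"Comments": "say \"hi\"", "Data": "{\"etype\":\"MalwareFamily\",\"Status\": \"Investigation Started\"}","EntityType": "MalwareFamily"'
    print(data_string_to_list(event))
```

Only the Data value is unescaped and bracketed; the escaped quotes in Comments are left as they were, and the key that follows Data does not matter.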

     

    Trials:

    [testing_logs]
    SEDCMD-DataJson = s/\\\"/\"/g s/"Data": "{"/"Data": \[{"/g s/("Data": \[{".*})",/\1],/g
    INDEXED_EXTRACTIONS = json
    KV_MODE = json

    I tried it in multiple steps, as in the example above, but in Splunk SEDCMD works on the entire _raw value, and I shouldn't apply it globally:

    1. regex101.com/r/0g2bcL/1

    2. regex101.com/r/o3eFgJ/1

    3. regex101.com/r/D7Of0v/1

    The only issue is with the first regex: it shouldn't be applied globally to the entire event, only within the Data dictionary value.
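To see why the first expression is the problem, the trial's three expressions can be replayed in Python on a small hypothetical event that also has escaped quotes in Comments. This sketch assumes Python's `re.sub` behaves like the PCRE-based sed substitution for these simple patterns, with the backreference written as `\1`; the function name `trial_rewrite` is made up for illustration.

```python
import re


def trial_rewrite(raw):
    """Replay the trial's SEDCMD expressions, in order, on the whole event."""
    # re.sub replaces every match, like sed's g flag.
    raw = re.sub(r'\\"', '"', raw)                        # s/\\\"/\"/g
    raw = re.sub(r'"Data": "\{"', '"Data": [{"', raw)     # s/"Data": "{"/"Data": \[{"/g
    raw = re.sub(r'("Data": \[\{".*\})",', r'\1],', raw)  # s/("Data": \[{".*})",/\1],/g
    return raw


if __name__ == "__main__":
    # Hypothetical event with an escaped quote outside the Data value.
    event = r'"Comments": "say \"hi\"", "Data": "{\"etype\":\"MalwareFamily\"}","EntityType": "MalwareFamily"'
    # Data becomes a list, but Comments loses its escaping too.
    print(trial_rewrite(event))
```

The Data part comes out as required, but Comments becomes `"say "hi""`, which is the global-application issue described above.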


vn_g
Path Finder

In Splunk, SEDCMD works on _raw; there is no option to apply it to a specific field.

Temporary solution, for when a field value is passed as a string instead of a list in a JSON file:

Search-time extraction:

| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
| extract pairdelim="\"{,}" kvdelim=":"
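The search-time command can be replayed the same way. Once SPL's own string escaping is removed, the three expressions should read `s/("Data":\s+)"/\1[/g`, `s/("Data":\s+\[{.*})"/\1]/g` and `s/\\"/"/g` (that reading of the SPL quoting is an assumption). The sketch runs them on a hypothetical event shaped like the sample. Because the brackets are added while the inner quotes are still escaped, `}"` can only match the real end of the Data string or something after it.

```python
import re


def search_time_rewrite(raw):
    """Replay the search-time rex sed expressions (SPL string escaping removed)."""
    # s/("Data":\s+)"/\1[/g : opening quote of the Data value -> [
    raw = re.sub(r'("Data":\s+)"', r'\1[', raw)
    # s/("Data":\s+\[{.*})"/\1]/g : greedy .* closes at the last }" in the event
    raw = re.sub(r'("Data":\s+\[\{.*\})"', r'\1]', raw)
    # s/\\"/"/g : unescape quotes everywhere in the event, not only inside Data
    raw = re.sub(r'\\"', '"', raw)
    return raw


if __name__ == "__main__":
    # Hypothetical event shaped like the sample in the question.
    event = r'"Comments": "New alert", "Data": "{\"etype\":\"MalwareFamily\",\"Status\": \"Investigation Started\"}","EntityType": "MalwareFamily"'
    print(search_time_rewrite(event))
```

The last expression still applies to the whole event, so any escaped quotes outside Data are unescaped as well.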

 

Index-time extraction:

SEDCMD-o365DataJsonRemoveBackSlash = s/(\\)+"/"/g s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g
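The index-time version removes the backslashes first and then adds the brackets. One thing worth checking, sketched below under the same Python-for-PCRE assumption: `.*` is greedy, so the third expression closes the list at the last `}"` in the event. If another string-encoded JSON value follows Data (the `Extra` key here is hypothetical), the `]` lands at the end of that value and Data's closing quote is left behind.

```python
import re


def index_time_rewrite(raw):
    """Replay the index-time SEDCMD expressions from the post, in order."""
    # s/(\\)+"/"/g : drop the backslashes before every quote in the event
    raw = re.sub(r'(\\)+"', '"', raw)
    # s/(\"Data\":\s+)\"/\1[/g : opening quote of the Data value -> [
    raw = re.sub(r'("Data":\s+)"', r'\1[', raw)
    # s/(\"Data\":\s+\[{.*})\"/\1]/g : greedy .* closes at the LAST }" in the event
    raw = re.sub(r'("Data":\s+\[\{.*\})"', r'\1]', raw)
    return raw


if __name__ == "__main__":
    # Hypothetical events; "Extra" is an invented second JSON-in-a-string field.
    print(index_time_rewrite(r'"Data": "{\"a\":1}", "Name": "x"'))
    print(index_time_rewrite(r'"Data": "{\"a\":1}", "Extra": "{\"b\":2}"'))
```

With no other `}"` after Data the result is as intended; in the second event the output is `"Data": [{"a":1}", "Extra": "{"b":2}]`.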

vn_g
Path Finder

1. The actual data looks like this, with Data in string format: " { } "
[Screenshot: actual JSON data]

 

2. From the UI, the following worked to some extent, converting the Data string to a list [ { } ]:
| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
The issue now is that the key-value pairs inside the Data dictionary are not identified automatically, even with KV_MODE = json.

[Screenshot: working, but automatic KV is not detected]
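A possible explanation for the missing key-value pairs, offered as an assumption rather than something confirmed in this thread: search-time automatic JSON extraction reads _raw before `rex mode=sed` rewrites it, so the rewritten text is not parsed again, which would fit the later need for an explicit `| extract`. The Python sketch below only checks the other half: that the rewritten text is valid JSON in which Data is a list rather than a string. Both sample events and the `data_type` helper are hypothetical.

```python
import json

# Hypothetical events: Data as a JSON-encoded string vs. Data as a real list.
BEFORE = r'{"Comments": "New alert", "Data": "{\"etype\":\"MalwareFamily\"}", "EntityType": "MalwareFamily"}'
AFTER = '{"Comments": "New alert", "Data": [{"etype":"MalwareFamily"}], "EntityType": "MalwareFamily"}'


def data_type(event_text):
    """Parse the event as JSON and report the Python type name of its Data field."""
    return type(json.loads(event_text)["Data"]).__name__


if __name__ == "__main__":
    print(data_type(BEFORE))  # Data is still one opaque string
    print(data_type(AFTER))   # Data is a list of objects with their own keys
    print(json.loads(AFTER)["Data"][0]["etype"])
```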

 


vn_g
Path Finder
| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
| extract pairdelim="\"{,}" kvdelim=":"

Thank you for your help. The above worked, but I want it implemented at index time, not at search time.


vn_g
Path Finder


1. https://regex101.com/r/jPZ4yy/1
2. https://regex101.com/r/PmwS2C/1
3. https://regex101.com/r/SBMRme/1

- First regex: I have provided a sample of 3 events (EntityValue, Name, Ids; anything in JSON format can come).
- Third regex: sed works on _raw, but it should work only within the Data dictionary value. For example, (\"Comments\": \"New alert\", ) is also changed; nothing else should be formatted.


ITWhisperer
SplunkTrust

Try something like this:

| rex mode=sed "s/(Data\": )\"/\1[/g s/}\"(, \"EntityType)/}]\1/g s/\\\\\"/\"/g"
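Anchoring the closing bracket on the following key works only while that key is always `EntityType`, which is the point raised in the next reply. A quick Python replay on two hypothetical events shows both cases. It reads the second expression's replacement as `}]\1` (an assumption; a `]` after `\1` would land inside the key name), and `anchored_rewrite` is an illustrative name.

```python
import re


def anchored_rewrite(raw):
    """Replay the suggested rex sed expressions (SPL string escaping removed)."""
    # s/(Data": )"/\1[/g : opening quote of the Data value -> [
    raw = re.sub(r'(Data": )"', r'\1[', raw)
    # s/}"(, "EntityType)/}]\1/g : close the list only if EntityType comes next
    raw = re.sub(r'\}"(, "EntityType)', r'}]\1', raw)
    # s/\\"/"/g : unescape quotes everywhere in the event
    raw = re.sub(r'\\"', '"', raw)
    return raw


if __name__ == "__main__":
    # Hypothetical events: EntityType follows Data in the first one only.
    print(anchored_rewrite(r'"Data": "{\"a\":1}", "EntityType": "X"'))
    print(anchored_rewrite(r'"Data": "{\"a\":1}", "Name": "X"'))
```

In the second event the list is opened but never closed, and Data's closing quote remains.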

vn_g
Path Finder

Entity Value will not always come next to Data. It is random.


ITWhisperer
SplunkTrust

It is unlikely to be random, since it is generated by a system. There is likely to be some pattern to it, but if you do not share that information, it is unlikely that we will be able to guess it, and therefore would be wasting our time attempting to provide a solution until you provide sufficient relevant details.
