Getting Data In

Index-Time Extraction - Regex Substitute Only on a Specific Group - SEDCMD (Splunk Cloud) - props.conf

vn_g
Path Finder
Input Event : [so much data exists in the same single line ] ,"Comments": "New alert", "Data": "{\"etype\":\"MalwareFamily\",\"at\":\"2024-06-21T11:34:07.0000000Z\",\"md\":\"2024-06-21T11:34:07.0000000Z\",\"Investigations\":[{\"$id\":\"1\",\"Id\":\"urn:ZappedUrlInvestigation:2cc87ae3\",\"InvestigationStatus\":\"Running\"}],\"InvestigationIds\":[\"urn:ZappedUrlInvestigation:2cc8782d063\"],\"Intent\":\"Probing\",\"ResourceIdentifiers\":[{\"$id\":\"2\",\"AadTenantId\":\"2dfb29-729c918\",\"Type\":\"AAD\"}],\"AzureResourceId\":null,\"WorkspaceId\":null,\"Metadata\":{\"CustomApps\":null,\"GenericInfo\":null},\"Entities\":[{\"$id\":\"3\",\"MailboxPrimaryAddress\":\"abc@gmail.com\",\"Upn\":\"abc@gmail.com\",\"AadId\":\"6eac3b76357\",\"RiskLevel\":\"None\",\"Type\":\"mailbox\",\"Urn\":\"urn:UserEntity:10338af2b6c\",\"Source\":\"TP\",\"FirstSeen\":\"0001-01-01T00:00:00\"}, \"StartTimeUtc\": \"2024-06-21T10:12:37\", \"Status\": \"Investigation Started\"}","EntityType": "MalwareFamily", [so much data exists in the same single line ]

A single line contains a lot of data:

  1. I want to substitute \" with " only within the Data dictionary value, nothing before and nothing after. Sample regex: https://regex101.com/r/Gsfaay/1 (only the highlighted data in group 4 should be modified).
  2. The Data dictionary value is enclosed in quotes (as a string); I want those quotes replaced with [] brackets so it becomes a list (groups 3 and 6).
    Output required: [so much data exists in the same single line ],"Comments": "New alert", "Data": [{"etype":"MalwareFamily", and so on,"Status":"Investigation Started"}],"EntityType": "MalwareFamily", [so much data exists in the same single line ]


    Trials:

    [testing_logs]
    SEDCMD-DataJson = s/\\\"/\"/g s/"Data": "{"/"Data": \[{"/g s/("Data": \[{".*})",/$1],/g
    INDEXED_EXTRACTIONS = json
    KV_MODE = json

    I tried it in multiple steps, as in my example above, but in Splunk SEDCMD works on the entire _raw value, and I shouldn't apply it globally.

    1. regex101.com/r/0g2bcL/1

    2. regex101.com/r/o3eFgJ/1

    3. regex101.com/r/D7Of0v/1

    The only issue is with the first regex: it shouldn't be applied globally to the entire event value, only within the Data dictionary value.
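A minimal Python sketch of the intended end result, useful for checking the regex offline before porting it (this is not Splunk configuration). It assumes the Data value is always a JSON string literal, so it finds the end of the value by the first unescaped quote rather than by whichever field comes next, then unescapes \" and wraps the value in [ ] only inside that match. The sample event, `DATA_STRING`, and `rewrite_data` are hypothetical. The per-match callback is, as far as I know, the part a single sed-style s/// expression has no direct way to express, which is why the unescape in SEDCMD ends up applying to all of _raw.

```python
import re

# Hypothetical, shortened version of the event above, with escaped quotes
# added to "Comments" so we can check that they are left alone.
RAW = r'"Comments": "New \"alert\"", "Data": "{\"etype\":\"MalwareFamily\",\"Status\": \"Investigation Started\"}","EntityType": "MalwareFamily"'

# "Data": followed by a JSON string literal. The body is any run of
# characters other than quote/backslash, or a backslash escape, so the
# match stops at the first unescaped quote, whatever field comes next.
DATA_STRING = re.compile(r'("Data":\s*)"((?:[^"\\]|\\.)*)"')


def rewrite_data(raw: str) -> str:
    def repl(m: re.Match) -> str:
        prefix, body = m.group(1), m.group(2)
        # Unescape \" only inside the Data value, then wrap it in [ ].
        return prefix + "[" + body.replace('\\"', '"') + "]"

    return DATA_STRING.sub(repl, raw, count=1)


print(rewrite_data(RAW))
```

The Comments value keeps its \" escapes, while Data comes out as [{"etype":...}].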


vn_g
Path Finder

In Splunk, SEDCMD works on _raw. There is no option to apply it to a specific field.

Temporary solution, for when a field value is passed as a string instead of a list in a JSON file:

Search-time extraction:

| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
| extract pairdelim="\"{,}" kvdelim=":"
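To see what each of the three sed expressions in the rex command does, and in what order, here is a Python replay of them on a shortened, hypothetical version of the sample event. It assumes these simple patterns behave the same under Python's `re` as under Splunk's regex engine, and it does not reproduce the `extract` step. `SED_STEPS` and `replay` are illustrative names.

```python
import re

# Hypothetical, shortened sample of the raw event (Data is an escaped string).
RAW = r'"Comments": "New alert", "Data": "{\"etype\":\"MalwareFamily\",\"Status\": \"Investigation Started\"}","EntityType": "MalwareFamily"'

# The three sed expressions from the rex command, after SPL string
# unescaping, applied in the same order.
SED_STEPS = [
    (r'("Data":\s+)"', r'\1['),          # opening quote of Data -> [
    (r'("Data":\s+\[\{.*\})"', r'\1]'),  # greedy: last }" on the line -> }]
    (r'\\"', '"'),                       # every \" on the line -> "
]


def replay(raw: str) -> str:
    for pattern, replacement in SED_STEPS:
        raw = re.sub(pattern, replacement, raw)
    return raw


print(replay(RAW))
```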


Index-time extraction:

SEDCMD-o365DataJsonRemoveBackSlash = s/(\\)+"/"/g s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g
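Because the first SEDCMD expression has no anchor, it strips the backslashes before every quote in _raw, not only inside Data. A Python replay of the three expressions, on a hypothetical event whose Comments field also contains escaped quotes, shows that field being rewritten as well. The event and the names `SEDCMD_STEPS` and `apply_sedcmd` are made up for illustration.

```python
import re

# Hypothetical event in which a field outside Data also contains escaped quotes.
RAW = r'"Comments": "User said \"hi\"", "Data": "{\"etype\":\"MalwareFamily\"}","EntityType": "MalwareFamily"'

# The SEDCMD expressions above, in order, written in Python re syntax.
SEDCMD_STEPS = [
    (r'(\\)+"', '"'),                    # drop backslashes before ANY quote in the line
    (r'("Data":\s+)"', r'\1['),          # opening quote of Data -> [
    (r'("Data":\s+\[\{.*\})"', r'\1]'),  # greedy: last }" on the line -> }]
]


def apply_sedcmd(raw: str) -> str:
    for pattern, replacement in SEDCMD_STEPS:
        raw = re.sub(pattern, replacement, raw)
    return raw


print(apply_sedcmd(RAW))
```

Data comes out as intended, but Comments now reads "User said "hi"", which is no longer valid JSON.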

vn_g
Path Finder

1. The actual data looks like this, with Data in string format: " { } "
[Screenshot: actual JSON data]


2. From the UI, the following worked to some extent, turning the Data string into a list: [ { } ]
| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
The issue now is that it does not automatically identify the key-value pairs inside the Data dictionary, regardless of the KV_MODE = json setting.

[Screenshot: working, but automatic key-value extraction is not detected]
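After the sed steps, the Data value is structurally a JSON list, which a quick Python check on a minimal, hypothetical rewritten event confirms. As far as I understand, KV_MODE = json extracts fields from _raw before the search commands run, so a `rex mode=sed` rewrite inside the search is not re-parsed automatically; a later command in the same search (such as `extract`, or `spath`) has to pick up the new structure. The event text here is illustrative, not taken from real data.

```python
import json

# Hypothetical rewritten event: Data is now a list instead of a quoted string.
rewritten = (
    '{"Comments": "New alert", '
    '"Data": [{"etype":"MalwareFamily","Status": "Investigation Started"}],'
    '"EntityType": "MalwareFamily"}'
)

event = json.loads(rewritten)
data = event["Data"]

# Data parses as a list containing one object with its own keys.
print(type(data).__name__, data[0]["etype"], data[0]["Status"])
```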


vn_g
Path Finder
| rex mode=sed "s/(\"Data\":\s+)\"/\1[/g s/(\"Data\":\s+\[{.*})\"/\1]/g s/\\\\\"/\"/g"
| extract pairdelim="\"{,}" kvdelim=":"

Thank you for your help. The above worked, but I want it implemented at index time, not at search time.


vn_g
Path Finder


1. https://regex101.com/r/jPZ4yy/1
2. https://regex101.com/r/PmwS2C/1
3. https://regex101.com/r/SBMRme/1

- First regex: I have provided a sample of 3 events (EntityValue, Name, Ids; anything in JSON format can appear).
- Third regex: sed works on _raw, but it should work only within the Data dictionary value. For example, (\"Comments\": \"New alert\", ) is also changed; nothing else should be formatted.
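Separately from the global unescape, the greedy `.*}` in the closing-bracket expression matches up to the last `}"` on the whole line, so if any later field also ends in `}"`, the `]` lands in the wrong place. This Python sketch compares that with anchoring on the end of the Data string literal (the first unescaped quote). The `Other` field and the sample event are hypothetical, chosen only to trigger the problem.

```python
import re

# Hypothetical event where a field after Data also ends in }".
RAW = r'"Data": "{\"etype\":\"MalwareFamily\"}","Other": "{\"x\":1}","EntityType": "MalwareFamily"'

# Greedy closing-bracket pattern, as in the sed expressions in this thread.
GREEDY = re.compile(r'("Data":\s+\[\{.*\})"')
# Alternative: the Data value as a JSON string literal (stops at first unescaped quote).
STRING_LITERAL = re.compile(r'("Data":\s*)"((?:[^"\\]|\\.)*)"')

# Greedy approach: open the bracket first, then close at the LAST }" on the line.
step1 = re.sub(r'("Data":\s+)"', r'\1[', RAW)
greedy_out = GREEDY.sub(r'\1]', step1)

# String-literal approach: replace the opening and closing quote of Data only.
literal_out = STRING_LITERAL.sub(
    lambda m: m.group(1) + "[" + m.group(2) + "]", RAW, count=1
)

print(greedy_out)
print(literal_out)
```

In `greedy_out` the `]` ends up after the Other value; in `literal_out` it closes Data, regardless of which field follows.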


ITWhisperer
SplunkTrust

Try something like this:

| rex mode=sed "s/(Data\": )\"/\1[/g s/}\"(, \"EntityType)/}]\1/g s/\\\\\"/\"/g"
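This suggestion anchors the closing bracket on the field that follows Data. A Python replay of the three expressions (with the closing replacement written as `}]\1`) shows what happens when EntityType comes next and when some other field does. Both events and the name `itw_replay` are hypothetical. The pattern expects a space after the comma (`, "EntityType`); the sample event in the question has `}","EntityType"` with no space, so it would need adjusting for that data.

```python
import re


def itw_replay(raw: str) -> str:
    # Expressions from the rex command above, after SPL string unescaping.
    raw = re.sub(r'(Data": )"', r'\1[', raw)               # opening quote -> [
    raw = re.sub(r'\}"(, "EntityType)', r'}]\1', raw)      # }" before EntityType -> }]
    raw = re.sub(r'\\"', '"', raw)                         # every \" -> "
    return raw


# Hypothetical events: EntityType right after Data, and a different field after it.
NEXT_IS_ENTITY = r'"Data": "{\"etype\":\"MalwareFamily\"}", "EntityType": "MalwareFamily"'
NEXT_IS_OTHER = r'"Data": "{\"etype\":\"MalwareFamily\"}", "Name": "x"'

print(itw_replay(NEXT_IS_ENTITY))
print(itw_replay(NEXT_IS_OTHER))
```

When another field follows Data, the `[` is opened but never closed, which is the case raised in the reply below.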

vn_g
Path Finder

EntityType will not always come right after Data. The order is random.


ITWhisperer
SplunkTrust

It is unlikely to be random, since it is generated by a system. There is likely to be some pattern to it, but if you do not share that information, it is unlikely that we will be able to guess it, and therefore would be wasting our time attempting to provide a solution until you provide sufficient relevant details.
