I have this kind of log:
Mar 18 02:32:19 MachineName python3[948]: DEBUG:root:... Dispatching: {'id': '<id>', 'type': 'threat-detection', 'entity': 'threat', 'origin': '<redacted>', 'nature': 'system', 'user': 'system', 'timestamp': '2025-03-17T19:32:17.974Z', 'threat': {'id': '<redacted_uuid>', 'maGuid': '<redacted_guid>', 'detectionDate': '2025-03-17T19:32:17.974Z', 'eventType': 'Threat Detection Summary', 'threatType': 'non-pe-file', 'threatAttrs': {'name': '<filename>.ps1', 'path': 'C:\\Powershell\\Report\\<filename>.ps1', 'md5': '<redacted_hash>', 'sha1': '<redacted_hash>', 'sha256': '<redacted_hash>'}, 'interpreterFileAttrs': {'name': 'powershell.exe', 'path': 'C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe', 'md5': '097CE5761C89434367598B34FE32893B', 'sha1': '044A0CF1F6BC478A7172BF207EEF1E201A18BA02', 'sha256': 'BA4038FD20E474C047BE8AAD5BFACDB1BFC1DDBE12F803F473B7918D8D819436'}, 'severity': 's1', 'rank': '100', 'score': '50', 'detectionTags': ['@ATA.Discovery', '@ATA.Execution', '@ATE.T1083', '@ATE.T1059.001', '@MSI._apt_file_psgetfiles', '@ATA.CommandAndControl', '@ATE.T1102.003', '@MSI._process_PS_public_repos', '@MSI._process_ps_getchilditem', '@ATE.T1105', '@ATE.T1071.001', '@MSI._process_pswebrequest_remotecopy', '@ATA.DefenseEvasion', '@ATE.T1112', '@MSI._reg_ep0029_intranet'], 'contentVersion': None}, 'firstDetected': '2025-03-17T19:32:17.974Z', 'lastDetected': '2025-03-17T19:32:17.974Z', 'tenant-id': '<redacted_tenant_id>', 'transaction-id': '<redacted_transaction_id>'}
I want "Dispatching" to be required text, so the transformation is only applied to logs that contain this keyword.
I want to parse the JSON part so I can use its fields, like json_data.threatAttrs.name.
Any suggestions? I tried the regex editor UI, but it broke down because it couldn't differentiate the two "name" fields, since the same field name appears in both threatAttrs and interpreterFileAttrs. So I am thinking of using props.conf and transforms.conf, but I don't know how.
Any help would be appreciated!
Hi,
One option would be to:
1 - Get rid of whatever data comes before the valid JSON.
For the example you posted, we can ask Splunk to delete this:
Mar 18 02:32:19 MachineName python3[948]: DEBUG:root:... Dispatching:
I'd use this in props.conf:
SEDCMD-removeheader=s/.*DEBUG:root:\.\.\. Dispatching: //g
2 - Replace single quotes with double quotes.
Still in props.conf:
SEDCMD-replace_simple_quotes=s/'/"/g
3 - Activate JSON field extraction:
KV_MODE=json
The props.conf could look like this:
[custom_sourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
category=Custom
pulldown_type=true
SEDCMD-removeheader=s/.*DEBUG:root:\.\.\. Dispatching: //g
SEDCMD-replace_simple_quotes=s/'/"/g
KV_MODE=json
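If you want to sanity-check what those two SEDCMD substitutions do before deploying, here is a rough Python equivalent (the sample line below is illustrative, not the full event):

```python
import json
import re

# Illustrative sample in the same shape as the original log line
line = ("Mar 18 02:32:19 MachineName python3[948]: DEBUG:root:... "
        "Dispatching: {'severity': 's1', 'score': '50'}")

# Equivalent of SEDCMD-removeheader: strip everything up to the payload
payload = re.sub(r".*DEBUG:root:\.\.\. Dispatching: ", "", line)

# Equivalent of SEDCMD-replace_simple_quotes: single -> double quotes
payload = payload.replace("'", '"')

# For this simple case the result is valid JSON
print(json.loads(payload)["severity"])  # s1
```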
It works in my lab.
Best,
Ch.
Unfortunately, your data is of the "ugly" kind - JSON content with additional non-JSON elements - so you cannot use native JSON parsing.
There is an idea - https://ideas.splunk.com/ideas/EID-I-208 - currently in a "future prospect" state, so we can hope this behaviour will change and there will be an easy way to manipulate such data. But for now you have more or less three ways of handling it:
1) Strip the non-JSON part so that the whole of the event you have left is a well-formed JSON structure (kinda what @gargantua suggested). Of course, this way you're bound to lose some data.
2) Do manual regex-based extractions. It's rarely a good idea to attack structured data with regexes; it usually ends in tears sooner or later.
3) Use explicit SPL to parse out the JSON part into a field and then throw spath at that field so the JSON gets parsed. Unfortunately, this complicates your search and hurts performance badly, since you have to parse all events to find the matching ones.
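As a side note on option 3: the payload here is actually a Python dict repr rather than JSON, so outside Splunk the extract-then-parse step can be sketched without any quote rewriting at all (the sample line below is illustrative):

```python
import ast
import re

# Illustrative line in the same shape as the event from the question
line = ("Mar 18 02:32:19 MachineName python3[948]: DEBUG:root:... "
        "Dispatching: {'threat': {'threatAttrs': {'name': 'report.ps1'}, "
        "'severity': 's1'}, 'contentVersion': None}")

# Only lines containing the required "Dispatching" keyword are parsed
match = re.search(r"Dispatching:\s*({.*})", line)
if match:
    # literal_eval handles single quotes and None natively,
    # since the payload is a Python literal, not JSON
    data = ast.literal_eval(match.group(1))
    print(data["threat"]["threatAttrs"]["name"])  # report.ps1
```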
Oh, unless you can make very strong assumptions about your data, you're in for a treat.
1. You will replace any escaped single quotes that might be in the original data (and no, a negative lookbehind for a single backslash will not cut it).
2. You will not replace any unescaped double quotes from the original data (and again, finding them and properly escaping them is not so easy in the general case - see point 1).
Long story short - don't manipulate structured data with regexes!
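To make point 1 concrete, here is a contrived value containing both quote types; the naive single-to-double replacement produces invalid JSON, while parsing the repr as a Python literal still works:

```python
import ast
import json

# Contrived payload whose value contains an escaped single quote
# and an embedded double quote, as Python's repr would emit it
payload = repr({'msg': 'it\'s a "test"'})
# payload is now: {'msg': 'it\'s a "test"'}

# The naive quote swap also mangles the escape and the inner quotes
naive = payload.replace("'", '"')
try:
    json.loads(naive)
except json.JSONDecodeError as err:
    print("broken JSON:", err)

# The original repr is still a valid Python literal
print(ast.literal_eval(payload)["msg"])  # it's a "test"
```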
Once the single quotes are converted, the JSON is valid and you can extract the required fields in Splunk using spath:
| makeresults
| eval _raw="Mar 18 02:32:19 MachineName python3[948]: DEBUG:root:... Dispatching: {'id': '<id>', 'type': 'threat-detection', 'entity': 'threat', 'origin': '<redacted>', 'nature': 'system', 'user': 'system', 'timestamp': '2025-03-17T19:32:17.974Z', 'threat': {'id': '<redacted_uuid>', 'maGuid': '<redacted_guid>', 'detectionDate': '2025-03-17T19:32:17.974Z', 'eventType': 'Threat Detection Summary', 'threatType': 'non-pe-file', 'threatAttrs': {'name': '<filename>.ps1', 'path': 'C:\\Powershell\\Report\\<filename>.ps1', 'md5': '<redacted_hash>', 'sha1': '<redacted_hash>', 'sha256': '<redacted_hash>'}, 'interpreterFileAttrs': {'name': 'powershell.exe', 'path': 'C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe', 'md5': '097CE5761C89434367598B34FE32893B', 'sha1': '044A0CF1F6BC478A7172BF207EEF1E201A18BA02', 'sha256': 'BA4038FD20E474C047BE8AAD5BFACDB1BFC1DDBE12F803F473B7918D8D819436'}, 'severity': 's1', 'rank': '100', 'score': '50', 'detectionTags': ['@ATA.Discovery', '@ATA.Execution'], 'contentVersion': null}, 'firstDetected': '2025-03-17T19:32:17.974Z', 'lastDetected': '2025-03-17T19:32:17.974Z', 'tenant-id': '<redacted_tenant_id>', 'transaction-id': '<redacted_transaction_id>'}"
| rex field=_raw "Dispatching:\s*(?<json_data>{.*})"
| eval json_data = replace(json_data, "'", "\"")
| eval json_data = replace(json_data, "\\\\", "\\\\\\\\")
| spath input=json_data path=threat.threatAttrs.name output=threat_filename
| spath input=json_data path=threat.threatAttrs.path output=threat_filepath
| spath input=json_data path=threat.severity output=threat_severity
| spath input=json_data path=threat.score output=threat_score
| table threat_filename, threat_filepath, threat_severity, threat_score
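One note on the backslash-doubling eval in that search: in JSON, a lone backslash starts an escape sequence, so a Windows path like C:\Powershell (where \P is not a valid escape) is rejected until the backslashes are doubled. A minimal Python illustration of the same idea:

```python
import json

# A lone backslash makes an invalid JSON escape sequence (\P, \R, ...)
broken = r'{"path": "C:\Powershell\Report\x.ps1"}'
try:
    json.loads(broken)
except json.JSONDecodeError:
    print("single backslashes are rejected")

# Doubling each backslash (what the SPL replace() does) fixes it
fixed = broken.replace("\\", "\\\\")
print(json.loads(fixed)["path"])  # C:\Powershell\Report\x.ps1
```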
Hi @kiran_panchavat ,
Your solution works for extracting the data, but can this be scaled more broadly using props.conf and transforms.conf? With this approach, if I want to extract all the fields, I'll need the same number of lines in the search, which may work but gets really long.
Yes, you can definitely use props.conf and transforms.conf to scale this more broadly and make your field extractions more manageable.
Do you know how to do that? I know it's possible, I just don't know how.
Check whether json_data is correctly extracted:
| makeresults
| eval _raw="Mar 18 02:32:19 MachineName python3[948]: DEBUG:root:... Dispatching: {'id': '<id>', 'type': 'threat-detection', 'entity': 'threat', 'origin': '<redacted>', 'nature': 'system', 'user': 'system', 'timestamp': '2025-03-17T19:32:17.974Z', 'threat': {'id': '<redacted_uuid>', 'maGuid': '<redacted_guid>', 'detectionDate': '2025-03-17T19:32:17.974Z', 'eventType': 'Threat Detection Summary', 'threatType': 'non-pe-file', 'threatAttrs': {'name': '<filename>.ps1', 'path': 'C:\\Powershell\\Report\\<filename>.ps1', 'md5': '<redacted_hash>', 'sha1': '<redacted_hash>', 'sha256': '<redacted_hash>'}, 'interpreterFileAttrs': {'name': 'powershell.exe', 'path': 'C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe', 'md5': '097CE5761C89434367598B34FE32893B', 'sha1': '044A0CF1F6BC478A7172BF207EEF1E201A18BA02', 'sha256': 'BA4038FD20E474C047BE8AAD5BFACDB1BFC1DDBE12F803F473B7918D8D819436'}, 'severity': 's1', 'rank': '100', 'score': '50', 'detectionTags': ['@ATA.Discovery', '@ATA.Execution'], 'contentVersion': null}, 'firstDetected': '2025-03-17T19:32:17.974Z', 'lastDetected': '2025-03-17T19:32:17.974Z', 'tenant-id': '<redacted_tenant_id>', 'transaction-id': '<redacted_transaction_id>'}"
| rex field=_raw "Dispatching:\s*(?<json_data>{.*})"
| eval json_data = replace(json_data, "'", "\"")
| eval json_data = replace(json_data, "\\\\", "\\\\\\\\")
| eval json_data = replace(json_data, "None", "null")
| table json_data
Output:
{
  "id": "<id>",
  "type": "threat-detection",
  "entity": "threat",
  "origin": "<redacted>",
  "nature": "system",
  "user": "system",
  "timestamp": "2025-03-17T19:32:17.974Z",
  "threat": {
    "id": "<redacted_uuid>",
    "maGuid": "<redacted_guid>",
    "detectionDate": "2025-03-17T19:32:17.974Z",
    "eventType": "Threat Detection Summary",
    "threatType": "non-pe-file",
    "threatAttrs": {
      "name": "<filename>.ps1",
      "path": "C:\\Powershell\\Report\\<filename>.ps1",
      "md5": "<redacted_hash>",
      "sha1": "<redacted_hash>",
      "sha256": "<redacted_hash>"
    },
    "interpreterFileAttrs": {
      "name": "powershell.exe",
      "path": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
      "md5": "097CE5761C89434367598B34FE32893B",
      "sha1": "044A0CF1F6BC478A7172BF207EEF1E201A18BA02",
      "sha256": "BA4038FD20E474C047BE8AAD5BFACDB1BFC1DDBE12F803F473B7918D8D819436"
    },
    "severity": "s1",
    "rank": "100",
    "score": "50",
    "detectionTags": [
      "@ATA.Discovery",
      "@ATA.Execution"
    ],
    "contentVersion": null
  },
  "firstDetected": "2025-03-17T19:32:17.974Z",
  "lastDetected": "2025-03-17T19:32:17.974Z",
  "tenant-id": "<redacted_tenant_id>",
  "transaction-id": "<redacted_transaction_id>"
}
