Splunk Search

Applying multiple regex to different fields (match_type REGEX for lookups)?

wiederkehrc
Explorer
Hi,
we have a data-model-based search that we filter with a lookup (match_type WILDCARD) that matches on several fields:
| tstats count, values(Processes.dest) as dest, dc(Processes.dest) as dest_dc, min(_time) as earliest, max(_time) as latest, values(Processes.user) as user, dc(Processes.user) as user_dc from datamodel=Endpoint.Processes by Processes.process_guid Processes.parent_process_guid Processes.parent_process Processes.parent_process_path Processes.process Processes.process_path Processes.process_hash Processes.user 
| rex field=Processes.process_hash "MD5=(?<process_md5>[A-Z0-9]+)" 
| `drop_dm_object_name(Processes)` 
| lookup sysmon_rules parent_process parent_process_path process process_path process_md5 OUTPUT description score
This works well and saves us from having multiple searches in place, but it would be great if there were something like a match_type REGEX for lookups. We could then combine several entries in the lookup into a single line.
For example those 4 lines:
score,description,parent_process_path,parent_process,process_path,process,process_md5
80,Office: Execution MSHTA,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*\mshta.exe,*,*
80,Office: Execution PWSH,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*\powershell.exe,*,*
80,Office: Execution WSCRIPT,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*\wscript.exe,*,*
80,Office: Execution CMD,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*\cmd.exe,*,*
could be combined to:
score,description,parent_process_path,parent_process,process_path,process,process_md5
80,Office: Execution susp child,(?i)C:\Program Files (x86)\Microsoft Office\root\Office.*,.*,(cmd.exe|wscript.exe|powershell.exe|mshta.exe),.*,.*
We want to keep the ability to match against multiple fields. Is there a trick (using inputlookup, map, ...) to optimize this? We're at a point where the lookup is getting cluttered because of small variations of the processes in the Endpoint data model that we would like to alert on.
 
Hints, tips & help are appreciated.
Chris

yuanliu
SplunkTrust

Have you considered data normalization?  Instead of one lookup, use orthogonal lookups.

sysmon_rules

 

score,description,parent_process_path,parent_process,path_match,process,process_md5
80,Office: Execution susp child 1,C:\Program Files (x86)\Microsoft Office\root\Office*,*,susp 1,*,*

susp_child

process_path,susp_child
*\mshta.exe,susp 1
*\powershell.exe,susp 1
*\wscript.exe,susp 1
*\cmd.exe,susp 1
*\someother.exe,susp 2

 

Then, use this search

 

| tstats count, values(Processes.dest) as dest, dc(Processes.dest) as dest_dc,
 min(_time) as earliest, max(_time) as latest, values(Processes.user) as user,
 dc(Processes.user) as user_dc from datamodel=Endpoint.Processes
 by Processes.process_guid Processes.parent_process_guid Processes.parent_process
 Processes.parent_process_path Processes.process Processes.process_path
 Processes.process_hash Processes.user 
| rex field=Processes.process_hash "MD5=(?<process_md5>[A-Z0-9]+)" 
| `drop_dm_object_name(Processes)`
| lookup susp_child process_path OUTPUT susp_child
| lookup sysmon_rules parent_process parent_process_path process path_match AS susp_child process_md5 OUTPUT description score
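
Rows that match no rule come back from the lookup without a description or score. If only the hits are of interest, a filter like this can be appended to the search above (a sketch, assuming score is populated for every rule in sysmon_rules):

```spl
| where isnotnull(score)
```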

 

 

wiederkehrc
Explorer

Thank you for your suggestion. This solution works from a technical point of view and removes some redundant information from the lookup. The consequence is that the analysts who maintain and modify the rules will need to manage multiple lookups, which adds a layer of complexity to the search.


yuanliu
SplunkTrust

I thought analysts were such data nerds that they could normalize a database with their hands tied behind their backs😉

But OK if they are not. All you need to do is design a language that they can work with, then produce the actual lookup programmatically.

For example, they can use "|" to represent a logical OR, as in many programming languages, and enter the following rule:

description,parent_process,parent_process_path,process,process_md5,process_path,score
Office: Execution susp child,*,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*,cmd.exe|wscript.exe|powershell.exe|mshta.exe,80

Save this to analyst_rule.csv.  Then run the following:

 

| inputcsv analyst_rule.csv
| eval process_path = split(process_path, "|")
| mvexpand process_path
| eval description = description . " (" . replace(process_path, "\..+", "") . ")" ``` this is perhaps unnecessary ```
| outputlookup real_rule

 

The real_rule table will look like

description,parent_process,parent_process_path,process,process_md5,process_path,score
Office: Execution susp child (cmd),*,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*,cmd.exe,80
Office: Execution susp child (wscript),*,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*,wscript.exe,80
Office: Execution susp child (powershell),*,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*,powershell.exe,80
Office: Execution susp child (mshta),*,C:\Program Files (x86)\Microsoft Office\root\Office*,*,*,mshta.exe,80

This is just one possible way to improve usability.
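
The same split/mvexpand pattern extends to more than one column. A minimal sketch, assuming analysts also enter "|"-separated alternatives in parent_process_path (that second multi-value column is an assumption for illustration; the file and lookup names are the ones used above). Each split/mvexpand pair multiplies the rows, so one analyst line becomes the full set of combinations:

```spl
| inputcsv analyst_rule.csv
| eval parent_process_path = split(parent_process_path, "|")
| mvexpand parent_process_path
| eval process_path = split(process_path, "|")
| mvexpand process_path
| outputlookup real_rule
```

A cell without "|" stays a single value and passes through mvexpand unchanged, so existing one-value rules are unaffected. Note that with match_type WILDCARD, a bare value such as cmd.exe presumably will not match a full process path; the alternatives likely need the same leading wildcard used in the original lookup (for example *\cmd.exe).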
