I have a search in which I need to exclude a large number of values (about 25) from a proctitle command field. I am currently using:
NOT proctitle IN ("*<proc1>*", "*<proc2>*", ......., "*<proc25>*")
I'm worried about performance on the search head and am looking for ways to lower the CPU and memory burden.
I have two possible solutions:
1) Create a data model and place this search as a constraint.
2) Tag events on ingest with proctitle IN ("*<proc1>*", "*<proc2>*", ......., "*<proc25>*") and use this tag as a constraint in the data model.
I've played with #1. Is #2 possible, and is there a more efficient way to do this?
Thanks in advance.
This will partly depend on what proportion of the total data you are looking to exclude. If the excluded proctitles make up a significant proportion of the data, then filtering afterwards with a where or regex clause may not perform well, but you will have to experiment with that.
Setting tags will still involve a search-time extraction to evaluate the tag, so under the hood the same search is still being done.
You might want to look at the TERM directive - see this link
https://conf.splunk.com/files/2020/slides/PLA1089C.pdf
You will need to understand what constitutes a TERM in your data and whether that will work for your use case, but that can significantly improve performance.
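As a rough way to check which exclusion strings could work with TERM, the sketch below splits a sample value on major breakers and tests whether a candidate appears as a whole token. The breaker set is a simplified assumption (whitespace plus a handful of punctuation characters), not the full default list from segmenters.conf, and the helper names are hypothetical.

```python
# Simplified, assumed subset of Splunk's default major breakers.
# The real list lives in segmenters.conf and contains more entries.
MAJOR_BREAKERS = set(" \t\r\n[]<>(){}|!;,'\"*&?+")


def major_tokens(raw: str) -> list[str]:
    """Split text on the assumed major breakers, dropping empty pieces."""
    tokens, current = [], []
    for ch in raw:
        if ch in MAJOR_BREAKERS:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def is_bounded_term(candidate: str, raw: str) -> bool:
    """True if candidate appears in raw as a whole major-breaker-bounded token."""
    return candidate in major_tokens(raw)


sample = 'proctitle="/bin/chmod 440 /etc/sudoers"'
print(major_tokens(sample))
print(is_bounded_term("/bin/chmod", sample), is_bounded_term("chmod", sample))
```

In this model, /bin/chmod is a whole token in the sample while chmod on its own is not, because "/" is a minor breaker rather than a major one.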
When you are looking at this type of performance issue, open the job inspector and check the scan count values in the job properties: the more events you scan, the more data you are having to check.
You could go down the indexed extraction route, where you set a field at index time. That is somewhat static, so it won't help if you later need to exclude a new proctitle, but it will improve search performance at the cost of indexing performance and disk space.
Double-wildcarded strings are not very efficient. Could you perhaps extract the "proc" values into a field and then use a where command to exclude the events with the undesired values?
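To illustrate the idea outside Splunk, here is a minimal sketch of the logic: pull the command name out of the full command line once, then do an exact set lookup instead of 25 substring matches. It assumes the command is the first whitespace-separated token of proctitle; the function names and the contents of the exclusion set are hypothetical.

```python
import os

# Hypothetical exclusion list; the real one would hold the vetted process names.
EXCLUDED_COMMANDS = {"chmod"}


def command_name(proctitle: str) -> str:
    """Return the base name of the first token, e.g. '/bin/chmod 440 x' -> 'chmod'."""
    parts = proctitle.split()
    return os.path.basename(parts[0]) if parts else ""


def keep_event(proctitle: str) -> bool:
    """Keep the event only if its command is not in the exclusion set."""
    return command_name(proctitle) not in EXCLUDED_COMMANDS


print(keep_event("/bin/chmod 440 /etc/sudoers"))
print(keep_event("/usr/bin/vi /etc/sudoers"))
```

The exact-match lookup is what makes this cheaper than double wildcards, but it only works if the value to exclude can be isolated reliably from the command line.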
Hmm, here's my dilemma. My field called proctitle has the entire command in it. One example is proctitle="/bin/chmod 440 /etc/sudoers", where I want to exclude the chmod term. I have 32 such terms I need to exclude.
I'll share with you that I am attempting to develop a Linux auditd detection for Account Manipulation per the MITRE ATT&CK framework https://attack.mitre.org/techniques/T1098/. This search will look for attempts to modify the sshd_config, passwd, groups, shadow and sudoers files. In examining existing data, I have determined that there are legitimate processes (the 32 terms mentioned) in the proctitle field of the event data that will trigger this alert. (It was a tedious effort, but I traced through the parent process IDs to justify this list.) If I eliminate these 32, 99% of my noise is filtered out.
Most of my terms are bounded by major breakers. The example I used is not, but if I use /bin/chmod instead of chmod, it would work. Let me try this and report back.
This worked. I was able to develop a data model that included the following as a constraint:
NOT (TERM(proc1) OR TERM(proc2) OR ...........OR TERM(procn))
Thanks,
Tom
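For anyone maintaining a long exclusion list, a small helper can generate the constraint string in the form shown above rather than editing it by hand. This is only a sketch: the function name is hypothetical, /usr/sbin/usermod is a made-up second entry, and the breaker check uses a simplified assumed subset of major breakers.

```python
# Simplified, assumed subset of major breakers; a term containing one of
# these would not be a single indexed token, so it is rejected here.
MAJOR_BREAKERS = set(" \t\r\n[]<>(){}|!;,'\"*&?+")


def build_term_exclusion(terms: list[str]) -> str:
    """Build 'NOT (TERM(a) OR TERM(b) ...)' from a list of exclusion terms."""
    if not terms:
        raise ValueError("at least one term is required")
    for term in terms:
        if not term or any(ch in MAJOR_BREAKERS for ch in term):
            raise ValueError(f"not usable as a single TERM: {term!r}")
    return "NOT (" + " OR ".join(f"TERM({t})" for t in terms) + ")"


# '/usr/sbin/usermod' is an illustrative entry, not one from the thread.
print(build_term_exclusion(["/bin/chmod", "/usr/sbin/usermod"]))
```

The generated string can then be pasted into the data model constraint, and the list itself kept under version control.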