Can I use a regex in a static lookup table? I want to filter some alerts that trigger frequently, like:

Substantial Increase In [AC14-2.1] Prevent modification of system files - Caller MD5=41e25e514d90e9c8bc570484dbaff62b Events

I have already created a lookup table called signaturecheck.csv and added all common signatures, so that no signature present in the table will fire. But for this particular signature the MD5 value changes frequently, so I want to know whether I can add a regex expression to the lookup table to filter it.
Here is another technique for fuzzy matching on multi-row outputs, but without using a temp file, to avoid same-file contention when many concurrent searches call the code at once. We are using versions of this for some RBA enrichment macros. Thanks @japger_splunk for the tip on multireport and @woodcock for the map example!
| makeresults
| eval src="220.127.116.11;18.104.22.168;22.214.171.124;126.96.36.199;188.8.131.52"
| makemv src delim=";"
| mvexpand src
`comment("BEGIN ENRICHMENT BLOCK")`
`comment("REMEMBER ROW ORDER AND MARK AS ORIGINAL DATA")`
| eval original_row=1
| streamstats count AS marker
`comment("FORK THE SEARCH: FIRST TO PRESERVE RESULTS, SECOND TO COMPARE EACH ROW AGAINST LOOKUP")`
| multireport
    [ ]
    `comment("FOR EACH ROW, RUN FUZZY MATCH AGAINST LOOKUP AND SUMMARIZE THE RESULTS")`
    [| map maxsearches=99999 search="
        | inputlookup notable_cache
        | eval marker=$marker$, src=$src$
        | eval match=if(like(raw,\"%\".src.\"%\"), 1, 0)
        | where match==1
        | eval age_days = (now()-info_search_time)/86400
        | eval in_notable_7d=if(age_days<=7,1,0), in_notable_30d=if(age_days<=30,1,0)
        | stats values(marker) AS marker, sum(in_notable_7d) AS in_notable_7d_count, sum(in_notable_30d) AS in_notable_30d_count BY src "]
`comment("INTERLEAVE THE ORIGINAL RESULTS WITH THE LOOKUP MATCH RESULTS")`
| sort 0 marker, in_notable_30d_count, in_notable_7d_count
`comment("TRANSPOSE DATA FROM ABOVE")`
| streamstats current=f window=1 last(in_notable_30d_count) AS prev_in_notable_30d_count, last(in_notable_7d_count) AS prev_in_notable_7d_count
`comment("GET RID OF THE LOOKUP RESULTS")`
| where original_row==1
`comment("CLEAN UP THE DATA")`
| rename prev_in_notable_30d_count AS in_notable_30d_count, prev_in_notable_7d_count AS in_notable_7d_count
| fillnull value=0 in_notable_30d_count, in_notable_7d_count
| fields - original_row, marker
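For readers outside Splunk, the core of this enrichment (a substring "fuzzy" match of each src against a cache, bucketed into 7-day and 30-day windows) can be sketched in plain Python. The cache rows and src value below are made up for illustration; the field names mirror the SPL above.

```python
import time

DAY = 86400
now = time.time()

# Hypothetical stand-in for the notable_cache lookup: each row has the
# cached raw text and the epoch time the originating search ran.
notable_cache = [
    {"raw": "alert from 220.127.116.11 blocked", "info_search_time": now - 3 * DAY},
    {"raw": "scan by 220.127.116.11 observed",   "info_search_time": now - 20 * DAY},
    {"raw": "login from 10.0.0.9",               "info_search_time": now - 2 * DAY},
]

def enrich(src, cache):
    """Count cache rows whose raw text contains src, split into
    7-day and 30-day windows (like in_notable_7d/30d above)."""
    counts = {"in_notable_7d_count": 0, "in_notable_30d_count": 0}
    for row in cache:
        if src in row["raw"]:  # the like(raw, "%src%") fuzzy match
            age_days = (time.time() - row["info_search_time"]) / DAY
            if age_days <= 7:
                counts["in_notable_7d_count"] += 1
            if age_days <= 30:
                counts["in_notable_30d_count"] += 1
    return counts

print(enrich("220.127.116.11", notable_cache))
# one matching row is ~3 days old, another ~20 days old
```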
I really hate to destroy such a wonderful, evil answer as @woodcock's, but I hate the map command more, and as Yoda should have said, "Another, there is."
You can filter a set of results by building a single composite regex in a subsearch and feeding it back to the search. Here's a working sample that we ran on one of our systems...
index=foo "someuserid"
| regex
    [| noop
    | stats count AS host
    | eval host="hostname1,partialhostname2,I haz spaces,I sed \"I haz spaces\""
    | makemv delim="," host
    | mvexpand host
    | rex mode=sed field=host "s/ /!?!?!?/g"
    | format "" "" "" "" "|" ""
    | rex mode=sed field=search "s/host=\"//g s/\" / /g s/ //g s/!\?!\?!\?/ /g s/(.)$/\1)\"/g"
    | eval search= "host=\"(".search
    | fields search]
The subsearch in brackets [ ] returns a single field, search, that has the value

host="(hostname1|partialhostname2|I haz spaces|I sed \"I haz spaces\")"
...so after substitution, the full line is...
| regex host="(hostname1|partialhostname2|I haz spaces|I sed \"I haz spaces\")"
... and it operates as expected. We've also verified that spaces and internal quotes work as expected.
There is no reason you couldn't build one from your reasonably sized lookup table, as long as the resulting string doesn't run past Splunk's subsearch limits.
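The composite-regex idea itself is not Splunk-specific: collect the values, build one alternation, and keep only rows whose field matches it, unanchored, so partial values still hit. A minimal Python sketch, with hypothetical host values standing in for the subsearch output:

```python
import re

# Hypothetical host list standing in for the subsearch's host field
hosts = ["hostname1", "partialhostname2", "I haz spaces", 'I sed "I haz spaces"']

# Build one alternation, escaping each value, like the subsearch builds
# host="(hostname1|partialhostname2|...)" for the regex command.
pattern = re.compile("(" + "|".join(re.escape(h) for h in hosts) + ")")

events = [
    {"host": "hostname1", "msg": "ok"},
    {"host": "somepartialhostname2here", "msg": "partial match"},
    {"host": "unrelated", "msg": "filtered out"},
]

# Splunk's regex command keeps events where the field matches (unanchored),
# so a partial match inside a longer host name survives too.
kept = [e for e in events if pattern.search(e["host"])]
print([e["host"] for e in kept])
```

Note the re.escape: without it, a host containing regex metacharacters would corrupt the alternation, which is the same reason the SPL version has to be careful with quotes and spaces.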
The sample above also demonstrates a workaround (" " -> "!?!?!?" and back) that preserves internal spaces while the output of format is stripped of whitespace.
It takes a little practice to get the regex right, but you can run the subsearch standalone (without the brackets) as many times as you like until you are satisfied with the result.
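That space-placeholder trick can be illustrated on its own: swap each space for a token that cannot occur in the data, run the whitespace-mangling step, then swap the token back. A small sketch, using the same "!?!?!?" placeholder as the sed expressions above:

```python
# The SPL swaps spaces for a placeholder so format's output can be
# stripped of all whitespace without destroying spaces inside host
# names, then swaps them back at the end.
PLACEHOLDER = "!?!?!?"

hosts = ["hostname1", "I haz spaces"]
protected = [h.replace(" ", PLACEHOLDER) for h in hosts]

# Whitespace-mangling step (stands in for the sed cleanup of format output):
# after protection, removing every space is now safe.
joined = "|".join(protected).replace(" ", "")

# Restore the real spaces at the end
restored = joined.replace(PLACEHOLDER, " ")
print(restored)
```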
No, but you can do it "inside-out" by manually iterating with map like this (assuming signaturecheck.csv has a field called RegEx and the events have a field called MD5; just replace the <your search here> part with your actual search):
| inputcsv MyTemporaryFile.csv
| search NOT
    [ <your search here>
    | streamstats count AS serial
    | outputcsv MyTemporaryFile.csv
    | stats count AS Dr0pM3
    | append [| inputcsv signaturecheck.csv ]
    | where isnull(Dr0pM3)
    | map maxsearches=99999 search="
        | inputcsv MyTemporaryFile.csv
        | eval Dr0pM3 = if(match(MD5, \"$RegEx$\"), \"DROP\", null())
        | where isnotnull(Dr0pM3)
        | fields serial
        | fields - _* "
    | stats values(serial) AS serial ]
I tested this monstrosity as follows. First, generate a signature file:
| noop | stats count AS RegEx | eval RegEx="ab,cd" | makemv delim="," RegEx | mvexpand RegEx | outputcsv signaturecheck.csv
It should look like this (| inputcsv signaturecheck.csv):

RegEx
ab
cd
Verify that the following search (it gets embedded in the next search, which is where your original search will go) generates 10 contrived events to filter:
| noop | stats count AS raw | eval raw="a,b,c,ab,bc,cd,abc,bcd,cde,def" | makemv delim="," raw | mvexpand raw | streamstats count AS _time | eval MD5=raw | rename raw AS _raw
It should look like this:

_raw  MD5  _time
a     a    1969-12-31 18:00:01
b     b    1969-12-31 18:00:02
c     c    1969-12-31 18:00:03
ab    ab   1969-12-31 18:00:04
bc    bc   1969-12-31 18:00:05
cd    cd   1969-12-31 18:00:06
abc   abc  1969-12-31 18:00:07
bcd   bcd  1969-12-31 18:00:08
cde   cde  1969-12-31 18:00:09
def   def  1969-12-31 18:00:10
Lastly, put it together like this:
| inputcsv MyTemporaryFile.csv
| search NOT
    [ | noop
    | stats count AS raw
    | eval raw="a,b,c,ab,bc,cd,abc,bcd,cde,def"
    | makemv delim="," raw
    | mvexpand raw
    | streamstats count AS _time
    | eval MD5=raw
    | rename raw AS _raw
    | table _time MD5 _raw
    | streamstats count AS serial
    | outputcsv MyTemporaryFile.csv
    | stats count AS Dr0pM3
    | append [| inputcsv signaturecheck.csv ]
    | where isnull(Dr0pM3)
    | map maxsearches=99999 search="
        | inputcsv MyTemporaryFile.csv
        | eval Dr0pM3 = if(match(MD5, \"$RegEx$\"), \"DROP\", null())
        | where isnotnull(Dr0pM3)
        | fields serial
        | fields - _* "
    | stats values(serial) AS serial ]
When tested, I got these correct results:
MD5  _raw  _time                serial
a    a     1969-12-31 18:00:01  1
b    b     1969-12-31 18:00:02  2
c    c     1969-12-31 18:00:03  3
bc   bc    1969-12-31 18:00:05  5
def  def   1969-12-31 18:00:10  10
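Outside Splunk, the drop-if-any-table-regex-matches logic that the map block implements can be reproduced in a few lines of Python using the same contrived data, which makes the expected kept set easy to check:

```python
import re

# Stand-ins for signaturecheck.csv (RegEx column) and the contrived events
regex_table = ["ab", "cd"]
events = [{"serial": i + 1, "MD5": v}
          for i, v in enumerate(["a", "b", "c", "ab", "bc", "cd",
                                 "abc", "bcd", "cde", "def"])]

# Drop an event if any pattern from the table matches its MD5
# (unanchored, like match() in the SPL above).
patterns = [re.compile(p) for p in regex_table]
kept = [e for e in events
        if not any(p.search(e["MD5"]) for p in patterns)]
print([e["serial"] for e in kept])
# serials 1, 2, 3, 5, and 10 survive, matching the table above
```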
This is the most gruesomely horribad Splunk solution that I have ever crafted, but I do not see any other way to do it. What I especially like is the fact that the first part of the search references a file that DOES NOT EVEN EXIST, but because subsearches always run first, it gets created before that part of the search gets a chance to start. Bwahahahahahahahahahahahahahahahaha!!!!!