Hi,
Can I use a regex in a static lookup table? I want to filter some alerts that trigger frequently, like:
Substantial Increase In [AC14-2.1] Prevent modification of system files - Caller MD5=41e25e514d90e9c8bc570484dbaff62b Events
I have already created a lookup table called signaturecheck.csv and added all of the common signatures, so the alert won't fire for any signature that appears in signaturecheck.csv. But for this particular signature the MD5 value changes frequently, so I want to know whether I can add a regular expression to the lookup table to filter it.
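For reference, the exact-match filtering I have today looks roughly like this (a minimal sketch only; I am assuming the lookup and the alerts share a field called signature):
<your alert search>
| lookup signaturecheck.csv signature OUTPUT signature AS known_signature
| where isnull(known_signature)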
Regards,
Here is another technique for fuzzy matching on multi-row outputs that avoids a temp file, so there is no file contention when many searches run the code at once. We are using versions of this for some RBA enrichment macros. Thanks @japger_splunk for the multireport tip and @woodcock for the map example!
| makeresults
| eval src="1.1.1.1;2.2.2.2;3.3.3.3;4.4.4.4;5.5.5.5"
| makemv src delim=";"
| mvexpand src `comment("
BEGIN ENRICHMENT BLOCK")`
`comment("REMEMBER ROW ORDER AND MARK AS ORIGINAL DATA")`
| eval original_row=1
| streamstats count AS marker
`comment("FORK THE SEARCH: FIRST TO PRESERVE RESULTS, SECOND TO COMPARE EACH ROW AGAINST LOOKUP")`
| multireport
[ ]
`comment("FOR EACH ROW, RUN FUZZY MATCH AGAINST LOOKUP AND SUMMARIZE THE RESULTS")`
[| map maxsearches=99999 search="
| inputlookup notable_cache
| eval marker=$marker$, src=\"$src$\"
| eval match=if(like(raw,\"%\".src.\"%\"), 1, 0)
| where match==1
| eval age_days = (now()-info_search_time)/86400
| eval in_notable_7d=if(age_days<=7,1,0), in_notable_30d=if(age_days<=30,1,0)
| stats values(marker) AS marker, sum(in_notable_7d) AS in_notable_7d_count, sum(in_notable_30d) AS in_notable_30d_count BY src
"]
`comment("INTERLEAVE THE ORIGINAL RESULTS WITH THE LOOKUP MATCH RESULTS")`
| sort 0 marker, in_notable_30d_count, in_notable_7d_count
`comment("TRANSPOSE DATA FROM ABOVE")`
| streamstats current=f window=1 last(in_notable_30d_count) AS prev_in_notable_30d_count, last(in_notable_7d_count) AS prev_in_notable_7d_count
`comment("GET RID OF THE LOOKUP RESULTS")`
| where original_row==1
`comment("CLEAN UP THE DATA")`
| rename prev_in_notable_30d_count AS in_notable_30d_count, prev_in_notable_7d_count AS in_notable_7d_count
| fillnull value=0 in_notable_30d_count, in_notable_7d_count
| fields - original_row, marker
I really hate to destroy such a wonderful, evil answer as @woodcock's, but I hate the map command more, and as Yoda should have said, "Another, there is."
You can filter a set of results by building a single composite regex in a subsearch and feeding it back to the search. Here's a working sample that we ran on one of our systems...
index=foo "someuserid"
| regex
[| noop
| stats count AS host
| eval host="hostname1,partialhostname2,I haz spaces,I sed \"I haz spaces\""
| makemv delim="," host
| mvexpand host
| rex mode=sed field=host "s/ /!?!?!?/g"
| format "" "" "" "" "|" ""
| rex mode=sed field=search "s/host=\"//g s/\" / /g s/ //g s/!\?!\?!\?/ /g s/(.)$/\1)\"/g"
| eval search= "host=\"(".search
| fields search]
The subsearch in brackets [ ] returns a single field, search, that has the value
host="(hostname1|partialhostname2|I haz spaces|I sed \"I haz spaces\")"
...so after substitution, the full line is...
| regex host="(hostname1|partialhostname2|I haz spaces|I sed \"I haz spaces\")"
... and it operates as expected. We've also verified that spaces and internal quotes work as expected.
There is no reason you couldn't build one for your reasonably sized lookup table, as long as the expanded string stays under the subsearch limits (see the sketch below).
The above demonstrates a workaround (" " -> "!?!?!?" -> " ") that preserves internal spaces while reformatting the output of format.
It takes a little practice to get the regex right, but you can run the subsearch standalone without brackets as many times as you like until you are satisfied with the result.
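Applied back to the original question, the same trick might look something like this (just a sketch; it assumes signaturecheck.csv has a RegEx column and your events carry an MD5 field, as in the other answer):
<your search here>
| regex
[| inputlookup signaturecheck.csv
| stats values(RegEx) AS RegEx
| eval search="MD5!=\"(" . mvjoin(RegEx, "|") . ")\""
| fields search]
The subsearch collapses every RegEx row into a single alternation, so after substitution the outer command reads | regex MD5!="(pattern1|pattern2|...)" and any event whose MD5 matches a listed pattern is dropped.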
Unfair. Almost every map solution can be turned inside-out as a subsearch solution (and vice-versa). Each method has VERY different pros/cons.
No, but you can do it "inside-out" by manually iterating with map like this (assuming signaturecheck.csv has a field called RegEx and the events have a field called MD5; just replace the <your search here> part with your actual search):
| inputcsv MyTemporaryFile.csv
| search NOT [ <your search here> | streamstats count AS serial | outputcsv MyTemporaryFile.csv
| stats count AS Dr0pM3
| append [| inputcsv signaturecheck.csv ]
| where isnull(Dr0pM3)
| map maxsearches=99999 search="
| inputcsv MyTemporaryFile.csv
| eval Dr0pM3 = if(match(MD5, \"$RegEx$\"), \"DROP\", null())
| where isnotnull(Dr0pM3) | fields serial | fields - _*
" | stats values(serial) AS serial ]
I tested this monstrosity like this:
First generate a signature file like this:
|noop | stats count AS RegEx | eval RegEx="ab,cd" | makemv delim="," RegEx | mvexpand RegEx | outputcsv signaturecheck.csv
It should look like this ( | inputcsv signaturecheck.csv ):
RegEx
ab
cd
Verify that the following search (it is embedded in the final search below, in the spot where your original search goes) generates 10 contrived events to filter:
|noop | stats count AS raw | eval raw="a,b,c,ab,bc,cd,abc,bcd,cde,def" | makemv delim="," raw | mvexpand raw | streamstats count AS _time | eval MD5=raw | rename raw AS _raw
It should look like this:
_raw MD5 _time
a a 1969-12-31 18:00:01
b b 1969-12-31 18:00:02
c c 1969-12-31 18:00:03
ab ab 1969-12-31 18:00:04
bc bc 1969-12-31 18:00:05
cd cd 1969-12-31 18:00:06
abc abc 1969-12-31 18:00:07
bcd bcd 1969-12-31 18:00:08
cde cde 1969-12-31 18:00:09
def def 1969-12-31 18:00:10
Lastly, put it together like this:
| inputcsv MyTemporaryFile.csv
| search NOT [ |noop | stats count AS raw | eval raw="a,b,c,ab,bc,cd,abc,bcd,cde,def" | makemv delim="," raw | mvexpand raw | streamstats count AS _time | eval MD5=raw | rename raw AS _raw | table _time MD5 _raw | streamstats count AS serial | outputcsv MyTemporaryFile.csv
| stats count AS Dr0pM3
| append [| inputcsv signaturecheck.csv ]
| where isnull(Dr0pM3)
| map maxsearches=99999 search="
| inputcsv MyTemporaryFile.csv
| eval Dr0pM3 = if(match(MD5, \"$RegEx$\"), \"DROP\", null())
| where isnotnull(Dr0pM3) | fields serial | fields - _*
" | stats values(serial) AS serial ]
When tested, I got these correct results:
MD5 _raw _time serial
a a 1969-12-31 18:00:01 1
b b 1969-12-31 18:00:02 2
c c 1969-12-31 18:00:03 3
bc bc 1969-12-31 18:00:05 5
def def 1969-12-31 18:00:10 10
This is the most gruesomely horribad Splunk solution that I have ever crafted but I do not see any other way to do it.
What I especially like is the fact that the first part of the search references a file that DOES NOT EVEN EXIST, but because subsearches always run first, it gets created before that part of the search gets a chance to start. Bwahahahahahahahahahahahahahahahaha!!!!!
Upvote for self-commentary, "horribad" and "Bwahahahahahahahahahahahahahahahaha!!!!!"
Really ingenious and evil and mad, mad I tell you... but for SCIENCE!!!!!!
Also, this would be UNNECESSARY if Splunk would enhance match_type to support a REGEX option.
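For what it is worth, the closest existing option today is wildcard matching, configured in transforms.conf; a sketch (the stanza and field names are illustrative, not from the original post):
# transforms.conf
[signaturecheck]
filename = signaturecheck.csv
match_type = WILDCARD(signature)
max_matches = 1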
P.S. Be careful about the maxsearches value; it could bite you and may need monitoring/adjustment. For some reason, 0 does not mean unlimited like it does for most options.
Also, this might be better written to the KV Store, but I haven't learned how to do that yet, so it is CSV all the way.
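For anyone who wants to try the KV Store route, a rough sketch (every name below is made up for illustration; the collection and lookup definition must exist before the search runs):
# collections.conf
[my_temp_collection]

# transforms.conf
[my_temp_lookup]
external_type = kvstore
collection = my_temp_collection
fields_list = _key, serial, MD5
With that in place, each | outputcsv MyTemporaryFile.csv / | inputcsv MyTemporaryFile.csv pair above becomes | outputlookup my_temp_lookup / | inputlookup my_temp_lookup.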