topic Re: Regex in lookuptable in Splunk Search

Regex in lookuptable

benmon — Wed, 30 Mar 2016 06:18:09 GMT

Hi,

Can I use a regex in a static lookup table? I want to filter out some alerts that trigger frequently, like:

Substantial Increase In [AC14-2.1] Prevent modification of system files - Caller MD5=41e25e514d90e9c8bc570484dbaff62b Events

I have already created a lookup table called signaturecheck.csv and added all the common signatures to it, so that no alert fires for any signature listed in signaturecheck.csv.
But for this particular signature the MD5 value changes frequently, so I want to know whether I can add a regular expression to the lookup table to filter it.

Regards,

Re: Regex in lookuptable

woodcock — Sun, 03 Apr 2016 01:08:48 GMT

No, but you can do it "inside-out" by manually iterating with map like this (assuming signaturecheck.csv has a field called RegEx and the events have a field called MD5; just replace the <your search here> part with your actual search):

| inputcsv MyTemporaryFile.csv
| search NOT [ <your search here> | streamstats count AS serial | outputcsv MyTemporaryFile.csv
| stats count AS Dr0pM3
| append [| inputcsv signaturecheck.csv ]
| where isnull(Dr0pM3)
| map maxsearches=99999 search="
   | inputcsv MyTemporaryFile.csv
   | eval Dr0pM3 = if(match(MD5, \"$RegEx$\"), \"DROP\", null())
   | where isnotnull(Dr0pM3) | fields serial | fields - _*
" | stats values(serial) AS serial ]

I tested this monstrosity like this:

First generate a signature file like this:

|noop | stats count AS RegEx | eval RegEx="ab,cd" | makemv delim="," RegEx | mvexpand RegEx | outputcsv signaturecheck.csv

It should look like this (check with | inputcsv signaturecheck.csv):

RegEx
ab
cd

Verify that the following search (embedded in next search, which is where your original search will go) generates 10 contrived events to filter:

|noop | stats count AS raw | eval raw="a,b,c,ab,bc,cd,abc,bcd,cde,def" | makemv delim="," raw | mvexpand raw | streamstats count AS _time | eval MD5=raw | rename raw AS _raw

It should look like this:

_raw    MD5                   _time
   a      a     1969-12-31 18:00:01
   b      b     1969-12-31 18:00:02
   c      c     1969-12-31 18:00:03
  ab     ab     1969-12-31 18:00:04
  bc     bc     1969-12-31 18:00:05
  cd     cd     1969-12-31 18:00:06
 abc    abc     1969-12-31 18:00:07
 bcd    bcd     1969-12-31 18:00:08
 cde    cde     1969-12-31 18:00:09
 def    def     1969-12-31 18:00:10

Lastly, put it together like this:

| inputcsv MyTemporaryFile.csv
| search NOT [ |noop | stats count AS raw | eval raw="a,b,c,ab,bc,cd,abc,bcd,cde,def" | makemv delim="," raw | mvexpand raw | streamstats count AS _time | eval MD5=raw | rename raw AS _raw | table _time MD5 _raw | streamstats count AS serial | outputcsv MyTemporaryFile.csv
| stats count AS Dr0pM3
| append [| inputcsv signaturecheck.csv ]
| where isnull(Dr0pM3)
| map maxsearches=99999 search="
   | inputcsv MyTemporaryFile.csv
   | eval Dr0pM3 = if(match(MD5, \"$RegEx$\"), \"DROP\", null())
   | where isnotnull(Dr0pM3) | fields serial | fields - _*
" | stats values(serial) AS serial ]

When tested, I got these correct results:

MD5    _raw                   _time    serial
  a       a     1969-12-31 18:00:01         1
  b       b     1969-12-31 18:00:02         2
  c       c     1969-12-31 18:00:03         3
 bc      bc     1969-12-31 18:00:05         5
def     def     1969-12-31 18:00:10        10

This is the most gruesomely horribad Splunk solution that I have ever crafted but I do not see any other way to do it.
What I especially like is the fact that the first part of the search references a file that DOES NOT EVEN EXIST but because subsearches always run first, it gets created before that part of the search gets a chance to start. Bwahahahahahahahahahahahahahahahaha!!!!!

Re: Regex in lookuptable

woodcock — Sun, 03 Apr 2016 02:01:42 GMT

P.S. Be careful about the maxsearches value; it could bite you and may need monitoring/adjustment. For some reason, 0 does not mean unlimited like most options do.
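
Since map launches one search per incoming row, and in the search above the incoming rows are the rows of signaturecheck.csv, one rough way to see how close you are to the limit is to count those rows. A minimal sketch, assuming the same signaturecheck.csv and the 99999 limit used above (regex_rows, map_limit and verdict are made-up field names):

```spl
| inputcsv signaturecheck.csv
| stats count AS regex_rows
| eval map_limit=99999
| eval verdict=if(regex_rows > map_limit, "raise maxsearches", "ok")
```

If verdict ever comes back as "raise maxsearches", some regexes in the file would not get their own map iteration.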

Re: Regex in lookuptable

woodcock — Sun, 03 Apr 2016 03:10:05 GMT

Also, this might be better written to the KV Store, but I haven't learned how to do that yet, so it is CSV all the way.

Re: Regex in lookuptable

woodcock — Mon, 04 Apr 2016 01:09:11 GMT

Also, this would be UNNECESSARY if Splunk would enhance match_type to support a REGEX option.
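
For what it is worth, match_type does accept WILDCARD and CIDR, and for the pattern in the original question (a fixed signature with a changing MD5) a wildcard may be enough. A rough sketch, not tested here. It assumes a lookup definition named signaturecheck_wild (a made-up name) over a CSV with a signature column and a suppress column, with match_type = WILDCARD(signature) set in its transforms.conf stanza. The CSV would hold a row whose signature is "Substantial Increase In [AC14-2.1] Prevent modification of system files - Caller MD5=* Events" and whose suppress is 1. It also assumes the events carry the alert text in a field called signature.

```spl
<your search here>
| lookup signaturecheck_wild signature OUTPUT suppress
| where isnull(suppress)
```

Events that match any wildcard row pick up suppress=1 and are dropped by the where clause; everything else passes through.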

Re: Regex in lookuptable

DalJeanis — Fri, 21 Apr 2017 16:53:43 GMT

Upvote for self-commentary, "horribad" and "Bwahahahahahahahahahahahahahahahaha!!!!!"

Really ingenious and evil and mad, mad I tell you... but for SCIENCE!!!!!!

Re: Regex in lookuptable

DalJeanis — Thu, 12 Oct 2017 17:51:28 GMT

I really hate to destroy such a wonderful, evil answer as @woodcock's, but I hate the map command more, and as Yoda should have said,

"Another, there is."

You can filter a set of results by building a single composite regex in a subsearch and feeding it back to the search. Here's a working sample that we ran on one of our systems...

index=foo "someuserid" 
| regex 
    [| noop 
     | stats count AS host 
     | eval host="hostname1,partialhostname2,I haz spaces,I sed \"I haz spaces\""  
     | makemv delim="," host 
     | mvexpand host 
     | rex mode=sed field=host "s/ /!?!?!?/g"
     | format "" "" "" "" "|" "" 
     | rex mode=sed field=search "s/host=\"//g s/\" / /g s/ //g s/!\?!\?!\?/ /g  s/(.)$/\1)\"/g" 
     | eval search= "host=\"(".search 
     | fields search]

The subsearch in brackets [ ] returns a single field, search, that has the value

host="(hostname1|partialhostname2|I haz spaces|I sed \"I haz spaces\")"

...so after substitution, the full line is...

| regex host="(hostname1|partialhostname2|I haz spaces|I sed \"I haz spaces\")"

... and it operates as expected. We've also verified that spaces and internal quotes work as expected.

There is no reason you couldn't build one for your reasonably sized lookup table, as long as you don't run out of characters.

The above demonstrates a workaround " " -> "!?!?!?" -> " " that preserves internal spaces while reformatting the output of format.

It takes a little practice to get the regex right, but you can run the subsearch standalone without brackets as many times as you like until you are satisfied with the result.
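
Applied to the original question, the same idea might look like the sketch below. It is untested and assumes the names from @woodcock's answer: signaturecheck.csv has a column called RegEx and the events have a field called MD5. It uses mvjoin instead of format plus sed to glue the patterns together, and MD5!= so that matching events are dropped rather than kept.

```spl
<your search here>
| regex
    [| inputcsv signaturecheck.csv
     | stats values(RegEx) AS RegEx
     | eval search="MD5!=\"(" . mvjoin(RegEx, "|") . ")\""
     | fields search]
```

With the two-row test file (ab, cd), the subsearch should hand back MD5!="(ab|cd)". Patterns that contain double quotes or backslashes would need extra escaping before they are joined.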

Re: Regex in lookuptable

woodcock — Mon, 23 Oct 2017 23:59:33 GMT

Unfair. Almost every map solution can be turned inside-out as a subsearch solution (and vice versa). Each method has VERY different pros/cons.

Re: Regex in lookuptable

fharding — Thu, 13 Feb 2020 00:05:47 GMT

Here is another technique for fuzzy matching on multi-row outputs that does not use a temp file, which avoids several searches contending for the same file when many of them call the code at once. We are using versions of this for some RBA enrichment macros. Thanks @japger_splunk for a tip on multireport and @woodcock for the map example!

| makeresults 
| eval src="1.1.1.1;2.2.2.2;3.3.3.3;4.4.4.4;5.5.5.5" 
| makemv src delim=";" 
| mvexpand src 
    `comment("BEGIN ENRICHMENT BLOCK")`
    `comment("REMEMBER ROW ORDER AND MARK AS ORIGINAL DATA")` 
| eval original_row=1 
| streamstats count AS marker 
    `comment("FORK THE SEARCH: FIRST TO PRESERVE RESULTS, SECOND TO COMPARE EACH ROW AGAINST LOOKUP")` 
| multireport 
    [ ] 
    `comment("FOR EACH ROW, RUN FUZZY MATCH AGAINST LOOKUP AND SUMMARIZE THE RESULTS")` 
    [| map maxsearches=99999 search="
    | inputlookup notable_cache 
    | eval marker=$marker$, src=\"$src$\"
    | eval match=if(like(raw,\"%\".src.\"%\"), 1, 0) 
    | where match==1 
    | eval age_days = (now()-info_search_time)/86400 
    | eval in_notable_7d=if(age_days<=7,1,0), in_notable_30d=if(age_days<=30,1,0) 
    | stats values(marker) AS marker, sum(in_notable_7d) AS in_notable_7d_count, sum(in_notable_30d) AS in_notable_30d_count BY src
    "] 
    `comment("INTERLEAVE THE ORIGINAL RESULTS WITH THE LOOKUP MATCH RESULTS")` 
| sort 0 marker, in_notable_30d_count, in_notable_7d_count 
    `comment("TRANSPOSE DATA FROM ABOVE")` 
| streamstats current=f window=1 last(in_notable_30d_count) AS prev_in_notable_30d_count, last(in_notable_7d_count) AS prev_in_notable_7d_count 
    `comment("GET RID OF THE LOOKUP RESULTS")` 
| where original_row==1 
    `comment("CLEAN UP THE DATA")` 
| rename prev_in_notable_30d_count AS in_notable_30d_count, prev_in_notable_7d_count AS in_notable_7d_count 
| fillnull value=0 in_notable_30d_count, in_notable_7d_count 
| fields - original_row, marker