Loop inside a loop with lookup data

mwinkel · 08-18-2017

Hi,

I'm trying to double loop through a CSV list of words using the map command. The idea is to run a search that returns events matching two of the keywords in the list, while avoiding events that contain one keyword twice (hence the kw1 != kw2). I already have a search that uses inputlookup and then map to loop through the keywords in the file once, and my idea was to nest the map to loop through them twice. Unfortunately, my nested search does not seem to do the trick.

I get the following error: Error in 'map': Did not find value for required attribute 'keyword'.

Is it possible to do a nested map? If not, is there another way to achieve what I want, short of creating a CSV with all of the permutations?

Any help would be appreciated.
Thanks in advance

DalJeanis · 08-18-2017

Gosh, that method is O(n^2) at best, where n is the number of distinct keywords you are searching for, and I suspect far worse than that in practice, because every iteration is a full search: for a trivial 10 words, you are running on the order of 100 searches. I just don't want to think about how ugly that gets as n grows to even modestly nontrivial numbers.
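To put numbers on that, here is a small sketch (plain Python, not Splunk) that counts the keyword pairs a nested map would visit for a hypothetical 10-word list. It assumes the inner search simply requires both keywords, in which case (kw1, kw2) and (kw2, kw1) return the same events, so even with the kw1 != kw2 filter half of the pairs are redundant.

```python
from itertools import combinations, permutations

# Hypothetical 10-keyword list standing in for keywordstest.csv.
keywords = [f"key{i}" for i in range(1, 11)]

# Nested map: every ordered (kw1, kw2) pair with kw1 != kw2.
ordered_pairs = list(permutations(keywords, 2))

# Assuming the inner search just requires both keywords, order does not
# matter, so only the unordered pairs can return different events.
distinct_pairs = list(combinations(keywords, 2))

print(len(ordered_pairs))   # 90
print(len(distinct_pairs))  # 45
```

Either way the count grows with n squared, and each of those is a separate search on top of the outer map's own n searches.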

You want all events that have at least two different keywords from the file, and you don't care which ones, as long as they are different, correct?

So, it's simple. (A) search for events that have any of the keywords, and then (B) determine which distinct keywords they have.

Since you only have to test each event against each keyword once, overall this is an O(n) solution. (See footnote.)

index=foo  ("key1" OR "key2" OR "key3"....)
| rex field=_raw "(?i)(?<keyfound>key1|key2|key3|...)" max_match=0
| eval keyfound=mvdedup(keyfound)
| where mvcount(keyfound)>1
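The same logic can be modeled offline to check the idea before running it in Splunk. This is a minimal sketch, assuming the events have already passed step (A), the base keyword search; Python's `re` stands in for PCRE, and `events_with_two_keywords` is a hypothetical helper name. `findall` plays the role of `rex ... max_match=0`, an order-preserving dedup plays `mvdedup`, and the length check plays `where mvcount(keyfound)>1`.

```python
import re

def events_with_two_keywords(events, keywords):
    """Keep events whose raw text contains at least two distinct keywords."""
    # Descending sort mirrors `sort 0 - keyword`; re.escape guards metacharacters.
    alternation = "|".join(re.escape(k) for k in sorted(keywords, reverse=True))
    # \b on both sides so only whole words count; IGNORECASE mirrors (?i).
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    kept = []
    for raw in events:
        found = pattern.findall(raw)           # like rex ... max_match=0
        distinct = list(dict.fromkeys(found))  # like mvdedup: exact-string dedup
        if len(distinct) > 1:                  # like where mvcount(keyfound)>1
            kept.append((raw, distinct))
    return kept

events = [
    "login failed for key1 from key2",
    "key1 retried key1 again",
    "nothing to see here",
    "key3 only",
]
for raw, hits in events_with_two_keywords(events, ["key1", "key2", "key3"]):
    print(raw, "->", hits)
```

The second event contains key1 twice and is dropped, which is exactly the kw1 != kw2 behavior the original question wanted. In this sketch the dedup compares exact strings, so "KEY1" and "key1" in the same event would count as two distinct hits.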

Now, how do you turn your lookup into those two formats?

Both times we will use the format command.

 |inputlookup keywordstest.csv | table keyword | format

The above produces this

( ( keyword="key1" ) OR  ( keyword="key2" ) OR ( keyword="key3" ) .... )

...so we need to strip the field name and some of the parentheses from the results.

 |inputlookup keywordstest.csv | table keyword  | format "(" "" "" "" "OR" ")" | rex mode=sed field=search "s/keyword=//g"

...giving this format (and you should test this for yourself before we insert it, in square brackets as a subsearch, into the original search)...

 ("key1" OR "key2" OR "key3"....)

Great! Now we do it again for the rex. We have to change the OR to a pipe, get rid of the spaces and quotes, put \b before and after to make sure it only matches full words, add the case-insensitive flag and the named capture group, and so on. That turns out to look like this...

 | inputlookup keywordstest.csv | table keyword 
 | format "(" "" "" "" "|" ")" | rex mode=sed field=search "s/keyword=//g s/\(  \"/\"(?i)\\b(?<keyfound>/g s/\"  \|  \"/|/g s/\"  \)/)\\b\"/g"

Each chunk of the sed expression, from s/ to /g, is one substitution command, four in all.
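To see what each of the four substitutions does, here is a sketch that replays them in order with Python's `str.replace`, since every pattern is a literal string once the SPL escaping is removed. The input is an assumed shape of the `format "(" "" "" "" "|" ")"` output, with two spaces around each separator inferred from the sed patterns rather than checked against Splunk; `apply_sed_steps` is a hypothetical name. The target it builds is PCRE, which spells the case-insensitive flag `(?i)` and a named group `(?<keyfound>...)`.

```python
def apply_sed_steps(formatted):
    """Replay the four s///g commands as literal string replacements."""
    s = formatted.replace("keyword=", "")           # 1: drop the field name
    s = s.replace('(  "', '"(?i)\\b(?<keyfound>')   # 2: quote, flag, \b, open group
    s = s.replace('"  |  "', "|")                   # 3: strip quotes around each pipe
    s = s.replace('"  )', ')\\b"')                  # 4: close group, \b, close quote
    return s

# Assumed shape of the format output for three keywords.
formatted = '(  keyword="key1"  |  keyword="key2"  |  keyword="key3"  )'
print(apply_sed_steps(formatted))
# "(?i)\b(?<keyfound>key1|key2|key3)\b"
```

If the real format output uses different spacing, steps 2 to 4 silently match nothing, which is one more reason to run the subsearch on its own first.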

Don't trust me... test it, preferably with a | head 5 after the inputlookup so the output is short.

Now we put it all together like so...

index=foo  
    [|inputlookup keywordstest.csv | table keyword  | sort 0 - keyword
     | format "(" "" "" "" "OR" ")" 
     | rex mode=sed field=search "s/keyword=//g"
    ]
| rex field=_raw   [| inputlookup keywordstest.csv | table keyword | sort 0 - keyword
     | format "(" "" "" "" "|" ")" 
     | rex mode=sed field=search "s/keyword=//g s/\(  \"/\"(?i)\\b(?<keyfound>/g s/\"  \|  \"/|/g s/\"  \)/)\\b\"/g"
      ]   max_match=0
| eval keyfound=mvdedup(keyfound)
| where mvcount(keyfound)>1

And, once again, test it; maybe test the whole thing with |head 5 in both inputlookup subsearches to make the test run quick.

After that, you should be good to go.

Caveats: If you do NOT want to restrict matches to full word boundaries, then... that would be messy. Regex alternation takes the first alternative that matches at a given position, so you'd have to sort your keywords into a useful order, semi-alphabetically but with longer keywords ahead of the shorter terms they start with.

Hmmm. Okay, descending alphabetical order would be a good first cut, since a shorter word that is a prefix of a longer one sorts after it, and all the words that start with a particular letter are tried before moving on to another letter.
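A quick way to see why the order matters without \b anchors: Python's `re`, like PCRE, tries the alternatives left to right at each position and takes the first one that matches. This sketch uses a hypothetical two-keyword list; `first_match` is an illustrative name.

```python
import re

def first_match(keywords, text):
    """Return all matches of an unanchored alternation, in keyword order."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    return pattern.findall(text)

text = "keyword"
print(first_match(["key", "keyword"], text))                        # ['key']
print(first_match(sorted(["key", "keyword"], reverse=True), text))  # ['keyword']
```

For literal keywords, two alternatives can both match at the same position only when one is a prefix of the other, and a descending sort always places the longer one first.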

Hey, that's elegant. Okay, code above has been updated to do that anyway.

Footnote: Mathematicians and the pedantic may correctly argue that it edges into O(n log n), but relative to the original, it's just plain flat O(n)... especially since ... well, that requires a beer and a whiteboard...

maciep · 08-18-2017

I think you're thinking like a programmer and not a Splunker... but I'm not sure I understand the scenario exactly. Can you elaborate on which events you're referring to? Can you provide example events and also what the lookup contains?

That said, something like "... | stats dc(keyword)" might get you what you want, but I'll need that context about your data to help further.
