Splunk Search

Can I save mvexpand when matching a multivalue lookup?

yuanliu
Builder

I have a lookup that can return multivalue for two fields, one of them a timestamp, like this

keytextdate
key1abc|def2021-04-06
key1efg|hij2021-06-04
...  

A lookup on key1 will return

keytextdate
key1

abc|def

efg|hij

2021-04-06

2021-12-31

I then want to match the text field as regex, and perform calculation on the matching date.  For example, match(text, mytext) where mytext = "abc", and compare now() > strptime(date, "%Y-%m-%d").

I saw many mvexpand solutions in the past, and some mvjoin() solution.  mvexpand is expensive for my data, and mvjoin() only works for regex on text.

Is there some way to do this without mvexpand?

Labels (1)
Tags (1)
0 Karma
1 Solution

ITWhisperer
Legend

Here is a runanywhere example - the first part sets up some dummy data with different combinations of text and dates; the second part effectively is the results of the lookup (and should be replaced with the real lookup; and, the last part does the mvzip and mvmap to process the matching to return which entry matched the text and date.

| makeresults 
| eval field=split("abc123,def234,cde345,abcdef987,efg234,abchij",",")
| mvexpand field
| eval time=split("2021-04-05,2021-04-06,2021-06-04,2021-06-05",",")
| mvexpand time


| appendcols 
    [| makeresults 
    | eval _raw="key	text	date
key1	abc|def	2021-04-06
key1	efg|hij	2021-06-04"
    | multikv forceheader=1
    | fields - _* linecount
    | stats list(*) as * by key]
| filldown key text date


| eval comb=mvzip(text,date,",")
| eval match=mvmap(comb,if(match(field,mvindex(split(comb,","),0)),if(time=mvindex(split(comb,","),1),comb,null),null))
| table time field key text date comb match

View solution in original post

Tags (2)
0 Karma

yuanliu
Builder

 

One crazy solution is to use mvzip() to "tie" those potentially multivalue fields together, perform the match against tied string (considerations in not create false matches, too), then split the matched jumbo string to return text, date in the match.

 

| eval crazymv = mvzip(text, date, "|:crazy:|"),
crazymatch = mvfind(crazymv, "abc"),
crazysv = mvindex(crazymv, crazymatch)
crazysplit = split(crazysv, "|:crazy:|"), 1),
text = mvindex(crazysplit, 0),
date = mvindex(crazysplit, 1)

 

 

 

0 Karma

ITWhisperer
Legend

Can you mvzip text and date, then mvmap the zipped field where the function uses the two halves for a match and date comparison?

In what way is mvexpand "expensive"? If you need an alternative to mvexpand, I posted a solution here, although this was more to do with avoiding the limitations of mvexpand and may be just as "expensive" in your case as there are a lot of steps involved to perform the same action, but it does have the benefit of being able to expand more than one field at the same time.

yuanliu
Builder

Yes, mvzip(), then unzip (split()) is what I ended up doing.  The whole operation feels crazy.


@ITWhisperer wrote:

In what way is mvexpand "expensive"? If you need an alternative to mvexpand, I posted a solution here, although this was more to do with avoiding the limitations of mvexpand and may be just as "expensive" in your case as there are a lot of steps involved to perform the same action, but it does have the benefit of being able to expand more than one field at the same time.


mvexpand multiples total number of events down the stream.  In my example, there will be 4x original events.  BTW, that link is broken.

0 Karma

ITWhisperer
Legend
0 Karma

yuanliu
Builder

mvexpand multiples total number of events down the stream.  In my example, there will be 4x original

I had thought that if I did mvexpand in a subsearch that had very few events, or even just synthetic events, but incorporated the lookup, mvexpand would be cheap inside.  But I couldn't figure out how to make it do what I wanted.  The mvzip() method was from a long time ago, but isn't very SPL-like.

0 Karma

ITWhisperer
Legend

Here is a runanywhere example - the first part sets up some dummy data with different combinations of text and dates; the second part effectively is the results of the lookup (and should be replaced with the real lookup; and, the last part does the mvzip and mvmap to process the matching to return which entry matched the text and date.

| makeresults 
| eval field=split("abc123,def234,cde345,abcdef987,efg234,abchij",",")
| mvexpand field
| eval time=split("2021-04-05,2021-04-06,2021-06-04,2021-06-05",",")
| mvexpand time


| appendcols 
    [| makeresults 
    | eval _raw="key	text	date
key1	abc|def	2021-04-06
key1	efg|hij	2021-06-04"
    | multikv forceheader=1
    | fields - _* linecount
    | stats list(*) as * by key]
| filldown key text date


| eval comb=mvzip(text,date,",")
| eval match=mvmap(comb,if(match(field,mvindex(split(comb,","),0)),if(time=mvindex(split(comb,","),1),comb,null),null))
| table time field key text date comb match

View solution in original post

Tags (2)
0 Karma

yuanliu
Builder

So, this solution took two elements  that I wasn't  familiar.

  1. appendcols - compared to mvexpand, it doesn't increase number of events.
  2. mvmap() - an enumerable feature in many languages that skipped my mind when you first mentioned it. Much cleaner than the crazy mvindex(), split() sequence that I had used.

To illustrate in pseudo code,

 

`THE search to return key and textfield for matching lookup`

| appendcols [
  lookup key_text_date key
]
| filldown key text date

| eval comb=mvzip(text, date, ",")
| eval match=mvmap(comb, if(match(textfield, mvindex(split(comb, ","), 0)), if(time=mvindex(split(comb, ","), 1), comb, null), null))

 

This is the  SPLeque solution I am looking for.  Thanks, @ITWhisperer 

0 Karma

yuanliu
Builder

To illustrate in pseudo code,

 

 

...
| eval match=mvmap(comb, if(match(textfield, mvindex(split(comb, ","), 0)), if(time=mvindex(split(comb, ","), 1), comb, null), null))

 

 


Mmm I didn't inspect the code close enough.   The last piece  tries to match a nonexistent "time"  field with  the  date field from lookup.  The real use case of the date field is to be compared with something, not a precise match.   `THE Search to return key and textfield for matching lookup` only returns "key" to match the exact "key" in lookup, and a "textfield" to be matched as regex against "text" in lookup.   Like so

 

 

 

| makeresults count=4
| eval keyfield="key1", textfield=split("abc123,def234,cde345,abcdef987,efg234,abchij",",")
| mvexpand textfield
| streamstats count
| eval _time=_time - count * 864000

 

 

"date" in lookup is not used for matching events, but for comparison or other purposes, e.g., comparison and display.

As such, the general method of using mvzip - mvmap still stands, although another split is warranted to extract matching "date" for display purposes. (I'll use comparison _time <date as matching criterion in this illustration, although it can be any operation.)

 

 

`THE search to return key and textfield for matching lookup`

| lookup key_text_date key

| eval comb=mvzip(text, date, ",")
| eval match=mvmap(comb,
 if(match(textfield, mvindex(split(comb, ","), 0)),
 if(_time < strptime(mvindex(split(comb, ","), 1), "%Y-%m-%d"), comb, null), null)
)
| eval date = mvindex(split(match, ","), 1)

 

 

(I also realize that appendcol is just to simulate lookup; lookup effectively appends columns to search results.)   If I need matching text value for display, another mvindex() will be needed.  The code will look similarly complex, but this solution doesn't require consideration for  the mvjoin() and split() delimiter.

0 Karma

yuanliu
Builder
(I'll use comparison _time <date as matching criterion in this illustration, although it can be any operation.)

 

| eval match=mvmap(comb,
 if(match(textfield, mvindex(split(comb, ","), 0)),
 if(_time < strptime(mvindex(split(comb, ","), 1), "%Y-%m-%d"), comb, null), null)
)
| eval date = mvindex(split(match, ","), 1)

 


Note on application: Sometimes it is better to leave the operation on date out of mvmap() altogether; instead, perform that operation AFTER obtaining single-value "date" field.  This can be important if you are using flexible selectors in a dashboard.  In fact, the resultant code is outright more readable.

 

 

`THE search to return key and textfield for matching lookup`
| lookup key_text_date key

| eval comb=mvzip(text, date, ",")
| eval textmatch=mvmap(comb, if(match(textfield, mvindex(split(comb, ","), 0)), comb, null))
| eval date = mvindex(split(textmatch, ","), 1)
| eval fullmatch=if(_time < strptime(date, "%Y-%m-%d"), "yes", null)

 

 

(One more note on syntax.  I have gotten into the habit of using null() in evals, not realizing that a side effect of SPL syntax makes unquoted token "null" equivalent to null().  Of course, using side effect has its side effects.  The unquoted token may as well be "full" and it will still be interpreted as null(); e.g., the last eval above can be written as

 

| eval fullmatch=if(_time < strptime(date, "%Y-%m-%d"), "yes", yes)

 

and remains semantically identical.)

0 Karma

ITWhisperer
Legend

Yes, the first part of my solution is replaced by your search, the second part is replaced by your lookup, the third part is modified to fit the comparisons you need in your specific use case. Sometimes, it is better to leave you with something to do, to give you an opportunity to understand the concepts behind the solution. 😀

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!