I have a lookup that can return multivalue for two fields, one of them a timestamp, like this
key | text | date |
key1 | abc|def | 2021-04-06 |
key1 | efg|hij | 2021-06-04 |
... |
A lookup on key1 will return
key | text | date |
key1 | abc|def efg|hij | 2021-04-06 2021-12-31 |
I then want to match the text field as regex, and perform calculation on the matching date. For example, match(text, mytext) where mytext = "abc", and compare now() > strptime(date, "%Y-%m-%d").
I saw many mvexpand solutions in the past, and some mvjoin() solution. mvexpand is expensive for my data, and mvjoin() only works for regex on text.
Is there some way to do this without mvexpand?
Here is a runanywhere example - the first part sets up some dummy data with different combinations of text and dates; the second part effectively is the results of the lookup (and should be replaced with the real lookup; and, the last part does the mvzip and mvmap to process the matching to return which entry matched the text and date.
| makeresults
| eval field=split("abc123,def234,cde345,abcdef987,efg234,abchij",",")
| mvexpand field
| eval time=split("2021-04-05,2021-04-06,2021-06-04,2021-06-05",",")
| mvexpand time
| appendcols
[| makeresults
| eval _raw="key text date
key1 abc|def 2021-04-06
key1 efg|hij 2021-06-04"
| multikv forceheader=1
| fields - _* linecount
| stats list(*) as * by key]
| filldown key text date
| eval comb=mvzip(text,date,",")
| eval match=mvmap(comb,if(match(field,mvindex(split(comb,","),0)),if(time=mvindex(split(comb,","),1),comb,null),null))
| table time field key text date comb match
One crazy solution is to use mvzip() to "tie" those potentially multivalue fields together, perform the match against tied string (considerations in not create false matches, too), then split the matched jumbo string to return text, date in the match.
| eval crazymv = mvzip(text, date, "|:crazy:|"),
crazymatch = mvfind(crazymv, "abc"),
crazysv = mvindex(crazymv, crazymatch)
crazysplit = split(crazysv, "|:crazy:|"), 1),
text = mvindex(crazysplit, 0),
date = mvindex(crazysplit, 1)
Can you mvzip text and date, then mvmap the zipped field where the function uses the two halves for a match and date comparison?
In what way is mvexpand "expensive"? If you need an alternative to mvexpand, I posted a solution here, although this was more to do with avoiding the limitations of mvexpand and may be just as "expensive" in your case as there are a lot of steps involved to perform the same action, but it does have the benefit of being able to expand more than one field at the same time.
Yes, mvzip(), then unzip (split()) is what I ended up doing. The whole operation feels crazy.
@ITWhisperer wrote:In what way is mvexpand "expensive"? If you need an alternative to mvexpand, I posted a solution here, although this was more to do with avoiding the limitations of mvexpand and may be just as "expensive" in your case as there are a lot of steps involved to perform the same action, but it does have the benefit of being able to expand more than one field at the same time.
mvexpand multiples total number of events down the stream. In my example, there will be 4x original events. BTW, that link is broken.
mvexpand multiples total number of events down the stream. In my example, there will be 4x original
I had thought that if I did mvexpand in a subsearch that had very few events, or even just synthetic events, but incorporated the lookup, mvexpand would be cheap inside. But I couldn't figure out how to make it do what I wanted. The mvzip() method was from a long time ago, but isn't very SPL-like.
Here is a runanywhere example - the first part sets up some dummy data with different combinations of text and dates; the second part effectively is the results of the lookup (and should be replaced with the real lookup; and, the last part does the mvzip and mvmap to process the matching to return which entry matched the text and date.
| makeresults
| eval field=split("abc123,def234,cde345,abcdef987,efg234,abchij",",")
| mvexpand field
| eval time=split("2021-04-05,2021-04-06,2021-06-04,2021-06-05",",")
| mvexpand time
| appendcols
[| makeresults
| eval _raw="key text date
key1 abc|def 2021-04-06
key1 efg|hij 2021-06-04"
| multikv forceheader=1
| fields - _* linecount
| stats list(*) as * by key]
| filldown key text date
| eval comb=mvzip(text,date,",")
| eval match=mvmap(comb,if(match(field,mvindex(split(comb,","),0)),if(time=mvindex(split(comb,","),1),comb,null),null))
| table time field key text date comb match
So, this solution took two elements that I wasn't familiar.
To illustrate in pseudo code,
`THE search to return key and textfield for matching lookup`
| appendcols [
lookup key_text_date key
]
| filldown key text date
| eval comb=mvzip(text, date, ",")
| eval match=mvmap(comb, if(match(textfield, mvindex(split(comb, ","), 0)), if(time=mvindex(split(comb, ","), 1), comb, null), null))
This is the SPLeque solution I am looking for. Thanks, @ITWhisperer
To illustrate in pseudo code,
...
| eval match=mvmap(comb, if(match(textfield, mvindex(split(comb, ","), 0)), if(time=mvindex(split(comb, ","), 1), comb, null), null))
Mmm I didn't inspect the code close enough. The last piece tries to match a nonexistent "time" field with the date field from lookup. The real use case of the date field is to be compared with something, not a precise match. `THE Search to return key and textfield for matching lookup` only returns "key" to match the exact "key" in lookup, and a "textfield" to be matched as regex against "text" in lookup. Like so
| makeresults count=4
| eval keyfield="key1", textfield=split("abc123,def234,cde345,abcdef987,efg234,abchij",",")
| mvexpand textfield
| streamstats count
| eval _time=_time - count * 864000
"date" in lookup is not used for matching events, but for comparison or other purposes, e.g., comparison and display.
As such, the general method of using mvzip - mvmap still stands, although another split is warranted to extract matching "date" for display purposes. (I'll use comparison _time <date as matching criterion in this illustration, although it can be any operation.)
`THE search to return key and textfield for matching lookup`
| lookup key_text_date key
| eval comb=mvzip(text, date, ",")
| eval match=mvmap(comb,
if(match(textfield, mvindex(split(comb, ","), 0)),
if(_time < strptime(mvindex(split(comb, ","), 1), "%Y-%m-%d"), comb, null), null)
)
| eval date = mvindex(split(match, ","), 1)
(I also realize that appendcol is just to simulate lookup; lookup effectively appends columns to search results.) If I need matching text value for display, another mvindex() will be needed. The code will look similarly complex, but this solution doesn't require consideration for the mvjoin() and split() delimiter.
(I'll use comparison _time <date as matching criterion in this illustration, although it can be any operation.)
| eval match=mvmap(comb,
if(match(textfield, mvindex(split(comb, ","), 0)),
if(_time < strptime(mvindex(split(comb, ","), 1), "%Y-%m-%d"), comb, null), null)
)
| eval date = mvindex(split(match, ","), 1)
Note on application: Sometimes it is better to leave the operation on date out of mvmap() altogether; instead, perform that operation AFTER obtaining single-value "date" field. This can be important if you are using flexible selectors in a dashboard. In fact, the resultant code is outright more readable.
`THE search to return key and textfield for matching lookup`
| lookup key_text_date key
| eval comb=mvzip(text, date, ",")
| eval textmatch=mvmap(comb, if(match(textfield, mvindex(split(comb, ","), 0)), comb, null))
| eval date = mvindex(split(textmatch, ","), 1)
| eval fullmatch=if(_time < strptime(date, "%Y-%m-%d"), "yes", null)
(One more note on syntax. I have gotten into the habit of using null() in evals, not realizing that a side effect of SPL syntax makes unquoted token "null" equivalent to null(). Of course, using side effect has its side effects. The unquoted token may as well be "full" and it will still be interpreted as null(); e.g., the last eval above can be written as
| eval fullmatch=if(_time < strptime(date, "%Y-%m-%d"), "yes", yes)
and remains semantically identical.)
Yes, the first part of my solution is replaced by your search, the second part is replaced by your lookup, the third part is modified to fit the comparisons you need in your specific use case. Sometimes, it is better to leave you with something to do, to give you an opportunity to understand the concepts behind the solution. 😀