Solved: How to compare events with "milestone" lookup?

yuanliu · ‎08-20-2022

I have a really simple task but haven't figured out how. This is a simple table of milestones

milestone1	milestone2	milestone3	release
2022-01-30	2022-02-28	2022-03-25	1_0
2022-04-20	2022-05-10	2022-05-25	1_1
2022-07-02	2022-07-21	2022-08-14	1_2
2022-09-20	2022-10-14	2022-11-03	1_3
2022-12-21	2023-01-11	2023-01-31	2_0

I need to determine the "release" cycle a given event is in, and perform some calculations in relationship to milestones. For illustration purposes, let's say if an event is in between milestone1 of 1_1 and 1_2 (2022-04-20 and 2022-07-02), I'll say it belongs in release cycle 1_1. In other words, milestone2, milestone3, etc., can be considered mere attributes that I need to retrieve. (In the real world, some columns are not dates.)

Initially I thought a simple lookup would suffice. But after various trials, I have made little progress. If I could devise a macro based on the lookup table to output the value of release, I can certainly then lookup in the table to obtain the rest of attributes. I even thought of adding a dummy (constant value) column so I could retrieve the entire table with every event. But even with that, I still couldn't find an easy way to match event with row.

The best I have come up with so far is to determine the current release by comparing | inputlookup with now(), like this

| inputlookup release
| where now() > strptime(milestone1, "%F")
| eventstats max(release) as current_release
| where release == current_release

If milestone1 is in epoc time, the search could be simpler but in any case, this only gives now, and I cannot really use it in a macro unless the macro is placed in a subsearch of sorts. (And if the macro is in a subsearch, I cannot pass event time as a parameter, this means that I still don't get to match with events.)

bowesmana · ‎09-04-2022

@yuanliu I saw your post on Slack. Your original solution is not optimal for a number of reasons - although it may not be the solution, I believe this is a more efficient solution to your lookup version given that

you only ever want a zero or one values of milestone1 from the lookup, where
- zero if _time is < the earliest milestone1
- one, where the value is the closest value (in the past) of milestone1 to _time

so, there's no need to use mvexpand/where/eventstats/where at all, as you can use mvmap/eval to filter milestone1 values to zero or one values and then lookup/where to filter those events where _time<milestone1

| makeresults count=1000
| eval _time=now() - ((random() % 240) * 86400)
| eval dummy="dumb"
| lookup release dummy OUTPUT milestone1
``` this filters to only those dates after _time ```
| eval milestone1=mvmap(milestone1, if(_time > strptime(milestone1, "%F"), milestone1, null()))
``` and this takes the latest of them ```
| eval milestone1=max(milestone1)
| lookup release milestone1
| where isnotnull(release)
| sort _time

I used your data set of milestones for my lookup.

I'm not sure if this is the best solution, but it avoids those expensive mvexpand/eventstats combinations.

View solution in original post

yuanliu · ‎09-04-2022

This is wonderful, @bowesmana! I didn't realize that statistical functions apply to any multivalue data, not just in stats commands.

yuanliu · ‎08-20-2022

Via the magic of posting😉, I made some progress that I hadn't been able to make in days. But the performance is so poor.

As mentioned, I added a field "dummy" (with value "dumb") to the lookup so I can retrieve the entire table into every event.

| eval dummy="dumb"
| lookup release dummy OUTPUT milestone1
| mvexpand milestone1
| where _time > strptime(milestone1, "%F")
| eventstats max(milestone1) by _time
| where milestone1 == 'max(milestone1)'
| lookup release milestone1

This kind of get what I needed, but a search that returned < 9,000 events more than doubled search time (from 1.7s to 4s). And this is only dealing with 5 values of milestone1. As expected, eventstats seems to be the most expensive, followed by mvexpand.

Is there a better way to do this?

bowesmana · ‎09-04-2022

@yuanliu I saw your post on Slack. Your original solution is not optimal for a number of reasons - although it may not be the solution, I believe this is a more efficient solution to your lookup version given that

you only ever want a zero or one values of milestone1 from the lookup, where
- zero if _time is < the earliest milestone1
- one, where the value is the closest value (in the past) of milestone1 to _time

so, there's no need to use mvexpand/where/eventstats/where at all, as you can use mvmap/eval to filter milestone1 values to zero or one values and then lookup/where to filter those events where _time<milestone1

| makeresults count=1000
| eval _time=now() - ((random() % 240) * 86400)
| eval dummy="dumb"
| lookup release dummy OUTPUT milestone1
``` this filters to only those dates after _time ```
| eval milestone1=mvmap(milestone1, if(_time > strptime(milestone1, "%F"), milestone1, null()))
``` and this takes the latest of them ```
| eval milestone1=max(milestone1)
| lookup release milestone1
| where isnotnull(release)
| sort _time

I used your data set of milestones for my lookup.

I'm not sure if this is the best solution, but it avoids those expensive mvexpand/eventstats combinations.

How to compare events with "milestone" lookup?

lookup

subsearch

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

.conf24 | Registration Open!

ICYMI - Check out the latest releases of Splunk Edge Processor