Solved: Re: How to utilize wildcards in time fields in loo...

TBH0 · ‎10-05-2022

I have a lookup which has a field with time values (in 24 hr time; i.e. 00:30, 13:45, 23:15), which tells my dashboard the scheduled start time of jobs. I have a number of jobs which are set to run hourly, and as such need to have every hour as their start time (XX:00). I've tried adding wildcard functionality to the desired fields in the lookup definition like this:

WILDCARD(Field_Name_1),WILDCARD(Field_Name_2)

This has unfortunately not worked as I'd hoped, though, and it does not allow the wildcards to be every number when searching for all jobs which are set to run this hour.

Any ideas on how I can best implement this within the lookup?

yuanliu · ‎10-05-2022

Things are simpler if match is not involved. Even so, an illustration of the lookup data would help others. And I still think HH:MM may not be the best representation, especially because your actual schedule runs mixed intervals. Let me first demonstrate a quick solution with HH:MM. As I supposed, we'll call this field start_time, and assume that another field is called job.

| inputlookup dailyschedule
| rex field=start_time "(?<job_hour>[^:]+):(?<job_minute>[^:]+)"
| eval hod = mvrange(0,24)
| eval hod = mvmap(hod, if(hod < 10, "0".hod, hod))
| mvexpand hod
| eval job = if(job_hour == hod, job, null())
| stats values(job) as jobs by hod

This gives you something like

hod	jobs
00
01
02
03
04
05	job1 job3
06
07
08	job2
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23

Is this what you expect?

Now, suppose your mixed schedule is represented in a crontab format, like

minute	hour	dom	month	dow	job
05	08	*	*	*	job1
15	13	*	*	*	job2
40	15	*	*	*	job3
00	*	*	*	*	job4
45	00	01	3,6,9,12	*	job5
00	12	*	*	0	job6
30	16	*	*	*	job7
00	00	15	*	*	job8

Then, you can display all jobs in a given hour no matter its interval. For simplicity, say you still want to only display the daily jobs, you can do

| inputlookup samplecron
| where dow=="*" AND dom=="*" AND month=="*"
| fields hour job
| eval hod = mvrange(0,24)
| eval hod = mvmap(hod, if(hod < 10, "0".hod, hod))
| mvexpand hod
| eval job = if(hour == "*" OR hour == hod, job, null())
| stats values(job) as jobs by hod

The output will be similar, but will also include hourly job4, like

hod	jobs
00	job4
01	job4
02	job4
03	job4
04	job4
05	job4
06	job4
07	job4
08	job1 job4
09	job4
10	job4
11	job4
12	job4
13	job2 job4
14	job4
15	job3 job4
16	job4 job7
17	job4
18	job4
19	job4
20	job4
21	job4
22	job4
23	job4

View solution in original post

yuanliu · ‎10-05-2022

I cannot quite understand the use case. Can you elaborate? I can see that one field in lookup has the HH:MM format. (Let's call it start_time.) What is the data like for the jobs? Is the requirement to match those jobs' actual _time with that start_time field, but only match to the hour?

If this is the requirement, wildcard may not be the best solution. Do you have flexibility in the lookup's design? For example, can the start_time field be split into two fields, start_hour, start_minute? If so, all you need to do is to lookup start_hour, like

| eval job_hour = strftime(_time, "%H")
| lookup mylookup start_hour AS job_hour

That will be the easiest in my opinion.

TBH0 · ‎10-05-2022

Sorry for the confusion, in this query I am only looking at the lookup file to read out details about a schedule onto the dashboard. The difficulty I have is that the schedule contains jobs with irregular schedules (hourly, weekly, biweekly, monthly, etc.) and those are a bit more difficult to show in a 'scheduled to run today' panel on a dashboard as they aren't just 'XX:XX' daily. So for the ones which are every hour, I need a way to present those to Splunk in such a way that shows the jobs in every hour regardless of which timeframe I select.

yuanliu · ‎10-05-2022

Things are simpler if match is not involved. Even so, an illustration of the lookup data would help others. And I still think HH:MM may not be the best representation, especially because your actual schedule runs mixed intervals. Let me first demonstrate a quick solution with HH:MM. As I supposed, we'll call this field start_time, and assume that another field is called job.

| inputlookup dailyschedule
| rex field=start_time "(?<job_hour>[^:]+):(?<job_minute>[^:]+)"
| eval hod = mvrange(0,24)
| eval hod = mvmap(hod, if(hod < 10, "0".hod, hod))
| mvexpand hod
| eval job = if(job_hour == hod, job, null())
| stats values(job) as jobs by hod

This gives you something like

hod	jobs
00
01
02
03
04
05	job1 job3
06
07
08	job2
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23

Is this what you expect?

Now, suppose your mixed schedule is represented in a crontab format, like

minute	hour	dom	month	dow	job
05	08	*	*	*	job1
15	13	*	*	*	job2
40	15	*	*	*	job3
00	*	*	*	*	job4
45	00	01	3,6,9,12	*	job5
00	12	*	*	0	job6
30	16	*	*	*	job7
00	00	15	*	*	job8

Then, you can display all jobs in a given hour no matter its interval. For simplicity, say you still want to only display the daily jobs, you can do

| inputlookup samplecron
| where dow=="*" AND dom=="*" AND month=="*"
| fields hour job
| eval hod = mvrange(0,24)
| eval hod = mvmap(hod, if(hod < 10, "0".hod, hod))
| mvexpand hod
| eval job = if(hour == "*" OR hour == hod, job, null())
| stats values(job) as jobs by hod

The output will be similar, but will also include hourly job4, like

hod	jobs
00	job4
01	job4
02	job4
03	job4
04	job4
05	job4
06	job4
07	job4
08	job1 job4
09	job4
10	job4
11	job4
12	job4
13	job2 job4
14	job4
15	job3 job4
16	job4 job7
17	job4
18	job4
19	job4
20	job4
21	job4
22	job4
23	job4

TBH0 · ‎10-06-2022

The cron format is something I had not known about, but may be the more efficient way of doing what I'm doing. The Job and Start_Time columns are essentially as you have it, just with supporting details about the job which is also shown in the dashboard (and is also joined to an alert subsearch for certain panels).

The way I am utilizing this currently is actually even simpler than you had predicted: I'm just comparing the current time (or time range selected) to the HH:MM presented in the lookup, then showing all jobs which are <current time, >selected time, so it's been a fairly simplistic approach to it overall.

I don't have any experience with making anything in the cron format, but would this support having multiple jobs in the same time slot? How would you handle that as far as a .csv goes, just multiple rows with the same cron values and different job names?

yuanliu · ‎10-06-2022

Indeed if you only compare with the current hour, it's hugely simpler,

| inputlookup dailyschedule
| where strftime(now(), "%H") == mvindex(split(start_time, ":"), 0)

Crontab format is very flexible. Yes, you can have two jobs at the same time. For statistical purposes, it is easier if job names are different, but cron jobs do not even require that.

Remember it is just one possible representation. If you have a different method to represent mixed schedule, you can still use SPL to "translate" it for display. If the task is to perform a match in search, on the other hand, certain representations could be easier than others.

TBH0 · ‎10-06-2022

I appreciate that insight, I had not heard of the cron format and I think that's going to make this schedule a lot easier to work with. If I'm understanding correctly, I will try to convert the scheduled and late time columns to a cron format, then translate back to a readable format when displaying in a table.

I'll do more research into this, but I appreciate you!

How to utilize wildcards in time fields in lookup file?

lookup

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes