Hello, I am creating some reports to measure the uptime of hardware we have deployed, and I need a way to filter out multiple date/time ranges the match up with maintenance windows. We are utilizing a Data Model and tstats as the logs span a year or more.
The (truncated) data I have is formatted as so:
time range: Oct. 3rd - Oct 7th.
|tstats summariesonly=true allow_old_summaries=true count(device.status) as count from datamodel=Devices.device where device.status!="" AND device.customer="*" AND device.device_id ="*" by device.customer, device.device_id, device.name, device.status _time
device.customer device.device_id device.name device.status _time count
ppt | webOS-205AZXCA8162 | Sao Paulo Office | offline | 2022-10-04 | 314 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | offline | 2022-10-05 | 782 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | offline | 2022-10-06 | 749 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | offline | 2022-10-07 | 1080 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | online | 2022-10-04 | 510 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | online | 2022-10-05 | 658 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | online | 2022-10-06 | 691 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | online | 2022-10-07 | 360 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | warning | 2022-10-04 | 1 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | warning | 2022-10-06 | 2 |
ppt | webOS-205AZXCA8162 | Sao Paulo Office | warning | 2022-10-07 | 1 |
As the reports will be run by other teams ad hoc, I was attempting to use a 'blacklist' lookup table to allow them to add the devices, time ranges, or device AND time range they wish to exclude from the results. That lookup table is formatted as such:
type | start | end | deviceID | note |
time | 2022-10-03T13:10:30.000-04:00 | 2022-10-04T14:10:30.000-04:00 | test range 10-04-2022 1:30 through 2:10 in EST UTC-4 | |
device | 12345 | |||
timedevice | 2022-10-04T13:10:30.000-04:00 | 2022-10-05T14:10:30.000-04:00 | webOS-205AZXCA8162 | |
time | 2022-10-06T13:10:30.000-04:00 | 2022-10-06T14:10:30.000-04:00 | test range 10-06-2022 1:30 through 2:10 in EST UTC-4 | |
device | webOS-205AZXCA8122 |
In my head, this works as a report they run on the total timeframe they wish to analyze, and then the devices, timeframes, and timeframe/device events are removed as entered on the lookup table.
My biggest hang up right now is finding a way to exclude the unknown quantity of time or timedevice blacklist entries from the total list of results.
Thank you for any help you can provide!
This looks like a good use case for time based lookups:
1. Create a lookup "exclude_maintenance_window_lookup.csv" with an additional field of excluded=1, for example:
type | start | end | deviceID | note | excluded |
time | 2022-10-03T13:10:30.000-04:00 | 2022-10-04T14:10:30.000-04:00 | test range 10-04-2022 1:30 through 2:10 in EST UTC-4 | 1 |
2. Configure a new lookup definition (use default value for anything not defined below):
Name: exclude_maintenance_window_lookup
Type: File-based
Lookup file: exclude_maintenance_window_lookup.csv
Configure time-based lookup: Checked
Name of time field: start
Time format: %Y-%m-%dT%H:%M:%S.%3N%z
Minimum offset: 0
Advanced Options -> Maximum matches: 1
3. To use the lookup:
<base_search>
| eval excluded=1
| lookup exclude_maintenance_window_lookup excluded OUTPUT start end note
| eval end_time=strptime(end, "%Y-%m-%dT%H:%M:%S.%3N%z")
| eval is_excluded=IF(_time<end_time, "Y", "N")
| search is_excluded="N"
This looks like a good use case for time based lookups:
1. Create a lookup "exclude_maintenance_window_lookup.csv" with an additional field of excluded=1, for example:
type | start | end | deviceID | note | excluded |
time | 2022-10-03T13:10:30.000-04:00 | 2022-10-04T14:10:30.000-04:00 | test range 10-04-2022 1:30 through 2:10 in EST UTC-4 | 1 |
2. Configure a new lookup definition (use default value for anything not defined below):
Name: exclude_maintenance_window_lookup
Type: File-based
Lookup file: exclude_maintenance_window_lookup.csv
Configure time-based lookup: Checked
Name of time field: start
Time format: %Y-%m-%dT%H:%M:%S.%3N%z
Minimum offset: 0
Advanced Options -> Maximum matches: 1
3. To use the lookup:
<base_search>
| eval excluded=1
| lookup exclude_maintenance_window_lookup excluded OUTPUT start end note
| eval end_time=strptime(end, "%Y-%m-%dT%H:%M:%S.%3N%z")
| eval is_excluded=IF(_time<end_time, "Y", "N")
| search is_excluded="N"
@johnhuang How 'time sensitive' is this lookup when used with a data model -- as in, If the lookup table is modified, would the results be immediate? Is it correct that when the lookup table is modified, the model would have to be rebuilt to have it include the updated ranges?
Yes, the result is immediate as long as you refresh/reload the search. The lookup does not impact or change any data in the datamodel, so no rebuild necessary.
One caveat is that if you're summarizing data, you need to first summarize it at an small enough time interval that matches the granularity of your maintenance window, e.g. if your maintenance window is typically 30 minute block increments, you should set the tstat span=30m, then run the lookup to filter result, and then run stats to summarize by day.
| tstats span=1h summariesonly=true allow_old_summaries=true count(device.status) as count from datamodel=Devices.device where device.status!="" AND device.customer="*" AND device.device_id ="*" by device.customer, device.device_id, device.name, device.status _time
| eval excluded=1
| lookup exclude_maintenance_window_lookup excluded OUTPUT end
| eval end_time=strptime(end, "%Y-%m-%dT%H:%M:%S.%3N%z")
| where _time<end_time
| bucket _time span=1d
| rename device.customer AS device_customer device.device_id AS device_id device.name AS device_name device.status AS device_status
| stats sum(count) AS count BY _time device_customer device_id device_name device_status
@johnhuang It doesn't appear to be evaluating the is_excluded field properly. Could it be related to the end_time field becoming multi value after the lookup?
Just to confirm, here is the setup I have based on your instructions:
Search:
|tstats summariesonly=true allow_old_summaries=true count(device.status) as count from datamodel=Devices.device where device.status!="" AND device.customer="*" AND device.device_id ="*" by device.customer, device.device_id, device.name, device.status _time span=30m
|eval excluded=1
|lookup Historical_Uptime_Blacklist.csv excluded OUTPUT start end note
| eval end_time=strptime(end, "%Y-%m-%dT%H:%M:%S.%3N%z")
| eval epoch1=_time
| eval is_excluded=IF(_time<end_time, "Y", "N")
| search is_excluded="*"
Lookup Table: Historical_Uptime_Blacklist.csv
type | start | end | device | excluded | note |
time | 2022-10-04T13:10:30.000-04:00 | 2022-10-04T14:10:30.000-04:00 | 1 | test range 10-04-2022 1:30 through 2:10 in EST UTC-4 | |
device | 1234 | 1 | 0 | ||
timedevice | 2022-10-04T13:10:30.000-04:00 | 2022-10-04T14:10:30.000-04:00 | 1234 | 1 | 0 |
time | 2022-10-05T13:10:30.000-04:00 | 2022-10-06T14:10:30.000-04:00 | 1 | test range 10-06-2022 1:30 through 2:10 in EST UTC-4 | |
device | 00-10-7f-75-bf-b4_-1 | 1 | DSG 'MTR-CP-B510' Test |
Lookup Definition:
And lastly here is some sample data that I believe displays the issue. Based on the search, I would have expected that first entry to have been is_excluded=Y (Pasted as CSV due to the forum detecting HTML and reformatting the table.)
device.customer,device.device_id,device.name,device.status,_time,count,end,end_time,epoch1,excluded,is_excluded
RootIntegration,00-10-7f-de-0f-12_-1,TSW-1060-00107FDE0F12,Online,10/4/2022 0:00,20,2022-10-04T14:10:30.000-04:00,1664907030,1664856000,1,N
,,,,,,2022-10-04T14:10:30.000-04:00,1664907030,,,
,,,,,,2022-10-06T14:10:30.000-04:00,1665079830,,,
RootIntegration,00-10-7f-de-0f-12_-1,TSW-1060-00107FDE0F12,Online,10/4/2022 0:30,23,2022-10-04T14:10:30.000-04:00,1664907030,1664857800,1,N
,,,,,,2022-10-04T14:10:30.000-04:00,1664907030,,,
,,,,,,2022-10-06T14:10:30.000-04:00,1665079830,,,
RootIntegration,00-10-7f-de-0f-12_-1,TSW-1060-00107FDE0F12,Online,10/4/2022 1:00,21,2022-10-04T14:10:30.000-04:00,1664907030,1664859600,1,N
,,,,,,2022-10-04T14:10:30.000-04:00,1664907030,,,
,,,,,,2022-10-06T14:10:30.000-04:00,1665079830,,,
RootIntegration,00-10-7f-de-0f-12_-1,TSW-1060-00107FDE0F12,Online,10/4/2022 1:30,22,2022-10-04T14:10:30.000-04:00,1664907030,1664861400,1,N
,,,,,,2022-10-04T14:10:30.000-04:00,1664907030,,,
,,,,,,2022-10-06T14:10:30.000-04:00,1665079830,,,
Thanks for your assistance with this!
You're referencing the csv file for the lookup instead of the the lookup definition.
| lookup Historical_Uptime_Blacklist.csv excluded OUTPUT start end note
Make sure you run it against the "Name" you defined in the lookup definition rather than the csv file name. For example, if you named it Historical_Uptime_Blacklist:
| lookup Historical_Uptime_Blacklist excluded OUTPUT start end note
This looks like it is working as expected. I have more testing to do to verify it fully, but this seems to be a great solution. Thank you!
😥 Exactly why I posted so much detail. stupid human error, great catch! Initial results look promising, I'm testing this more in depth now. Thank you so much!
Thank you, that sounds great. I'll proceed with that testing now. Much appreciated!
@johnhuang I'll give this a try, thank you for the lead! I'll report back with results. Thank you!