Splunk Search

How do I write a search that returns data only from a set time range (08:00 to 16:00) each day?

TCK101
New Member

I'm using

| bucket span=1d _time
| stats count by _time 

with a custom time range of @d+8h to @d+16h, which gets TODAY'S data from 08:00 until 16:00 whenever I run the report.

However, I want to create a list of the results for EACH of the last 7 days that only includes data from the day 08:00 up until 16:00.

I tried to do via the custom time:
Start
-7d@d+8h
Latest
@d+16h

but this ends up including all the data from the full 24 hours of each day between those two date/time boundaries.

I just want 08:00 to 16:00 each day

1 Solution

aljohnson_splun
Splunk Employee

Add

| eval date_hour = tonumber(strftime(_time, "%H"))
| search date_hour>=8 date_hour<16

to your existing search and you should be good to go.
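Combined with the original search and the -7d@d+8h to @d+16h time range, the full query might look like this (a sketch; the index name is a placeholder):

index=your_index earliest=-7d@d+8h latest=@d+16h
| eval date_hour = tonumber(strftime(_time, "%H"))
| search date_hour>=8 date_hour<16
| bucket span=1d _time
| stats count by _time

The outer time range bounds the whole window, and the eval/search pair discards everything outside 08:00–15:59 within each day.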


vbumgarner
Contributor

date_hour is an indexed field, so you can do this in your original query.

original search date_hour>=8 date_hour<16

The goal is to pull fewer events off the disk and to use the index wherever possible. If you can do the filtering in the base query, then you can also use that query for an accelerated data model, which I'm guessing you'll want for your use case. Incidentally, I had assumed that spelling each hour out with a bunch of OR statements would be more efficient, but it doesn't seem to make any difference; I think that's an improvement in recent versions.
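The OR expansion mentioned above would look something like this (a sketch with a placeholder index; it is equivalent to the range filter, just spelled out):

index=your_index (date_hour=8 OR date_hour=9 OR date_hour=10 OR date_hour=11 OR date_hour=12 OR date_hour=13 OR date_hour=14 OR date_hour=15)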

Is the reason for the time frame to find "business hours"? date_hour is in UTC, so it is not going to honor Daylight Saving Time. Pad accordingly.

This would be a great use for an eventtype; then you can simply add eventtype=businesshours to any query. An eventtype does introduce some load, since it is calculated every time you run a query against events it is configured to apply to, so for minimal load, use a macro instead. An eventtype is handy in other ways, of course, but it does come at that cost.
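As a sketch, the eventtype and macro versions might be defined like this (the stanza names are hypothetical):

# eventtypes.conf
[businesshours]
search = date_hour>=8 date_hour<16

# macros.conf
[businesshours]
definition = date_hour>=8 date_hour<16

You would then add eventtype=businesshours to a search, or expand the macro with backticks: `businesshours`.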


aljohnson_splun
Splunk Employee

The problem with using date_hour is that it doesn't reflect time zone changes - see this answer here for a detailed response.


vbumgarner
Contributor

True, but you should always strive to pull less off the disk. The best answer is probably a combination: use date_hour to retrieve the hours that are possible given DST, then use a search filter to throw away the one hour that isn't useful.

If using a data model, you could set a couple of fields to subsequently filter on.

dst_hour = tonumber(strftime(_time, "%H"))
business_hours = if(dst_hour>=8 AND dst_hour<16, 1, 0)
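Inline, the same two fields and the subsequent filter would look like this (a sketch; the index name is a placeholder, and strftime here renders the hour in the search head's local time zone):

index=your_index
| eval dst_hour = tonumber(strftime(_time, "%H"))
| eval business_hours = if(dst_hour>=8 AND dst_hour<16, 1, 0)
| search business_hours=1
| bucket span=1d _time
| stats count by _time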


vbumgarner
Contributor

Or I suppose define an eval field, and use that in the eventtype or datamodel query? I haven't used eval fields anywhere...


vbumgarner
Contributor

Oh, now I read the post you linked to. I didn't realize how limited those fields actually are. Okay, so they should be used with a bucket full of caveats. Too bad.

So, the only supportable way to make this particularly efficient, if this data is going to be queried a lot, is to build a data model with an eval field that determines whether the date is in scope, and then accelerate that data model.
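With such an accelerated data model in place, the last-7-days report could be pulled with tstats; this is a sketch, assuming a hypothetical data model named MyEvents with an eval field business_hours as described above:

| tstats count from datamodel=MyEvents where MyEvents.business_hours=1 earliest=-7d@d latest=@d by _time span=1d

Because tstats reads from the accelerated summaries rather than raw events, the business-hours filtering cost is paid once at acceleration time instead of on every report run.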


masonmorales
Influencer

@aljohnson Since you work for Splunk, I'm going to ask you if this approach is more efficient than running a bunch of appends with the desired time ranges, and if so, why?


aljohnson_splun
Splunk Employee

vbumgarner nailed it. Basically, subsearches have limitations - I'm sure you've seen people regularly asking how to avoid subsearches for exactly that reason. It's generally better to avoid them if possible. In terms of whether this approach is actually more performant, I'll let someone else run a test; the search job inspector will tell you whether the gain is worth it.

Check out the docs here.

vbumgarner
Contributor

In all but the most extreme cases, running a single query with more arguments is going to be faster than running multiple queries and combining them with append, simply because of the setup time for each query. Append is also subject to a time limit, as all subsearches are.
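For contrast, the append version of the last-7-days report would look something like this (a sketch with a placeholder index; each bracketed subsearch carries its own setup cost and subsearch limits):

index=your_index earliest=-1d@d+8h latest=-1d@d+16h
| stats count
| append [search index=your_index earliest=-2d@d+8h latest=-2d@d+16h | stats count]
| append [search index=your_index earliest=-3d@d+8h latest=-3d@d+16h | stats count]

and so on for the remaining days - versus a single query over -7d@d+8h to @d+16h with an hour filter, which sets up only once.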
