Alerting

How to trigger alerts when _indextime != _time?

marycordova
SplunkTrust
SplunkTrust

Problem definition: there are 3 possible "times" associated with an event and this can cause events to be missed in scheduled searches

Example: _time may have value 1 OR 2 but not 3 (_indextime)

  1. the timestamp listed in the _raw event data (TIME_PREFIX or other config) = 0:4:58
  2. the time the event is seen up by the forwarder (CURRENT) = 0:5:58
  3. the time the event is written to disk (_indextime) = 0:6:03

if either time 1 or time 2 is not equal to _indextime your event can be missed by a search

for example if you have an event where CURRENT time is 0:5:58 seconds, timestamp from _raw is 0:4:58, and _indextime is 0:6:03 then a search running every minute -1m@m to @m will never see this particular event

for sourcetypes with various props configurations such as EXTRACT, EVAL, FIELDALIAS, LOOKUP, etc there can easily be differences of several seconds between _time and _index time meaning an event is not available on disk until after the time the scheduled search has run

1 Solution

martin_mueller
SplunkTrust
SplunkTrust

Say you run a search every minute, over one minute of some notion of time. You want to see every event exactly once - not twice, not zero times. You can't do -2m@m to -m@m because events delayed by more than one minute might be missed. If your maximum indexing delay you want to cover were one day, you could do -d-m@m to -d@m, that would see every event exactly once... but would introduce an alert delay of one day, for all events. Not good.

Here's what I often do to accommodate unknown/large/varying indexing delays:

  • schedule the search to run every minute
  • set the timerange to -d to +d to allow for that one day maximum skew between indexing time and time
  • add _index_earliest=-2m@m _index_latest=-m@m

That way you look at every event exactly once, ensured by the _index_earliest/_latest and allow for the full day of skew, ensured by the regular timerange.
During a global indexing delay, e.g. no network to your indexers, the search will find nothing. Once indexing runs again it will find lots of things, everything that came in during that minute regardless of event timestamp.

That's for alerting. To detect indexing delays [or timezone configuration issues] I like this approach:

| tstats max(_indextime) as max where index=foo by host _time span=1s
| eval delta = max - _time | timechart max(delta) by host

https://answers.splunk.com/answers/232475/how-to-search-when-an-event-was-indexed.html#comment-23622...

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

Say you run a search every minute, over one minute of some notion of time. You want to see every event exactly once - not twice, not zero times. You can't do -2m@m to -m@m because events delayed by more than one minute might be missed. If your maximum indexing delay you want to cover were one day, you could do -d-m@m to -d@m, that would see every event exactly once... but would introduce an alert delay of one day, for all events. Not good.

Here's what I often do to accommodate unknown/large/varying indexing delays:

  • schedule the search to run every minute
  • set the timerange to -d to +d to allow for that one day maximum skew between indexing time and time
  • add _index_earliest=-2m@m _index_latest=-m@m

That way you look at every event exactly once, ensured by the _index_earliest/_latest and allow for the full day of skew, ensured by the regular timerange.
During a global indexing delay, e.g. no network to your indexers, the search will find nothing. Once indexing runs again it will find lots of things, everything that came in during that minute regardless of event timestamp.

That's for alerting. To detect indexing delays [or timezone configuration issues] I like this approach:

| tstats max(_indextime) as max where index=foo by host _time span=1s
| eval delta = max - _time | timechart max(delta) by host

https://answers.splunk.com/answers/232475/how-to-search-when-an-event-was-indexed.html#comment-23622...

View solution in original post

DalJeanis
SplunkTrust
SplunkTrust

@marycordovacaa - Looks like you're pulling in help from the heavy hitters with this one. Martin's write up is pretty clearly equivalent to what we discussed last night. The final note about using tstats to detect and analyze delays is a great trick.

marycordova
SplunkTrust
SplunkTrust

ok...just for a visual inspection of the behavior i did the below and it looks to be exactly what i need...

1 - cron running every 15 min with a 2 minute offset from the 0 minute

index=x sourcetype=y earliest=-2h@h latest=+2h@h _index_earliest=-17m@m _index_latest=-2m@m
| eval start=relative_time(now(),"-17m@m")
| eval end=relative_time(now(),"-2m@m")
| rename _time as time _indextime as indextime
| table time indextime start end
| convert ctime(time) as time ctime(indextime) as indextime ctime(start) as start ctime(end) as end
| foreach time indextime start end
[ rex field=<> "\d{2}\/\d{2}\/\d{4}\s+\d{2}:(?<<>>[\d:]*)"]
| dedup indextime | head 5

2 - this gives me the below events (all of which just happen to arriving in good time in this sample but at least i can see where things would fall into _time vs _indextime vs the start of my search vs the end of my search vs when my cron would run

time indextime start end
07:47 07:47 53:00 08:00
07:42 07:42 53:00 08:00
07:37 07:37 53:00 08:00

so this pretty clearly shows that regardless of _time there is no way _indextime will ever fall outside the bounds of the executing search and this i am happy with 😄

3 - so i would just use the very first line of the query above in all my high fidelity/critical/time sensitive alerts adjusting the cron skew and _indextime as necessary to load balance my searches so they are not all running at once and then of course add the rest of my spl to the end

index=x sourcetype=y earliest=-2h@h latest=+2h@h _index_earliest=-17m@m _index_latest=-2m@m ...1

THANK YOU @DalJeanis, @martin_mueller, @somesoni2, @adonio (i have reversed CURRENT out of my props.conf LOL)

dstuder
Path Finder

I'm not sure you want the ...

earliest=-2h@h latest=+2h@h

as this ...

_index_earliest=-17m@m _index_latest=-2m@m

is what you are really after. You want to alert on the stuff as is comes in (being indexed) not based at all on the actual time of the event. What if the machine was off for four hours and is turned back on? The _time (earliest) will be -4h and fall outside your search as it doesn't meet the first part even though it meets the second part. I think you really just want ...

index=x sourcetype=y _index_earliest=-17m@m _index_latest=-2m@m

martin_mueller
SplunkTrust
SplunkTrust

You still need some earliest and latest to filter _time with. What that range should be depends on the specific use case. I'd advise against all time in most cases because the IOP / file handle / cache poison impact can be tremendous.

0 Karma

marycordova
SplunkTrust
SplunkTrust

the threads started to get convoluted so here is a summary for anyone else with this issue:

problem: event _time < cron schedule < event available on disk to be searched resulting in events being missed in alerts


  1. preserves near real-time monitoring
  2. preserves actual event timestamping in _time
  3. accounts for variable time delays such as between 0-99 minutes
  4. may not be good for extremely high volume indexes or extremely wide time frames but is likely the best general practice

credit @DalJeanis

| multisearch
[ your search earliest=-4m@m latest=-2m@m
]
[ your search earliest=-2h@m latest=-4m@m
| where _indextime>relative_time(now(),"-4m@m")
]

  1. (does this need a rename or eval of _indextime somewhere?)
  2. good example of multisearch usage
  3. good example of combined various time notations -2h@m
  4. hard to test retroactively against data that will always match the 1st search regardless of the 2nd search unless the data can be artificially sent/received, the search run, the timestamp back dated, and then the data indexed all within the questionable time range of a few seconds or minutes - I need to figure out a way to stimulus response validate this

credit: @somesoni2 (read 3rd para): https://docs.splunk.com/Documentation/Splunk/7.1.2/SearchReference/SearchTimeModifiers#_time_and__in...

(NOTE caveat below 3rd para however)


credit @adonio: many times users leverage the _indextime field and something like _index_earliest=-6m@m _index_latest=-1m@m


last ditch solution if the data set permits :

  1. set _time to the following in props.conf: DATETIME_CONFIG = CURRENT
  2. re-write -1m@m to @m searches to accommodate potential delays: -2m@m to -1m@m

DalJeanis
SplunkTrust
SplunkTrust

Very good writeup.

For a confidence test, you can do a one-shot ingestion of any available past-dated CSV into a test index.

DalJeanis
SplunkTrust
SplunkTrust

You are correct that _time will, in practice, never match _indextime. They should not, because those two items have nothing to do with one another. They will only match when an event has NO specified time in its structure and is in essence merely recording internal information as/of the instant it is recorded.

You are correct that there is a processing delay between _time and _indextime in almost all cases, and that this delay can be any length of time that the transmission chain takes to move the data from place to place. In the case of a monthly transaction file transmitted by a vendor, the difference may even be more than 30 days.

The above are the reason why the first rule of administering Splunk is "Know your data." If your various computers are phoning home every 5 minutes, then the lag time can be in the range of up to 5m to 7m depending on load. That means that a summarizing search should run with a delay of perhaps 10m, to accurately summarize that particular kind of data.

This kind of tuning is part and parcel of Splunk administration.

And, yes, the standard for that is to schedule hourly jobs X minutes after the hour, for example, and to use earliest=-1h@h latest=@h ... so with the exception of your first suggestion, which is seldom going to be advisable, your answer is generally correct.

marycordova
SplunkTrust
SplunkTrust

earliest=-1h@h latest=@h does not resolve the issue because if i have an event where _time is back dated prior to actual indexing and the event is written to disk after my search has run on the hour i will be missing that particular event

if _time or _indextime were not epoch values with second level granularity and were only stamped with minute level granularity then perhaps that would work, except in the case where events have a minute's worth of delay or more

earliest=-61m@m latest=-1m@m is the only way to ensure that events are captured by search when you have delays of several seconds and in order to ensure that those events are time stamped within 59 seconds of indexing and therefore available to the search is to set _time to CURRENT

in my case over the course of a week the vast majority of my events for this particular data set have a 0 second time difference using a regex time EXTRACT however, delays of up to 99 minutes for some events were discovered and there is no way to craft a search that would account for such a widely variable time delay without both, delaying all notifications for the data set and, introducing an unacceptable deviation from near-real time monitoring

this particular data set includes high fidelity security alerts and can confirm that a 6 minuted delay caused a notification to be missed and we were only made aware of having missed this through another 3rd party notification outside of Splunk

as @DalJeanis stated, knowing your data is pre-requisite, and that was the first qualification made in the original answer submitted

DalJeanis
SplunkTrust
SplunkTrust

@marycordovacaa - If you had asked that question - how do you pick up events where the _time is older than X but have been indexed after X, I would have given you THIS answer -

| multisearch 
   [ your search earliest=-4m@m latest=-2m@m
   ] 
   [ your search  earliest=-2h@m latest=-4m@m 
    | where _indextime>relative_time(now(),"-4m@m")
  ]

done.

marycordova
SplunkTrust
SplunkTrust

thanks, that's more helpful for small data sets like this, so i guess there is way if one gets more creative 🙂

for anyone else, tips inside that search query:

combining time designations like hours and minutes -2h@m
usecase for multisearch

DalJeanis
SplunkTrust
SplunkTrust

Thanks, @marycordovacaa - SPL and regexes are my specialties within the Trust. You'll also often find great tips like this from @somesoni2 and @woodcock, and useful straightforward answers by @adonio, @martin_mueller, @MuS and @jkat54 on more general subjects. If it's dashboarding, then you want to watch for answers by @niketnilay, who is our guru for that kind of thing.

With regard to multisearch, I used that only because of the where clause that is applicable for only half the search. Looking at it again, since the _indextime test will always be true for any events that meet the first search criteria, you can actually drop the multisearch and code it with OR and parenthesis and it will work just as well, like this...

( your search earliest=-4m@m latest=-2m@m) 
OR
( your search  earliest=-2h@m latest=-4m@m) 
| where _indextime>relative_time(now(),"-4m@m")

...and that, of course, resolves down to...

( your search  earliest=-2h@m latest=-2m@m) 
| where _indextime>relative_time(now(),"-4m@m")

...which, just in case something in this group of things being indexed is being assigned a "future-ish" _time, might be safest as...

( your search  earliest=-2h@m latest=-2m@m) 
| where _indextime>=relative_time(now(),"-4m@m") AND
       _indextime<relative_time(now(),"-2m@m") 

or, you can leave it as the multisearch, since the intention behind that is much more clear to a casual observer.

adonio
SplunkTrust
SplunkTrust

what is the question?
many times users leverage the _indextime field and something like _index_earliest=-6m@m _index_latest=-1m@m for alerts / saved searches purposes

marycordova
SplunkTrust
SplunkTrust

thanks for that tip, I hadn't thought of it 😄

0 Karma

somesoni2
Revered Legend

There is a delay before an indexed event is searchable (Transmission from Forwarders, event processing at indexer, indexing process and creating a searchable copy of the data), so it's often recommended to allow some delay, say 1-3 mins, when creating alert. E.g. if you want to run your search every minute, looking at one minute time period, your time-range can be -2m@m to -1m@m (allowing 1 min to indexing to be complete) OR -3m@m to -2m@m (allowing 2 min to indexing to be complete). This is because the time range of your search (earliest/latest) is applied on time not _indextime value, thus the event has to be searchable before a search can fetch it.
Alternately, you can use indextime time modifier in your search. (read 3rd para)
https://docs.splunk.com/Documentation/Splunk/7.1.2/SearchReference/SearchTimeModifiers#_time_and
_indextime_fields

marycordova
SplunkTrust
SplunkTrust

thanks, that is helpful but i would just highlight the caveat below P3:

Note: When using index-time based modifiers such as index_earliest and index_latest, your search must also have an event-time window which will retrieve the events. In other words, chunks of events might be ruled out based on the non index-time window as well as the index-time window. To be certain of retrieving every event based on index-time, you must run your search using All Time.

as we know...All Time is dangerous (and generally not the use case searches are based on)!

0 Karma

marycordova
SplunkTrust
SplunkTrust

Solution:

if the data set permits set _time to the following in props.conf: DATETIME_CONFIG = CURRENT

re-write -1m@m to @m searches to accommodate potential delays: -2m@m to -1m@m

you can be creative with searches that hourly or daily for example -61m@m to -1m@m

don't forget to adjust your cron schedule accordingly if necessary

DalJeanis
SplunkTrust
SplunkTrust

@marycordovacaa - We've unaccepted your answer. The first item in your answer -- which because of the way it is written, most people will read as if it were the primary solution -- is not advisable. (This is one of the few cases in which splunk answers allows down-votes: when the answer is ill-advised and likely to be harmful to the user's system.) If you'd like to reorder your information, perhaps putting the first item last and using a break between, and clarify that the first item is rarely applicable, then your solution would be more correct.

marycordova
SplunkTrust
SplunkTrust

@DalJeanis

  1. this is the solution confirmed as acceptable by Splunk Support
  2. answers should not be unaccepted if an acceptable answer is not provided
  3. no justification for the reason why the answer is not advisable
  4. no justification for the reason why this is likely to be harmful to the system

DalJeanis
SplunkTrust
SplunkTrust

@marycordovacaa - The problem is all the unspoken caveats in the line "if the dataset permits." We assume that Splunk Support has examined your use case, your particular data set, to give you that advice for that dataset. This is not a generally correct solution.

What that key word is telling the system is to ignore the explicit time recorded on the events, and instead use the current _indextime as the _time. There are very few data sets for which this would be appropriate and permitted.

Consider log files. If you used this setting, correlation searches would become completely untenable if the times of the individual events were constantly being altered randomly by differing delays in the ingestion pipeline. A later event sent to a faster indexer would get an earlier _time value. Thus, in an era of firewall/DHCP/NAT assignment and reuse of IPs, for example, properly deciding who had an IP at a given time that some other thing occurred would become even harder than it is.

Using this method to solve the problem of indexing delays is like trying to get moths out of your closet by throwing in a hand grenade. Therefore, that first suggestion is extremely unlikely to be the right answer for most people reading this question and answer.