Recently, Enterprise Security added the option to run correlation searches on index time instead of event time. I was excited about this since it would alleviate some issues related to log ingestion delays and outages. However, it appears there are some limitations which I have questions about. From the previously linked docs:
"The Index time time range might not be applied correctly to the original correlation search with datamodels, stats, streaming, or lookup commands at the end of the search since the index time range is applied after the 'savedsearch' construct. Therefore, you must adjust the time range manually for the search." How might it not apply correctly? Is there a specific example?
"When you select Index time to run the search, all the underlying searches are run using the All Time time range picker, which might impact the search performance. This includes the correlation search as well as the drill-down search of the notable adaptive response action. Additionally, the drill-down search for the notable event in Incident Review also uses index time." Am I understanding that first sentence correctly? What possible reason could there be to run the underlying search over All Time? In that case, what purpose does the alert time range serve? This seems like a massive caveat that makes index time practically unusable.
Index time seemed super promising, but the fact that you can't use it with accelerated data models, that it searches over All Time, and that it could modify drilldowns in mysterious and unknown ways makes me wonder what use it actually serves. These seem like major issues, but I wanted to make sure I wasn't misunderstanding something.
Splunk always uses event time to limit the search range. It's the first and most important way of limiting the searched events, because Splunk only searches through buckets holding events from the given time range. So even if you're limiting by index time, Splunk still has to use some limits on _time or search through All Time.
Also, _indextime is not included in any of the datamodels (and it shouldn't be - that makes no sense - it's the event's time that's important, not when it was finally reported). Therefore, as it is not part of the datamodel fields, it will not be included in the data model acceleration summaries (DAS). And since it's not included, you can't search by it there.
"Index time filters" means adding conditions matching index time ranges.
And as I said before, _time is one thing, _indextime is another. Since you want to filter by _indextime, you have no way of knowing what _time those events have. And since events are bucketed by _time, you have to search through All Time to find the events limited by _indextime. It's just how Splunk works.
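To make that concrete, here is a minimal sketch of what such a condition looks like in plain SPL (the index name is just a placeholder). The earliest/latest window still decides which buckets get opened by _time; the _indextime condition is only evaluated on top of whatever those buckets contain:

    index=security earliest=-7d@d latest=now
    | where _indextime >= relative_time(now(), "-1h@h")
    | stats count by sourcetype

Narrowing _time (here to the last 7 days) is what actually saves work; the where clause on _indextime just discards events that were indexed earlier than an hour ago.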
Generally speaking, _indextime is a nice tool for troubleshooting Splunk itself, but it's not a very useful field for searching real data in real-life use cases.
Actually, _indextime has a WONDERFUL, very security-relevant use case, and that's for potentially delayed data. A great example is EDR data: if a user is off the network for a while and the agent can't report, then when they do finally log on, their events may flow in with the proper timestamps for when the events occurred. However, because we are running our detections on our most recent events, the detections will completely miss these.
In almost every other case, I'd recommend normal _time, but _indextime is very useful for this use case. It can also be handy with RBA, so notables don't fire again as events from the beginning of the time window roll off the detection window; those events have already fired a notable and only APPEAR unique, in a way throttling can't account for. Explained here - https://splunk.github.io/rba/searches/deduplicate_notables/#method-ii
One could argue that it's not that _indextime is appropriate here but rather that the wrong _time is being indexed. But on the other hand, it makes you miss those several separate time fields from "traditional SIEMs" - "event time", "receive time", "whatever time". I don't even remember how many separate timestamps ArcSight holds for each event - three? four?
When you've got control of ES, but data issues require filing a ticket with a Splunk core admin team that will take months to respond, sometimes you just do what you gotta do. 🙂
A very interesting answer. I'm a little confused when you say: "So even if you're searching limiting index time, Splunk still has to use some limits for _time or search through All Time." It seems like index time could be that limit, no?
Trying to answer that question on my own, it seems the issue is that events in Splunk are fundamentally organized by event time. And when searching with index time, Splunk does not have that "inherent" knowledge of what the index time is for each event, like it does with event time. Therefore, it must search over All Time in order to gather that information. Does that sound correct?
I guess my follow up to all of this would be: in what situations is it ever appropriate to use index time instead of event time (specifically in the context of alert creation)? That and: what exactly is the effect on drilldowns?
No. The range of event times contained within a bucket is stored in the bucket directory name, so Splunk can easily judge whether to use that bucket in a search. And in a clustered setup, the Cluster Manager knows which peer has which buckets, so Splunk can decide not to even dispatch the search to peers which do not hold buckets covering the interesting period of time.
You can check bucket parameters, including the event time range each bucket covers, using the dbinspect command.
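A minimal sketch (index=main is just an example; startEpoch and endEpoch are the earliest and latest event _time covered by each bucket):

    | dbinspect index=main
    | table bucketId state startEpoch endEpoch eventCount sizeOnDiskMB
    | convert ctime(startEpoch) ctime(endEpoch)

Those ranges describe event time, not index time.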
Splunk has no way of knowing when those events were indexed until it opens the bucket and reads the contents of the tsidx files. Typically events are indexed after the time at which they happened, but sometimes you can ingest data about events which are supposed to happen in the future, so there is no way of telling whether the events within a bucket with timestamps ranging from A to B were indexed before A, between A and B, or after B. That's why limiting by index time is "retroactive" - you do it on a bucket that has already been opened.
Can't tell you about the drilldowns because I've never used _indextime for anything other than troubleshooting Splunk.
The only use case I see where index time would be appropriate is if you have sources with highly unreliable clocks and thus highly unreliable timestamps. Of course, then you'd probably want to define your sourcetype to use the current timestamp on ingestion instead of parsing the time out of the event, but sometimes you can't pinpoint the troublesome sources, or you don't want to redefine the sourcetypes (for example, you have a separate data admin responsible for onboarding data and you're only responsible for ES). In such a case one could indeed try to search by _indextime on a selected subset of data. But it's a relatively uncommon scenario.
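A minimal sketch of that kind of selective filter, assuming a hypothetical sourcetype called flaky:source whose timestamps can't be trusted: the reliable sources keep their normal event time behaviour, and only the flaky one is gated on when it was indexed.

    index=security earliest=-24h@h latest=+4h@h
    | where sourcetype!="flaky:source" OR _indextime >= relative_time(now(), "-65m@m")
    | stats count by sourcetype, host

The outer earliest/latest window still decides which buckets get opened, so it has to be wide enough to cover whatever bogus timestamps the flaky source might report.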
Hi @mobrien1 ,
I don't know the opinion of other colleagues, but I don't like using index time instead of event time (replacing the timestamp with _indextime at indexing), because in this way you lose the time correlation of your events.
If instead you want to keep the event timestamp and use _indextime for the searches, my question is just one: why?
This request certainly comes from the need to compare the results with those of other SIEMs that use this approach, but in my opinion Splunk's approach is more rigorous and effective, and anyway you can add conditions using _indextime to your main searches (I did it for an Acceptance Test).
Anyway, if something happened at a certain time, I need that information to analyze the events in that time period, possibly from other sources, which perhaps were indexed before or after.
Anyway, to answer your questions:
One last thing: never use "All Time" in your searches.
Ciao.
Giuseppe
You are going to miss data if you are using event time for security alerting. Event timestamps are unreliable. We have seen event times 2 years in the future due to system clock misconfigurations. Event delays and outages are common. Our average delay is 20 minutes; the SLA for delivery is 24 hours. If we want to run security alerting every hour to reduce dwell time, we have to look back 24 hours instead of 1 hour. If we are running over 1K security searches, that adds up. On top of that, there is always a chance of missing a misconfigured clock unless we check All Time.
Using _indextime for alerting, and event time for analyzing the events, would work perfectly for our use case. Unfortunately, it seems not to be feasible with all the constraints in ES, so we have to run our searches over a very large time span to make sure we account for event delays, we have to check future times, and we have to have outage replay protocols. Very inconvenient; I wish we could just run searches on _indextime (every hour) with a broader _time (24 hours), not All Time.
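For what it's worth, that pattern can be approximated in plain SPL with the _index_earliest/_index_latest time modifiers (the index name is a placeholder, and this is only a sketch, not an ES-supported configuration): the event time window is bounded to roughly a day on either side, while index time picks up only what has arrived since the last hourly run, with a few minutes of overlap for safety.

    index=security earliest=-24h@h latest=+24h@h _index_earliest=-65m@m _index_latest=now
    | stats count by host, sourcetype

The wide earliest/latest still governs which buckets are opened, so it bounds the cost, while the index-time modifiers keep each hourly run from re-alerting on everything in that window.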
There are pros and cons of everything of course.
But ES can't be - for example - a substitute for a reliable time source and proper time synchronization. That's not what it's for. If you don't have reliable time, how can you tell when the event really happened? If you have a case where the clock can be set to absolutely anything, so you have to search All Time, how can you tell when the event happened (not when it was reported)?
I can't, but at least I can catch that event with index time, correlate it with other security events and analyze it to see the bigger picture. Things like that are expected in the security world, and it's better to catch them with unreliable time than to miss them. Being able to tell when something happened is not as critical as being able to tell it happened. Missing such events may mean a lot of damage to the company.
We are not asking for ES to be a time synchronization tool, but simply allowing searches on _indextime and _time together would be incredibly useful.
I'm just saying you're putting the cart before the horse. You know _now_ that something happened. When? 10 minutes ago? 10 hours ago? 10 days ago? Do you know if you should react immediately and, for example, isolate the workstation to prevent the threat from spreading through your infrastructure, or whether you should rather focus on searching for where it has already spread to?
You're trying to solve a different problem than you have.
If you have sources which can be lagging, you should account for that in your searches so you don't have situations where you miss data because it came in and got indexed outside of your search window. But that's different from just happily letting your clocks run loose and trying to guess your way around it.
IMHO you're simply solving wrong problem.
But of course YMMV.
EDIT: Oh, and of course you have the possibility of using _indextime in your searches. It's just that _time is _the_ most important field about the event.
PS: If you think _indextime is the most important field for you, just use DATETIME_CONFIG=current and be done with it.
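For reference, that setting goes in props.conf on the parsing tier; a minimal sketch, with the sourcetype name being just an example. It tells Splunk to stamp each event with the time of indexing instead of parsing a timestamp out of the raw event:

    [flaky:source]
    # assume current time at parse/index time instead of extracting a timestamp
    DATETIME_CONFIG = CURRENT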
"_time is _the_ most important field " is precisely why we don't want to use the DATETIME_CONFIG=current solution. We are still using the _time, would be nice to use it together with _indextime.
We are operating at a scale too large to be fixing clocks. When a misconfiguration is intentional, we first have to catch it. We have to "account for lagging sources with our searches", which means very large time windows. Plus we miss data in case of outages, so we have to replay those searches to cover the outage timeframes.
In any case, we are used to Splunk products being restrictive and making a lot of assumptions about how customers should use them. We are working around it exactly as you described; it would just be nice to have more options.
Well, even if you use index time as _time, you can still extract and use the event's own time as a field. You can also use _indextime directly, or even extract the event time as an indexed field so it's fast to filter on. There are several possibilities. It's just that by default Splunk works in a specific way.
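A minimal sketch of that kind of search-time extraction, assuming the raw events carry their original timestamp in a hypothetical field called orig_time in ISO-8601 format (the field name, format string and sourcetype are assumptions):

    index=security sourcetype="flaky:source"
    | eval event_time = strptime(orig_time, "%Y-%m-%dT%H:%M:%S%z")
    | eval report_delay = _time - event_time
    | table _time event_time report_delay host

Here _time is the index time (because of DATETIME_CONFIG=CURRENT), event_time is the time the source claims the event happened, and report_delay shows how late each event was reported.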
And I still think (and this is actually not connected to Splunk itself) that lack of proper time synchronization is an important issue for any monitoring, and for security monitoring even more so.
True, some SIEMs do have several separate time fields for every event, but on the other hand they have very rigid parsing rules, and once you have your data indexed, it's over. So each approach has its pros and cons. Splunk's bucketing by _time has one huge advantage - it speeds up searches by excluding whole buckets from being searched.
I guess the main reason I was interested in index time was because it solved issues with ingestion delays or outages. Splunk outlines what I'm talking about in their docs:
"Selecting Index Time when configuring correlation searches prevents event lag and improves security alerting. This is because configuring correlation searches using the Index time range can more effectively monitor data that arrives late and run the correlation searches against that data. Therefore, configure and run correlation searches by Index time to avoid the time lag in the search results and focus on the most recent events during an investigation.
For example: Deploy a correlation search (R1) that runs every five minutes and checks for a particular scenario (S1) within the last 5 minutes to fire an alert whenever S1 is found. Correlation searches are based on extracted time. So, when S1 events are delayed by five minutes, no alerts might be triggered by R1 because the five minute window checked by the continuous, scheduled R1 never re-scans the events from a previous, already checked window. Despite those delayed events being known to exist, R1 is already set to check another time window, thereby, missing the opportunity to detect S1 behavior from delayed events. When correlation searches use extracted time, some events may land on the indexers a bit later due to a transport bottleneck such as network or processing queue. Event lag is a common concern for some Cloud data sources because some data might take an extended period of time to come into Splunk after the event is generated."
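For anyone trying to size that lookback window in their own environment, a quick sketch (index=security is a placeholder) that measures how far behind event time the indexing actually runs, per sourcetype:

    index=security earliest=-24h@h
    | eval lag_sec = _indextime - _time
    | stats avg(lag_sec) AS avg_lag perc95(lag_sec) AS p95_lag max(lag_sec) AS max_lag by sourcetype
    | sort - max_lag

Sourcetypes with a large or highly variable lag are exactly the ones where a narrow event-time window on a correlation search is likely to miss delayed events.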