How to leverage a secondary time field at scale?

woodcock · ‎02-26-2021

I have a dataset that has both FINISHDATE and CHANGEDATE fields in text. We use strptime(CHANGEDATE) for _time but we often need to filter only on FINISHDATE which means that we have to do an All time search, followed by | where strptime(FINISHDATE)... which is super inefficient. I am thinking that maybe an Accelerated Data Model with FINISHDATE setup as a number could help but I am not sure. If I create this data model, how can I use it to bring back raw events quickly filtered by the FINISHDATE field?

I created a calculated field like this:

FINISHDATE_ = strptime(FINISHDATE, ...)

Then I created datamodel DOUBLE_DATE like this:

Extracted field: FINISHDATE_ = Number
Extracted field: ID = String

Now I have several options:

I can use | from like this:

| from datamodel DOUBLE_DATE 
| search FINISHDATE_ > 1607299625

I can use | datamodel like this:

| datamodel DOUBLE_DATE flat
| search FINISHDATE_ > 1607299625

Or I can use | tstats like this:

| tstats count
FROM datamodel=DOUBLE_DATE 
WHERE DOUBLE_DATE.FINISHDATE_ > 1607299625
BY DOUBLE_DATE.ID
| rename DOUBLE_DATE.* AS *

The problem is that I am not sure that any of these does what I need for it to do which is to leverage the secondary time field in DOUBLE_DATE.FINISHDATE_ in the foundational search (before the first pipe) so that it will BOTH return the full raw events AND be very fast. The only way that I can see to do that is to use the last one above as a subsearch to a regular search but that would be klunky and have its own problems. Is there some other new/schema-acceleration option that I am missing?

Why don't I just use INGEST_EVAL and create an index-time field? Because we have already done millions of events over 10 years and I am trying to avoid having to reindex them. But if I did this and I setup fields.conf so that Splunk knows that it is an index-time field, then I would be able to use AND FINISHDATE_ > 1607299625.

Why don't I just test more? I have to wait for the admin to accelerate my datamodel and that may be a while. I am suspecting that the | datamodel .. | search might trigger a schema-accelerated search optimization and "just work". I will post an update after testing.

tscroggins · ‎02-28-2021

@woodcock

If your date format is fixed, e.g. %Y-%m-%d, you could use terms to limit the initial result set. If the date is in _raw with a field name and only minor breakers, e.g. FINISHDATE=%Y-%m-%d, you can further reduce the result set.

E.g.: With %Y-%m-%d and FINISHDATE>1607299625, FINISHDATE is 2020-07-06, and your terms would include (assuming latest=now as of this message):

(TERM(2020-07-07) OR TERM(2020-07-08) OR TERM(2020-07-09) OR TERM(2020-07-1*) OR TERM(2020-07-2*) OR TERM(2020-07-3*) OR TERM(2020-08-*) OR TERM(2020-09-*) OR TERM(2020-1*-*) OR (TERM(20*-*-*) NOT TERM(2020-*-*)))

With FINISHDATE=%Y-%m-%d:

(TERM(FINISHDATE=2020-07-07) OR TERM(FINISHDATE=2020-07-08) OR TERM(FINISHDATE=2020-07-09) OR TERM(FINISHDATE=2020-07-1*) OR TERM(FINISHDATE=2020-07-2*) OR TERM(FINISHDATE=2020-07-3*) OR TERM(FINISHDATE=2020-08-*) OR TERM(FINISHDATE=2020-09-*) OR TERM(FINISHDATE=2020-1*-*) OR (TERM(FINISHDATE=20*-*-*) NOT TERM(FINISHDATE=2020-*-*)))

The first set of terms will return more events than you need, but you'll have reduced the number of events piping through a subsequent where or search command.

The challenge becomes writing a family of date comparison macros (eq, ne, gt, lt, etc.) that convert an epoch value to a set of reduced terms, perhaps including a time format string parameter.

A similar set of reduced terms can be constructed for time values in e.g. %H:%M:%S format.

The only reference to "schema-accelerated search optimization" I can find in documentation is "improved data model drilldown" in the release notes for Splunk Enterprise 7.1. I believe this is a reference to data model search modes: search, flat, and acceleration_search.

As a test, I created an eval field:

EVAL-FINISHDATE_EPOCH = strptime(FINISHDATE, "%Y-%m-%d")

I then created an accelerated data model named change_with_finishdate with a root data set also named change_with_finishdate and a single auto-extracted field: FINISHDATE_EPOCH.

Searching the data model:

| datamodel summariesonly=t change_with_finishdate change_with_finishdate search
| search change_with_finishdate.FINISHDATE_EPOCH>1607299625

Or to strip data set prefixes from field names:

| datamodel summariesonly=t change_with_finishdate change_with_finishdate flat
| search FINISHDATE_EPOCH>1607299625

Splunk adds directives to the search:

| search (FINISHDATE_EPOCH=* index=main sourcetype=change_with_finishdate FINISHDATE_EPOCH>1607299625 (index=* OR index=_*)) DIRECTIVES(READ_SUMMARY(allow_old_summaries="false" dmid="CB3E9141-C5FE-4B9D-A458-16D364E70DC6_DM_search_change_with_finishdate" name="change_with_finishdate.change_with_finishdate" predicate="*" summariesonly="true"),READ_SUMMARY(predicate="\"change_with_finishdate.FINISHDATE_EPOCH\">1607299625")) | fields + _time, host, source, sourcetype, FINISHDATE_EPOCH

Even with hints (DIRECTIVES, READ_SUMMARY, dmid), I can't find documentation, .conf presentations, etc. on this feature.

In any case, raw events are returned, presumably filtered by the accelerated data model search.

In my very small test environment, a direct search using TERM as above is faster than using a data model. I'm very curious about what your results will be in a larger index. Day to day, I do find that effectively using index terms often outperforms data models.

woodcock · ‎03-06-2021

Yes, this would work but it would be too hand-tuned and inflexible to handle general timepicker use.

tscroggins · ‎03-06-2021

Agreed. Hence: "The challenge becomes writing a family of date comparison macros (eq, ne, gt, lt, etc.) that convert an epoch value to a set of reduced terms, perhaps including a time format string parameter." I haven't fiddled with it yet.

What did your data model experiments reveal?

How to leverage a secondary time field at scale?

fields

tstats

Accelerating Observability as Code with the Splunk AI Assistant

Integrating Splunk Search API and Quarto to Create Reproducible Investigation ...

Congratulations to the 2025-2026 SplunkTrust!

Join the Conversation

How to leverage a secondary time field at scale?

fields

tstats

Accelerating Observability as Code with the Splunk AI Assistant

Integrating Splunk Search API and Quarto to Create Reproducible Investigation ...

Congratulations to the 2025-2026 SplunkTrust!