<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to leverage a secondary time field at scale? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/541562#M153330</link>
    <description>Splunk Community discussion: how to filter raw events efficiently on a secondary time field (FINISHDATE) when _time is derived from CHANGEDATE, using accelerated data models, tstats, index-time fields, or TERM()-based index searches.</description>
    <pubDate>Fri, 26 Feb 2021 18:13:43 GMT</pubDate>
    <dc:creator>woodcock</dc:creator>
    <dc:date>2021-02-26T18:13:43Z</dc:date>
    <item>
      <title>How to leverage a secondary time field at scale?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/541562#M153330</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I have a dataset that has both&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;FINISHDATE&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;and&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;CHANGEDATE&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;fields in text.&amp;nbsp; We use&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;strptime(CHANGEDATE)&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;for&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;_time&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;but we often need to filter only on&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;FINISHDATE&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;which means that we have to do an&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;All time&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;search, followed by&lt;STRONG&gt; |&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;STRONG&gt;where strptime(FINISHDATE)...&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;which is super inefficient.&amp;nbsp; I am thinking that maybe an&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Accelerated Data Model&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;with&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;FINISHDATE&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;set up as a number could help, but I am not sure.&amp;nbsp; If I create this data model, how can I use it to bring back raw events quickly filtered by the&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;FINISHDATE&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;field?&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I created a calculated field like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;FINISHDATE_ = strptime(FINISHDATE, ...)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then I created datamodel &lt;STRONG&gt;DOUBLE_DATE&lt;/STRONG&gt; like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Extracted field: FINISHDATE_ = Number
Extracted field: ID = String&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Now I have several options:&lt;BR /&gt;&lt;BR /&gt;I can use &lt;STRONG&gt;| from&lt;/STRONG&gt; like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| from datamodel DOUBLE_DATE 
| search FINISHDATE_ &amp;gt; 1607299625&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I can use &lt;STRONG&gt;| datamodel&lt;/STRONG&gt; like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| datamodel DOUBLE_DATE flat
| search FINISHDATE_ &amp;gt; 1607299625&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Or I can use&lt;STRONG&gt; |&amp;nbsp;tstats&lt;/STRONG&gt; like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| tstats count
FROM datamodel=DOUBLE_DATE 
WHERE DOUBLE_DATE.FINISHDATE_ &amp;gt; 1607299625
BY DOUBLE_DATE.ID
| rename DOUBLE_DATE.* AS *&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The problem is that I am not sure that any of these does what I need it to do, which is to leverage the secondary time field in &lt;STRONG&gt;DOUBLE_DATE.FINISHDATE_&lt;/STRONG&gt; in the foundational search (before the first pipe) so that it will BOTH&amp;nbsp;&lt;STRONG&gt;return the full raw events&lt;/STRONG&gt; AND &lt;STRONG&gt;be very fast&lt;/STRONG&gt;.&amp;nbsp; The only way that I can see to do that is to use the last one above as a subsearch to a regular search, but that would be clunky and have its own problems.&amp;nbsp; Is there some other new/schema-acceleration option that I am missing?&lt;BR /&gt;&lt;BR /&gt;Why don't I just use &lt;STRONG&gt;INGEST_EVAL&lt;/STRONG&gt; and create an &lt;STRONG&gt;index-time&lt;/STRONG&gt; field?&amp;nbsp; Because we have already indexed millions of events over 10 years and I am trying to avoid having to reindex them.&amp;nbsp;&amp;nbsp;&lt;SPAN&gt;But if I did this &lt;/SPAN&gt;&lt;SPAN&gt;and I set up &lt;STRONG&gt;fields.conf&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN&gt;so that Splunk knows that it is an &lt;STRONG&gt;index-time&lt;/STRONG&gt; field, then I would be able to use &lt;STRONG&gt;AND FINISHDATE_ &amp;gt; 1607299625&lt;/STRONG&gt;.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;Why don't I just test more?&amp;nbsp; I have to wait for the admin to accelerate my datamodel and that may be a while.&amp;nbsp; I suspect that the &lt;STRONG&gt;| datamodel .. | search&lt;/STRONG&gt; might trigger a &lt;STRONG&gt;schema-accelerated search optimization&lt;/STRONG&gt; and "just work".&amp;nbsp; I will post an update after testing.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Feb 2021 18:13:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/541562#M153330</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2021-02-26T18:13:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to leverage a secondary time field at scale?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/541673#M153369</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/1406"&gt;@woodcock&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If your date format is fixed, e.g. %Y-%m-%d, you could use terms to limit the initial result set. If the date is in _raw with a field name and only minor breakers, e.g. FINISHDATE=%Y-%m-%d, you can further reduce the result set.&lt;/P&gt;&lt;P&gt;E.g.: With %Y-%m-%d and FINISHDATE&amp;gt;1607299625, FINISHDATE is 2020-07-06, and your terms would include (assuming latest=now as of this message):&lt;/P&gt;&lt;P&gt;(TERM(2020-07-07) OR TERM(2020-07-08) OR TERM(2020-07-09) OR TERM(2020-07-1*) OR TERM(2020-07-2*) OR TERM(2020-07-3*) OR TERM(2020-08-*) OR TERM(2020-09-*) OR TERM(2020-1*-*) OR (TERM(20*-*-*) NOT TERM(2020-*-*)))&lt;/P&gt;&lt;P&gt;With FINISHDATE=%Y-%m-%d:&lt;/P&gt;&lt;P&gt;(TERM(FINISHDATE=2020-07-07) OR TERM(FINISHDATE=2020-07-08) OR TERM(FINISHDATE=2020-07-09) OR TERM(FINISHDATE=2020-07-1*) OR TERM(FINISHDATE=2020-07-2*) OR TERM(FINISHDATE=2020-07-3*) OR TERM(FINISHDATE=2020-08-*) OR TERM(FINISHDATE=2020-09-*) OR TERM(FINISHDATE=2020-1*-*) OR (TERM(FINISHDATE=20*-*-*) NOT TERM(FINISHDATE=2020-*-*)))&lt;/P&gt;&lt;P&gt;The first set of terms will return more events than you need, but you'll have reduced the number of events piping through a subsequent where or search command.&lt;/P&gt;&lt;P&gt;The challenge becomes writing a family of date comparison macros (eq, ne, gt, lt, etc.) that convert an epoch value to a set of reduced terms, perhaps including a time format string parameter.&lt;/P&gt;&lt;P&gt;A similar set of reduced terms can be constructed for time values in e.g. %H:%M:%S format.&lt;/P&gt;&lt;P&gt;The only reference to "schema-accelerated search optimization" I can find in documentation is "improved data model drilldown" in the release notes for Splunk Enterprise 7.1. 
I believe this is a reference to data model search modes: search, flat, and acceleration_search.&lt;/P&gt;&lt;P&gt;As a test, I created an eval field:&lt;/P&gt;&lt;P&gt;EVAL-FINISHDATE_EPOCH = strptime(FINISHDATE, "%Y-%m-%d")&lt;/P&gt;&lt;P&gt;I then created an accelerated data model named change_with_finishdate with a root data set also named change_with_finishdate and a single auto-extracted field: FINISHDATE_EPOCH.&lt;/P&gt;&lt;P&gt;Searching the data model:&lt;/P&gt;&lt;P&gt;| datamodel summariesonly=t change_with_finishdate change_with_finishdate search&lt;BR /&gt;| search change_with_finishdate.FINISHDATE_EPOCH&amp;gt;1607299625&lt;/P&gt;&lt;P&gt;Or to strip data set prefixes from field names:&lt;/P&gt;&lt;P&gt;| datamodel summariesonly=t change_with_finishdate change_with_finishdate flat&lt;BR /&gt;| search FINISHDATE_EPOCH&amp;gt;1607299625&lt;/P&gt;&lt;P&gt;Splunk adds directives to the search:&lt;/P&gt;&lt;P&gt;| search (FINISHDATE_EPOCH=* index=main sourcetype=change_with_finishdate FINISHDATE_EPOCH&amp;gt;1607299625 (index=* OR index=_*)) DIRECTIVES(READ_SUMMARY(allow_old_summaries="false" dmid="CB3E9141-C5FE-4B9D-A458-16D364E70DC6_DM_search_change_with_finishdate" name="change_with_finishdate.change_with_finishdate" predicate="*" summariesonly="true"),READ_SUMMARY(predicate="\"change_with_finishdate.FINISHDATE_EPOCH\"&amp;gt;1607299625")) | fields + _time, host, source, sourcetype, FINISHDATE_EPOCH&lt;/P&gt;&lt;P&gt;Even with hints (DIRECTIVES, READ_SUMMARY, dmid), I can't find documentation, .conf presentations, etc. on this feature.&lt;/P&gt;&lt;P&gt;In any case, raw events are returned, presumably filtered by the accelerated data model search.&lt;/P&gt;&lt;P&gt;In my very small test environment, a direct search using TERM as above is faster than using a data model. I'm very curious about what your results will be in a larger index. Day to day, I do find that effectively using index terms often outperforms data models.&lt;/P&gt;</description>
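The "family of date comparison macros" idea in the reply above can be prototyped outside SPL. The following Python is an illustrative sketch, not from the thread; the function name gt_date_patterns is hypothetical. It expands a strictly-greater-than date comparison into the kind of coarse wildcard patterns shown in the hand-built TERM() list, assuming a fixed %Y-%m-%d format and a search window that ends within the same year; later years would still need a catch-all such as TERM(20*-*-*) NOT TERM(2020-*-*) as in the post.

```python
from datetime import date, timedelta

def gt_date_patterns(d):
    """Wildcard patterns matching %Y-%m-%d dates strictly after d,
    through the end of d's year."""
    pats = []
    # Individual days from d+1 up to the next multiple of ten.
    day = d + timedelta(days=1)
    while day.month == d.month and day.day % 10 != 0:
        pats.append(day.isoformat())
        day = day + timedelta(days=1)
    # Whole "tens" blocks left in the same month, e.g. 1*, 2*, 3*.
    if day.month == d.month:
        for tens in range(day.day // 10, 4):
            pats.append("%04d-%02d-%d*" % (d.year, d.month, tens))
    # One pattern per remaining whole month in the year.
    for m in range(d.month + 1, 13):
        pats.append("%04d-%02d-*" % (d.year, m))
    return pats

# Wrap each pattern in TERM() and join with OR for use in a search:
# " OR ".join("TERM(%s)" % p for p in pats)
```

Coarse patterns keep the generated search short at the cost of matching some extra events, which the follow-on where or search filter then discards, the same trade-off described in the reply.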
      <pubDate>Sun, 28 Feb 2021 17:03:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/541673#M153369</guid>
      <dc:creator>tscroggins</dc:creator>
      <dc:date>2021-02-28T17:03:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to leverage a secondary time field at scale?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/542710#M153729</link>
      <description>&lt;P&gt;Yes, this would work, but it would be too hand-tuned and inflexible to handle general timepicker use.&lt;/P&gt;</description>
      <pubDate>Sun, 07 Mar 2021 00:56:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/542710#M153729</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2021-03-07T00:56:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to leverage a secondary time field at scale?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/542712#M153731</link>
      <description>&lt;P&gt;Agreed. Hence: "&lt;SPAN&gt;The challenge becomes writing a family of date comparison macros (eq, ne, gt, lt, etc.) that convert an epoch value to a set of reduced terms, perhaps including a time format string parameter." I haven't fiddled with it yet.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;What did your data model experiments reveal?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 07 Mar 2021 01:12:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-leverage-a-secondary-time-field-at-scale/m-p/542712#M153731</guid>
      <dc:creator>tscroggins</dc:creator>
      <dc:date>2021-03-07T01:12:23Z</dc:date>
    </item>
  </channel>
</rss>

