Search for the oldest event in Splunk by _indextime to test data retention

Engager

Hi. I have been trying to create a search that returns the _indextime of the earliest event for each index (_indextime because the log times of events may not be reliable). I will use it to determine how long data is retained.

Currently my search is: index=* | chart min(_indextime) by index

Is there a way to speed up this search, or another way to obtain this information without going through every event on every index?

SplunkTrust

Fun question.

First stop, it would be awesome if the metadata command could do type="indexes" because then you could use that command's firstTime field to display the oldest timestamp in each index. However it cannot, it can only do sources, sourcetypes and hosts.

Moving on, the eventcount command runs in essentially constant time and it can give you the list of indexes, so there's one piece of the puzzle.

| eventcount summarize="false" index=*

But it won't give you any earliest or latest time information for those indexes, so by itself it's kind of a dead end.

Next up, the dbinspect command can give you the timestamp of the oldest event very cheaply. The only catch is that it can only do one index at a time and you have to specify the index in the command, like so:

| dbinspect index=main | eval earliestEpoch=strptime(earliestTime,"%m/%d/%Y:%H:%M:%S") | stats min(earliestEpoch) as earliest

From this dbinspect search, actually getting the oldest event for that particular index is then a simple subsearch:

index="main" latest=[| dbinspect index=main | eval earliestEpoch=strptime(earliestTime,"%m/%d/%Y:%H:%M:%S") | stats min(earliestEpoch) as earliest | eval earliest=earliest+1 | rename earliest as search | table search] | tail 1

What we're doing above is taking the timestamp from dbinspect and plugging it in as a latest="" search term in our main search. The | tail 1 is there because there might well be more than one event in that single second and we only want one of them. The | eval earliest=earliest+1 is there because an earliest term includes the given second itself, but a latest term does not. So if we don't add that extra second, we'll get no events.
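To see why the extra second matters, here is a minimal Python sketch of that window logic. It is only a model of the behaviour described above (earliest inclusive, latest exclusive), not Splunk code; the `in_window` helper and the epoch values are made up for illustration.

```python
from typing import Optional


def in_window(event_time: int, earliest: Optional[int], latest: Optional[int]) -> bool:
    """Model of a half-open time window: earliest is inclusive, latest is exclusive."""
    if earliest is not None and event_time < earliest:
        return False
    if latest is not None and event_time >= latest:
        return False
    return True


# Hypothetical epoch of the oldest event, as dbinspect might report it.
oldest = 1_300_000_000
# Two events share that oldest second; the others are newer.
events = [oldest, oldest, oldest + 5, oldest + 90]

# latest=oldest excludes the oldest second itself, so nothing matches.
without_bump = [t for t in events if in_window(t, None, oldest)]

# latest=oldest+1 admits exactly the events in that first second.
with_bump = [t for t in events if in_window(t, None, oldest + 1)]

print(len(without_bump), len(with_bump))
```

With the bump, both events from the oldest second come back, which is why the search still needs | tail 1 to show just one.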

Backing up a second: even if we'd had our wish and the metadata command could have given us our earliest timestamps per index, we wouldn't be able to plug them into a subsearch to get our oldest events, because you can only specify one time range per search in Splunk. Although you can use earliest and latest, you can't actually combine these terms with boolean operators and parentheses. What we'd want our subsearch to yield would look like (latest=123213141 index=A) OR (latest=12321421 index=B), and that wouldn't have worked anyway.

So by that logic, assuming we don't want to dispatch one giant search that pulls every event off every index (ouch), if we want to get only a handful of events off disk we're going to have to dispatch one search per index somehow anyway.
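As a rough sketch of what "one search per index" means outside the search language, the Python below just builds one copy of the subsearch string from above for each index name. The `oldest_event_searches` helper and the index names are hypothetical, and actually dispatching the strings (REST API, CLI, whatever you use) is left out entirely.

```python
# Template mirroring the single-index search shown earlier in this answer.
# Note str.format only treats braces specially, so the % codes pass through untouched.
SEARCH_TEMPLATE = (
    'index="{index}" latest=[| dbinspect index={index} '
    '| eval earliestEpoch=strptime(earliestTime,"%m/%d/%Y:%H:%M:%S") '
    '| stats min(earliestEpoch) as earliest '
    '| eval earliest=earliest+1 '
    '| rename earliest as search | table search] | tail 1'
)


def oldest_event_searches(indexes):
    """Return a dict mapping each index name to its own oldest-event search string."""
    return {name: SEARCH_TEMPLATE.format(index=name) for name in indexes}


# Hypothetical index names; in practice these would come from eventcount.
searches = oldest_event_searches(["main", "web"])
for name, spl in searches.items():
    print(name, "->", spl)
```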

You might look to the map command, since that's exactly what map does: it takes the incoming search results and runs the subsearch pipeline one time for each row. So you could in theory pipe the eventcount command's output to map somehow. The problem here is that the search pipelines executed in map can't themselves use the field values from the input rows as $foo$ tokens, so we can't actually use index="$index$" in our mapped dbinspect. Phew. So it's a brainbender but ultimately a dead end.

And I can't think of any more search language tricks to throw at it! Maybe someone else can.

However, if your end goal is to simply see the oldest event per index, then we can resort to tricks in the UI layer, and the Multiplexer module from Sideview Utils can be used to do this in a regular old Advanced XML view.

Here is an example that works and ultimately shows you the oldest event for each index. It dispatches one search per index, but each of those searches will only search the very oldest second in that index, and as a result it's surprisingly efficient! It renders the oldest events in a simple list view where each event has an "index=foo" header above it identifying which index it's from.

<view onunloadCancelJobs="true" template="dashboard.html" isSticky="False">
  <label>Using Multiplexer to help get oldest event per index</label>
  <module name="AccountBar" layoutPanel="appHeader" />
  <module name="AppBar" layoutPanel="appHeader" />
  <module name="SideviewUtils" layoutPanel="appHeader" />

  <module name="Message" layoutPanel="messaging">
    <param name="filter">*</param>
    <param name="maxSize">2</param>
    <param name="clearOnJobDispatch">False</param>
  </module>

  <module name="HTML" layoutPanel="viewHeader">
    <param name="html"><![CDATA[
    <h1>Using Multiplexer to help get oldest event per index</h1>
    ]]></param>
  </module>

  <module name="Search" layoutPanel="panel_row1_col1" autoRun="True">
    <param name="search">| eventcount summarize="false" index=* | where count!="0"</param>

    <module name="Pager">
      <module name="Multiplexer">
        <param name="field">index</param>
        <module name="Search">
          <param name="search"><![CDATA[
            index="$index$" latest=[| dbinspect index=$index$ | eval earliestEpoch=strptime(earliestTime,"%m/%d/%Y:%H:%M:%S") | stats min(earliestEpoch) as earliest | rename earliest as search | eval search=search+1 | table search] | tail 1 | table _raw
            ]]></param>
          <module name="HTML">
            <param name="html"><![CDATA[
              <b>index=$index$</b>
            ]]></param>
          </module>
          <module name="JobProgressIndicator" />
          <module name="EventsViewer">
          </module>
        </module>
      </module>
    </module>
  </module>
</view>

Note that you'll have to get a recent build of Sideview Utils from the Sideview site in order to get the Multiplexer module; the old version on Splunkbase won't have it, I'm afraid.

Path Finder

This gave me what I needed:

| dbinspect [eventcount summarize="false" index=* | dedup index | fields index] | stats min(startEpoch) AS startEpoch,min(modTime) AS modTime by index,splunkserver | convert ctime(startEpoch) AS startEpoch | sort index,splunkserver,modTime | rename modTime AS "Oldest Bucket Closed For Writing",startEpoch AS "Earliest timestamp of Event in index"
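For the retention question itself, turning the oldest epoch per index into an age in days is simple arithmetic. Here is a minimal Python sketch, assuming you have already exported min(startEpoch) per index from a search like the one above; the `retention_days` helper and all the numbers are made up. Keep in mind that startEpoch is based on event timestamps rather than _indextime, so this is only as trustworthy as those timestamps.

```python
import time

SECONDS_PER_DAY = 86400


def retention_days(oldest_epoch, now=None):
    """Age in days of the oldest event, relative to now (defaults to the current time)."""
    if now is None:
        now = time.time()
    return (now - oldest_epoch) / SECONDS_PER_DAY


# Hypothetical min(startEpoch) values per index.
oldest_by_index = {"main": 1_700_000_000, "web": 1_702_592_000}

# A fixed "now" keeps the example reproducible; drop it to use the real clock.
fixed_now = 1_705_184_000

for index_name, oldest in sorted(oldest_by_index.items()):
    print(index_name, round(retention_days(oldest, now=fixed_now), 1))
```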

Engager

Thank you, this was very helpful.