How do I get the time of the earliest indexed event?


I want to get the earliest time that an event was indexed in each of my indexes--not the time of the event itself, but the time it was indexed.

Long story:
My goal is to calculate how much filesystem space each index consumes per day.

First I tried this, but it doesn't seem to reflect the actual space used on the filesystem:

index=_internal per_index_thruput NOT series=_* earliest=-4w | eval MB=kb/1024 | timechart span=1d sum(MB) by series

After that, I figured I could measure the size of each index on the filesystem manually and then find the earliest time an event was indexed. Dividing the size by the number of days since then would at least give me an average. Getting that time hasn't been as straightforward as I'd hoped, though:

  • I first tried looking at the timestamp in the filename of the oldest bucket on the filesystem, but that is for the time of the oldest event, not the time it was indexed.

  • Next I tried using the metadata command, but that also seems to only keep the time of the event, not the time it was indexed.

  • This seems to work, but it takes an unreasonable amount of time:

    index=main | stats min(_indextime)

  • This is the best thing I've been able to come up with so far:

    index="_internal" group="per_index_thruput" series="main" | stats min(_time)
    index="_internal" group="per_index_thruput" NOT series=_* | stats min(_time) by series

    It doesn't take very long, but I'm not sure how accurate it is. At best it seems to be less than a minute off. I'm also guessing that once data starts being removed (we haven't had any data moved to frozen yet), this won't be accurate at all anymore.
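
For reference, the averaging step described above could be sketched outside Splunk roughly like this. It is only a sketch under assumptions: the function names are hypothetical, the index directory path is something you supply yourself, and `first_indexed_epoch` is whatever value one of the searches above returns for the index.

```python
import os
import time


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def avg_mb_per_day(size_bytes, first_indexed_epoch, now_epoch=None):
    """Average MB per day between the first indexed event and now.

    first_indexed_epoch is assumed to be a Unix timestamp, for example the
    result of `stats min(_indextime)` for the index.
    """
    if now_epoch is None:
        now_epoch = time.time()
    days = (now_epoch - first_indexed_epoch) / 86400.0
    if days <= 0:
        raise ValueError("first_indexed_epoch must be in the past")
    return size_bytes / (1024.0 * 1024.0) / days
```

For example, `avg_mb_per_day(dir_size_bytes(index_dir), first_indexed_epoch)` gives a single long-run average, where `index_dir` is a placeholder for the index's directory on disk. It says nothing about day-to-day variation, and it stops being meaningful once old buckets are frozen and removed.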

Splunk Employee

I might suggest using the | metadata command:

| metadata type=sourcetypes index=_internal | stats min(firstTime) as firstTime

Unfortunately, it doesn't break it down by index, so you'd have to run it for each index separately. You could also use | dbinspect:

| dbinspect index=* index=_* timeformat="%s" 
| rex field=path "^(?<base>.*)/[^\/]+" 
| stats min(earliestTime) as earliestTime by base

You can figure out the index from the path field, which I've done above with a hacky regex, but you can maybe do better. Unfortunately with this command, you have to run it separately on each indexer, as it doesn't work with distributed search. On the other hand, if you're looking at disk space, maybe that's okay.

Update: Looks to me like the | dbinspect command will be most useful for your purposes.


The above command returns an error stating:
Could not find an index named *

Splunk Employee

Oh, okay. There's no great solution. The best you can do is probably to look at the modification/creation times of the bucket directories. The bucket index number is sequential, based on creation time, so you can work this out approximately, and | dbinspect might give you enough information to get this approximation efficiently.
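
A rough sketch of that bucket-directory approach, run directly on the indexer's filesystem. The assumptions here: warm and cold bucket directories are named `db_<newest>_<oldest>_<id>` and hot buckets `hot_v1_<id>`; the function name `oldest_bucket` is made up for illustration; and the directory's modification time stands in for a creation time, which many filesystems don't expose. The modification time can be later than the real creation time, so treat the result as an approximation.

```python
import os
import re

# Assumed naming: db_<newest>_<oldest>_<id> for warm/cold buckets,
# hot_v1_<id> for hot buckets. Anything else in the directory is ignored.
BUCKET_RE = re.compile(r"^(?:db_\d+_\d+_(\d+)|hot_v1_(\d+))")


def oldest_bucket(db_path):
    """Return (bucket_id, mtime, name) for the lowest-numbered bucket, or None."""
    candidates = []
    for entry in os.scandir(db_path):
        if not entry.is_dir():
            continue
        match = BUCKET_RE.match(entry.name)
        if match:
            bucket_id = int(match.group(1) or match.group(2))
            candidates.append((bucket_id, entry.stat().st_mtime, entry.name))
    if not candidates:
        return None
    # Tuples compare by bucket_id first, so min() picks the lowest id.
    return min(candidates)
```

Cold buckets normally live in a separate directory from hot and warm ones, so you'd call this on each of the index's bucket directories and take the lowest id overall.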



It looks to me like both of those commands return the earliest time an event happened, but I need the time that the first event was indexed, regardless of when that event actually took place. For instance, when I run those searches on one of my indexes, I get a timestamp from 2005, yet my servers have only been running for a couple of months. Sorry if I wasn't clear about that.
