Getting Data In

How can I tell when Splunk is finished indexing a log file?

Path Finder

Is there an internal log message that will tell me when Splunk has finished indexing a file?

1 Solution

Splunk Employee
  • When "monitoring" a file, Splunk never "finishes", because when it reaches the end of the file it cannot know whether more data will be appended. Reaching the end of the file is not recorded: in normal operation Splunk can read faster than logs are usually written, so it often reaches the end of the file after every single event in the monitored log.

  • In batch mode, the file will be deleted when it is finished.

  • In oneshot mode, you can query the REST API at https://localhost:8089/services/data/inputs/oneshot to see if the input is present; if it isn't, then indexing is complete.
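For the oneshot case, the check can be scripted. This is a minimal sketch, not an official client: it assumes the endpoint was requested with `?output_mode=json` and that the response carries an `entry` list whose items have a `name` equal to the file path. The function name and the sample paths are hypothetical.

```python
import json


def oneshot_still_pending(response_body: str, file_path: str) -> bool:
    """Return True if file_path is still listed among the oneshot inputs.

    Assumed response shape (an assumption, not confirmed in this thread):
    {"entry": [{"name": "/path/to/file", ...}, ...]}
    """
    payload = json.loads(response_body)
    names = [entry.get("name") for entry in payload.get("entry", [])]
    return file_path in names


# Hypothetical response bodies, for illustration only.
still_indexing = '{"entry": [{"name": "/var/log/big.log"}]}'
finished = '{"entry": []}'

print(oneshot_still_pending(still_indexing, "/var/log/big.log"))
print(oneshot_still_pending(finished, "/var/log/big.log"))
```

Polling this every few seconds until it returns False matches the "if it isn't present, indexing is complete" rule above.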


Super Champion

It looks like there is a newer option where you can get some of this information from a REST call. You can point your browser to the following URL (make sure you adjust for your server/port) and see what your monitor stanzas are up to:

https://splunk-server:8089/services/admin/inputstatus/TailingProcessor:FileStatus

It looks like files that were completely read show up with a "percent" of "100.00", and the "type" shows up as "finished reading". I haven't seen any files that have not yet caught up, so I'm not sure exactly what that looks like, but I would assume that the "file position" and "file size" would not line up in such cases. Anyway, this may be another way to get the desired information.

(I'm not sure when this was first introduced, but it's available on my 4.1.3 installs)
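If you want to act on those fields in a script, a comparison like the one below may be enough. It is only a sketch: it assumes you have already extracted "file position" and "file size" as integers from the endpoint's output, and the function names and state labels are made up for illustration. A position beyond the reported size is treated as inconsistent rather than as finished.

```python
def read_state(file_position: int, file_size: int) -> str:
    """Classify a file's status from its reported position and size."""
    if file_size <= 0:
        return "unknown"        # nothing sensible to compare against
    if file_position == file_size:
        return "caught up"      # reader is at the current end of file
    if file_position < file_size:
        return "reading"        # data remains between position and end
    return "inconsistent"       # position past the reported size


def percent_read(file_position: int, file_size: int) -> float:
    """Recompute the percentage locally instead of trusting the reported one."""
    return round(100.0 * file_position / file_size, 2)


print(read_state(100, 100))
print(read_state(50, 200), percent_read(50, 200))
```

Keep in mind that "caught up" only means the reader is at the current end of the file; a monitored file can still grow afterwards.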

Communicator

Sorry, "as of 4.3.4" I do not have enough reputation to edit my comment.


Communicator

As of 4.3.5, it's still available.

There might be a problem with the counter rolling over. I have a large file: file position = 8316767731, file size = 440951863, percent = 1886.09, type = open file.


Path Finder

This was introduced in 4.1 as a development debug endpoint. It is not part of QA testing, and its efficiency at scale is still a bit of an unknown. It is a useful tool, but its information should be taken with a grain of salt.

SplunkTrust

If you want to be able to tell yourself, as opposed to a script or other process being able to tell, the real-time search feature available in 4.1 can be very handy.

The simplest way is to run a real-time search in the main search interface, searching for whatever sourcetype, source and/or host that you're indexing the data with.

I recommend using an unbounded real-time search, i.e. the "Real Time > All time (real-time)" entry in the TimeRangePicker pulldown.

Then start the indexing, and you'll see the data pouring in. When the event numbers stop changing or when the rate of change drops down to whatever the real-time rate is, that means it's done or at least caught up to real-time.
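The "event numbers stop changing" test can also be automated if you sample the event count at a fixed interval. The sketch below is an assumption-laden illustration: how you obtain the counts is up to you, and `indexing_has_settled` and `quiet_samples` are hypothetical names. It only covers the "stopped changing" case, not the "dropped to the real-time rate" case.

```python
from typing import Sequence


def indexing_has_settled(counts: Sequence[int], quiet_samples: int = 3) -> bool:
    """Return True when the last `quiet_samples` intervals showed no growth.

    `counts` is a series of event counts sampled at a fixed interval,
    oldest first.
    """
    if len(counts) < quiet_samples + 1:
        return False  # not enough samples to judge yet
    recent = counts[-(quiet_samples + 1):]
    return all(later == earlier for earlier, later in zip(recent, recent[1:]))


print(indexing_has_settled([10, 50, 90, 90, 90, 90]))
print(indexing_has_settled([10, 50, 90, 120]))
```

A larger `quiet_samples` makes a false "done" less likely when the source is written in bursts.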

A more interesting way is to watch the data from Splunk's metrics log in real time.

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="<your_sourcetype_here>" | eval MB=kb/1024 | chart sum(MB)

Or, to watch everything split by sourcetype:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps) over series

(To get the same functionality for indexes, hosts, or sources, just change per_sourcetype_thruput to one of per_index_thruput, per_host_thruput, or per_source_thruput.)
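If you build these searches from a script, the substitution can be spelled out as below. This is just a string-building sketch: the helper `thruput_search` is hypothetical, and it reproduces the two searches above with only the group name (and optional series) changed.

```python
from typing import Optional

# The four metrics.log groups named in this post.
VALID_GROUPS = (
    "per_sourcetype_thruput",
    "per_index_thruput",
    "per_host_thruput",
    "per_source_thruput",
)


def thruput_search(group: str, series: Optional[str] = None) -> str:
    """Build the metrics.log search for one series, or for all series."""
    if group not in VALID_GROUPS:
        raise ValueError(f"unknown thruput group: {group}")
    base = f'index="_internal" source="*metrics.log" group="{group}"'
    if series is not None:
        # Single series: total megabytes only.
        return f'{base} series="{series}" | eval MB=kb/1024 | chart sum(MB)'
    # All series: megabytes and average events per second, split by series.
    return f"{base} | eval MB=kb/1024 | chart sum(MB) avg(eps) over series"


print(thruput_search("per_index_thruput", "main"))
print(thruput_search("per_host_thruput"))
```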

And if you're having trouble with a data input and you want a way to troubleshoot it, particularly if your whitelist/blacklist rules aren't working the way you expect, go to this URL:

https://yoursplunkhost:8089/services/admin/inputstatus

NOTE: You'll have to click through a bunch of browser security exceptions because splunkd has a self-signed cert and browsers are suspicious of this.

But that page will actually tell you the status of every file matched by every input, and whether it's matching a blacklist rule or a whitelist rule.
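The same page can be fetched from a script, where the self-signed certificate needs the equivalent of the browser exception. The sketch below uses only the Python standard library and assumes HTTP basic authentication against the management port; the host name, credentials, and function names are placeholders. Disabling certificate verification is only reasonable for a host you already trust.

```python
import base64
import ssl
import urllib.request


def unverified_context() -> ssl.SSLContext:
    """TLS context that accepts splunkd's self-signed certificate."""
    context = ssl.create_default_context()
    context.check_hostname = False       # must be turned off before verify_mode
    context.verify_mode = ssl.CERT_NONE  # skip certificate verification
    return context


def build_request(host: str, user: str, password: str, port: int = 8089) -> urllib.request.Request:
    """Prepare a GET request for the inputstatus page with basic auth."""
    url = f"https://{host}:{port}/services/admin/inputstatus"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_input_status(host: str, user: str, password: str) -> str:
    """Return the raw response body; parsing it is left to the caller."""
    request = build_request(host, user, password)
    with urllib.request.urlopen(request, context=unverified_context(), timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_input_status("yoursplunkhost", "admin", "your-password")` would return the same content the browser shows, ready to be searched for a particular file path.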
