I'm working on a problem where Splunk is not displaying (sometimes) all indexed events.
The problematic index has the following tsidx files:
31.08.2010. 23:05    96.459.548  1283292338-1257614821-48000.tsidx
08.10.2010. 23:01    40.510.462  1286575276-1257614821-24760.tsidx
26.10.2010. 23:02    19.906.067  1288130534-1257614821-35920.tsidx
01.11.2010. 00:02     6.051.835  1288566138-1257614821-36040.tsidx
03.11.2010. 00:04     1.616.672  1288739031-1257614821-23840.tsidx
These numbers are clearly Unix timestamps, but how does Splunk use them to search the data? Notice that the second number is always the same. Is that normal?
You're right: the numbers are Unix timestamps. They signify the time of the latest event and the earliest event, respectively, in the tsidx file. It's not abnormal for multiple tsidx files to have the same second number, since events occurring in the same second can be indexed across several tsidx files.
This naming convention lets Splunk optimize event retrieval: based on the time range specified in your search, Splunk only searches the tsidx files whose event time ranges overlap that range.
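To make the naming concrete, here is a small sketch (not a Splunk tool, just an illustration) that splits a tsidx filename into its latest-epoch, earliest-epoch, and id fields and converts the epochs to UTC dates; the example filename is taken from the listing above:

```python
from datetime import datetime, timezone

def tsidx_range(filename):
    """Parse a tsidx filename of the form <latest>-<earliest>-<id>.tsidx
    and return the (earliest, latest) event times as UTC datetimes."""
    latest, earliest, _bucket_id = filename.split(".")[0].split("-")
    to_dt = lambda s: datetime.fromtimestamp(int(s), tz=timezone.utc)
    return to_dt(earliest), to_dt(latest)

# Third file from the listing above: events span Nov 2009 .. Oct 2010
earliest, latest = tsidx_range("1288130534-1257614821-35920.tsidx")
print(earliest.isoformat(), "->", latest.isoformat())
```

Note that the latest-event epoch of each file (e.g. 1288130534, 26 Oct 2010) matches the file's modification date in the directory listing, which is consistent with the first field being the newest event in the file.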
How did you identify this bucket as problematic? And how did you determine that Splunk only sometimes displays all indexed events, rather than always?
Thanks. After checking some documentation I figured out how Splunk uses these numbers.
I have an open ticket with Splunk about this, but not much has happened yet.
There are two indexes having problems (this is one of them). I simply used a search like this one:
index=problematic_index earliest=10/22/2010:12:0:0 latest=10/25/2010:0:0:0 | stats count by host sourcetype date_mday | sort sourcetype date_mday
The search does not always return the same results; sometimes a given host shows fewer events, sometimes more.
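One way to pin down the instability is to run the identical search repeatedly, export each run (for example with | outputcsv), and diff the per-host counts. A minimal sketch of that comparison, using hypothetical inline results in place of the exported CSV files:

```python
import csv
from io import StringIO

# Hypothetical exports of "stats count by host" from two runs of the
# same search over the same fixed time range.
run1 = "host,count\nweb01,120\nweb02,340\n"
run2 = "host,count\nweb01,98\nweb02,340\n"

def counts(text):
    """Read a host,count CSV into a {host: count} dict."""
    return {row["host"]: int(row["count"]) for row in csv.DictReader(StringIO(text))}

a, b = counts(run1), counts(run2)
# Hosts whose counts differ between the two supposedly identical runs.
diffs = {h: (a.get(h, 0), b.get(h, 0)) for h in set(a) | set(b) if a.get(h) != b.get(h)}
print(diffs)
```

Any host appearing in diffs is a candidate for closer inspection; with a fixed earliest/latest window, repeated runs over already-indexed data should not change counts at all.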
Any ideas for debugging this are appreciated!