I have a file with ~6M events that gets FTP'd to Splunk on a daily basis. Unfortunately I don't have control of the output and there are no timestamps. Using CURRENT_TIME breaks things, since all events show up with the same time and I have to search across an entire day at a time.
Any thoughts on how to get enough distinct timestamps so that I don't run into search limitations?
I was thinking of using an LWF to receive the FTP'd file and tweaking maxKBps in limits.conf so that CURRENT_TIME is spread across tens or hundreds of seconds. Thoughts?
The easiest way would simply be to name the file with the date/timestamp in a way that datetime.xml can get the timestamp, assuming the events are all supposed to have the same timestamp. Then, Splunk should extract the date/time from the file name, and auto-increment the extracted time as it finds that it's getting too many repeats.
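As a rough sketch of the renaming step, something like the following could run on the box that receives the FTP'd file before Splunk picks it up. The function name `stamped_name` and the `YYYY-MM-DD_HH-MM-SS` layout are assumptions made for illustration; check that whatever format you choose is one your datetime.xml actually recognizes.

```python
from datetime import datetime
from pathlib import Path


def stamped_name(path, when):
    """Return the same path with a date/time inserted before the extension.

    The stamp layout is an assumption; adjust it to a pattern that your
    datetime.xml can extract from the source name.
    """
    p = Path(path)
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    return p.with_name(f"{p.stem}_{stamp}{p.suffix}")


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(stamped_name("events.log", datetime.now()))
```

Renaming the file itself is then a matter of calling `Path(path).rename(stamped_name(path, when))` once the upload has finished.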
Similarly, if you can manipulate the file, you could prepend a single timestamp at the top of the file and subsequent events lacking a timestamp should get that timestamp.
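A minimal sketch of the prepend approach, assuming you can run a small script on the file before it is indexed. The helper name `prepend_timestamp` and the `YYYY-MM-DD HH:MM:SS` format are illustrative choices, not anything Splunk requires; the file is streamed rather than loaded into memory because of its size.

```python
import shutil
from datetime import datetime


def prepend_timestamp(src, dst, when):
    """Copy src to dst with a single timestamp line written first."""
    header = when.strftime("%Y-%m-%d %H:%M:%S") + "\n"
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(header.encode("ascii"))
        # Stream the rest so a multi-million-line file is never held in memory.
        shutil.copyfileobj(fin, fout)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    prepend_timestamp("events.log", "events_with_ts.log", datetime.now())
```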
If more than 100,000 events come in for the same host/source/sourcetype in sequence with the same one-second timestamp, Splunk will auto-increment the timestamp by 1 second, specifically to avoid this issue, so either of these solutions should work.
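For a sense of scale, here is the back-of-the-envelope arithmetic, assuming the increment happens once per 100,000 events exactly as described above (the helper name `seconds_spanned` is made up for this example):

```python
def seconds_spanned(total_events, events_per_second=100_000):
    """Number of distinct one-second timestamps the events would occupy."""
    # Ceiling division using integers only.
    return -(-total_events // events_per_second)


print(seconds_spanned(6_000_000))  # 60
```

So a ~6M-event file would end up spread over roughly a minute of timestamps rather than a single second.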