Getting Data In

What is the best way to add timestamps to a large log file without timestamps?

Splunk Employee

I have a file with ~6M events that gets FTP'd to Splunk on a daily basis. Unfortunately, I don't have control of the output and the events have no timestamps. Using CURRENT_TIME breaks things, since all events show up with the same time and I have to search across an entire day at a time.

Any thoughts on how to get enough distinct timestamps so that I don't run into search limitations?

I was thinking of using an LWF to receive the FTP'd file and tweaking maxKBps in limits.conf so that CURRENT_TIME is spread across tens or hundreds of seconds. Thoughts?

1 Solution

Splunk Employee

The easiest way would simply be to name the file with the date/timestamp in a way that datetime.xml can get the timestamp, assuming the events are all supposed to have the same timestamp. Then, Splunk should extract the date/time from the file name, and auto-increment the extracted time as it finds that it's getting too many repeats.
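For the renaming approach, something along these lines might work as a post-FTP step. It's only a minimal sketch: the function names and the `YYYY-MM-DD_HH-MM-SS` pattern are my own assumptions, so check that the pattern is one your Splunk timestamp recognition actually picks up from the file name before relying on it.

```python
from datetime import datetime
from pathlib import Path


def stamped_name(path, when):
    """Return a sibling path whose file name carries a date/time stamp.

    The stamp format is an assumption for illustration, not a format
    confirmed to match datetime.xml.
    """
    p = Path(path)
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    return p.with_name(f"{p.stem}_{stamp}{p.suffix}")


def rename_with_stamp(path, when=None):
    """Rename the file on disk so its name includes the stamp; return the new path."""
    p = Path(path)
    target = stamped_name(p, when or datetime.now())
    p.rename(target)
    return target
```

You would call `rename_with_stamp()` on the file once the FTP transfer has finished and before it lands in the monitored directory, so Splunk only ever sees the stamped name.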

Similarly, if you can manipulate the file, you could prepend a single timestamp at the top of the file and subsequent events lacking a timestamp should get that timestamp.
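If you go the prepend route, a rough sketch like the following could add the timestamp line without loading all ~6M events into memory. The function name and the `%Y-%m-%d %H:%M:%S` default format are assumptions for illustration; use whatever format your sourcetype is configured to recognize.

```python
import os
import shutil
import tempfile
from datetime import datetime


def prepend_timestamp(path, when, fmt="%Y-%m-%d %H:%M:%S"):
    """Rewrite the file at `path` with one timestamp line added at the top."""
    header = when.strftime(fmt).encode("ascii") + b"\n"
    # Create the temp file next to the original so the final replace
    # stays on the same filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            out.write(header)
            shutil.copyfileobj(src, out)  # streams the original in chunks
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the partial temp file and leave the original untouched.
        os.unlink(tmp_path)
        raise
```

Run it on the file before Splunk starts reading it. Note that the rewritten file takes on the temp file's permissions, so adjust those if the Splunk user needs different access.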

If more than 100,000 events arrive in sequence for the same host/source/sourcetype with the same timestamp (to the second), Splunk auto-increments the timestamp by 1 second, specifically to avoid this issue, so either of these solutions should work.
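As a back-of-the-envelope check on what that means for a ~6M event file, here is a small sketch. It assumes the rollover works exactly as described above (a new second after every 100,000 events); the function name is made up for illustration.

```python
import math


def seconds_spanned(total_events, per_second_cap=100_000):
    """Estimate how many distinct one-second timestamps the events would span."""
    return math.ceil(total_events / per_second_cap)


print(seconds_spanned(6_000_000))  # 60
```

So a daily file of about 6M events would end up spread over roughly a minute of timestamps rather than a single second.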
