Getting Data In

What is the best way to add timestamps to a large log file without timestamps?

dskillman
Splunk Employee
Splunk Employee

I have a file with ~6M events that gets FTP'd to Splunk on a daily basis. Unfortunately I don't have control of the output and there are no timestamp. Using CURRENT_TIME breaks things since all events show up with the same time and I have to search across an entire day at a time.

Any thoughts on how to get enough timestamps so that that I don't run into search limitations?

I was thinking of using an LWF to receive the FTP'd file and tweak the maxKBps in limits.conf so that CURRENT_TIME processes across 10's or 100's of seconds. Thoughts?

1 Solution

gkanapathy
Splunk Employee
Splunk Employee

The easiest way would simply be to name the file with the date/timestamp in a way that datetime.xml can get the timestamp, assuming the events are all supposed to have the same timestamp. Then, Splunk should extract the date/time from the file name, and auto-increment the extracted time as it finds that it's getting too many repeats.

Similarly, if you can manipulate the file, you could prepend a single timestamp at the top of the file and subsequent events lacking a timestamp should get that timestamp.

If more than 100,000 events come in for the same host/source/sourcetype in sequence with the same second timestamp, Splunk will auto-increment timestamps by 1 second, specifically to avoid this issue, so either of these solutions should work.

View solution in original post

gkanapathy
Splunk Employee
Splunk Employee

The easiest way would simply be to name the file with the date/timestamp in a way that datetime.xml can get the timestamp, assuming the events are all supposed to have the same timestamp. Then, Splunk should extract the date/time from the file name, and auto-increment the extracted time as it finds that it's getting too many repeats.

Similarly, if you can manipulate the file, you could prepend a single timestamp at the top of the file and subsequent events lacking a timestamp should get that timestamp.

If more than 100,000 events come in for the same host/source/sourcetype in sequence with the same second timestamp, Splunk will auto-increment timestamps by 1 second, specifically to avoid this issue, so either of these solutions should work.

Get Updates on the Splunk Community!

Updated Team Landing Page in Splunk Observability

We’re making some changes to the team landing page in Splunk Observability, based on your feedback. The ...

New! Splunk Observability Search Enhancements for Splunk APM Services/Traces and ...

Regardless of where you are in Splunk Observability, you can search for relevant APM targets including service ...

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...