Getting Data In

How do I extract data from a CSV file with multiple lines when the timestamp is on a different line?

anasar
New Member

Hi,

My log has a timestamp followed by CSV rows. For example, here are two records:

Sun Feb 14 07:01:05 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.111,120
def, 1002203,200,0.112,150
ghi, 1002206,500,0.113,160

Sun Feb 14 07:04:55 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.114,110
def, 1002203,200,0.118,190
ghi, 1002206,500,0.117,130

How do I index them so that the timestamp is applied to every record in the CSV block that follows it? Please help.

1 Solution

Lowell
Super Champion

There's no really easy way to do this. The problem here is that your file is a mix between a traditional "log" file (with timestamps appearing between events) and a tabular (CSV) data format. This isn't a typical use, and therefore isn't well supported. A few thoughts and ideas are listed below.

Options:

  1. Modify the log-producing program so that it writes one timestamp per row (line).
  2. Pre-process the CSV file so that the timestamp repeats on each row. This would be relatively simple to do in Python, Perl, AWK, or whatever tool you prefer; see the sketch after this list.
  3. Allow Splunk to ingest each chunk of tabular data as a single event, then use some search-time magic to split each event into multiple events, one per CSV row. (A possible props.conf setup for the ingestion half is also sketched below.) I know I wrote a custom search command a few years back that did exactly this. (I may be able to find it lying around; I never got around to actually releasing it.) But this probably isn't the best answer and may not perform as well, depending on your use case.

There could be some other clever approaches to this as well, but most of them will be either a pain to implement, hard to understand, or have serious limitations. I'd try option 1 or 2 first.
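
For option 2, here's a rough sketch of the kind of pre-processing script I have in mind, written against your sample above. Treat it as untested: the assumption that every timestamp line looks like "Sun Feb 14 07:01:05 EST 2016", and the choice to read stdin and write stdout, are mine, so adapt it to your environment.

#!/usr/bin/env python
# Flatten a timestamp-delimited log so every CSV row carries its own
# timestamp column. Sketch only; adjust the regex to your real data.
import csv
import re
import sys

# Matches lines like "Sun Feb 14 07:01:05 EST 2016"
TIMESTAMP_RE = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ \d{4}$")

def flatten(infile, outfile):
    writer = csv.writer(outfile)
    current_ts = None
    header_written = False
    for line in infile:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP_RE.match(line):
            current_ts = line  # remember it for the rows that follow
            continue
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "customer_name":  # the repeated header row
            if not header_written:
                writer.writerow(["timestamp"] + fields)
                header_written = True
            continue
        writer.writerow([current_ts] + fields)

if __name__ == "__main__":
    flatten(sys.stdin, sys.stdout)

Run it as something like "python flatten_log.py < raw.log > flattened.csv", then index the flattened file as an ordinary CSV (for example with INDEXED_EXTRACTIONS = csv and TIMESTAMP_FIELDS = timestamp in props.conf).

For the ingestion half of option 3, something along these lines in props.conf should group each timestamp plus its table into one event. The sourcetype name is just a placeholder, and again the regex assumes your timestamps always look like the sample:

[timestamped_csv]
# Start a new event only when a line looks like "Sun Feb 14 07:01:05 EST 2016"
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ \d{4}$
TIME_FORMAT = %a %b %d %H:%M:%S %Z %Y
MAX_EVENTS = 1000

You'd still need the search-time piece to split each multi-row event back into one event per CSV row, which is where the custom command came in.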

