Getting Data In

How do I extract data from a CSV file with multiple lines when the timestamp is on a separate line?

anasar
New Member

Hi,

My log has a timestamp followed by CSV rows. For example, here are two records:

Sun Feb 14 07:01:05 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.111,120
def, 1002203,200,0.112,150
ghi, 1002206,500,0.113,160

Sun Feb 14 07:04:55 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.114,110
def, 1002203,200,0.118,190
ghi, 1002206,500,0.117,130

How do I index each CSV row with the timestamp that precedes it? Please help.


Lowell
Super Champion

There's no real easy way to do this. The problem is that your file is a mix of a traditional "log" file (with timestamps between events) and a tabular (CSV) data format. This isn't a typical layout, and therefore isn't well supported. A few thoughts/ideas are listed below.

Options:

  1. Modify the log-producing program so that it writes one timestamp per row (line).
  2. Pre-process the CSV file so that the timestamp repeats for each row. This is relatively simple to do in Python, Perl, AWK, or whatever tool you prefer (see the sketch after this list).
  3. Let Splunk ingest each chunk of tabular data as a single event, then use some search-time magic to split each event into multiple events, one per CSV row. I know I wrote a custom search command a few years back that did exactly this. (I may be able to find it lying around; I never got around to actually releasing it.) But this probably isn't the best answer and may not perform as well, depending on your use case.
There could be some other clever approaches as well, but most of them will be either a pain to implement, hard to understand, or have serious limitations. I'd try options 1 or 2 first.
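
If you do end up trying option 3, the ingest half can be handled with event-breaking settings in props.conf. This is only a sketch: the sourcetype name is hypothetical, and you'd want to verify the break regex and time format against your actual data.

    # props.conf -- sketch only; "timestamped_csv" is a hypothetical sourcetype name
    [timestamped_csv]
    SHOULD_LINEMERGE = true
    # Start a new event only at an asctime-style line like "Sun Feb 14 07:01:05 EST 2016"
    BREAK_ONLY_BEFORE = ^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w{3,4} \d{4}
    TIME_FORMAT = %a %b %d %H:%M:%S %Z %Y
    MAX_TIMESTAMP_LOOKAHEAD = 30

This only gets each timestamped chunk into a single event; you'd still need the search-time splitting piece from option 3 on top of it.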

