Getting Data In

How to extract data from a CSV file with multiple lines, but the timestamp on a different line??

anasar
New Member

Hi,

My log has a timestamp and a CSV rows. Eg. given 2 records.

Sun Feb 14 07:01:05 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.111,120
def, 1002203,200,0.112,150
ghi, 1002206,500,0.113,160

Sun Feb 14 07:04:55 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.114,110
def, 1002203,200,0.118,190
ghi, 1002206,500,0.117,130

How do I index them with the timestamp mentioned for all records in the CSV?? pls help.

0 Karma
1 Solution

Lowell
Super Champion

There's no real easy way to do this. The problem here is that your file is a mix between a traditional "log" file (with timestamps existing between events) and a tabular (CSV) data format. This isn't a typical use, and therefore isn't well supported. A few thoughts/ideas listed below.

Options:

  1. Modify the log-producing program to format so that there's one timestamp per row (line).
  2. Pre-process the CSV file so that the timestamp repeats for each row. This would be relatively simple to do in Python, Perl, AWK, or whatever tool you prefer.
  3. Allow Splunk ingest each chunk of tabular data into a single event; then use some search-time magic to split each event into multiple events--one per csv row. I know a wrote a custom search command a few years back that did this exact thing. (I may be able to find it lying around; I never got around to actually releasing it.) But this probably isn't the best answer and may not perform as well, depending on your use case.

There could be some other clever approaches to this as well, but most of them will be either a pain to implement, hard to understand, or have a serious limitations. I'd try options 1 or 2 first.

View solution in original post

Lowell
Super Champion

There's no real easy way to do this. The problem here is that your file is a mix between a traditional "log" file (with timestamps existing between events) and a tabular (CSV) data format. This isn't a typical use, and therefore isn't well supported. A few thoughts/ideas listed below.

Options:

  1. Modify the log-producing program to format so that there's one timestamp per row (line).
  2. Pre-process the CSV file so that the timestamp repeats for each row. This would be relatively simple to do in Python, Perl, AWK, or whatever tool you prefer.
  3. Allow Splunk ingest each chunk of tabular data into a single event; then use some search-time magic to split each event into multiple events--one per csv row. I know a wrote a custom search command a few years back that did this exact thing. (I may be able to find it lying around; I never got around to actually releasing it.) But this probably isn't the best answer and may not perform as well, depending on your use case.

There could be some other clever approaches to this as well, but most of them will be either a pain to implement, hard to understand, or have a serious limitations. I'd try options 1 or 2 first.

Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Best Practices: Splunk auto adjust pipeline queue

When you enable autoAdjustQueue in Splunk, maxSize should be understood as the queue size Splunk starts with ...

Request for Professional Development: Attending .conf26

Winning Over the Boss: Your Pass to .conf26 conf26 is going to be here before you know it. If don't already ...

Casting Call: Compete in Cyber Games

Lights, Camera, SecOps: Apply to Compete in Cyber Games     Think you have what it takes to beat the clock? ...