Getting Data In

How to extract data from a CSV file with multiple lines, but the timestamp on a different line??

anasar
New Member

Hi,

My log has a timestamp and a CSV rows. Eg. given 2 records.

Sun Feb 14 07:01:05 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.111,120
def, 1002203,200,0.112,150
ghi, 1002206,500,0.113,160

Sun Feb 14 07:04:55 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.114,110
def, 1002203,200,0.118,190
ghi, 1002206,500,0.117,130

How do I index them with the timestamp mentioned for all records in the CSV?? pls help.

0 Karma
1 Solution

Lowell
Super Champion

There's no real easy way to do this. The problem here is that your file is a mix between a traditional "log" file (with timestamps existing between events) and a tabular (CSV) data format. This isn't a typical use, and therefore isn't well supported. A few thoughts/ideas listed below.

Options:

  1. Modify the log-producing program to format so that there's one timestamp per row (line).
  2. Pre-process the CSV file so that the timestamp repeats for each row. This would be relatively simple to do in Python, Perl, AWK, or whatever tool you prefer.
  3. Allow Splunk ingest each chunk of tabular data into a single event; then use some search-time magic to split each event into multiple events--one per csv row. I know a wrote a custom search command a few years back that did this exact thing. (I may be able to find it lying around; I never got around to actually releasing it.) But this probably isn't the best answer and may not perform as well, depending on your use case.

There could be some other clever approaches to this as well, but most of them will be either a pain to implement, hard to understand, or have a serious limitations. I'd try options 1 or 2 first.

View solution in original post

Lowell
Super Champion

There's no real easy way to do this. The problem here is that your file is a mix between a traditional "log" file (with timestamps existing between events) and a tabular (CSV) data format. This isn't a typical use, and therefore isn't well supported. A few thoughts/ideas listed below.

Options:

  1. Modify the log-producing program to format so that there's one timestamp per row (line).
  2. Pre-process the CSV file so that the timestamp repeats for each row. This would be relatively simple to do in Python, Perl, AWK, or whatever tool you prefer.
  3. Allow Splunk ingest each chunk of tabular data into a single event; then use some search-time magic to split each event into multiple events--one per csv row. I know a wrote a custom search command a few years back that did this exact thing. (I may be able to find it lying around; I never got around to actually releasing it.) But this probably isn't the best answer and may not perform as well, depending on your use case.

There could be some other clever approaches to this as well, but most of them will be either a pain to implement, hard to understand, or have a serious limitations. I'd try options 1 or 2 first.

Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

[Puzzles] Solve, Learn, Repeat: Character substitutions with Regular Expressions

This challenge was first posted on Slack #puzzles channelFor BORE at .conf23, we had a puzzle question which ...

Splunk Community Badges!

  Hey everyone! Ready to earn some serious bragging rights in the community? Along with our existing badges ...

[Puzzles] Solve, Learn, Repeat: Matching cron expressions

This puzzle (first published here) is based on matching timestamps to cron expressions.All the timestamps ...