Getting Data In

Defining timestamp on data from .csv file

lyndac
Contributor

I'm trying to index some data input from a .csv file. Is it possible to tell splunk to use a specific column of data as the timestamp for the event? My data looks like this:

Company Name, date, other junk
Company ABC Maryland,09-Feb-10,my notes
"Other company, inc ",12-Feb-10,more notes

I was hoping splunk would automatically detect the date in the second column and use that as the timestamp for the event, but it doesn't. It uses the current time. How can I make it use that column?

Thanks!

1 Solution

Lowell
Super Champion

You can't exactly tell Splunk to look in a certain column. However, you can tell Splunk what comes before your timestamp, and you can write a simple regex that guides it to the right column by counting commas.

This type of logic can be set up using the TIME_PREFIX entry in props.conf. It's also a good idea to indicate a time format (using TIME_FORMAT); without it, Splunk can sometimes mis-identify or otherwise miss your event's timestamp.

So if your timestamp is always in the second column, try adding a config entry like this to your $SPLUNK_HOME/etc/system/local/props.conf file:

[source::.../path/to/my/csvfile.csv]
sourcetype = my_csv_sourcetype

[my_csv_sourcetype]
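# Skip the first column (quoted or unquoted) and the comma after it,
# so that timestamp recognition starts at the second column.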
TIME_PREFIX = ^(?:"(?:""|[^"]|\\")+"|[^,"]+),
TIME_FORMAT = %d-%b-%y
# If your events are all on a single line, you should add:
SHOULD_LINEMERGE = False
BREAK_ONLY_BEFORE_DATE = False

Note: This regex should allow the first column to be quoted or unquoted. If it's quoted, a literal quote can be escaped either by doubling it (e.g. "", which is the CSV default way of escaping a quote) or with a backslash-quote (e.g. \"), which is another popular way of escaping quotes.... Wow, that gets ugly really fast. I haven't tested this with all possible options.
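
If you want to sanity-check that TIME_PREFIX regex outside of Splunk before editing props.conf, a quick Python sketch along these lines (just an illustration against your two sample rows, not anything Splunk ships) shows what is left after the prefix match, which is where Splunk starts looking for the timestamp:

import re

# Same pattern as the TIME_PREFIX above: skip the first CSV column,
# quoted or unquoted, plus the comma that follows it.
time_prefix = re.compile(r'^(?:"(?:""|[^"]|\\")+"|[^,"]+),')

samples = [
    'Company ABC Maryland,09-Feb-10,my notes',
    '"Other company, inc ",12-Feb-10,more notes',
]

for line in samples:
    m = time_prefix.match(line)
    # Everything after the match should begin with the date column.
    print(line[m.end():] if m else 'NO MATCH: ' + line)

Both sample rows should print starting with the date (09-Feb-10 and 12-Feb-10); if a row prints NO MATCH, Splunk won't find a timestamp after that prefix and you'd likely see the current-time behavior again for that event.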

Note that if your date and time are in different columns this can get tricky when they are not side by side. But often they are, in which case it's pretty simple to use a config entry like this:

 TIME_FORMAT = %d-%b-%y,%H:%M:%S

Where we simply put a literal comma in between the date and time fields.
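
For example, if a (hypothetical) file had rows like Company ABC Maryland,09-Feb-10,14:30:00,my notes, with the date in the second column and the time in the third, the stanza might look something like this (untested, so adjust to your own data):

[my_csv_sourcetype]
TIME_PREFIX = ^(?:"(?:""|[^"]|\\")+"|[^,"]+),
TIME_FORMAT = %d-%b-%y,%H:%M:%S
SHOULD_LINEMERGE = False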

If you haven't read this already, please take a look at the props.conf reference.

Hope this gets you started.

Lowell
Super Champion

Whoops. I had missed that your example contained quotes with a comma. I updated the example above so that it should handle quotes, if you're interested. It also handles some quote escaping, which you'll sometimes see in CSV files. (If this has answered your question, you can accept it and you'll see a green check next to this answer.)

Scan001
Explorer

Thanks for all comments. I'll try suggestions out and provide feedback etc.

Cheers.

lyndac
Contributor

Whoops -- I got it... I forgot to set SHOULD_LINEMERGE to false. It's working great now. Thanks for your help.

lyndac
Contributor

I was trying to do something similar, but can't seem to get the regex correct. If there is a comma in the value of the first column, like:
"Other company, inc ",12-Feb-10,more notes
the event appears to merge with the next line until the regex and time format can be matched next to each other.
