Solved: How to parse date and time in different columns du...

felipetesta · ‎09-08-2015

Hello,

My raw CSV data looks like this (it's open data, no anonymization needed):

Anno;Reg;Articolo;Classe;Dat Infr;Loc1;Numciv1;Bisinternolettera;Loc2;Ora Infr;Sanzione accessoria;Tipoinfr;Numero Verbali;Sanzioni
2014;Legge 285/92;158;Autovettura;01/07/2014;SEBASTOPOLI/(CORSO);238;B;(Altri);0:00;(Mancante);Sosta;1;1
2014;Legge 285/92;142;Autovettura;01/07/2014;UNITA' D'ITALIA/(CORSO);96;AFR.EST;(Altri);0:01;(Mancante);Limiti di Velocita';1;1

Date is in column 5 "Dat Infr" and time in column 9 "Ora Infr". I tried most combinations with/out quotes, comma, semicolon, SHOULD_LINEMERGE true/false, but nothing has worked and the correct timestamp doesn't come up. On the other hand, I can get Splunk to recognize the date field if I leave it alone in TIMESTAMP_FIELDS and TIME_FORMAT. The proposed props.conf is:

[ csv ]
TIME_FORMAT=%d/%m/%Y,%H:%M
TZ=Europe/Rome
MAX_TIMESTAMP_LOOKAHEAD=500
TIMESTAMP_FIELDS=Dat Infr,Ora Infr
SHOULD_LINEMERGE=false
INDEXED_EXTRACTIONS=csv
NO_BINARY_CHECK=true
KV_MODE=none
disabled=false
pulldown_type=true

What am I missing? Even the official docs have a similar unanswered question at http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/ConfigurePositionalTimestampExtraction

Thanks.

felipetesta · ‎09-08-2015

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

View solution in original post

felipetesta · ‎09-08-2015

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

felipetesta · ‎09-08-2015

Findings after further investigation.

Browsing further into the preview the timestamp gets recognized and importing the 2014 CSV works, except for first ~100 events. But CSV from year 2013 has the time in hh.mm (dot instead of colon) and I cannot import it.

Curiously, if I preview the extraction of date or time only, the TIME_FORMAT (%d/%m/%Y or %H.%M) is comfirmed to be correct. Concatenation doesn't seem to work (reliably). Swapping date and time neither.

I would prefer not to edit the source CSV with +700k lines.

Splunk version 6.2.2.

How to parse date and time in different columns during CSV import?

Can’t make it to .conf25? Join us online!

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Unlock What’s Next: The Splunk Cloud Platform at .conf25

Are you a member of the Splunk Community?

How to parse date and time in different columns during CSV import?

Can’t make it to .conf25? Join us online!

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Unlock What’s Next: The Splunk Cloud Platform at .conf25