Getting Data In

How to parse date and time in different columns during CSV import?

felipetesta
Path Finder

Hello,

My raw CSV data looks like this (it's open data, no anonymization needed):

Anno;Reg;Articolo;Classe;Dat Infr;Loc1;Numciv1;Bisinternolettera;Loc2;Ora Infr;Sanzione accessoria;Tipoinfr;Numero Verbali;Sanzioni
2014;Legge 285/92;158;Autovettura;01/07/2014;SEBASTOPOLI/(CORSO);238;B;(Altri);0:00;(Mancante);Sosta;1;1
2014;Legge 285/92;142;Autovettura;01/07/2014;UNITA' D'ITALIA/(CORSO);96;AFR.EST;(Altri);0:01;(Mancante);Limiti di Velocita';1;1

Date is in column 5 "Dat Infr" and time in column 9 "Ora Infr". I tried most combinations with/out quotes, comma, semicolon, SHOULD_LINEMERGE true/false, but nothing has worked and the correct timestamp doesn't come up. On the other hand, I can get Splunk to recognize the date field if I leave it alone in TIMESTAMP_FIELDS and TIME_FORMAT. The proposed props.conf is:

[ csv ]
TIME_FORMAT=%d/%m/%Y,%H:%M
TZ=Europe/Rome
MAX_TIMESTAMP_LOOKAHEAD=500
TIMESTAMP_FIELDS=Dat Infr,Ora Infr
SHOULD_LINEMERGE=false
INDEXED_EXTRACTIONS=csv
NO_BINARY_CHECK=true
KV_MODE=none
disabled=false
pulldown_type=true

What am I missing? Even the official docs have a similar unanswered question at http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/ConfigurePositionalTimestampExtraction

Thanks.

1 Solution

felipetesta
Path Finder

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

View solution in original post

felipetesta
Path Finder

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

felipetesta
Path Finder

Findings after further investigation.

Browsing further into the preview the timestamp gets recognized and importing the 2014 CSV works, except for first ~100 events. But CSV from year 2013 has the time in hh.mm (dot instead of colon) and I cannot import it.

Curiously, if I preview the extraction of date or time only, the TIME_FORMAT (%d/%m/%Y or %H.%M) is comfirmed to be correct. Concatenation doesn't seem to work (reliably). Swapping date and time neither.

I would prefer not to edit the source CSV with +700k lines.

Splunk version 6.2.2.

0 Karma
Get Updates on the Splunk Community!

Splunk Community Platform Survey

Hey Splunk Community, Starting today, the community platform may prompt you to participate in a survey. The ...

Observability Highlights | November 2022 Newsletter

 November 2022Observability CloudEnd Of Support Extension for SignalFx Smart AgentSplunk is extending the End ...

Avoid Certificate Expiry Issues in Splunk Enterprise with Certificate Assist

This blog post is part 2 of 4 of a series on Splunk Assist. Click the links below to see the other ...