Re: CSV file index time field extractions - how to ignore the header? Is there schema on write support?

kiril123 · ‎11-11-2019

Hello,

I have the following little csv file:

time,interface,utilization
2019-11-03,int_a,100
2019-11-04,int_b,200

You can see it contains a header and two rows of data.

I want to perform index-time extraction of the fields. I also want to use the timestamp from the time column.

This is my props.conf configuration:

DATETIME_CONFIG =
INDEXED_EXTRACTIONS = csv
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIMESTAMP_FIELDS = time
TIME_FORMAT = %Y-%m-%d
category = Custom
pulldown_type = 1
HEADER_FIELD_LINE_NUMBER = 1
disabled = false
FIELD_HEADER_REGEX =
PREAMBLE_REGEX =

No matter what I do, Splunk always indexes the header as well. I don't want that. I have tried the following settings:

PREAMBLE_REGEX - this ignores the header, but then index-time field extractions are not performed, probably because the header is ignored (a chicken-and-egg situation). I can work around this by listing the comma-separated field names manually, but I want schema-on-write support, which Splunk doesn't seem to provide.
HEADER_FIELD_LINE_NUMBER = 1 - I tried this setting, which made no difference.
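
For readers following along, the manual workaround mentioned in the PREAMBLE_REGEX item might look roughly like the sketch below. It is untested and rests on assumptions: the stanza name `my_csv` is hypothetical, and the header regex and field list are simply copied from the sample file above. `FIELD_NAMES` is the props.conf setting for supplying column names when no header line is read.

```
# props.conf (sketch; the stanza name my_csv is a hypothetical sourcetype)
[my_csv]
INDEXED_EXTRACTIONS = csv
# Skip the header line so it is not indexed as an event
PREAMBLE_REGEX = ^time,interface,utilization
# With the header skipped, the column names have to be listed by hand
FIELD_NAMES = time,interface,utilization
TIMESTAMP_FIELDS = time
TIME_FORMAT = %Y-%m-%d
```

The drawback is the one described above: the column names are hard-coded, so the configuration has to be updated whenever the file layout changes.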

Does anyone know if it is possible to index csv file fields without the header and without defining column names manually in props.conf?

Thank you,

Kiril

darrenfuller · ‎11-11-2019

I usually go with a props/transforms/nullQueue approach for these types of situations, where the field names are known:

# props.conf
[782506]
disabled = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d
MAX_TIMESTAMP_LOOKAHEAD = 15
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
INDEXED_EXTRACTIONS = csv
TRANSFORMS-01_killheader = Delete_csv_header

with

#transforms.conf
[Delete_csv_header]
disabled = false
REGEX = ^time\,interface\,utilization
DEST_KEY = queue
FORMAT = nullQueue

This resulted in two events, no header, and field extractions indexed.

anwarmian · ‎05-04-2020

darrenfuller's answer is good. Index-time and search-time field extraction each have advantages and disadvantages for a CSV file with a header.

Search time: indexed data takes less storage space than with index-time extraction.
Index time: a field's position can sometimes change. With search-time extraction, handling that means creating a new report class, which will override some of the old fields.
Here is a link to a case study on search-time vs. index-time extraction for JSON files.
https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions
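
To make the search-time side of the comparison concrete, a delimiter-based search-time extraction could be sketched roughly as follows. This is an untested illustration, not taken from the thread: the stanza names `my_csv` and `my_csv_fields` are hypothetical, and the field list is copied from the sample file in the question.

```
# props.conf (sketch; my_csv is a hypothetical sourcetype name)
[my_csv]
REPORT-csv_fields = my_csv_fields

# transforms.conf (sketch)
[my_csv_fields]
# Split each event on commas and name the values by position
DELIMS = ","
FIELDS = "time","interface","utilization"
```

Because `FIELDS` assigns names purely by position, a change in column order in the source file would mislabel the values until the list is updated. The header line would also still need to be dropped separately, for example with a nullQueue transform.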
