My input files are in the following format (CSV):
Icon Statistics
Time;26.10.2017 00:00 - 27.10.2017 04:40
Service;Servicename
Statistic;Report_servicename
Date;Time;IncomingRequest;InternalSystemDBError;InternalSystemDataError;InternalSystemErrorOther;OK;SDUPTimeout;SDUPError;InvalidIncomingRequest;counter8;counter9;counter10;counter11;counter12;counter13;counter14;counter15;counter16;counter17;counter18;counter19
26.10.2017;00:00;4;0;0;0;4;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
26.10.2017;00:10;2;0;0;0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
26.10.2017;00:20;5;0;0;0;5;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
Total;;1,234;0;0;0;1,224;0;10;0;0;0;0;0;0;0;0;0;0;0;0;0
Before indexing these files, the "header" should be removed.
I configured the Splunk Universal Forwarder to monitor these files in the following way:
[monitor:///opt/ect/data/sdp/mail/statistics/*SDUP*.csv]
index=csdp_prod_stats
source=statistics
sourcetype=csv
crcSalt = <SOURCE>
ignoreOlderThan=14d
On the main Splunk instance, I configured the props.conf:
[csv]
TRANSFORMS-eliminate_header = eliminate_header
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = ;
TIMESTAMP_FIELDS = Date,Time
HEADER_FIELD_LINE_NUMBER = 7
And transforms.conf as following:
[eliminate_header]
REGEX = ^(?:Icon|Time|Service|Statistic|Total)
DEST_KEY = queue
FORMAT = nullQueue
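As a quick sanity check outside Splunk, the REGEX from this transforms.conf can be tested against the sample lines with a short Python sketch. The line set below is trimmed from the example file, and this only verifies which lines the pattern would match, not Splunk's actual queue routing:

```python
import re

# The transforms.conf regex, applied per line before indexing
HEADER_RE = re.compile(r"^(?:Icon|Time|Service|Statistic|Total)")

lines = [
    "Icon Statistics",                # dropped
    "Time;26.10.2017 00:00 - 27.10.2017 04:40",  # dropped
    "Service;Servicename",            # dropped
    "Statistic;Report_servicename",   # dropped
    "Date;Time;IncomingRequest;OK",   # field-name line: kept
    "26.10.2017;00:00;4;4",           # data line: kept
    "Total;;1,234;0",                 # summary line: dropped
]

kept = [line for line in lines if not HEADER_RE.match(line)]
print(kept)
```

Note that the field-name line starting with `Date;` is intentionally not matched, so it survives for HEADER_FIELD_LINE_NUMBER to pick up.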
When I check the search in Splunk, it seems like the removal of the header is not working: the complete file is being indexed. What am I doing wrong?
Also, I want to use the column names in the CSV as field names in Splunk, taken from the line I did not remove from the CSV file. Is "HEADER_FIELD_LINE_NUMBER = 7" (as seen above in props.conf) the correct way of specifying this automatic extraction of fields in Splunk?
Thank you in advance!
Hey Sander!
You need to make sure you put the props/transforms on the forwarder when dealing with structured data:
"If you want to forward fields that you extract from structured data files to another Splunk instance, you must configure the props.conf settings that define the field extractions on the forwarder that sends the data."
These props worked for me; you should just pick the right timezone (TZ) value for this data, and perhaps also dump the Total line. By providing the header line number, I believe you remove the need for a props/transforms pair to dump the header, as we do that automagically:
[sander_csv]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
HEADER_FIELD_LINE_NUMBER=7
FIELD_DELIMITER=;
TZ=UTC
TIMESTAMP_FIELDS=Date,Time
Please clarify: when you say "you need to make sure you put the props/transforms on the forwarder", do you mean a forwarding Splunk instance, or do you mean a Splunk Universal Forwarder?
Speaking specifically about INDEXED_EXTRACTIONS, it would be any forwarding instance.
So I should copy [Splunk Instance]/opt/splunk/etc/apps/search/local/props.conf and transforms.conf to [Splunk Universal Forwarder]/opt/splunkforwarder/etc/apps/_server_app_<server class>/local/, correct?
Hard to say; I'm not sure what you are trying to do. Maybe start a new answers post and link me and I'll help you there, or catch me on Slack (splk.it/splunk; my username is @mattymo).
How do I "link you"? I don't see anything resembling that on my original question's page.
Just post the link here.
See you here: https://answers.splunk.com/answers/598234/importing-collectd-csv-data-for-consumption-by-spl.html
And thanks for helping me!
Thank you! This one is working for me. With your proposed props.conf in combination with the transforms.conf, the "Total" line is also skipped from indexing.
Sweet, what did it? Pushing the props/transforms to the forwarder?
Yes indeed, I moved both files to the forwarder and it started to work flawlessly!
Thanks once more!
Is the above example data a single event or the whole contents of a file? Just checking, because if "Icon Statistics" occurs again in the same file, it might need LINE_BREAKER and SHOULD_LINEMERGE = false options.
The data above is an example of such a file. In the monitored location (/opt/ect/data/sdp/mail/statistics/SDUP.csv) the same kind of file is exported every 10 minutes (with a different name, of course). The header I am speaking of, which needs to be skipped, has the same structure in every CSV file.
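For completeness, here is how the separate Date and Time columns in these rows combine into a single UTC timestamp, mirroring TIMESTAMP_FIELDS = Date,Time with TZ = UTC from the props above. This is a minimal Python sketch, independent of Splunk; the trimmed sample and the parse_events helper are illustrative only:

```python
import csv
from datetime import datetime, timezone
from io import StringIO

# Trimmed sample: field-name line, two data rows, and the summary line
SAMPLE = """Date;Time;IncomingRequest;OK
26.10.2017;00:00;4;4
26.10.2017;00:10;2;2
Total;;6;6
"""

def parse_events(text):
    """Combine Date + Time into a timezone-aware UTC datetime,
    skipping the trailing Total summary row."""
    events = []
    for row in csv.DictReader(StringIO(text), delimiter=";"):
        if row["Date"] == "Total":
            continue
        ts = datetime.strptime(f"{row['Date']} {row['Time']}", "%d.%m.%Y %H:%M")
        events.append((ts.replace(tzinfo=timezone.utc), int(row["IncomingRequest"])))
    return events

events = parse_events(SAMPLE)
print(events[0])
```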