Splunk Search

How to skip header in CSV files before indexing?

sander_vandamme
Path Finder

My input files are in the following format (CSV):

Icon Statistics

Time;26.10.2017 00:00 - 27.10.2017 04:40
Service;Servicename
Statistic;Report_servicename

Date;Time;IncomingRequest;InternalSystemDBError;InternalSystemDataError;InternalSystemErrorOther;OK;SDUPTimeout;SDUPError;InvalidIncomingRequest;counter8;counter9;counter10;counter11;counter12;counter13;counter14;counter15;counter16;counter17;counter18;counter19
26.10.2017;00:00;4;0;0;0;4;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
26.10.2017;00:10;2;0;0;0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
26.10.2017;00:20;5;0;0;0;5;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
Total;;1,234;0;0;0;1,224;0;10;0;0;0;0;0;0;0;0;0;0;0;0;0

Before indexing these files, the "header" should be removed.
I configured the Splunk Universal Forwarder to monitor these files in the following way:

[monitor:///opt/ect/data/sdp/mail/statistics/*SDUP*.csv]
index=csdp_prod_stats
source=statistics
sourcetype=csv
crcSalt = <SOURCE>
ignoreOlderThan=14d

On the main Splunk instance, I configured the props.conf:

[csv]
TRANSFORMS-eliminate_header = eliminate_header
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = ;
TIMESTAMP_FIELDS = Date,Time
HEADER_FIELD_LINE_NUMBER = 7
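
As a quick sanity check on the TIMESTAMP_FIELDS setting, the Date and Time values from these files parse cleanly with a dd.mm.yyyy HH:MM pattern. A minimal sketch (plain Python, assuming Splunk effectively joins the two fields with a space when building the timestamp):

```python
from datetime import datetime

# Sample Date and Time column values from the CSV above
date_part, time_part = "26.10.2017", "00:00"

# dd.mm.yyyy HH:MM -- the format used in these files
ts = datetime.strptime(f"{date_part} {time_part}", "%d.%m.%Y %H:%M")
print(ts.isoformat())  # 2017-10-26T00:00:00
```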

And transforms.conf as following:

[eliminate_header]
REGEX = ^(?:Icon|Time|Service|Statistic|Total)
DEST_KEY = queue
FORMAT = nullQueue
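
The intent of the REGEX can be checked outside Splunk. A minimal sketch using Python's re module (not Splunk's matcher) against the sample lines above; note that the "Date;Time;..." column-header line does not match the anchor, so it would survive the nullQueue routing:

```python
import re

# The anchored regex from the transforms.conf stanza above; matching
# lines would be routed to nullQueue (i.e. dropped before indexing).
header_re = re.compile(r"^(?:Icon|Time|Service|Statistic|Total)")

lines = [
    "Icon Statistics",
    "Time;26.10.2017 00:00 - 27.10.2017 04:40",
    "Service;Servicename",
    "Statistic;Report_servicename",
    "Date;Time;IncomingRequest;OK",        # column header: NOT matched
    "26.10.2017;00:00;4;4",                # data row: NOT matched
    "Total;;1,234;0",                      # trailer: matched, dropped
]

for line in lines:
    print(f"{'DROP' if header_re.match(line) else 'KEEP'}: {line}")
```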

When I search in Splunk, it seems the header removal is not working: the complete file is being indexed. What am I doing wrong?

I also want to use the column names from the CSV (the one line I did not remove) as field names in Splunk. Is "HEADER_FIELD_LINE_NUMBER = 7" in the props.conf above the correct way to specify this automatic field extraction?

Thank you in advance!

1 Solution

mattymo
Splunk Employee
Splunk Employee

Hey Sander!

You need to make sure you put the props/transforms on the forwarder when dealing with structured data:

"If you want to forward fields that you extract from structured data files to another Splunk instance, you must configure the props.conf settings that define the field extractions on the forwarder that sends the data."

https://docs.splunk.com/Documentation/Splunk/7.0.0/Data/Extractfieldsfromfileswithstructureddata#Fie...

This props.conf worked for me. You should just pick the right timezone (TZ) value for this data, and perhaps also dump the Total line. By providing the header line number, I believe you remove the need for a props/transforms pair to dump the header, as we do that automagically:

[sander_csv]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
HEADER_FIELD_LINE_NUMBER=7
FIELD_DELIMITER=;
TZ=UTC
TIMESTAMP_FIELDS=Date,Time
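
A rough sketch of what these settings do, in plain Python (not Splunk's actual parser): everything before line 7 is ignored, line 7 supplies the field names, ";" splits the fields, and each later line becomes one event. The Total trailer is filtered here for illustration; in Splunk that would still need a transform or similar.

```python
import csv
import io

# Abbreviated copy of the sample file from the question
SAMPLE = """\
Icon Statistics

Time;26.10.2017 00:00 - 27.10.2017 04:40
Service;Servicename
Statistic;Report_servicename

Date;Time;IncomingRequest;OK
26.10.2017;00:00;4;4
26.10.2017;00:10;2;2
Total;;6;6
"""

HEADER_LINE = 7  # mirrors HEADER_FIELD_LINE_NUMBER=7

lines = SAMPLE.splitlines()
fieldnames = lines[HEADER_LINE - 1].split(";")   # line 7 -> field names
reader = csv.DictReader(io.StringIO("\n".join(lines[HEADER_LINE:])),
                        fieldnames=fieldnames, delimiter=";")
events = [row for row in reader if row["Date"] != "Total"]
print(events[0]["IncomingRequest"])  # prints 4
```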


- MattyMo

View solution in original post


DUThibault
Contributor

Please clarify: when you say "you need to make sure you put the props/transforms on the forwarder", do you mean a forwarding Splunk instance, or do you mean a Splunk Universal Forwarder?


mattymo
Splunk Employee
Splunk Employee

Speaking specifically about indexed_extractions, it would be any forwarding instance.

- MattyMo

DUThibault
Contributor

So I should copy [Splunk Instance]/opt/splunk/etc/apps/search/local/props.conf and transforms.conf to [Splunk Universal Forwarder]/opt/splunkforwarder/etc/apps/_server_app_<server class>/local/ , correct?


mattymo
Splunk Employee
Splunk Employee

Hard to say; I'm not sure what you are trying to do. Maybe start a new Answers post and link me and I'll help you there, or catch me on Slack (splk.it/splunk; my username is @mattymo).

- MattyMo

DUThibault
Contributor

How do I "link you"? I don't see anything resembling that on my original question's page.


mattymo
Splunk Employee
Splunk Employee

Just post the link here.

- MattyMo

DUThibault
Contributor

sander_vandamme
Path Finder

Thank you! This one is working for me. With your proposed props.conf in combination with the transforms.conf, the "Total" line is also skipped from indexing.


mattymo
Splunk Employee
Splunk Employee

Sweet, what did it? Pushing the props/transforms to the forwarder?

- MattyMo

sander_vandamme
Path Finder

Yes indeed, I moved both files to the forwarder and it started working flawlessly!
Thanks once more!


koshyk
Super Champion

Is the above example data a single event or the whole contents of a file? Just checking, because if "Icon Statistics" occurs again in the same file, it might need LINE_BREAKER and SHOULD_LINEMERGE=false options.


sander_vandamme
Path Finder

The data above is an example of such a file. In the monitored location (/opt/ect/data/sdp/mail/statistics/SDUP.csv), the same kind of file is exported every 10 minutes (with a different name, of course). The header I am speaking of, which needs to be skipped, has the same structure in every CSV file.
