Splunk Search
Highlighted

Index a CSV with different data types

Engager

Hi all - bit of a weird one! I've run out of ideas. Help please!

I'm trying to index some CSV files. However, the first line does not contain headers, it contains CSV info about the file itself (such as the start/end dates for the data covered in the file, the original sending organisation, and the type of data it relates to), and a different number of "columns" than the following rows.

I would like to parse this header row to extract these values, and append them as fields to each event contained within that file.

The Header & Detail rows are easily located (they are the first & every other row respectively, and start with HDR or DET respectively).

Sample data here would look like this:

HDR,ICPHHAB,10,XXX,XXX,YYYY,7/03/2019,15:04:34,20190307,1344,201902,E,I
DET,############,ZZZZZZ,F,1/02/2019,1,44.597,0,,X,,,
DET,############,ZZZZZZ,F,1/02/2019,2,44.972,0,,X,,,
DET,############,ZZZZZZ,F,1/02/2019,3,44.972,0,,X,,,
DET,############,ZZZZZZ,F,1/02/2019,4,44.947,0,,X,,,

And what I'm wanting to achieve would be to not include the HDR rows in the search results, but rather to pick up those bits of data & append them to each of the DET rows that follow.

Because these are files we are receiving, and they need to follow a certain industry standard, I would like to avoid doing any python style pre-processing (which I agree would be easiest route) in order to leave an audit trail of the original files.

I've tried various things from the props.conf like

PREAMBLE_REGEX = ^HDR\,(.*)

and transforms.conf

REGEX = (?=[^H]*(?:HDR|H.*HDR))^\w+,(?P<field1>\w+),(?P<`field2>\d+\.\d+),(?P<field3 >\w+), etc etc etc

But the HDR files are returning in my search, the fields from the HDR row are only appearing for those rows (not appending to DET ones), and the field extractions I'm doing in the props.conf for the DET rows are being polluted with HDR row (like col3 contains all values from col3, regardless of if it was a DET or HDR row).

0 Karma
Highlighted

Re: Index a CSV with different data types

SplunkTrust
SplunkTrust

You should write a scripted input the parses the file, does the needed transformations, and writes the results to stdout for Splunk to index.

---
If this reply helps you, an upvote would be appreciated.
0 Karma
Highlighted

Re: Index a CSV with different data types

Engager

As per original post:

Because these are files we are receiving, and they need to follow a certain industry standard, I would like to avoid doing any python style pre-processing (which I agree would be easiest route) in order to leave an audit trail of the original files.

Is there no way with REX / REGEX / props.conf / transforms.conf to achieve?

0 Karma
Highlighted

Re: Index a CSV with different data types

SplunkTrust
SplunkTrust

If you change the data using transforms then you haven't left an audit trail of the original file. Your scripted input can always copy the data to an archive location.

---
If this reply helps you, an upvote would be appreciated.
0 Karma
Speak Up for Splunk Careers!

We want to better understand the impact Splunk experience and expertise has has on individuals' careers, and help highlight the growing demand for Splunk skills.