Splunk Search

Index a CSV with different data types

phil__tanner
Path Finder

Hi all - bit of a weird one! I've run out of ideas. Help please!

I'm trying to index some CSV files. However, the first line does not contain headers, it contains CSV info about the file itself (such as the start/end dates for the data covered in the file, the original sending organisation, and the type of data it relates to), and a different number of "columns" than the following rows.

I would like to parse this header row to extract these values, and append them as fields to each event contained within that file.

The Header & Detail rows are easily located (they are the first & every other row respectively, and start with HDR or DET respectively).

Sample data here would look like this:

HDR,ICPHHAB,10,XXX,XXX,YYYY,7/03/2019,15:04:34,20190307,1344,201902,E,I
DET,############,ZZZZZZ,F,1/02/2019,1,44.597,0,,X,,,
DET,############,ZZZZZZ,F,1/02/2019,2,44.972,0,,X,,,
DET,############,ZZZZZZ,F,1/02/2019,3,44.972,0,,X,,,
DET,############,ZZZZZZ,F,1/02/2019,4,44.947,0,,X,,,

And what I'm wanting to achieve would be to not include the HDR rows in the search results, but rather to pick up those bits of data & append them to each of the DET rows that follow.

Because these are files we are receiving, and they need to follow a certain industry standard, I would like to avoid doing any python style pre-processing (which I agree would be easiest route) in order to leave an audit trail of the original files.

I've tried various things from the props.conf like

PREAMBLE_REGEX = ^HDR\,(.*)

and transforms.conf

REGEX = (?=[^H]*(?:HDR|H.*HDR))^\w+,(?P<field1>\w+),(?P<`field2>\d+\.\d+),(?P<field3 >\w+), etc etc etc

But the HDR files are returning in my search, the fields from the HDR row are only appearing for those rows (not appending to DET ones), and the field extractions I'm doing in the props.conf for the DET rows are being polluted with HDR row (like col3 contains all values from col3, regardless of if it was a DET or HDR row).

0 Karma

richgalloway
SplunkTrust
SplunkTrust

You should write a scripted input the parses the file, does the needed transformations, and writes the results to stdout for Splunk to index.

---
If this reply helps you, Karma would be appreciated.
0 Karma

phil__tanner
Path Finder

As per original post:

Because these are files we are receiving, and they need to follow a certain industry standard, I would like to avoid doing any python style pre-processing (which I agree would be easiest route) in order to leave an audit trail of the original files.

Is there no way with REX / REGEX / props.conf / transforms.conf to achieve?

0 Karma

richgalloway
SplunkTrust
SplunkTrust

If you change the data using transforms then you haven't left an audit trail of the original file. Your scripted input can always copy the data to an archive location.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...