Getting Data In

For a CSV file, can you help me skip the first few lines, recognize the nth line as the header, and re-read the file when the only difference is the date/time on one of the skipped lines?

Builder

I need to read the RoleStatus.csv file, which is rolled over every day.

The first line of the file is always empty.

Lines 2-5 contain some information about the file and the results.

Line 6 is the header.

I need to skip the first 6 lines and read this file starting from line 7. I was using HEADER_FIELD_LINE_NUMBER = 6 in props.conf to accomplish that.

Also, in the majority of cases, the ONLY difference between the current version and the previous version of RoleStatus.csv is the date and time, so Splunk doesn't read the current version because it thinks it has already read it.

For example:
Current version of RoleStatus.csv

1.                  
2. Report : Role Status                     
3. Date : 11/13/2018 5:10:00 PM                     
4. Number of query results returned : 3                     
5. User : Service                       
6. Entity   Health  Current server  Servers Version Status  Maintenance
7. Taipei xx.xx.xxx.xx5 (Security Center Federation™) Online  HOST1   HOST1   5.7.809.45  Started 
8. SATDR xx.xx.xxx.xx6  (Security Center Federation)    Online  HOST1   HOST1   5.7.809.45  Started 
9. Austin xx.xx.xxx.xx6 (Security Center Federation™)   Online  HOST1   HOST1   5.7.809.45  Started

The previous version of RoleStatus.csv (already indexed; the current version above is the one Splunk does not read):

1.                  
2. Report : Role Status                     
3. Date : 11/13/2018 11:10:00 AM
4. Number of query results returned : 3                     
5. User : Service                       
6. Entity   Health  Current server  Servers Version Status  Maintenance
7. Taipei xx.xx.xxx.xx5 (Security Center Federation™) Online  HOST1   HOST1   5.7.809.45  Started 
8. SATDR xx.xx.xxx.xx6  (Security Center Federation)    Online  HOST1   HOST1   5.7.809.45  Started 
9. Austin xx.xx.xxx.xx6 (Security Center Federation™)   Online  HOST1   HOST1   5.7.809.45  Started

inputs.conf:

[monitor://d:\logs\RoleStatus.csv]
disabled = false
index = gp
sourcetype = gp_rolestatus

props.conf:

[gp_rolestatus]

DATETIME_CONFIG = CURRENT
HEADER_FIELD_LINE_NUMBER = 6
INDEXED_EXTRACTIONS = csv
KV_MODE = none
MAX_DAYS_AGO = 10951
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true

Any advice on how to accomplish the following: skip the first 6 lines, recognize line 6 as the header, and re-read the file if the only change is the date and/or time on line 3?

Thank you in advance


Ultra Champion

By default, Splunk uses the first 256 bytes of a file to calculate a CRC hash, which it uses to detect whether a file is new or not. The behavior you are experiencing sounds like Splunk doesn't take those bytes from the actual start of the file when you configure it to ignore the first few lines - as you do with HEADER_FIELD_LINE_NUMBER = 6 - but instead calculates the CRC over the 256 bytes starting at the header line?

Because, looking at your sample data, that timestamp should fall well within the first 256 bytes, I would say.

Do you have any control over the filename of the log? Can you configure the system generating the log to include the timestamp in the filename? Then you could use crcSalt = <SOURCE> in inputs.conf to make Splunk take the filename into account when detecting new files.
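For example, assuming the generating system could be changed to write timestamped filenames such as RoleStatus_2018-11-13_1710.csv (a hypothetical naming scheme, not something from the original post), the monitor stanza might look like this sketch:

```ini
# Hypothetical inputs.conf sketch: the wildcard matches timestamped
# filenames like RoleStatus_2018-11-13_1710.csv. With crcSalt = <SOURCE>,
# Splunk adds the full source path to the CRC calculation, so each new
# filename is treated as a new file even if its first 256 bytes match.
[monitor://d:\logs\RoleStatus_*.csv]
disabled = false
index = gp
sourcetype = gp_rolestatus
crcSalt = <SOURCE>
```

Note that crcSalt = <SOURCE> only helps here if the filename actually changes between versions; with a fixed filename like RoleStatus.csv it would have no effect on this problem.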

That, or raise a case with Splunk Support to check whether my suspicion is correct and whether that is a bug or intended behavior (and if so, how to deal with it).


Builder

@FrankVl, I wanted to add a timestamp to the filename only as a last resort, because the user would need to set up some kind of process to clean up the data, and for various reasons they want to explore other options first.

Thank you for your suggestion!


Ultra Champion

If this file is generated periodically in one go (so not continuously being written to), you could look at using a batch input rather than a file monitor, so that Splunk removes the file after processing it.
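A minimal sketch of such a batch stanza, reusing the index and sourcetype from the question (move_policy = sinkhole is required for batch inputs and causes Splunk to delete the file once it has been indexed):

```ini
# Hypothetical inputs.conf sketch: batch input instead of monitor.
# Splunk indexes the file once and then deletes it, so the next day's
# RoleStatus.csv is always seen as a brand-new file, regardless of
# whether its first 256 bytes match the previous version.
[batch://d:\logs\RoleStatus.csv]
move_policy = sinkhole
disabled = false
index = gp
sourcetype = gp_rolestatus
```

Only use this if nothing else needs the file after indexing, since the original is destroyed.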


Builder

@FrankVl, I have never tried batch inputs before; I will research them. Thank you so much for your suggestion.
