I need to read a RoleStatus.csv file that is rolled over every day.
The first line of file is always empty.
Lines 2-5 contain some information about the file and the query results.
Line 6 is the header.
I need to treat line 6 as the header and read the data starting from line 7. I was using HEADER_FIELD_LINE_NUMBER = 6 in props.conf to accomplish that.
Also, in the majority of cases the only difference between the current and previous versions of RoleStatus.csv is the date and time, so Splunk doesn't read the current version because it thinks it has already indexed it.
Current version of RoleStatus.csv:
1.
2. Report : Role Status
3. Date : 11/13/2018 5:10:00 PM
4. Number of query results returned : 3
5. User : Service
6. Entity  Health  Current server  Servers  Version  Status  Maintenance
7. Taipei xx.xx.xxx.xx5 (Security Center Federation™)  Online  HOST1  HOST1  5.7.809.45  Started
8. SATDR xx.xx.xxx.xx6 (Security Center Federation)  Online  HOST1  HOST1  5.7.809.45  Started
9. Austinxx.xx.xxx.xx6 (Security Center Federation™  Online  HOST1  HOST1  5.7.809.45  Started
The previous version of RoleStatus.csv (this file was ingested into Splunk; the current one is not):
1.
2. Report : Role Status
3. Date : 11/13/2018 11:10:00 AM
4. Number of query results returned : 3
5. User : Service
6. Entity  Health  Current server  Servers  Version  Status  Maintenance
7. Taipei xx.xx.xxx.xx5 (Security Center Federation™)  Online  HOST1  HOST1  5.7.809.45  Started
8. SATDR xx.xx.xxx.xx6 (Security Center Federation)  Online  HOST1  HOST1  5.7.809.45  Started
9. Austinxx.xx.xxx.xx6 (Security Center Federation™  Online  HOST1  HOST1  5.7.809.45  Started
inputs.conf:

[monitor://d:\logs\RoleStatus.csv]
disabled = false
index = gp
sourcetype = gp_rolestatus
props.conf:

DATETIME_CONFIG = CURRENT
HEADER_FIELD_LINE_NUMBER = 6
INDEXED_EXTRACTIONS = csv
KV_MODE = none
MAX_DAYS_AGO = 10951
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true
Any advice on how to accomplish the following: skip the preamble lines, recognize line 6 as the header, and re-read the file when the only change is the date and/or time near the top of the file?
Thank you in advance
By default, Splunk uses the first 256 bytes of a file to calculate a CRC hash, which it uses to detect whether a file is new or not. The behavior you are experiencing sounds like Splunk doesn't take those bytes from the actual start of the file when you configure it to ignore the first few lines, as you do with HEADER_FIELD_LINE_NUMBER = 6, but instead calculates the CRC over the 256 bytes starting at the header line?
Because looking at your sample data, that timestamp should fall well within the first 256 bytes, I would say.
Do you have any control over the filename of the log? Can you configure the system generating the log so that it includes the timestamp in the filename? Then you could use crcSalt = &lt;SOURCE&gt; in inputs.conf to make Splunk take the filename into account when detecting new files.
That, or raise a case with Splunk support to check whether my suspicion is correct, and whether that is a bug or intended behavior (and if so, how to deal with it).
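If renaming is an option, the input could look something like the sketch below. The timestamped filename pattern is an assumption about how the rolled-over files might be named; adjust the wildcard to whatever the generating system actually produces.

```ini
# inputs.conf -- sketch, assuming the source system writes files
# named like RoleStatus_2018-11-13_17-10.csv (hypothetical pattern)
[monitor://d:\logs\RoleStatus*.csv]
disabled = false
index = gp
sourcetype = gp_rolestatus
# Include the full source path (which now contains the timestamp)
# in the CRC, so each rolled-over file is treated as a new file.
crcSalt = <SOURCE>
```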
@FrankVl, I wanted to add a timestamp to the filename only as a last resort, because the user would need to set up some kind of process to clean up the files, and for various reasons they want to explore other options first.
Thank you for your suggestion!
If this file is generated periodically in one go (i.e., not continuously written to), you could look at using a batch input rather than a file monitor, so that Splunk removes the file after processing.
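A minimal sketch of such a batch stanza is below. Note that move_policy = sinkhole tells Splunk to delete the file after indexing it, so this is only appropriate if the file (or a copy of it) can safely be removed.

```ini
# inputs.conf -- sketch of a batch (one-shot) input
[batch://d:\logs\RoleStatus.csv]
# sinkhole: Splunk deletes the file once it has been indexed,
# so the next day's rollover is always seen as a brand-new file
move_policy = sinkhole
disabled = false
index = gp
sourcetype = gp_rolestatus
```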