Hi guys,
I am currently monitoring a folder (recursively) so that the files in the directory and its sub-directories are indexed. These files will only ever be .CSV. The issue I have is that Splunk seems to index only the first three lines of each CSV file; the rest is ignored.
Here is my stanza:
[monitor://E:\Reports]
disabled = false
index = reports
recursive = true
host = host_name
sourcetype = csv
And here are the first 10 lines of my CSV (the rest of the file follows a similar format):
,,Business Name,,
Business Name - Calls Completed Last Week,,,,
"Generated by System Administrator on : Dec 5, 2019 09:15 AM",,,,
Total records : 110,,,,
"Completed Time : From Nov 24, 2019 12:00 AM To Nov 30, 2019 11:59 PM",,,,
Request ID,Subject,Created Time,DueBy Time,Technician
"Nov 25, 2019",,,,
15624,Url log,"Nov 22, 2019 05:02 PM",Not Assigned,Tom
15625,Url Log - Daily Blocked Words List,"Nov 22, 2019 05:02 PM",Not Assigned,Tom
15629,Url Log - Daily Blocked Words List,"Nov 23, 2019 05:10 PM",Not Assigned,Tom
15630,Url log,"Nov 23, 2019 05:10 PM",Not Assigned,Tom
What else do I need to modify to get Splunk to index the data correctly? As I said, only the first three lines of the CSV have been indexed; the rest is ignored for some reason.
Thanks for your help!
Dan
Hey,
The problem with your data is that the header row appears on line 6 of the file.
I assume you are interested in the data from line 7 onwards.
First, define a custom sourcetype which parses the data as CSV (Settings --> Source types --> New Source Type).
You have two options to do it.
Please try both and let us know how it goes.
Option 01
[monitor://E:\Reports]
disabled = false
index = reports
recursive = true
host = host_name
sourcetype = your_sourcetype
Please modify the props.conf as below.
[your_sourcetype]
DATETIME_CONFIG = CURRENT
HEADER_FIELD_LINE_NUMBER = 6
INDEXED_EXTRACTIONS = csv
SHOULD_LINEMERGE = false
category = Structured
disabled = false
pulldown_type = true
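One side effect of DATETIME_CONFIG = CURRENT is that every event gets the index time as its timestamp rather than the "Created Time" value in the data. If you would rather use that column, the structured-data settings TIMESTAMP_FIELDS and TIME_FORMAT can do it. This is a sketch, not tested against your data; the TIME_FORMAT string assumes timestamps shaped like "Nov 22, 2019 05:02 PM" as in your sample:

```ini
[your_sourcetype]
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 6
SHOULD_LINEMERGE = false
# Use the "Created Time" column as the event timestamp instead of index time.
# Format assumes values like "Nov 22, 2019 05:02 PM" (an assumption from the sample).
TIMESTAMP_FIELDS = Created Time
TIME_FORMAT = %b %d, %Y %I:%M %p
category = Structured
```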
Option 02
Here you use transforms to keep only the relevant lines and route the rest to the null queue.
Alternatively, you could use sed commands or a bash/shell script to pre-process the files before Splunk indexes them.
[The bash/shell script would run, generate the cleaned file, and store it in a different location. Then monitor that location for the files.]
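As a rough sketch of the pre-processing approach (the SRC and DST paths below are examples I made up, not from the original post): the script strips the five preamble lines so the real header row becomes line 1 of the cleaned copy, which Splunk would then monitor instead of the original folder.

```shell
#!/bin/sh
# Sketch only: strip the 5 preamble lines so that the real header row
# ("Request ID,Subject,...") becomes line 1 of the cleaned copy.
# SRC and DST are hypothetical paths; adjust for your environment.
SRC=/mnt/reports
DST=/opt/reports_clean

mkdir -p "$DST"
for f in "$SRC"/*.csv; do
    [ -e "$f" ] || continue        # skip if no .csv files match the glob
    # tail -n +6 prints from line 6 (the header row) through end of file
    tail -n +6 "$f" > "$DST/$(basename "$f")"
done
```

You would then point the [monitor://] stanza at the cleaned location rather than the original one.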
Please modify the props.conf as below.
[your_sourcetype]
# Index-time routing: drop the unwanted lines
TRANSFORMS-null = null_queue
# Search-time field extraction using the delimiter-based transform
REPORT-csv = transforms_csv
Please modify the transforms.conf as below
[null_queue]
REGEX = your_regex
DEST_KEY = queue
FORMAT = nullQueue
[transforms_csv]
DELIMS = ","
FIELDS = "Request ID","Subject","Created Time","DueBy Time","Technician"
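For illustration only, here is one possible shape for the null_queue regex. It is an assumption based on the sample data, where every real record begins with a numeric Request ID, so any line that does not start with digits followed by a comma gets routed to the null queue:

```ini
[null_queue]
# Assumption: real data rows start with a numeric Request ID, e.g. "15624,..."
# Negative lookahead: any line NOT starting with digits and a comma is dropped.
REGEX = ^(?!\d+,)
DEST_KEY = queue
FORMAT = nullQueue
```

Note this would also drop the header row itself, which is fine here because the field names come from the FIELDS list in transforms_csv rather than from the file.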
Have you had any of the files ingested successfully? I'm assuming you need all the information in the header (everything before line 6) indexed as well. If so, is the header pattern consistent across files?
Could you also share your props.conf?