I am attempting to get Splunk to recognize a specific column in a CSV as the _time column (Current_time) upon ingestion.
***Sample Data***
Some sample lines from the CSV in question:
name,dpname,last_login,create_date,modify_date,Current_time
CAT-user,testing,,2021-01-29 09:47:42.340000000,2021-01-29 09:47:42.340000000,2022-03-24 13:18:36.390000000
master,test,,2021-09-16 11:09:21.597000000,2021-09-16 11:09:21.597000000,2022-03-24 13:18:36.390000000
model,database,,2003-04-08 09:10:42.287000000,2003-04-08 09:10:42.287000000,2022-03-24 13:18:36.390000000
Note that multiple columns include timestamps. I want Splunk to ingest them but not use them for _time. Only Current_time should be _time.
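As a stand-alone sanity check (this does not model Splunk's parser, just the data): the Current_time column parses cleanly with the stated format, modulo Python's microsecond limit. The `parse_ts` helper is hypothetical, not anything Splunk provides.

```python
import csv
import io
from datetime import datetime

# The sample rows from the question (last_login is empty in all of them).
sample = """\
name,dpname,last_login,create_date,modify_date,Current_time
CAT-user,testing,,2021-01-29 09:47:42.340000000,2021-01-29 09:47:42.340000000,2022-03-24 13:18:36.390000000
master,test,,2021-09-16 11:09:21.597000000,2021-09-16 11:09:21.597000000,2022-03-24 13:18:36.390000000
model,database,,2003-04-08 09:10:42.287000000,2003-04-08 09:10:42.287000000,2022-03-24 13:18:36.390000000
"""

def parse_ts(raw: str) -> datetime:
    # Python's %f accepts at most 6 fractional digits, so trim the
    # 9-digit nanosecond field down to microseconds first.
    head, frac = raw.split(".")
    return datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%d %H:%M:%S.%f")

event_times = {row["name"]: parse_ts(row["Current_time"])
               for row in csv.DictReader(io.StringIO(sample))}
print(event_times)
```

All three rows share the same Current_time, which is exactly the value that should become _time.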
***Error***
With the config below I am getting DateParserVerbose warning messages in splunkd.log on the host the Universal Forwarder sends the CSV from (my-host):
0400 WARN DateParserVerbose - Failed to parse timestamp in first MAX_TIMESTAMP_LOOKAHEAD (128) characters of event. Defaulting to timestamp of previous event (Fri Jan 29 15:35:24 2021). Context: source=C:\temp\acct_0001.csv|host=my-host|transforms_sql_report_acct|13995
***Config***
We have a clustered environment with 1 search head, 1 deployment server, and 3 indexers.
On the host with the Universal Forwarder installed (deployed to the UF from the deployment server's /opt/splunk/etc/deployment-apps/):
inputs.conf
[monitor://C:\temp\acct_*.csv]
index=department_index
sourcetype=sql_report_acct
crcSalt = <SOURCE>
disabled = 0
Props and transforms (located on the deployment server in /opt/splunk/etc/master-apps/_cluster/local; confirmed deployed to all 3 indexers under /opt/splunk/etc/slave-apps/_cluster/local):
props.conf
[sql_report_acct]
BREAK_ONLY_BEFORE_DATE=null
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%9N
TIME_PREFIX=^([^,]*,){5}
category=Structured
description=sourcetype for my input
disabled=false
pulldown_type=true
REPORT-transforms_sql_report_acct = transforms_sql_report_acct
transforms.conf
[transforms_sql_report_acct]
DELIMS = ","
FIELDS = name, dpname, last_login, create_date, modify_date, Current_time
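As a sanity check, the TIME_PREFIX regex itself does land right in front of Current_time; here is a quick stand-alone test of just the regex (plain Python, not Splunk's timestamp processor):

```python
import re

# The TIME_PREFIX from the props.conf above: skip five comma-separated fields.
TIME_PREFIX = r"^([^,]*,){5}"

line = ("CAT-user,testing,,2021-01-29 09:47:42.340000000,"
        "2021-01-29 09:47:42.340000000,2022-03-24 13:18:36.390000000")

m = re.match(TIME_PREFIX, line)
# Everything after the match is what a TIME_PREFIX-based parser would
# scan for a timestamp.
print(line[m.end():])
```

So the regex is not the issue; the remaining text after the match starts with the Current_time value.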
By my understanding, timestamp recognition happens during the parsing phase of the data pipeline on the indexers, and slave-apps local should have the highest config precedence, so I am not quite sure what I am doing wrong here. How do I get the Current_time field recognized as _time? Thanks for your input!
I overlooked the fact that you're dealing with a structured data type (CSV files, IIS logs, etc.). Parsing of structured data happens at the Universal Forwarder level; the Splunk docs on structured data explain where that processing takes place.
Try setting props.conf on the Universal Forwarder (alongside your inputs.conf).
Give this a try for your props.conf entry:
[sql_report_acct]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
INDEXED_EXTRACTIONS=csv
KV_MODE=none
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%9N
TIMESTAMP_FIELDS=Current_time
Thank you! Got a moment to try this out. I replaced props.conf as follows and distributed it:
[sql_report_acct]
description=sourcetype for my input
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
INDEXED_EXTRACTIONS=csv
KV_MODE=none
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%9N
TIMESTAMP_FIELDS=Current_time
REPORT-transforms_sql_report_acct = transforms_sql_report_acct
It is still attempting to use the create_date column as _time.
The CSV contains 2 rows where create_date is much older (2003-04-08 09:10:42.287000000). Splunk ingests those events with _time keeping the original time of day but swapping in the current date (4/1/22 9:10:42.287 AM). That looks like the MAX_DAYS_AGO default (2000 days) rejecting the old year.
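If those old create_date values ever did need to be honored as _time, the age cap can be raised where the timestamp is parsed; a sketch, assuming the default of 2000 days is what's rejecting them:

```ini
# props.conf - hypothetical override, only needed if dates back to 2003
# must be accepted as event times.
# MAX_DAYS_AGO defaults to 2000; 10951 (~30 years) is the documented maximum.
[sql_report_acct]
MAX_DAYS_AGO = 10951
```

In this case it isn't needed, since Current_time (not create_date) is the intended _time and is always recent.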
Success!
The following steps got this working 100% -
Props and transforms were placed on the host with the UF. We also redid the query that generates the CSV so that the Current_time field is the first column instead of the last.
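For anyone finding this later, the UF-side props.conf that worked looked roughly like this (reconstructed from the thread, not copied verbatim from the final deployment):

```ini
# props.conf on the Universal Forwarder host
[sql_report_acct]
INDEXED_EXTRACTIONS = csv
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
KV_MODE = none
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%9N
TIMESTAMP_FIELDS = Current_time
```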
I had no idea that CSVs are parsed before reaching the indexers. Great find.