Hi All,
I am getting some annoying messages in splunkd.log
03-20-2014 15:47:27.631 +1000 WARN DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event (Thu Mar 20 16:45:00 2014). Context: source::/opt/mydata/PUBLIC_P5MIN_201403201550_20140320154535.CSV|host::amo-web|p5_reports|38558
Now I know what this warning means, but it doesn't really fit with my data. I suspect I know why it is occurring; I just want to stop it.
So I have my CSV data file, which is in the following format:
I,P5MIN,LOCAL_PRICE,1,RUN_DATETIME,DUID,INTERVAL_DATETIME,LOCAL_PRICE_ADJUSTMENT,LOCALLY_CONSTRAINED,LASTCHANGED
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:00:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:05:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:10:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:15:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:20:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:25:00",0,0,"2014/03/19 11:55:29"
I,P5MIN,REGIONSOLUTION,4,RUN_DATETIME,INTERVAL_DATETIME,REGIONID,RRP
D,P5MIN,REGIONSOLUTION,4,"2014/03/19 12:00:00","2014/03/19 12:00:00",STATE1,54.07
D,P5MIN,REGIONSOLUTION,4,"2014/03/19 12:00:00","2014/03/19 12:05:00",STATE1,53.8101
D,P5MIN,REGIONSOLUTION,4,"2014/03/19 12:00:00","2014/03/19 12:10:00",STATE1,53.8101
D,P5MIN,REGIONSOLUTION,4,"2014/03/19 12:00:00","2014/03/19 12:15:00",STATE1,53.8101
Now, as you can see, there are two sets of data in this file. I am only interested in getting the last section of data into Splunk.
This is achieved with the following props.conf:
[p5_reports]
KV_MODE = none
SHOULD_LINEMERGE = false
TRANSFORMS-filterprices = setnull,getFiveMinutePrices
REPORT-extracts = fiveMinuteCsvExtract
TIME_PREFIX=D,P5MIN,REGIONSOLUTION,[^,]*,[^,]*
TIME_FORMAT=%Y/%m/%d %H:%M:%S
and associated transforms.conf
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[getFiveMinutePrices]
REGEX = ^D,P5MIN,REGIONSOLUTION,(.*)
DEST_KEY = queue
FORMAT = indexQueue
[fiveMinuteCsvExtract]
DELIMS = ","
FIELDS = "I","P5MIN","REGIONSOLUTION","4","RUN_DATETIME","INTERVAL_DATETIME","REGIONID","RRP"
Now, this all works fine: my data comes in, and _time is taken from the second time field, INTERVAL_DATETIME.
But my log files are FULL of these:
03-20-2014 15:47:27.631 +1000 WARN DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event (Thu Mar 20 16:45:00 2014). Context: source::/opt/mydata/PUBLIC_P5MIN_201403201550_20140320154535.CSV|host::amo-web|p5_reports|38558
So, is props.conf running first and generating these errors BEFORE I have filtered out only the stuff I want?
i.e. at what point is the timestamp looked for?
And for bonus points: will it be near impossible to extract these two types of data into separate sourcetypes, since the _times I want will be in different places?
Timestamps are extracted before transforms.
Maybe you can craft a more complex regex for TIME_PREFIX?
TIME_PREFIX= ^([^,]*,){5}(\w+,)?\"
This in theory (I have not tested it) should make the sixth element optional. In the example above, \w is used for matching that part; adjust as needed.
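Put in the context of the original stanza, that would look something like this (an untested sketch; it assumes the optional sixth field, the DUID, is always a single word):

```ini
# props.conf -- sketch only, not tested
[p5_reports]
# ([^,]*,){5} skips the first five comma-separated fields
# (D, P5MIN, <table name>, <n>, RUN_DATETIME); (\w+,)? optionally skips
# a word-only sixth field such as the DUID, so the match lands on the
# opening quote of INTERVAL_DATETIME in both row layouts
TIME_PREFIX = ^([^,]*,){5}(\w+,)?\"
TIME_FORMAT = %Y/%m/%d %H:%M:%S
```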
Hope this helps,
K
Actually it might be possible with this in transforms.conf
[<unique_stanza_name>]
REGEX = <your_regex>
FORMAT = sourcetype::<your_sourcetype>
DEST_KEY = MetaData:Sourcetype
http://docs.splunk.com/Documentation/Splunk/6.0.2/Data/Advancedsourcetypeoverrides
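For your data it might look something like the following (untested sketch; the stanza name and the sourcetype p5_local_price are my own inventions):

```ini
# transforms.conf -- sketch, following the docs link above
[set_local_price_sourcetype]
# LOCAL_PRICE rows get their own sourcetype
REGEX = ^D,P5MIN,LOCAL_PRICE,
FORMAT = sourcetype::p5_local_price
DEST_KEY = MetaData:Sourcetype

# props.conf -- run the override as an index-time transform
[p5_reports]
TRANSFORMS-set_st = set_local_price_sourcetype
```

Note that your existing TRANSFORMS-filterprices currently nullQueues those LOCAL_PRICE rows, so you would also need to route them back to the indexQueue for the override to matter.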
But I am straying far from my original question now 🙂
This is exactly what I came here to post but you beat me to it.
The data file I provided above is only an example. There are actually 6 header rows, and I am getting 6 errors.
So yes, there is nothing really I can do about these errors; I just need to live with them.
Thanks for the regex tip though.
I don't think I will be able to extract both sets of data into different sourcetypes unless transforms.conf allows me to override the sourcetype for a particular event when it matches a particular regex.
Additionally, if the header rows are part of the file, they will also generate these errors (since they do not contain any timestamp). Perhaps you should adjust your null-queueing a bit to drop them too.
props
[p5_reports]
TRANSFORMS-filterprices = setnull, getFiveMinutePrices, drop5mHeader
transforms
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[getFiveMinutePrices]
REGEX = REGIONSOLUTION
DEST_KEY = queue
FORMAT = indexQueue
[drop5mHeader]
REGEX = REGIONID
DEST_KEY = queue
FORMAT = nullQueue
SEDCMD in props is an alternative for removing headers.
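For example (untested; assumes the unwanted header rows are the ones starting with I,):

```ini
# props.conf -- sketch: blank out header rows with a sed-style replacement
[p5_reports]
SEDCMD-drop_headers = s/^I,P5MIN.*$//
```

Keep in mind SEDCMD rewrites the event text rather than routing events, so whether an emptied event is dropped entirely is worth verifying on your version; nullQueue routing is the safer bet for discarding whole rows.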
K