I have a script that runs in an app on my forwarders every 12 hours. Or at least that's what it was doing until it abruptly stopped working on all forwarders on September 30th.
The script is deployed using the deployment server, so all the forwarders have identical configurations.
The inputs.conf file has five scripts configured and all work except for this one. Here is its inputs.conf entry:
[script://splunk/etc/apps/all/bin/chknmon.sh]
interval = 43200 # Run every 12 hours
sourcetype = nmonchk
source = script://./bin/chknmon.sh
The path to the chknmon.sh script is correct:
[/splunk/etc/apps/all/default]# ls -las /splunk/etc/apps/all/bin/chknmon.sh
4 -rwxr-xr-- 1 xyz abc 96 Oct 04 14:13 /splunk/etc/apps/all/bin/chknmon.sh
Splunk runs as root, and the script is owned by root.
Running the script manually produces correct output.
Other scripts in the same app (and same paths/permissions) run just fine. This one was no exception until it abruptly stopped working a few days ago. Admittedly, I've been toying around with Splunk a bit, but I have no idea what I could have done that could have affected just this one script.
I've tried bouncing both the indexer and the forwarders, but that didn't help.
I'm running 4.1.4 on the forwarders and 4.1.5 on the indexer.
Thanks!
Did it stop on Sept 30, or after Sept 30th? I'm just thinking that you could have a timestamp parsing / configuration issue.
There are two things that could cause date parsing issues starting Oct 1st. First, this is the first time in 2010 that the month has 2 digits (depending on whether your data format uses "9" or "09" for the month of September). Anything that assumed a single digit could have issues with this (for example, if you use Splunk's punct pattern anywhere in a search or eventtype). Also, Oct 1st is the first time the month is the same as the 2-digit year, so if your timestamp format is in any way ambiguous and Splunk is guessing about your time format, this could cause some confusion. You may want to do a search across "All Time" and see if your events were simply given the wrong timestamp.
The bottom line: If your props.conf entry for [nmonchk] doesn't have an explicit TIME_FORMAT entry, then I would suggest adding one.
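As a rough sketch of what that might look like: assuming the script prints a timestamp at the start of each event in a form like 2010-10-02 12:01:00 (a hypothetical format, since the actual output of chknmon.sh isn't shown in this thread), the stanza could be something like:

```
# props.conf, on the Splunk instance that parses this data
[nmonchk]
# Hypothetical pattern: adjust it to match what the script really prints
TIME_FORMAT = %Y-%m-%d %H:%M:%S
```

A four-digit year and a fixed field order leave nothing for Splunk to guess, which is the point of setting the format explicitly.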
Thanks! I managed to get the TIME_FORMAT right, but now that I read your comment I think I'm better off with DATETIME_CONFIG=CURRENT.
Actually, if you don't want splunk to interpret any date times, set DATETIME_CONFIG=CURRENT for your sourcetype in props.conf. If you need help with a TIME_FORMAT value, simply post a couple example timestamps and someone will give you a hand; based on your 02/10/10 example, I would guess you want TIME_FORMAT = %d/%m/%y %H:%M:%S, but that's just a guess. You probably don't need to mess with TIME_PREFIX, unless you have multiple timestamps in your event. If you set DATETIME_CONFIG=CURRENT then you don't need either of the TIME_* settings.
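For reference, a minimal sketch of the DATETIME_CONFIG option, assuming the sourcetype name nmonchk from the inputs.conf entry in the question:

```
# props.conf
[nmonchk]
# Use the time Splunk processes the event as its timestamp,
# rather than parsing a date out of the event text
DATETIME_CONFIG = CURRENT
```

Events that were already indexed with the wrong date keep their old timestamps; only newly indexed events should be affected by the change.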
Actually I think I know where to go from here. The TIME_PREFIX regular expression is a bit tricky for this one, but I'll get it with trial and error.
Thanks again!
Wow....
It stopped on 09/30/2010 at 12:01 AM.
I took your suggestion and did an "all time" search... and I'm finding entries from 02/10/10... which is impossible because I didn't have this script running in February. Nice catch.
So I guess my question now is: what do I do with TIME_FORMAT? I just want the index time to be the time it invokes the script...
Thanks!