This is more statement than question, but the community should be advised Splunk Universal Forwarder 9.1.2 and 9.1.5 (those I have used and witnessed this occurring on, I cannot speak to others yet) does not abide by cron syntax 100% of the time. It seems to do so at a very low frequency of occurrence, which may make repeated runs of scripted inputs difficult to notice or detect that it's occurring.
I feel it's important the community is aware of this as some scripts may be time sensitive or you expect the scripts to only run once in a reporting period so no deduplicating was added -- amongst other possible circumstances a double-run might cause unintended impacts.
In my case I noticed it when doing a limited deploy of a Deployment Server GUID reset script. With only 8 targeted systems scheduled to run once per day it was very easy to notice that it ran twice for multiple assets. Fortunately, I'd designed this script to create a bookmark so it wouldn't run more than once so the UF could run it every day to catch new systems or name changes, so it running more than once wouldn't cause a flood of entries in the DS.
In my environment I noticed the Scheduler prematurely ran the scripts roughly 2 seconds before the cron-scheduled hour-minute combination, so the following SPL could find when this occurred:
index=_internal sourcetype=splunkd reschedule_ms reschedule_ms<2000
Example: Scripted Input is scheduled to run at "2 7 * * *" (07:02:00), but there was a run at 07:01:58 AND 07:02:00.
It ran two seconds early, then reschedules it to occur at the next scheduled interval which happens to be the expected time (two seconds later), and the cycle repeats. It seemed to fluctuate and vary, and sometimes even resolve itself given enough time. It's difficult to know what influences it or why after a correct run the run of milliseconds added is short by ~2000, but it does happen and may be influencing your scripts too.
Reported
I opened a ticket with Support, but there's no word on a permanent resolution to this less than 1% affected forwarders issue so far, so I feel letting the community know to look if that's happening -- and if it matters in your environment, because maybe a little duplication doesn't matter -- was important.
Workarounds
Changing from cron syntax to number of seconds for the Interval does work as a workaround, but that isn't always what you want -- sometimes you want a specific minute in an hour, not just hourly (whenever). Adding logic into a script to check the time is another possible workaround.
Hi @Pcktech,
Do your forwarders synchronize their clocks with an external source? If yes, have you confirmed whether clock synchronization occurred around the time of the first execution?
For example:
07:02:00 - scheduler queues task
07:02:01 - clock is synchronized with external source and set to 07:01:57
07:01:58 - scheduler executes task
07:02:00 - scheduler queues task
07:02:01 - scheduler executes task
If the forwarder logs do not indicate a step backwards, clock synchronization may still have occurred after the task was queued but before any events were logged.
Hello, it was not confirmed previously, but it appeared unlikely at the time.
Previously, the issue persisted even after I changed the schedule from 2 7 * * * to 2,27 7 * * * and later even 2 7,19 * * * which required UF restarts at different times of day. While time sync does occur it doesn't occur often enough to have affected all of these attempts.
Today, I double checked one of the systems more consistently affected (index=<WindowsLogs> host=<REDACT> EventCode=4616 4616 NewTime) and found a time synchronization did not occur around the time the issue manifested especially at the time of a UF service restart.