I have a script that pulls events from my REST API for Splunk to index. The script runs on a schedule.
I want to only pull new events, to prevent duplication and unnecessary traffic. My events have incrementing IDs.
To pull only new events, my script needs to remember the ID of the last pulled event, i.e. it needs to persist state between runs. I also wouldn't want to re-pull all events from the beginning if the Splunk instance restarts.
What are my options here? I'd prefer not to read the last ID by issuing a query to Splunk.
You can also run a search with curl or another HTTP tool to get the last indexed ID, e.g. sourcetype=mysource | stats first(ID) by sourcetype, then use the result of that search in your script, so that Splunk itself is your "database" / "config file".
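For example, if you request output_mode=csv from Splunk's /services/search/jobs/export REST endpoint, the result is a small CSV you can parse in your script. A minimal sketch (the host, credentials, and the last_id_from_export helper are hypothetical, and the sample response is made-up data):

```python
import csv
import io

# Hypothetical fetch, e.g. with curl:
#   curl -k -u admin:changeme https://localhost:8089/services/search/jobs/export \
#        -d search='search sourcetype=mysource | stats first(ID) by sourcetype' \
#        -d output_mode=csv

def last_id_from_export(csv_text):
    """Parse the CSV export of the stats search and return first(ID), or None if empty."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return rows[0]['first(ID)'] if rows else None

# Sample response for illustration (made-up data):
sample = 'sourcetype,first(ID)\r\nmysource,10482\r\n'
print(last_id_from_export(sample))  # prints "10482"
```

Your script would then request only events with IDs greater than that value.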
If you go with a file on the filesystem as suggested above, I recommend something like appName/bin/.deltaFile; the leading dot in the filename will make it hidden on Linux systems. Here's some unoptimized code for creating/reading/updating a delta file that stores the date of the last run:
import os
import logging
from datetime import date

logger = logging.getLogger(__name__)

def getDeltaDate(datapath):
    """Return the date of the previous run and record today's date for the next one."""
    deltapath = os.path.join(datapath, '.delta_date')
    try:
        if os.path.exists(deltapath):
            # Open the delta file and read the date saved by the previous run
            with open(deltapath, 'r') as deltafile:
                lastrundate = deltafile.readline().strip()
        else:
            # First run: no delta file yet, so fall back to the epoch
            lastrundate = '1970-01-01'
        # Write today's date to the file, overwriting the original, for the next run
        with open(deltapath, 'w') as deltafile:
            deltafile.write(str(date.today()))
        return lastrundate
    except OSError as e:
        logger.critical('Function: getDeltaDate failed due to the following error(s): %s', e)
        print('Function: getDeltaDate failed due to the following error(s): ' + str(e))