I am using the Splunk API (with Python) to pull all values of a given record from a given index. I would like to download all data on a regular basis.

What strategy should I use to ensure I get all records without duplication and without missing anything? For instance, in a traditional DB world I might use a unique index value to keep track.

For example, I would like to run this search every hour to get all records:
"search index=syslog ssh* | table _raw"

Should I use some 'time' condition or is there some sort of count or index I could use?


You might be able to achieve what you want by running a real-time search.

Since you are using Python, I assume you already know about, or are already using, the Python SDK?

I would probably use the stail.py script as a starting point:
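Separately from stail.py, and on the 'time' condition asked about in the question: if an hourly pull is preferred over a real-time search, one option is to keep a checkpoint of the last end time and query only the window since then. The following is just a minimal sketch of that bookkeeping, not taken from the SDK. The helper names (next_window, search_kwargs) and the five-minute lag are my own assumptions. It also assumes Splunk treats the range as earliest-inclusive and latest-exclusive, so that back-to-back windows neither overlap nor leave a gap; that is worth verifying on your instance.

```python
from datetime import datetime, timedelta, timezone


def next_window(checkpoint, now, lag=timedelta(minutes=5)):
    """Return (earliest, latest) in epoch seconds for the next pull.

    checkpoint: epoch seconds where the previous pull ended.
    now:        timezone-aware datetime for the current time.
    lag:        safety margin so events still being indexed are left
                for the next run (hypothetical default, tune as needed).
    Returns None if there is nothing new to pull yet.
    """
    latest = int(now.timestamp()) - int(lag.total_seconds())
    if latest <= checkpoint:
        return None
    return checkpoint, latest


def search_kwargs(window):
    """Build the time-range arguments for a search job.

    earliest_time and latest_time are search job parameters; epoch
    seconds are passed as strings here.
    """
    earliest, latest = window
    return {"earliest_time": str(earliest), "latest_time": str(latest)}


# Intended use (not executed here; assumes `service` was obtained from
# splunklib.client.connect(...) and the checkpoint is stored on disk):
#
#   window = next_window(load_checkpoint(), datetime.now(timezone.utc))
#   if window is not None:
#       stream = service.jobs.oneshot(
#           "search index=syslog ssh* | table _raw",
#           **search_kwargs(window))
#       ...                      # read and store the results
#       save_checkpoint(window[1])  # only after a successful pull
```

The checkpoint is only advanced after a successful pull, so a failed run is simply retried over a longer window next time. Events that arrive much later than their timestamp can still fall outside a window keyed on event time, which is what the lag is meant to reduce.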




