I'm writing a cron job (using the Python SDK) that runs a search and exports the data to a CSV file (to analyze it in a different app). The result is usually a few hundred thousand rows (~a million or so), so I need to fetch it piecewise (in 50k chunks). The problem is that sometimes the result suddenly disappears after I fetch the first few segments.
How can I prevent that? Is there an option to mark a result as 'do not remove' or something like that?
Great question, tomasv! The real answer has to do with the TTL of the search job. The default value is 600 seconds, and unfortunately the TTL does not get reset when you fetch results from the job. You have a couple of options.

The first is to call `my_job.touch()` after every call to `my_job.results(...)`; that will reset the TTL so the job isn't reaped while you are still paginating.
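To make that concrete, here is a rough sketch of the touch-while-paginating loop. The `fetch_all_to_csv` name is mine, not from the SDK samples, and it assumes a `splunklib.client.Job`-style object where `job["resultCount"]` gives the total row count, `job.results(count=..., offset=...)` yields result rows as dicts (with the real SDK you would wrap the raw stream in a reader such as `splunklib.results.JSONResultsReader`), and `job.touch()` resets the TTL:

```python
import csv

def fetch_all_to_csv(job, out_file, page_size=50000):
    """Page through a finished search job and write every row to CSV,
    touching the job after each page so its TTL keeps getting reset."""
    total = int(job["resultCount"])
    writer = None
    offset = 0
    while offset < total:
        for row in job.results(count=page_size, offset=offset):
            if writer is None:
                # Emit the CSV header from the first row's field names.
                writer = csv.DictWriter(out_file, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
        job.touch()  # reset the TTL so the job is not removed mid-export
        offset += page_size
```

The 50k `page_size` matches the chunk size from the question; anything that keeps each page comfortably inside the 600-second window should work.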
The other thing to note is that if you truly have hundreds of thousands of rows to pull out of a single search, you may be better served by the search/jobs/export endpoint. In the Python SDK, you can take a look at the export sample here, which shows you how to do it. The benefit of the export endpoint is that it simply streams the results to you as they become ready, so you don't have to keep paginating at all. You might find that this is a faster mechanism.
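A minimal sketch of the export route follows. The `export_search` wiring (host, port, credentials) is hypothetical and is defined but not run here; `service.jobs.export` and `JSONResultsReader` are from the Splunk Python SDK, but the reader class name has changed across SDK releases, so check the export sample for your version. The `stream_to_csv` helper is my own plain-stdlib glue:

```python
import csv

def stream_to_csv(rows, out_file):
    """Write an iterable of dict rows to CSV, taking the header from
    the first row's keys."""
    writer = None
    for row in rows:
        if writer is None:
            writer = csv.DictWriter(out_file, fieldnames=list(row))
            writer.writeheader()
        writer.writerow(row)

def export_search(query, out_path):
    # Hypothetical wiring against the Splunk Python SDK; adjust the
    # connection details to your environment.
    import splunklib.client as client
    import splunklib.results as results
    service = client.connect(host="localhost", port=8089,
                             username="admin", password="changeme")
    stream = service.jobs.export(query, output_mode="json")
    with open(out_path, "w", newline="") as f:
        # The reader also yields diagnostic Message objects; keep only
        # the actual result rows (dicts).
        stream_to_csv((r for r in results.JSONResultsReader(stream)
                       if isinstance(r, dict)), f)
```

Because the server streams rows as it finds them, there is no job object sitting around to expire, so the TTL problem disappears entirely.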
Let me know if this makes sense, and if not, I can add some more details.