Splunk Dev

Using the REST API in Python to export large search results, why does the search auto finalize?

karan1337
Path Finder

Hi,

I am trying to export (stream) huge search results using the REST API directly in Python. For 1 minute of data, I get about 600,000 events. For 10 minutes I can retrieve the data, but when I increase the time range beyond 10 minutes, the search auto-finalizes. (On the Jobs page my search is not available in the UI, but the dispatch status is "finalizing".)

My export search is something like:

index=somename sourcetype=somename earliest=-20m | table _indextime, _raw

Is there any setting that restricts even the export api from streaming all results?
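For reference, here is a minimal sketch of the kind of export call being described, using only the Python standard library against the documented search/jobs/export endpoint. The host, credentials, index, and sourcetype are placeholders, and certificate verification is left out (port 8089 often uses a self-signed cert):

```python
import base64
import urllib.parse
import urllib.request

def build_export_query(index, sourcetype, earliest="-20m"):
    """Build the SPL query from the post above."""
    return (f"search index={index} sourcetype={sourcetype} "
            f"earliest={earliest} | table _indextime, _raw")

def stream_export(base_url, user, password, query, chunk_size=8192):
    """POST to search/jobs/export and yield the response in chunks,
    so the full result set is never buffered in memory."""
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = urllib.parse.urlencode(
        {"search": query, "output_mode": "json"}).encode()
    req = urllib.request.Request(
        f"{base_url}/services/search/jobs/export",
        data=body,
        headers={"Authorization": f"Basic {creds}"},
    )
    # For a self-signed cert, pass a suitable ssl.SSLContext to urlopen.
    with urllib.request.urlopen(req) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            yield chunk

if __name__ == "__main__":
    q = build_export_query("somename", "somename")
    for chunk in stream_export("https://localhost:8089", "admin", "changeme", q):
        pass  # write the chunk to disk, parse it, etc.
```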


martin_mueller
SplunkTrust
SplunkTrust

For large jobs you'd be better off creating a search "traditionally" by POSTing to search/jobs instead of search/jobs/export, retrieving the sid, and then loading results off that sid. See this snippet from the docs:

If it is too big, you might instead run with the search/jobs (not search/jobs/export) endpoint (it takes POST with the same parameters), perhaps using exec_mode=blocking. You'll then get back a search ID, and you can page through the results, requesting them from the server under your control, which is a better approach for extremely large result sets that need to be chunked.

http://docs.splunk.com/Documentation/Splunk/6.2.3/RESTREF/RESTsearch#search.2Fjobs.2Fexport
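A sketch of that job-then-page flow, assuming the standard endpoints (POST search/jobs to create the job, then GET search/jobs/&lt;sid&gt;/results with count/offset to page). Host, credentials, and the page size are placeholders to adjust for your instance:

```python
import base64
import json
import urllib.parse
import urllib.request

def page_offsets(total, page_size):
    """Offsets needed to pull `total` results in pages of `page_size`."""
    return list(range(0, total, page_size))

def fetch_all_results(base_url, user, password, query, page_size=50000):
    """Create a blocking search job, then page through its results."""
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {creds}"}

    # 1. POST the search to search/jobs; exec_mode=blocking makes the
    #    request return only once the job has finished running.
    body = urllib.parse.urlencode({
        "search": query,
        "exec_mode": "blocking",
        "output_mode": "json",
    }).encode()
    req = urllib.request.Request(f"{base_url}/services/search/jobs",
                                 data=body, headers=headers)
    # For a self-signed cert, pass a suitable ssl.SSLContext to urlopen.
    with urllib.request.urlopen(req) as resp:
        sid = json.load(resp)["sid"]

    # 2. Page through search/jobs/<sid>/results under our own control.
    offset = 0
    while True:
        params = urllib.parse.urlencode(
            {"output_mode": "json", "count": page_size, "offset": offset})
        page_req = urllib.request.Request(
            f"{base_url}/services/search/jobs/{sid}/results?{params}",
            headers=headers)
        with urllib.request.urlopen(page_req) as resp:
            results = json.load(resp).get("results", [])
        if not results:
            break
        yield from results
        offset += page_size
```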


martin_mueller
SplunkTrust
SplunkTrust

You may get much better speeds if you set output_mode=raw:

$ curl -k -u admin:changeme https://localhost:8089/services/search/jobs/export -d search="search index=_internal" -d output_mode=raw > outfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  686M    0  686M    0    45  13.1M      0 --:--:--  0:00:52 --:--:-- 12.0M
$ cat outfile | wc -l
4007497

Four million events, 700MB, 52 seconds, run on my home all-in-one Splunk instance.


karan1337
Path Finder

Thanks @martin_mueller. I will try that out.


martin_mueller
SplunkTrust
SplunkTrust

| table * is a terrible idea because it tells Splunk to extract ALL the fields. Consider | table _raw instead if that's all you're looking to export.


martin_mueller
SplunkTrust
SplunkTrust

Here's what the docs recommend on exporting large volumes: http://docs.splunk.com/Documentation/Splunk/6.2.3/Search/Exportsearchresults#Python_SDK


karan1337
Path Finder

@martin_mueller I also tried POSTing to /search/jobs. For a large result set (more than 10 million events), this endpoint gives me no more than 500,009 results (I don't know the reason for this number). When I append | table * to my query, I do get all results, but they took more than an hour to stream back to my remote system from the Splunk machine. Such a long time might not be practical for my use case.
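One possible cause of a hard cap on results from a non-export search job is the maxresultrows setting in limits.conf, which bounds how many result rows a job retains. This is an assumption to verify on your own instance, not a confirmed diagnosis of the 500,009 figure:

```ini
# limits.conf (hypothetical values - check your search head's actual config)
[searchresults]
# Maximum number of result rows a non-export search job retains.
# A value near 500000 here would be consistent with the cap observed above.
maxresultrows = 500000
```

The search/jobs/export endpoint streams results as they are produced rather than retaining them in a job artifact, which is why it is not subject to the same retention limit.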


karan1337
Path Finder

@martin_mueller I tried this, and the only issue is that streaming through the SDK takes a performance hit in my use case. Exporting or searching directly via the REST API is much faster than going through the SDK.
