Developing for Splunk Enterprise

Export/stream massive results from the Splunk REST API

Path Finder

I need to export a massive number of events from Splunk, so for performance reasons I used the REST API directly from my Python code rather than going through the Splunk SDK.

I found the following curl command to export results:

curl -ku username:password https://splunk_host:port/servicesNS/admin/search/search/jobs/export -d search="search index%3D_internal | head 3" -d output_mode=json

My attempt at simulating this with Python's HTTP functions is as follows:

# assume I have authenticated to Splunk and have a session key
import urllib
import httplib2

base_url = "https://splunkhost:port"

search_job_urn = '/services/search/jobs/export'

myhttp = httplib2.Http(disable_ssl_certificate_validation=True)

searchjob = myhttp.request(
    base_url + search_job_urn, 'POST',
    headers={'Authorization': 'Splunk %s' % sessionKey},
    body=urllib.urlencode({'search': 'search index=indexname sourcetype=sourcename'}))[1]

print searchjob

The last print keeps printing results until the export completes. For large queries I get MemoryErrors. I need to be able to read results in chunks (say, 50,000 events), write them to a file, and release the buffer for searchjob. How can I accomplish that?

1 Solution

Path Finder

I solved this using Python's requests library. Refer: http://docs.python-requests.org/en/latest/api/

Make the request with stream=True, then loop over iter_content() until a valid chunk is received and write each chunk to a file.
Also refer here for more info: http://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
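A minimal sketch of that approach, using the same /services/search/jobs/export endpoint as above. The host, session key, index, and chunk size are placeholders, not values from the original post:

```python
# Hypothetical sketch: stream a Splunk export to disk without
# buffering the whole response in memory. Assumes the third-party
# "requests" library is installed.
import requests


def stream_export(base_url, session_key, search, out_path,
                  chunk_size=1024 * 1024):
    """Stream export results to a file, one chunk at a time."""
    resp = requests.post(
        base_url + "/services/search/jobs/export",
        headers={"Authorization": "Splunk %s" % session_key},
        data={"search": search, "output_mode": "json"},
        stream=True,   # do not read the body until we iterate over it
        verify=False,  # mirrors curl -k; avoid in production
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            if chunk:  # skip keep-alive chunks
                f.write(chunk)


# Example call (placeholder host/port and search):
# stream_export("https://splunkhost:8089", sessionKey,
#               "search index=indexname sourcetype=sourcename",
#               "results.json")
```

Because stream=True defers reading the body, memory use stays bounded by chunk_size regardless of how many events the search returns.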



SplunkTrust

Hello karan1337,

Would you mind sharing a copy of your Python script that calls the REST API and reads the results in chunks?

I'm trying to get the same behavior, and that would be very cool 🙂

Thank you anyway!

Guilhem


SplunkTrust

Have you considered mass-exporting from the CLI?

$SPLUNK_HOME/bin/splunk export eventdata -index indexname -sourcetype sourcetypename -dir /path/to/write/to

For more info, run splunk help export.


Path Finder

I don't have access to the box running Splunk, so I can't use the CLI; I need to do this remotely. I fixed the problem by using the requests library and writing chunks of results to a file. However, I see that for large searches the job status sometimes auto-finalizes when the result set is huge.
