I need to export a massive number of events from Splunk, so for performance reasons I resorted to calling the REST API directly from my Python code rather than using the Splunk SDK.
I found the following curl command to export results:
curl -ku username:password https://splunk_host:port/servicesNS/admin/search/search/jobs/export -d search="search index%3D_internal | head 3" -d output_mode=json
My attempt at simulating this using Python's HTTP libraries is as follows:
import urllib
import httplib2

# assume I have already authenticated to Splunk and have a session key
base_url = 'https://splunkhost:port'
search_job_urn = '/services/search/jobs/export'
myhttp = httplib2.Http(disable_ssl_certificate_validation=True)
searchjob = myhttp.request(base_url + search_job_urn, 'POST',
    headers={'Authorization': 'Splunk %s' % sessionKey},
    body=urllib.urlencode({'search': 'search index=indexname sourcetype=sourcename'}))[1]
print searchjob
The last print keeps printing results until the search completes. For large queries I get a MemoryError, because the entire response is buffered in memory. I need to be able to read the results in chunks (say 50,000 events), write them to a file, and reset the buffer for searchjob. How can I accomplish that?
I solved the above using Python's requests library. Refer: http://docs.python-requests.org/en/latest/api/
You just need to pass stream=True when making the request, then loop over iter_content() until a valid chunk is received and write each chunk to a file.
Also see here for more info: http://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
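The approach described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual script: the URL path and header format come from the question, but the function name, parameter names, and chunk size are my own placeholders.

```python
import requests

def export_search(base_url, session_key, query, out_path, chunk_size=8192):
    """Stream a Splunk export search straight to a file without
    holding the whole response in memory."""
    url = base_url + "/services/search/jobs/export"
    headers = {"Authorization": "Splunk %s" % session_key}
    payload = {"search": query, "output_mode": "json"}
    # stream=True tells requests not to read the body up front;
    # verify=False mirrors curl's -k / disabled cert validation.
    with requests.post(url, headers=headers, data=payload,
                       stream=True, verify=False) as resp:
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            # iter_content yields the response in fixed-size chunks;
            # skip keep-alive chunks, which arrive as empty bytes.
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)
```

Because the file is written chunk by chunk, memory use stays flat regardless of how many events the search returns.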
Hello karan1337,
Would you mind sharing a copy of your Python script that calls the REST API and uses chunks?
I'm trying to get the same behavior, and that would be very cool 🙂
Thank you anyway!
Guilhem
Have you considered mass-exporting from the CLI?
$SPLUNK_HOME/bin/splunk export eventdata -index indexname -sourcetype sourcetypename -dir /path/to/write/to
More info by running splunk help export.
I don't have access to the box running Splunk, so I cannot use the CLI; I need to do it remotely. I fixed the above problem by using the requests library and writing chunks of results to a file. However, I see that for large searches, the job status sometimes auto-finalizes when I have a huge number of results.
