Getting Data In

How do I export a large dataset from the REST API?

andrewbeak
Path Finder

Hi,

I'm reading the documentation at http://docs.splunk.com/Documentation/Splunk/7.2.0/RESTREF/RESTsearch#search.2Fjobs but I'm having problems getting all the search results that I need.

The logic that I thought would work is:

  1. Create an asynchronous search job
  2. Regularly poll the job until it is done
  3. Fetch the resultCount from the job
  4. Fetch paginated results
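The steps above can be sketched in Python using only the standard library. The host, session key, and page size below are placeholders, and the `count`/`offset` query parameters on the results endpoint are what page through the output; creating the job (a POST to /services/search/jobs) and polling isDone are omitted for brevity:

```python
import json
import urllib.parse
import urllib.request

BASE = "https://SPLUNK_HOST:8089"  # placeholder management port


def page_offsets(result_count, page_size):
    """Offsets needed to fetch result_count results page_size at a time."""
    return list(range(0, result_count, page_size))


def fetch_all_results(session_key, sid, result_count, page_size=1000):
    """Steps 3-4: page through the results of a finished job (sketch)."""
    rows = []
    for offset in page_offsets(result_count, page_size):
        params = urllib.parse.urlencode(
            {"output_mode": "json", "count": page_size, "offset": offset}
        )
        req = urllib.request.Request(
            f"{BASE}/services/search/jobs/{sid}/results?{params}",
            headers={"Authorization": f"Splunk {session_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            rows.extend(json.load(resp)["results"])
    return rows
```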

I believe there is a setting in limits.conf that restricts how many results you can pull at a time. By default it's 50,000, and since I'm on Splunk Cloud I can't change it. That's why I'm fetching pages of results.

However, no matter what I set max_count to in the search request, Splunk normalizes it to 1000. When I call the API to get the number of results in the dataset, it says that there are 1000.

Here is a screenshot from inspecting the job:

[Screenshot: job inspector showing the result count capped at 1000]

What is the best way to use the API to get a large dataset out of Splunk?


happycoding
Engager

In my opinion, the easiest way to export is via curl, e.g.:

curl -k -u USERNAME:PASSWORD https://SPLUNK_URL:8089/services/search/jobs/export \
        --data-urlencode search='search index="my-index" earliest=0 latest=now | table field1, field2' \
        -d output_mode=csv \
        -d earliest_time='-y@y' \
        -d latest_time='@y' \
        -o output-file.csv

In this case it exports data from the previous calendar year into the file output-file.csv. (Note that time modifiers inside the search string itself, earliest=0 latest=now here, will generally take precedence over the earliest_time/latest_time request parameters, so you may want to drop one or the other.)
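For reference, earliest_time='-y@y' and latest_time='@y' snap to year boundaries, bounding the search to the whole previous calendar year. A sketch of the equivalent boundaries in Python (the function name is mine):

```python
from datetime import datetime


def previous_calendar_year(now):
    """Return (earliest, latest) datetimes matching -y@y .. @y."""
    latest = datetime(now.year, 1, 1)        # @y: snap back to Jan 1 of this year
    earliest = datetime(now.year - 1, 1, 1)  # -y@y: go back a year, then snap
    return earliest, latest
```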


dstaulcu
Builder

Here's my PowerShell way of getting large results through REST:

https://github.com/dstaulcu/SplunkTools/blob/master/Splunk-SearchLargeJobs-Example.ps1



andrewbeak
Path Finder

It seems that the export API endpoint streams results instead of saving them and so allows you to have much larger result sets.

The answer, therefore, is not to use the search/jobs endpoint to create a search job and fetch the results later. Instead, use search/jobs/export to stream everything over the wire.
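The practical upside of streaming is that you can process results incrementally instead of holding the whole set in memory. A minimal sketch, assuming CSV output (output_mode=csv) and any line-iterable source, such as the lines of an HTTP response body:

```python
import csv


def iter_csv_records(lines):
    """Yield one dict per result row from a streamed CSV export."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty stream: no header, no rows
    for row in reader:
        yield dict(zip(header, row))
```

With urllib, for example, you could pass `(line.decode() for line in response)` as `lines` and consume rows as they arrive.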


alancalvitti
Path Finder

Andrew, we're getting XML parse errors from the jobs.export API via the Python SDK, whereas jobs.oneshot completes the same query (albeit too slowly for our application). Is there an alternative, fast method to export query results as a single XML or JSON file?
