Getting Data In

How do I export a large dataset from the REST API?

andrewbeak
Path Finder

Hi,

I'm reading the documentation at http://docs.splunk.com/Documentation/Splunk/7.2.0/RESTREF/RESTsearch#search.2Fjobs but I'm having problems getting all the search results that I need.

The logic that I thought would work is:

  1. Create an asynchronous search job
  2. Regularly poll the job until it is done
  3. Fetch the resultCount from the job
  4. Fetch paginated results

I believe the maxresultrows setting in limits.conf restricts how many results you can pull at a time. By default it's set to 50k, and since I'm using Splunk Cloud I can't change it. That's why I'm fetching pages of results.

However, no matter what I set max_count to in the search request, Splunk normalizes it to 1000. When I call the API to get the number of results in the dataset, it says that there are 1000.
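For completeness, here's roughly what that four-step flow looks like in Python with just the standard library (a hedged sketch: the host, Authorization header, and page size are placeholders; the endpoint paths are from the REST reference above):

```python
import json
import time
import urllib.parse
import urllib.request

BASE = "https://splunk.example.com:8089"  # placeholder management host


def page_offsets(total, page_size):
    """Offsets needed to page through `total` results, `page_size` at a time."""
    return list(range(0, total, page_size))


def fetch_all(search, auth_header, page_size=1000):
    # 1. Create an asynchronous search job
    body = urllib.parse.urlencode({"search": search, "output_mode": "json"}).encode()
    req = urllib.request.Request(f"{BASE}/services/search/jobs", data=body,
                                 headers={"Authorization": auth_header})
    sid = json.load(urllib.request.urlopen(req))["sid"]

    # 2. Regularly poll the job until it is done
    while True:
        req = urllib.request.Request(
            f"{BASE}/services/search/jobs/{sid}?output_mode=json",
            headers={"Authorization": auth_header})
        content = json.load(urllib.request.urlopen(req))["entry"][0]["content"]
        if content["isDone"]:
            break
        time.sleep(2)

    # 3. Fetch the resultCount from the job
    total = int(content["resultCount"])

    # 4. Fetch paginated results
    rows = []
    for offset in page_offsets(total, page_size):
        query = urllib.parse.urlencode({"output_mode": "json",
                                        "count": page_size, "offset": offset})
        req = urllib.request.Request(
            f"{BASE}/services/search/jobs/{sid}/results?{query}",
            headers={"Authorization": auth_header})
        rows.extend(json.load(urllib.request.urlopen(req))["results"])
    return rows
```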

Here is a screenshot from inspecting the job:

[screenshot of the job inspector]

What is the best way to use the API to get a large dataset out of Splunk?


happycoding
Engager

In my opinion, the easiest way to export is via curl. E.g.

curl -k -u USERNAME:PASSWORD https://SPLUNK_URL:8089/services/search/jobs/export \
        --data-urlencode search='search index="my-index" | table field1, field2' \
        -d output_mode=csv \
        -d earliest_time='-y@y' \
        -d latest_time='@y' \
        -o output-file.csv

In this case the export covers the previous year and is written to the file output-file.csv.
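If you'd rather script it, here's a rough Python equivalent using only the standard library (a hedged sketch: the base URL, Authorization header, and file name are placeholders):

```python
import urllib.parse
import urllib.request


def export_params(search, earliest="-y@y", latest="@y", mode="csv"):
    """Form fields for the /services/search/jobs/export endpoint."""
    return {"search": search, "output_mode": mode,
            "earliest_time": earliest, "latest_time": latest}


def export_to_file(base_url, auth_header, search, path):
    # POST to the export endpoint and stream the response straight to disk,
    # so the full result set never has to fit in memory.
    body = urllib.parse.urlencode(export_params(search)).encode()
    req = urllib.request.Request(f"{base_url}/services/search/jobs/export",
                                 data=body,
                                 headers={"Authorization": auth_header})
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        while chunk := resp.read(8192):
            out.write(chunk)
```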


dstaulcu
Builder

Here's my PowerShell way of getting large result sets through REST:

https://github.com/dstaulcu/SplunkTools/blob/master/Splunk-SearchLargeJobs-Example.ps1



andrewbeak
Path Finder

It seems that the export API endpoint streams results instead of saving them and so allows you to have much larger result sets.

The answer to the question is, therefore, not to use the search/jobs endpoint to create a search job and then later go fetch the results. Instead, use search/jobs/export to stream it all over the wire.


alancalvitti
Path Finder

Andrew, we're getting XML parse errors from jobs.export via the Python SDK, whereas jobs.oneshot completes the same query (albeit too slowly for our application). Is there an alternative, fast method to export query results as a single XML or JSON file?
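A hedged sketch of one alternative: call the export endpoint directly with output_mode=json (the stream is newline-delimited JSON, one object per result) and assemble a single JSON file from it. Host, credentials, and search below are placeholders.

```python
import json
import urllib.parse
import urllib.request


def parse_export_lines(lines):
    """Collect result rows from a newline-delimited JSON export stream."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if "result" in obj:  # other objects carry preview/metadata only
            rows.append(obj["result"])
    return rows


def export_json(base_url, auth_header, search, path):
    # Stream the export endpoint with output_mode=json, then write one
    # combined JSON array to `path`.
    body = urllib.parse.urlencode({"search": search,
                                   "output_mode": "json"}).encode()
    req = urllib.request.Request(f"{base_url}/services/search/jobs/export",
                                 data=body,
                                 headers={"Authorization": auth_header})
    with urllib.request.urlopen(req) as resp:
        rows = parse_export_lines(line.decode() for line in resp)
    with open(path, "w") as out:
        json.dump(rows, out)
```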
