I have several gigabytes of data that I need to extract from Splunk in raw format. The dump command does not seem to work as expected:
index=perforce earliest=09/22/2013:23:59:00 latest=09/23/2013:23:59:00 | dump format=raw basefilename=perforce_dmp
Trying to export this amount of data from the UI seems unreliable. Please provide clear instructions on how to obtain an estimated 35 to 40 GB of data using the CLI, the REST API, and the dump command.
Below are the options for exporting a large amount of data from Splunk.
Option 1: Export from the UI – but for this to work you may need to increase the web timeout by setting server.socket_timeout in web.conf to 3 minutes (180 seconds):
server.socket_timeout=180
http://docs.splunk.com/Documentation/Splunk/6.0.6/Admin/Webconf
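For reference, that setting goes in the [settings] stanza of web.conf (the path below is the usual place for local overrides; 180 is the 3-minute value suggested above):

```ini
# $SPLUNK_HOME/etc/system/local/web.conf
[settings]
server.socket_timeout = 180
```

Restart Splunk Web after changing web.conf for the new timeout to take effect.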
Option 2: The best option is to use the CLI search command, as documented here:
http://docs.splunk.com/Documentation/Splunk/6.1.3/SearchReference/CLIsearchsyntax
This page lists the various arguments for the CLI search command. Note the "maxout" argument in particular: by default the CLI exports only 100 rows. To export a large number of events, add -maxout to the CLI command to adjust the number of events to be exported.
For example, I used the command below and was able to export 200,000 events.
splunk search "index=_internal earliest=09/14/2014:23:59:00 latest=09/16/2014:01:00:00 " -output rawdata -maxout 200000 > c:/test123.dmp
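For a multi-day window it can help to break the export into one-day CLI calls, so each run stays comfortably under its -maxout cap. A minimal sketch, assuming GNU date is available; the index name, base date, and output filenames are illustrative placeholders:

```shell
#!/bin/sh
# Print a Splunk time boundary N days after the (hypothetical) base date.
chunk_bound() {
    date -d "2014-09-14 + $1 day" "+%m/%d/%Y:00:00:00"
}

DAYS=3
i=0
while [ "$i" -lt "$DAYS" ]; do
    earliest=$(chunk_bound "$i")
    latest=$(chunk_bound $((i + 1)))
    # Echo the command instead of running it; drop 'echo' to execute for real.
    echo splunk search "index=_internal earliest=$earliest latest=$latest" \
        -output rawdata -maxout 200000 ">" "chunk_$i.dmp"
    i=$((i + 1))
done
```

Smaller chunks also make it easier to resume after a failed run, since only the failed day needs to be re-exported.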
Option 3: Another option is to use a REST call. Here is a blog post with useful information:
http://blogs.splunk.com/2013/09/15/exporting-large-results-sets-to-csv/
Here is a search that I used to export a few million records:
curl -k -u admin:XXXXXX --data-urlencode search="search google.com OR yahoo.com earliest=-2day latest=-1day" -d "output_mode=raw" https://testbox:8089/servicesNS/admin/search/search/jobs/export > socid12346_export.log
The result set was 3,193,277 records. The file is 3.2 GB, which is far too big for me to open.
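A file that size never needs to be opened in an editor: standard Unix tools process it as a stream. A sketch, with "export.log" standing in for the file the curl command writes (a tiny synthetic file is generated here so the commands are runnable):

```shell
#!/bin/sh
# Generate a 10-line stand-in for the real multi-GB export.
seq 1 10 | sed 's/^/event /' > export.log

wc -l < export.log            # number of exported records
split -l 4 export.log part_   # split into 4-line chunks: part_aa, part_ab, ...
grep -c 'event 7' export.log  # count matching records without opening the file
```

For the real export, a split size of a few hundred thousand lines per chunk keeps each piece small enough for most tools.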
From my experience: I needed to export several GB of data, and the best performance I got was from the dump command. You need to do a few things:
With the CLI my export ran for more than 24 hours; with dump I got my data out in 3-4 hours.
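A sketch of a dump invocation along those lines, based on the search from the original question. The rollsize value is an assumption added here for illustration; note that dump writes its output files on the indexers rather than streaming them back to your client:

```
index=perforce earliest=09/22/2013:23:59:00 latest=09/23/2013:23:59:00
| dump basefilename=perforce_dmp format=raw rollsize=1024
```

Check the dump command's entry in the Search Reference for the exact output location and supported options in your Splunk version.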
Hello, I know this thread is quite old, but I have a question about the REST option. When I check the job activity, I always find that the job has expired. Is there a way to configure the expiration time in the REST API?
Best regards.
In my experience, ./splunk export ... seems even better than these options.
Could you share the command you used and the results you got?
./splunk export eventdata -index main -dir /var/tmp/export -sourcetype router_log
More options:
./splunk help export