How to export very large datasets from Splunk?

_gkollias
Builder

I’m trying to find a way to export a very large data set without bringing down any search heads (a lesson I already learned the hard way). Even 10 days of data produces around 10M rows, and Splunk isn't able to handle an export of that size.

When I use the outputcsv command, I’m finding that the large output gets replicated in that SH's searchpeer bundles. This results in end users not being able to pull up any data when running searches on that SH. I thought I could simply export it and move it to /tmp/ before anything squirrely occurred.

Do you have any ideas on how else I can export large data sets from Splunk? Here is the search:

(index=1)
OR
(index=2)
OR
(index=3)
| table various field names

Thanks in Advance


sloshburch
Ultra Champion

It sounds like you're trying to use outputcsv. If this is a one-time thing, then theoretically I think you can simply run the search that produces the table, then go to the dispatch directory to find the results and download them. Be sure to do it quickly, since the job may only persist for ten minutes.

Additionally, you can improve the performance of the search by running stats instead of table:

(index=1)
 OR
 (index=2)
 OR
 (index=3)
 | stats count by various field names
 | fields - count

I believe table does its work on the search head, so every matching event is returned to the search head for processing. That's a HUGE memory footprint. stats simply tells the indexers to send only the fields of concern back to the search head. You can run sistats to get an idea of what is returned. You'll notice it's much less data than the events' full payloads.

I believe the stats usage and fetching from the dispatch directory should together address your issue.

_gkollias
Builder

This technique worked very well. I am also able to do a normal export without having to fetch data manually from /dispatch. Thank you, Burch!


renjith_nair
Legend

Try the dump command, or use the REST API: http://blogs.splunk.com/2013/09/15/exporting-large-results-sets-to-csv/

Also see: https://answers.splunk.com/answers/172454/what-are-my-options-to-export-large-amounts-of-spl.html
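
For anyone going the REST route, here is a minimal sketch of what it could look like in Python, using only the standard library. It posts the search to the streaming export endpoint (services/search/jobs/export) with output_mode=csv and writes the response to disk in chunks, so the rows are never held in memory all at once. The helper names (build_export_request, export_to_file), the basic-auth login, and the -10d time range are assumptions for illustration, not something taken from the linked posts; adjust them to your environment.

```python
import base64
import shutil
import urllib.parse
import urllib.request


def build_export_request(base_url, username, password, spl,
                         earliest="-10d", latest="now"):
    """Build a POST request for the streaming export endpoint.

    base_url is the management URL of the search head (assumed here to be
    something like https://<host>:8089). All names are illustrative.
    """
    # Over REST the query must start with a generating command, so
    # prepend "search" unless the query already starts with it or a pipe.
    query = spl.strip()
    if not query.startswith(("search ", "|")):
        query = "search " + query

    body = urllib.parse.urlencode({
        "search": query,
        "output_mode": "csv",
        "earliest_time": earliest,
        "latest_time": latest,
    }).encode("utf-8")

    # Basic auth is an assumption; a token header may suit you better.
    credentials = f"{username}:{password}".encode("utf-8")
    auth = "Basic " + base64.b64encode(credentials).decode("ascii")

    return urllib.request.Request(
        base_url.rstrip("/") + "/services/search/jobs/export",
        data=body,
        headers={"Authorization": auth},
        method="POST",
    )


def export_to_file(request, out_path, context=None, chunk_size=1 << 20):
    """Stream the CSV response to out_path one chunk at a time."""
    # context is an optional ssl.SSLContext, e.g. one that trusts the
    # certificate on your management port.
    with urllib.request.urlopen(request, context=context) as response, \
            open(out_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
```

To use it, you would build the request with your own management URL, credentials, and the search from the question, then call export_to_file(request, "/tmp/export.csv"). Run it from a machine other than the search head if you can, so the CSV never lands where it could get replicated to the search peers.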
