Deployment Architecture

Multiple Large Data Sets

fk319
Builder

I have several sources of data feeding into my Splunk server, and some of the data sets exceed 1 GB per day.

What is the best way to keep the data separated so that searches are quicker?

I have defined apps and sourcetypes, but without knowing the internals, I'm not sure whether this is the right direction to go in.

1 Solution

Stephen_Sorkin
Splunk Employee

In general, for data volumes up to tens of GB per day, there's no real advantage in separating the data to make searches faster. There are some cases, however, where it makes sense to separate data into multiple indexes to gain "coherency" in the layout of data on disk, which speeds up raw data retrieval.

Specifically, if you have a low-volume data set that's intermingled with a high-volume data set and you commonly report on the entirety of the low-volume data set, Splunk will have to decompress (and throw away) much of the high-volume data set to get at the low-volume one. In this case, segregating the lower-volume data set into its own index can increase reporting performance from roughly thousands of events per second to many tens of thousands of events per second.
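
To make the decompress-and-discard cost concrete, here is a rough Python sketch. It is a toy model, not Splunk's actual storage format: it assumes events are compressed in fixed-size chunks and that a chunk has to be decompressed in full if it contains at least one event you want. The function name `scan_cost`, the chunk size of 128, and the 99:1 mix of high-volume to low-volume events are all made up for illustration.

```python
def scan_cost(events, wanted, chunk_size):
    """Count events decompressed vs. kept when reading one data set.

    Toy model (an assumption, not Splunk's real storage format): events are
    compressed in fixed-size chunks, and a chunk must be decompressed in full
    if it holds at least one wanted event.
    """
    decompressed = 0
    kept = 0
    for start in range(0, len(events), chunk_size):
        chunk = events[start:start + chunk_size]
        hits = sum(1 for source in chunk if source == wanted)
        if hits:
            decompressed += len(chunk)
            kept += hits
    return decompressed, kept


CHUNK_SIZE = 128  # hypothetical chunk size, chosen only for illustration

# Shared index: 99 high-volume events for every 1 low-volume event.
shared_index = (["high"] * 99 + ["low"]) * 1000

# Dedicated index: the same 1,000 low-volume events on their own.
dedicated_index = ["low"] * 1000

shared_cost = scan_cost(shared_index, "low", CHUNK_SIZE)
dedicated_cost = scan_cost(dedicated_index, "low", CHUNK_SIZE)

print("shared index:    decompressed %d, kept %d" % shared_cost)
print("dedicated index: decompressed %d, kept %d" % dedicated_cost)
```

Under these assumptions the shared index decompresses 100,000 events to keep 1,000, while the dedicated index decompresses only the 1,000 it needs. The real speedup depends on your data mix and hardware, so treat the ratio as illustrative only.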

If your searches are always over a small, scattered fraction of the data, and you can isolate that set, putting it in a separate index will help. If your reports are over many different small, scattered data sets, without overlap, it's simplest and best to just keep the data in a single index.
