Splunk Search

Index with one sourcetype - search performance / best practices

splunkmagu
Explorer

Hello,

I have created a few indexes, each containing data from only one source with one sourcetype.
From a search performance point of view, is it necessary to include the sourcetype in each search if only one sourcetype is "associated" with a specific index?
Is Splunk's internal "engine" slower when I do not specify it?


PickleRick
SplunkTrust

It's not that easy. Of course, separate indexes might make it easier for users to search data (it's easier to remember index=my_firewall than sourcetype=whatever:sourcetype:comes:with:the:TA), but more indexes mean more buckets, more overhead in managing the indexes, and so on.

And if you add remote storage options... it gets even more complicated.

So it's usually best to stick with as few indexes as you can reasonably manage. But yes, sometimes you might think ahead and separate data "logically" because you expect that in the foreseeable future, for example, a separate team will use some specific data and will need to be restricted to that data only.

So there are general guidelines, but in the end it comes down to a balance between manageability, usability, and performance, and it needs to be analyzed case by case.
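
To get a feel for the bucket overhead mentioned above, a search along these lines could be run on an existing deployment. Treat it as a sketch only: it assumes your role is allowed to run the dbinspect command and that the sizeOnDiskMB field is returned in your version. It counts buckets and sums their on-disk size per index.

```spl
| dbinspect index=*
| stats count AS buckets, sum(sizeOnDiskMB) AS size_mb by index
| sort - buckets
```

Many indexes that each hold only a handful of small buckets would suggest the split is adding management overhead without much benefit.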


splunkmagu
Explorer

Thank you everyone for the quick replies, but special thanks go to @PickleRick 😃


PickleRick
SplunkTrust

As a general rule of thumb, there's not much point* in splitting events between different indexes unless you need to differentiate permissions or retention periods for events. In other cases you simply choose events from an index by other means (like sourcetype). Since sourcetype is an indexed field, there is no significant performance penalty for using it to select events.

* One other factor might come into play in some particular cases: you might want to split data into different indexes for performance reasons if you have sources that produce greatly different amounts of data (like millions of events per day vs. several dozen events per day). But that's a different issue.
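
To see how event volume is actually distributed, something like the following could help. It is a sketch rather than a recommendation: it relies on index and sourcetype being indexed fields, so tstats can count them without reading raw events, and the result only covers the time range you select and the indexes your role can search.

```spl
| tstats count where index=* by index, sourcetype
| sort - count
```

If one sourcetype produces millions of events while another in the same index produces a few dozen, that is the kind of imbalance where a separate index might be worth considering.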


isoutamo
SplunkTrust

As others have already said, in most cases it's much better to use as few indexes as possible. The only reasons to create separate indexes are:

  • access rights
  • retention time
  • cardinality/amount of data per sourcetype/source/host

The sourcetype is there just to separate those events inside one index; source could/should also be used for that. Sourcetype describes the lexical format of the log file, so there is no need to create separate indexes for those. Usually, adding new indexes adds a lot of additional work and complexity to your environment instead of helping you.
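
One way to check whether existing indexes really differ in retention is to list their settings side by side. This is a rough sketch, assuming your role may use the rest command against the /services/data/indexes endpoint; in a distributed environment you may get one row per index per server.

```spl
| rest /services/data/indexes
| table title frozenTimePeriodInSecs maxTotalDataSizeMB currentDBSizeMB totalEventCount
| sort title
```

Indexes that share the same frozenTimePeriodInSecs and size limits, and are readable by the same roles, are likely candidates for being merged and distinguished by sourcetype instead.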


gcusello
SplunkTrust

Hi @splunkmagu,

If you have one sourcetype in each index, it isn't important to specify it in searches.

The question should be: why do you have one sourcetype per index?

Usually an index is created for data with the same retention and the same access grants.

Ciao.

Giuseppe

splunkmagu
Explorer

Hi @gcusello,

Some time ago we asked Support about best practices for creating indexes, and they replied (and provided some documentation links) that we should create multiple indexes if the data being ingested differed in "structure".
And it does in our case: it's custom application data for which, in the majority of cases, a custom sourcetype was created because globally available sourcetypes (through add-ons, TAs, etc.) didn't extract fields correctly.


