Index with one sourcetype - search performance / best practices

splunkmagu
Explorer

Hello,

I have created a few indexes, each containing data from only one source with one sourcetype.
From a search performance point of view, is it necessary to include the sourcetype in each search if there is only one sourcetype "associated" with a specific index?
Is Splunk's internal "engine" slower when I do not specify it?

splunkmagu
Explorer

Thank you everyone for the quick replies, but special thanks go to @PickleRick 😃

PickleRick
Ultra Champion

As a general rule of thumb, there's not much point* in splitting events between different indexes unless you need to differentiate permissions or retention periods for events. In other cases you simply choose events from an index by other means (like sourcetype). Since sourcetype is an indexed field, there is no significant performance penalty for using it to select events.

* There is one other possible factor that might come into play in some particular cases: you might want to split data into different indexes for performance reasons if you have sources that produce vastly different amounts of data (like millions of events per day vs. several dozen events per day). But that's a different issue.
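
A rough, hypothetical calculation for the footnote's scenario, in Python. It assumes buckets roll after a fixed event count (`BUCKET_CAPACITY` is an invented number; real buckets roll on size and time settings) and that the quiet source's few events end up spread across every bucket of a shared index, so a search for them likely has to touch all of those buckets.

```python
import math

# Hypothetical simplification: a bucket rolls after a fixed number of events.
# Real Splunk buckets roll on size/time settings, not on an event count.
BUCKET_CAPACITY = 500_000


def buckets_in_range(events_per_day, days, capacity=BUCKET_CAPACITY):
    """Number of buckets needed to hold `days` worth of events (at least one)."""
    return max(1, math.ceil(events_per_day * days / capacity))


busy, quiet, days = 1_000_000, 36, 7  # "millions" vs "several dozen" events per day

# Shared index: the quiet source's events are scattered among the busy
# source's buckets, so a 7-day search for them touches every bucket.
shared = buckets_in_range(busy + quiet, days)

# Separate index: the quiet source's events sit in their own small bucket set.
separate = buckets_in_range(quiet, days)

print(f"shared index: {shared} buckets, separate index: {separate} bucket(s)")
```

With these assumptions it prints `shared index: 15 buckets, separate index: 1 bucket(s)`.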

isoutamo
SplunkTrust

As others have already said, in most cases it’s much better to use as few indexes as possible. The only reasons to create separate indexes are:

  • access rights
  • retention time
  • cardinality/amount of data per sourcetype/source/host

The sourcetype is there precisely to separate those events inside one index, and source could (or should) be used for that as well. The sourcetype describes the lexical format of the log file, so there is no need to create separate indexes for different sourcetypes. Usually, adding new indexes adds a lot of additional work and complexity to your environment instead of helping you.
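
The three reasons above can be written as a small checklist function. This is a hypothetical Python sketch: `SourceProfile`, `reasons_to_split` and the 1000x volume threshold are made up for illustration. A differing sourcetype (the `name` field) is deliberately not treated as a reason to split.

```python
from dataclasses import dataclass

# Hypothetical threshold: flag a split when daily volumes differ by this factor.
# The 1000x value is an illustrative assumption, not a Splunk recommendation.
VOLUME_RATIO_THRESHOLD = 1000


@dataclass(frozen=True)
class SourceProfile:
    name: str                 # e.g. the sourcetype; NOT used as a split reason
    allowed_roles: frozenset  # who may search this data
    retention_days: int       # how long the data must be kept
    events_per_day: int       # rough daily volume


def reasons_to_split(a: SourceProfile, b: SourceProfile) -> list:
    """Return the reasons (if any) to keep two sources in separate indexes."""
    reasons = []
    if a.allowed_roles != b.allowed_roles:
        reasons.append("access rights")
    if a.retention_days != b.retention_days:
        reasons.append("retention time")
    low, high = sorted((a.events_per_day, b.events_per_day))
    if high >= VOLUME_RATIO_THRESHOLD * max(low, 1):
        reasons.append("data volume")
    return reasons


fw = SourceProfile("fw:traffic", frozenset({"netops"}), 90, 1_000_000)
app = SourceProfile("custom:app", frozenset({"netops"}), 90, 800_000)
print(reasons_to_split(fw, app))
```

For these two sources (same roles and retention, similar daily volume) it prints `[]`, even though their sourcetypes differ.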

gcusello
Legend

Hi @splunkmagu,

if you have one sourcetype in each index, it isn't important to specify it in searches.

The question should be: why do you have one sourcetype per index?

Usually an index is created for data with the same retention and the same access grants.

Ciao.

Giuseppe
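
To make the indexed-field point concrete (gcusello's answer here and PickleRick's remark that sourcetype is an indexed field), here is a toy inverted index in Python. It is only loosely inspired by how indexed fields can be looked up without reading raw events and is not Splunk's actual implementation; the event values and the `build_postings`/`select` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample events: "index" and "sourcetype" play the role of indexed
# metadata; "raw" stands in for the event text, which the filter never reads.
EVENTS = [
    {"index": "my_firewall", "sourcetype": "fw:traffic", "raw": "allow tcp 10.0.0.1"},
    {"index": "my_firewall", "sourcetype": "fw:traffic", "raw": "deny udp 10.0.0.2"},
    {"index": "my_app", "sourcetype": "custom:app", "raw": "user=alice action=login"},
]

INDEXED_FIELDS = ("index", "sourcetype")


def build_postings(events):
    """Map each indexed (field, value) pair to the set of event ids carrying it."""
    postings = defaultdict(set)
    for event_id, event in enumerate(events):
        for field in INDEXED_FIELDS:
            postings[(field, event[field])].add(event_id)
    return postings


def select(postings, **criteria):
    """Return ids matching every criterion by intersecting posting sets.

    One more indexed-field criterion costs one extra lookup and intersection;
    no raw event text is scanned.
    """
    matches = [postings.get((field, value), set()) for field, value in criteria.items()]
    return set.intersection(*matches) if matches else set()


postings = build_postings(EVENTS)
print(select(postings, index="my_firewall"))
print(select(postings, index="my_firewall", sourcetype="fw:traffic"))
```

Both calls print `{0, 1}`: with only one sourcetype in `my_firewall`, the extra criterion doesn't change the result, and in this toy model it adds only a cheap set intersection.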

splunkmagu
Explorer

Hi @gcusello,

Some time ago we asked Support about best practices for creating indexes, and they replied (and provided some documentation links) that we should create many indexes if the ingested data differed in "structure".
And it does in our case: it's custom application data, and in most cases a custom sourcetype had to be created because the globally available sourcetypes (through add-ons, TAs, etc.) didn't extract fields correctly.

PickleRick
Ultra Champion

It's not that easy. Of course, it might make it easier for users to search data (it's easier to remember index=my_firewall than sourcetype=whatever:sourcetype:comes:with:the:TA), but more indexes means more buckets, more overhead for managing the indexes, and so on.

And if you add remote storage options... it gets complicated.

So it's usually best to stick with as few indexes as you can reasonably manage. But yes, you might sometimes think ahead and separate data "logically" because you expect that in the foreseeable future there will be, for example, a separate team using some specific data that will need to be restricted to that data only.

So there are general guidelines, but in the end it comes down to a balance between manageability, usability and performance, and it needs to be analyzed case by case.
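
To put a rough number on "more indexes means more buckets", here's a hypothetical Python calculation. `BUCKET_SIZE_MB` is an assumed round figure, and the model ignores time-based bucket rolling, hot/warm/cold stages and replication; it only shows that splitting the same data across many indexes tends to leave more partially filled buckets to manage.

```python
import math

# Hypothetical: each bucket holds at most this many MB of data. Treat it as an
# assumed round number, not a Splunk default you can rely on.
BUCKET_SIZE_MB = 750


def total_buckets(daily_mb_per_index, days, bucket_size_mb=BUCKET_SIZE_MB):
    """Sum bucket counts when each index fills its own buckets independently."""
    return sum(math.ceil(mb * days / bucket_size_mb) for mb in daily_mb_per_index)


sources_mb_per_day = [30] * 10  # ten small sources, 30 MB/day each
days = 30

one_index_per_source = total_buckets(sources_mb_per_day, days)
one_shared_index = total_buckets([sum(sources_mb_per_day)], days)

print(one_index_per_source, one_shared_index)
```

It prints `20 12`: ten per-source indexes need 20 buckets for the same 30 days of data that fit in 12 buckets of one shared index.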