Solved: Index with one sourcetype - search performance / best practices

splunkmagu · ‎06-29-2022

Hello,

I have created a few indexes, each containing data from only one source with one sourcetype.
From a search performance point of view, is it necessary to include the sourcetype in each search if there is only one sourcetype "associated" with a specific index?
Is Splunk's internal "engine" slower when I do not specify it?

PickleRick · ‎06-29-2022

It's not that easy. Of course it might make it easier for the user to search data (it's easier to remember index=my_firewall than sourcetype=whatever:sourcetype:comes:with:the:TA), but more indexes mean more buckets, more overhead of managing the indexes, and so on.

And if you add remote storage options... it gets complicated.

So it's usually best to stick with as few indexes as you can reasonably manage. But yes, you might sometimes think ahead and separate data "logically" because you expect that in the foreseeable future there will be, for example, a separate team using some specific data, and its access will need to be restricted to that data only.

So there are general guidelines, but in the end it comes down to a balance between manageability, usability and performance, and it needs to be analyzed case by case.

splunkmagu · ‎06-29-2022

Thank you everyone for the quick replies, but special thanks go to @PickleRick 😃

PickleRick · ‎06-29-2022

As a general rule of thumb, there's not much point* in splitting events between different indexes unless you need to differentiate permissions or retention periods for events. In other cases you simply choose events from an index by other means (like sourcetype). Since sourcetype is an indexed field, there is no significant performance penalty for using it to select events.

* There is possibly one other factor that might come into play in some particular cases: you might want to split data into different indexes for performance reasons if you have sources which produce greatly different amounts of data (like millions of events per day vs. a few dozen events per day). But that's a different issue.
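
To illustrate selecting events "by other means", here is a minimal Python sketch that only builds the leading terms of a search string. It is not part of Splunk; the helper name `base_search` and the index and sourcetype values are hypothetical and chosen purely for illustration.

```python
from typing import Optional


def base_search(index: str, sourcetype: Optional[str] = None) -> str:
    """Return the leading terms of a search: the index, plus an optional sourcetype filter."""
    terms = [f"index={index}"]
    if sourcetype is not None:
        # Quote the value so a sourcetype containing spaces stays a single term.
        terms.append(f'sourcetype="{sourcetype}"')
    return " ".join(terms)


# One shared index; two hypothetical sourcetypes are told apart by the sourcetype term.
print(base_search("network", "vendor:firewall"))
print(base_search("network", "vendor:proxy"))

# A dedicated index holding a single sourcetype: the index term alone is enough.
print(base_search("my_firewall"))
```

With a shared index, the sourcetype term does the job that a dedicated index name would otherwise do, which is why splitting indexes purely for search convenience is rarely needed.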

isoutamo · ‎06-29-2022

As others have already said, in most cases it's much better to use as few indexes as possible. The only reasons to create separate indexes are:

access rights
retention time
cardinality/amount of data per sourcetype/source/host

The sourcetype is there just to separate those events inside one index; source could (and should) also be used for that. A sourcetype describes the lexical format of a log file, so there is no need to create separate indexes for different sourcetypes. Usually, adding new indexes adds a lot of additional work and complexity to your environment instead of helping you.
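
As a rough illustration of the three criteria above, the following Python sketch groups data sources into candidate indexes by retention, access role, and a crude volume tier. It is not Splunk code: the sourcetype names, roles, numbers, and the one-million-events threshold are all assumptions made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Source(NamedTuple):
    sourcetype: str       # hypothetical sourcetype name
    retention_days: int   # how long the events must be kept
    access_role: str      # role allowed to search the events
    events_per_day: int   # rough daily volume


def volume_tier(events_per_day: int, threshold: int = 1_000_000) -> str:
    """Crude high/low split; the threshold is an arbitrary assumption."""
    return "high" if events_per_day >= threshold else "low"


def plan_indexes(sources):
    """Group sourcetypes that share retention, access role, and volume tier."""
    plan = defaultdict(list)
    for s in sources:
        key = (s.retention_days, s.access_role, volume_tier(s.events_per_day))
        plan[key].append(s.sourcetype)
    return dict(plan)


sources = [
    Source("app_a:log", 90, "app_team", 5_000_000),
    Source("app_b:log", 90, "app_team", 3_000_000),
    Source("app_c:audit", 365, "security", 40),
    Source("app_d:log", 90, "app_team", 60),
]

plan = plan_indexes(sources)
for key, sourcetypes in plan.items():
    print(key, "->", sourcetypes)
```

In this made-up example, four sourcetypes end up in three groups rather than four indexes: the two high-volume application logs share one, because nothing about their retention, access, or volume sets them apart.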

gcusello · ‎06-29-2022

Hi @splunkmagu,

If you have one sourcetype in each index, it isn't important to specify it in searches.

The question should be: why do you have a sourcetype for each index?

Usually an index is created for data with the same retention and the same access grants.

Ciao.

Giuseppe

splunkmagu · ‎06-29-2022

Hi @gcusello,

Some time ago we asked Support about best practices for creating indexes, and they replied (and provided some documentation links) that we should create many indexes if the data being ingested differed in "structure".
And it does in our case: it's custom application data for which a custom sourcetype was created (in the majority of cases) because globally available sourcetypes (through add-ons, TAs, etc.) didn't extract fields correctly.

