Splunk Search

Indexing internals

egutesman
Engager

Hi,

I'm planning the event sources for Splunk and I'd like to know how Splunk decides which index to use. I know from database theory that every time a query is processed, a query plan is built, and that this is an opportunity to improve the query's performance. I know that in Splunk one can define new indexes, but I don't know how specific one can be (e.g., specifying which field is indexed, which type of index to use, etc.).

I'm about to handle a lot of events and would love to fine-tune these aspects.

One more small question: once an index has been used in the first part of a search, the opportunity to use indexes in successive piped operations is lost (the same happens when you use an index in a relational database and then perform sub-queries or joins on the remaining entries). Is this correct?

Any pointer to index internals, how indexes are chosen for a query, would be of great help.


Ayn
Legend

I think you're confusing some things.

First of all, an "index" in Splunk is roughly what you would call a "database" in the SQL world. Splunk's indexes are, however, not very much like SQL databases. Splunk indexes data without using a fixed database schema. Instead, it identifies unique segments in the data and stores metadata about which events contain those segments. Fields do not really exist until you issue a search. A select few fields are set at index-time (such as host, source, sourcetype, _time and a number of other internal fields), but the rest are created at search-time. It is important to note that creating more index-time fields does NOT boost performance except in a few special situations; often it does rather the opposite. This is usually a difficult concept to accept if you come from the SQL world.
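To make the idea concrete, here is a toy sketch in Python of the two steps described above: a segment-to-event lookup built when data comes in, and fields that are only parsed out of the raw text when a search runs. This is only an illustration of the concept, not how Splunk is actually implemented; the sample events and the function names (`segments`, `build_index`, `extract_fields`, `search`) are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical raw events; stand-ins for lines arriving from a log source.
EVENTS = [
    "2024-03-01 10:00:01 status=200 user=alice action=login",
    "2024-03-01 10:00:05 status=500 user=bob action=upload",
    "2024-03-01 10:00:09 status=200 user=bob action=login",
]


def segments(event):
    """Split an event into lowercase tokens (a crude stand-in for segmentation)."""
    return {tok.lower() for tok in re.split(r"[\s=]+", event) if tok}


def build_index(events):
    """'Index time': map each segment to the ids of the events containing it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for seg in segments(event):
            index[seg].add(event_id)
    return index


def extract_fields(event):
    """'Search time': fields are parsed from the raw text only when needed."""
    return dict(re.findall(r"(\w+)=(\S+)", event))


def search(index, events, terms, **field_filters):
    """Narrow candidates via the segment lookup, then filter on extracted fields."""
    candidate_ids = None
    for term in terms:
        ids = index.get(term.lower(), set())
        candidate_ids = ids if candidate_ids is None else candidate_ids & ids
    if candidate_ids is None:
        # No search terms given: every event is a candidate.
        candidate_ids = set(range(len(events)))
    results = []
    for event_id in sorted(candidate_ids):
        fields = extract_fields(events[event_id])
        if all(fields.get(k) == v for k, v in field_filters.items()):
            results.append(fields)
    return results


INDEX = build_index(EVENTS)
```

For example, `search(INDEX, EVENTS, ["bob"], status="200")` first uses the segment lookup to narrow the candidates down to the two events mentioning "bob", and only then extracts `status` from those raw events to apply the filter. No "status" column was ever declared up front, which is the main difference from a schema-based SQL index.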

For more information on how Splunk handles incoming data, read dwaddle's explanation about it here: http://answers.splunk.com/answers/54207/slow-search-when-evaluating-a-numeric-value?page=1&focusedAn...

egutesman
Engager

Thanks for the quick and precise response!
