Solved: Indexing internals

egutesman · 12-04-2013

Hi,

I'm planning the event sources for Splunk and would like to know how Splunk decides which index to use. I know from database theory that every time a query is processed, a query plan is built, which is an opportunity to improve the query's performance. I know that in Splunk one can define new indexes, but I don't know how specific one can be (e.g., specifying which fields are indexed, which type of index to use, etc.).

I'm about to handle a lot of events and would love to fine-tune these aspects.

One more small question: once an index has been used in the first part of a search, is the opportunity to use indexes in successive piped operations lost? (The same happens once you use an index in a relational database and then perform sub-queries or joins on the remaining entries.)

Any pointers to index internals, or to how indexes are chosen for a query, would be of great help.

Ayn · 12-04-2013

I think you're confusing some things.

First of all, an "index" in Splunk is what you would otherwise call a "database" in the SQL world. Splunk's indexes are, however, not much like SQL databases. Splunk indexes data without a fixed database schema. Instead, it identifies the unique segments in the data and stores metadata recording which events each segment appears in. Fields do not really exist until you issue a search: a select few fields are set at index time (such as host, source, sourcetype, _time and a number of other internal fields), but the rest are created at search time. It is important to note that creating more index-time fields does NOT boost performance except in a few special situations; often it does rather the opposite. This is usually a difficult concept to accept if you come from the SQL world.
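To make the segment idea more concrete, here is a rough Python sketch of a schema-less index: raw events are split into segments, a lookup table records which events contain each segment, and fields are only parsed out of the matching raw events at search time. This is a toy model under loud assumptions, not Splunk's actual on-disk format or segmentation rules; `ToyIndex`, `segments`, `extract_fields` and the sample events are all made up for illustration.

```python
import re
from collections import defaultdict

# Toy model only: NOT Splunk's real segmentation rules or storage format.
SEGMENT_RE = re.compile(r"[A-Za-z0-9_.]+")


def segments(raw):
    """Split a raw event into lowercase segments (hypothetical segmenter)."""
    return {s.lower() for s in SEGMENT_RE.findall(raw)}


class ToyIndex:
    """Schema-less index: stores raw events plus a segment -> event-id map."""

    def __init__(self):
        self.events = []                  # raw event text, position = event id
        self.postings = defaultdict(set)  # segment -> ids of events containing it

    def add(self, raw):
        event_id = len(self.events)
        self.events.append(raw)
        for seg in segments(raw):
            self.postings[seg].add(event_id)
        return event_id

    def search(self, *terms):
        """Return the raw events that contain every one of the given terms."""
        if not terms:
            return list(self.events)
        ids = set.intersection(
            *(self.postings.get(t.lower(), set()) for t in terms)
        )
        return [self.events[i] for i in sorted(ids)]


def extract_fields(raw):
    """Search-time extraction: parse key=value pairs out of the raw text."""
    return dict(re.findall(r"(\w+)=(\S+)", raw))


index = ToyIndex()
index.add("2013-12-04 10:00:01 action=login user=alice status=ok")
index.add("2013-12-04 10:00:05 action=login user=bob status=failed")
index.add("2013-12-04 10:00:09 action=logout user=alice status=ok")

# The segment lookup narrows down the candidate events first...
hits = index.search("alice", "login")
# ...and fields such as "user" only come into existence afterwards.
fields = [extract_fields(event) for event in hits]
```

Note that nothing here declares `user` or `status` as a column up front: the lookup table only knows about segments, and the fields are derived from whatever raw events survive the lookup.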

For more information on how Splunk handles incoming data, read dwaddle's explanation about it here: http://answers.splunk.com/answers/54207/slow-search-when-evaluating-a-numeric-value?page=1&focusedAn...

egutesman · 12-04-2013

Thanks for the quick and precise response!
