Solved: Index Best Practice

ipicbc · 03-07-2017

I am ingesting events from log files. There are 50 log files, each with 10,000 lines a day, and they are rolled daily with a retention of 10 days. The file formats are identical, so there is only one sourcetype. So I have 500 files in total, of which 50 are changing at any time, and maybe 5,000,000 total events in Splunk.
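
As a quick sanity check of those volumes, here is a minimal Python sketch of the arithmetic. It assumes one Splunk event per log line; the variable names are only illustrative.

```python
# Back-of-the-envelope volume check using the figures from the question.
# Assumption: each log line becomes exactly one Splunk event.
LOG_FILES = 50                 # distinct log files being monitored
LINES_PER_FILE_PER_DAY = 10_000
RETENTION_DAYS = 10            # files are rolled daily and kept for 10 days

events_per_day = LOG_FILES * LINES_PER_FILE_PER_DAY
files_on_disk = LOG_FILES * RETENTION_DAYS
events_retained = events_per_day * RETENTION_DAYS

print(f"events/day:      {events_per_day:,}")
print(f"files on disk:   {files_on_disk:,}")
print(f"events retained: {events_retained:,}")
```

That is roughly 500,000 new events a day, which is a modest load for a single index.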

My question relates to best practice for indexing for query performance. I don't believe there are good reasons in my use case to go in any particular direction because of access control or retention.

At the moment I just have 1 index for everything. But I could create a new index each day across all log files, including the date in the index name. Alternatively I could have a separate index for each log file. Or both.

I would like to hear what best practice would be, both in theory and from your practical experience, please.

mattymo · 03-07-2017

Hi ipicbc!

Based on what you have described, I would suggest you are already set up for success.

If the data is truly all the same format, then one sourcetype is the way to go.

If there is any logical segregation among the hosts or files, such as the service or function the hosts provide, the group who will be searching the data (although you already alluded to having no need for access control), or any other grouping, then I might split up the indexes accordingly.

Otherwise, I would keep the one index and rely on writing searches that explicitly target the events I want to see. Creating tons of indexes will lead to a bad time: whatever performance you might gain will easily be outweighed by the admin overhead.
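
To put "tons of indexes" into numbers for the layouts proposed in the question, here is a small Python sketch that counts how many indexes would hold data at any one time, and how many would have to be created each day, using the question's 50 files and 10-day retention. The function names are hypothetical.

```python
# Hypothetical comparison of the index layouts proposed in the question,
# using its figures: 50 log files and a 10-day retention period.
LOG_FILES = 50
RETENTION_DAYS = 10

def live_indexes(per_day: bool, per_file: bool) -> int:
    """Number of indexes holding data at any one time for a given layout."""
    count = 1
    if per_day:
        count *= RETENTION_DAYS   # one index per day still inside retention
    if per_file:
        count *= LOG_FILES        # one index per log file
    return count

def new_indexes_per_day(per_day: bool, per_file: bool) -> int:
    """Indexes that must be created (and later retired) every day."""
    if not per_day:
        return 0
    return LOG_FILES if per_file else 1

layouts = {
    "single index": (False, False),
    "index per day": (True, False),
    "index per file": (False, True),
    "per day and per file": (True, True),
}
for name, (per_day, per_file) in layouts.items():
    print(f"{name:22} live={live_indexes(per_day, per_file):4} "
          f"created/day={new_indexes_per_day(per_day, per_file)}")
```

The combined layout ends up with 500 live indexes and 50 new ones every day, each of which has to be defined and maintained, compared with one index for the single-index layout.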

Splunk creates index-time fields such as _time, host, sourcetype and source that allow you to filter your events down efficiently. With the Search Processing Language (SPL) you should be able to write very efficient searches that make sifting through those events really easy and performant.
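
For example, an explicitly targeted search narrows on those index-time fields and a time range first, then applies any free-text terms. The Python wrapper below is only for illustration; the printed string is the SPL you would run. The index, sourcetype and source values are hypothetical placeholders.

```python
# Minimal sketch: assemble an SPL search that narrows events with the
# index-time fields (index, sourcetype, source, host) and a time range
# before any free-text terms. All values passed in are hypothetical.
from typing import Optional

def build_search(index: str, sourcetype: str, *, source: Optional[str] = None,
                 host: Optional[str] = None, earliest: str = "-24h@h",
                 latest: str = "now", terms: str = "") -> str:
    parts = [f"search index={index}", f"sourcetype={sourcetype}"]
    if source:
        parts.append(f'source="{source}"')   # a single log file
    if host:
        parts.append(f'host="{host}"')       # a single host
    parts += [f"earliest={earliest}", f"latest={latest}"]
    if terms:
        parts.append(terms)                  # free-text filter last
    return " ".join(parts)

print(build_search("app_logs", "app_log",
                   source="/var/log/app/file07.log", terms="ERROR"))
```

Filtering on source within one index gives much the same targeting as a per-file index would, without the extra index definitions.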

If the events contain fields that you want to report on and the searches need to be even faster, the next levers to pull for quick search and report results would be summary indexing and data models/creation of tsidx files. Both help prepare the information you want to work with and shed the data you don't need to gain insight into your data.
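
The idea behind summary indexing can be sketched in a few lines of Python: a scheduled job rolls raw events up into small pre-aggregated rows (here, hourly counts per source and status), and reports read those rows instead of rescanning every raw event. This models the concept only and is not how Splunk implements it; the sample events are made up.

```python
# Conceptual sketch of summary indexing: roll raw events up into hourly
# counts per (source, status) so reports scan far fewer rows.
# Models the idea only; not Splunk's implementation. Sample data is made up.
from collections import Counter
from datetime import datetime

def summarize_hourly(events):
    """events: iterable of (timestamp: datetime, source: str, status: str)."""
    summary = Counter()
    for ts, source, status in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # hourly bucket
        summary[(hour, source, status)] += 1
    return summary

raw = [
    (datetime(2017, 3, 7, 9, 5), "file01.log", "ERROR"),
    (datetime(2017, 3, 7, 9, 40), "file01.log", "ERROR"),
    (datetime(2017, 3, 7, 9, 41), "file02.log", "OK"),
    (datetime(2017, 3, 7, 10, 2), "file01.log", "ERROR"),
]
for (hour, source, status), count in sorted(summarize_hourly(raw).items()):
    print(hour.isoformat(), source, status, count)
```

In Splunk itself this roll-up is usually a scheduled search whose results are written to a summary index, for example with the `collect` command.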

There are many ways to ensure your search performance is optimal, but in short, based on what you have described, I wouldn't chase segmenting indexes as one of them.

- MattyMo

mattymo · 03-07-2017

As a follow-up topic that can also help ensure your indexes are configured as well as they can be, definitely get comfortable with the concept of buckets and how they age:

http://docs.splunk.com/Documentation/Splunk/6.5.2/Indexer/HowSplunkstoresindexes
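
One aging detail worth internalizing: with time-based retention (the frozenTimePeriodInSecs setting in indexes.conf), a bucket rolls to frozen only once its newest event is older than the retention period. The Python sketch below models just that rule, assuming you wanted Splunk to keep the same 10 days the log files are kept; the bucket time spans are made up, and hot buckets and the periodic retention check are ignored.

```python
# Sketch of time-based bucket retention, assuming a 10-day retention.
# A bucket becomes eligible to freeze only when its NEWEST event is older
# than frozenTimePeriodInSecs. Bucket spans below are hypothetical.
from datetime import datetime

RETENTION_DAYS = 10
FROZEN_TIME_PERIOD_IN_SECS = RETENTION_DAYS * 24 * 60 * 60  # 864000

def is_frozen_eligible(newest_event: datetime, now: datetime) -> bool:
    """True once every event in the bucket is past the retention period."""
    return (now - newest_event).total_seconds() > FROZEN_TIME_PERIOD_IN_SECS

now = datetime(2017, 3, 20)
# Hypothetical buckets: (oldest event, newest event)
buckets = {
    "bucket_a": (datetime(2017, 3, 1), datetime(2017, 3, 8)),
    "bucket_b": (datetime(2017, 3, 3), datetime(2017, 3, 12)),
}
for name, (oldest, newest) in buckets.items():
    print(name, "frozen-eligible:", is_frozen_eligible(newest, now))
```

Here bucket_b is not yet eligible even though it still holds 17-day-old events from 3 March, so data can stay searchable past the nominal retention until the whole bucket ages out.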

As a general rule, if an index takes in more than 10GB a day, you want to make sure its maxDataSize is set to auto_high_volume.
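
The rule of thumb can be expressed as a tiny Python helper that picks the maxDataSize value from an index's daily volume and prints the matching indexes.conf line. "auto" gives buckets of about 750MB and "auto_high_volume" about 10GB on 64-bit systems. The helper names, the index names and the volumes passed in are hypothetical, and a real index stanza also needs homePath, coldPath and thawedPath.

```python
# Hypothetical helper applying the ">10GB/day -> auto_high_volume" rule of
# thumb. maxDataSize, "auto" and "auto_high_volume" are real indexes.conf
# values; index names and volumes below are made up. A real stanza also
# needs homePath, coldPath and thawedPath, omitted here.
HIGH_VOLUME_THRESHOLD_GB_PER_DAY = 10

def max_data_size(daily_gb: float) -> str:
    """Choose the bucket sizing mode for an index's daily ingest."""
    if daily_gb > HIGH_VOLUME_THRESHOLD_GB_PER_DAY:
        return "auto_high_volume"
    return "auto"

def index_stanza(name: str, daily_gb: float) -> str:
    """Render only the maxDataSize part of an indexes.conf stanza."""
    return f"[{name}]\nmaxDataSize = {max_data_size(daily_gb)}\n"

print(index_stanza("app_logs", 0.1))
print(index_stanza("firewall", 25))
```

For the volumes in the question (about 500,000 events a day, and assuming roughly 200 bytes per event), ingest is around 0.1GB a day, so the default "auto" is appropriate.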

See the docs for the full story.

- MattyMo

ipicbc · 03-07-2017

Great advice, very much appreciated!

ryan_gates · 02-02-2018

I downvoted this post because this should be a comment rather than an answer.

ppablo · 02-05-2018

Hey @ryan_gates

Just FYI, please reserve downvoting for proposed solutions that could be harmful in a Splunk environment or that go against known best practices, not for something posted in the wrong area. We want to encourage an environment in the forum where people don't feel afraid to contribute. Just commenting that the answer should have been a comment would have been fine for something like this, and we can get it converted from there.

Thanks for being a part of the Answers community, and hope to see some questions and answers from you in the near future 🙂

Patrick