Solved: Re: What would be the best practice for creating i...

chawagon03 · ‎08-20-2015

I'm looking to get information on what is the best way to make indexes for data.

Background: We are setting up a clustered environment with both clustered indexers (replication factor 3) and clustered (distributed? I'm still confused about clustered search heads vs. distributed search) search heads. Our current approach is to put each "application" in its own index, collecting anything the Universal Forwarder will send us, including log data and basic OS data.

Question: Our new approach is to create an 'OS' index to handle all UF OS stats from all servers, use tags to help with search, and create multiple indexes per app so that each index holds common data. We have around 170 applications we want to start monitoring, as they are our class 1 and class 2 apps, so that would be 3 x 170 = 510 indexes. Is this a horrid approach?

Example:
Application X
- Index OS
- Index X
- Index X_Critical (for when we need to ramp up interval time for real-time troubleshooting; would be cleaned out after)
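
For a rough sense of scale, the index count can be tallied under two readings of the plan above. This is only a sketch: the function name is made up for illustration, and the second reading (one OS index shared by every application) is an assumption drawn from the "all servers" wording, not something the post confirms.

```python
def index_count(apps: int, per_app: int, shared: int = 0) -> int:
    """Total indexes = indexes created per application + shared indexes."""
    return apps * per_app + shared


APPS = 170  # class 1 and class 2 applications mentioned in the post

# Reading 1: every application gets three indexes of its own (the 3 x 170 figure).
three_per_app = index_count(APPS, per_app=3)

# Reading 2 (assumption): one shared OS index, plus X and X_Critical per application.
shared_os = index_count(APPS, per_app=2, shared=1)

print(three_per_app, shared_os)
```

Either way the total lands in the hundreds, which is what the replies below react to.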

jensonthottian · ‎08-20-2015

I worked with a huge financial company (in the Fortune 100).

They followed a similar approach:

An index "OS" for all events related to the OS, be it Windows, Linux, Unix, or Solaris.
An index "app" for all events related to applications
An index "network" for all network events

They used tags and eventtypes to help with search. It worked well for them.

I am not sure about index x_critical -- wouldn't this be controlled by the log levels?

maciep · ‎08-20-2015

I'm not sure I can provide the best advice, but that does seem like a lot of indexes. I think, generally speaking, the two questions to ask when creating a new index are:

Do I need to control access differently to this data?
Will this data have different retention requirements?

If the answer to both is no, then you don't necessarily need a new index. So if each application's data has to be limited to only that app's owners/support folks, then having them separate might make sense. Or if you want to keep data from different applications for a longer or shorter period of time, then it might make sense. Otherwise, it might not be necessary.

That said, we do have some indexes just for applications. We have some for specific support teams. We have others for things like infrastructure. Unfortunately or fortunately, we don't have any hard and fast rules. We try to use common sense as our guide, but that may not be the best approach.

I would just caution to be careful of handcuffing yourself down the line. Make sure that whatever path you choose, you leave wiggle room to adjust for those unknown scenarios.
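
The two questions above can be sketched as a grouping rule: sources that share both the same access requirements and the same retention requirement can live in one index. This is a hypothetical illustration only; the `Source` class, the role names, and the sample applications are invented for the example and are not Splunk objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    allowed_roles: frozenset  # who is allowed to search this data
    retention_days: int       # how long the data must be kept


def group_into_indexes(sources):
    """Group sources that share the same access and retention needs."""
    groups = {}
    for s in sources:
        key = (s.allowed_roles, s.retention_days)
        groups.setdefault(key, []).append(s.name)
    return list(groups.values())


# Hypothetical sample data.
sources = [
    Source("app_a", frozenset({"team_a"}), 90),
    Source("app_b", frozenset({"team_a"}), 90),
    Source("app_c", frozenset({"team_c"}), 90),
    Source("app_d", frozenset({"team_a"}), 30),
]

groups = group_into_indexes(sources)
print(groups)
```

Here four applications collapse into three candidate indexes: `app_a` and `app_b` differ in neither access nor retention, so nothing forces them apart.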

chawagon03 · ‎08-20-2015

Also in the Fortune 100, in manufacturing 🙂

It seems I'm thinking of the same approach you used for your company, and I'm glad it worked well. x_critical is mainly for my team, as we are a performance and reliability team: when an important app goes down, we get called to troubleshoot the issue and get it back up as soon as possible. It may not be needed, since that would already be handled at the log level (great point).

Do you have any replication factor or retention policy in place?

jensonthottian · ‎08-20-2015

Replication factor: 3.

Retention policy:
OS data: 60 days, then archive; archive kept for a max of 1 year
App data: 90 days, then archive; max depends
Network data: 60 days, then archive; archive kept for a max of 1 year
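
As a rough sketch of how the retention periods above could map onto `indexes.conf`, the snippet below converts days into `frozenTimePeriodInSecs` and prints one stanza per index. Several things here are assumptions rather than details from this thread: the index names are lowercased (Splunk index names do not allow uppercase letters), the paths follow the usual `$SPLUNK_DB` layout, and `repFactor = auto` is included because the indexes are replicated across a cluster.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Searchable retention from the post, in days. Names are lowercased (assumption).
RETENTION_DAYS = {"os": 60, "app": 90, "network": 60}


def render_stanza(index: str, days: int) -> str:
    """Render a minimal indexes.conf stanza for one clustered index."""
    lines = [
        f"[{index}]",
        f"homePath = $SPLUNK_DB/{index}/db",
        f"coldPath = $SPLUNK_DB/{index}/colddb",
        f"thawedPath = $SPLUNK_DB/{index}/thaweddb",
        "repFactor = auto",
        # Buckets older than this are frozen (deleted unless archiving is set up).
        f"frozenTimePeriodInSecs = {days * SECONDS_PER_DAY}",
    ]
    return "\n".join(lines)


conf = "\n\n".join(render_stanza(name, days) for name, days in RETENTION_DAYS.items())
print(conf)
```

Note that freezing deletes data by default; the "then archive" step would need something like `coldToFrozenDir` or `coldToFrozenScript` on each index, and the 1-year cap on the archive itself would have to be enforced outside Splunk.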

What would be the best practice for creating indexes in our distributed search environment with indexer clustering?
