Solved: Re: Index planning

curtisb1024 · ‎04-02-2014

My company is just starting its deployment of Splunk, and one piece of advice I've heard repeatedly from existing Splunk users is that we shouldn't just throw all our logs into one index. What I'm struggling with, though, is how to split our data up. What's the best way, or what are good rules of thumb?

For example, we have several in-house applications that I'd like to set up comprehensive monitoring for (data from application logs, perfmon data, logs from RabbitMQ, etc.). I'm thinking we could set up the indexes for this data grouped either by application:

Application 1 Index

Application 1 logs
Application 1 server perfmon data
Application 1 RabbitMQ logs

Application 2 Index

Application 2 logs
Application 2 server perfmon data
Application 2 RabbitMQ logs

Or we could set up the indexes by the type of data contained in them:

Application logs index

Application 1 logs
Application 2 logs

Server perfmon index

Application 1 server perfmon data
Application 2 server perfmon data

RabbitMQ index

Application 1 RabbitMQ logs
Application 2 RabbitMQ logs

Of these two setups, which one is generally considered to be the better option and why? Or is there some other index partitioning I should consider instead?

mloven_splunk · ‎04-02-2014

You should probably base your indexes off of two criteria:

Retention time. How long do you need to keep the data in this index?
Security. If a group of users should be able to search only specific sets of data, then those sets of data need to be in their own index(es).

Other than that, either of the two methods you mentioned above would work. If it were me, and all other things being equal, I'd definitely go with method 2.

somesoni2 · ‎04-02-2014

One thing you should consider is data access. If it's OK for Application 2 users to access Application 1 data, your second approach will work just fine. If each application's data has to be secured from the other application's users, approach 1 would be the better fit.

mloven_splunk · ‎04-02-2014

part 2 -

Retention time - it's likely that you'll want to keep some data longer than other data. Retention is an index-level setting, so everything in one index ages out on the same schedule. If you want perfmon data for 60 days, but application data for 180 days, you'll need to separate your indexes accordingly.
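
As a rough illustration of the 60-day versus 180-day case, here is a small Python sketch that renders matching indexes.conf stanzas. The index names `perfmon` and `app_logs` and the helper `retention_stanza` are made up for this example. `frozenTimePeriodInSecs` is the indexes.conf setting that controls how old data can get before it is frozen, and the paths assume the usual `$SPLUNK_DB` layout. Treat it as a starting point, not a tested production config.

```python
SECONDS_PER_DAY = 86_400


def retention_stanza(index_name: str, days: int) -> str:
    """Render an indexes.conf stanza that ages data out after `days` days.

    `index_name` is a hypothetical index name chosen for illustration.
    """
    frozen_secs = days * SECONDS_PER_DAY
    return (
        f"[{index_name}]\n"
        f"homePath = $SPLUNK_DB/{index_name}/db\n"
        f"coldPath = $SPLUNK_DB/{index_name}/colddb\n"
        f"thawedPath = $SPLUNK_DB/{index_name}/thaweddb\n"
        f"frozenTimePeriodInSecs = {frozen_secs}\n"
    )


# One index per retention period: perfmon for 60 days, app logs for 180 days.
print(retention_stanza("perfmon", 60))
print(retention_stanza("app_logs", 180))
```

Keep in mind that size limits such as `maxTotalDataSizeMB` can also push the oldest data out early, so the time setting alone doesn't guarantee that data is kept for the full period.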

mloven_splunk · ‎04-02-2014

Sure. Two parts because it's a long answer -

Basically, it's the two criteria I mentioned: retention time and security.
Security - Say you have a group of sysadmins who need to be able to search server perfmon data, but don't need access to any of the other stuff. With method 1, perfmon data is mixed in with each application's other logs, so you couldn't give the sysadmins access to just the perfmon data and nothing else. With method 2, you'd simply limit the sysadmins' role to only be able to search the perfmon index.
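
To make the role restriction concrete, here is a minimal Python sketch that renders an authorize.conf stanza for such a role. The role name `sysadmin`, the index name `perfmon`, and the helper `role_stanza` are assumptions for illustration. `srchIndexesAllowed` and `srchIndexesDefault` are the authorize.conf settings for which indexes a role may search and which it searches by default, and they take semicolon-separated lists.

```python
from __future__ import annotations


def role_stanza(role_name: str, allowed_indexes: list[str]) -> str:
    """Render an authorize.conf stanza limiting a role to the given indexes.

    `role_name` and the index names are hypothetical. Role names are
    lowercased here because authorize.conf expects lowercase role names.
    """
    indexes = ";".join(allowed_indexes)
    return (
        f"[role_{role_name.lower()}]\n"
        f"srchIndexesAllowed = {indexes}\n"
        f"srchIndexesDefault = {indexes}\n"
    )


# Sysadmins can search the perfmon index and nothing else.
print(role_stanza("sysadmin", ["perfmon"]))
```

One caveat with this sketch: if the role inherits from another role, it should also pick up that role's allowed indexes, so a restricted role like this is probably best left without inheritance.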

curtisb1024 · ‎04-02-2014

Can you elaborate on why you'd pick method 2 over method 1?
