Index planning

curtisb1024
Path Finder

My company is just starting its deployment of Splunk, and one piece of advice I've heard repeatedly from existing Splunk users is that we shouldn't just throw all our logs into one index. What I'm struggling with, though, is how to split our data up. What are good rules of thumb for deciding?

For example, we have several in-house applications that I'd like to set up comprehensive monitoring for (application logs, perfmon data, logs from RabbitMQ, etc.). I'm thinking we could set up the indexes for this data grouped either by application:

Application 1 Index

  • Application 1 logs
  • Application 1 server perfmon data
  • Application 1 RabbitMQ logs

Application 2 Index

  • Application 2 logs
  • Application 2 server perfmon data
  • Application 2 RabbitMQ logs

Or we could set up the indexes by the type of data contained in them:

Application logs index

  • Application 1 logs
  • Application 2 logs

Server perfmon index

  • Application 1 server perfmon data
  • Application 2 server perfmon data

RabbitMQ index

  • Application 1 RabbitMQ logs
  • Application 2 RabbitMQ logs

Of these two setups, which one is generally considered to be the better option and why? Or is there some other index partitioning I should consider instead?

1 Solution

mloven_splunk
Splunk Employee

You should probably base your indexes off of two criteria:

  1. Retention time. How long do you need to keep the data in this index?
  2. Security. If a group of users should be able to search only specific sets of data, then those sets of data need to be in their own index(es).

Other than that, either of the two methods you mentioned above would work. If it were me, and all other things being equal, I'd definitely go with method 2 (indexes split by type of data).

somesoni2
Revered Legend

One thing you should consider is data access. If it's OK for Application 2 users to access Application 1 data, your second approach will work just fine. If the data has to be secured against access by users of the other application, approach 1 would be the better fit.

mloven_splunk
Splunk Employee

Part 2 -

Retention time - You'll likely want to keep some data longer than other data. Retention is an index-level setting, so if you want to keep perfmon data for 60 days but application data for 180 days, those two data sets need to live in separate indexes.
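
As a rough sketch of what that looks like in indexes.conf, assuming two indexes named `perfmon` and `app_logs` (the index names and paths here are made up for illustration, not taken from the original question):

```ini
# indexes.conf (illustrative sketch; index names and paths are assumptions)

[perfmon]
homePath   = $SPLUNK_DB/perfmon/db
coldPath   = $SPLUNK_DB/perfmon/colddb
thawedPath = $SPLUNK_DB/perfmon/thaweddb
# 60 days x 86400 seconds/day
frozenTimePeriodInSecs = 5184000

[app_logs]
homePath   = $SPLUNK_DB/app_logs/db
coldPath   = $SPLUNK_DB/app_logs/colddb
thawedPath = $SPLUNK_DB/app_logs/thaweddb
# 180 days x 86400 seconds/day
frozenTimePeriodInSecs = 15552000
```

Keep in mind that `frozenTimePeriodInSecs` is applied per bucket, so data only ages out once the newest event in its bucket is older than the limit, and an index size cap such as `maxTotalDataSizeMB` can freeze data sooner. Treat the numbers above as a starting point rather than an exact retention guarantee.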

mloven_splunk
Splunk Employee

Sure. Two parts because it's a long answer -

Basically, it's the two criteria I mentioned: retention time and security.

Security - Say you have a group of sysadmins who need to search server perfmon data but don't need access to any of the other stuff. With method 1, the perfmon data sits in the per-application indexes alongside the application and RabbitMQ logs, so you couldn't give the sysadmins access to just the perfmon data and nothing else. With method 2, you'd simply limit the sysadmins' role so that it can only search the perfmon index.
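
A minimal sketch of that role restriction in authorize.conf, assuming a role called `sysadmin` and an index called `perfmon` (both names are hypothetical):

```ini
# authorize.conf (illustrative sketch; role and index names are assumptions)

[role_sysadmin]
# Indexes this role is allowed to search
srchIndexesAllowed = perfmon
# Indexes searched when a search doesn't specify index=...
srchIndexesDefault = perfmon
```

This only works as intended if the role doesn't inherit from another role with broader index access, since, as far as I know, allowed indexes from imported roles are added to the list rather than replaced. The same settings can also be managed through the roles page in Splunk Web.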

curtisb1024
Path Finder

Can you elaborate on why you'd pick method 2 over method 1?
