My company is just starting it's deployment of Splunk and one of the pieces of advice I've heard repeatedly from existing Splunk users is that we shouldn't just throw all our logs in one index. What I'm struggling with though is, what's the best way or what are good rules of thumb when deciding how to split our data up?
For example, we have several in house applications that I'd like to set up comprehensive monitoring for (data from application logs, perfmon data, logs from RabbitMQ, etc). I'm thinking we could set up the indexes for this data grouped either by application:
Application 1 Index
Application 2 Index
Or we could set up the indexes by the type of data contained in them:
Application logs index
Server perfmon index
RabbitMQ index
Of these two setups, which one is generally considered to be the better option and why? Or is there some other index partitioning I should consider instead?
You should probably base your indexes off of two criteria:
Other than that, either of the two methods you mentioned above would work. If it were me, and all other things being equal, I'd definitely go with method 2.
One thing your should consider is about data access. If its OK for Application 2 users to access Application 1 data, your second approach will work just fine. If the data has to be secured from inter application user access, approach 1 would be good.
You should probably base your indexes off of two criteria:
Other than that, either of the two methods you mentioned above would work. If it were me, and all other things being equal, I'd definitely go with method 2.
part 2 -
Retention time - it's likely that you'll want to keep certain data longer than others. This is an index-level setting. If you want perfmon data for 60 days, but application data for 180 days, you'll need to separate your indexes accordingly.
Sure. Two parts because it's a long answer -
Basically, it's the two criteria I mentioned; retention time and security.
Security - Say you have a group of sysadmins that need to be able to search server perfmon data, but don't need any access to any of the other stuff. With method 1, you couldn't give the sysadmins access to just the perfmon data and nothing else. With method 2, you'd simply limit the sysadmins role to only be able to search the perfmon index.
Can you elaborate on why you'd pick method 2 over method 1?