I have a large index cluster with bare metal machines that have different hardware configurations. The number of SSDs, their size, and their performance specs differ across the indexers. So what is the best way to use volume tags to abstract these details from the indexes?
My thought is to start with a "hot_warm" volume tag, like the example in the indexes.conf spec, that would be defined in $SPLUNK_HOME/etc/system/local/indexes.conf and would point to the indexer's fastest device of a moderate size. But that leaves me with a variable array of devices for the "cold#" volume tags.
Is there a way to add a list of devices under one volume tag? Otherwise the $SPLUNK_HOME/etc/master-apps/_cluster/local/indexes.conf file, managed by the index master, will not address all the devices on the indexers.
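For context, here is a minimal sketch of the per-indexer layout I have in mind (device paths are hypothetical and differ per host):

```ini
# $SPLUNK_HOME/etc/system/local/indexes.conf on one indexer
[volume:hot_warm]
path = /mnt/nvme0            # fastest moderate-size device on this host

# The leftover devices -- can several of these map to one "cold" tag?
[volume:cold1]
path = /mnt/hdd0

[volume:cold2]
path = /mnt/hdd1
```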
Any advice is welcome. Thanks.
You are in a less-than-ideal situation.
Use the weakest link, e.g. your smallest, lowest-performance indexer, as your guideline;
anything else and you are risking losing data.
Risk losing data? These are file operations when it comes down to indexers storing data on the FS. How is it that a heterogeneous HW configuration can cause an indexer to lose data?
Let me break down your question in order to answer it:
So what is the best way to use volume tags to abstract these details from the indexes? My thought is to start with a "hot_warm" volume tag, like the example in the indexes.conf spec, that would be defined in $SPLUNK_HOME/etc/system/local/indexes.conf and would point to the indexer's fastest device of a moderate size. But that leaves me with a variable array of devices for the "cold#" volume tags.
This approach could work, but since you only have one volume you can define for cold buckets, you'll need to pick one device and work with it. Also, be careful to set a limit on the maximum disk size used by hot/warm/cold, to avoid filling up the disks on some machines while disks on other machines still have room.
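For example, the cap could look like this in each indexer's local config (paths and sizes here are illustrative only):

```ini
# Local indexes.conf on each indexer -- cap each volume below the disk size
[volume:hot_warm]
path = /mnt/fast_disk
maxVolumeDataSizeMB = 400000   # keep under the smallest disk across the cluster

[volume:cold]
path = /mnt/slow_disk
maxVolumeDataSizeMB = 800000
```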
Is there a way to add a list of devices under one volume tag? Otherwise the $SPLUNK_HOME/etc/master-apps/_cluster/local/indexes.conf file, managed by the index master, will not address all the devices on the indexers.
You can create the volumes and then choose on each indexer whether to use them, but the best way is to keep the same volume name for each disk type across all indexers, defining the paths in the local configuration. That lets you manage the shared configuration via the cluster master and makes it easier for you to know which volume is what.
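Concretely, the cluster-pushed config can then reference the uniform volume names without knowing each host's devices (the index name and paths here are illustrative):

```ini
# $SPLUNK_HOME/etc/master-apps/_cluster/local/indexes.conf on the cluster master
[myindex]
homePath   = volume:hot_warm/myindex/db
coldPath   = volume:cold/myindex/colddb
thawedPath = $SPLUNK_DB/myindex/thaweddb   # thawedPath cannot reference a volume
```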
That being said, you should be aware that what you're trying to do does not follow best practice and that all indexers will be limited to the slowest one. Also, all disk usage for hot/warm/cold will be limited to the indexer with the lowest disk space. This means you'll be wasting resources and underutilizing most of your hosts. It's better to get a homogeneous setup in order to avoid complicating the cluster configuration for nothing.
I hope this helps. Let me know if you need further details !
Hey @DavidHourani, thanks for the answer. Makes a lot of sense. I appreciate you taking the time to answer.
I do have one follow up question based on this comment, "all indexers will be limited to the slowest one"...how is this possible? The indexers are not synchronized in file I/O. These are file operations on the indexers as they just write files to the local FS.
Is this an issue with the Index Master and syncing metadata? Do the indexers synchronize on metadata across the cluster like HDFS NN/JN? I hope not.
Can you explain what you've experienced based on your comment?
Again, much appreciate your time and effort.
Hi @thormanrd, you're absolutely right, there is no synchronization at the file I/O level.
This high level example might help you understand why the cluster is as slow as the slowest indexer:
Say you have three indexers A, B, and C, each with the primary copy of a bucket: A has bucket 1, B has bucket 2, and C has bucket 3. Now suppose indexers A and B take 1 second to read data from a bucket, and indexer C takes 2 seconds. Then suppose you run a search that fetches data from buckets 1, 2, and 3. Each indexer grabs its corresponding bucket and sends it out to the SH: indexer A takes one second, indexer B takes one second, and indexer C takes two seconds, making the total read time 2 seconds. You have to wait for the slowest indexer to return results before your search completes, which keeps any search you run as slow as your slowest indexer.
Let me know if this helps!
That makes perfect sense. However, I'm coming at this from the other side of volume tags: I want to maximize the storage size and lifespan of my data based on the devices available on my machines, and the devices are not all the same on my indexers. So using volume tags at the individual indexer level to control maxVolumeDataSizeMB and the path (i.e. device) makes sense to me. I don't think there is a least-common-denominator pacing issue in the write cycle or transition cycles (e.g. warm-to-cold) using this approach.
But I agree with you completely on the read cycle. Slowest indexer does pace the search results.
Thanks for clarifying.
Sorry, my explanation was focused on performance. To answer your question on volumes:
1- No need to change the path individually for each indexer: you could use symlinks instead, making your config a lot easier.
2- If you configure different volume sizes on individual indexers, that might end up backfiring. Since the volume of data forwarded to your indexers is the same -- assuming you are load balancing data equally across all indexers -- the indexers with a higher volume allocation will retain data longer than the others. This creates holes in your historical data: some indexers roll data out to frozen because the disk space limit was reached, while others hold on to the data because they have more disk space. You'll see similar behavior on warm-to-cold rotation if you configure different sizes there as well. This makes your historical data unreliable, since you're losing some of it and are left with only partial data.
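For point 1, the symlink idea could be sketched like this (the device and link paths are assumptions; a real indexer would link the path that the shared indexes.conf volumes reference):

```shell
# Sketch: give every indexer the same logical path for hot/warm storage,
# regardless of the physical device underneath. Using a demo dir here;
# on a real indexer this would be the path used in the volume definition.
mkdir -p ./splunk_demo
ln -sfn /mnt/nvme0 ./splunk_demo/hotwarm   # indexer A links its fastest device
# ln -sfn /mnt/ssd1 ./splunk_demo/hotwarm  # indexer B would link its own device
readlink ./splunk_demo/hotwarm             # prints /mnt/nvme0
```

This way the clustered config stays identical everywhere, and only the symlink target differs per host.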
Let me know if that is clear or if you need more clarification, happy to help out. It's best practice to keep the cluster simple and to avoid managing indexers individually; changing the max volume config individually on indexers will do more harm than good.
Great insight, I didn't even think about gaps in the timeline caused by archival lifespans that differ with disk size. Yep, you're right.
I'll work on #1 and configure the least common denominator.
You're welcome @thormanrd ! Let me know if you need anything else 🙂