Do Splunk indexers in an indexer cluster have to run Splunk on the same partition on every machine?

hettervik
Builder

Hi!

I have two VMs running Splunk that I want to turn into an indexer cluster. The VMs are almost identical, but the partitioning is somewhat different on the two machines. On one of them, Splunk runs on a filesystem /dev/folders/centos-splunk mounted on /opt/splunk, while on the other I think Splunk runs on a filesystem /dev/folders/centos-root mounted on /. I'll attach the output from df -H on the two machines. Does anyone know if this will cause any problems when setting up the indexer cluster?

Any help would be much appreciated! Thanks!

Filesystem                  Size  Used  Avail  Use%  Mounted on
/dev/mapper/centos-root     54G   4.0G  50G    8%    /
/dev/folders/centos-splunk  496G  461G  35G    93%   /opt/splunk
/dev/sda1                   521M  151M  371M   29%   /boot


Filesystem                  Size  Used  Avail  Use%  Mounted on
/dev/folders/centos-root    54G   31G   24G    57%   /
/dev/sdc1                   521M  151M  371M   29%   /boot
/dev/folders/centos-home    496G  416G  81G    84%   /home

Best regards,
Martin


nnmiller
Contributor

In both cases, Splunk will be installed in /opt/splunk, so it shouldn't matter at all. Splunk doesn't look at the underlying device it's being installed onto.

That said, it's generally best to make /opt/splunk/var/lib/splunk a separate file system, so that when you upgrade, you can just use tar to make a quick backup of $SPLUNK_HOME without grabbing the index files. Having this tar file will make roll-back from a problematic upgrade very easy.
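A minimal sketch of that pre-upgrade backup, assuming a tar that supports --exclude (GNU tar and bsdtar both do). The helper name backup_splunk_home and the example paths are hypothetical. If var/lib/splunk really is its own filesystem, GNU tar's --one-file-system option would skip it as well; the explicit --exclude below works whether or not it is.

```shell
# Sketch: archive a Splunk install without the index data.
# backup_splunk_home is a hypothetical helper name, not a Splunk tool.
backup_splunk_home() {
    splunk_home=$1   # e.g. /opt/splunk (assumed install path)
    archive=$2       # e.g. /var/backups/splunk-pre-upgrade.tar.gz (assumed)
    parent=$(dirname "$splunk_home")
    name=$(basename "$splunk_home")
    # --exclude comes before the directory argument so it applies to it;
    # it keeps everything under var/lib/splunk out of the archive.
    tar -czf "$archive" -C "$parent" --exclude="$name/var/lib/splunk" "$name"
}

# Example call (paths are assumptions):
#   backup_splunk_home /opt/splunk /var/backups/splunk-pre-upgrade.tar.gz
```

Running tar with -C on the parent directory keeps the archive paths relative (splunk/etc/...), so a roll-back can extract it into the same parent directory.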

Given how much data is in use in your examples, I would question whether that machine will be performant enough for Splunk. How much data are you indexing? How many users are searching the data? Or are these already in use as indexers?

If they are in use as indexers, I would not advise converting them to a cluster; I would build the cluster on new instances instead. The configuration changes necessary for indexer clustering mean that all configurations on all indexers need to be consistent. Having non-clustered data and configs on the indexers will make building the cluster far more difficult than the differing mount points will.

hettervik
Builder

Thanks a lot for a detailed and well thought out answer!

I had not thought of putting the indexes on a separate filesystem, smart! That being said, I'm still a bit of a novice when it comes to partitions. I can't quite wrap my head around the concepts, but I'll get there.

Yes, the machines are not ideal. I hope that we'll get some better ones when upgrading. Then again, someone will have to take the cost, us or the customer. You know how it is.

The indexers are already in use, but this shouldn't be a problem, right? The only "problem" is that the data already indexed on the indexers isn't clustered, as far as I know. Hm. I'll have to look into this as well. Thanks.


nnmiller
Contributor

Clustered indexers have a 'cluster master' (CM) that manages their configuration files. So a hybrid set-up is going to complicate things with respect to configuration file precedence and overall management. Ideally, you'd migrate the existing configurations to the CM.

If you're going to convert, I would suggest setting up a set of small test VMs with IDXes having existing data, and convert them to familiarize yourself with the process. Testing is going to be especially important if you want to migrate your existing configurations to the cluster, since I doubt that the two IDXes have consistent configurations right now, given the partitioning scheme.

As for the partitioning aspect, do you have a UNIX system administrator in your organization? If so, you might want to sit down with him/her and discuss your requirements. Since you are in a VM situation, you could get a small-ish partition (20GB or so), and mount that at a temporary mount point, move /opt/splunk excluding /opt/splunk/var/lib to the new, smaller partition (or copy to new and then delete old). Then change /etc/fstab to mount the new partition at /opt; mount the existing partition at /opt/splunk/var/lib.

Essentially, *NIX doesn't care where a partition is mounted; you just need to make sure, when you set the fsck order, that partitions closer to root have lower fsck numbers, since the system can't mount a partition until its fsck is complete.
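To make the fstab part a bit more concrete, here is a rough sketch. Per fstab(5), the sixth field is the fsck pass number: 0 skips the check, 1 is conventionally the root filesystem, and other filesystems use 2. The device names, the xfs filesystem type, and the helper names sample_fstab and fsck_order are assumptions for illustration only, not taken from your machines.

```shell
# Hypothetical /etc/fstab entries for the layout described above.
# Device names and filesystem type are placeholders.
sample_fstab() {
    cat <<'EOF'
# <device>                 <mount point>         <type>  <options>  <dump>  <pass>
/dev/mapper/centos-root    /                     xfs     defaults   0       1
/dev/mapper/centos-opt     /opt                  xfs     defaults   0       2
/dev/mapper/centos-splunk  /opt/splunk/var/lib   xfs     defaults   0       2
EOF
}

# Read fstab-style lines on stdin and print "<pass> <mount point>",
# sorted by pass number; comments and pass-0 entries are skipped.
fsck_order() {
    awk '!/^#/ && NF >= 6 && $6 > 0 { print $6, $2 }' | sort -n -k1,1
}

# Example: sample_fstab | fsck_order
# On a real host: fsck_order < /etc/fstab
```

Listing the parent mount point (/opt) before the nested one (/opt/splunk/var/lib) keeps the file readable and matches the order a traditional sequential mount would need.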

Is it possible to build new systems with consistent partitioning and migrate the data over to them? That might be cleaner, overall. It would consume temporary resources but allow you to release the existing resources when the migration is completed.

mtranchita
Communicator

The right answer, I think, is "don't do that." The practical answer is "it depends." As long as you only use references to $SPLUNK_HOME and $SPLUNK_DB in the configurations that get distributed, it might work.
If you can't move things around to normalize the layout, you might be able to use symbolic links or something similar so that both machines present the same path.
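A small sketch of the symbolic link idea, assuming Splunk lives somewhere else on one host and you want it reachable under the path the other host uses. The helper name normalize_splunk_path and both example paths are hypothetical; stop Splunk before trying anything like this on a real indexer.

```shell
# Sketch: expose an install under a common path via a symbolic link.
# normalize_splunk_path is a hypothetical helper name.
normalize_splunk_path() {
    actual=$1     # where Splunk really lives on this host (assumed, e.g. /home/splunk)
    expected=$2   # the path the other indexer uses (assumed, e.g. /opt/splunk)
    # Never overwrite something that is already at the target path.
    if [ -e "$expected" ] || [ -L "$expected" ]; then
        echo "refusing to replace existing $expected" >&2
        return 1
    fi
    ln -s "$actual" "$expected"
}

# Example call (paths are assumptions):
#   normalize_splunk_path /home/splunk /opt/splunk
```

The existence check is deliberate: running ln -s against an existing directory would silently create the link inside it instead of failing.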
