To add a workaround to dwaddle's answer, a viable approach on non-Windows systems is to have multiple instances of Splunk write to different directories on the same logical volume.
In other words, let's say you have:
[root@foobar ~]# df -h /data
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  130G   73G   51G  60% /data
Install one instance of Splunk in /data/splunk1 and another in /data/splunk2. Each instance will be writing to discrete compartments of the same logical volume, thus avoiding the issues dwaddle describes above.
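If you want to confirm that two install directories really sit on the same mounted volume, one quick check is to compare their device IDs. This is only a minimal Python sketch, assuming a POSIX system; the function name same_volume is made up for illustration, and the paths are the ones from the example above.

```python
import os


def same_volume(path_a: str, path_b: str) -> bool:
    """Return True if both paths live on the same mounted filesystem.

    Hypothetical helper: compares the st_dev (device ID) reported by
    os.stat for each path. Both paths must already exist.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Paths taken from the example above; adjust to your own layout.
    print(same_volume("/data/splunk1", "/data/splunk2"))
```

This only tells you that the directories share a filesystem; it says nothing about whether the volume has the I/O capacity for two instances.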
OK, I'm trying to understand this. I have two indexers, only one of which is running web, and I'm using distributed search.
Right now, this has no redundancy. I need to be able to search all logs even if one indexer goes down, although I understand that performance will be reduced.
How do I accomplish this?
This should probably be a separate question.
Please review the following documentation:
http://docs.splunk.com/Documentation/Splunk/4.2.4/Installation/Highavailabilityreferencearchitecture
One other complication: what if one indexer (/opt/splunk1) goes down? Will the data in /data/splunk1 still be picked up by a search that subsequently goes through the /opt/splunk2 indexer?
araitz, thanks again for the response. I want to make sure I understand -- forgive the potentially obvious question.
When you say install one instance of Splunk in /data/splunk1 and another in /data/splunk2 -- are you saying to have two indexer instances of Splunk installed in, say, /opt/splunk1 and /opt/splunk2, each having an index called 'sample' and each writing to separate db files located in /data/splunk1 and /data/splunk2 respectively?
And a search head will then search across the two indexers (/opt/splunk1 & /opt/splunk2) on the same index 'sample'.
Yes?
Thanks, James.
You are conflating the directories on an indexer's disks with the notion of a Splunk index. An index is an abstract entity that represents a data container, and it may be composed of one or many components on the underlying file systems. The way to scale Splunk is to add more instances and use a search head with distributed search to search across each instance. Thus, the search "index=main", run on a search head that distributes searches across several index servers, is a search against one index ("main"), regardless of how many index servers are involved.
Thank you for the response. Writing to two different directories is essentially having two indexes, though, isn't it?
If I understand your question, this is not a workable topology.
To be pedantic, a Splunk index is composed of one or more buckets -- each bucket is a shard of the total index. A bucket can only be written to by a single splunkd instance at a time. Depending on your configuration, a single indexer can have multiple buckets for an index which it writes to in parallel. Similarly, multiple indexers can each have their own buckets for an index, and the data can be sprayed across all of the buckets on all of the indexers in parallel. But even in this topology, each bucket is only being written to by a single splunkd process.
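As a rough illustration of the above, here is a toy Python model of "one index, many buckets, one writer per bucket". It is not Splunk code and makes no claim about how splunkd is implemented; the names Bucket, Indexer and distributed_search are made up for this sketch, and the "search head" is just a function that asks each indexer in turn and merges the results.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """One shard of an index, owned by exactly one indexer (toy model)."""
    index: str
    owner: str
    events: list = field(default_factory=list)

    def write(self, writer: str, event: str) -> None:
        # Enforce the single-writer rule: only the owning instance may write.
        if writer != self.owner:
            raise PermissionError(
                f"{writer} may not write to a bucket owned by {self.owner}"
            )
        self.events.append(event)


@dataclass
class Indexer:
    """Stand-in for one splunkd instance with its own buckets."""
    name: str
    buckets: dict = field(default_factory=dict)  # index name -> Bucket

    def ingest(self, index: str, event: str) -> None:
        # Each indexer creates and writes only its own bucket for the index.
        bucket = self.buckets.setdefault(index, Bucket(index, self.name))
        bucket.write(self.name, event)

    def search(self, index: str) -> list:
        bucket = self.buckets.get(index)
        return list(bucket.events) if bucket else []


def distributed_search(indexers, index):
    """Search-head stand-in: query every listed indexer and merge results."""
    results = []
    for indexer in indexers:
        results.extend(indexer.search(index))
    return results


if __name__ == "__main__":
    splunk1, splunk2 = Indexer("splunk1"), Indexer("splunk2")
    splunk1.ingest("sample", "event A")
    splunk2.ingest("sample", "event B")
    # One logical index "sample", two buckets, each with a single writer.
    print(distributed_search([splunk1, splunk2], "sample"))
```

In this model both indexers hold data for the same index name, yet neither ever writes to the other's bucket. Leaving an indexer out of the list passed to distributed_search also leaves its events out of the results.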
Also, there is no advantage to this. If you have the IO capacity, you can set up multiple Splunk instances instead and get better performance on both search and indexing. If you don't have extra IO capacity, this doesn't help you anyway.
You absolutely should not have multiple indexers write to the same index directory. This is not a supported configuration, and in fact it is explicitly recommended against in this thread and elsewhere. Your tests might show that it is possible for a short time and under certain conditions, but you face serious data integrity problems and other unknown conditions if you insist on taking this approach.
This answer is logical, but my tests have shown otherwise. I am writing to the same index (same directory and db) from multiple indexers without seeing any issues. Any recommendations on how to confirm this?
Regards, James.