Getting Data In

Can multiple indexers write to a single index?

jamesoconnell
Path Finder

My question is about Splunk topology.

Can multiple indexer processes write to a single physical index? Or is there a file I/O conflict that occurs in this setup?

Regards,
James O'Connell.

1 Solution

araitz
Splunk Employee

To add a workaround to dwaddle's answer, a viable approach on non-Windows systems is to have multiple instances of Splunk write to different directories on the same logical volume.

In other words, let's say you have:

 [root@foobar ~]# df -h /data
 Filesystem                       Size  Used Avail Use% Mounted on
 /dev/mapper/VolGroup00-LogVol00  130G   73G   51G  60% /data

Install one instance of Splunk in /data/splunk1 and another in /data/splunk2. Each instance will be writing to discrete compartments of the same logical volume, thus avoiding the issues dwaddle describes in his answer.

mmattek
Path Finder

OK, I'm trying to understand this. I have two indexers, with only one running web, but doing distributed search.

So right now, this setup has no redundancy. I need to be able to search all logs even if one indexer goes down, although I understand that performance will be reduced.

How do I accomplish this?

araitz
Splunk Employee

This should probably be a separate question.

Please review the following documentation:

http://docs.splunk.com/Documentation/Splunk/4.2.4/Installation/Highavailabilityreferencearchitecture

jamesoconnell
Path Finder

One other complication: what if one indexer (/opt/splunk1) goes down? Will the data in /data/splunk1 still be picked up by a search that subsequently goes through the /opt/splunk2 indexer?

jamesoconnell
Path Finder

araitz, thanks again for the response. I want to make sure I understand -- forgive the potentially obvious question.

When you say install one instance of Splunk in /data/splunk1 and another in /data/splunk2, are you saying to have two indexer instances of Splunk installed in, say, /opt/splunk1 and /opt/splunk2, each having an index called 'sample' and writing to separate db files located in /data/splunk1 and /data/splunk2 respectively?

And a search head will search across the two indexers (/opt/splunk1 & /opt/splunk2) on the same index 'sample'.

Yes?

Thanks, James.

araitz
Splunk Employee

You are conflating the directories on an indexer's disks with the notion of a Splunk index. An index is an abstract entity that represents a data container, and may be composed of one or many components on the underlying file systems. The way to scale Splunk is to add more instances and use a distributed search head to search across all of them. Thus, the search "index=main" run on a search head that distributes searches across several index servers is a search against one index ("main"), regardless of how many index servers are involved.
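To make the distinction concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Splunk code: the `Indexer` class and the `distributed_search` function are hypothetical names invented for illustration. Each instance keeps its own local storage, and the search head merges what every reachable peer returns for the same index name.

```python
from dataclasses import dataclass, field


@dataclass
class Indexer:
    """Toy stand-in for one Splunk indexer instance (hypothetical, not a Splunk API)."""
    name: str
    # Maps an index name to the events this instance holds in its own storage.
    local_events: dict[str, list[str]] = field(default_factory=dict)

    def ingest(self, index: str, event: str) -> None:
        # Each instance writes only to its own storage for the named index.
        self.local_events.setdefault(index, []).append(event)

    def search(self, index: str) -> list[str]:
        # Return a copy of the locally held events for the named index.
        return list(self.local_events.get(index, []))


def distributed_search(peers: list[Indexer], index: str) -> list[str]:
    """Toy search head: send the same query to every peer and merge the results."""
    results: list[str] = []
    for peer in peers:
        results.extend(peer.search(index))
    return results


splunk1 = Indexer("splunk1")
splunk2 = Indexer("splunk2")
splunk1.ingest("main", "event A")
splunk2.ingest("main", "event B")

merged = distributed_search([splunk1, splunk2], "main")
print(merged)  # ['event A', 'event B']
```

In this model, calling `distributed_search([splunk2], "main")` while splunk1 is unreachable returns only `['event B']`: the index name is shared, but each instance's data lives only on that instance.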

jamesoconnell
Path Finder

Thank you for the response. Writing to two different directories is essentially having two indexes though, isn't it?

dwaddle
SplunkTrust

If I understand your question, this is not a workable topology.

Being pedantic, a Splunk index is composed of one or more buckets -- each bucket is a shard of the total index. A bucket can only be written to by a single splunkd instance at a time. Depending on your configuration, a single indexer can have multiple buckets for an index which it writes to in parallel. Similarly, multiple indexers can each have their own buckets for an index, and the data can be sprayed across all of the buckets on all of the indexers in parallel. But even in this topology, each bucket is only being written to by a single splunkd process.
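The single-writer rule can be sketched as a small toy model in Python. This is an illustration under stated assumptions, not how splunkd is implemented: the `Bucket` class, the `spray` function, and the `PermissionError` check are all hypothetical and exist only to encode the rule that each bucket has exactly one writer.

```python
from itertools import cycle


class Bucket:
    """Toy bucket: one shard of an index, owned by exactly one writer (hypothetical)."""

    def __init__(self, index: str, owner: str) -> None:
        self.index = index
        self.owner = owner
        self.events: list[str] = []

    def write(self, writer: str, event: str) -> None:
        # Encode the rule: only the owning instance may write to this bucket.
        if writer != self.owner:
            raise PermissionError(
                f"{writer} cannot write to a bucket owned by {self.owner}"
            )
        self.events.append(event)


def spray(events: list[str], buckets: list[Bucket]) -> None:
    """Distribute events round-robin; each write is made by the bucket's own owner."""
    for event, bucket in zip(events, cycle(buckets)):
        bucket.write(bucket.owner, event)


# One index ("sample") spread over three buckets on two indexers.
buckets = [
    Bucket("sample", "indexer1"),
    Bucket("sample", "indexer1"),
    Bucket("sample", "indexer2"),
]
spray([f"event {i}" for i in range(6)], buckets)

# A second instance trying to write into someone else's bucket is rejected.
try:
    buckets[0].write("indexer2", "stray event")
    cross_write_rejected = False
except PermissionError:
    cross_write_rejected = True
```

The index "sample" ends up spread across all three buckets, yet no bucket ever has more than one writer. The explicit check is a modelling device only; it makes no claim about whether or how a real deployment would detect a second writer.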

gkanapathy
Splunk Employee

Also, there is no advantage to this. If you have the I/O capacity, you can set up multiple Splunk instances instead and get better performance on both search and indexing. If you don't have extra I/O capacity, this doesn't help you anyway.

araitz
Splunk Employee

You absolutely should not have multiple indexers write to the same index. This is not a supported configuration, and in fact is explicitly recommended against on this thread and elsewhere. Your tests might show that it is possible for a short time and under certain conditions, but you face serious data integrity issues and other unknown problems if you insist on taking this approach.

jamesoconnell
Path Finder

This answer is logical, but my tests have shown otherwise. I am writing to the same index (same directory and db) from multiple indexers without seeing any issues. Any recommendations on how to confirm this?

Regards, James.
