Hello. I'm a new Splunk user, and I'm quite uncertain about how to index some distributed data. I have one SH and multiple Indexers located around the globe. Each of these Indexers has a local log file that I want to be able to search through with the SH. Currently I have configured a file input on each Indexer, and I forward this data to one Indexer that has the actual index.
I did it this way because it was intuitive and easy for me to understand, and it works. But I started wondering if this is the "right" or "best" (or only?) way to do this. Instead of forwarding to a single Indexer, is it possible (preferable?) to create local indexes on each Indexer, index the data locally, and configure the SH to spread the query across all the Indexers? If it's possible, what are the factors that go into this decision? I assume the local indexes just need to have the same name?
Thanks.
The second method that you describe is the proper way to handle distributed search, since each indexer keeps the data it collects instead of filling up the disk of a single indexer. It also helps performance, because the indexers no longer have to do the extra work of indexing the data and then forwarding it on to another indexer. It is quite simple to enable as well.
You can do this from the UI (easiest method) or with .conf files. First, create the index on each of your indexers, using the same index name everywhere. Once this has been done, go to the SH's UI. Open the Settings menu and find the section titled "DISTRIBUTED ENVIRONMENT". Under it, click "Distributed search". On that page, go to Search peers and add a new peer (this is where you point your SH at an indexer). Fill in the fields (the peer's management URI plus a username and password with admin rights on that peer) and click Save. Do this for each indexer that you have. Once this is done, any search you run on the SH will be distributed across all the indexers listed on the distributed search page.
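If you prefer the .conf route, the sketch below renders the two pieces in Splunk's `key = value` stanza format: an identical `indexes.conf` stanza to drop on every indexer, and a `[distributedSearch]` stanza for `distsearch.conf` on the SH listing the peers. It is a minimal illustration, not an official tool: the index name `app_logs`, the host names, and the helper functions are hypothetical; `homePath`/`coldPath`/`thawedPath` and `servers` are the standard settings documented in the indexes.conf and distsearch.conf specs, and 8089 is Splunk's default management port. One caveat: adding peers by editing `distsearch.conf` directly does not exchange authentication keys the way the UI does, so you would also need to copy the SH's `trusted.pem` to each peer by hand, per the Distributed Search manual.

```python
import configparser
import io


def render_conf(sections):
    """Render {stanza: {key: value}} as Splunk-style .conf text."""
    # interpolation=None so '$SPLUNK_DB' (or a stray '%') is written literally.
    parser = configparser.ConfigParser(interpolation=None)
    # Splunk setting names are case-sensitive; keep 'homePath', not 'homepath'.
    parser.optionxform = str
    for stanza, settings in sections.items():
        parser[stanza] = settings
    buf = io.StringIO()
    parser.write(buf)  # writes "[stanza]" then "key = value" lines
    return buf.getvalue()


def indexes_conf(index_name):
    """Hypothetical helper: the same index stanza for every indexer."""
    base = f"$SPLUNK_DB/{index_name}"
    return render_conf({
        index_name: {
            "homePath": f"{base}/db",
            "coldPath": f"{base}/colddb",
            "thawedPath": f"{base}/thaweddb",
        }
    })


def distsearch_conf(peers, port=8089):
    """Hypothetical helper: list every indexer as a search peer on the SH."""
    servers = ",".join(f"https://{host}:{port}" for host in peers)
    return render_conf({"distributedSearch": {"servers": servers}})


if __name__ == "__main__":
    # Example values only; substitute your own index name and indexer hosts.
    print(indexes_conf("app_logs"))
    print(distsearch_conf(["idx-eu.example.com", "idx-us.example.com"]))
```

Because every indexer gets the same stanza, a search such as `index=app_logs` on the SH reaches the local copy of that index on each peer, which also answers the "same name?" question.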
If you care to read more about this, here are the docs:
distsearch.conf: http://docs.splunk.com/Documentation/Splunk/6.3.3/admin/Distsearchconf
About Distributed Search: http://docs.splunk.com/Documentation/Splunk/6.3.3/DistSearch/Whatisdistributedsearch
Thank you very much, it is nice to hear a definitive response. I have read the manual, but I don't think I was at the point in my deployment where it made much sense to me. Now that I'm a little farther down the road, the doc sinks in a little deeper each time I read it.
Thanks again.
This was so easy. I already had distributed search set up, so all I had to do was create indexes.conf on each indexer, remove outputs.conf, and restart.
This makes so much more sense given our global network architecture. Thanks so much!!!
You're welcome! 🙂