I've just installed Splunk on about 12 of our staging servers and have it indexing some of our system logs as well as our application logs.
I need help with the following three items:
1- I want to set up a cluster so I only have to log in to one instance of Splunk and it will reach out to the rest for results.
2- I want to run the splunk indexing agent locally WITHOUT the splunk-web part. In other words no GUI on each instance (except the master in the cluster).
3- When I start Splunk for the first time (with our inputs.conf) it creates a MASSIVE CPU SPIKE, eating up 100% of one of the cores. I can't deploy this into production if it's going to cause a spike like this.
Can you please assist me with these three items?
It sounds like you are planning on building out a fairly complex Splunk environment. The Splunk documentation describes several different cluster deployment scenarios. Hopefully it helps.
I would guess that the CPU spike is due to the Splunk process going through all of your inputs and working through the backlog of existing log data.
1) What you are looking to do is to set up multiple indexers and one or more search heads. As justinhart mentioned, there is some info in the docs about capacity planning and deployment scenarios. I would start there and get a basic understanding of Splunk infrastructure, then take a look at the rest of the docs, especially the Admin chapters. Another very valuable source of information is the Splunk sales engineers; have your salesperson set up a meeting with an engineer to discuss your environment and requirements.
2) What you are looking for are lightweight forwarders, which collect data and send it on to an indexer without running Splunk Web locally.
3) Upon initial startup Splunk does a lot of work, such as checking all indexes (essentially databases), verifying all config files, reaching out to all defined inputs, etc. Some of these actions are CPU intensive, which will cause you to see the % utilization spike. I am not aware of any Splunk-internal method of throttling this. You may want to look into OS-level tools (for example, lowering the process priority with something like nice) if this really is that much of a concern to you.
Keep in mind that searches can (and likely will) spike your CPU % utilization as well.
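To make 1) and 2) a bit more concrete, here is a rough sketch of the two settings involved if you keep full Splunk instances on some machines: turning off Splunk Web on the instances that should have no GUI, and listing the search peers on the one instance you log in to. The host names are placeholders I made up, and depending on your Splunk version the `servers` entries may need a URI scheme in front of them, so check the spec files for your release. Editing the file alone also does not set up authentication between the search head and its peers, so treat this as an illustration of what ends up in the config rather than a complete procedure.

```ini
# web.conf on each instance that should run WITHOUT the GUI
[settings]
startwebserver = 0

# distsearch.conf on the single instance you log in to (the search head)
# indexer1/indexer2 are hypothetical host names; 8089 is the default management port
[distributedSearch]
servers = indexer1.example.com:8089,indexer2.example.com:8089
```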
While you can create a cluster of Splunk search nodes, it would be highly atypical to install those search nodes on your app server nodes or on other clusters that have their own system tasks to run. To be valuable, Splunk searches must be low latency, and thus must be free to use whatever resources are available.
Instead, as ftk suggests, use lightweight forwarders to collect data and ship it to some machine or set of machines (scale as necessary) where you can perform searches.
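As a sketch of what that forwarding setup looks like in config terms, the forwarder on each app server gets an outputs.conf pointing at the central machine(s), and the central indexer gets an inputs.conf stanza to receive the data. The group name and host names below are made up for illustration, and 9997 is only the port conventionally used for this; pick whatever fits your environment.

```ini
# outputs.conf on each forwarder (the app servers)
[tcpout]
defaultGroup = central_indexers

# central_indexers is a hypothetical group name; hosts are placeholders
[tcpout:central_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997

# inputs.conf on each receiving indexer: listen for forwarded data
[splunktcp://9997]
disabled = 0
```

With more than one host in the `server` list, the forwarder spreads its data across them, which is one way to scale the receiving side as volume grows.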