My current system (vastly underpowered, 3.5 GB a day at most) is a single combined indexer/search head and 2 heavy forwarders.
I have recently been given a requirement to bump this up to ~120GB a day indexed.
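To put that jump in perspective, here is a back-of-envelope conversion of daily volume into an average ingest rate. It is a rough sketch that assumes decimal units (1 GB = 1000 MB) and ingest spread evenly over 24 hours; real traffic is bursty, so peak rates will be higher than these averages.

```python
# Convert daily indexing volume into an average sustained throughput.
# Assumptions: decimal units (1 GB = 1000 MB) and evenly spread ingest.
# Real traffic is bursty, so peaks will exceed these averages.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def growth_factor(current_gb_per_day: float, target_gb_per_day: float) -> float:
    """How many times larger the target daily volume is than the current one."""
    return target_gb_per_day / current_gb_per_day


def sustained_mb_per_s(gb_per_day: float) -> float:
    """Average ingest rate in MB/s for a given daily volume."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


if __name__ == "__main__":
    current, target = 3.5, 120.0  # GB/day, figures from this post
    print(f"growth:  {growth_factor(current, target):.1f}x")
    print(f"current: {sustained_mb_per_s(current):.3f} MB/s")
    print(f"target:  {sustained_mb_per_s(target):.3f} MB/s")
```

That works out to roughly a 34x increase, from about 0.04 MB/s to about 1.39 MB/s on average.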
I am using this document to determine hardware requirements: http://docs.splunk.com/Documentation/Splunk/6.5.1/Capacity/Referencehardware, but it says nothing about heavy forwarders.
From what I have read, the HF parses data before it ever sends it to the indexer. So if a small, lightweight VM acting as a heavy forwarder sends 100 GB a day to an indexer with 12 cores and 64 GB of RAM, is the indexer's performance mostly wasted because the heavy forwarder becomes my bottleneck?
Should I spec my heavy forwarder the same as the indexer, or under-spec the indexer and beef up the HF? (No logs go directly to the indexer.)
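The bottleneck question can be framed as a serial pipeline: end-to-end throughput can never exceed the slowest stage. The toy model below is only a sketch of that reasoning; the stage names and especially the capacity numbers are hypothetical placeholders, not Splunk benchmarks, so measure your own hosts before sizing anything.

```python
# Toy model of a serial ingest pipeline (forwarder -> indexer).
# Throughput is capped by the slowest stage. All capacities are
# HYPOTHETICAL placeholders, not measured or published figures.

from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    capacity_gb_per_day: float  # max volume this stage can process per day


def pipeline_throughput(stages: list[Stage]) -> tuple[float, str]:
    """Return (effective GB/day, name of the limiting stage)."""
    bottleneck = min(stages, key=lambda s: s.capacity_gb_per_day)
    return bottleneck.capacity_gb_per_day, bottleneck.name


def can_sustain(stages: list[Stage], demand_gb_per_day: float) -> bool:
    """True if the slowest stage can keep up with the daily demand."""
    limit, _ = pipeline_throughput(stages)
    return demand_gb_per_day <= limit


if __name__ == "__main__":
    demand = 100.0  # GB/day, from the question
    stages = [
        Stage("heavy forwarder (small VM)", 60.0),   # hypothetical capacity
        Stage("indexer (12 cores, 64 GB)", 250.0),   # hypothetical capacity
    ]
    limit, name = pipeline_throughput(stages)
    print(f"limited to {limit} GB/day by {name}; "
          f"sustains {demand} GB/day? {can_sustain(stages, demand)}")
```

With these made-up numbers the forwarder caps the whole pipeline; whether a real small HF is actually the slow stage depends on how much parsing it does, which is what the replies below address.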
Or, do I keep my underpowered heavy forwarder VM and just convert it to use the universal forwarder? I would then make sure that all transforms/props/etc get placed on the indexers, not the forwarder.
The only non-passthrough work my forwarder does is adding a metadata tag, "forwarder=locationX", which I guess I would have to replace somehow. It is useful for tracking where a log originated, though.
It will definitely improve your indexers' performance, but only slightly, and only for work such as metadata transforms and other index-time transformations. Most indexer load comes from searching and writing data to disk, so you'll still want high-spec indexers. I started using an HF to parse our networking data (~80 GB a day) instead of parsing it at the indexer level as I did before, and it reduced the load on the indexers (median CPU utilization and median memory utilization) by roughly 10-15%. These logs all had to be transformed from the syslog sourcetype to their actual sourcetypes (Cisco logs, Palo Alto logs, etc.), and in some cases had other transforms applied after that.
In my case we run a three-indexer cluster, and our main bottlenecks are disk I/O, CPU utilization, and network saturation, generally in that order. Please treat my numbers as a rough estimate: since your environment runs on a standalone box, you may see bigger gains if your load is skewed toward parsing, but it could also go the other way because the search head load and indexer load share the same box.
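The syslog-to-real-sourcetype rewrite described above is the kind of index-time work an HF can take off the indexers. In Splunk itself this is configured with a TRANSFORMS- setting in props.conf pointing at a transforms.conf stanza that sets DEST_KEY = MetaData:Sourcetype. The Python below only models the idea (first matching regex wins); the patterns and sourcetype names are hypothetical examples, and this is not how Splunk implements it internally.

```python
# Rough model of an index-time sourcetype rewrite: events arrive as generic
# "syslog" and are reassigned when a regex matches. Patterns and sourcetype
# names are HYPOTHETICAL examples for illustration only.

import re

# Ordered rules: the first matching pattern wins.
REWRITE_RULES = [
    (re.compile(r"%ASA-\d-\d+"), "cisco:asa"),        # hypothetical rule
    (re.compile(r",TRAFFIC,|,THREAT,"), "pan:log"),   # hypothetical rule
]


def assign_sourcetype(raw: str, default: str = "syslog") -> str:
    """Return the rewritten sourcetype for one raw event, or the default."""
    for pattern, sourcetype in REWRITE_RULES:
        if pattern.search(raw):
            return sourcetype
    return default


if __name__ == "__main__":
    events = [
        "Jan 10 12:00:01 fw1 %ASA-6-302013: Built outbound TCP connection",
        "Jan 10 12:00:02 pa1 1,2017/01/10 12:00:02,001,TRAFFIC,end",
        "Jan 10 12:00:03 host3 sshd[123]: Accepted publickey for admin",
    ]
    for event in events:
        print(assign_sourcetype(event), "<-", event[:40])
```

Every event has to be regex-checked against each rule until one matches, which is why moving this work to a separate box can shave some CPU off the indexers.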
But how was your HF specced out? Was it comparable to your indexer?
Or would you have been comfortable setting up the HF with ~15% of the indexer's resources?
Oh, sorry, I didn't answer that part of the question. Our HF has only 4 vCPU and 8 GB RAM, and even then it never sustains high load. Since it only does parsing and doesn't handle a ton of data, the lower specs are fine. You should be fine with a fairly low-spec box, but it doesn't hurt to watch utilization and adjust as needed; that's a great use case for VMs if that's your plan.
For comparison, our indexer cluster is three boxes with 16 vCPU and 16 GB RAM each.
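To relate those specs to the "~15% of the resources" question, the arithmetic below compares the HF (4 vCPU / 8 GB) with one indexer and with the whole three-node cluster (3 x 16 vCPU / 16 GB). It only restates the numbers given in this thread.

```python
# Share of resources the heavy forwarder has relative to one indexer and
# to the whole cluster, using the specs quoted in this thread.


def share(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


hf = {"vcpu": 4, "ram_gb": 8}
indexer = {"vcpu": 16, "ram_gb": 16}
cluster_size = 3

for resource in ("vcpu", "ram_gb"):
    per_node = share(hf[resource], indexer[resource])
    whole_cluster = share(hf[resource], indexer[resource] * cluster_size)
    print(f"{resource}: {per_node:.0f}% of one indexer, "
          f"{whole_cluster:.1f}% of the cluster")
```

This prints 25% of one indexer's vCPU (8.3% of the cluster) and 50% of one indexer's RAM (16.7% of the cluster), so a ~15% sizing is in the same ballpark as this setup.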
In my opinion, you should skip the heavy forwarder and replace it with another indexer. More indexers let you scale and will greatly improve search performance. You could use a VM for your second indexer, but a physical server would be better.
You should be able to index 120 GB per day comfortably with 2 indexers.
So if I WERE going to use an HF, its box would need the same specs as the indexer, but if I just use a UF, I don't really have any hardware requirements to meet.
Yes, the HF is basically a full instance of Splunk, whereas a universal forwarder is not.
As I understand it, the primary purpose of an HF is to parse data before sending it to the indexer. One advantage is that you can get rid of "junk" data before it reaches the indexer to save license volume (you can send it to nullQueue).
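The nullQueue idea can be modeled in a few lines: events matching a filter are discarded before indexing, so they never count toward the daily license volume. In Splunk this is configured by routing matching events through a props.conf TRANSFORMS- setting to a transforms.conf stanza with DEST_KEY = queue and FORMAT = nullQueue. The sketch below is only an illustration; the DEBUG pattern is a hypothetical filter, and license volume is approximated as the UTF-8 byte length of each raw event.

```python
# Minimal model of dropping "junk" events before indexing so they do not
# count toward license volume. The filter regex is HYPOTHETICAL, and volume
# is approximated as the UTF-8 byte length of each raw event.

import re

JUNK = re.compile(r"\bDEBUG\b")  # hypothetical filter pattern


def filter_events(events: list[str]) -> tuple[list[str], int]:
    """Return (kept events, bytes dropped) after removing junk events."""
    kept: list[str] = []
    dropped_bytes = 0
    for event in events:
        if JUNK.search(event):
            dropped_bytes += len(event.encode("utf-8"))
        else:
            kept.append(event)
    return kept, dropped_bytes


if __name__ == "__main__":
    sample = ["INFO service started", "DEBUG noisy", "WARN disk 90% full"]
    kept, dropped = filter_events(sample)
    print(f"kept {len(kept)} events, dropped {dropped} bytes")
```

Running it on the sample prints "kept 2 events, dropped 11 bytes". The same filtering can also be done on the indexer; the HF just lets it happen before the data crosses the network.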
I need to keep some sort of forwarder to separate my indexer from the external systems. The forwarder suits this purpose nicely.
My final build will actually be 2 indexers and 2 search heads (1 at each of 2 locations) with a forwarder at each site acting as the external interface.
You don't really need an HF, especially when indexing such low volumes of data. A universal forwarder would most likely do the trick. The UF has a small footprint and consumes very little CPU compared to the HF.
Two indexers would work well for indexing 120 GB/day and would also give you some headroom if you want to increase that in the near future. If you later want to raise data ingestion a lot, it would be easy to scale by adding another indexer to the cluster. Increase the number of indexers, not HFs.
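The "add indexers, not HFs" approach boils down to dividing the daily volume by what one indexer can comfortably handle and rounding up. The per-indexer figure below (100 GB/day) is a hypothetical planning number chosen for illustration, not a Splunk-published limit; substitute whatever your own monitoring shows.

```python
# Estimate how many indexers a given daily volume needs, assuming a
# HYPOTHETICAL comfortable capacity per indexer. Replace PER_INDEXER
# with a number based on your own hardware and search load.

import math

PER_INDEXER = 100.0  # GB/day per indexer, hypothetical planning figure


def indexers_needed(gb_per_day: float, per_indexer_gb_per_day: float) -> int:
    """Smallest indexer count whose combined capacity covers the volume."""
    if per_indexer_gb_per_day <= 0:
        raise ValueError("per-indexer capacity must be positive")
    return max(1, math.ceil(gb_per_day / per_indexer_gb_per_day))


if __name__ == "__main__":
    for volume in (120, 250, 400):
        n = indexers_needed(volume, PER_INDEXER)
        print(f"{volume} GB/day -> {n} indexers, ~{volume / n:.0f} GB/day each")
```

Under that assumption, 120 GB/day comes out to 2 indexers at about 60 GB/day each, and growth is absorbed by adding nodes rather than by resizing the forwarders.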