Getting Data In

How many events per second can a heavy forwarder ingest with the high/medium/low specifications below?

thirumaleshsplu
Explorer

We need to ingest a minimum of 20,000 EPS now, and about a year from now we want to go to 50,000 EPS. Can you point me to some documentation that maps heavy forwarder specs to EPS? These logs will go from an on-prem heavy forwarder to Splunk Cloud.

High-performance 
Intel 64-bit chip architecture
48 CPU cores at 2 GHz or greater speed per core
128GB RAM
Disk subsystem capable of a minimum of 1200 average IOPS
A 1Gb Ethernet NIC with optional second NIC
A 64-bit Linux or Windows distribution

Mid-range
Intel 64-bit chip architecture
24 CPU cores at 2GHz or greater speed per core
64GB RAM
Disk subsystem capable of a minimum of 800 average IOPS
A 1Gb Ethernet NIC, with optional second NIC for a management network
A 64-bit Linux or Windows distribution

Low
Intel 64-bit chip architecture
4 CPUs, 2 cores per CPU, at least 2 GHz per core
12 GB RAM
2 x 300 GB, 10,000 RPM SAS hard disks, configured in RAID 1
Standard 1Gb Ethernet NIC, optional 2nd NIC for a management network
Standard 64-bit Linux or Windows distribution

Thanks in advance.

FrankVl
Ultra Champion

If you're currently ingesting 20000 EPS, can't you just assess the performance of the current infrastructure and use that to determine how much you need to scale out?

In general, the throughput a heavy forwarder can sustain depends heavily on the kind of data collection mechanisms and the complexity of the props/transforms it needs to apply. I've seen heavy forwarders (virtual, 4 cores, 16GB RAM) process in excess of 200GB/day (I don't remember the EPS rates on that, but you can make your own calculations).

Disk IO is typically not the key factor (unless you have data collection methods that somehow cause the HF to read from and write to its own disk a lot).

CPU core requirements also depend on how many pipelines you enable. With the default single pipeline or an optional second pipeline, there is little point in using a machine with much more than 4 cores. If you want to go beyond 2 pipelines, more cores are useful.

In general it might be much more beneficial to scale horizontally: deploy a larger pool of HFs and spread the data feeds across them. That also improves the data distribution across your indexers, which is very important for spreading out indexing and search load. If you set that up properly with a load balancing mechanism between the data sources and the HFs, it also becomes much easier to scale up and down in the future as event volumes change.
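
To get a rough feel for what such a pool might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name `forwarders_needed`, the 1KB average event size, the 30% headroom, the minimum of two forwarders, and the per-HF capacity, which simply reuses the anecdotal 200GB/day figure mentioned above. Substitute throughput you have measured on your own data and props/transforms.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def forwarders_needed(target_eps: float,
                      avg_event_bytes: float,
                      per_hf_gb_per_day: float,
                      headroom: float = 0.3,
                      min_count: int = 2) -> int:
    """Estimate how many heavy forwarders a target event rate needs.

    per_hf_gb_per_day is what ONE forwarder sustains on your data
    (measure it; the value used below is only an anecdote).
    headroom is the fraction of that capacity kept free for peaks.
    min_count keeps at least this many forwarders for redundancy.
    """
    daily_gb = target_eps * avg_event_bytes * SECONDS_PER_DAY / BYTES_PER_GB
    usable_gb = per_hf_gb_per_day * (1 - headroom)
    return max(min_count, math.ceil(daily_gb / usable_gb))


if __name__ == "__main__":
    # Hypothetical inputs: 1 KB events, 200 GB/day per forwarder.
    for eps in (20_000, 50_000):
        print(eps, "EPS ->", forwarders_needed(eps, 1024, 200), "forwarders")
```

With those assumed inputs it suggests about 12 forwarders for 20,000 EPS and about 30 for 50,000 EPS. Treat these as placeholders, not a sizing recommendation: the real number moves a lot with event size and parsing complexity.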

thirumaleshsplu
Explorer

Thank you so much for answering this. To be clear, we are about to do this; we haven't done it yet.

When you mention heavy forwarders (virtual, 4 cores, 16GB RAM) processing in excess of 200GB/day, do you have any link or documentation that shows how to work out the EPS? (I haven't done this before.)
or
Could you please help me with this specification? (It is an indexer specification, and I think it should be fine for a heavy forwarder as well. I am not able to find the relevant links in the Splunk documentation; if you have anything, including for the EPS calculation, please share it here.)
or
I am thinking of going with the mid-range option.
Can you answer this directly: how many events per second can I ingest with the specification below?

Intel 64-bit chip architecture
24 CPU cores at 2GHz or greater speed per core
64GB RAM
Disk subsystem capable of a minimum of 800 average IOPS
A 1Gb Ethernet NIC, with optional second NIC for a management network
A 64-bit Linux or Windows distribution

FrankVl
Ultra Champion

No, that question cannot be answered without knowing what types of data you want to collect and process. I don't recall ever seeing much detailed information on the processing capacity of HFs, but if you can find specs for an indexer, you can assume an HF can do more, since an HF does the same work except that it doesn't have to write to disk and doesn't have to deal with searches. How many EPS 200GB/day corresponds to depends on the event size, of course, but with an average event size of 1KB, for instance, it works out to roughly 2400 EPS.
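
The conversion is simple arithmetic. Here is a small Python sketch of it, assuming 1GB = 1024^3 bytes and 1KB = 1024 bytes (with decimal units the result is slightly lower, about 2300 EPS). The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def gb_per_day_to_eps(gb_per_day: float, avg_event_bytes: float) -> float:
    """Average events per second for a daily volume and average event size."""
    return gb_per_day * BYTES_PER_GB / avg_event_bytes / SECONDS_PER_DAY


def eps_to_gb_per_day(eps: float, avg_event_bytes: float) -> float:
    """Daily volume in GB for an average event rate and average event size."""
    return eps * avg_event_bytes * SECONDS_PER_DAY / BYTES_PER_GB


if __name__ == "__main__":
    # 200 GB/day at an assumed 1 KB per event
    print(round(gb_per_day_to_eps(200, 1024)))  # 2427

    # The two targets from the question, again assuming 1 KB events
    for eps in (20_000, 50_000):
        print(eps, "EPS ->", round(eps_to_gb_per_day(eps, 1024)), "GB/day")
```

Going the other way with the same assumed 1KB event size, 20,000 EPS comes to about 1648GB/day and 50,000 EPS to about 4120GB/day. These are averages over a full day, so peak rates can be well above them.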

But again: one or a few very beefy boxes like the ones you're asking about probably don't make much sense for a heavy forwarder layer in between your data sources and Splunk Cloud. From a data distribution point of view, you typically want at least as many forwarders as you have indexers, preferably more (this can also be accomplished by enabling additional pipelines on the forwarders).

You may want to check with your Splunk Sales / PS rep for the latest guidelines on that and an assessment of what would best fit your environment.
