I have a distributed environment with 2 indexers (each with 48 vCPUs and 64 GB RAM), each ingesting 200 GB of logs per day.
I want to send them another 200 GB of syslog data per day (per indexer), but I want to filter it before indexing. Only 10% of that additional 200 GB would be indexed on each indexer; the other 90% would be discarded.
Could you please tell me what the hardware requirements are for such a setup? I couldn't find any guidance.
If you want to filter syslog data, I'd advise adding a dedicated syslog processing layer. You will most probably need network-level data (like the source IP), which Splunk cannot easily provide, so you'll need rsyslog or SC4S anyway. And if you're going to use that, filtering in the syslog layer is much more straightforward.
Usually such processing does not need much memory (unless you want to do some heavy buffering), and the hardware needed will be highly dependent on how complicated your filtering rules are. I have a 32-CPU machine which does some very simple "receive, enrich and forward" syslog operations, and the load is usually around 4-5 while running rsyslog for around 700-800 GB/day.
On the other hand, the same amount of data at the next processing step, where a relatively complicated set of rules is involved, uses around 15-16 vCPUs. (I have three 8-CPU machines there just to have some headroom.)
First, the number of Indexers is usually calculated assuming a maximum load per Indexer of around 200 GB/day for Splunk Enterprise and around 150 GB/day if you have Splunk Enterprise Security, so if you want to add more logs it's better to add at least one more Indexer.
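The rule of thumb above can be turned into a quick back-of-the-envelope check. This is only a sketch using the numbers from this thread: the function names are made up for illustration, and whether the received volume or the indexed volume should drive the sizing is an assumption you have to settle yourself, since events that are filtered out still have to be received and parsed before they are dropped.

```python
import math

def indexed_gb_per_day(current_gb, extra_gb, keep_percent):
    """Daily volume actually written to the indexes after filtering."""
    return current_gb + extra_gb * keep_percent / 100

def indexers_needed(total_gb_per_day, max_gb_per_indexer):
    """Round up: a partially loaded indexer is still a whole indexer."""
    return math.ceil(total_gb_per_day / max_gb_per_indexer)

# Numbers from the question: 2 indexers, 200 GB/day each today,
# plus another 200 GB/day each of syslog, of which 10% is kept.
current_total = 2 * 200
extra_total = 2 * 200

indexed_total = indexed_gb_per_day(current_total, extra_total, 10)
received_total = current_total + extra_total

print("Indexed GB/day:", indexed_total)
print("Received GB/day:", received_total)
print("Indexers by indexed volume (200 GB/day each):",
      indexers_needed(indexed_total, 200))
print("Indexers by received volume (200 GB/day each):",
      indexers_needed(received_total, 200))
print("Indexers by indexed volume with ES (150 GB/day each):",
      indexers_needed(indexed_total, 150))
```

Sizing by indexed volume is the optimistic bound and sizing by received volume is the pessimistic one; the real requirement most likely sits somewhere in between, depending on how expensive the filtering rules are.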
Next, the hardware reference depends on the log volume (as I already described) and on the number of users and searches. If you don't have many users, the hardware you are using is probably oversized for normal needs. You can find some guidance on reference hardware here: https://docs.splunk.com/Documentation/Splunk/8.2.5/Capacity/Referencehardware
Then, do you want to filter the logs on the Indexers or on Heavy Forwarders (in both cases before indexing)?
If on the Indexers, I think that if you don't have too many users, you could use mid-range Indexers without problems.
If instead you want to use Heavy Forwarders (I usually suggest this solution, to separate roles), you first need at least two of them to avoid a single point of failure; for the HFs you can use the reference hardware for a standalone Splunk server.
Finally, I assume you have an Indexer Cluster, but this doesn't have a great impact on the hardware reference.
Hi @WonnyJack ,
maybe you should describe the advantages of the new ingestion to the customer.
Also, adding a new Indexer isn't a great cost compared with the advantages you could have and the improved security that comes from analyzing the new data.
Anyway, let me know if I can help you further; otherwise, please accept one answer for the benefit of the other people in the Community.
Ciao and happy splunking.
P.S.: Karma Points are appreciated by all the Contributors