I am onboarding data from 6 different locations. The data flow is:
Splunk Forwarder ------> DMZ Server (Intermediate Forwarder) -----------> Indexer
Initially the aggqueue, parsingqueue, indexqueue, and typingqueue were getting blocked.
I had to set all these queue sizes in server.conf to maxSize = 2048MB.
In limits.conf:
[thruput]
maxKBps = 0
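For reference, the server.conf changes were stanzas along these lines, one per queue (the camel-case stanza names below are the standard server.conf queue names; the sizes are the ones mentioned above):

# server.conf
[queue=parsingQueue]
maxSize = 2048MB

[queue=aggQueue]
maxSize = 2048MB

[queue=typingQueue]
maxSize = 2048MB

[queue=indexQueue]
maxSize = 2048MB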
This worked for 5 locations, but for 1 location CPU consumption became very high, which led to a system freeze.
My question is: is there a way to onboard this data without consuming so many resources?
Please help.
As @richgalloway hinted - it might depend on the type of inputs you're using and the amount of data to be ingested. You might, for example, have a case where you have a lot of backlog to be ingested, and the UF will cause high load until it catches up to the current events and then it will ease. For some inputs you might be able to just start at current events and ignore the older ones. It's hard to say without knowing your full setup.
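If, for example, the backlog is sitting in monitored log files, one option on the UF is ignoreOlderThan in inputs.conf, which skips files whose modification time falls outside the given window (the path and sourcetype below are placeholders, not from your setup):

# inputs.conf on the UF - skip files untouched for more than 2 days
[monitor:///var/log/myapp]
ignoreOlderThan = 2d
sourcetype = myapp:log

Note that ignoreOlderThan works on whole files based on modification time, so it's best suited to genuinely stale backlog rather than files that are still being written to - check the inputs.conf spec for the exact semantics before relying on it.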
@kgiri253 , I was never a great fan of minimal throughput from the forwarders and tiny memory buffers on the Splunk servers. In my mind, these strict low quotas lead to issues like yours in many cases.
But making buffers too big and removing thruput limits may not yield great results either. Try flushing several gigabytes' worth of buffers when a forwarder shuts down, or try absorbing the sudden peak of data when a site with several hundred endpoints comes back from a network outage...
There are pros and cons to everything 🙂
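That said, there is usually a middle ground between the UF's 256KBps default and no limit at all: a bounded but higher maxKBps on the forwarders. For instance (the 2048 figure below is just an illustrative starting point, not a sizing recommendation for your environment):

# limits.conf on the forwarder - raise the cap without removing it
[thruput]
maxKBps = 2048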
Thanks for your answer. We had a big backlog of data to be ingested; that's why more resources were getting consumed.
We need some more information.
Are all locations using the Splunk Universal Forwarder? If not, that is what they should be using.
Is the Intermediate Forwarder a heavy forwarder or universal forwarder?
Where did the queue blocking occur?
Where did you make the changes to server.conf and limits.conf? Did you restart each Splunk instance after making the changes?
What inputs are enabled on the location with high CPU usage? Which TAs/apps are installed there?