Getting Data In

Data onboarding: CPU consumption at 150%, system stuck

kgiri253
Explorer

I am onboarding data from 6 different locations. The data flow is:

Splunk Forwarder  ------> DMZ Server (Intermediate Forwarder) -----------> Indexer
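
For reference, each hop in that chain is pointed at the next one via outputs.conf. A minimal sketch, assuming standard TCP forwarding on port 9997 (hostnames are placeholders, not from the original post):

```ini
# outputs.conf on each endpoint forwarder -- send to the DMZ intermediate
[tcpout]
defaultGroup = dmz_intermediate

[tcpout:dmz_intermediate]
server = dmz-forwarder.example.com:9997

# the DMZ intermediate forwarder has an equivalent stanza
# pointing at the indexer
```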

Initially, the aggQueue, parsingQueue, indexQueue, and typingQueue were getting blocked.

I had to set all of these queue sizes in server.conf to maxSize = 2048MB.
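
The queue sizing would look roughly like this in server.conf (a sketch based on the queue names mentioned above; 2048MB is the value from my change, not a recommendation):

```ini
# server.conf on the intermediate forwarder -- enlarged queue sizes
[queue=parsingQueue]
maxSize = 2048MB

[queue=aggQueue]
maxSize = 2048MB

[queue=typingQueue]
maxSize = 2048MB

[queue=indexQueue]
maxSize = 2048MB
```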

In limits.conf:

[thruput]
maxKBps = 0

This worked for 5 locations, but for 1 location CPU consumption became very high, which led to a system freeze.

 

My question is: is there a way to onboard data without consuming so many resources?

Please help.


ddrillic
Ultra Champion

@kgiri253, I was never a great fan of minimal throughput limits on the forwarders and tiny memory buffers on the Splunk servers. In my view, these strict low quotas often lead to issues like yours.


PickleRick
SplunkTrust

Making buffers too big and removing thruput limits may not yield great results either. Imagine flushing several gigabytes' worth of buffers when a forwarder shuts down, or getting a sudden peak of data when a site with several hundred endpoints comes back from a network outage...
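
One middle ground, rather than removing the limit entirely (maxKBps = 0), is to raise it to a finite cap so bursts are smoothed instead of unbounded. A sketch in limits.conf on the forwarder (10240 KB/s is an illustrative value, not a recommendation):

```ini
# limits.conf -- cap forwarder thruput instead of disabling the limit
[thruput]
# ~10 MB/s; tune to what the link and CPU can sustain
maxKBps = 10240
```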

There are pros and cons to everything 🙂


PickleRick
SplunkTrust

As @richgalloway hinted, it might depend on the type of inputs you're using and the amount of data to be ingested. You might, for example, have a large backlog to ingest; the UF will cause high load until it catches up to the current events and then ease off. For some inputs you might be able to start at the current events and ignore the older ones. It's hard to say without knowing your full setup.
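
For file monitor inputs, skipping old backlog can be done with ignoreOlderThan in inputs.conf; Windows Event Log inputs have current_only for the same purpose. A sketch (paths and values are illustrative; note that ignoreOlderThan goes by file modification time, so it only helps if the old files have old modtimes):

```ini
# inputs.conf on the UF -- skip files not modified in the last 7 days
[monitor:///var/log/app/*.log]
ignoreOlderThan = 7d

# Windows Event Log -- start from the present instead of reading history
[WinEventLog://Security]
current_only = 1
```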

anish
Engager

Thanks for your answer. We had a big backlog of data to be ingested; that's why more resources were being consumed.


richgalloway
SplunkTrust

We need some more information.

Are all locations using the Splunk Universal Forwarder? If not, they should be.

Is the Intermediate Forwarder a heavy forwarder or universal forwarder?

Where did the queue blocking occur?

Where did you make the changes to server.conf and limits.conf?  Did you restart each Splunk instance after making the changes?

What inputs are enabled on the location with high CPU usage?  Which TAs/apps are installed there?
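
To find where the blocking happens, a common diagnostic is to search Splunk's internal metrics for blocked-queue events (standard _internal fields, but verify against your version):

```
index=_internal source=*metrics.log* group=queue
| stats sum(eval(if(blocked=="true",1,0))) AS blocked_events BY host, name
| sort - blocked_events
```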

---
If this reply helps you, Karma would be appreciated.