Hi,
We upgraded our Splunk cluster last Friday. We have 1 Master node, 3 Search Heads in a search head cluster, 2 Search Peers in an indexer cluster, and 2 HFs in front of the Search Peers. All have been upgraded to 7.1.1. The SHs and HFs were previously running 6.5.0, and the Search Peers were running 6.3.2.
After the upgrade, the CPU usage has increased significantly. During office hours the CPU usage peaks at 100% on the Search Peers. This was not the case before the upgrade.
In the "READ THIS FIRST" I can't see anything that should relate to our deployment in terms of increasing the CPU load.
I know that we have some searches that should be optimized to limit the use of subsearches, but they worked fine before the upgrade, so I can't see why upgrading should have such a huge impact. From what I can see in the DMC, it looks like all searches use more CPU than before, but there are no rotten apples that I can identify.
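A rough way to see this outside the DMC is to query the introspection data on the Search Peers directly (just a sketch; field names assumed from the standard resource-usage logs, adjust if they differ in your version):

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| timechart span=10m avg(data.pct_cpu) AS avg_search_cpu max(data.pct_cpu) AS peak_search_cpu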
Can anyone help me out on what I should look for when troubleshooting?
There are bugs in 7.1.1. The best thing to do on this issue is to open a ticket with support. We have some issues that are not going to be fixed until 7.1.2. Moral of the story is to research more carefully what the issues are with a version before upgrading to it. We learned that the hard way. Good luck!
Early adopters always pay the highest price.
Very true. We would have never gone to 7.1.1 but Enterprise Security 5.1 required it and we needed to upgrade it. ES 5.1 is working GREAT so there is at least that good news. 🙂
We ran into the same issue when we upgraded last week. The fix was to set the acceleration.backfill_time as specified in the highlighted issues in the release notes.
http://docs.splunk.com/Documentation/Splunk/7.1.1/ReleaseNotes/Knownissues#Highlighted_issues
To which value does it need to be set?
It's arbitrary, but with no value it defaults to "all time". We set ours to 7 days by default, but for some DMs we set it higher (such as 30 days).
This only affects the initial acceleration, so if you set it to 7 days but have the data model accelerated for 3 months, it will initially accelerate only 7 days' worth of data; after 3 months, all of the data will be accelerated.
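For reference, the setting lives in datamodels.conf per data model; a minimal sketch (the stanza name is just a placeholder, use your accelerated data model's name):

[My_Data_Model]
acceleration = 1
# summary range, e.g. 3 months
acceleration.earliest_time = -3mon
# only backfill the most recent 7 days on the initial build
acceleration.backfill_time = -7d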
Hi jonaslar,
Have you had some success in troubleshooting this? We still have big problems with the CPU usage hitting the roof, and causing some searches to take up to 20-30 minutes. Searches that took maybe a minute to complete before upgrading.
What's in index=_internal log_level=error OR log_level=warn* ?
Maybe you have an error or warning that started after the upgrade?
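For example, something like this (rough sketch) shows which components are noisiest:

index=_internal (log_level=error OR log_level=warn*) earliest=-24h
| stats count BY component log_level
| sort 0 - count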
Well, there are a lot of warnings from the DispatchManager:
WARN DispatchManager - enforceQuotas: username=USERNAME, search_id=SID - QUEUED (reason='The maximum number of concurrent historical searches for this user based on their role quota has been reached. concurrency_limit=3')
But my guess is that's because of the longer runtime of the searches: when loading a dashboard with multiple searches, that limit is reached pretty quickly.
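If I read it right, that limit maps to srchJobsQuota in authorize.conf for the role (sketch below; the built-in user role defaults to 3), though raising it would only mask the underlying slowdown:

[role_user]
srchJobsQuota = 3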
In the Monitoring Console there are dashboards that show search performance, top long-running searches, etc. Have you checked those to see if anything stands out?
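Something like this against the audit index (just a sketch) also lists the longest-running completed searches directly:

index=_audit action=search info=completed total_run_time=*
| sort 0 - total_run_time
| head 20
| table _time user total_run_time savedsearch_name search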
It looks like most of the searches have a longer run time, but none that truly stands out. Some of the searches that take up to 10 minutes are simple searches that should return within seconds.
Is disk I/O up?
Try iostat 5 and see what the iowait% is.
Curious where the bottleneck is; that will help us figure out what the culprit is.
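For example (assuming sysstat is installed):

iostat 5 3                # three 5-second samples; watch the %iowait column
top -b -n 1 | head -20    # quick check of whether splunkd search processes dominate CPU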
Here's a snapshot. From what I've seen, iowait has been fine since the upgrade. Just for giggles, I applied the configuration bundle once more, and now the cluster looks more like it should. Will keep an eye on the load and see if it stabilizes.
Snapshot:
avg-cpu: %user %nice %system %iowait %steal %idle
91.62 0.00 5.31 0.12 0.00 2.95
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdf 9.80 0.00 95.20 0 476
sdg 60.20 0.00 534.40 0 2672
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
dm-1 92.00 0.00 534.40 0 2672
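For reference, re-applying the bundle was just the standard command run from the cluster master, something like:

splunk apply cluster-bundle --answer-yes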
EDIT: Nope. After a while, the CPU usage is back up at close to 100%.