We are a large Splunk customer. We hit a significant issue when upgrading from 9.1.7 to 9.2.4, and it took a long time to resolve.
We have a large stack with many indexers. Our current operating system is Red Hat 7; we are in the process of migrating to Red Hat 8.
On the upgrade from 9.1.7 to 9.2.4, the indexer cluster that ingests the most data suddenly had its aggregation and parsing queues sitting at 100% full during our peak logging hours. The indexers were not using much more CPU or memory; the queues were simply full.
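If you want to check whether your own queues are backing up, a search along these lines against metrics.log will chart the fill ratio. This is just a sketch; the queue names and size fields are the standard metrics.log ones, so adjust for your environment:

    index=_internal source=*metrics.log* group=queue (name=aggqueue OR name=parsingqueue)
    | eval fill_pct=round(current_size_kb/max_size_kb*100,1)
    | timechart span=5m max(fill_pct) by name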
It turns out that Splunk enabled profiling by default starting in 9.2, specifically CPU time profiling. These settings are controlled in limits.conf: https://docs.splunk.com/Documentation/Splunk/9.2.4/Admin/Limitsconf. There are six new profiling settings, and they are all enabled by default.
In addition, agg_cpu_profiling makes a lot of time-of-day calls. A lot.
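If you want to turn the profiling off, the booleans go in limits.conf. A sketch of what that could look like, assuming the six setting names from the 9.2 limits.conf spec linked above (verify against the docs for your version before pushing this out):

    # $SPLUNK_HOME/etc/system/local/limits.conf on the indexers
    # setting names assumed from the 9.2 limits.conf spec; confirm before deploying
    [default]
    agg_cpu_profiling = false
    regex_cpu_profiling = false
    msp_cpu_profiling = false
    mp_cpu_profiling = false
    lb_cpu_profiling = false
    clb_cpu_profiling = false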
There are several choices of clocksource in Red Hat: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_real_time/7/html/reference_gui...
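You can see which clocksource the kernel is currently using, and which ones are available, from sysfs:

    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    cat /sys/devices/system/clocksource/clocksource0/available_clocksource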
It turns out that we had set our clock source to "hpet" some years ago. That clocksource, while high precision, is much slower to read than "tsc". Once we switched to tsc, the problem with our aggregation and parsing queues filling to 100% during peak hours was fixed.
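For reference, the switch can be tried at runtime and then made persistent via a kernel boot parameter. These are the standard RHEL mechanisms, but test on a non-production indexer first:

    # runtime change (reverts on reboot)
    echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource

    # persistent across reboots: add clocksource=tsc to the kernel command line
    grubby --update-kernel=ALL --args="clocksource=tsc"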
Even if you don't have the clock source issue, the new default-on profiling is something to be aware of when upgrading to 9.2.