Using Splunk Enterprise 9.4.3 on Windows 2019. Our single search head is having some performance issues. Whilst searches in the UI are generally responsive and performant, trying to configure anything in the UI (a search macro, for example) can take up to 1 minute to display the dialog. This is common across nearly all configuration elements in the UI.
Basic analysis I have done on my end shows a high rate of file activity by Splunk processes, particularly in the users/apps folders.
Procmon trace for 90 seconds
Showing a large amount of file activity (note: Windows Defender EDR/AV will scan most of this). Drilling down into the file activity:
Hey shocko — your Procmon read is spot on: this is the metadata/permission merge getting hammered by on-access scanning. The asymmetry is the giveaway. Searches hit big sequential index I/O and feel fine, but opening any config dialog makes splunkd walk the whole knowledge-object tree across system, app and user contexts — hundreds of thousands of tiny .meta/.conf opens plus a Get ACL on each. That small-file-plus-ACL workload is exactly what Windows on-access AV punishes hardest, and your trace shows it: the .meta reads and the Get ACL storm across etc\users, etc\apps and etc\slave-apps.
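If you want a rough number for that small-file workload outside the UI, something like the sketch below could be run before and after any change. It only walks a directory, stats each .meta/.conf file and reads it; it does not query ACLs and is not what splunkd does internally, so treat it as a relative yardstick. The function name and the install path are assumptions for illustration.

```python
import os
import time


def time_small_file_reads(root, suffixes=(".meta", ".conf")):
    """Stat and read every matching file under root.

    Returns (file_count, elapsed_seconds). This is a hypothetical helper,
    only a rough stand-in for the many-tiny-files pattern described above.
    """
    count = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                os.stat(path)
                with open(path, "rb") as handle:
                    handle.read()
            except OSError:
                continue  # skip files that cannot be read
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed default install path; adjust to your own $SPLUNK_HOME.
    etc_dir = r"C:\Program Files\Splunk\etc"
    files, seconds = time_small_file_reads(etc_dir)
    per_file_ms = (seconds / files * 1000) if files else 0.0
    print(f"{files} files in {seconds:.2f}s ({per_file_ms:.3f} ms per file)")
```

Run it a couple of times and compare like with like, since the OS file cache makes the first pass slower than later ones.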
Two things to go after, in order:
1. Get Splunk out of Defender's on-access path. This is Splunk's own recommendation for Windows and almost certainly your biggest win. Exclude the Splunk processes (splunkd.exe and the splunk-*.exe helpers) and the whole install dir ($SPLUNK_HOME, plus $SPLUNK_DB if it's separate) from Defender real-time/on-access scanning — and make sure your EDR layer honors the same exclusions, since EDR often keeps scanning even when the plain AV exclusion is set. Quick proof: in a maintenance window, temporarily turn off real-time protection, open a macro dialog, and see if it snaps open. If it does, you've found it.
2. Shrink what splunkd has to scan every time. Exclusions cut the per-file scan cost; you can also cut the number of files, for example by cleaning stale user directories out of etc\users.
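For step 1, here is a minimal sketch that prints the Defender exclusion commands for review rather than applying them. It assumes Defender is configured locally with the Add-MpPreference cmdlet (-ExclusionPath and -ExclusionProcess); if Defender or your EDR is managed centrally, the exclusions most likely need to go into that policy instead. The helper name and the example paths are made up for illustration.

```python
import glob
import os


def defender_exclusion_commands(splunk_home, splunk_db=None):
    """Build PowerShell Add-MpPreference lines for Splunk paths and processes.

    Hypothetical helper: it only returns strings, it changes nothing.
    """
    paths = [splunk_home]
    if splunk_db:
        paths.append(splunk_db)  # only needed when $SPLUNK_DB lives elsewhere

    # splunkd.exe plus the splunk-*.exe helpers found in the bin directory.
    bin_dir = os.path.join(splunk_home, "bin")
    processes = sorted(
        glob.glob(os.path.join(bin_dir, "splunkd.exe"))
        + glob.glob(os.path.join(bin_dir, "splunk-*.exe"))
    )

    commands = [f'Add-MpPreference -ExclusionPath "{p}"' for p in paths]
    commands += [f'Add-MpPreference -ExclusionProcess "{exe}"' for exe in processes]
    return commands


if __name__ == "__main__":
    # Assumed default install path; pass splunk_db only if it is separate.
    for line in defender_exclusion_commands(r"C:\Program Files\Splunk"):
        print(line)
```

Printing the commands first gives you something to hand to whoever owns the AV/EDR policy, which is usually where this has to be signed off anyway.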
One more to rule out: make sure $SPLUNK_HOME\etc is on fast local disk, not a SAN or network volume — the many-tiny-files merge is brutal over any added latency.
How many user directories are under etc\users, and roughly how many apps in etc\apps? That'll tell us whether the AV exclusion alone does it, or whether you've also got an etc\users cleanup on your hands.
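If it helps, a quick way to get those counts is sketched below. It assumes the default layout (etc\users and etc\apps under $SPLUNK_HOME) and counts .meta/.conf files as a proxy for how much splunkd has to touch; the function names and the install path are illustrative, not anything Splunk ships.

```python
import os


def count_subdirs(path):
    """Number of immediate subdirectories, or 0 if the path is missing."""
    try:
        with os.scandir(path) as entries:
            return sum(1 for entry in entries if entry.is_dir())
    except FileNotFoundError:
        return 0


def count_config_files(path, suffixes=(".meta", ".conf")):
    """Number of .meta/.conf files anywhere under path."""
    return sum(
        1
        for _dirpath, _dirnames, filenames in os.walk(path)
        for name in filenames
        if name.lower().endswith(suffixes)
    )


def summarize_etc(etc_dir):
    """Counts for etc\\users, etc\\apps and config files under etc_dir."""
    return {
        "user_dirs": count_subdirs(os.path.join(etc_dir, "users")),
        "app_dirs": count_subdirs(os.path.join(etc_dir, "apps")),
        "config_files": count_config_files(etc_dir),
    }


if __name__ == "__main__":
    # Assumed default install path; adjust to your own $SPLUNK_HOME.
    print(summarize_etc(r"C:\Program Files\Splunk\etc"))
```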