Splunk Enterprise

Search Head Performance - Metadata scanning

shocko
Contributor

Using Splunk Enterprise 9.4.3 on Windows 2019. Our single search head is having some performance issues. Whilst searches in the UI are generally responsive and performant, trying to configure anything in the UI (a search macro, for example) can take up to 1 minute to display the dialog. This is common across nearly all configuration elements in the UI.

Basic analysis I have done on my end shows a high rate of file activity by Splunk processes, particularly in the users/apps folders.

Procmon trace for 90 seconds

 

shocko_0-1779609265278.png

This shows a large amount of file activity (note: Windows Defender EDR/AV will scan most of this). Drilling down into the file activity:

 

shocko_1-1779609265280.png

 

shocko_2-1779609265280.png

 

shocko_3-1779609265281.png

 

 

 


natecrisler
Path Finder

Hey shocko — your Procmon read is spot on: this is the metadata/permission merge getting hammered by on-access scanning. The asymmetry is the giveaway. Searches hit big sequential index I/O and feel fine, but opening any config dialog makes splunkd walk the whole knowledge-object tree across system, app and user contexts — hundreds of thousands of tiny .meta/.conf opens plus a Get ACL on each. That small-file-plus-ACL workload is exactly what Windows on-access AV punishes hardest, and your trace shows it: the .meta reads and the Get ACL storm across etc\users, etc\apps and etc\slave-apps.

Two things to go after, in order:

1. Get Splunk out of Defender's on-access path. This is Splunk's own recommendation for Windows and almost certainly your biggest win. Exclude the Splunk processes (splunkd.exe and the splunk-*.exe helpers) and the whole install dir ($SPLUNK_HOME, plus $SPLUNK_DB if it's separate) from Defender real-time/on-access scanning — and make sure your EDR layer honors the same exclusions, since EDR often keeps scanning even when the plain AV exclusion is set. Quick proof: in a maintenance window, temporarily turn off real-time protection, open a macro dialog, and see if it snaps open. If it does, you've found it.

2. Shrink what splunkd has to scan every time. Exclusions cut the per-file scan cost; you can also cut the number of files:

  • etc\users is usually the culprit. Every user directory carries its own local.meta and any private knowledge objects, and the permission merge walks all of them on each config load. Count them (dir etc\users) — if you've got hundreds of stale/orphaned user dirs from people who've left or from SSO churn, that's a lot of dead weight. Archive the orphaned ones (back up first), and reassign or clean up private KOs that don't need to be per-user.
  • Trim unused apps. More apps in etc\apps means more .conf/.meta to merge on every load — remove anything you're not actually using.
  • slave-apps means this box is also an indexer cluster peer (or has leftover peer apps from one). Those .meta are in the scan too, so it's worth being clear on the box's full role — a combined search head plus peer is carrying both footprints.
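
To put numbers on the cleanup targets above, here's a rough Python sketch that counts the user and app directories and tallies .meta/.conf files per top-level folder under etc. The function names are mine, and the fallback install path is only an assumption (the common Windows default), so set SPLUNK_HOME or edit the path to match your box. It only reads directory listings and changes nothing.

```python
import os
from collections import Counter
from pathlib import Path


def inventory(etc_dir):
    """Count .meta and .conf files under each top-level folder of etc_dir."""
    etc = Path(etc_dir)
    counts = Counter()
    for root, _dirs, files in os.walk(etc):
        rel = Path(root).relative_to(etc)
        # Files sitting directly in etc_dir are grouped under ".".
        top = rel.parts[0] if rel.parts else "."
        for name in files:
            if name.lower().endswith((".meta", ".conf")):
                counts[top] += 1
    return counts


def count_subdirs(path):
    """Number of immediate subdirectories, e.g. user dirs under etc/users."""
    p = Path(path)
    if not p.is_dir():
        return 0
    return sum(1 for entry in p.iterdir() if entry.is_dir())


if __name__ == "__main__":
    # Assumption: SPLUNK_HOME is set; the fallback is a guess at the default path.
    splunk_home = os.environ.get("SPLUNK_HOME", r"C:\Program Files\Splunk")
    etc = Path(splunk_home) / "etc"
    print("user dirs:", count_subdirs(etc / "users"))
    print("app dirs: ", count_subdirs(etc / "apps"))
    for top, n in inventory(etc).most_common():
        print(f"{n:8d}  {top}")
```

If users dominates the tally, the etc\users cleanup is probably worth doing alongside the AV exclusions. If apps dominates, start with trimming apps.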

One more to rule out: make sure $SPLUNK_HOME\etc is on fast local disk, not a SAN or network volume — the many-tiny-files merge is brutal over any added latency.
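
If you want a repeatable number instead of eyeballing the dialog, a small Python timing sketch like the one below can help. It opens and reads every .meta/.conf file under etc and reports the average cost per file. This is only a stand-in for the workload: it does not reproduce splunkd's ACL lookups, the function name is mine, and the fallback path is an assumption. Run it before and after changing exclusions (or against a copy of etc on a different volume) and compare. Bear in mind the Windows file cache will make repeat runs faster regardless, so run it a couple of times in each state.

```python
import os
import time
from pathlib import Path


def time_small_file_reads(root, suffixes=(".meta", ".conf")):
    """Open and read every matching file under root.

    Returns (file_count, seconds). The elapsed time includes the directory walk.
    """
    count = 0
    start = time.perf_counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(suffixes):
                try:
                    with open(os.path.join(dirpath, name), "rb") as fh:
                        fh.read()
                    count += 1
                except OSError:
                    # Skip files that are locked or unreadable by this account.
                    pass
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: SPLUNK_HOME is set; the fallback is a guess at the default path.
    splunk_home = os.environ.get("SPLUNK_HOME", r"C:\Program Files\Splunk")
    n, secs = time_small_file_reads(Path(splunk_home) / "etc")
    per_file_ms = (secs / n * 1000) if n else 0.0
    print(f"{n} files in {secs:.2f}s ({per_file_ms:.3f} ms per file)")
```

A large drop in per-file time with real-time protection off points at the scanner. Little change in either state, with a consistently high per-file figure, points more toward the storage.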


How many user directories are under etc\users, and roughly how many apps in etc\apps? That'll tell us whether the AV exclusion alone does it, or whether you've also got an etc\users cleanup on your hands.
