Monitoring Splunk

Best Practices to Measure Performance improvement after Splunk Migration.

Builder

Hi,

We are moving a three-tier clustered Splunk environment from on-premises to a cloud instance, where we have been told we will get much better performance all round.
My question is: how do we measure this? Which KPIs should we measure before and after the migration, and what is the best way to do so?
My initial thoughts are disk I/O, search response time, memory and CPU usage, etc.

Any recommendations gratefully received.


Re: Best Practices to Measure Performance improvement after Splunk Migration.

Splunk Employee

I don't see disk I/O, memory, or CPU usage as good KPIs here, mainly because in a cloud environment these should be watched by the SaaS provider. On-premises they are good metrics, but they are hard to compare with a SaaS deployment because the storage and compute tiers differ.

Better metrics to watch would be:

1) Search performance: get a baseline of your on-prem searches and compare it with how they run in the cloud
2) Index time vs. ingest time (indexing latency)
3) Queues: backed-up indexing queues point to potential I/O bottlenecks; backed-up typing, parsing, etc. queues point to the related Splunk pipeline bottlenecks
4) Skipped / deferred searches

Those are a few major indicators to look out for and compare between instances. Hope that helps.
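
To make item 2 above concrete, here is a minimal Python sketch for summarising indexing latency from exported events. It assumes latency is taken as index time minus event time (in Splunk terms, `_indextime - _time`) and that you have exported those two values as epoch seconds from each environment. The function name `latency_summary` and the input shape are hypothetical, not part of any Splunk API.

```python
import math
from statistics import median


def latency_summary(events):
    """Summarise indexing latency for one environment.

    events: iterable of (event_time, index_time) pairs in epoch seconds.
    This input shape is an assumption about how the data was exported.
    """
    latencies = sorted(index_time - event_time for event_time, index_time in events)
    if not latencies:
        raise ValueError("no events supplied")
    # Nearest-rank 95th percentile: the smallest value with at least
    # 95% of observations at or below it.
    rank = max(1, math.ceil(0.95 * len(latencies)))
    return {
        "count": len(latencies),
        "median": median(latencies),
        "p95": latencies[rank - 1],
        "max": latencies[-1],
    }
```

Run it once on a sample from the on-prem environment and once on a sample from the cloud environment, over comparable time ranges and data volumes, and compare the median and p95 values side by side.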


Re: Best Practices to Measure Performance improvement after Splunk Migration.

Splunk Employee

I agree. Use some of the dashboards and searches built into the DMC (Distributed Management Console) to get information on searches, indexing pipelines, etc.


Re: Best Practices to Measure Performance improvement after Splunk Migration.

Esteemed Legend

Run four searches on each system and use the Job Inspector (Job -> Inspect Job) to examine how long each step took and the overall response time. Run these:

1: A long search, e.g. over the last 2 years, that uses something complicated like |timechart span=1mon avg(_time) AS junk.
2: A short search, e.g. over the last 24 hours, that uses something complicated like |timechart span=1h avg(_time) AS junk.
3: A long search, e.g. over the last 2 years, that uses something easy and reducible like dedup host.
4: A short search, e.g. over the last 24 hours, that uses something easy and reducible like dedup host.

Also, you could use the DMC to see what your "worst" search is and run that in both places. You presumably already have some idea of what "isn't working", so run that in both places and compare the Job Inspector output.
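
Once you have noted the overall run time of each benchmark search from the Job Inspector on both systems, the comparison can be tabulated with something like the sketch below. The function `compare_runs`, the search labels, and the dictionary layout are all hypothetical; the run times are whatever you record by hand.

```python
def compare_runs(onprem, cloud):
    """Compare recorded run times for the same searches on two systems.

    onprem, cloud: dicts mapping a search label to its total run time in
    seconds, as read manually from the Job Inspector (assumed input).
    Returns a dict with both times and the speedup (on-prem / cloud) for
    every label present in both inputs.
    """
    report = {}
    for label in sorted(onprem):
        if label not in cloud:
            continue  # only compare searches that ran on both systems
        before, after = onprem[label], cloud[label]
        report[label] = {
            "onprem_s": before,
            "cloud_s": after,
            # speedup > 1 means the cloud run was faster
            "speedup": round(before / after, 2) if after > 0 else None,
        }
    return report
```

For example, `compare_runs({"long_timechart": 120.0}, {"long_timechart": 60.0})` reports a speedup of 2.0 for that search. Run each search several times on each system before recording a figure, since a single run can be skewed by caching or concurrent load.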



Re: Best Practices to Measure Performance improvement after Splunk Migration.

Builder

Hi, I cannot find a "worst search" within the DMC. Any pointers?

The DMC only appears on the indexers, and Long-running searches shows "No results found".

There is no DMC on the SH cluster.

Thanks.
