
Best Practices to Measure Performance Improvement after a Splunk Migration

Esky73
Builder

Hi,

We are moving a three-tier clustered Splunk environment from on-premises to a cloud instance, where we have been told we will get much better performance all round.
My question is: how do we measure this? What KPIs should we measure before and after the migration, and what is the best way to do it?
My initial thoughts are disk I/O, search response time, memory and CPU usage, etc.

Any recommendations gratefully received.


woodcock
Esteemed Legend

Run four searches on each system and use the Job Inspector (Job > Inspect Job) to examine how long each step took and the overall response time. Run these:

1: A long search (for example, Last 2 years) that uses something complicated, like | timechart span=1mon avg(_time) AS junk.
2: A short search (for example, Last 24 hours) that uses something complicated, like | timechart span=1h avg(_time) AS junk.
3: A long search (for example, Last 2 years) that uses something easy and reducible, like dedup host.
4: A short search (for example, Last 24 hours) that uses something easy and reducible, like dedup host.

Also, you could use the DMC to find your "worst" search and run it in both places. You obviously have some idea of what "isn't working", so run that in both places and compare the Job Inspector output.
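A minimal Python sketch for tabulating those Job Inspector timings side by side, assuming you copy each search's total run time (in seconds) by hand from both environments. It also assumes you run each search a few times and compare medians, so that one cold-cache run does not dominate the result. The function name `summarize` and the search names you pass in are hypothetical.

```python
from __future__ import annotations

import statistics


def summarize(
    before: dict[str, list[float]], after: dict[str, list[float]]
) -> dict[str, dict[str, float]]:
    """Compare median runtimes (seconds) for searches run in both environments.

    `before` holds the on-prem runs and `after` the cloud runs, keyed by a
    label you choose for each test search. Searches that were only run in
    one environment are ignored.
    """
    report = {}
    for name in sorted(before.keys() & after.keys()):
        b = statistics.median(before[name])
        a = statistics.median(after[name])
        report[name] = {
            "before_s": b,
            "after_s": a,
            # Positive means the migrated environment is faster;
            # negative means it got slower.
            "pct_faster": (b - a) / b * 100.0,
        }
    return report
```

For example, `summarize({"long_timechart": [...]}, {"long_timechart": [...]})` returns one row per search with the median before/after runtimes and the percentage improvement. Comparing the per-step timings inside the Job Inspector (not just the total) is still needed to see *where* any difference comes from.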


Esky73
Builder

Hi, I cannot find the "worst search" within the DMC. Any pointers?

The DMC only appears on the indexers, and the Long-running searches panel shows "No results found".

There is no DMC on the search head cluster.

Thanks.


hmclaren_splunk
Splunk Employee

I agree: use some of the dashboards and searches built into the DMC (Distributed Management Console) to get information on searches, indexing pipelines, etc.


esix_splunk
Splunk Employee

I don't see disk I/O, memory, or CPU usage as good KPIs, mainly because in a cloud environment these should be watched by the SaaS provider. On premises, yes, these are good metrics, but it is hard to compare them to SaaS (different types of storage and compute tiers).

Better metrics to watch would be:

1) Search performance: get a baseline of your on-prem search runtimes and compare them with how the same searches run in the cloud.
2) Index vs. ingest times (latency).
3) Queues: backed-up indexing queues would indicate potential I/O bottlenecks; backed-up typing, parsing, etc. queues would indicate related Splunk bottlenecks.
4) Skipped/deferred searches.

Those are a few major indicators to look out for and compare between instances. Hope that helps.
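A minimal Python sketch of how two of those indicators could be reduced to comparable numbers, assuming you have already exported the data from each environment. For latency (item 2) it takes `(_time, _indextime)` pairs in epoch seconds, where per-event latency is `_indextime - _time`. For skipped/deferred searches (item 4) it takes a hypothetical mapping of scheduler status to count; the exact status names depend on what your scheduler logs report, so treat "skipped" and "deferred" here as assumptions. Both function names are hypothetical.

```python
from __future__ import annotations

import statistics


def ingest_latency(events: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize indexing latency from (_time, _indextime) pairs in epoch seconds.

    Per-event latency is _indextime - _time. Large or growing values suggest
    ingestion is falling behind (or timestamps are being parsed incorrectly).
    Needs at least two events for the percentile calculation.
    """
    lat = sorted(idx - ts for ts, idx in events)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(lat, n=20, method="inclusive")[-1]
    return {"median_s": statistics.median(lat), "p95_s": p95, "max_s": lat[-1]}


def skip_ratio(status_counts: dict[str, int]) -> float:
    """Fraction of scheduled searches that were skipped or deferred.

    `status_counts` is an assumed mapping of scheduler status -> count.
    """
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    missed = status_counts.get("skipped", 0) + status_counts.get("deferred", 0)
    return missed / total
```

Reporting the median and 95th percentile rather than only an average keeps a handful of very late events from hiding (or exaggerating) a change in typical ingestion latency between the two environments.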
