Splunk IT Service Intelligence

How do base searches and service templates affect capacity planning in Splunk IT Service Intelligence?

sail4lot
Path Finder

Hi.

The ITSI capacity planning manual talks about planning for capacity based on the number of entities per KPI.

It is not clear to me how the abstractions available help or play into that.

Does anyone know if the number of entities per KPI guidelines apply as written when employing base searches for KPIs and service templates for the services?

I thought that the base search ran once for all services using it, not once per service. Can anyone speak to this?

1 Solution

yannK
Splunk Employee

The ultimate limits are:

  • The number of stats aggregations, multiplied by the split by KPI, service, and entity, when running a shared base search (SBS).

Example:
this shared base search
... | stats count(kpi1) max(kpi2) last(kpi3) by entity
produces 3 metric results per entity.

If the SBS is used by 3 KPIs per service,
and you have 10 services using those 3 KPIs,

then with a split by entity and about 200 entities per service,
you are looking at 3 x 3 x 10 x 200 = 18,000 combinations each time the SBS runs.

If this is too much, because the search takes too long or because it hits a stats command limit,
then you may have to create another identical SBS and spread the services across the two, so that each one stays under control.

  • The search takes too long to run.
    If your KPI runs every minute but the search takes more than 60 seconds to complete, you are in trouble.
    You may also have to optimize your search or reduce the number of values in the split by.

  • The REST call that creates the entity filter in the SPL query takes a very long time.
    I have seen long lists of "host=a OR host=b OR host=c ..." that made the search so large that it was very slow to parse.
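
The arithmetic in the example above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the function names and the 10,000-combination budget are hypothetical, chosen for illustration, and are not ITSI settings or documented limits.

```python
import math

def sbs_cardinality(metrics, kpis_per_service, services, entities_per_service):
    """Rough count of combinations one shared base search produces per run,
    using the same product as the example in this thread."""
    return metrics * kpis_per_service * services * entities_per_service

def sbs_copies_needed(services, per_service_cardinality, budget):
    """How many identical SBS copies keep each copy under `budget` combinations.
    `budget` is a made-up number for illustration, not a documented limit."""
    services_per_copy = max(1, budget // per_service_cardinality)
    return math.ceil(services / services_per_copy)

total = sbs_cardinality(metrics=3, kpis_per_service=3, services=10,
                        entities_per_service=200)
print(total)  # 18000

per_service = sbs_cardinality(3, 3, 1, 200)  # 1800 combinations per service
print(sbs_copies_needed(services=10, per_service_cardinality=per_service,
                        budget=10_000))  # 2
```

With these assumed numbers, each copy of the SBS would carry 5 services. The right budget for a real deployment has to come from measuring how long the search actually takes.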

sail4lot
Path Finder

Thanks @yannK. This makes good sense. In our case, I think the actual issue has more to do with your last bullet point. Specifically, the resulting base search is quite long with all of the host=a OR host=b OR host=c terms. If parsing that is causing the bottleneck (which I'll check), that's probably part of the answer.

As far as cardinality goes though, how would that affect performance or capacity planning? Our math on a per-base-search basis looks something like 3 metrics x 3 KPIs x 100 services x 40 entities = 36,000. Besides parsing, where else could a bottleneck exist for a base search like this? Note that the actual search (without all the extra stuff at the end) runs in well under 30 seconds and only runs every 15 minutes.
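
For reference, the run-time check from the earlier answer (a search must finish before its next scheduled run) can be applied to these numbers with a rough sketch. The function name is hypothetical, and the 75-second case is an invented example, not a measurement.

```python
def schedule_headroom(runtime_seconds, interval_seconds):
    """Fraction of the schedule interval left over after the search finishes.
    A value at or below 0 means the search cannot finish before its next run."""
    return 1 - runtime_seconds / interval_seconds

# Numbers from the post above: under 30 s of run time, 15-minute frequency.
print(round(schedule_headroom(30, 15 * 60), 3))  # 0.967

# An every-minute KPI whose search takes 75 s (hypothetical) is over budget.
print(schedule_headroom(75, 60) <= 0)  # True
```

On run time alone this search has plenty of headroom, which is consistent with looking at the entity filter rather than the schedule.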

yannK
Splunk Employee

Usually, the problems with long lists of entity filters are long search strings, or scripts timing out while waiting for the REST call to return the list.
But if you reach that point, you will be working with a support engineer to tune Splunk.
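
To get a feel for how large such a filter can become, here is a minimal Python sketch that builds an OR-joined clause of the shape quoted in this thread and measures it. It imitates the shape only and is not how ITSI generates the filter. The host names are hypothetical, and 4,000 hosts assumes that all 100 services have 40 distinct entities each.

```python
def entity_filter(hosts, field="host"):
    """Build an OR-joined filter of the shape quoted in this thread.
    This imitates the shape only; it is not ITSI's own filter generation."""
    return " OR ".join(f"{field}={h}" for h in hosts)

# Hypothetical host names, for illustration only.
hosts = [f"server{i:04d}.example.com" for i in range(4000)]
clause = entity_filter(hosts)

print(entity_filter(["a", "b", "c"]))  # host=a OR host=b OR host=c
print(len(clause))  # number of characters this clause adds to the search string
```

Comparing that length with the size of the base search itself gives a quick indication of how much of the final search string is entity filter.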
