Splunk ITSI

How do base searches and service templates affect capacity planning in Splunk IT Service Intelligence?

sail4lot
Path Finder

Hi.

The ITSI capacity planning manual talks about planning for capacity based on the number of entities per KPI.

It is not clear to me how the abstractions available help or play into that.

Does anyone know if the number of entities per KPI guidelines apply as written when employing base searches for KPIs and service templates for the services?

I thought that the base search ran once for all services using it, not once per service. Can anyone speak to this?

yannK
Splunk Employee

The ultimate limits are:

  • the number of stats commands and the split by KPI, service, and entity when running a shared base search (SBS).

Example:
this shared base search
... | stats count(kpi1) max(kpi2) last(kpi3) by entity
produces 3 metric results per entity.

If the SBS is used by 3 KPIs per service
and you have 10 services using those 3 KPIs,

then with a split by entity and about 200 entities per service,
you are looking at 3 x 3 x 10 x 200 = 18,000 combinations of cardinality each time the SBS runs.

If this is too much, because the search takes too long or because it hits a stats command limit,
then you may have to create another identical SBS and spread the services across the two, so that each one stays under control.

  • the search takes a long time to run
    If your KPI runs every minute but the search takes more than 60 seconds to complete, you are in trouble.
    You may also have to optimize your search or reduce the number of values in the split by.

  • the REST call to create the entity filter in the SPL query takes a very long time
    I have seen long lists of "host=a OR host=b OR host=c ...." that made the search so large that it was very slow to parse.
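
To make the arithmetic in the first two bullets concrete, here is a rough sketch in Python. It is not an ITSI tool or API; the function names `sbs_cardinality` and `fits_schedule` are made up for this illustration, and the 75-second runtime is an invented figure.

```python
def sbs_cardinality(metrics_per_entity, kpis_per_service, services, entities_per_service):
    """Estimate the result combinations produced each time a shared base search runs."""
    return metrics_per_entity * kpis_per_service * services * entities_per_service


def fits_schedule(runtime_seconds, frequency_seconds):
    """Return True when the search finishes before its next scheduled run."""
    return runtime_seconds < frequency_seconds


# Numbers from the example above: 3 metrics, 3 KPIs, 10 services, 200 entities each.
total = sbs_cardinality(3, 3, 10, 200)
print(total)  # 18000

# Spreading the 10 services evenly over two identical base searches
# halves the estimate for each search.
per_search = sbs_cardinality(3, 3, 10 // 2, 200)
print(per_search)  # 9000

# A KPI scheduled every 60 seconds whose search takes 75 seconds (hypothetical) overruns.
print(fits_schedule(75, 60))  # False
```

The estimate is only a multiplication, so it says nothing about where the actual stats limit sits in a given deployment; it just helps compare one layout of services against another.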

sail4lot
Path Finder

Thanks @yannK. This makes good sense. In our case, I think the actual issue has more to do with your last bullet point. Specifically, the resulting base search is quite long with all of the host=a OR host=b OR host=c. If the parsing of that (which I'll check) is causing the bottleneck, that's probably part of the answer.

As far as cardinality goes, though, how would that affect performance or capacity planning? Our math looks something like 3 metrics x 3 KPIs x 100 services x 40 entities per service = 36,000 for a given base search. Besides parsing, where else could a bottleneck exist for a base search like this? Note that the actual search (without all the extra stuff at the end) runs in well under 30 seconds and only runs at a 15-minute frequency.

yannK
Splunk Employee

Usually, the problems with long lists of entity filters are long search terms, or scripts timing out while waiting for the REST call to return the list.
But if you reach that point, you will be working with a support engineer to tune Splunk.
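
To get a feel for how quickly such a filter grows, here is a small Python sketch. It only mimics the "host=a OR host=b OR host=c" pattern quoted in this thread; the exact clause ITSI generates may differ, `entity_filter` is a made-up helper, and the host names and the count of 4000 entities are hypothetical.

```python
def entity_filter(hosts):
    """Build an OR-joined filter clause in the style quoted in this thread."""
    return " OR ".join(f"host={h}" for h in hosts)


print(entity_filter(["a", "b", "c"]))  # host=a OR host=b OR host=c

# Hypothetical host names, for illustration only.
hosts = [f"server{i:04d}" for i in range(4000)]
clause = entity_filter(hosts)

# Length of the filter text that would be appended to the search.
print(len(clause))
```

Comparing the clause length before and after trimming or regrouping entities is one way to judge whether the filter itself is worth investigating.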
