Splunk ITSI Infrastructure Overview status for Entities

vijaybaskarss
Loves-to-Learn Lots

Hello,

The Infrastructure Overview in Splunk ITSI lists entities with a status of Active, Unstable, Inactive, or N/A. Can you help me understand what determines each status? In our environment many entities show as N/A or Unstable, even though we are still receiving data for them. We have also added recurring imports using the available modules, but those entities are still not shown as Active.

Please advise.

Regards,

Vj

yannK
Splunk Employee

My understanding is that you have entities detected in ITSI (version 4.9.*), and that some of those entities are often flagged as "unstable" or "inactive".

The status of an entity changes when the scheduled import runs and finds no data for that entity (for the health metric) in the time range of the search. The entity is flagged as "unstable" if it is detected only intermittently, and as "inactive" if it is consistently not detected.
See https://docs.splunk.com/Documentation/ITSI/4.9.3/Entity/InfraOverview#Monitor_entity_status

Out of the box, the ITSI entity import searches run very frequently and aggressively, which may also affect detection.
See this remark in the documentation:
>Note: If you have a large number of entities, the recurring bulk import can take a longer time to complete. Tune the cron schedule of the recurring import searches to search less frequently in order to ensure your entity status updates on time.

For example, for Windows entities, the scheduled saved search that performs the import is called "ITSI Import Objects - Perfmon".
It looks for the key metric "metric_name=Processor.* OR metric_name=processor.*".
It runs every minute and looks back 90 seconds.

So the root cause for unstable entities may be one of the following:

  • The host matching this entity is not sending data consistently.
    • To address this, check the data ingestion and its frequency (maybe send data more frequently).
  • The data has lag and therefore falls outside the search window.
    • To address this, measure your average lag, then decide whether you can reduce it or adjust the search window to account for it.
  • The search used for the import has too short a time range, one that does not match the metric collection interval.
    • To address this, change the import search to run less often but look back over a longer period.
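
To see how these causes interact, here is a rough Python sketch (not ITSI code) that models how often a scheduled import would find a data point. It assumes a simplified model: data points are timestamped at fixed intervals, become searchable after a constant lag, and the search window is applied to the event timestamp. The function names and the sample values (5-minute collection, 1-minute lag) are only illustrative assumptions.

```python
def run_finds_data(run_time, collect_every, lag, look_back):
    """Return True if a search at run_time sees at least one data point.

    All values are in seconds. Simplified model: points are timestamped at
    multiples of collect_every and become searchable `lag` seconds later.
    """
    if run_time < lag:
        return False  # nothing has been indexed yet
    # Timestamp of the most recent point already indexed at run_time.
    latest_event = ((run_time - lag) // collect_every) * collect_every
    # It only counts if its timestamp falls inside the search window.
    return latest_event >= run_time - look_back


def hit_rate(collect_every, lag, run_every, look_back, start=3600, end=7200):
    """Fraction of scheduled runs in [start, end) that find data."""
    runs = list(range(start, end, run_every))
    hits = sum(run_finds_data(t, collect_every, lag, look_back) for t in runs)
    return hits / len(runs)


# Search every minute looking back 90 s, data every 5 min with 1 min lag.
print(hit_rate(collect_every=300, lag=60, run_every=60, look_back=90))    # 0.2
# Search every 5 min looking back 7 min, same data.
print(hit_rate(collect_every=300, lag=60, run_every=300, look_back=420))  # 1.0
```

In this model, the one-minute schedule finds data in only one run out of five, which is the kind of intermittent detection that would show up as "unstable".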

For example, for Windows entities, if you collect perfmon data every 5 minutes with an average lag of 1 minute, change the search to run every 5 minutes and look back about 7 minutes to account for the delay.
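
The arithmetic behind that suggestion can be written down as a small Python helper. This is only a rule of thumb, not something taken from the ITSI documentation: it assumes the oldest "latest" data point a search can encounter is one collection interval plus the lag old, and adds a safety margin. The function name and the default 60-second margin are assumptions.

```python
def suggested_look_back(collect_every_s, avg_lag_s, margin_s=60):
    """Suggest a look-back window (seconds) for the import search.

    Rule of thumb: cover one full collection interval, plus the average
    indexing lag, plus a safety margin (the margin is an assumption).
    """
    return collect_every_s + avg_lag_s + margin_s


# Perfmon collected every 5 minutes with about 1 minute of lag.
window = suggested_look_back(collect_every_s=300, avg_lag_s=60)
print(window, "seconds, i.e.", window // 60, "minutes")
```

With these inputs the helper gives 420 seconds, which matches the 7-minute look-back suggested above.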