mstats with Splunk_TA_Nix

rdhdr
Explorer

Hello experts, 

I have a dashboard in simple xml that shows single number charts which reflect, by host and application, whether OS tools are running on a particular set of servers. The hosts and applications are maintained in a lookup file and the metrics are contained in index em_metrics which is populated by Splunk_TA_Nix. The number charts represent how many servers associated with an application are currently NOT running that particular tool. Clicking on that number will drill down to a table which lists all of the servers and their status vis-a-vis that tool.

The lookup file format is:
host,application
host01,app01
host02,app01
host03,app02

A sample from the em_metrics index (populated by Splunk_TA_Nix):
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (host01) BY host, COMMAND span=15m 
"_time",host,COMMAND,count
"2025-09-12T11:45:00.000-0400",host01,"(sd-pam)",32
"2025-09-12T11:45:00.000-0400",host01,NetworkManager,16
"2025-09-12T11:45:00.000-0400",host01,tool01,224
"2025-09-12T11:45:00.000-0400",host01,tool01,32
"2025-09-12T11:45:00.000-0400",host01,"[acpi_thermal_pm]",16
"2025-09-12T11:45:00.000-0400",host01,"[audit_prune_tre]",16
"2025-09-12T11:45:00.000-0400",host01,"[blkcg_punt_bio]",16
"2025-09-12T11:45:00.000-0400",host01,"[bnx2i_thread/0]",16
"2025-09-12T11:45:00.000-0400",host01,"[bnx2i_thread/10]",16

Here is a Splunk query that works, but it only considers hosts that appear in the em_metrics index. What I need instead is this: if an application has 8 servers associated with it in the lookup file, but only 6 of them have metrics in the em_metrics index, and only 5 of those servers are running the tool, then the single number chart should show 3, reflecting the 1 server not running the tool plus the 2 servers missing from the index (because they are not reporting any metrics at all).

base query:
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (*) BY host, COMMAND span=1h 

Query for single:
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| stats max(expected_process_found) AS expected_process_found first(process) AS Process BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| search Process="tool01 not running"
| stats count

Query for table:
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| stats max(expected_process_found) AS expected_process_found first(process) AS Process BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| table host Process expected_process_found application

I have two changes I need to make.
1. How can I modify this to also include hosts in the lookup file that do not have metrics in the em_metrics index?
2. How can I convert the single chart into a timechart so that it shows the current count along with a trendline underneath the number?

livehybrid
SplunkTrust

Hi @rdhdr 

To include the hosts not found in the em_metrics index you can append an inputlookup before the stats line, such as:

| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (host01) BY host, COMMAND span=15m 
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| append [| inputlookup hostlist | where application="app01" | eval expected_process_found=0]
| stats max(expected_process_found) AS expected_process_found first(process) AS Process values(application) AS application BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| table host Process expected_process_found application
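
For the single value panel, the same idea can end in a count rather than a table. The following is a minimal sketch, not a tested dashboard search: it assumes the lookup definition is named hostlist as in the question, that the appended lookup rows should be limited to the application being displayed, and it introduces hosts_not_running as an illustrative field name.

```spl
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* BY host, COMMAND span=1h
| lookup hostlist host OUTPUT application
| where application="app01"
| eval expected_process_found=if(match(COMMAND,"(?i)tool01"),1,0)
``` add one row per expected host so that hosts with no metrics still appear ```
| append [| inputlookup hostlist | where application="app01" | eval expected_process_found=0]
| stats max(expected_process_found) AS expected_process_found BY host
| where expected_process_found=0
| stats count AS hosts_not_running
```

With the example from the question (8 hosts in the lookup, 6 reporting metrics, 5 running the tool), the stats BY host step yields 8 rows, 3 of which keep expected_process_found=0, so the final count would be 3.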

To build the timeline, I think you will need to change your two stats commands to timechart commands. You already have the mstats with a _time span, so this should work quite easily.
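
One possible shape for the trend version is sketched below. It is an untested sketch under a few assumptions: the expected host count is taken from the hostlist lookup through a subsearch rather than from the metrics, the field names running, expected and not_running are illustrative, and a host counts as running in a bucket if tool01 reported at least one data point in that hour.

```spl
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* BY host, COMMAND span=1h
| lookup hostlist host OUTPUT application
| where application="app01" AND match(COMMAND,"(?i)tool01")
``` distinct hosts running the tool in each hourly bucket ```
| timechart span=1h dc(host) AS running
| fillnull value=0 running
``` total hosts expected for the application, read from the lookup ```
| eval expected=[| inputlookup hostlist | where application="app01" | stats count | return $count]
| eval not_running=expected-running
| fields _time not_running
```

Because the expected total comes from the lookup, hosts that report no metrics at all are counted as not running in every bucket. A single value visualization fed by a timechart result like this shows the latest value with a trend underneath; note that the most recent bucket may be only partially filled.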
