
mstats with Splunk_TA_Nix

rdhdr
Explorer

Hello experts, 

I have a dashboard in Simple XML that shows single-value charts which reflect, by host and application, whether OS tools are running on a particular set of servers. The hosts and applications are maintained in a lookup file, and the metrics are contained in the index em_metrics, which is populated by Splunk_TA_Nix. Each single-value chart represents how many servers associated with an application are currently NOT running that particular tool. Clicking on the number drills down to a table which lists all of the servers and their status vis-a-vis that tool.

The lookup file format is:
host,application
host01,app01
host02,app01
host03,app02

A sample from the em_metrics index (populated by Splunk_TA_Nix):
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (host01) BY host, COMMAND span=15m 
"_time",host,COMMAND,count
"2025-09-12T11:45:00.000-0400",host01,"(sd-pam)",32
"2025-09-12T11:45:00.000-0400",host01,NetworkManager,16
"2025-09-12T11:45:00.000-0400",host01,tool01,224
"2025-09-12T11:45:00.000-0400",host01,tool01,32
"2025-09-12T11:45:00.000-0400",host01,"[acpi_thermal_pm]",16
"2025-09-12T11:45:00.000-0400",host01,"[audit_prune_tre]",16
"2025-09-12T11:45:00.000-0400",host01,"[blkcg_punt_bio]",16
"2025-09-12T11:45:00.000-0400",host01,"[bnx2i_thread/0]",16
"2025-09-12T11:45:00.000-0400",host01,"[bnx2i_thread/10]",16

Here is a Splunk query which works, but which only considers the hosts that show up in the em_metrics index. What I need instead is this: if an application has 8 servers associated with it in the lookup file, but only 6 of them have metrics in the em_metrics index, and only 5 of those servers are running the tool, then the single-value chart should show a 3, reflecting the 1 server not running the tool as well as the 2 servers missing from the index (because they are not reporting any metrics at all).

base query:
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (*) BY host, COMMAND span=1h 

Query for single:
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| stats max(expected_process_found) AS expected_process_found first(process) AS Process BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| search Process="tool01 not running"
| stats count

Query for table:
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| stats max(expected_process_found) AS expected_process_found first(process) AS Process BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| table host Process expected_process_found application

I have two changes I need to make.
1. How can I modify this to also include hosts in the lookup file that do not have metrics in the em_metrics index?
2. How can I convert the single chart into a timechart so that it can show the current count, but also a trendline underneath the number?

livehybrid
SplunkTrust

Hi @rdhdr 

To include the hosts not found in the em_metrics index you can append an inputlookup before the stats line, such as:

| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (*) BY host, COMMAND span=15m 
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| append [| inputlookup hostlist | where application="app01" | eval expected_process_found=0]
| stats max(expected_process_found) AS expected_process_found first(process) AS Process BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| table host Process expected_process_found application
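
For the single-value panel, the same idea can be sketched as one full search. This is only a sketch under a few assumptions: the hostlist lookup has the host,application layout shown in the question, the panel's time range covers just the period you consider "current", and the appended rows are limited to app01 so that hosts from other applications are not counted.

```
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* BY host, COMMAND span=1h
| lookup hostlist host
| where application="app01"
| eval expected_process_found=if(match(COMMAND,"(?i)tool01"),1,0)
| append [| inputlookup hostlist | where application="app01" | eval expected_process_found=0]
| stats max(expected_process_found) AS expected_process_found BY host
| where expected_process_found=0
| stats count
```

Every app01 host from the lookup gets a row with expected_process_found=0, so a host that reports no metrics at all still reaches the stats line and is counted as not running. A host that does report tool01 ends up with a max of 1 and is filtered out.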

To get the trendline, I think you will need to change your two stats commands to timechart commands. Your mstats already has a _time span, so this should work quite easily.
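
One possible shape for the timechart version is sketched below; treat it as a starting point rather than a tested solution. Instead of appending the lookup rows, it counts the hosts that are running the tool in each bucket and subtracts that from the number of app01 hosts in the lookup, so hosts with no metrics are included in every bucket. The field names running, expected and not_running are made up for this example, and the sketch assumes the lookup has at least one row for app01 (otherwise the subsearch returns nothing and the eval fails).

```
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* BY host, COMMAND span=1h
| lookup hostlist host
| where application="app01" AND match(COMMAND,"(?i)tool01")
| timechart span=1h dc(host) AS running
| fillnull value=0 running
| eval expected=[| inputlookup hostlist | where application="app01" | stats dc(host) AS total | return $total]
| eval not_running=expected-running
| fields _time not_running
```

A single value visualization fed by a timechart shows the latest bucket as the number and can draw the earlier buckets as the sparkline and trend. Note that the latest bucket may be only partially filled, depending on the time range you choose.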
