Splunk Search

mstats with Splunk_TA_Nix

rdhdr
Explorer

Hello experts, 

I have a dashboard in simple xml that shows single number charts which reflect, by host and application, whether OS tools are running on a particular set of servers. The hosts and applications are maintained in a lookup file and the metrics are contained in index em_metrics which is populated by Splunk_TA_Nix. The number charts represent how many servers associated with an application are currently NOT running that particular tool. Clicking on that number will drill down to a table which lists all of the servers and their status vis-a-vis that tool.

The lookup file format is:
host,application
host01,app01
host02,app01
host03,app02

A sample from the em_metrics index (populated by Splunk_TA_Nix):
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (host01) BY host, COMMAND span=15m 
"_time",host,COMMAND,count
"2025-09-12T11:45:00.000-0400",host01,"(sd-pam)",32
"2025-09-12T11:45:00.000-0400",host01,NetworkManager,16
"2025-09-12T11:45:00.000-0400",host01,tool01,224
"2025-09-12T11:45:00.000-0400",host01,tool01,32
"2025-09-12T11:45:00.000-0400",host01,"[acpi_thermal_pm]",16
"2025-09-12T11:45:00.000-0400",host01,"[audit_prune_tre]",16
"2025-09-12T11:45:00.000-0400",host01,"[blkcg_punt_bio]",16
"2025-09-12T11:45:00.000-0400",host01,"[bnx2i_thread/0]",16
"2025-09-12T11:45:00.000-0400",host01,"[bnx2i_thread/10]",16

Here is a Splunk query that works, but it only considers the hosts that show up in the em_metrics index. What I want is this: if an application has 8 servers associated with it in the lookup file, but only 6 of them have metrics in the em_metrics index, and only 5 of those servers are running the tool, then the single number chart should show a 3, reflecting the 1 server not running the tool as well as the 2 servers missing from the index (because they are not reporting any metrics at all).

base query:
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (*) BY host, COMMAND span=1h 

Query for single:
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| stats max(expected_process_found) AS expected_process_found first(process) AS Process BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| search Process="tool01 not running"
| stats count
Query for table:
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| stats max(expected_process_found) AS expected_process_found first(process) AS Process BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| table host Process expected_process_found application

I have two changes I need to make.
1. How can I modify this to also include hosts in the lookup file that do not have metrics in the em_metrics index?
2. How can I convert the single chart into a timechart so that it can show the current count, but also a trendline underneath the number?

livehybrid
SplunkTrust

Hi @rdhdr 

To include the hosts not found in the em_metrics index, you can append an inputlookup (filtered to the same application, with expected_process_found set to 0) before the stats line, and carry application through the stats so it is still there for the table, such as:

| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (*) BY host, COMMAND span=15m 
| lookup hostlist host
| where application="app01"
| rename COMMAND AS process
| eval expected_process_found=if(match(process,"(?i)tool01"),1,0)
| append [| inputlookup hostlist | where application="app01" | eval expected_process_found=0]
| stats max(expected_process_found) AS expected_process_found first(process) AS Process values(application) AS application BY host
| eval Process=if(expected_process_found=1, "tool01 Found Running", "tool01 not running")
| table host Process expected_process_found application

To make the trendline, I think you will need to change your two stats commands to timechart commands. You already have the mstats with a _time span, so this should work quite easily.
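
One possible way to sketch the trend version, rather than a one-for-one swap of stats for timechart: count the app01 hosts that are running tool01 in each time bucket, then subtract that from the number of app01 hosts in the lookup, so that hosts with no metrics at all are counted as not running. The field names found, running, expected and not_running are made up for illustration, the query is written as a standalone search rather than a post-process of the base query, and it has not been run against the data in this thread.

```spl
| mstats count WHERE index="em_metrics" AND metric_name=ps_metric* host IN (*) BY host, COMMAND span=1h
| lookup hostlist host
| where application="app01"
| eval found=if(match(COMMAND,"(?i)tool01"),1,0)
| stats max(found) AS found BY _time, host
| where found=1
| timechart span=1h dc(host) AS running
| fillnull value=0 running
| eval expected=[| inputlookup hostlist | where application="app01" | stats dc(host) AS n | return $n]
| eval not_running=expected-running
| fields _time not_running
```

A single value panel fed by a timechart result like this can show the latest value with a sparkline under it. One caveat: if no app01 host is running tool01 anywhere in the selected time range, nothing reaches the timechart command and the search may return no rows instead of the full count, so that case would need separate handling.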
