Splunk Search

How to optimize an automated correlation search for process monitoring?

jrnortonjr
New Member

I am using a scheduled correlation search to deliver application performance metrics for running processes on remote hosts. As a starting point, it is enough to check whether each host has reported the process within a given amount of time, either through the winhostmon:process stanza on Windows or through ps on *nix boxes.

I have been tasked with creating a template to monitor processes in our enterprise. We want one search that we can use with any process and any OS, and we want to generate an event when a process breaks and again when it returns to "good". My attempt follows. Please recommend a better way to accomplish this, or tell me how I can solve the existing problems with my solution.

I have managed to cobble together a query that gives results good enough for us to use. However, there is one problem: one of the columns can be ridiculously large.

earliest=-10m ((sourcetype=WinHostMon source=Process) OR sourcetype=ps)
| rex field=_raw "CommandLine=(?<CmdLine>.+[^\n])"
| eval full_command=coalesce(CmdLine,app), Process=coalesce(Name,process_name)
| search [| inputlookup Customer_test_processes.csv]
| stats latest(_time) AS last_reported by host Process full_command source
| eval age = now() - last_reported
| search age > 300
| sort - age
| convert ctime(last_reported)
| eval timestamp=now()
| convert ctime(timestamp)
| eval Category="Process"
| eval Severity="INFO"
| eval Value="5"
| eval message="The Splunk Process heartbeat for ".host." ".Process." is Unreachable. Last reported at: ".last_reported
| eval event_title=host."-".Process."-Heartbeat"
| table timestamp host Category Process Severity full_command Value source message event_title

The lookup table used by the inputlookup subsearch looks like this:

host,Process,full_command
XXXYYYZZZ,mdrv,custom_commandline
XXXYYYZZZ,SiteScope,-service
XXXYYYZZZ,splunkd,*
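
For reference, a subsearch built on inputlookup is normally expanded into a boolean filter before the outer search runs. The following is a rough sketch of what the three rows above would turn into; the exact field order and spacing Splunk generates may differ, so treat it as an illustration rather than literal output.

```spl
| search ( ( host="XXXYYYZZZ" AND Process="mdrv" AND full_command="custom_commandline" )
    OR ( host="XXXYYYZZZ" AND Process="SiteScope" AND full_command="-service" )
    OR ( host="XXXYYYZZZ" AND Process="splunkd" AND full_command="*" ) )
```

Under this reading, the full_command column in the CSV only filters events. The full_command value that reaches the final table is still the one extracted from each event.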


The problem stems from the resulting "full_command" field. I would prefer that the output for this field not be so big. In some cases, for example *nix boxes running Apache Tomcat or JBoss servers, the process to monitor would be java, but the full command would be an ungodly amount of classpath configuration and Java flags.

I would like to know if there is a way to get the "full_command" as it appears in the lookup table to marry up with the corresponding search result that is returned from the query as opposed to the entire command.


valiquet
Contributor

Add this at the end of your search:
| lookup mylookup host AS host, Process AS Process OUTPUT full_command AS short_command
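
As a sketch of how that might fit into the original query, the lookup can go just before the final table so that the short value replaces full_command in the output. This assumes that mylookup is a lookup definition pointing at Customer_test_processes.csv (the name is a placeholder) and that each host and Process pair appears only once in the CSV; if a pair appears more than once, short_command will likely come back as a multivalue field.

```spl
| lookup mylookup host, Process OUTPUT full_command AS short_command
| table timestamp host Category Process Severity short_command Value source message event_title
```

These two lines would replace the last line of the original search. Note that rows with a wildcard such as * in the CSV would return the literal * as short_command, and that CSV lookups match case-sensitively unless configured otherwise.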
