I am using a correlation search to schedule the delivery of application performance metrics for processes running on remote hosts. A good enough place to start is whether the host has reported the process within a given amount of time, either through the WinHostMon process stanza on Windows or through ps on *nix boxes.
I have been tasked with creating a template to monitor processes in our enterprise. We want one search that we can use with any process and any OS, and we want to generate an event when the process is broken and another when it returns to "good". My attempt follows. Please recommend a better way to accomplish this, or tell me how I can solve the existing problems with my solution.
I have managed to cobble together a query whose results are good enough for us to use. However, there is one problem: one column can be ridiculously large.
earliest=-10m ((sourcetype=WinHostMon source=Process) OR sourcetype=ps)
| rex field=_raw "CommandLine=(?<CmdLine>.+[^\n])"
| eval full_command=coalesce(CmdLine,app), Process=coalesce(Name,process_name)
| search [| inputlookup Customer_test_processes.csv]
| stats latest(_time) AS last_reported by host Process full_command source
| eval age = now() - last_reported
| search age > 300
| sort - age
| convert ctime(last_reported)
| eval timestamp=now()
| convert ctime(timestamp)
| eval Category="Process"
| eval Severity="INFO"
| eval Value="5"
| eval message="The Splunk Process heartbeat for ".host." ".Process." is Unreachable. Last reported at: ".last_reported
| eval event_title=host."-".Process."-Heartbeat"
| table timestamp host Category Process Severity full_command Value source message event_title
The problem stems from the resulting "full_command" field, which I would prefer not to be so big. On *nix boxes running Apache Tomcat or JBoss, for example, the process to monitor is java, but the full command is an ungodly amount of classpath configuration and Java flags.
I would like to know whether there is a way to show "full_command" as it appears in the lookup table, matched up with the corresponding search result, instead of the entire command line.
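Failing that, truncating the field before the final table might be an acceptable stopgap. A minimal sketch, assuming a 100-character cap is enough to identify the process (the cap and the short_command field name are placeholders of mine, not something from our lookup), would replace the last step of the query with:

```
... | eval short_command=if(len(full_command)>100, substr(full_command,1,100)."...", full_command)
| table timestamp host Category Process Severity short_command Value source message event_title
```

This only shortens what is displayed. It does not bring back the value from the lookup table, which is what I am really after.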