As shown in the picture below, these are events with timestamps. I want to get an alert when the "Kafka" or "Jps" service is down. How do I write a search query that alerts me when any of the services below is down?
If you have only a few words to search for, you can insert them in your main search:
<your_search> (Kafka OR Jps OR <other_words>)
If these words are in a field, you can use the field to get more performant searches (e.g. if they are in a field called "service"):
<your_search> (service=Kafka OR service=Jps OR service=<other_words>)
If instead you have many words to check, you can put them in a lookup (e.g. "patterns.csv", with a single column called "pattern"). If they aren't in a field:
<your_search> [ | inputlookup patterns.csv | rename pattern AS query | fields query ]
If they are in a field (the lookup column must then have the same name as the event field, so rename it if it differs):
<your_search> [ | inputlookup patterns.csv | fields pattern ]
Finally, I suggest following the Splunk Fundamentals 1 course (https://www.splunk.com/en_us/training/free-courses/splunk-fundamentals-1.html), which is free, and the Search Tutorial (https://docs.splunk.com/Documentation/Splunk/8.1.0/SearchTutorial/WelcometotheSearchTutorial); both help you understand how Splunk works.
@gcusello there are no fields to segregate on.
Actually, the question is this: on a Linux machine, the jps command shows the running services (e.g. Kafka, Jps, etc.) with their PIDs, and if any of these services is stopped we need to get an alert.
Here is my idea: if the keyword "Kafka" is not seen in the events for more than 1 minute, I want to get an alert, so that the application team knows the Kafka service is not running on that particular host.
Here is the Query:
index="main" host="linux machine" source="logs" "Kafka"
Please suggest a query that raises the alert when the word "Kafka" is not seen for more than 1 minute.
As I said, if you only have to check for the presence of a few words (e.g. only "Kafka"), you can use the search you shared:
index="main" host="linux machine" source="logs" "Kafka" earliest=-1m@m latest=now
and save it as an alert scheduled to run every minute (cron * * * * *) that triggers when the number of results is zero,
then configure it to send an email or take another action.
Just one consideration: a period of one minute may be too frequent and inefficient, because you and your team probably don't have a reaction time of one minute, so you could use a slightly larger time frame (e.g. 5 minutes).
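As a rough sketch of that five-minute variant, assuming the same index, host and source as in the search above: the search window becomes five minutes, and the alert would be scheduled with the cron expression */5 * * * * and set to trigger when the number of results is zero.

```
index="main" host="linux machine" source="logs" "Kafka" earliest=-5m@m latest=now
```

Keeping the search window equal to the schedule interval means every event is checked once, with no gaps between runs.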
@gcusello Thanks for helping!
If you look at the query below and the latest log, you can see that "Kafka" is running. In the same case, if the "Kafka" service is not present in that list, I want to be notified with an alert.
What you are suggesting is not working for me; you can see the sample alert I got.
Please help: if a keyword from the list below is not in any event, I should get a mail. Can you provide a solution for this?
I suspect the problem is that, based on your data, each event contains 1...n of the processes that are present, so event 1 may contain
and you are trying to see if Jps OR Kafka OR Bootstrap OR other OR other are not present for a minute.
I suggest you do the following:
<search> earliest=-6m@m latest=-1m@m | rex field=_raw max_match=0 "(?<pid>\d+)\s(?<process_name>[\w\s]*)" | mvexpand process_name | stats count by process_name | append [ | inputlookup append=t your_list_of_required_processes | eval count = 0 ] | stats max(count) as count by process_name | where count=0
This is extracting your PID/Process name from the _raw event (you will have to confirm that this creates a Splunk multivalue field with all the process names in it, per event).
Then it expands all the process names to their separate events.
It then counts the occurrences of each process.
To then work out which ones are missing you just append your lookup file to the end of the results with a count of 0 and then look for the largest count value per process and if it's 0, you have your list.
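To see the append-and-max logic in isolation, here is a minimal sketch that runs without any real data. It uses makeresults to fake both sides: the "seen" processes (Jps, Kafka) and the "required" list (Jps, Kafka, QuorumPeerMain). Those names and the inline list are assumptions for illustration only; in the real search the first part comes from your events and the second from your lookup.

```
| makeresults
| eval process_name=split("Jps,Kafka", ",")
| mvexpand process_name
| stats count by process_name
| append
    [| makeresults
     | eval process_name=split("Jps,Kafka,QuorumPeerMain", ",")
     | mvexpand process_name
     | eval count=0
     | fields process_name count]
| stats max(count) as count by process_name
| where count=0
```

Only the processes that appear in the required list but never in the events should survive the final where clause, which is what the alert needs.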
OK, so the key is the rex statement and the regex that extracts the PID and process names from the _raw message.
Assuming there will be no space in the process name, then this should work
| rex field=_raw max_match=0 "(?<pid>\d+)\s(?<process_name>[\w]*)\s?"
which is saying extract
pid = a sequence of digits
followed by a single whitespace character
process_name = a sequence of word characters, followed by an optional whitespace character
If you do a
| table _raw pid process_name
after the rex statement without the rest of the query, you can see what the rex is extracting. If that shows the pid and process names as multi values in the field then it's good and the rest of the query will work.
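If you want to try the rex without touching your index, a sketch like the following may help. The _raw value here is a made-up sample (PID, space, process name, repeated), since the real event layout is only visible in your screenshot, so adjust it to match what your events actually look like.

```
| makeresults
| eval _raw="2345 Kafka 3456 Jps 4567 QuorumPeerMain"
| rex field=_raw max_match=0 "(?<pid>\d+)\s(?<process_name>[\w]*)\s?"
| table _raw pid process_name
```

With that sample, pid and process_name should each come back as a multivalue field with three values, which is the shape the mvexpand step expects.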
Sorry, but I don't understand your need, so I'll try to summarize it:
Is that correct?
If this is your need, the search below is correct:
index="main" host="linux machine" source="logs" (Kafka OR Jps) earliest=-1m@m latest=now
and you have to configure your alert to trigger when there are no results.
What's your problem:
@gcusello: The application team creates it manually.
The services.txt file contains the logs below; it looks like this. These logs are ingested into Splunk. Because the logs look like this, I am not able to correlate the file.