
Splunk Alert for Response Time: How to add the time condition value?

asplunk789
Loves-to-Learn Everything

Hi Team, I need a Splunk search query to create an alert. My requirement: if the service response time is > 3 seconds and that continues for more than 10 minutes, only then do I need to raise an alert.

In the search query I tried using where for the response time, but I couldn't work out how to write the time condition. My search query is below. Please help me add the time condition to the query itself.

index=kpidata | eval ProcessingTime=ProcessingTimeMS/1000
| where ProcessingTime > 3
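A minimal sketch of one way to put the 10-minute condition into the query itself, assuming the kpidata index and ProcessingTimeMS field from the question, at least one event per minute, and a Splunk version that supports inline triple-backtick comments (drop them on older versions). It buckets the last 11 complete minutes into one-minute rows and returns a row only when even the fastest response in every one of those minutes was above 3 seconds:

```spl
index=kpidata earliest=-11m@m latest=@m ```last 11 complete minutes```
| eval ProcessingTime=ProcessingTimeMS/1000
| bin _time span=1m ```one bucket per minute```
| stats min(ProcessingTime) AS min_secs BY _time ```fastest response seen in each minute```
| stats count AS minutes_seen count(eval(min_secs>3)) AS minutes_slow
| where minutes_seen=11 AND minutes_slow=minutes_seen ```every minute was slow```
```

Saved as an alert scheduled every minute with the trigger condition "Number of Results is greater than 0", this fires once the slow period covers the full window. A minute with no events breaks the run and suppresses the alert; swapping min(ProcessingTime) for avg(ProcessingTime) tests the per-minute average instead.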


johnhuang
Motivator

Here is one way of implementing it (assuming that ProcessingTimeMS is the field that represents response time).

This example generates and analyzes 30 minutes of sample data.

  • Summarizes the last 10 minutes with the min, avg, and max response time and copies those values onto every row for later filtering.
  • You can adjust the filter based on how strict you want it to be, for example:
    • Every one-minute average within the last 10 minutes > 3
      • last_10m_min_secs>3
    • Average response time over the last 10 minutes > 3: last_10m_avg_secs>3

The search is expanded to 30 minutes to provide some historical data for reference; the extra context makes a standalone alert more useful. Of course, it also adds some search overhead that you need to consider.


| makeresults | eval start=now()-1800, end=now()
| eval range=MVAPPEND(start, end)
| mvexpand range | eval _time=range | fields _time
| makecontinuous _time span=30s | sort 0 -_time
| eval ProcessingTimeMS=(random() % 20000) + 1000

| eval time_secs=ProcessingTimeMS/1000 
| bucket _time span=1m | stats avg(time_secs) AS avg_secs max(time_secs) AS max_secs min(time_secs) AS min_secs BY _time
| eval stats_name=strftime(_time, "%Y-%m-%d %H:%M:%S")
| appendpipe [| where _time>relative_time(now(), "-10m@m")      | rename avg_secs AS time_secs
| stats avg(time_secs) AS avg_secs max(time_secs) AS max_secs min(time_secs) AS min_secs 
| foreach * [| eval <<FIELD>>=ROUND(<<FIELD>>, 2) | eval last_10m_<<FIELD>>=<<FIELD>>]
| eval stats_name="_Last 10 Min Stats_"]
| eventstats max(last_10m_min_secs) AS last_10m_min_secs max(last_10m_max_secs) AS last_10m_max_secs max(last_10m_avg_secs) AS last_10m_avg_secs
| sort 0 - stats_name
| foreach *secs [| eval <<FIELD>>=ROUND(<<FIELD>>, 2)]
| eval stats_type="RESPONSE TIME"
| table stats_name stats_type avg_secs min_secs max_secs last_10m*

| where last_10m_avg_secs>3
| fields - last_*
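To run the same idea against real events rather than makeresults data, a trimmed sketch (assuming the kpidata index from the question) could look like this. It applies the filter to individual response times rather than per-minute averages (the lowest single response in the last 10 minutes must exceed 3 seconds) and compares unrounded values, so a 3.4-second response still counts as slow. The recent_min field is a hypothetical helper name introduced here:

```spl
index=kpidata earliest=-30m@m latest=@m ```real events instead of makeresults```
| eval time_secs=ProcessingTimeMS/1000
| bin _time span=1m
| stats avg(time_secs) AS avg_secs min(time_secs) AS min_secs max(time_secs) AS max_secs BY _time
| eval recent_min=if(_time>=relative_time(now(), "-10m@m"), min_secs, null()) ```keep only the last 10 minutes```
| eventstats min(recent_min) AS last_10m_min_secs ```copied onto every row```
| where last_10m_min_secs>3
| fields - recent_min
```

As in the original, the 30-minute history rows come back together when the condition holds. Minutes with no events produce no rows, so gaps in traffic do not block the alert in this version.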


johnhuang
Motivator

Is ProcessingTimeMS the field that holds the response time?
How often are events received, e.g. every 30 seconds, every minute, etc.?
Do you need to calculate this for multiple hosts/processes within the same search?

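If the answer to the last question is yes, a sketch of the per-minute approach split by series could look like this. It uses Splunk's default host field as the series key; if the service name lives in some other field of your data, substitute that field (its name would be an assumption about your events):

```spl
index=kpidata earliest=-11m@m latest=@m
| eval ProcessingTime=ProcessingTimeMS/1000
| bin _time span=1m
| stats min(ProcessingTime) AS min_secs BY host _time ```one row per host per minute```
| stats count AS minutes_seen count(eval(min_secs>3)) AS minutes_slow BY host
| where minutes_seen=11 AND minutes_slow=minutes_seen ```hosts slow in all 11 minutes```
```

With the alert's trigger set to "For each result", each slow host raises its own alert.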

asplunk789
Loves-to-Learn Everything

Thanks @gcusello for the quick response.

  • is there a message or an eventcode that says if the service is up or down? --> It's just a message with a high response time.
  • is the time condition an additional condition to the one for response time or is it a different one? --> It's a different one: the response time is > 3 seconds and stays that way for more than 10 minutes. If it is above 3 seconds for more than 10 minutes, my service has an issue and keeps responding slowly, so I need an alert for this condition.


gcusello
SplunkTrust

Hi @asplunk789,

It still isn't clear: do you have an event containing a string such as "system up"?

If so, you want to know whether you've been receiving events with "system up" for more than 10 minutes, is that correct?

If that's what you need, you could try something like this:

<your_search> "system up"
| where response_time>3
| stats 
   earliest(_time) AS earliest
   latest(_time) AS latest
| eval duration=latest-earliest
| where duration>600

Ciao.

Giuseppe
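Since there is no "system up" string here, a sketch that measures continuity directly on the response-time events could use streamstats to give every unbroken run of slow events its own id. It assumes the kpidata index and ProcessingTimeMS field from the question; the 60-minute look-back and the 2-minute "still ongoing" tolerance are hypothetical values to tune to your polling interval:

```spl
index=kpidata earliest=-60m@m latest=now
| eval ProcessingTime=ProcessingTimeMS/1000
| sort 0 _time ```oldest first so the running sum follows event order```
| eval is_fast=if(ProcessingTime>3, 0, 1)
| streamstats sum(is_fast) AS streak_id ```grows only on a fast event, so a run of slow events shares one id```
| where is_fast=0
| stats earliest(_time) AS streak_start latest(_time) AS streak_end count AS slow_events BY streak_id
| eval streak_secs=streak_end-streak_start
| where streak_secs>600 AND streak_end>=relative_time(now(), "-2m") ```long enough and still ongoing```
```

Because streak_id only increments on a fast event, a single response at or under 3 seconds resets the run. streak_secs is measured between the first and last slow event seen, so it slightly understates the wall-clock length of the slow period.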


asplunk789
Loves-to-Learn Everything

No, I don't have any string like "system up".

My only requirement: if the response time for any service stays above 3 seconds continuously for 10 minutes, then I need to raise an alert for this issue.


gcusello
SplunkTrust

Hi @asplunk789,

To answer your question I need two pieces of information:

  • is there a message or an eventcode that says if the service is up or down?
  • is the time condition an additional condition to the one for response time or is it a different one?

Ciao.

Giuseppe
