Splunk Search

Splunk Alert for Response Time: How to add the time condition value?

asplunk789
Loves-to-Learn Everything

Hi Team, I want a Splunk search query for creating an alert. My requirement: the service response time is > 3 seconds, and only if that continues for more than 10 minutes (> 10 min) do I need to raise an alert.

In the search query I tried using where for the response time, but I am not able to write the time condition. Below is my search query. Please help me add the time condition to the query itself.

index=kpidata | eval ProcessingTime=ProcessingTimeMS/1000
| where ProcessingTime > 3


johnhuang
Motivator

Here is one way of implementing it (assuming that ProcessingTimeMS is the field that represents response time).

This example generates and analyzes the last 30 minutes of sample data.

  • Summarizes the last 10 minutes with min, avg, and max response time and copies those values onto every row for later filtering.
  • You can adjust the filter based on how strict you want it, for example:
    • Every 1-minute average within the last 10 minutes > 3: last_10m_min_secs>3
    • Average response time over the last 10 minutes > 3: last_10m_avg_secs>3

The reason to expand the search to, say, 30 minutes is to provide some historical data for reference. Having additional data makes a standalone alert more meaningful and useful. Of course, this creates some overhead, which you need to consider.

| makeresults | eval start=now()-1800, end=now()
| eval range=MVAPPEND(start, end)
| mvexpand range | eval _time=range | fields _time
| makecontinuous _time span=30s | sort 0 -_time
| eval ProcessingTimeMS=(random() % 20000) + 1000

| eval time_secs=ProcessingTimeMS/1000 
| bucket _time span=1m | stats avg(time_secs) AS avg_secs max(time_secs) AS max_secs min(time_secs) AS min_secs BY _time
| eval stats_name=strftime(_time, "%Y-%m-%d %H:%M:%S")
| appendpipe [
    | where _time>relative_time(now(), "-10m@m")
    | rename avg_secs AS time_secs
    | stats avg(time_secs) AS avg_secs max(time_secs) AS max_secs min(time_secs) AS min_secs
    | foreach * [| eval <<FIELD>>=ROUND(<<FIELD>>, 2) | eval last_10m_<<FIELD>>=<<FIELD>>]
    | eval stats_name="_Last 10 Min Stats_"]
| eventstats max(last_10m_min_secs) AS last_10m_min_secs max(last_10m_max_secs) AS last_10m_max_secs max(last_10m_avg_secs) AS last_10m_avg_secs
| sort 0 - stats_name
| foreach *secs [| eval <<FIELD>>=ROUND(<<FIELD>>, 2)]
| eval stats_type="RESPONSE TIME"
| table stats_name stats_type avg_secs min_secs max_secs last_10m*

| where last_10m_avg_secs>3
| fields - last_*
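
For an alert against real data rather than generated samples, a more compact variant might look like the sketch below. It assumes the index=kpidata and ProcessingTimeMS names from the question, that at least one event arrives every minute, and that ten full one-minute buckets count as "continuous for 10 minutes"; minutes_with_data is just an illustrative field name. Adjust the thresholds to your data.

```spl
index=kpidata earliest=-10m@m latest=@m
| eval ProcessingTime=ProcessingTimeMS/1000
| bin _time span=1m
| stats min(ProcessingTime) AS min_secs BY _time
| stats min(min_secs) AS last_10m_min_secs dc(_time) AS minutes_with_data
| where last_10m_min_secs>3 AND minutes_with_data>=10
```

Scheduled every minute with the trigger condition "number of results > 0", this should return a row only when every event in each of the last ten minutes was slower than 3 seconds. Swapping min(ProcessingTime) for avg(ProcessingTime) gives a less strict check.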

 

 


johnhuang
Motivator

The field used to calculate response time is ProcessingTimeMS, correct?
How often is it polled, or how often are events received, e.g. every 30 seconds, every minute, etc.?
Do you need to calculate this for multiple series of hosts/processes within the same search?


gcusello
SplunkTrust

Hi @asplunk789,

it still isn't clear: do you have an event containing a string such as "system up"?

If yes, you want to know whether you have been receiving events with "system up" for more than 10 minutes, is that correct?

If this is what you need, you could try something like this:

<your_search> "system up"
| stats 
   earliest(_time) AS earliest
   latest(_time) AS latest
   BY response_time
| eval duration=latest-earliest
| where response_time>3 AND duration>600
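
If there is no such marker string, the same duration idea can be applied to the response time itself by counting consecutive slow minutes. This is only a rough sketch: it assumes the index=kpidata and ProcessingTimeMS names from the original question, at least one event per minute, and the illustrative field names slow and streak_minutes.

```spl
index=kpidata earliest=-30m@m latest=@m
| eval ProcessingTime=ProcessingTimeMS/1000
| bin _time span=1m
| stats min(ProcessingTime) AS min_secs BY _time
| eval slow=if(min_secs>3, 1, 0)
| streamstats reset_on_change=true count AS streak_minutes BY slow
| tail 1
| eval duration=streak_minutes*60
| where slow=1 AND duration>600
```

The streamstats counter restarts whenever slow flips between 0 and 1, so after tail 1 the remaining row describes the streak that is still running in the most recent minute. A minute with no events at all does not break the streak here, which may or may not be what you want.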

Ciao.

Giuseppe

 


asplunk789
Loves-to-Learn Everything

No, I don't have any string like "system up".

My only requirement is this: the response time for any service is greater than 3 seconds, and that continues for 10 minutes. Only then do I need to raise an alert for this issue.


gcusello
SplunkTrust

Hi @asplunk789,

to try to answer your question I need two pieces of information:

  • is there a message or an eventcode that says if the service is up or down?
  • is the time condition an additional condition to the one for response time or is it a different one?

Ciao.

Giuseppe


asplunk789
Loves-to-Learn Everything

Thanks @gcusello for the quick response.

  • is there a message or an eventcode that says if the service is up or down? --> It's just a message showing a higher response time.
  • is the time condition an additional condition to the one for response time or is it a different one? --> It's a different one: the response time is > 3 seconds and that continues for more than 10 minutes. (If the response time stays > 3 seconds for more than 10 minutes, my service has some issue and keeps responding slowly, so I need an alert for this condition.)