Splunk Search

Alert search query to monitor for fluctuation in system performance times between today and yesterday

alexandermunce
Communicator

I am working with a set of transactions data where in each transaction could relate to any of our numerous systems/products.

The data could be simplified for the sake of this question to contain fields similar to the below:

Event 1:

TransationID: 1000
Product: System1
ResponseTime: 1234

Event 2:

TransactionID: 1001
Product: System2
ResponseTime: 4321

Event 3:

TransactionID: 1002
Product: System3
ResponseTime: 5678

Etc.

So I am interested in setting up an alert which works in the following manner:

The search query will return the average response time for each product for yesterday, and compare that to the average response time for each product today and for any products where there has been an X% increase - send an email alert - If possible, the alert will advise the product in question which has experienced the fluctuation in response time.

I feel like perhaps a custom trigger condition may be suitable for this instance?

The search query below will successfully provide me with the average response times per product for yesterday:

source="transactionLog" type="report" earliest=-1d@d latest=@d | stats avg(responsetime) by product

I saw a slightly similar question on Splunk Answers in which the person was attempting to use a subsearch in order to generate his alert query - however my situation differs as I am breaking down by products, so I am not sure that I can use a Number of Results trigger condition.

UPDATE:

I have put together the below query - I feel this is suitable as it actually compares the responsetime - I am having trouble understanding the logic from some of the comments in the answers below.

source="transactionLog" type="report" earliest=-1w@w latest=@w | 
stats avg(responsetime) as Previous_Week_Response | 
appendcols [search source="transactionLog" type="report" earliest=@w latest=now | 
stats avg(responsetime) as Current_Week_Response] | 
eval Week=if(_time>=relative_time(now(),"@w"),"Current_Week","Previous_Week") |
chart avg(responsetime) over product by Week |
where Current_Week_Response>=2*Previous_Week_Response
0 Karma
1 Solution

somesoni2
Revered Legend

I think you can achieve this alert for each product via single search only. Try something like this (example of checking 50% increase over yesterday)

source="transactionLog" type="report" earliest=-1d@d latest=now | eval Day=if(_time>=relative_time(now(),"@d"),"Today","Yesterday") 
| chart avg(responsetime) over product by Day | where Today>1.5*Yesterday

Above will list all the products where avg response time of today is 1.5 (50% ) time more than yesterday. You can then setup alert as , Alert options, "Once per search" (one alert for all products triggering the alert condition) OR "once per result" for one email for each product triggering the alert condition.

View solution in original post

somesoni2
Revered Legend

I think you can achieve this alert for each product via single search only. Try something like this (example of checking 50% increase over yesterday)

source="transactionLog" type="report" earliest=-1d@d latest=now | eval Day=if(_time>=relative_time(now(),"@d"),"Today","Yesterday") 
| chart avg(responsetime) over product by Day | where Today>1.5*Yesterday

Above will list all the products where avg response time of today is 1.5 (50% ) time more than yesterday. You can then setup alert as , Alert options, "Once per search" (one alert for all products triggering the alert condition) OR "once per result" for one email for each product triggering the alert condition.

alexandermunce
Communicator

@somesoni2

I am thinking something like the below is more suitable as it actually is comparing the responsetime

source="transactionLog" type="report" earliest=-1w@w latest=@w | 
stats avg(responsetime) as Previous_Week_Response | 
appendcols [search source="transactionLog" type="report" earliest=@w latest=now | stats avg(responsetime) as Current_Week_Response] | 
eval Week=if(_time>=relative_time(now(),"@w"),"Current_Week","Previous_Week") |
chart avg(responsetime) over product by Week |
where Current_Week_Response>=2*Previous_Week_Response
0 Karma

alexandermunce
Communicator

@somesoni2 - what are your thoughts on my query above?

0 Karma

somesoni2
Revered Legend

This will work too, but as I told earlier, you're running two searches here so if the performance is not an issue for you, above will just do the task,.

One problem that you may run with your query is that you're relying on the order of the product (appendcols merges column row by row), so if there is a missing product this week OR new product this week, the results will not be 100% correct. Whereas other query will take care of it.

alexandermunce
Communicator

@somesoni2

I have a concern about the query you have provided above.

It appears that the where clause is actually comparing Today being 1.5x Yesterday - however, Today and Yesterday have no relation to the responsetime - they are only relating to the _time field???

Is this correct?

0 Karma

somesoni2
Revered Legend

After the chart command 'Today' is the column which stores the avg response time for today and 'Yesterday' is the avg response time cor yesterday. Using this you're getting avg response time for both period in single search hence its the optimum way. The same can be achieved by running two searches and joining them using appendcols or similar.

0 Karma

alexandermunce
Communicator

@somesoni2

I dont understand how "Today" or "Yesterday" can contain response time data as the eval for those only refer's to the _time field:

eval Day=if(_time>=relative_time(now(),"@d"),"Today","Yesterday")
0 Karma

alexandermunce
Communicator

@somesoni2

Should it not say something like:

eval Day=responsetime if(_time>=relative_time(now(),"@d"),"Today","Yesterday")
0 Karma

alexandermunce
Communicator

If you understand what I am saying -

If I inspect the "Day" or "Week" field - I can see it only has 2 values, Today and Yesterday

So how can we use the where clause comparing them as though they are numbers?

where Today>1.5*Yesterday
0 Karma

alexandermunce
Communicator

@somesoni2

I am just curious on how I can set the alert trigger conditions for this alert to function correctly.

The problem is, my search is returning over 75,000 events, and these are not restricted to the events which fit the where clause - it is simply returning all events in that time period.

However, the chart command is correctly displaying the products that fit the where clause.

Do I need to set a CUSTOM trigger as below;

Trigger condition: Custom
Custom condition: Current_Week>=2*Previous_Week

0 Karma

somesoni2
Revered Legend

Use the whole search as your alert search

source="transactionLog" type="report" earliest=-1d@d latest=now | eval Day=if(_time>=relative_time(now(),"@d"),"Today","Yesterday") 
 | chart avg(responsetime) over product by Day | where Today>1.5*Yesterday

In alert condition, use "if number of events greater than 0", as above search will only return results when there are products satisfying where clause.

alexandermunce
Communicator

@somesoni2

If I wanted to edit this to compare this week's average vs last weeks - could I so by changing to the following:

earliest=-1w@w latest=now | eval Week=if(_time>=relative_time(now(),"@w"),"This week","Last week")
0 Karma

somesoni2
Revered Legend

Absolutely. I would avoid using spaces in the labels though so that comparison later is easier. Elae you need to keep them within single quotes in where clause.

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...