Splunk Search

Alert search query to monitor for fluctuation in system performance times between today and yesterday

Communicator

I am working with a set of transactions data where in each transaction could relate to any of our numerous systems/products.

The data could be simplified for the sake of this question to contain fields similar to the below:

Event 1:

TransationID: 1000
Product: System1
ResponseTime: 1234

Event 2:

TransactionID: 1001
Product: System2
ResponseTime: 4321

Event 3:

TransactionID: 1002
Product: System3
ResponseTime: 5678

Etc.

So I am interested in setting up an alert which works in the following manner:

The search query will return the average response time for each product for yesterday, and compare that to the average response time for each product today and for any products where there has been an X% increase - send an email alert - If possible, the alert will advise the product in question which has experienced the fluctuation in response time.

I feel like perhaps a custom trigger condition may be suitable for this instance?

The search query below will successfully provide me with the average response times per product for yesterday:

source="transactionLog" type="report" earliest=-1d@d latest=@d | stats avg(responsetime) by product

I saw a slightly similar question on Splunk Answers in which the person was attempting to use a subsearch in order to generate his alert query - however my situation differs as I am breaking down by products, so I am not sure that I can use a Number of Results trigger condition.

UPDATE:

I have put together the below query - I feel this is suitable as it actually compares the responsetime - I am having trouble understanding the logic from some of the comments in the answers below.

source="transactionLog" type="report" earliest=-1w@w latest=@w | 
stats avg(responsetime) as Previous_Week_Response | 
appendcols [search source="transactionLog" type="report" earliest=@w latest=now | 
stats avg(responsetime) as Current_Week_Response] | 
eval Week=if(_time>=relative_time(now(),"@w"),"Current_Week","Previous_Week") |
chart avg(responsetime) over product by Week |
where Current_Week_Response>=2*Previous_Week_Response
0 Karma
1 Solution

Revered Legend

I think you can achieve this alert for each product via single search only. Try something like this (example of checking 50% increase over yesterday)

source="transactionLog" type="report" earliest=-1d@d latest=now | eval Day=if(_time>=relative_time(now(),"@d"),"Today","Yesterday") 
| chart avg(responsetime) over product by Day | where Today>1.5*Yesterday

Above will list all the products where avg response time of today is 1.5 (50% ) time more than yesterday. You can then setup alert as , Alert options, "Once per search" (one alert for all products triggering the alert condition) OR "once per result" for one email for each product triggering the alert condition.

View solution in original post

Revered Legend

I think you can achieve this alert for each product via single search only. Try something like this (example of checking 50% increase over yesterday)

source="transactionLog" type="report" earliest=-1d@d latest=now | eval Day=if(_time>=relative_time(now(),"@d"),"Today","Yesterday") 
| chart avg(responsetime) over product by Day | where Today>1.5*Yesterday

Above will list all the products where avg response time of today is 1.5 (50% ) time more than yesterday. You can then setup alert as , Alert options, "Once per search" (one alert for all products triggering the alert condition) OR "once per result" for one email for each product triggering the alert condition.

View solution in original post

Communicator

@somesoni2

I am thinking something like the below is more suitable as it actually is comparing the responsetime

source="transactionLog" type="report" earliest=-1w@w latest=@w | 
stats avg(responsetime) as Previous_Week_Response | 
appendcols [search source="transactionLog" type="report" earliest=@w latest=now | stats avg(responsetime) as Current_Week_Response] | 
eval Week=if(_time>=relative_time(now(),"@w"),"Current_Week","Previous_Week") |
chart avg(responsetime) over product by Week |
where Current_Week_Response>=2*Previous_Week_Response
0 Karma

Communicator

@somesoni2 - what are your thoughts on my query above?

0 Karma

Revered Legend

This will work too, but as I told earlier, you're running two searches here so if the performance is not an issue for you, above will just do the task,.

One problem that you may run with your query is that you're relying on the order of the product (appendcols merges column row by row), so if there is a missing product this week OR new product this week, the results will not be 100% correct. Whereas other query will take care of it.

Communicator

@somesoni2

I have a concern about the query you have provided above.

It appears that the where clause is actually comparing Today being 1.5x Yesterday - however, Today and Yesterday have no relation to the responsetime - they are only relating to the _time field???

Is this correct?

0 Karma

Revered Legend

After the chart command 'Today' is the column which stores the avg response time for today and 'Yesterday' is the avg response time cor yesterday. Using this you're getting avg response time for both period in single search hence its the optimum way. The same can be achieved by running two searches and joining them using appendcols or similar.

0 Karma

Communicator

@somesoni2

I dont understand how "Today" or "Yesterday" can contain response time data as the eval for those only refer's to the _time field:

eval Day=if(_time>=relative_time(now(),"@d"),"Today","Yesterday")
0 Karma

Communicator

@somesoni2

Should it not say something like:

eval Day=responsetime if(_time>=relative_time(now(),"@d"),"Today","Yesterday")
0 Karma

Communicator

If you understand what I am saying -

If I inspect the "Day" or "Week" field - I can see it only has 2 values, Today and Yesterday

So how can we use the where clause comparing them as though they are numbers?

where Today>1.5*Yesterday
0 Karma

Communicator

@somesoni2

I am just curious on how I can set the alert trigger conditions for this alert to function correctly.

The problem is, my search is returning over 75,000 events, and these are not restricted to the events which fit the where clause - it is simply returning all events in that time period.

However, the chart command is correctly displaying the products that fit the where clause.

Do I need to set a CUSTOM trigger as below;

Trigger condition: Custom
Custom condition: Current_Week>=2*Previous_Week

0 Karma

Revered Legend

Use the whole search as your alert search

source="transactionLog" type="report" earliest=-1d@d latest=now | eval Day=if(_time>=relative_time(now(),"@d"),"Today","Yesterday") 
 | chart avg(responsetime) over product by Day | where Today>1.5*Yesterday

In alert condition, use "if number of events greater than 0", as above search will only return results when there are products satisfying where clause.

Communicator

@somesoni2

If I wanted to edit this to compare this week's average vs last weeks - could I so by changing to the following:

earliest=-1w@w latest=now | eval Week=if(_time>=relative_time(now(),"@w"),"This week","Last week")
0 Karma

Revered Legend

Absolutely. I would avoid using spaces in the labels though so that comparison later is easier. Elae you need to keep them within single quotes in where clause.

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!