I want to create an alert which will find requests which have not received a response.
I have created the following search, which finds the requests that have no responses. The request has an id called transaction_id and the response carries the same identifier in a field called original_transaction_id.
eval tid = coalesce(transaction_id, original_transaction_id) | transaction tid maxpause=2m startswith="request" endswith="response" keepevicted=true | where evicted=1
This works as a search. As an alert, however, I'm currently getting false alerts, because the response may take up to 10 seconds to arrive after the request has been posted. So the alert needs to take that into account and ignore any requests that are less than 10 seconds old.
Can someone please help me add the time restriction to the "request" event to prevent the false alerts?
@stewartevans, for your use case you would be better off using stats instead of transaction for correlation. Refer to the "About event grouping and correlation" documentation. This should give you more control over your correlation, since you can apply a search filter as needed after the stats command. Also, stats would perform better than transaction over longer durations or larger numbers of events.
Following is a run-anywhere example with some sample data, based on the points described in the question. It generates some events with both a request and a response, i.e. for transaction ids 1234 and 4567, but no response for 8910. For testing purposes it also adds a request 1112, which has its time set to 9 seconds before the current time. While testing you can change the 9 in | eval _time=_time-9 to 10, 11 etc. to test out the less than, equal to and greater than 10 second scenarios.
1. The searchmatch() evaluation function has been used to create a type field for the corresponding events, as this field does not seem to be present in your data, judging by the transaction query you have run.
2. The stats command groups the type values together for each transaction id, i.e.
| stats list(type) as type min(_time) as earliestTime max(_time) as latestTime by tid
3. Although you don't need this for the present use case, given the order of the data mvindex(type,0) should give the startswith condition, i.e. request, and mvindex(type,1) should give the endswith condition, i.e. response.
4. In your case you are interested in events where a request exists but there is no response, i.e.
| search type="request" AND type!="response"
5. Further, such events will have the same earliestTime and latestTime, as there is no response. now()-earliestTime has been used to get the duration between the time the request was received and the current time, so that we can keep only requests received more than 10 seconds ago.
| makeresults
| eval data="Time=\"2018/07/31 01:00:00\" some request transaction_id=1234;Time=\"2018/07/31 01:00:10\" some response original_transaction_id=1234;Time=\"2018/07/31 01:10:00\" some request transaction_id=4567;Time=\"2018/07/31 01:10:20\" some response original_transaction_id=4567;Time=\"2018/07/31 02:00:00\" some request transaction_id=8910;"
| makemv data delim=";"
| mvexpand data
| rename data as _raw
| KV
| eval _time=strptime(Time,"%Y/%m/%d %H:%M:%S")
| fields - Time
| append [| makeresults | eval _time=_time-9 | eval _raw="some request transaction_id=1112" | KV]
| eval tid=coalesce(transaction_id, original_transaction_id)
| eval type=case(searchmatch("request"),"request",searchmatch("response"),"response",true(),"N/A")
| stats list(type) as type min(_time) as earliestTime max(_time) as latestTime by tid
| search type="request" AND type!="response"
| eval requestTimeDuration=now()-earliestTime
| where requestTimeDuration>10
| fieldformat earliestTime=strftime(earliestTime,"%Y/%m/%d %H:%M:%S")
| fieldformat latestTime=strftime(latestTime,"%Y/%m/%d %H:%M:%S")
PS: fieldformat has been applied to convert epoch time to human readable string time format for earliest and latest time.
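If it helps to check the logic outside Splunk, the grouping and filtering described above can be modelled in a few lines of plain Python. This is only a sketch of the intended behaviour, not how Splunk implements stats: the function name unanswered_requests, the event dictionaries and the now value are all made up for illustration.

```python
from collections import defaultdict

def unanswered_requests(events, now, min_age=10):
    """Return {tid: age_in_seconds} for requests without a response older than min_age."""
    groups = defaultdict(lambda: {"types": [], "earliest": None})
    for ev in events:
        # Same idea as: eval tid=coalesce(transaction_id, original_transaction_id)
        tid = ev.get("transaction_id") or ev.get("original_transaction_id")
        if tid is None:
            continue
        group = groups[tid]
        # Same idea as: stats list(type) as type min(_time) as earliestTime by tid
        group["types"].append(ev["type"])
        t = ev["time"]
        group["earliest"] = t if group["earliest"] is None else min(group["earliest"], t)

    result = {}
    for tid, group in groups.items():
        # Intended meaning of the search filter: a request exists, no response exists
        if "request" in group["types"] and "response" not in group["types"]:
            age = now - group["earliest"]  # now()-earliestTime
            if age > min_age:              # where requestTimeDuration>10
                result[tid] = age
    return result

# Hypothetical sample data, loosely mirroring the run-anywhere example
now = 1_000_000
events = [
    {"time": now - 3600, "type": "request", "transaction_id": "1234"},
    {"time": now - 3590, "type": "response", "original_transaction_id": "1234"},
    {"time": now - 600, "type": "request", "transaction_id": "8910"},
    {"time": now - 9, "type": "request", "transaction_id": "1112"},
]
print(unanswered_requests(events, now))  # {'8910': 600}
```

Request 1112 is skipped because it is only 9 seconds old, and 1234 is skipped because it has a response. One thing worth checking against your own data: type is a multivalue field after stats list(type), and NOT type="response" is the stricter way to say "no value equals response" if type!="response" turns out to let completed pairs through.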
If you need events where a response was received but it took longer than 10 seconds, your search filter can be changed to the following:
| search type="request" AND type="response" | eval duration=latestTime-earliestTime | where duration>10
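The slow-response variant can be modelled the same way. Again this is just an illustrative sketch in plain Python with made-up names (slow_transactions, the event dictionaries); the times are seconds relative to the first sample event, matching the offsets in the sample data above.

```python
from collections import defaultdict

def slow_transactions(events, threshold=10):
    """Return {tid: duration} where both request and response exist and duration > threshold."""
    types = defaultdict(set)
    times = defaultdict(list)
    for ev in events:
        tid = ev.get("transaction_id") or ev.get("original_transaction_id")
        types[tid].add(ev["type"])
        times[tid].append(ev["time"])
    return {
        tid: max(times[tid]) - min(times[tid])        # latestTime-earliestTime
        for tid in types
        if {"request", "response"} <= types[tid]      # both types present
        and max(times[tid]) - min(times[tid]) > threshold
    }

events = [
    {"time": 0, "type": "request", "transaction_id": "1234"},
    {"time": 10, "type": "response", "original_transaction_id": "1234"},
    {"time": 600, "type": "request", "transaction_id": "4567"},
    {"time": 620, "type": "response", "original_transaction_id": "4567"},
    {"time": 3600, "type": "request", "transaction_id": "8910"},
]
print(slow_transactions(events))  # {'4567': 20}
```

Transaction 1234 took exactly 10 seconds, so it is not reported with a strict greater-than comparison; use >= in the where clause if that boundary case should count as slow.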
This is brilliant @niketnilay I've just tested out your recommendation and it appears to work perfectly. I also learnt a lot about stats and sample data generation at the same time. Thank you very much!
@stewartevans I am glad you found it useful. I have learnt these things by hanging out here on Splunk Answers 🙂 Now you need to "pass on" the knowledge.
The documentation I linked is by Nick Mealy, and his flowchart for deciding on event grouping and correlation is epic 🙂 More commands have been introduced since, like union in Splunk 6.6 and the previously undocumented gem multisearch. They should eventually be documented in the above flowchart as well.