I have the following search that I'd like to schedule to run after changes. The goal is to detect a change in success rate by URL between a time range I define as the "pre-change window" and a time range I define as the "post-change window". The search below is written for a change that happened 15 minutes ago, with the pre-change window to compare against running from 46 to 31 minutes ago as a control.
index=mywebservice sourcetype=access_custom earliest=-15m@m latest=now
| fillnull value=filled http_status_code
| eval post="1"
| eval success_post = case(match(http_status_code, "^(filled|2.*|3.*)$"), "1", true(), "0")
| stats sum(post) as "txns_post_window", sum(success_post) as "pct_success_post_window" by url
| eval pct_success_post_window = round((pct_success_post_window / txns_post_window) * 100,2)
| appendcols
[ search index=mywebservice sourcetype=access_custom earliest=-46m@m latest=-31m@m
| fillnull value=filled http_status_code
| eval pre="1"
| eval success_pre = case(match(http_status_code, "^(filled|2.*|3.*)$"), "1", true(), "0")
| stats sum(pre) as "txns_pre_window", sum(success_pre) as "pct_success_pre_window" by url
| eval pct_success_pre_window = round((pct_success_pre_window / txns_pre_window) * 100,2)]
| where (pct_success_post_window < pct_success_pre_window) AND (pct_success_post_window < 90) AND txns_post_window > 10 AND txns_pre_window > 10
| sort pct_success_post_window
The problem I'm encountering is that I'm getting incorrect data from the search inside the appendcols subsearch. When I run the two searches individually, they work as expected, but with appendcols the "pre-window" data usually comes back at ~100% success rate for every URL, and the transaction counts can be wildly off too. I suspect that data I did not intend to include is somehow making its way into this subsearch.
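For reference, this is roughly the standalone version of the pre-window search that returns sensible per-URL numbers when run on its own (same time range and status logic as the subsearch above, just outside appendcols; the field names here are only for illustration):
index=mywebservice sourcetype=access_custom earliest=-46m@m latest=-31m@m
| fillnull value=filled http_status_code
| eval success_pre = if(match(http_status_code, "^(filled|2.*|3.*)$"), 1, 0)
| stats count as txns_pre_window, sum(success_pre) as successes_pre by url
| eval pct_success_pre_window = round((successes_pre / txns_pre_window) * 100, 2)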
How can I modify this search to produce correct data?
I'm still not sure why the first search doesn't work, but I produced a solution:
index=mywebservice sourcetype=access_custom earliest=-4h@m latest=now
| fillnull value=filled http_status_code
| eval secondsAgo=now() - _time
| eval post=case(secondsAgo <= (60 * 30), 1)
| eval prev=case(secondsAgo >= (60 * 120), 1)
| eval post_success = case(match(http_status_code, "^(filled|2.*|3.*)$") AND post==1, 1)
| eval prev_success = case(match(http_status_code, "^(filled|2.*|3.*)$") AND prev==1, 1)
| stats sum(prev) as "txns_pre_window", sum(prev_success) as "pct_success_pre_window", sum(post) as "txns_post_window", sum(post_success) as "pct_success_post_window" by url
| eval pct_success_post_window = round((pct_success_post_window / txns_post_window) * 100,2)
| eval pct_success_pre_window = round((pct_success_pre_window / txns_pre_window) * 100,2)
| where (pct_success_post_window < (pct_success_pre_window-20)) AND (pct_success_post_window < 75) AND txns_post_window > 10 AND txns_pre_window > 10
| sort pct_success_post_window
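A small variant of the two window flags, in case you want the cutoffs snapped to minute boundaries for a scheduled run; the 30-minute post window and the 2-hour-old pre window are the same assumptions as above, and the rest of the search is unchanged:
| eval post = if(_time >= relative_time(now(), "-30m@m"), 1, null())
| eval prev = if(_time <= relative_time(now(), "-120m@m"), 1, null())
Either way, because both windows are computed from the same event set in a single stats pass, there are no two separate result sets that have to line up row for row, which may be part of why the appendcols version misbehaved (appendcols pairs rows by position rather than by url).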