Hi All, I am getting the logs from this query, but I need a query to get the deviation of the error count between two time periods:
index="prod_k8s_onprem_dii--prod1" "k8s.namespace.name"="abc-secure-dig-servi-prod1" "k8s.container.name"="abc-cdf-cust-profile"
For this I need to consider the volume of logs as well.
Depending on the deviation percentage I will decide whether to promote or stop the deployment.
Something like this. Obviously you will need to adjust it depending on your events and required time periods:
index="prod_k8s_onprem_dii--prod1" "k8s.namespace.name"="abc-secure-dig-servi-prod1" "k8s.container.name"="abc-cdf-cust-profile" (earliest=first_earliest latest=first_latest) OR (earliest=second_earliest latest=second_latest)
| eval period=if(_time>=first_earliest AND _time<first_latest,"First","Second")
| stats count(eval(status="Error")) as error_count count as event_count by period
We are doing an API/app deployment in one region at 12:00 PM EST.
The 1st time frame would be 11:30 AM to 12:00 PM EST (I need to get the error count).
The 2nd time frame would be 12:00 PM to 12:30 PM EST (I need to get the error count).
We need to consider the generated log volume as well,
and get the deviation of the error count across these two time frames.
Let's say, if it exceeds a certain threshold, I will either proceed with or stop the deployment.
So the output of the query is a deviation threshold or percentage.
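The promote-or-stop decision described above can be sketched in Python, assuming the error and event counts for each period have already been pulled from the Splunk search. The function names and the 20% threshold here are hypothetical placeholders, not anything defined in this thread:

```python
def deviation_pct(before_errors, before_total, after_errors, after_total):
    # Normalize the error count by log volume, so a busier post-deployment
    # window does not look worse just because it produced more events.
    before_rate = before_errors / before_total if before_total else 0.0
    after_rate = after_errors / after_total if after_total else 0.0
    if before_rate == 0.0:
        # No baseline errors: any post-deployment error is an increase.
        return float("inf") if after_rate > 0 else 0.0
    return (after_rate - before_rate) / before_rate * 100.0

def decide(dev, threshold=20.0):
    # Hypothetical policy: stop the rollout if the error rate grew by
    # more than `threshold` percent, otherwise promote.
    return "stop" if dev > threshold else "promote"

# 10 errors in 1,000 events before vs. 30 in 2,000 after:
# the rate went from 1% to 1.5%, i.e. +50%, so this would stop the rollout.
print(decide(deviation_pct(10, 1000, 30, 2000)))
```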
That's better. So, you are looking at adjacent, equal time intervals. In this case, a time bucket is perhaps the simplest approach. Let me first give you a hard-coded example.
index="prod_k8s_onprem_dii--prod1" "k8s.namespace.name"="abc-secure-dig-servi-prod1" "k8s.container.name"="abc-cdf-cust-profile" (earliest="07/25/2024:11:30:00" latest="07/25/2024:12:30:00")
| addinfo
| bin _time span=30m@m
| stats count(eval(status="Error")) as error_count by _time
| eventstats stdev(error_count)
Is this something you are looking for?
Why am I not getting any results? I see there are events.
index="prod_k8s_onprm_dig-k8-prod1" "k8s.namespace.name"="apl-secure-dig-svc-prod1" "k8s.container.name"="abc-def-cust-prof" NOT k8s.container.name=istio-proxy NOT log.level IN(DEBUG,INFO) (error OR exception)(earliest="07/25/2024:11:30:00" latest="07/25/2024:12:30:00")
| addinfo
| bin _time span=30m@m
| stats count(eval(log.level="ERROR")) as error_count by _time
| eventstats stdev(error_count)
This is because you have a multisegment field name and eval doesn't like it. Use single quotes to tell eval that log.level is a field name, not some random string.
index="prod_k8s_onprm_dig-k8-prod1" "k8s.namespace.name"="apl-secure-dig-svc-prod1" "k8s.container.name"="abc-def-cust-prof" NOT k8s.container.name=istio-proxy NOT log.level IN(DEBUG,INFO) (error OR exception)(earliest="07/25/2024:11:30:00" latest="07/25/2024:12:30:00")
| addinfo
| bin _time span=30m@m
| stats count(eval('log.level'="ERROR")) as error_count by _time
| eventstats stdev(error_count)
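For reference, `eventstats stdev(error_count)` appends the standard deviation of `error_count` across the 30-minute buckets to every row; with only two buckets it is simply a measure of how far apart the two counts are. Python's `statistics.stdev` computes the sample standard deviation, which as far as I recall matches Splunk's `stdev` (`stdevp` being the population variant). The bucket counts below are made up for illustration:

```python
from statistics import stdev

# Hypothetical per-bucket error counts for the two 30-minute windows.
error_counts = [12, 48]

# Sample standard deviation: sqrt(((12-30)^2 + (48-30)^2) / (2-1))
spread = stdev(error_counts)
print(round(spread, 2))  # → 25.46
```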
Thanks, I am able to get the error count now. Could you please let me know how to get this value in Python code? If I run the code below I am getting events instead of statistics. How do I get statistics in the code?
import os
import urllib.parse

import requests

# Parenthesize so the pieces concatenate into one string. Without the
# parentheses only the first line is assigned to payload and the search is
# sent with no stats pipeline, which is why raw events come back instead of
# statistics. Note also the single quotes around log.level inside eval,
# per the earlier reply.
payload = ('search index="prod_k8s_onprem_vvvb_nnnn" "k8s.namespace.name"="apl-siii-iiiii" "k8s.container.name"="uuuu-dss-prog" NOT k8s.container.name=istio-proxy NOT log.level IN(DEBUG,INFO) (error OR exception) (earliest="07/25/2024:11:30:00" latest="07/25/2024:12:30:00")\n'
           '| addinfo\n'
           '| bin _time span=5m@m\n'
           '| stats count(eval(\'log.level\'="ERROR")) as error_count by _time\n'
           '| eventstats stdev(error_count)')
print(payload)
payload_escaped = f'search={urllib.parse.quote(payload)}'
headers = {
    'Authorization': f'Bearer {splunk_token}',
    'Content-Type': 'application/x-www-form-urlencoded'
}
url = f'https://{splunk_host}:{splunk_port}/services/search/jobs/export?output_mode=json'
response = requests.post(url, headers=headers, data=payload_escaped, verify=False)
print(f'{response.status_code=}')
txt = response.text
if response.status_code == 200:
    json_txt = f'[\n{txt}]'
    os.makedirs('data', exist_ok=True)
    with open("data/output_deploy.json", "w") as f:
        f.write(json_txt)  # the with block closes the file automatically
else:
    print(txt)
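To pick the statistic out of the response rather than dumping raw lines to a file, the export output can be parsed directly: with `output_mode=json`, `/services/search/jobs/export` streams one JSON object per line, and final result rows carry a `"result"` key. A minimal parsing sketch; the sample payload below is made up to resemble two exported stats rows:

```python
import json

def parse_export(text):
    # The export endpoint returns newline-delimited JSON. Rows of the
    # final result table sit under a "result" key; preview and control
    # messages do not, so they are skipped here.
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if "result" in obj:
            rows.append(obj["result"])
    return rows

# Made-up sample of what two exported stats rows could look like:
sample = (
    '{"preview":false,"result":{"_time":"2024-07-25T11:30:00","error_count":"12"}}\n'
    '{"preview":false,"result":{"_time":"2024-07-25T12:00:00","error_count":"48"}}\n'
)
for row in parse_export(sample):
    print(row["_time"], int(row["error_count"]))
```

Field values arrive as strings, so cast `error_count` to `int` before doing any deviation arithmetic on it.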
You need to tell volunteers what kind of "two time frames" you are concerned about. Two adjacent, equal time intervals? Two equal intervals days apart? Or some random intervals?