I am trying to write a splunk query to show what percentage of traffic is split between my on premise and cloud. My splunk index receives events from my legacy on premise and from my cloud solution. My cloud solution can scale as needed. My cloud solution has a healthcheck which just submits a request every minute to test the service. From the service / event perspective there is no way to identify a real request from a healthcheck request
When counting the percentage of traffic I want to exclude the health check requests that occur in the cloud however these look like normal requests in my data. So each host in the cloud will do a health check every minute.
If i have 10 hosts in the cloud then over 10 minutes the total requests from the cloud need to have 100 substracted from them (10 hosts * 10 minutes) to give me the true count of real traffic the host processed. I then use this to calculate the percentage of traffic that was handled by on premise and the cloud solutions. However i cant just hardcode to subtract 100 because the number of hosts in the cloud can be dynamic and scaled as the demand is needed
I have written the following splunk query which does what i want but doesnt scale well. For a 1 hour window its fine but for a 24 hour window i run into limitations (10,000 records) due to my user role in splunk. Is there a better or simpler way to write this query.
My logic here was get all the events, create a new field for each minute, sort by host and minute then count events for each host and minute. for the events that were from a cloud server, subtract 1 from the count as that host will have a healthcheck, then get the max count from each host and minute and sum them up for the given enviroment (on prem or cloud). For here i just get the total (real) requests and calculate percentages
| eval eventMin=strftime(_time,"%M")
| sort host, eventMin
| streamstats count as hostcount reset_on_change=true by host, eventMin
| eval hostcount=if(like(host,"cloudhost%"), hostcount-1, hostcount)
| stats max(hostcount) as minutecount by eventMin, host, environment
| stats sum(minutecount) as envtotal by environment
| eventstats sum(envtotal) as total
| eval traffic_split=round(envtotal/total*100, 2)
| chart values(traffic_split) by environment
I feel I am probably missing a trick here or i have over complicated it after looking at the problem so long.