hi guys i'm looking for help around license usage.
i'm trying to troubleshoot a license violation we had recently where some of our warnings went unnoticed, because they had no context and were running at the wrong time of day. i want to rewrite our searches now to be more robust / contextual - and the DMC searches either don't cut it for us, or i'm not sure how to effectively read / use them.
honestly, license usage in Splunk's a bit of a mess now (IMHO) - some searches are using rest, some "dmc" and others still using "internal" index with license logs... so i'm a bit lost. there's also a lot of time modifiers and joining of searches in the "stock" reports which seem to be overkill or are just confusing me entirely so i can't get what i need.
that said, i got close with the following search, but the calculated eval fields aren't showing up. earliest is -31d@d and latest is @d:
index=_internal source=*license_usage.log type="RolloverSummary"
| bin _time span=1d
| stats sum(b) AS used max(stacksz) AS quota by _time
| where used > quota
| eval usedGB=round(used/1024/1024/1024,3)
| eval quotaGB=round(quota/1024/1024/1024,3)
| table _time usedGB quotaGB
| eval percentage=round(usedGB / totalGB, 1) * 100
| eval usage = usedGB . " (" . percentage . "%)"
| fields _time usedGB quotaGB usage
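one likely reason the eval fields come back empty: the percentage eval divides by totalGB, which this search never creates (the quota field here is quotaGB), so percentage is null and the usage string built from it is null as well. a minimal corrected sketch, assuming the intent is "percent of quota, to one decimal place" and with the time range moved inline:

```spl
index=_internal source=*license_usage.log type="RolloverSummary" earliest=-31d@d latest=@d
| bin _time span=1d
| stats sum(b) AS used max(stacksz) AS quota by _time
| where used > quota
| eval usedGB=round(used/1024/1024/1024,3)
| eval quotaGB=round(quota/1024/1024/1024,3)
| eval percentage=round(used / quota * 100, 1)
| eval usage = usedGB . " (" . percentage . "%)"
| table _time usedGB quotaGB usage
```

the where clause still keeps only the days that went over quota; drop it to see every day in the range.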
i want to have two flavors for my alerts:
both reports should be pretty similar, so i'd like to meet the following requirements for both:
OK so i've broken this into a few different alerts to get what i need. hopefully as we scale out to using different license pools, this will translate nicely.
1 - a twice hourly warning if at any point in the day, we go above 90% utilisation.
| rest splunk_server_group=dmc_group_license_master /services/licenser/pools
| join type=outer stack_id splunk_server [rest splunk_server_group=dmc_group_license_master /services/licenser/groups | search is_active=1 | eval stack_id=stack_ids | fields splunk_server stack_id is_active]
| search is_active=1
| fields splunk_server, stack_id, used_bytes
| join type=outer stack_id splunk_server [rest splunk_server_group=dmc_group_license_master /services/licenser/stacks | eval stack_id=title | eval stack_quota=quota | fields splunk_server stack_id stack_quota]
| stats sum(used_bytes) as used_bytes max(stack_quota) as stack_quota by splunk_server
| eval usedGB=round(used_bytes/1024/1024/1024,1)
| eval totalGB=round(stack_quota/1024/1024/1024,1)
| eval percentage=round(usedGB / totalGB, 3)*100
| fields splunk_server, stack_id, percentage, usedGB, totalGB
| where percentage > 90
| rename splunk_server AS Instance, percentage AS "License quota used (%)", usedGB AS "License quota used (GB)", totalGB as "Total license quota (GB)"
2 - a once-daily warning when there have been between 1 and 4 license violations in the 30-day window (note the search runs over "all time" because this log is only kept for 30 days anyhow)
index=_internal source=*license_usage.log type="RolloverSummary"
| bin _time span=1d
| convert timeformat="%F" ctime(_time) AS date
| stats sum(b) AS used max(stacksz) AS quota by date, pool, stack
| eval usedGB=round(used/1024/1024/1024,3)
| eval quotaGB=round(quota/1024/1024/1024,3)
| eval usedPct = round(usedGB / quotaGB * 100, 1)
| where usedPct > 60
| eval violation_id=1
| eval usage = usedGB . " (" . usedPct . "%)"
| streamstats global=f sum(violation_id) AS violations
| fields date stack pool usedGB quotaGB usage violations
| rename usedGB AS "used", quotaGB AS "quota"
3 - a final alert (slightly different title and severity) for the 4th (and 5th if you get there) violations. same code as above, but different counts.
as for points 2 and 3... i can add a tail at the end to just grab the last line, so that in my alert system i see the alerts come in one at a time. like:
...
| fields date stack pool usedGB quotaGB usage violations
| tail 1
| rename usedGB AS "used", quotaGB AS "quota"
...
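for the "different counts" part, a rough sketch of how the end of alert 2 could look. this assumes "between 1 and 4" means 1 to 3 violations for the warning, leaving the 4th and later for alert 3; the where line is an addition for illustration and is not part of the searches above:

```spl
...
| streamstats global=f sum(violation_id) AS violations
| fields date stack pool usedGB quotaGB usage violations
| tail 1
| where violations >= 1 AND violations <= 3
| rename usedGB AS "used", quotaGB AS "quota"
```

for alert 3 the same search would swap that where line for one that keeps violations >= 4. putting the where after the tail means the alert only fires when the most recent violation falls in that range.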