hi guys i'm looking for help around license usage.
i'm trying to troubleshoot a license violation we had recently where some of our warnings went unnoticed, because they had no context and were running at the wrong time of day. i want to rewrite our searches now to be more robust / contextual - and the DMC searches either don't cut it for us, or i'm not sure how to effectively read / use them.
honestly, license usage in Splunk's a bit of a mess now (IMHO) - some searches are using rest, some "dmc" and others still using "internal" index with license logs... so i'm a bit lost. there's also a lot of time modifiers and joining of searches in the "stock" reports which seem to be overkill or are just confusing me entirely so i can't get what i need.
that said, i got close with the following search, but the calculated eval fields aren't showing up. earliest is -31d@d and latest is
index=_internal source=*license_usage.log type="RolloverSummary" | bin _time span=1d | stats sum(b) AS used max(stacksz) AS quota by _time | where used > quota | eval usedGB=round(used/1024/1024/1024,3) | eval quotaGB=round(quota/1024/1024/1024,3) | table _time usedGB quotaGB | eval percentage=round(usedGB / totalGB, 1) * 100 | eval usage = usedGB . " (" . percentage . "%)" | fields _time usedGB quotaGB usage
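a likely culprit for the empty fields: the percentage eval divides by totalGB, but nothing earlier in the pipeline creates a field with that name (the quota field is quotaGB). in eval, arithmetic on a missing field returns null, and concatenating a null with "." returns null too, so both percentage and usage come back blank. separately, round(x, 1) * 100 rounds the ratio to one decimal place first, so the percentage can only land on multiples of 10. below is a sketch of the same search with those two things changed, assuming quotaGB is what was meant; everything else is as in the search above.

```spl
index=_internal source=*license_usage.log type="RolloverSummary"
| bin _time span=1d
| stats sum(b) AS used max(stacksz) AS quota by _time
| where used > quota
| eval usedGB=round(used/1024/1024/1024,3)
| eval quotaGB=round(quota/1024/1024/1024,3)
| eval percentage=round(usedGB / quotaGB * 100, 1)
| eval usage = usedGB . " (" . percentage . "%)"
| table _time usedGB quotaGB usage
```

the table / fields pair from the original is collapsed into one table at the end, since percentage has to exist before usage can be built from it.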
i want to have two flavors for my alerts:
both reports should be pretty similar, so i'd like to meet the following requirements for both:
OK so i've broken this into a few different alerts to get what i need. hopefully as we scale out to using different license pools, this will translate nicely.
1 - a twice hourly warning if at any point in the day, we go above 90% utilisation.
| rest splunk_server_group=dmc_group_license_master /services/licenser/pools | join type=outer stack_id splunk_server [rest splunk_server_group=dmc_group_license_master /services/licenser/groups | search is_active=1 | eval stack_id=stack_ids | fields splunk_server stack_id is_active] | search is_active=1 | fields splunk_server, stack_id, used_bytes | join type=outer stack_id splunk_server [rest splunk_server_group=dmc_group_license_master /services/licenser/stacks | eval stack_id=title | eval stack_quota=quota | fields splunk_server stack_id stack_quota] | stats sum(used_bytes) as used_bytes max(stack_quota) as stack_quota by splunk_server | eval usedGB=round(used_bytes/1024/1024/1024,1) | eval totalGB=round(stack_quota/1024/1024/1024,1) | eval percentage=round(usedGB / totalGB, 3)*100 | fields splunk_server, stack_id, percentage, usedGB, totalGB | where percentage > 90 | rename splunk_server AS Instance, percentage AS "License quota used (%)", usedGB AS "License quota used (GB)", totalGB as "Total license quota (GB)"
2 - a once daily warning when we have between 1 and 4 license violations in the rolling 30 days (note the search is "all time" because this log is only kept for about 30 days anyhow)
index=_internal source=*license_usage.log type="RolloverSummary" | bin _time span=1d | convert timeformat="%F" ctime(_time) AS date | stats sum(b) AS used max(stacksz) AS quota by date, pool, stack | eval usedGB=round(used/1024/1024/1024,3) | eval quotaGB=round(quota/1024/1024/1024,3) | eval usedPct = round(usedGB / quotaGB, 1) * 100 | where usedPct > 60 | eval violation_id=1 | eval usage = usedGB . " (" . usedPct . "%)" | streamstats global=f sum(violation_id) AS violations | fields date stack pool usedGB quotaGB usage violations | rename usedGB AS "used", quotaGB AS "quota"
3 - a final alert (slightly different title and severity) for the 4th (and 5th if you get there) violations. same code as above, but different counts.
as for points 2 and 3... i can add a tail at the end to just grab the last line, so that in my alert system, i see the alerts come in one at a time. like:
... | fields date stack pool usedGB quotaGB usage violations | tail 1 | rename usedGB AS "used", quotaGB AS "quota" ...
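for the "different counts" part of alert 3, here's a rough sketch of how the full search might look with the tail and a count filter added. the where violations >= 4 line is my assumption about where the cut-off sits (it isn't in the searches above); the rest, including the usedPct > 60 threshold, is copied from the search in point 2.

```spl
index=_internal source=*license_usage.log type="RolloverSummary"
| bin _time span=1d
| convert timeformat="%F" ctime(_time) AS date
| stats sum(b) AS used max(stacksz) AS quota by date, pool, stack
| eval usedGB=round(used/1024/1024/1024,3)
| eval quotaGB=round(quota/1024/1024/1024,3)
| eval usedPct = round(usedGB / quotaGB, 1) * 100
| where usedPct > 60
| eval violation_id=1
| eval usage = usedGB . " (" . usedPct . "%)"
| streamstats global=f sum(violation_id) AS violations
| fields date stack pool usedGB quotaGB usage violations
| tail 1
| where violations >= 4
| rename usedGB AS "used", quotaGB AS "quota"
```

for alert 2 the same search should work with the filter swapped to where violations < 4. every row that survives the usedPct filter already counts as at least 1, so no lower bound is needed. the alert can then trigger whenever the search returns any result.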