Splunk Search

Looking for a License Usage query to generate external scripted alerts with

awurster
Contributor

hi guys, i'm looking for help around license usage.

i'm trying to troubleshoot a recent license violation where some of our warnings went unnoticed, because they had no context and ran at the wrong time of day. i want to rewrite our searches to be more robust and contextual - the stock DMC searches either don't cut it for us, or i'm not sure how to read / use them effectively.

honestly, license usage in Splunk's a bit of a mess now (IMHO) - some searches use rest, some the "dmc" endpoints, and others still the _internal index with license logs... so i'm a bit lost. there are also a lot of time modifiers and joined subsearches in the "stock" reports which seem like overkill, or just confuse me entirely, so i can't get what i need.

that said, i got close with the following search, but the calculated eval fields (percentage and usage) come up empty. the time range is earliest=-31d@d and latest=@d.

index=_internal source=*license_usage.log type="RolloverSummary"
  | bin _time span=1d
  | stats sum(b) AS used max(stacksz) AS quota by _time
  | where used > quota
  | eval usedGB=round(used/1024/1024/1024,3) 
  | eval quotaGB=round(quota/1024/1024/1024,3)
  | table _time usedGB quotaGB
  | eval percentage=round(usedGB / totalGB, 1) * 100
  | eval usage = usedGB . " (" . percentage . "%)"
  | fields _time usedGB quotaGB usage
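
update: i suspect the percentage eval is the culprit - it references totalGB, which never gets defined (the quota field is renamed to quotaGB), so percentage and usage both come out null. a sketch of the tail end with that assumption fixed, and the evals moved ahead of the table:

index=_internal source=*license_usage.log type="RolloverSummary"
  | bin _time span=1d
  | stats sum(b) AS used max(stacksz) AS quota by _time
  | where used > quota
  | eval usedGB=round(used/1024/1024/1024,3)
  | eval quotaGB=round(quota/1024/1024/1024,3)
  | eval percentage=round(usedGB / quotaGB * 100, 1)
  | eval usage=usedGB . " (" . percentage . "%)"
  | table _time usedGB quotaGB percentage usage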

i want to have two flavors for my alerts:

  1. a violation "notice" when between 1 and 3 violations have occurred (we can group them on the receiving end)
  2. a warning that an outage is about to happen / has happened, at between 4 and 5 violations

both reports should be pretty similar, so i'd like to meet the following requirements for both:

  1. run as close to splunk's own calculations as possible (should i run this at 12:30 AM, 1 AM, etc.?) to avoid lag or an incorrect calculation - see the scheduling sketch after this list
  2. include the instance name (i.e. search head name)
  3. include the pool / stack size
  4. include a usage percentage (i.e. used GB / quota GB)
  5. keep a tally over the rolling 30-day period so i can see how many violations we've had
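
for reference, here's roughly how i'd schedule the daily one in savedsearches.conf (just a sketch - the stanza name is made up, and 00:30 is a guess at giving the license master's midnight rollover some breathing room):

[license_violation_daily_check]
enableSched = 1
cron_schedule = 30 0 * * *
dispatch.earliest_time = -31d@d
dispatch.latest_time = @d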
1 Solution

awurster
Contributor

OK, so i've broken this into a few different alerts to get what i need. hopefully this will translate nicely as we scale out to multiple license pools.

1 - a twice-hourly warning if, at any point in the day, we go above 90% utilisation.

| rest splunk_server_group=dmc_group_license_master /services/licenser/pools
| join type=outer stack_id splunk_server [rest splunk_server_group=dmc_group_license_master /services/licenser/groups | search is_active=1 | eval stack_id=stack_ids | fields splunk_server stack_id is_active] 
| search is_active=1 
| fields splunk_server, stack_id, used_bytes 
| join type=outer stack_id splunk_server [rest splunk_server_group=dmc_group_license_master /services/licenser/stacks | eval stack_id=title | eval stack_quota=quota | fields splunk_server stack_id stack_quota] 
| stats sum(used_bytes) as used_bytes max(stack_quota) as stack_quota by splunk_server, stack_id 
| eval usedGB=round(used_bytes/1024/1024/1024,1) 
| eval totalGB=round(stack_quota/1024/1024/1024,1) 
| eval percentage=round(usedGB / totalGB, 3)*100 
| fields splunk_server, stack_id, percentage, usedGB, totalGB 
| where percentage > 90 
| rename splunk_server AS Instance, percentage AS "License quota used (%)", usedGB AS "License quota used (GB)", totalGB as "Total license quota (GB)"
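
scheduling-wise, a sketch of the savedsearches.conf alert settings i'd pair with it (stanza name hypothetical) - run twice an hour and fire whenever the search returns any rows:

[license_pool_90pct_warning]
enableSched = 1
cron_schedule = 0,30 * * * *
counttype = number of events
relation = greater than
quantity = 0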

2 - a once-daily warning for between 1 and 4 license violations in the rolling 30-day window (note the search runs over "all time", since this log only covers the 30-day rolling average anyhow).

index=_internal source=*license_usage.log type="RolloverSummary"
  | bin _time span=1d
  | convert timeformat="%F" ctime(_time) AS date
  | stats sum(b) AS used max(stacksz) AS quota by date, pool, stack
  | eval usedGB=round(used/1024/1024/1024,3) 
  | eval quotaGB=round(quota/1024/1024/1024,3)
  | eval usedPct = round(usedGB / quotaGB * 100, 1)
  | where usedPct > 60
  | eval violation_id=1
  | eval usage = usedGB . " (" . usedPct . "%)"
  | streamstats global=f sum(violation_id) AS violations
  | fields date stack pool usedGB quotaGB usage violations
  | rename usedGB AS "used", quotaGB AS "quota"

3 - a final alert (slightly different title and severity) for the 4th (and 5th, if you get there) violation. same search as above, just a different count filter.
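
concretely, the only change for alert 3 is a filter on the running count near the end - a sketch, using the thresholds from my requirements above:

...
  | streamstats global=f sum(violation_id) AS violations
  | where violations >= 4
  | fields date stack pool usedGB quotaGB usage violations
...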

as for alerts 2 and 3... i can add a tail at the end to grab just the last row (stats output is sorted by date ascending, so tail 1 keeps the most recent day), so that in my alert system the alerts come in one at a time. like:

...
  | fields date stack pool usedGB quotaGB usage violations
  | tail 1
  | rename usedGB AS "used", quotaGB AS "quota"
...
