Hi everybody
Can you help me and suggest me a solution about my context.
I have many servers Jboss, and each server host many instance/JVM (1 to 4).
Example a server jboss_server1 hosts 2 instances jb_instance1 and jb_instance2.
An instance is identified by a search time field JBOSS_INSTANCE.
I would like scheduled a search to monitor the daily volume log for each instance and trigger an alert when the volume exceed a daily quota (300 MB for example)
To do this I write a search :
index="jboss" earliest=-0d@d [search index=_internal source=*license_usage.log type=Usage earliest=-0d@d | stats sum(b) as bytes by h | rename h as host | where (bytes/1024/1024)>300
| fields - bytes] NOT ([|inputlookup high_volume_jboss | fields JBOSS_INSTANCE]) | fields + host,CTX,JBOSS_INSTANCE | eval raw_len=len(_raw) | stats sum(raw_len) As TotalSize by host,CTX,JBOSS_INSTANCE | eval TotalSizeMB=round(TotalSize/1024/1024,2),
quotaMB=250,newLogLevel="ERROR"| where TotalSizeMB > 250
I make search with a join on the license_usage.log to match the host which log more than 300MB (don't forget i want sizing by instance not by host)
I make a second filter to exclude instance that is matched in a input lookup containing pair host/instance matched by a precedent search in the day
I calculate the size of each raw matched
But I think this is not optimized. What do you think ?
I do charge back for Splunk at $WORK. This is the process I've developed to present the information to the respective data owners and to my administration team. I have a lookup table which contains the name of the index with the expected daily ingestion (a.k.a. "what they paid for"). The lookup table looks like this:
"daily_limit",idx
,"_audit"
,"_internal"
,"_introspection"
"0.002",bidata
,"anomaly_detection"
"1.5",app
1,array
"1.4",avaya
Then I have a scheduled report run every morning to generate a report with the daily usage versus what their limits.
index=_internal source=*license_usage.log type="Usage" idx=* pool="Production" | bucket span=1d _time | stats sum(b) as b by _time idx | eval daily_usage_mb=b/1024/1024 | lookup index_limits idx OUTPUT daily_limit as daily_limit_gb | eval daily_limit_mb=daily_limit_gb*1024 | eval warning=daily_limit_mb*.8 | eval Alert=case(daily_usage_mb>=daily_limit_mb, "1 overage", daily_usage_mb>=warning, "2 warning", 1==1, "3 ok") | table idx Alert daily_limit_mb daily_usage_mb | sort Alert, idx | fieldformat daily_usage_mb=tostring(daily_usage_mb,"commas") | fieldformat daily_limit_mb=tostring(daily_limit_mb,"commas")
In order to provide the same data to the data owners, I create a summary search of the license usage that I can present to the data owners.
index=_internal source=*license_usage.log type="Usage" idx=* | eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) | stats sum(b) as volumeB by idx, pool| eval volumeMB=round(b/1024/1024,3)
I also provide an alert to the data owners if their usage the previous day exceeds 80% of their usage.
index=summary search_name="Summarize Daily License Usage by Index" idx=app| eval volumeMB=round(volumeB/1024/1024,3) | lookup index_limits idx OUTPUT daily_limit | timechart span=1d sum(volumeMB) avg(eval(daily_limit*1000)) as daily_limit_MB
And finally I also provide a report to each group for them to monitor their 30 day usage against their expected daily ingestion:
index=summary search_name="Summarize Daily License Usage by Index" idx=app| eval volumeMB=round(volumeB/1024/1024,3) | lookup index_limits idx OUTPUT daily_limit | timechart span=1d sum(volumeMB) avg(eval(daily_limit*1000)) as daily_limit_MB
Hope that helps.
Jim
Awesome post. The only suggestion I have is that the daily search could be collect
ed to a summary index for fast retrieval and later analysis.
how about using the len function and not touching the license usage logs, something like this:
earliest = -0d@d index = jboss JBOSS_INSTANCE=* | eval raw =len(_raw) | eval rawMB = raw/1024/1024 | stats sum(rawMB) by JBOSS_INSTANCE
now set an alert on the threshold you want
Hi adonio
Thanks for your reply. Without license_usage the search calculates volume for all instances of all jboss and it is not i want due to performance and duration too long. With license_usage allow me to have a first filter on host where quota overflow at instance level is possible.
Another idea ?
would you like to calculate the license per host or per instance? i understood that each host contains one or more instances and you would like to check those so if a host has 2 instances and one instance use 500mb per day and the other uses 100mb per day you would like to be alerted on the instance only.
did i understand correctly?
Hi adonio
I would like calculate per instance (per host would be easier with the source license_usage.log), the alert concern only instance which exceed a quota. Yes you undestand correcly
are the logs from each JBOSS instance have a different source value?
Since you're calculating daily volume of data using length of raw data, it's going to be on slower side anyways. You may be able to setup summary index on both data to collect what you're collecting at regular interval. See this http://docs.splunk.com/Documentation/SplunkCloud/6.5.1612/Knowledge/Usesummaryindexing
absolutely, well aware of the techniques and how to use the license usage data, the challenge here as i understand it is that @pmerlin1 wants to have a report on license per instance and there are multiple instances per host.
if there are naming conventions and each instance has different source you can split your license search by s (for source) and alert on that. something like:
earliest=-1d@d latest=@d index=_internal source=*license_usage.log* type=Usage s=<yourJBOSSnamingConvention>
| stats sum(b) AS Bytes by s
| eval MB = Bytes/1024/1024
| table s MB
| sort -MB
| addcoltotals
Hi somesoni,
I will watch this may be a good approach to deal with my problem
@pmerlin1... You should check out DMC or Monitoring Console from Settings menu in Splunk.
Indexing > License Usage should allow you to look at License Pool and split the usage by host, source, sourcetype or whatever you need.
Ideally, if you haven't done so already, you should have separate license pool created for JBOSS servers so that you can monitor, ration and control license volume only for those set of servers. Queries in Monitoring Console should be already optimized and may allow you to accelerate performance further.