Accurate License usage when data is SQUASHED per host

plarsenDST
Explorer

I'm having a hard time getting accurate license usage per host when the data is squashed. Is there a way to ensure an accurate measurement?
The numbers do not match for the Windows data we collect into the perfmon index. The license reports show only a few megabytes per host, but the index grows by about 10 GB daily. I assume the inaccuracy comes from the squashing of the license usage data. Is there any way around this?

I found another post that suggests using throughput as a check, but it seems you have to divide that number by 3 as an approximation, and it is still not an accurate measurement of license usage.

Throughput check one host
index="_internal" source="*metrics.log" group="per_host_thruput" host=HOSTHERE | chart sum(kb) by series

License usage for one host
index=_internal source="*license_usage.log" type=usage h=HOSTHERE | eval MB = round(b/1048576,2) | eval st_idx = st.": ".idx | timechart span=1d sum(MB) by st_idx | addtotals

acharlieh
Influencer

Unfortunately there's not really a good answer here if you want host-based reporting of license usage once you hit the squash_threshold.

You can increase the squash_threshold in your indexers' server.conf files, but the tradeoff is of course that the indexers retain more tuples of information between periodic reports, which means more overhead on them before they send their data to the license master. (There was a bug where this setting didn't work for some versions: https://answers.splunk.com/answers/403171/higher-value-of-squash-threshold-no-longer-making.html but that should be resolved now.)
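As a rough sketch of what that change looks like (the value 5000 is only an illustration, not a recommendation; as far as I know the setting lives in the [license] stanza, so check the server.conf spec for your version before changing it):

```
# server.conf on each indexer
# 5000 is a hypothetical value chosen for illustration only
[license]
squash_threshold = 5000
```

A higher threshold means more (source, sourcetype, host, index) tuples are kept before host and source get squashed, so raise it gradually and watch the overhead on the indexers.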

Even if you hit the squash_threshold, your license usage by sourcetype, index, and indexer will be accurate. Only source and host are dropped from the reporting dimensions. (Depending on your index / indexer layout, this may be sufficient for your needs).
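To see how much of your usage is actually being squashed, something like the following may help. It's a sketch that assumes squashed entries show up in license_usage.log with an empty or missing h field, and the "(SQUASHED)" label is just a name I picked for the bucket:

```
index=_internal source=*license_usage.log type=Usage earliest=-1d@d latest=@d
| eval h=if(len(h)=0 OR isnull(h), "(SQUASHED)", h)
| stats sum(b) AS bytes BY h
| eval MB=round(bytes/1048576, 2)
| sort - MB
```

If nearly all of the volume lands in the "(SQUASHED)" row, per-host numbers from license_usage.log can't be trusted for that day, while the same search split by st or idx should still be accurate.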

Searching for per_X_thruput is of course rather interesting, as it only reports the top 10 series per reporting period. With host=HOSTHERE you're getting the thruput off of your forwarder, so it includes any data that is later dropped on the indexer. With host=INDEXER* series=HOSTHERE you get the indexers' view, but HOSTHERE will only be included if it is in the top 10 hosts for that reporting period. Both of these also include _internal logs, which do not count toward your license.

A brute force method is of course:

index=* host=HOSTHERE sourcetype!=stash | eval size=len(_raw) | stats sum(size) by sourcetype index

This query kinda sucks when you have a lot of events, and it makes 2 assumptions:

1) That your event timestamp _time is being parsed properly and is close enough to your _indextime, as license usage is counted based on _indextime. Otherwise you need to search all time (which is awful for sizable indexes) and use _index_earliest and _index_latest to constrain the index time.

2) That all your events use characters that are represented by single bytes in UTF-8. len gives you the number of characters in a string, not bytes, so you wind up with a lower bound on license usage with this method.
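For the case where _time can't be trusted, here is a sketch of the index-time variant. It assumes the time range picker is set to All time, and that yesterday (by index time) is the day you care about; the field names chars and approx_MB are just labels I made up:

```
index=* host=HOSTHERE sourcetype!=stash _index_earliest=-1d@d _index_latest=@d
| eval size=len(_raw)
| stats sum(size) AS chars BY sourcetype index
| eval approx_MB=round(chars/1048576, 2)
```

The result is still a character count rather than a byte count, so treat approx_MB as a lower bound on what was actually metered.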
