Hi,
I have a customer who is challenging the license numbers being reported by Splunk for his hosts. Is there a way to actually count the number of bytes for all of his events over a time period?
Depending on the amount of data and what parts of Splunk's internal counting you trust or mistrust, there are several approaches.
In any case, you're probably comparing against the license usage view, so Settings -> License or something like that on your license master. That's nice visually, but underneath there is an actual log you want to look at: index=_internal source=*license_usage* component=LicenseUsage
type=RolloverSummary has daily summaries; that's what is displayed in the 30-day view by default.
type=Usage has detailed usage at 30-second intervals (iirc); that's what is displayed in the 30-day view if you split by some of the more specific fields.
Assuming the view itself isn't broken and is reporting that log correctly, you'll want to compare other sources of information against that log. You can use short timespans and compare with Usage over that span, or whole days and compare with RolloverSummary, or both.
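For a quick baseline from that log, a sketch like this should give yesterday's totals per index (b and idx are the field names license_usage.log normally uses for bytes and index; adjust if your events differ):
index=_internal source=*license_usage.log* type=Usage earliest=-1d@d latest=@d
| stats sum(b) AS bytes by idx
| eval GB = round(bytes/1024/1024/1024, 3)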
As for other sources of info, here are a few ideas.
brute force
Pick a suspicious, high-volume set of data, like a certain sourcetype or index, and pipe it through | eval length = len(_raw) | stats sum(length), then compare that number to the data you get from license usage logging. The search may be infeasible for larger sets of data, but it should be the most precise.
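As a fuller sketch of that brute-force approach (my_index and my_sourcetype are just placeholders, and len(_raw) counts characters, which matches bytes for plain single-byte data):
index=my_index sourcetype=my_sourcetype earliest=-1d@d latest=@d
| eval bytes = len(_raw)
| stats sum(bytes) AS raw_bytes
| eval GB = round(raw_bytes/1024/1024/1024, 3)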
metrics logging
In the internal index there's also source=*metrics* that provides a second source of data. It won't always line up with licensing, but for your larger indexes, sourcetypes, etc. the group=per_index_thruput or group=per_sourcetype_thruput data should be pretty good.
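A sketch of what that could look like for per-index throughput (kb and series are the usual metrics.log field names; metrics only track the busiest series per interval, so treat this as an estimate):
index=_internal source=*metrics.log* group=per_index_thruput earliest=-1d@d latest=@d
| stats sum(kb) AS kb by series
| eval GB = round(kb/1024/1024, 3)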
For a twist, forwarders also log metrics... but getting the right metric from the right set of forwarders can be tricky. I'd recommend starting with indexer metrics logs.
You might ask "but that's splunk counting, I don't trust splunk's counting!" ... well yeah, but even if the license counter had a bug in your case, the metrics counter might not have that same bug.
dbinspect
This should be the fastest, least-splunk-counter-dependent way... but also the least accurate. When you run | dbinspect index=foo you get a rawSize field for each bucket. If you have an index that has never had buckets rolled out, and you have license usage log data for the entire life of the index, the total rawSize should line up with the total license usage for that index.
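For example, a rough sketch (foo is a placeholder index name; rawSize should be in bytes, as far as I recall):
| dbinspect index=foo
| stats sum(rawSize) AS raw_bytes
| eval GB = round(raw_bytes/1024/1024/1024, 3)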
Finally, make sure you're comparing apples with apples.
Have you looked at all indexers connected to your license master?
Have you looked at all data for the to-be-compared time range? Data for yesterday can still come in today; it'll be sorted into yesterday when you search by time range, but reported against today in license usage.
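One way to sanity-check that last point is to compare event time against index time, e.g. with a sketch like this (my_index is a placeholder):
index=my_index earliest=-1d@d latest=@d
| eval indexed_day = strftime(_indextime, "%Y-%m-%d")
| stats count by indexed_day
Anything indexed on a different day than its event time is exactly the data that shows up in a different day's license report.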
Cross referencing similar post: https://answers.splunk.com/answers/476227/help-with-license-validation.html
This is spot on. Summarizing: you have bytes (b) broken down by source (s), sourcetype (st), host (h), etc. in license_usage.log when type=Usage. It's possible that those get collapsed if usage is too intense. So, as @martin_mueller points out, there are other data points that can be used, OR you can go to the data itself with an eval size = len(_raw).
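Since the pushback is about per-host numbers, here's a sketch of the per-host split from that log (h and b are the field names license_usage.log normally uses; if the breakdown was collapsed as described above, you may see blank h values):
index=_internal source=*license_usage.log* type=Usage earliest=-1d@d latest=@d
| stats sum(b) AS bytes by h
| eval MB = round(bytes/1024/1024, 2)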
If these numbers add up and the customer is still pushing back, feel free to engage the account team from Splunk for assistance.
Gooooood info. Thanks!
Well, you could use Splunk to count it for you, but if you suspect Splunk is reporting it incorrectly then this may not solve your problem. Otherwise you'll have to go and tally up the file sizes on the servers for a specific time period and sum them up.
I'd rather go the route of having Splunk count it, at least initially. How can I do that?
You can get an approximate count by taking the average of len(_raw) over a small set of events, then multiplying that average length by the total event count from | tstats, and converting your number of characters into bytes, MB, and so on. Again, this is clearly an approximation, but it gives you a fair idea.
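A sketch of that approximation in one search (my_index and my_sourcetype are placeholders; the 10,000-event sample and the one-character-equals-one-byte assumption are both rough):
index=my_index sourcetype=my_sourcetype | head 10000
| eval bytes = len(_raw)
| stats avg(bytes) AS avg_bytes
| appendcols [| tstats count WHERE index=my_index sourcetype=my_sourcetype]
| eval est_MB = round(avg_bytes * count / 1024 / 1024, 2)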