Good day everyone,
I have an issue where my license usage seems to have shot through the roof and I'm receiving violations. While investigating the matter and trying to determine where the increase is coming from, I came across this search:
index=_internal source=*metrics.log group="per_index_thruput"
| eval MB=kb/1024
| chart sum(MB) by series
| sort - sum(MB)
I ran this search over "Today".
This shows me how much data was indexed in each individual index. The problem is that when I add them all up, there is still a discrepancy of about 25% between the metrics log and the Splunk licensing page (I'm missing about 5.5GB).
Which logs should I trust? Where is the licensing page getting its data from?
Update: Even when I sum everything for every host with this search:
index=_internal source=*metrics.log group="per_host_thruput"
| eval MB=kb/1024
| stats sum(MB)
I still get a number that's about 5.5GB less than what the licensing page is reporting.
I've tried both "per_host_thruput" and "per_index_thruput" and they give me more or less the same amount, but nowhere close to our licensing volume...
This is totally true: metrics.log does not show license usage, for two reasons:
- The metrics are a top 10 sample (top 10 indexes, top 10 sources, top 10 sourcetypes, top 10 hosts...).
- The metrics do not distinguish events that count against the license from those that do not (internal logs, summary data...).
If you want to measure your license usage, use license_usage.log.
See http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume
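As a rough sketch of what that could look like, the search below sums licensed volume per index from license_usage.log. It assumes the usual layout of that log (events with type=Usage, the byte count in the field b, and the index name in the field idx), and that it runs somewhere the license master's _internal index is searchable. Adjust the field names if your version differs.

```
index=_internal source=*license_usage.log type=Usage
| eval MB=b/1024/1024
| stats sum(MB) AS MB by idx
| sort - MB
```

Run over "Today", the total should track the licensing page much more closely than the metrics.log searches, since it is not limited to a top 10 sample.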
This is a classic mistake; we should rename metrics.log to almost_metrics.log 🙂
This is the correct answer. I had no idea that metrics.log only takes a "Top 10" look at things by default. This would explain a lot.
How many indexes and hosts do you have? One thing about the metrics log and the per_*_thruput data is that it only logs the most active hosts/indexes/sourcetypes/sources during each little time period. So if you have a ton of hosts and indexes, it could be systematically failing to report a long tail. It would admittedly be an odd coincidence if each discrepancy was 5.5GB, but it's one idea for you.
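One quick way to answer the first question, as a sketch only: the search below counts the distinct hosts Splunk knows about across the non-internal indexes for the selected time range. The index=* scope and the host_count field name are assumptions for illustration.

```
| metadata type=hosts index=*
| stats count AS host_count
```

If host_count is well above 10, a per_host_thruput total is likely missing the long tail of less active hosts.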