OK, I am working to trim back some of our indexed data. I initially tried to drill down using a basic sum(len(_raw)) for each index, broken down by various other fields.
The problem is that the sums don't match the figures Splunk license usage reports for the same index.
In this specific test case, I am comparing the Splunk license usage for ONE index for ONE day with the byte sum of all of the _raw records for that SAME index for the SAME ONE day.
I expected the numbers to at least be similar.
My query against the Splunk license usage log to get the license info:
index=_internal sourcetype=splunkd source=*license_usage.log [| rest splunk_server_group=dmc_group_indexer /services/server/info | rename guid AS i | fields i ] | eval gb=b/1024/1024/1024 | join i [|rest splunk_server_group=dmc_group_indexer /services/server/info | rename guid AS i | fields serverName i] | search serverName=*rtp* idx=xyzzy_logs | stats sum(gb) by serverName idx
...yields between 50 GB and 53 GB per indexer for that ONE index for that ONE day. My query against the index itself:
index=xyzzy_logs splunk_server=*rtp* | eval leng=len(_raw)/1024/1024/1024 | stats sum(leng) as totalgb by splunk_server | table splunk_server, totalgb
...which yields only 14.7 GB to 15.66 GB per indexer for the SAME index for the SAME day.
Again, I did not expect them to be exactly the same, but I thought they would be closer than a 300%+ difference.
What is Splunk licensing counting that does not seem to show up in my indexes?
I tried looking for answers to this. I found other posts with accepted answers that use sum(len(_raw)) as a "brute force" way to drill down on sizes. See Splunk Answer: How to get license usage data for a particular index with a breakdown of usage by a field?
Eval's len function counts characters. Your license usage is measured in bytes. I've gotten burned by this one in the past as well.
"This function returns the character length of a string X."
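To illustrate the character-versus-byte gap outside of Splunk, here is a minimal Python sketch. It assumes the events are stored as UTF-8, and the helper name char_and_byte_lengths is made up for this example; it is not how Splunk meters license usage, only a demonstration of why a character count can come in below a byte count.

```python
def char_and_byte_lengths(event, encoding="utf-8"):
    """Return (character count, encoded byte count) for one event string.

    The UTF-8 default is an assumption; use whatever encoding the data really has.
    """
    return len(event), len(event.encode(encoding))


# Hypothetical sample events, written with escapes so the file stays ASCII.
samples = [
    "plain ASCII log line",         # one byte per character
    "caf\u00e9 order received",     # e-acute takes two bytes in UTF-8
    "\u65e5\u672c\u8a9e",           # three CJK characters, three bytes each
]

for event in samples:
    chars, size = char_and_byte_lengths(event)
    print(f"{chars} chars, {size} bytes: {event!r}")
```

For pure ASCII data the two numbers are identical, so how large the gap gets depends entirely on how much multi-byte text the events contain.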
Thanks for this information. It looks like the gap depends on the data. License usage can also increase if a UF resends the same data several times.
Did you find a reliable way to measure the volume of a specific set of data or search results (other than the index/host/source breakdown)?