I'm trying to understand how Splunk calculates license usage. There is a particular index, "snort", which receives some JSON input, and the license usage report shows this index has grown significantly. If I run this search:
index=_internal source=*license_usage.log type=Usage idx=snort | stats sum(b) as bytes | eval MB = round(bytes/1024/1024,1) | fields MB
it reports 9 GB for the given period. If I instead estimate the length of each event and sum those values like this:
index=snort | eval len_raw = len(_raw) | stats sum(len_raw) as bytes | eval MB = round(bytes/1024/1024,1) | fields MB
it gives me 18 MB, i.e. roughly a 500x difference. I understand there may be some discrepancy due to encoding (ASCII vs UTF-8), but that would account for at most a 2x difference, not 500x. Other sources that let me estimate the size and number of events from these inputs also suggest 18 MB should be about right. Any ideas why the numbers reported in the _internal log are so different?
Edit: OK, it turns out the collection script got stuck and was continuously re-sending old data. The re-sent events carry old timestamps, so the index=snort search missed them due to its time-range restriction, but they were still counted toward license usage, since usage is measured at index time.
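One way to confirm a situation like this, as a sketch (assuming the re-sent events keep their original timestamps): compare each event's index time (_indextime) with its event time (_time) over All Time; a large lag indicates old data being re-indexed:

index=snort earliest=0
| eval lag_days = round((_indextime - _time)/86400, 0)
| stats count by lag_days
| sort - lag_days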
Please verify (adjust span=1d to your period):
index=_internal source=*license_usage.log* type=Usage
| timechart span=1d sum(b) AS volume_b by idx
and see what volume the snort idx reports for each day.
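To line the two up, a sketch of a matching timechart of raw event sizes (same span, using len(_raw) as in your second search) that you can compare day by day against the license volume:

index=snort
| eval len_raw = len(_raw)
| timechart span=1d sum(len_raw) AS raw_bytes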
Manually Counting event sizes over a time range -
Roughly, you can run a search that looks at all (or some) data over a range of indexed-time (_indextime) values, summing the sizes of the actual events. For example, where the endpoints START_TIME and END_TIME are epoch seconds, the search would be:
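The original search did not survive the editor here (see the comment below), so this is only a reconstruction from the description above, narrowed to the snort index as in the question, with START_TIME and END_TIME to be replaced by epoch seconds:

index=snort earliest=0
| eval itime = _indextime
| where itime >= START_TIME AND itime <= END_TIME
| eval len_raw = len(_raw)
| stats sum(len_raw) AS bytes
| eval MB = round(bytes/1024/1024,1)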
This is a slow and expensive search, but when you really need to know, it can be valuable. It must be run across a time range that contains every event indexed in that window, regardless of the events' own timestamps. Typically this means running it over All Time. The stats computation as well as the initial filters can of course be adjusted to examine the problem more closely.
Please check the len(_raw) search above (I tried to include the full command here, but it wasn't showing up at all; you can see I edited this answer around 10 times).
I updated the original post. Your mention of time ranges gave me the hint: I was looking only at fresh data, but it was old data that got indexed.