Without much context as to why, using len(_raw) is an OK approximation of the size of a log. However, you should know that len() does not actually count bytes; it counts characters. If knowing bytes is crucial, I'd point you to the License Usage Report View, or to just running ls -l (or a similar utility) on the box the log comes from.
To see this in action, I made two files: one containing words and the other كلمات. I then put both in a directory and indexed them (taking good advantage of my dev-test license). Using len(), both come out to 5, but checking the index usage data, I can see that words equals 5 bytes while كلمات is 10 bytes. (In this case, each character, encoded as UTF-8, is 2 bytes wide.)
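Outside Splunk, you can sanity-check this character-vs-byte distinction with a few lines of Python (the strings here mirror the two test files above):

```python
# Python's len() on a str counts characters (code points), much like
# Splunk's len(_raw); encoding to UTF-8 first reveals the byte count.
ascii_word = "words"
arabic_word = "كلمات"  # 5 Arabic characters

print(len(ascii_word))                   # 5 characters
print(len(ascii_word.encode("utf-8")))   # 5 bytes: US-ASCII is 1 byte/char in UTF-8
print(len(arabic_word))                  # 5 characters
print(len(arabic_word.encode("utf-8")))  # 10 bytes: these characters are 2 bytes each
```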
Now, most system-level logs that you'd aggregate in Splunk tend to be US-ASCII, so each character (in UTF-8) happens to be 1 byte, but that isn't universally the case.
EDIT: A bit more of a rabbit hole, but I made one file containing كلمات encoded as UTF-8 (10 bytes long), and another encoded as ISO-8859-6 (a 5-byte file on disk). Ingesting the ISO-8859-6 file using a sourcetype that specifies that encoding (so the text is readable in Splunk), the license impact is still 10 bytes, because translation to UTF-8 happens before license counting.
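That on-disk size difference is easy to reproduce as a rough Python sketch: ISO-8859-6 stores each Arabic letter in one byte, but decoding and re-encoding to UTF-8 (which is roughly what happens at ingest) brings the text back to 10 bytes:

```python
word = "كلمات"

# On disk: ISO-8859-6 stores each of these Arabic letters in a single byte.
iso_bytes = word.encode("iso-8859-6")
print(len(iso_bytes))   # 5 bytes on disk

# After decoding and re-encoding to UTF-8 (as happens before license
# counting), the same text is back to 2 bytes per character.
utf8_bytes = iso_bytes.decode("iso-8859-6").encode("utf-8")
print(len(utf8_bytes))  # 10 bytes counted against the license
```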
Check this out. Here's a search:
index=alloyaudit_core
| eval raw_len=len(_raw)
| eval raw_len_kb=raw_len/1024
| eval raw_len_mb=raw_len/1024/1024
| eval raw_len_gb=raw_len/1024/1024/1024
| stats sum(raw_len) as Bytes sum(raw_len_kb) as KB sum(raw_len_mb) as MB sum(raw_len_gb) as GB by source
Hope it helps!