On May 4th, the disk space used by our default index jumped from about 400 GB to about 3 TB. This doesn't seem to be related to actual indexing: we don't index anywhere near 2.5 TB a day, and we didn't receive any license alerts.
I have used this search:
index=_internal source=*metrics.log group=per_sourcetype_thruput | eval GB=kb/(1024*1024) | chart sum(GB) by series
...to look at the amount indexed per sourcetype for the days around the 4th, but there isn't any difference between them, and on none of these days did we index anywhere near that much data.
On the 5th, our disk usage returned to its old daily growth rate.
Is there a way to figure out what happened on the 4th, and why our index suddenly consumed so much disk space without warning?
Search and tabulate the entries by punctuation, and see if there's a spike in any particular pattern.
Tabulate by hour, and look for the start of the increase.
Find the source of the anomaly.
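As a sketch of the first two steps (assuming the affected index is `main`; set the time picker to May 4th, or to a window spanning the days around it for the hourly view):

```
index=main | stats count by punct | sort - count
```

```
index=main | timechart span=1h count by sourcetype
```

The indexed `punct` field groups events by their punctuation skeleton, so a runaway pattern will dominate the first table; the second search shows which sourcetype's event volume ramps up, and at what hour.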
The answer to your question is in the very data itself...
One thing to look out for is a weekly log rotation that renames files, causing all the renumbered rotated logs to be reindexed as if they were new. (If you're using log rotation, you also need appropriate blacklist/whitelist rules on your monitor inputs.)
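A minimal `inputs.conf` sketch of such a rule, assuming a hypothetical monitored directory and a typical logrotate naming scheme (`app.log.1`, `app.log.2.gz`, ...):

```
[monitor:///var/log/myapp]
# Skip rotated copies so renaming doesn't trigger reindexing
blacklist = \.\d+(\.gz)?$
```

The `blacklist` value is a regex matched against the full file path; anything it matches is excluded from monitoring. Adjust the pattern to whatever suffixes your rotation scheme actually produces.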