I have a high-volume log file that I need to ingest with Splunk. I'd like to store the entire compressed log file in Splunk, but I only need it to index specific events. I assume that by storing the raw log I will be able to index the rest of it in the future if I choose. I don't want to lose any data, but I can't have this log consuming my entire license!
Nope. If you send the entire log file to Splunk, it will either index the whole log or, if you configure props and transforms, index only selected portions of the log and drop the rest. Only what makes it into an index counts against your license, but the dropped parts are not stored in Splunk. You could always alter your props and transforms later and re-index the other portions of the log (but this requires retaining the log outside of Splunk).
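As a rough sketch of the props and transforms approach: the usual pattern is to route every event to the nullQueue first, then route the events you care about back to the indexQueue. The sourcetype name `my_big_log` and the pattern `ERROR|FATAL` are placeholders made up for illustration, so substitute your own.

```ini
# props.conf
# "my_big_log" is a hypothetical sourcetype name
[my_big_log]
# Transforms run in the order listed: discard everything, then keep the matches
TRANSFORMS-filter = setnull, setparsing

# transforms.conf
[setnull]
# Matches every event and sends it to the nullQueue (dropped, never indexed)
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
# Hypothetical pattern for the events worth keeping
REGEX = ERROR|FATAL
DEST_KEY = queue
FORMAT = indexQueue
```

These settings are applied at parse time, so they belong on the indexer or a heavy forwarder rather than on a universal forwarder. Test the regex against a sample of the log before relying on it.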
Another option to investigate is Hunk. This would be particularly attractive if you already have a Hadoop cluster or other NoSQL data store. You could store the entire log file, and others, in the NoSQL cluster, and then search the entire log file through Splunk Enterprise by setting up a virtual index. Note, however, that Hunk follows a different licensing scheme from Splunk Enterprise. Hunk is licensed on a per-node basis, so there's additional cost, but data that Hunk processes on the fly and pulls into your searches doesn't count against the indexing volume of your Splunk Enterprise license.
Also, from re-reading your question, I'm not sure if this was clear or not, but Splunk Enterprise is licensed based on bytes indexed per day, not in total. Say you have a 100GB license. On day one you could index a 50GB log file and 50GB of other data and still be within your license. On day two you can index an additional 100GB of data from anywhere and still be OK.
Also, Splunk understands that backfilling and reindexing are sometimes necessary, so they let you go over your Enterprise license 4 times (Free license 2 times) in a rolling 30-day period without any negative impact. But if you're going over that often, you may want to look at whether the logs you're sending into Splunk are actually valuable, and if not, follow the advice of the "Keeping the Junk out of Splunk" conf2014 talk. I definitely recommend watching this talk.
Thank you! I do understand how the license works. This file will be coming from many, many different hosts daily for the foreseeable future, so we will have to do what we can to minimize ingestion. Your answer is much appreciated.