For any given search, Splunk retains only a limited number of raw events, i.e., the actual event data pulled out of the index by the search command. The results of a search -- what is produced by evaluating the entire search string -- are preserved in full, alongside the field summary and timeline information. The amount of data stored in the *.csv.gz files is capped by settings in limits.conf.
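As a rough sketch, the relevant cap for retained events lives in the [search] stanza of limits.conf; the value shown here is illustrative, and the exact default varies by Splunk version, so check the limits.conf spec file shipped with your installation:

# limits.conf (illustrative value; consult your version's limits.conf.spec)
[search]
# maximum number of raw events a single search will retain
max_count = 500000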
To illustrate why this is necessary, consider a simple example:
search source="*apache_access.log" 200
If your index contains 2 billion events that match the search, storing all 2 billion events every time you run that search would consume the entire storage system in no time. Generally speaking, the 2 billion row data set is not what you're after -- it's the summarized or transformed version that is of interest.
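For instance, piping the same search through a transforming command means Splunk only has to keep the small summarized result set, not the 2 billion raw events. (The clientip field here is an assumption -- it is a field Splunk commonly extracts from Apache access logs, but your field names may differ.)

search source="*apache_access.log" 200 | stats count by clientip

The output of this search is at most one row per client IP address, which is tiny compared to the raw event set and is typically the answer you actually wanted.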
Note that the limitation described here does not mean Splunk cannot handle large volumes of events. The search language processes every event asked of it; it simply abides by these practical safety controls and does not cache all of the raw data.
For perspective, search for a common word like computer in Google. Even though Google reports 705,000,000 hits, you will only ever be able to access up to 1,000 results, because it is impractical to store that result set in its entirety (aside from the fact that nobody is actually going to look at 705 million links).