This documentation might help answer some of the questions you have regarding monitoring files and directories:
http://docs.splunk.com/Documentation/Splunk/6.5.2/Data/Monitorfilesanddirectorieswithinputs.conf
If your logs are generated from a script, try reading about scripted inputs:
http://docs.splunk.com/Documentation/Splunk/6.5.2/AdvancedDev/ScriptSetup
I think in your case, you might take a look at Example 2 from the documentation pertaining to monitoring files and directories:
To load anything in /apache/ that ends in .log.
[monitor:///apache/*.log]
or for Windows
[monitor://C:\path\to\your\stuff\*.log]
Splunk will pick up any new files that match a monitor stanza, as well as changes to files it is already monitoring.
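For what it's worth, here is a rough sketch of what a fuller stanza might look like in inputs.conf. The index name "web" is just a placeholder I made up (the index has to exist already), and access_combined is only a guess at the right sourcetype for Apache access logs, so adjust both to fit your data:

```ini
# inputs.conf (sketch only; the index name below is hypothetical)
[monitor:///apache/*.log]
# Send events to this index instead of the default one (assumed name)
index = web
# Assumed sourcetype; pick whichever one matches your log format
sourcetype = access_combined
disabled = false
```

If you leave out index and sourcetype, Splunk sends the events to the default index and tries to work out the sourcetype on its own.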
How long your data is retained depends on a number of factors. Most of the time it comes down to how much storage you have: when an index reaches its size limit, Splunk automatically removes the oldest data from it. You can set a retirement policy for your data by size or by age (time), and you can do this per index if you care more about the retention of specific data. This document explains a bit about how Splunk data "ages" and "rolls" between buckets depending on your settings:
https://docs.splunk.com/Documentation/Splunk/6.5.2/Indexer/Setaretirementandarchivingpolicy
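As a minimal sketch of a per-index policy, something like the following in indexes.conf would cap an index by both age and size. The index name "web" and the 90 day / 100 GB values are purely illustrative assumptions, not recommendations:

```ini
# indexes.conf (sketch only; index name and limits are made up for illustration)
[web]
homePath = $SPLUNK_DB/web/db
coldPath = $SPLUNK_DB/web/colddb
thawedPath = $SPLUNK_DB/web/thaweddb
# Freeze buckets whose newest event is older than 90 days (90 * 86400 seconds)
frozenTimePeriodInSecs = 7776000
# Freeze the oldest buckets once the index grows past roughly 100 GB
maxTotalDataSizeMB = 102400
```

Keep in mind that frozen data is deleted by default unless you set up archiving, and that whichever limit is hit first is the one that takes effect.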
If you want to determine how long your data is hanging around, you could try this:
| dbinspect
[| eventcount summarize="false" index=*
| dedup index
| fields index]
| stats min(startEpoch) AS startEpoch,min(modTime) AS modTime by index,splunk_server
| convert ctime(startEpoch) AS startEpoch
| rename modTime AS "Oldest Bucket",startEpoch AS "Earliest Event Time"