While it is okay to continue reading from a file that has been deleted (the OS guarantees that an already-open file handle remains readable), this is not "safe": your Splunk instance could be restarted (or could crash) before the data is indexed, especially if indexing is held up by hardware errors or similar. In that case, you will have deleted your source file and will have no way to index the missing data.
"lukejadamec" 's suggestion that data is living in the index queue upon return 0 from "splunk add oneshot" is incorrect - this would imply that, for a 5 GB file, we load all 5 GB into memory before returning from the command. Instead, the file is read & indexed in a streaming fashion.
The best way to tell whether a file has been fully indexed is to verify that the event count for the file is correct in the index (in other words, run a search like source=foo.log | stats count, or a metasearch, or similar). However, this is obviously difficult in the case of multiline events and/or incorrect event parsing settings.
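For example, a count search from the CLI might look like the line below. This is only a sketch: it assumes the file landed in the default index, that its source field is exactly "foo.log" rather than a full path, and placeholder admin:changeme credentials.

    $ splunk search 'source="foo.log" | stats count' -auth admin:changeme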
Therefore, the most reliable way to tell whether a oneshot file has been indexed is the following type of heuristic:
1) $ splunk add oneshot foo.log
2) Query the REST API at /services/data/inputs/oneshot and observe the status of the item named foo.log (Bytes Indexed vs. Size); see the polling sketch after this list.
3) Eventually the file will be fully read and mostly indexed, with the remaining bits sitting in various queues awaiting indexing. Once this condition is hit, foo.log will no longer appear in the REST API listing.
At this point, the data should finish indexing quickly. However, various issues could still prevent proper indexing, such as running out of disk space, a downed network connection to a downstream indexer, an influx of data from other sources, etc. Therefore, the last step is:
4) Run timed searches (perhaps every 30 seconds) checking the event count for foo.log, until it stabilizes, meaning the count hasn't changed for a few minutes. At this point, it is reasonable to consider the data fully indexed (the sketch below automates this check as well).
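Here is a rough shell sketch of steps 2 through 4. It is not an official procedure, just an illustration under a few assumptions that are not part of the original steps: a local instance on the default management port 8089, admin:changeme credentials, and the file indexed with source="foo.log". Adjust host, credentials, search, and timing to your environment.

    #!/bin/sh
    # Steps 2-3: poll the oneshot endpoint until foo.log no longer appears.
    # (Assumes local instance, port 8089, and admin:changeme credentials.)
    while curl -sku admin:changeme \
        "https://localhost:8089/services/data/inputs/oneshot?output_mode=json" \
        | grep -q 'foo.log'; do
      sleep 5
    done

    # Step 4: re-run the count search every 30 seconds until the result
    # stops changing for several consecutive checks (~3 minutes here).
    prev=-1; stable=0
    while [ "$stable" -lt 6 ]; do
      count=$(splunk search 'source="foo.log" | stats count' -auth admin:changeme | tail -1)
      if [ "$count" = "$prev" ]; then
        stable=$((stable + 1))
      else
        stable=0
      fi
      prev=$count
      sleep 30
    done
    echo "foo.log looks fully indexed: $count events"

The "stable for 6 consecutive 30-second checks" threshold is arbitrary; pick whatever window gives you confidence given your indexing latency.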