What is the best way to index a file or two (user application files) for a one-time analysis? Should I create a new index, use a new sourcetype, copy the file(s) into a directory on the server, and then use the Web-based data upload?
Then just delete everything afterwards?
Yes, in general that's a good idea. There's no need to invent sourcetypes if the existing ones fit the data. Remember that the retention time of an index relates to the event timestamps, not to when the logs were ingested; setting too short a retention time could cause the events to be deleted before you have time to complete the analysis.
1) If you have full access to the deployment, then by far the simplest way to do this is to create an index for the purpose, give yourself query rights to it, and delete the entire index once you are sure you are finished with the data. That keeps it neatly encapsulated and makes the release of storage a doddle. The alternative route of selectively removing events with the `delete` search command is laborious and risks selecting and deleting unexpected data. It also will not fully release the storage used until the surrounding index content is expired and expunged by housekeeping, because the data itself is not deleted from storage, per se, only de-indexed. On top of that it carries a high processing overhead, since each event is de-indexed in turn.
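As a sketch of that workflow from the CLI, assuming a standalone Splunk install and full admin access (the index name `temp_analysis_idx`, the file path, and the reused `syslog` sourcetype are all placeholders, not anything from the original question):

```shell
# Create a scratch index just for this one-off analysis
splunk add index temp_analysis_idx

# One-shot ingest of the copied file into that index; an existing
# sourcetype can be reused if it parses the data correctly
splunk add oneshot /tmp/app_logs/app.log -index temp_analysis_idx -sourcetype syslog

# ... run your searches against index=temp_analysis_idx ...

# When finished, drop the whole index to release the storage cleanly
splunk remove index temp_analysis_idx
```

Deleting the whole index this way avoids the `| delete` route entirely, so there is no per-event de-indexing overhead and the disk space is actually freed.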
2) Bear in mind that whichever way you choose to go, deleting the data or the index will not refund the licence usage incurred. If the data is large in comparison with your licence cap, you may cause yourself problems if this becomes a repeated occurrence.
Most helpful, thanks much.
I asked about frequency so that you can create a separate index for this and set its data retention to 1 or 2 days. That way you don't need to delete or clean the index every time you finish your analysis.
We followed the same approach while trying proofs of concept. We created a new index for this, named temp_analysis_idx, and a new sourcetype, temp_analysis_srctype. Everything else was as you described in your question. This kept us from messing up the other indexes. Since data retention was set to 2 days, the data was deleted automatically and we didn't need to run clean-up commands every time.
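For reference, that short-retention setup can be expressed in indexes.conf; a sketch only, using the index name from this post, with 172800 seconds = 2 days (and remember the caveat earlier in the thread: retention is driven by event timestamps, not ingest time):

```ini
# indexes.conf on the indexer (restart Splunk after editing)
[temp_analysis_idx]
homePath   = $SPLUNK_DB/temp_analysis_idx/db
coldPath   = $SPLUNK_DB/temp_analysis_idx/colddb
thawedPath = $SPLUNK_DB/temp_analysis_idx/thaweddb
# Buckets whose newest event is older than 2 days are frozen,
# which by default means deleted
frozenTimePeriodInSecs = 172800
```

Freezing happens per bucket, so events may linger slightly past the 2-day mark until their whole bucket ages out.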
Probably not often, but why would the frequency really matter?
The answer depends on how frequently you do this kind of one-time analysis.