What strategies do people use for backups of their buckets? Is there a clean way to identify "new" buckets for a given day based on their file name?
The documentation recommends doing incremental backups of warm buckets, see Back up indexed data in the Managing Indexers and Clusters of Indexers manual. As skalliger mentions in his answer, the bucket names do indicate the age of the data they contain.
Splunk recommends snapshot technology for backing up buckets. Because hot buckets are still being written to, you should consider excluding them from snapshots, as a snapshot of a hot bucket may miss data. Apart from that, snapshotting all the other buckets (warm, cold, ...) is recommended.
Every bucket follows a naming convention with two timestamps (newest and oldest time):
http://docs.splunk.com/Documentation/Splunk/6.5.1/Indexer/HowSplunkstoresindexes#Bucket_naming_conve...
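As a rough sketch of how the naming convention could be used to pick out buckets for a given day: warm and cold bucket directories are named db_<newest_time>_<oldest_time>_<localid>, with a trailing _<guid> on clustered indexers and an rb_ prefix for replicated copies, while hot buckets are named hot_v1_<localid>. The helper names below (parse_bucket_name, buckets_for_day) are made up for illustration and are not part of any Splunk tooling. Note that both timestamps are event times in epoch seconds, so this selects buckets by the age of the data they hold, not by when the bucket rolled to warm.

```python
import re
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional

# Matches warm/cold bucket directory names; hot buckets (hot_v1_<id>) do not match.
# The optional trailing group is the GUID present on clustered indexers.
BUCKET_RE = re.compile(r"^(db|rb)_(\d+)_(\d+)_(\d+)(?:_([0-9A-Fa-f-]+))?$")


class Bucket(NamedTuple):
    name: str
    newest: int    # epoch seconds of the newest event in the bucket
    oldest: int    # epoch seconds of the oldest event in the bucket
    local_id: int


def parse_bucket_name(name: str) -> Optional[Bucket]:
    """Return the parsed bucket, or None if the name is not a warm/cold bucket."""
    m = BUCKET_RE.match(name)
    if m is None:
        return None
    return Bucket(name, int(m.group(2)), int(m.group(3)), int(m.group(4)))


def buckets_for_day(names: List[str], day_start: datetime) -> List[Bucket]:
    """Return buckets whose event time range overlaps the 24 hours from day_start."""
    start = int(day_start.timestamp())
    end = start + 86400
    selected = []
    for name in names:
        bucket = parse_bucket_name(name)
        if bucket is not None and bucket.oldest < end and bucket.newest >= start:
            selected.append(bucket)
    return selected


if __name__ == "__main__":
    day = datetime(2016, 12, 1, tzinfo=timezone.utc)
    start = int(day.timestamp())
    names = [
        f"db_{start + 3600}_{start}_12",          # data from the chosen day
        f"db_{start - 50000}_{start - 150000}_11",  # older data only
        "hot_v1_13",                              # hot bucket, skipped
    ]
    for bucket in buckets_for_day(names, day):
        print(bucket.name)
```

For a true incremental backup you would probably also keep a list of bucket names already copied and skip those, since a bucket's name does not change once it is warm.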
Did that answer your question?
Skalli
A snapshot gives you a backup of your data as it was at a single point in time. That is the reason Splunk recommends it: it keeps the data resilient and protected.
If you have a Hadoop cluster, you might consider Hunk for a full backup solution. See: Is there a solution to back up Splunk data into HDFS to make it available for search via Hunk?