We are starting to run low on disk space on our Splunk server. We have a 500GB disk dedicated to Splunk data, and it's currently at 450GB used. We expect to be completely out of disk space in 3-4 days with the current growth rate.
We are looking into ways to mitigate this, and have a few questions.
Is there a way to store cold data in Splunk on an NFS mount?
Are there any performance-related concerns if we move our Splunk data to an external SAN and access it over Fibre Channel?
What is the normal procedure for instances that run into this same issue?
Chris's answer is mostly what you want.
A note on SAN: SANs are frequently (but not always) equivalent in performance to local storage. Some SANs have high latency (which hurts search a lot), some are overcommitted across various applications (Splunk is I/O intensive), and some are configured as RAID 5 (bad for writing out index data), but a beefy, tuned, up-to-date SAN shouldn't have any downsides compared to local storage.
NAS, or network-attached storage, is by contrast typically a lower-bandwidth, higher-latency solution. It is typically not appropriate for indexing, but possibly acceptable for cold.
It is true, typically. It's possible to set up a NAS to perform sufficiently, but the technical bar is higher. Note there's a wide variety of "NAS" hardware out there, from enterprise class down to 2 disks in a tinkertoy box.
This would be true unless you tuned your NAS and network to handle the load. We have all our data stored on NAS and our performance is outstanding. We have dedicated a separate 1Gb interface on the server to the NAS, and the HA NAS switch is allocated only to the NAS and to servers that need the speed. We leveraged our knowledge from putting our Oracle DBs on the same setup, which in turn made the choice of marrying Splunk and NAS easy. Good luck in whichever choice you make.
colddb can be placed on any filesystem the OS has access to.
Specify your colddb location in $SPLUNK_HOME/etc/system/local/indexes.conf so Splunk can find it.
[ indexname ]
coldPath = < path to filesystem >
By default, cold bucket directories are created in $SPLUNK_HOME/var/lib/splunk/index_name/colddb/db_n_n_n. Once coldPath is set, they are created under that path instead.
NFS and SAN are perfectly fine for colddb bucket storage, since colddb typically holds older, read-only bucket data accessed by long-running or time-based searches.
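To make the stanza above concrete, here is a minimal Python sketch that renders an indexes.conf entry pointing coldPath at a mounted filesystem. The index name `main` and the mount point `/mnt/nfs/splunk` are placeholders rather than values from this thread, and `render_cold_stanza` is a hypothetical helper, not part of Splunk.

```python
def render_cold_stanza(index_name: str, cold_path: str) -> str:
    """Return an indexes.conf stanza that overrides only coldPath.

    Hypothetical helper for illustration; it just formats text.
    """
    return f"[{index_name}]\ncoldPath = {cold_path}\n"


# Placeholder values: substitute your own index name and mount point.
print(render_cold_stanza("main", "/mnt/nfs/splunk/main/colddb"))
```

The output would go into $SPLUNK_HOME/etc/system/local/indexes.conf. This assumes the index is already defined elsewhere, so only coldPath needs overriding.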
Some other things you can do to set your index size:
Set your maxWarmDBCount to a smaller number, so that more data is stored in cold as well.
64-bit systems: each warm DB bucket = 10 GB
32-bit systems: each warm DB bucket = 750 MB
maxWarmDBCount = 30 (for example, this would set your warm buckets to 30 directories totaling 300 GB on a 64-bit system)
After this number is reached, each newly created warm bucket rolls the oldest warm bucket into your colddb directory.
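The arithmetic behind that example can be checked with a short Python sketch. The bucket sizes are the ones quoted in this thread (roughly 10 GB and 750 MB), and `warm_footprint_mb` is a hypothetical helper name. The result is an upper bound, assuming every warm bucket reaches its maximum size.

```python
def warm_footprint_mb(max_warm_db_count: int, bucket_size_mb: int) -> int:
    """Upper bound, in MB, on the space taken by warm buckets."""
    return max_warm_db_count * bucket_size_mb


# Bucket sizes as quoted in this thread; real buckets may be smaller.
for label, bucket_mb in (("10 GB buckets", 10_000), ("750 MB buckets", 750)):
    total_mb = warm_footprint_mb(30, bucket_mb)
    print(f"{label}: 30 warm buckets use at most {total_mb} MB")
```

On a 500 GB disk, the first case alone could consume more than half the volume, which is why lowering maxWarmDBCount pushes data to cold sooner.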
Set your maxTotalDataSizeMB = < integer >
Set this value in MB. If the total index size reaches this number, the oldest data is frozen (deleted by default).
Set your frozenTimePeriodInSecs = < integer >
Set this to the number of seconds after which indexed data should be frozen (erased by default).
Restart Splunk and the changes should take effect (although it may take some time to move large amounts of data).
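Since frozenTimePeriodInSecs is given in seconds and maxTotalDataSizeMB in MB, a small Python sketch may help with the conversions. The 90-day retention and 400000 MB cap below are illustrative assumptions, not recommendations from this thread, and both function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def frozen_time_period_secs(retention_days: int) -> int:
    """Convert a retention period in days to seconds."""
    return retention_days * SECONDS_PER_DAY


def render_retention_stanza(index_name: str, max_total_mb: int,
                            retention_days: int) -> str:
    """Format an indexes.conf stanza with a size cap and an age cap."""
    return (
        f"[{index_name}]\n"
        f"maxTotalDataSizeMB = {max_total_mb}\n"
        f"frozenTimePeriodInSecs = {frozen_time_period_secs(retention_days)}\n"
    )


# Illustrative values only: cap the index at 400000 MB and 90 days.
print(render_retention_stanza("main", 400_000, 90))
```

Whichever limit is hit first applies, so with the default behavior the oldest data is deleted as soon as either the size cap or the age cap is exceeded.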
A similar question about reclaiming disk space in an emergency is on Splunk Answers as well: http://answers.splunk.com/questions/1009/my-filesystem-is-full-and-splunk-stopped-indexing-how-do-i-...
Official docs at http://www.splunk.com/base/Documentation/latest/Admin/SetARetirementAndArchivingPolicy
This is not quite accurate. Each bucket is up to 10000 MB on 64-bit systems by default for the main index, and for other indexes if maxDataSize is set to auto_high_volume. Otherwise the bucket size is considerably smaller. I believe it is only 750 MB in these cases, but certainly less than 2000 MB.