
considerations on using SSD for hot\cold indexes

Path Finder

We have a small-scale distributed Splunk instance (30GB per day) with all indexes currently on one disk.

We are planning to introduce an SSD for the hot\warm indexes.

I have read various posts and

If we were to configure the indexes for, say, 30-60 days of hot\warm data before it is rolled to the slower disks, would there be anything to consider, such as:

What happens when a premium app such as ES comes into play and the data model summary ranges are longer than the hot\warm retention?
E.g.: the hot\warm index on SSD is kept for 30 days and then moved to slower disk, but the Authentication data model is configured for 1 year. Would that be a factor to consider or not?

Anything else to consider?

Thanks.


Splunk Employee

You can configure the storage location for DMA (data model acceleration) summaries separately; see the tstatsHomePath setting in indexes.conf.
Switching to SSD will greatly improve search performance for sparse and rare term searches, where random access speeds are important.
For dense searches, things will get CPU bound, because removal of I/O constraints will mean your server will be mostly busy unzipping buckets.
Hope that helps.


Path Finder

Thanks,

Would it be best practice to put tstatsHomePath on the SSD as well?


Splunk Employee

If you have sufficient space, yes, absolutely.
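
For illustration, the layout discussed in this thread might look roughly like the indexes.conf fragment below. This is only a sketch: the mount points, the index name `my_index`, and the volume size cap are hypothetical and need adapting. As far as I know, warm buckets roll to cold based on size or bucket count (for example homePath.maxDataSizeMB or maxWarmDBCount) rather than age, so a 30-60 day hot\warm window has to be approximated through sizing.

```ini
# indexes.conf (sketch; mount points, index name, and sizes are hypothetical)

[volume:ssd]
path = /mnt/ssd/splunk
# Cap total usage of this volume so the disk keeps some headroom.
maxVolumeDataSizeMB = 850000

[volume:slow]
path = /mnt/slow/splunk

[my_index]
# Hot and warm buckets on the SSD.
homePath = volume:ssd/my_index/db
# Cold buckets on the slower disk.
coldPath = volume:slow/my_index/colddb
# thawedPath cannot be defined in terms of a volume.
thawedPath = $SPLUNK_DB/my_index/thaweddb
# Data model acceleration summaries on the SSD as well.
tstatsHomePath = volume:ssd/my_index/datamodel_summary
```

Indexes that are searched less often could keep their existing homePath on the slower disk; each index stanza sets its paths independently.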


Path Finder

thx squire

So, using the calculations from this search:

| dbinspect index=*
| search tsidxState="full"
| stats min(startEpoch) as MinStartTime max(startEpoch) as MaxStartTime min(endEpoch) as MinEndTime max(endEpoch) as MaxEndTime max(hostCount) as MaxHosts max(sourceTypeCount) as MaxSourceTypes sum(eventCount) as TotalEvents sum(rawSize) as TotalRawDataSizeMB sum(sizeOnDiskMB) as TotalDiskDataSizeMB by state
| eval TotalRawDataSizeMB =round((TotalRawDataSizeMB/1024/1024),6)
| eval MinStartTime=strftime(MinStartTime,"%Y/%m/%d %H:%M:%S")
| eval MaxStartTime=strftime(MaxStartTime,"%Y/%m/%d %H:%M:%S")
| eval MinEndTime=strftime(MinEndTime,"%Y/%m/%d %H:%M:%S")
| eval MaxEndTime=strftime(MaxEndTime,"%Y/%m/%d %H:%M:%S")
| eval PercentSizeReduction=round(((TotalRawDataSizeMB-TotalDiskDataSizeMB)/TotalRawDataSizeMB)*100,2)

Run over a 90-day period
(assuming that is how long I want to keep my hot\warm data before rolling it to cold):

state   TotalRawDataSizeMB   TotalDiskDataSizeMB   PercentSizeReduction
cold          27315.003618           8304.898440                  69.60
hot           49257.884926          15460.234388                  68.61
warm        1569389.609292         599056.425956                  61.83

Total hot and warm usage on disk = roughly 600GB

So would a 1TB SSD suffice in this instance?

If a disk of that size is unavailable, could we split the indexes, putting the ones we use most on the SSD and leaving the others where they are?

How would you make the same calculation for the DMA summaries?
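
As a quick sanity check on the 600GB figure and the 1TB question, the arithmetic can be sketched in a few lines of Python. The sizes are copied from the results above. Treating a "1TB" SSD as 10**12 bytes and sizeOnDiskMB as units of 1024*1024 bytes are assumptions on my part, and the helper names are made up for illustration. DMA summaries and future growth are not included.

```python
# On-disk sizes in MB, copied from the dbinspect results in this thread.
size_on_disk_mb = {
    "cold": 8304.898440,
    "hot": 15460.234388,
    "warm": 599056.425956,
}

# Assumptions: a "1TB" SSD holds 10**12 bytes, and one MB as reported
# by dbinspect is 1024 * 1024 bytes.
SSD_BYTES = 10**12
BYTES_PER_MB = 1024 * 1024


def hot_warm_mb(sizes):
    """Sum the hot and warm bucket sizes (the buckets stored under homePath)."""
    return sizes["hot"] + sizes["warm"]


def ssd_fill_fraction(used_mb, ssd_bytes=SSD_BYTES):
    """Return the fraction of the SSD's raw capacity that used_mb would occupy."""
    return used_mb * BYTES_PER_MB / ssd_bytes


used_mb = hot_warm_mb(size_on_disk_mb)
print(f"hot+warm on disk: {used_mb / 1024:.1f} GB")
print(f"share of a 1TB SSD: {ssd_fill_fraction(used_mb):.0%}")
```

Under those assumptions, hot plus warm comes to about 600GB, a little under two thirds of the raw capacity of a 1TB drive, before filesystem overhead, DMA summaries, or growth in daily volume are taken into account.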
