Solved: Planning Cluster Total Storage Capacity (when no one peer holds entire buckets set)

ejpulsar · 09-10-2013

Hi,

I've read several cluster deployment references but still have no clear answer to one question.

I need to store 50 TB of data in a cluster of 30-50 typical peers, each with 1-2 TB of RAID 1 or RAID 10 storage. I need to be able to expand this storage simply by adding peers. I do not want to use storage systems with a 25-50 TB pool (they are too expensive). Can Splunk spread buckets across the other peers, so that the first peer holds only part of the full set of buckets?

According to "Buckets and clusters", the first peer holds all buckets, and when it dies all the buckets are spread across the whole cluster.

Is there any performance impact on searching? Or must we use storage systems or third-party file system virtualization tools?

kristian_kolb · 09-10-2013

Hm, no. Well.

With a clustered setup, all peers will hold buckets. There is no layered/tiered indexer structure, where there are primary and secondary indexers.

With a normal setup, forwarders will send data to all indexers (load balancing between them). Then, as part of the index replication functionality, indexers will send data between themselves in order to have redundant copies of the indexed data. Thus, each indexer will have both primary buckets (containing data that came straight from a forwarder) and replicated buckets (which were copied from another indexer).

Assuming that you have 3 indexers, and a replication factor of 2, and a search factor of 2, the bucket distribution could look like this.

UPPERCASE = primary buckets
lowercase = replicated buckets

host         indexer1    indexer2    indexer3
Primary      A, D        B, E        C, F
Replicated   e, c        a, f        b, d
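
A layout like the one in the table can be reproduced with a small simulation. This is only an illustrative sketch in Python, not how Splunk actually picks replication targets: the function name `place_buckets` and the "copy goes to the next peer" rule are assumptions made up for the example, so the exact position of each replica differs from the table. The point it shows is the same: every peer ends up with a mix of its own buckets and copies from other peers, and no single peer holds the entire set.

```python
def place_buckets(buckets, peers, replication_factor):
    """Toy placement (hypothetical rule): origin peer is chosen round-robin,
    extra copies go to the peers that follow it in the list."""
    if replication_factor > len(peers):
        raise ValueError("replication factor cannot exceed the number of peers")
    layout = {peer: {"primary": [], "replicated": []} for peer in peers}
    for i, bucket in enumerate(buckets):
        origin = i % len(peers)
        # UPPERCASE = bucket that came straight from a forwarder
        layout[peers[origin]]["primary"].append(bucket.upper())
        for k in range(1, replication_factor):
            target = peers[(origin + k) % len(peers)]
            # lowercase = copy received from another indexer
            layout[target]["replicated"].append(bucket.lower())
    return layout

peers = ["indexer1", "indexer2", "indexer3"]
layout = place_buckets("ABCDEF", peers, replication_factor=2)
for peer in peers:
    print(peer, layout[peer])
```

With 6 buckets, 3 peers and a replication factor of 2, each peer stores 4 of the 12 bucket copies, and a peer never holds a copy of a bucket it originated.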

With both RF=2 and SF=2, data will take up twice the space. So if your original logs are 50 TB, you can count on an average compression rate of 50% (compressed raw data + indexes for making it searchable), netting 25 TB on disk. But since you have index replication, your storage needs are doubled (for this scenario), so you're back at needing 50 TB of hard drive space.
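
The arithmetic can be written down as a short Python sketch. The function name `required_disk_tb` is hypothetical, the 50% figure is only the rule of thumb used in this post, and the model assumes SF equals RF so that every copy is a full searchable copy; real ratios depend on the data.

```python
def required_disk_tb(raw_tb, compression_ratio=0.5, replication_factor=2):
    """Cluster-wide disk needed, assuming every copy is searchable (SF == RF).

    compression_ratio is the on-disk size (compressed raw data + index files)
    as a fraction of the original log volume; 0.5 is a rough estimate.
    """
    return raw_tb * compression_ratio * replication_factor

total = required_disk_tb(50)   # 50 TB of logs * 0.5 * 2 copies
per_peer = total / 50          # spread evenly over 50 peers
print(total, per_peer)         # 50.0 1.0
```

Spread evenly over 50 peers, that is about 1 TB per peer before allowing any headroom for free space.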

Hope this helps,

K

mahamed_splunk · 09-17-2013

Splunk compresses the data before storing it on disk. It also needs to build index files (TSIDX) on top of the raw data to speed up searching.

The following blog post talks about storage requirements in clustering.

http://blogs.splunk.com/2013/01/31/disk-space-estimator-for-index-replication/

kristian_kolb · 09-16-2013

Well, as always, the answer is "It depends". There may be geographical or topological reasons against this. But in theory, yes, spreading the data over more indexers allows for faster search results.

/K

ejpulsar · 09-12-2013

Hello

Should we point the forwarders at all 50 peers?

gfuente · 09-10-2013

Hello,

So, with the explanation that Kristian gave and your data, you will have:

50 peers with 2 TB each = 100 TB, so you can store up to 50 TB (x2 due to replication)
RF = 2 and SF = 2
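
The same calculation can be run for the whole range of cluster sizes mentioned in the question. This is a rough Python sketch: the function name `usable_indexed_tb` is hypothetical, and it assumes data is spread evenly across peers and that the disks can be filled completely, which a real deployment would not do.

```python
def usable_indexed_tb(peer_count, disk_per_peer_tb, replication_factor=2):
    """Indexed data the cluster can hold when every bucket is stored RF times."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return peer_count * disk_per_peer_tb / replication_factor

# Peer counts and disk sizes taken from the range given in the question.
for peers, disk_tb in [(30, 1), (30, 2), (50, 1), (50, 2)]:
    capacity = usable_indexed_tb(peers, disk_tb)
    print(f"{peers} peers x {disk_tb} TB -> {capacity:.0f} TB of indexed data")
```

Only the largest configuration (50 peers with 2 TB each) gives 50 TB of indexed data at RF = 2; the smallest (30 peers with 1 TB each) gives 15 TB.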

Regards

ejpulsar · 10-30-2014

Hi Kristian! Thanks for the answer, and sorry for the late accept.
Now I have this clearly figured out.
