Deployment Architecture

Estimating storage requirements when ingesting only internal logs

hectorvp
Communicator

Hi Splunkers,

We need to estimate the disk space required for our single box Splunk enterprise.

We are planning to ingest only splunkd internal logs, and I don't see any way to estimate the disk space they will need. I don't know how many events a UF generates or how large a single event would be.

We will have around 400 UFs running on servers, with an expected retention policy of 60 days.

I'm afraid that 500 GB of space will fill up before 60 days and we won't have the internal logs we need.

Apart from this, please suggest whether I really need RAID 1+0 for internal logs; there would only be a few scheduled searches for health checks and the DMC. Or would simpler storage suffice?

Is there any way to estimate this part?

1 Solution

gcusello
SplunkTrust

Hi @hectorvp,

The number of _internal events is highly variable, so the correct approach is to look at the actual events and storage occupation in your real environment.

Anyway, you can start this analysis with a baseline of 800,000 events/day, which means around 0.07 GB/day for each server.

For your 400 UFs, that is around 28.5 GB/day.
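
Once the deployment is running, you can replace that baseline with a measured value. Here is a sketch of a search over the standard per_index_thruput events in metrics.log (keep in mind that metrics.log only samples the busiest series per interval, so treat the result as an approximation):

index=_internal source=*metrics.log group=per_index_thruput series=_internal
| eval GB = kb / 1024 / 1024
| timechart span=1d sum(GB) AS GB_ingested_per_day

Run it over a full week or more, since the event rate varies with restarts, scheduled searches and errors.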

So the storage depends mostly on the retention policy you use.

In my projects I usually use 15 or 30 days of retention; I think this is a useful period to analyze what happened in case of problems, and I don't think that older events are useful.

This means around 425 or 850 GB of raw events, which compress to about half of these values on disk.

In conclusion, 400 UFs with a retention of 15 days on two clustered indexers use around 215 GB on each indexer.
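
For comparison, your 60-day target at the same baseline would be roughly 28.5 GB/day x 60 = about 1,700 GB raw, or around 850 GB on disk after compression, so 500 GB would probably not hold 60 days. You can check the actual compressed footprint at any time with dbinspect, for example:

| dbinspect index=_internal
| stats sum(sizeOnDiskMB) AS totalMB
| eval totalGB = round(totalMB / 1024, 1)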

About the question of RAID 1+0: it isn't mandatory, but I usually use RAID 1+0 for all indexes, also because _internal logs are usually used to check whether a server is up and running. To consume less high-performance storage, it can be useful to put cold data on slower storage.
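
As an illustration, a minimal indexes.conf sketch of that split, using your 60-day target with a size cap as protection for the 500 GB volume (the paths are placeholders for your own fast and slow volumes; frozenTimePeriodInSecs and maxTotalDataSizeMB are the standard age and size limits):

[_internal]
homePath = /fast_storage/splunk/_internaldb/db
coldPath = /slow_storage/splunk/_internaldb/colddb
thawedPath = /fast_storage/splunk/_internaldb/thaweddb
# freeze (delete, by default) buckets older than 60 days: 60 * 86400 s
frozenTimePeriodInSecs = 5184000
# hard size cap as a safety net so the index cannot outgrow the volume
maxTotalDataSizeMB = 450000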

Ciao.

Giuseppe

 


hectorvp
Communicator

Thanks @gcusello ,

Since I'm storing only internal logs on my standalone indexer, I may use RAID 0 or 15k RPM SAS HDDs... I'll still think it over.
