Getting Data In

indexes.conf recommendation for a large volume of data per day in one index

fgu
Loves-to-Learn Lots

Hi,

I am looking for recommendations for the following scenario: a single instance (one indexer) receiving 300-400GB of data per day into a single index. Is there a recommended configuration for such an index?

So far, I have come up with a few changes:

- Increase maxTotalDataSizeMB beyond the 500GB default to meet my retention requirement.

- Set maxDataSize = auto_high_volume (hot buckets of up to 10GB).
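
For reference, the two changes listed above might look like the following indexes.conf stanza. This is only a sketch: the index name `bigdata`, the paths, and the 4 TB cap are illustrative assumptions, not values taken from this thread, and the cap would need to be sized from the actual on-disk footprint of the data.

```ini
# Hypothetical index name and paths, for illustration only.
[bigdata]
homePath   = $SPLUNK_DB/bigdata/db
coldPath   = $SPLUNK_DB/bigdata/colddb
thawedPath = $SPLUNK_DB/bigdata/thaweddb

# Raise the total index size cap above the 500000 MB default.
# 4000000 MB (about 4 TB) is a placeholder, not a recommendation.
maxTotalDataSizeMB = 4000000

# Let hot buckets grow to about 10GB on 64-bit systems before rolling to warm.
maxDataSize = auto_high_volume
```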

I am considering increasing the number of hot buckets, because with 6 hot buckets by default that is only 60GB of data, which is not even 24 hours' worth. Should I increase it? Or should I only increase the number of warm buckets? Or both?

Are warm buckets also 10GB each? If my disk capacity and performance allow it, can I keep only warm buckets for my maximum retention (30 days) and not use cold at all?

Any advice or feedback on this type of scenario?

thanks

/Fabien

fgu
Loves-to-Learn Lots

Thank you guys for taking the time to share your insight! That is helpful.

The data comes from one source, which is why it is in a single index. It all sits on the same type of storage (local SSD), which is why I was considering keeping as many warm buckets as I can.

As for the volume of data per indexer, I am much (much!) higher than what you recommend. I thought it was OK because of the spec of the server I am using (48c/96T CPU @ 2.3 GHz, 128GB of RAM, 5TB local SSD in RAID10). Would you recommend keeping the indexing volume low even with this type of configuration? Is there a rule of thumb I can follow to work out the right volume of data per day based on hardware specs?

 

thanks

/Fabien

richgalloway
SplunkTrust

400GB to a single indexer is too much, especially if you want to search, too.  Splunk recommends 100GB per indexer, although you may be able to get away with 200GB / indexer.  Using multiple indexers will improve indexing and search performance.

Why put all that data into a single index?  Is it all from the same source/sourcetype or all related somehow?  If not, split it into separate indexes.  Having all your data in a single large index may seem convenient, but searching through so much data will be slower than searching smaller indexes.

Warm buckets are the exact same size as they were when they were hot.  The change from hot to warm is just a rename.

Yes, you can keep only warm buckets, if you choose.  There's no harm in using cold buckets, however, if the storage media has the same performance as for warm buckets.
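
As a rough sketch of the warm-only approach, the settings below keep buckets in the home path until they age out at 30 days. The index name `bigdata` and the maxWarmDBCount value are assumptions for illustration; the count only needs to be high enough that buckets freeze by age before the warm limit is reached.

```ini
# Hypothetical index name, for illustration only.
[bigdata]
# coldPath must still be defined even if no bucket ever rolls to cold.
coldPath = $SPLUNK_DB/bigdata/colddb

# Freeze (delete, by default) buckets whose newest event is older than 30 days.
# 30 days * 86400 s/day = 2592000 s
frozenTimePeriodInSecs = 2592000

# Warm buckets roll to cold once this count is exceeded (default 300).
# 1000 is an arbitrary illustrative value.
maxWarmDBCount = 1000

# 0 means no size limit on the hot/warm path, so buckets are not pushed to cold by size.
homePath.maxDataSizeMB = 0
```

Note that maxTotalDataSizeMB still applies: if the index reaches that cap before 30 days have passed, the oldest buckets are frozen early.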

---
If this reply helps you, Karma would be appreciated.

isoutamo
SplunkTrust
I agree with @richgalloway: 400GB per day for a single indexer is too much. With ES, 100GB/day/indexer is the absolute maximum; with plain Splunk Enterprise it could be 150GB, but not more. Of course, it depends on how well your inputs are defined and how much Splunk has to guess when parsing events.

I also prefer separate indexes, not only for access control and retention, but also because the search profile determines which data should be kept together and which should be separated.

This thread may give you more to think about: https://community.splunk.com/t5/Getting-Data-In/What-is-the-disadvantage-of-having-a-lot-of-small-bu...

r. Ismo