Hello all,
I want to ask about the mechanics of rolling buckets from hot to cold. In our indexes.conf we don't set up a separate warm path; we only have hot and cold paths, controlled by maxDataSizeMB.
The system team gave me 1 TB of SSD and 3 TB of SAS to work with, so naturally I put the hot path on the SSD and the cold path on the SAS. Now we are running into a problem where the indexingQueue always fills up to 100% whenever that indexer ingests data. So my questions are:
1. Does the process of rolling buckets from hot to cold affect IOPS and writes into the indexingQueue?
2. My understanding is that the data flows like this: Forwarder -> indexer hot -> indexer cold, as a continuous process. When hot is maxed out, buckets roll to cold, but cold is on SAS, so the write speed is lower than on the SSD. For example, if hot ingests 2,000 events per second but can only push 500 events per second out to cold, and hot is already full, then the effective ingest rate of hot drops to 500 (since it can only take in as much as it can push out). Is this correct?
3. If my understanding is correct, how should I approach optimizing it? I'm thinking of two options:
a) Switch our retention policy from size-based to time-based: set hot retention to 1 day and keep cold on size-based retention. Since we ingest 600-800 GB per day, this would ensure the hot partition always has a buffer for a smooth transition. My question here is: when does the rolling actually happen? At the end of the day, or whenever a bucket is one day old (which would change nothing)?
b) Create a warm path as a buffer (hot -> warm -> cold). The warm path would get 1 TB and a retention of 1 day, and since we ingest 600-800 GB per day, the warm path would always have space for hot to roll over into.
Is there anything else I can do?
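For concreteness, here is a sketch of the indexes.conf settings involved. The setting names are standard indexes.conf options; the index name, paths, and sizes are placeholders, not our actual values. As far as I understand, rolling is evaluated per bucket when a threshold is hit, not by an end-of-day job:

```ini
# Hypothetical index illustrating the current size-based setup
[my_index]
homePath   = /ssd/splunk/my_index/db        # hot + warm buckets
coldPath   = /sas/splunk/my_index/colddb    # cold buckets
thawedPath = /sas/splunk/my_index/thaweddb

# Size-based control, as described above (values are placeholders)
homePath.maxDataSizeMB = 900000    # cap hot+warm usage on the SSD
coldPath.maxDataSizeMB = 2700000   # cap cold usage on the SAS

# Time-based knobs relevant to option (a): each hot bucket rolls
# when one of these limits is reached, not at a fixed time of day
maxHotSpanSecs = 86400            # max event time span in one hot bucket
maxHotIdleSecs = 86400            # roll a hot bucket after 1 day idle
maxDataSize    = auto_high_volume # size at which a hot bucket rolls
```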
Wait a minute. You said that you have 1TB SSD + 3TB SAS disks and you are ingesting 600-800GB per day.
How many indexers do you have? I really hope you have a cluster containing several indexers. Do you have any Splunk premium apps like ES or ITSI running in your environment? What kind of architecture do you currently have to manage this workload, and what do your nodes look like from a resource point of view?
Basically your understanding is quite correct.
With this daily ingestion volume you definitely need a cluster, or at least several indexers, taking data in and serving searches at the same time.
Do you have a MC (monitoring console) up and running? It is an excellent tool for getting more information about what is happening in your environment.
Volumes are an excellent way to manage your indexers' disk space. I usually define one volume for hot+warm and another for cold. Remember that you shouldn't allocate all disk space to Splunk; there must be some free space left for disk operations. How much depends on your filesystem; a rule of thumb is 10-20%. Likewise, don't allocate the entire filesystem to a Splunk volume: leave some headroom there too, as Splunk will need it when it flushes data from warm to cold, and from cold to frozen if you have a separate frozen location.
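The volume-based layout described above might look like the sketch below. The volume names, paths, and sizes are illustrative placeholders; the settings themselves are standard indexes.conf options:

```ini
# One volume for hot+warm on SSD, one for cold on SAS
[volume:hotwarm]
path = /ssd/splunk
# Leave headroom: do not give the whole 1 TB SSD to the volume
maxVolumeDataSizeMB = 800000

[volume:cold]
path = /sas/splunk
maxVolumeDataSizeMB = 2500000

# A hypothetical index that uses the volumes
[my_index]
homePath   = volume:hotwarm/my_index/db
coldPath   = volume:cold/my_index/colddb
# thawedPath cannot reference a volume
thawedPath = /sas/splunk/my_index/thaweddb
```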
@tungpx - Answers to your questions below:
1. Yes, in your situation, because as you mentioned, hot is already full.
2. Your understanding is mostly correct, I think.
3. Solution:
I hope this helps! Kindly upvote!