Getting Data In

Fishbucket Growing Issue

hieuba6868
Explorer

Hi, I have an issue with the fishbucket on a Universal Forwarder. I have searched for documentation, but there seems to be very little on this topic, and few community posts cover it.

The problem I am facing is that the fishbucket is taking up a large amount of space, about 2 GB on disk, while the limit configured in limits.conf is:

    [inputproc]
    file_tracking_db_threshold_mb = 500

In other topics, I read that the fishbucket can grow to 2 or 3 times the configured limit because of its backup mechanism, which keeps saved copies and a snapshot.tmp file.
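For reference, here is how I am measuring the size on disk (a minimal sketch; the path assumes a default Universal Forwarder install under /opt/splunkforwarder, so adjust it for your environment):

    # Total size of the fishbucket directory on a default UF install
    du -sh /opt/splunkforwarder/var/lib/splunk/fishbucket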

However, is there an upper bound on the fishbucket's size? Will it keep expanding over time without limit, or only up to a certain point?

PS: I have the nmon TA installed on my server.

Please point me to the Splunk documentation covering this.

Thank you.

hieuba6868
Explorer

So, after asking Splunk Support, here is what they said:

1. The possible reasons and conditions under which the fishbucket could exceed the configured threshold of 500 MB.

It is because of the amount of data you are ingesting per day, and the fishbucket can be up to 2 or 3 times larger than the configured limit. This happens because of its backup mechanism, which keeps saved copies and a snapshot.tmp file.
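Listing the fishbucket directory shows where that extra space goes (a sketch; exact file names vary by version, but snapshot.tmp is the backup copy they referred to):

    # The live btree database files plus any saved/snapshot copies live here
    ls -lh /opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db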

2. If there are any log files or diagnostic tools within Splunk that can help us track and understand the growth of the fishbucket index.

Since you have the nmon app installed, we found that it was contributing to the fishbucket's rapid growth.
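On the tracking side, the forwarder's own splunkd.log is a reasonable place to watch (a sketch; the grep patterns are my own guess, not an official diagnostic):

    # Search the forwarder's log for file-tracking related messages
    grep -iE "checksum|fishbucket|btree" /opt/splunkforwarder/var/log/splunk/splunkd.log | tail -n 50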

3. The absolute maximum size that the fishbucket can reach within the Splunk system.

There is no strict maximum size for the Splunk fishbucket. Its size is influenced by factors like the volume of data being ingested, the frequency of indexing, and the specific configuration of your Splunk environment.

4. Any factors that could contribute to the fishbucket exceeding the expected maximum by such a substantial margin.

It will simply keep growing over time with the volume of data and the frequency of indexing.

richgalloway
SplunkTrust

You are correct that the size is affected by backup and save files. The limits.conf setting applies to the base file only, so the total could be 4x that value.

I don't see how the nmon TA has any bearing on this.

---
If this reply helps you, Karma would be appreciated.

hieuba6868
Explorer

Hi @richgalloway,  

  1. Do you have any documentation confirming that the fishbucket can be up to four times larger than the limit specified in limits.conf? Any official resources or explanations of why the fishbucket might exceed the configured threshold by such a significant margin would be extremely helpful.

  2. Concerning TA-nmon: I've noticed that it monitors the server by generating new CSV files every minute and deleting the older ones. I suspect this process could incrementally increase the size of the fishbucket, since it continuously records the CRCs of newly created log files without removing the CRCs of the old, deleted ones. This seems to be supported by the _internal log errors about failed checksums for log files that no longer exist.
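One way to check whether a deleted file's record is still in the fishbucket would be btprobe (an unsupported internal utility, so treat this as a sketch; the flags can vary by version and the CSV path is a placeholder):

    # Look up the fishbucket record for one of the rotated nmon CSV files
    /opt/splunkforwarder/bin/splunk cmd btprobe \
        -d /opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db \
        --file /path/to/nmon/file.csv --validate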


richgalloway
SplunkTrust

1.  Not exactly.  Here's what limits.conf.spec says about the fishbucket size limit:

file_tracking_db_threshold_mb = <integer>
* The size, in megabytes, at which point the file tracking
  database, otherwise known as the "fishbucket" or "btree", rolls over
  to a new file.
* The rollover process is as follows:
  * After the fishbucket reaches 'file_tracking_db_threshold_mb' megabytes
    in size, a new database file is created.
  * From this point forward, the processor writes new entries to the
    new database.
  * Initially, the processor attempts to read entries from the new database,
    but upon failure, falls back to the old database.
  * Successful reads from the old database are written to the new database.

Notice that the old database file stays around even when a new database file is created. That means disk usage is at least double the file_tracking_db_threshold_mb value. When the database is saved, each file (new and old) is doubled again, so 4x.
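If that reading is right, it roughly lines up with the 2 GB you observed:

    500 MB base x 2 (old + new database files) x 2 (live + saved copies) = 2000 MB, or about 2 GB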

2. I see what you mean, although this is true for any TA, not just nmon.  The more input files you have, the more that must be tracked in the fishbucket.

---
If this reply helps you, Karma would be appreciated.