Getting Data In

The percentage of small buckets is very high and exceeded the red thresholds... When will the red warning disappear after the parsing is fixed?

net1993
Path Finder

Hello,
I had that red warning right before my username in Splunk, and after analyzing it I found a few sourcetypes with incorrect time parsing.
I have fixed all of these failures, but the red warning is still appearing (it has been approximately 1 hour since the last parsing error).
I am curious: if there are no more parsing errors, when will the red warning disappear?

1 Solution

jacobpevans
Motivator

Greetings @net1993,

Please post your version. After upgrading from 6.6.4 to 7.2.4, we are seeing the same error. According to @kheo_splunk in this Splunk Answers post, a small bucket is one under 10% of maxDataSize for the index (although I couldn't find that documented in indexes.conf or health.conf). Here's as far as I've gotten with this:

Error

On an indexer, click the health badge in the header bar next to your user name, then click Buckets.

Buckets
Root Cause(s):
The percentage of small of buckets created (83) over the last hour is very high and exceeded the red thresholds (50) for index=windows, and possibly more indexes, on this indexer
Last 50 related messages:
08-16-2019 10:30:21.649 -0400 INFO HotBucketRoller - finished moving hot to warm bid=services~920~0514B976-C45E-486C-B57C-A1E810AEC966 idx=services from=hot_v1_920 to=db_1565890631_1565852558_920_0514B976-C45E-486C-B57C-A1E810AEC966 size=393109504 caller=lru maxHotBuckets=3, count=4 hot buckets,evicting_count=1 LRU hots
08-16-2019 10:00:03.781 -0400 INFO HotBucketRoller - finished moving hot to warm bid=windows~145~0514B976-C45E-486C-B57C-A1E810AEC966 idx=windows from=hot_v1_145 to=db_1565761563_1564808117_145_0514B976-C45E-486C-B57C-A1E810AEC966 size=1052672 caller=lru maxHotBuckets=3, count=4 hot buckets,evicting_count=1 LRU hots

We have two indexers. The two indexers report different numbers (83 on Indexer 1, 66 on Indexer 2) and different errors, so it appears to be checking them separately. As a side note, I do not believe the "over the last hour" part of the error is accurate. The setting that controls this is indicator:percent_small_buckets_created_last_24h, which leads me to believe the check covers the past 24 hours.
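If the default thresholds don't fit your environment, this indicator can be tuned in health.conf on the indexer. The stanza and attribute names below are my best reading of the health report framework, with the yellow/red values taken from the 30% and 50% thresholds discussed in this thread; verify against health.conf.spec for your version before relying on it:

```
# $SPLUNK_HOME/etc/system/local/health.conf (on the indexer)
[feature:buckets]
# Thresholds for the percent-small-buckets indicator (assumed syntax)
indicator:percent_small_buckets_created_last_24h:yellow = 30
indicator:percent_small_buckets_created_last_24h:red    = 50
```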

Queries

Run the following search over either yesterday or the previous 24 hours. I haven't narrowed down the exact time frame, but it does seem to be some variation of 24 hours.

index=_internal sourcetype=splunkd component=HotBucketRoller "finished moving hot to warm"
| eval bucketSizeMB = round(size / 1024 / 1024, 2)
| table _time splunk_server idx bid bucketSizeMB
| rename idx as index
| join type=left index 
    [ | rest /services/data/indexes count=0
      | rename title as index
      | eval maxDataSize = case (maxDataSize == "auto",             750,
                                 maxDataSize == "auto_high_volume", 10000,
                                 true(),                            maxDataSize)
      | table  index updated currentDBSizeMB homePath.maxDataSizeMB maxDataSize maxHotBuckets maxWarmDBCount ]
| eval bucketSizePercent = round(100*(bucketSizeMB/maxDataSize))
| eval isSmallBucket     = if (bucketSizePercent < 10, 1, 0)
| stats sum(isSmallBucket) as num_small_buckets
        count              as num_total_buckets
        by index splunk_server
| eval  percentSmallBuckets = round(100*(num_small_buckets/num_total_buckets))
| sort  - percentSmallBuckets
| eval isViolation = if (percentSmallBuckets > 30, "Yes", "No")

Breaking it down,

index=_internal sourcetype=splunkd component=HotBucketRoller "finished moving hot to warm"
| eval bucketSizeMB = round(size / 1024 / 1024, 2)
| table _time splunk_server idx bid bucketSizeMB
| rename idx as index

Get each instance of a hot bucket rolling to a warm bucket; each event carries the size of the now-warm bucket. Rename idx to index so the join works properly.

| join type=left index 
    [ | rest /services/data/indexes count=0
      | rename title as index
      | eval maxDataSize = case (maxDataSize == "auto",             750,
                                 maxDataSize == "auto_high_volume", 10000,
                                 true(),                            maxDataSize)
      | table  index updated currentDBSizeMB homePath.maxDataSizeMB maxDataSize maxHotBuckets maxWarmDBCount ]

Join each rollover event to a REST call that returns the maxDataSize for that index. A value of "auto" means 750 MB; "auto_high_volume" means 10 GB (or 1 GB on 32-bit systems). The rest is fairly self-explanatory, but I'll explain a few lines.

| eval bucketSizePercent = round(100*(bucketSizeMB/maxDataSize))
| eval isSmallBucket     = if (bucketSizePercent < 10, 1, 0)

Apparently a small bucket is <10% of the maxDataSize for the index.

| eval isViolation = if (percentSmallBuckets > 30, "Yes", "No")

The standard setting for a violation is >30%.
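Outside of SPL, the whole check boils down to a little arithmetic. Here is a minimal Python sketch of that logic, using the 750 MB / 10 GB defaults and the 10% / 30% thresholds from the discussion above (the function names are mine, not Splunk's):

```python
def resolve_max_data_size(max_data_size):
    """Map an indexes.conf maxDataSize value to a size in MB.
    "auto" -> 750 MB; "auto_high_volume" -> 10000 MB (10 GB on 64-bit)."""
    if max_data_size == "auto":
        return 750.0
    if max_data_size == "auto_high_volume":
        return 10000.0
    return float(max_data_size)

def is_small_bucket(bucket_size_mb, max_data_size):
    """A bucket is 'small' if it rolled at under 10% of the index's maxDataSize."""
    return 100 * bucket_size_mb / resolve_max_data_size(max_data_size) < 10

def percent_small_buckets(bucket_sizes_mb, max_data_size):
    """Percentage of rolled buckets (for one index) that were small."""
    small = sum(1 for s in bucket_sizes_mb if is_small_bucket(s, max_data_size))
    return round(100 * small / len(bucket_sizes_mb))

def is_violation(percent_small, threshold=30):
    """The standard violation threshold in the query above is > 30%."""
    return percent_small > threshold

# Example: four buckets rolled to warm for an "auto" (750 MB) index.
# Anything under 75 MB counts as small, so three of the four qualify.
sizes_mb = [1.0, 5.0, 400.0, 2.0]
pct = percent_small_buckets(sizes_mb, "auto")   # 75
print(pct, is_violation(pct))                   # 75 True
```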

This still does not fully work for me, but I believe the answer is close to this. I tried the past 60 minutes (definitely not it), the past 24 hours, Today, and Yesterday. None of the values match, although I do see "violations". One thing I did notice is that the numbers displayed in the error (83 and 66 for me) do not seem to change, as if the check does not run often (every 4 hours? once a day?).

If anyone sees anything wrong, just let me know.

Edit: fixed one issue. This query is now close enough to accurate for my purposes. It does work for finding indexes with a high percentage of small buckets; it just doesn't match the numbers that Splunk shows.

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

