New to cybersecurity, been in my first entry-level job for 6 months.
Also new to Splunk. I took some classes, but they were quick and didn't go into much detail — the instructor basically read the slides.
I ran into an issue: a red warning on 8.2.4. It happens on indexer #7 of the 8 we have.
Crawling the web to gain some understanding. Found this link:
Solved: The percentage of small of buckets is very high an... - Splunk Community
The OP had the same issue, but it involved timeparsing, and they were able to fix it.
- What is timeparsing, and how do I fix it?
I am not great with search strings, regex, etc. Splunk just kind of fell in my lap. I tried to follow the search string that @jacobpevans wrote up in reply to the post above, and I'm not sure I follow it well. As I understand it, it searches index=_internal sourcetype=splunkd for each hot bucket, lists the hot buckets that are rolling to warm, renames the index field so the results can be joined, and then the join command matches each instance to a rollover.
When I run his search as-is, I get a long readout listing many indexes, but not the mail index indicated in the warning. The readout also shows 4 rows (2 indexes on 2 different indexers) with a violation and 100% small buckets.
I would like to resolve this issue, but I am seriously lost, haha. I think Splunk may be the death of my career before it even gets started.
Hello @HathMH ! Welcome to Splunk and to the working world!
Essentially, what this error implies (and what the solution you mentioned was talking about) is that you've got data coming in where you perhaps haven't given Splunk enough information about how to handle the timestamp. So it has chosen something on its own: either something that looks like a timestamp but isn't, OR the timestamps have a level of millisecond granularity that is proving too much for the hardware/networking you are working with, without some kind of care and feeding.
When data is brought into Splunk and you don't give it instructions, it will do its very best to figure out where a good timestamp is, and it will use it. Then it looks for key=value pairs and a number of other things in order to make sense of your data. It works hard at this, and some data will cause it to work unreasonably hard — e.g. the data looks one way for a while and then suddenly changes, so Splunk has to re-evaluate its decisions. You know your data best, so it is best to share that info.
This is what the other solution is talking about:
https://docs.splunk.com/Documentation/Splunk/9.0.0/Data/Configuretimestamprecognition
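For reference, explicit timestamp settings live in props.conf. This is only a sketch — the sourcetype name and time format here are made up, and yours must come from what your actual events look like:

```
# props.conf -- hypothetical stanza; replace the sourcetype and format with your own
[my_mail_sourcetype]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
MAX_TIMESTAMP_LOOKAHEAD = 25
```

TIME_PREFIX tells Splunk where to start looking for the timestamp, TIME_FORMAT is strptime-style, and MAX_TIMESTAMP_LOOKAHEAD caps how far into the event it searches. Telling Splunk exactly where the timestamp is saves it from guessing.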
Now... if you do need the granularity you're getting and all the timestamp recognition is intentional, then you have to look at how the indexers are configured and how index=mail is configured regarding how much it collects into each type of bucket before it rolls.
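If you want to see the bucket picture for yourself, a read-only sketch using the built-in dbinspect command (exact field availability can vary a bit by Splunk version) would be something like:

```
| dbinspect index=mail
| table bucketId, state, eventCount, sizeOnDiskMB, startEpoch, endEpoch
| sort - sizeOnDiskMB
```

Lots of warm buckets with a tiny sizeOnDiskMB is the "small buckets" condition the warning is complaining about, and buckets whose startEpoch/endEpoch span is suspiciously wide often point at timestamp trouble.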
You should not START with complex bucket math, and it isn't recommended to do too much tinkering with the index configurations until you know what you're doing (meaning you are a certified admin and have had some experience at it).
Take a look at the sourcetypes in the index called mail:

index=mail | stats count by sourcetype
Then you need to look for the props.conf that mentions that sourcetype and check it for timestamp configuration.
If it's there... then you might need more help.
If not... then identify the timestamp and apply that to the data on the way in.
Warning... don't do this on a production system.
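If you want evidence that timestamp parsing is actually the problem before touching anything, Splunk logs its own parsing complaints to _internal. A read-only search along these lines (the component names are what I'd expect on recent versions) will surface them:

```
index=_internal sourcetype=splunkd log_level=WARN
    (component=DateParserVerbose OR component=AggregatorMiningProcessor)
| stats count by host, component
```

A steady stream of DateParserVerbose warnings for your mail data is a strong hint that the timestamps need explicit configuration.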
The best thing to do is take a sample log, set up a test index, and send data into it with your test config. Maybe you have a dev or test environment — try it there.
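Once you have a test index and a test config, one low-risk way to feed a sample file in is Splunk's oneshot upload from the CLI. The path, index, and sourcetype below are placeholders for whatever you set up:

```
# run this on a dev/test instance, not production; names are placeholders
$SPLUNK_HOME/bin/splunk add oneshot /tmp/sample_mail.log -index test_mail -sourcetype my_mail_sourcetype
```

Then search the test index and check that _time matches the timestamps in the raw events before you apply the config anywhere that matters.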