Getting Data In

Splunk Add-On for AWS Small Hot Buckets - Why are we receiving this warning?

andrew_burnett
Path Finder

We are getting the small hot buckets warning for this index, but the timestamps look fine just with a few hours offset. Not quite sure where to go from here.

0 Karma

shivanshu1593
Builder

"A few hours offset" can be a big contributing factor here. How big of an offset are we talking here? If you haven't changed the value of maxHotBuckets for the index, it defaults to auto, which the indexers use to the set the value to 3. If the timestamp of the data is all over the place (Some in the future, some really old data), Splunk ingests it but force roll a hot bucket to create a new one for the data with unusual timestamp and if it happens frequently, you find a lot of small hot buckets being created in a short span of time.

  • Please check whether the data coming from AWS is being accepted and stored "in the future" by running a search such as index=yourindex sourcetype=aws:* earliest=+5m latest=+7d. If the volume is considerably large, this could be a big contributor to the error.
  • Please look for errors like "Accepted time is suspiciously far away from the previous event's time". This can tell you whether events were ingested with timestamps far back in the past, causing Splunk to create hot buckets for them.
  • Create a custom props.conf that defines TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD and TIME_FORMAT, along with line breaking, to ensure that Splunk reads the timestamp properly from your data (see the sketch after this list).
  • Do a sanity check of the data itself. I've seen log sources in some environments where the timestamp within the log was all over the place all the time; I ended up using DATETIME_CONFIG = current to resolve the problem.
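For the props.conf points above, a minimal sketch might look like the following; the sourcetype name, timestamp format, and lookahead are illustrative placeholders that you would adjust to match what your AWS events actually look like:

# props.conf (illustrative values only)
[aws:example:sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
MAX_TIMESTAMP_LOOKAHEAD = 32
# Last resort, only if the timestamps in the raw data are genuinely unreliable:
# DATETIME_CONFIG = current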

Hope this helps,


Thank you,
Shiv
###If you found the answer helpful, kindly consider upvoting/accepting it as the answer as it helps other Splunkers find the solutions to similar issues###
0 Karma

andrew_burnett
Path Finder

And all the timestamps are in the same TZ, so there are no weird differing times that I can see either.

0 Karma

andrew_burnett
Path Finder

So the add-on came with props, and what I mean by offset is that all the events are in a timezone 6 hours ahead of us, but when I search, it converts them to my time. When I tried the search, it failed with this message: "Unable to parse the search: Invalid time bounds in search: start=1654750800 > end=1654198380."

0 Karma

shivanshu1593
Builder

My bad. I wrote the time ranges incorrectly for the search. I've updated the answer above.
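In the meantime, a quick way to gauge how far off the event timestamps actually are is to compare event time with index time. This is just a sketch; yourindex is a placeholder for your index name:

index=yourindex sourcetype=aws:*
| eval lag_seconds = _indextime - _time
| stats min(lag_seconds) max(lag_seconds) avg(lag_seconds) by sourcetype

A consistently negative lag (index time earlier than event time) would mean the events are landing "in the future", which is exactly the pattern that forces hot buckets to roll.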

Thank you,
Shiv
###If you found the answer helpful, kindly consider upvoting/accepting it as the answer as it helps other Splunkers find the solutions to similar issues###
0 Karma

andrew_burnett
Path Finder

I have no events in the future.

0 Karma

shivanshu1593
Builder

Interesting. Could you kindly share which version of Splunk you are using, and what percentage of small buckets Splunk reports in the error message?

Please try running the search from this post and see if the index that you got the error message for gets identified. 

https://community.splunk.com/t5/Getting-Data-In/The-percentage-of-small-of-buckets-is-very-high-and-...
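If the search in that post is hard to dig out, something along these lines should approximate it; this is a sketch based on the size and idx fields that HotBucketRoller writes to splunkd.log, with an arbitrary 10 MB threshold for "small":

index=_internal sourcetype=splunkd component=HotBucketRoller "finished moving hot to warm"
| eval size_mb = round(size/1024/1024, 2)
| stats count(eval(size_mb < 10)) as small_buckets, count as total_buckets by idx, host
| eval small_pct = round(small_buckets / total_buckets * 100, 1)
| sort - small_pct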

 

Could you also check:

  • Whether the issue is present on all of your indexers.
  • Whether the affected indexers were restarted recently or had any issues accepting data (this can be found by looking into splunkd.log).
Thank you,
Shiv
###If you found the answer helpful, kindly consider upvoting/accepting it as the answer as it helps other Splunkers find the solutions to similar issues###
0 Karma

shivanshu1593
Builder

Could you please run the following search for the last 7 days and see if it returns the name of the affected indexer. If it doesn't return a result, please take the "Received shutdown signal." string and search for it in the splunkd.log of the indexer (if you have access to its box).

index=_internal "Received shutdown signal." sourcetype=splunkd component!="SearchParser" 
| dedup host 
| stats max(_time) as _time by host

 

Thank you,
Shiv
###If you found the answer helpful, kindly consider upvoting/accepting it as the answer as it helps other Splunkers find the solutions to similar issues###
0 Karma

andrew_burnett
Path Finder

Beyond just a second ago, when I restarted it, nothing is popping up.

0 Karma

shivanshu1593
Builder

Okay, this seems interesting though. We've eliminated the most common reasons for this issue. The only ones that remain are:

  • Events with timestamp extraction issues: search for the string "is suspiciously far away from the previous event's time" and check whether it is happening for your affected log source (see the sketch after this list).
  • A network connectivity issue between the HF and the indexers (which is highly unlikely).
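A minimal sketch of that first check, assuming the message is emitted by the DateParserVerbose component as it normally is:

index=_internal sourcetype=splunkd component=DateParserVerbose "suspiciously far away from the previous event's time"
| stats count by host

Then inspect the raw events it returns to see whether the context mentions your AWS source or sourcetype.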

 

Thank you,
Shiv
###If you found the answer helpful, kindly consider upvoting/accepting it as the answer as it helps other Splunkers find the solutions to similar issues###
0 Karma

andrew_burnett
Path Finder

Well, there are no errors in Splunk for that sourcetype, so nothing like that is being flagged in the data. And all the connections seem fine; there are other add-ons on that HF that are reporting fine.

0 Karma

andrew_burnett
Path Finder

I am running 8.2.4 with 69% small buckets, and it's only flagging on one of my indexers. And I don't see any errors in splunkd regarding that add-on.

0 Karma

andrew_burnett
Path Finder

Restarting the indexer got rid of the problem for now; I'm not sure it's going to fix the underlying problem.

0 Karma

shivanshu1593
Builder

Ah. Was the instance recently restarted as well? If there's no problem with the log source, you hopefully shouldn't face the issue again anytime soon. Restarting the indexers rolls all the hot buckets to warm, so that would have done the trick for now.
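If you want to keep an eye on whether small hot buckets start piling up again, a dbinspect sketch like this can help; yourindex is a placeholder:

| dbinspect index=yourindex
| where state="hot"
| table bucketId, splunkServer, sizeOnDiskMB, startEpoch, endEpoch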

Thank you,
Shiv
###If you found the answer helpful, kindly consider upvoting/accepting it as the answer as it helps other Splunkers find the solutions to similar issues###
0 Karma

andrew_burnett
Path Finder

It came back within 2 hours and is now affecting two indexers.

0 Karma

jamie00171
Communicator

Hi @andrew_burnett 

Can you run the following search please:

index=_internal component=HotBucketRoller idx=<insert impacted index name here>
| stats count by caller

which will show why the hot buckets are being rolled.

Then you can run:

index=_internal component=IndexWriter idx=<insert impacted index name here>

which should show more details about why new hot buckets are being created and, if it was due to the timestamp of an event, it will show it.

Thanks, 

Jamie

0 Karma

andrew_burnett
Path Finder

This is the output of the first search.

[Screenshot: results of the HotBucketRoller search (andrew_burnett_0-1654526730675.png)]

0 Karma