[UPDATE]
Hello everyone, and thanks in advance for your help. I'm very new to this subject, so if anything is unclear I'll try to explain my problem in more detail.
I'm using Splunk 9.2.1, and I recently observed that my indexer was no longer indexing the logs it received. The indexer is in a failure state because my $SPLUNK_DB partition reached the minFreeSpace limit set in server.conf.
After further analysis, it seems that one of the indexes on the partition, _metrics, is saturated with warm buckets (db_*) and is taking all the available space. I have, however, configured all my indexes through indexes.conf ($SPLUNK_HOME/etc/system/default/indexes.conf):
# index specific defaults
maxTotalDataSizeMB = 5000
maxDataSize = 1000
maxMemMB = 5
maxGlobalRawDataSizeMB = 0
maxGlobalDataSizeMB = 0
rotatePeriodInSecs = 30
maxHotIdleSecs = 432000
maxHotSpanSecs = 7776000
maxHotBuckets = auto
maxWarmDBCount = 300
frozenTimePeriodInSecs = 188697600
...
# there's more, but I might not be able to disclose it or it might not be relevant
[_metrics]
coldPath = $SPLUNK_DB/_metrics/colddb
homePath = $SPLUNK_DB/_metrics/db
thawedPath = $SPLUNK_DB/_metrics/thaweddb
frozenTimePeriodInSecs = 1209600
From what I understand, with this conf applied the index should not exceed 5 GB, and once that limit is reached the oldest warm/hot buckets should be removed, but it seems this is not taken into account in my case.
The indexer works fine after purging the buckets and restarting it, but I don't get why the conf was not applied. Is there something I'm missing here?
Is there a way to check the "characteristics" of my index once it has started? -> Checked, the conf is correctly applied.
If you know anything on this subject, please help me 🙂
Thank you
You've shown the configuration for a single index, but no doubt there are other indexes on the same disk. Those other indexes also consume disk space and can help lead to a minFreeSpace situation.
To better manage that, I recommend using volumes. Create a volume (in indexes.conf) that is about the size of the disk (or the amount you want to use) and make the indexes part of that volume (using volume:foo references). That will ensure the indexer considers the sizes of all indexes when deciding when to roll warm buckets.
Hi @richgalloway ,
Thanks for your input. Yes, I only gave the configuration for one index because I mainly rely on the default conf shown above for all my indexes on the disk; plus, this specific index was the only one saturated, so it's probably the issue here? (Please correct me if I'm wrong about that.)
For the volumes, I have one in my conf, but I'm not sure how it works or how it's used (I didn't write this conf file myself). I'll try to look into this subject.
[volume:MyVolume]
path = $SPLUNK_DB
Thanks!
That's a start. You'll also need maxVolumeDataSizeMB so Splunk knows how large the volume is. Then each index definition needs to reference the volume.
[volume:MyVolume]
path = /some/file/path
# size the volume so Splunk rolls buckets before the disk fills; the value below is only an example
maxVolumeDataSizeMB = 500000
[MyIndexSaturated]
coldPath = volume:MyVolume/myindexsaturated/colddb
homePath = volume:MyVolume/myindexsaturated/db
# thawedPath cannot reference a volume, so it keeps a direct path
thawedPath = $SPLUNK_DB/myindexsaturated/thaweddb
frozenTimePeriodInSecs = 1209600
OK thanks, I get this part and I'll try to rework the indexes.conf. But what I still don't get, and really would like to know (it's quite important for me to understand what was wrong before changing anything), is why it didn't work in the first place. From what I read in the docs it should have worked with a simple conf like this, no? Furthermore, using a volume and maxVolumeDataSizeMB will help me cap the global size of all indexes on my volume, right? But I also need each index to have its own specific maxTotalDataSizeMB and abide by it.
If that's not possible, or is limited for whatever reason, feel free to tell me.
Thanks again!
OK. See my response there - https://community.splunk.com/t5/Deployment-Architecture/How-do-I-enforce-disk-usage-on-volumes-by-in...
Additionally, because I'm not sure whether this has been said here or not: just because you define something as a volume doesn't mean that everything "physically located" in that directory is treated by Splunk as part of that volume.
So if you define a volume like in your case:
[volume:MyVolume]
path = $SPLUNK_DB
you must explicitly use that volume when defining index parameters. Otherwise an index will not be considered part of this volume. In other words, if your index has
coldPath = volume:MyVolume/myindexsaturated/colddb
this directory will be managed with normal per-index constraints as well as global volume-based constraints.
But if you define it as
coldPath = $SPLUNK_DB/myindexsaturated/colddb
even though it is in exactly the same place on the disk, it is not considered part of that volume.
Hi
One comment about using $SPLUNK_DB in a volume definition.
Actually, Splunk uses $SPLUNK_DB for different things and stores different stuff there. This means that when you define a volume with path = $SPLUNK_DB and set a size for it, that size applies only to that volume. When you have e.g. other indexes and other content on the same filesystem where your $SPLUNK_DB is, I think Splunk cannot count their size toward that volume's total. It only counts the indexes that are defined to use that volume!
Basically this means that your filesystem could still fill up and stop Splunk, even if you have set a low enough maxVolumeDataSizeMB on the volume.
For that reason I suggest that you never use $SPLUNK_DB as a volume path/dir. You should always use a separate filesystem on a separate LV, etc.
To be honest, I haven't tested this in my lab to verify that my assumption is correct, but maybe others have done this test?
r. Ismo
Yes, that's a valid point. It's just one specific case of my general remark about mixing the same space between a volume-based definition and a direct directory "pointer".
Theoretically, you could use $SPLUNK_DB as your volume location, but:
1. There are some default indexes which write there (like _internal and the other underscore indexes) and you'll have to relocate/redefine all of them (see the sketch after this list for _internal), which might be tricky to keep in sync with new software releases that can introduce new indexes (like _configtracker).
2. $SPLUNK_DB does not contain just indexes but also - for example - kvstore contents (and its backups).
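For example, repointing _internal onto the volume could look something like this (the directory names follow the stock defaults; treat it as a sketch rather than a drop-in config):
[_internal]
homePath = volume:MyVolume/_internaldb/db
coldPath = volume:MyVolume/_internaldb/colddb
thawedPath = $SPLUNK_DB/_internaldb/thaweddb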
You should also remember that Splunk has this as a default
splunk btool indexes list volume:_splunk_summaries|egrep '(\[volume|path)'
[volume:_splunk_summaries]
path = $SPLUNK_DB
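If you do relocate everything onto a dedicated volume, you'll probably want to override that default as well, along these lines (path and size are purely illustrative):
[volume:_splunk_summaries]
path = /splunkdata/summaries
maxVolumeDataSizeMB = 100000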
Thanks for your input! Your explanations were clear, but they don't explain how/why my index did not roll its buckets after reaching the maxTotalDataSizeMB of 5 GB and instead grew to 35 GB.
We've established the disk is very full, but have not established what is using that space. I suspect several indexes are combining to fill up the disk, but the du utility can verify that.
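For example, something like this on the indexer (assuming $SPLUNK_DB is set in your shell, or substituting the real path) will show how much each index directory consumes:
du -sh $SPLUNK_DB/* | sort -rh | head -20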
Maybe my post was not clear enough, sorry. I did state that one of my indexes on the partition (and I already know which one, the one I gave in the indexes.conf) is saturated with warm buckets (db_*) and taking all the available space, even though it's configured as shown in the indexes.conf. Of course multiple indexes are using the disk, but only one went far above its maxTotalDataSizeMB and saturated it.
OK. Did you verify what Splunk actually sees?
| rest /data/indexes/myindex
Some of this info you can also see in Settings->Indexes
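For example, to pull just the settings and current usage I'd look at first (substitute your index name; the exact field list can vary by version):
| rest /data/indexes/myindex
| fields title, maxTotalDataSizeMB, currentDBSizeMB, frozenTimePeriodInSecs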
I did not; as said above in my post, I'm very new to the subject and I asked how to check whether the conf was taken into account. Thanks for telling me how. I did check, and Splunk does seem to pick up the default conf as written.
If it's OK to put some old files/logs into the frozen state (I suppose that you have a cold2frozen script in place, or you don't need those old events), then you can put your indexer into detention mode (it denies all new connections / indexing) and update minFreeSpace to some smaller value. You should also check, e.g. with "du -sh $SPLUNK_DB", which indexes are the biggest / where you could archive some buckets. Based on that, just update the max retention time in indexes.conf for those indexes. Then start Splunk and wait for it to archive those buckets, and you will get more space back.
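For example, to age out old buckets faster on the biggest offender, you could temporarily lower its retention in indexes.conf (the stanza name and value here are only illustrative):
[some_big_index]
# buckets whose newest event is older than 7 days roll to frozen (deleted unless coldToFrozenDir/Script is set)
frozenTimePeriodInSecs = 604800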
Of course, if you could just add more space to that filesystem, that's probably the best way to fix the situation and get Splunk up and running.
BUT after that, as I said, you must plan your data storage to use volumes (with separate filesystems) and update the index definitions to use those volumes. This needs some planning and also some service break time. There are instructions in the Splunk docs and in the community on how to move existing indexes to other directories on an indexer. Just follow those instructions, or hire a Splunk partner/PS or another consultant who can do it for you.
Hi @isoutamo ,
Thanks for your input, but that's not the issue here: I already cleaned my saturated index and restarted the indexer, and it works fine now. And as I said to @richgalloway , in my post I stated that only one of my indexes was taking way more space than it should, and I know which one. The issue is why it exceeded the maxTotalDataSizeMB set in indexes.conf.
Just adding more space might not be the right solution for us, but I'll keep in mind the point about using volumes for better planning of the data storage, thanks.
One more important thing to check:
splunk btool indexes list --debug
This will give you an overview of the settings that are applied to your indexes, along with where they are defined. Make sure your settings are defined in the proper places:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Wheretofindtheconfigurationfiles
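For example, to see exactly which file each relevant setting for a single index comes from (substitute your index name):
splunk btool indexes list myindex --debug | grep -E 'maxTotalDataSizeMB|homePath|coldPath|frozenTimePeriodInSecs'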
Okay, thanks for your answer. I checked both "| rest /data/indexes/myindex" and btool as you mentioned, and both show maxTotalDataSizeMB set to 5000 (5 GB). I can't check through the GUI "Settings->Indexes", but I guess it's not that important.
OK. So that is interesting. I'd then check:
1) Whether there is by any chance another definition pointing to the same directory (for example one index defined by means of $SPLUNK_DB and another based on the volume).
2) What actually consumes the disk in this directory. Just the buckets or something else? Maybe you have a lot of DAS data. Or maybe you're ingesting a lot of data with indexed extractions and have bloated tsidx files... (a couple of quick checks for both are sketched below)
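Something like this could answer both (substitute your index name; paths assume the default layout):
# 1) is any other stanza pointing at the same directory?
splunk btool indexes list --debug | grep 'myindex'
# 2) what actually sits inside the index directory - buckets only, or something else?
du -sh $SPLUNK_DB/myindex/db/* | sort -rh | head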
Thanks again.
After reading your past messages again and checking with my team: the saturating index in question is a default one, so I updated my post with all the information I could give -> _metrics.
1) Checked; nothing other than _metrics points to the $SPLUNK_DB/_metrics/ directory.
2) Warm and hot buckets only. What is DAS data? I don't know about the last thing you mentioned, but that might have caused the issue.
DAS = Datamodel Accelerated Summaries. In the case of metrics it shouldn't apply; I'm not sure you can even do that against metric indexes.
Anyway, does the current state reported by the rest command or by Settings->Indexes (in terms of current usage, not the settings) correspond to the data size on the disk?
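A rough way to compare the two (the numbers won't match exactly, but they should be in the same ballpark): in Splunk,
| rest /data/indexes/_metrics | fields title, currentDBSizeMB, maxTotalDataSizeMB
and on the indexer's filesystem,
du -sm $SPLUNK_DB/_metrics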