What is the naming convention behind the db_ buckets?

Chris_R_
Splunk Employee

What's the naming convention for these directories (buckets), and where can I find more information on tuning my Splunk buckets?

gkanapathy
Splunk Employee

The buckets are named:

db_latesttime_earliesttime_idnum

where latesttime is the timestamp of the latest event in the bucket, earliesttime is the timestamp of the earliest event in the bucket, and idnum is an ID number that must be unique across all buckets in the database.
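As a rough illustration of that layout, here is a minimal Python sketch that splits a directory name into its three fields. The function name, the BucketName type, and the example directory name are all made up for illustration, and the sketch assumes the name has exactly the four underscore-separated parts described above.

```python
from typing import NamedTuple


class BucketName(NamedTuple):
    latest: int     # timestamp of the latest event in the bucket
    earliest: int   # timestamp of the earliest event in the bucket
    bucket_id: int  # ID number, unique within the database


def parse_bucket_name(name: str) -> BucketName:
    """Split a db_latesttime_earliesttime_idnum directory name into fields."""
    parts = name.split("_")
    if len(parts) != 4 or parts[0] != "db":
        raise ValueError(f"not a db_ bucket name: {name!r}")
    # int() raises ValueError if any field is not numeric.
    latest, earliest, bucket_id = (int(p) for p in parts[1:])
    return BucketName(latest, earliest, bucket_id)


# Hypothetical directory name, used only as an example.
bucket = parse_bucket_name("db_1300000000_1290000000_42")
print(bucket.latest, bucket.earliest, bucket.bucket_id)
```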

Splunk uses the time numbers to decide whether to look in a bucket at all for events in a given time span. If you narrow these numbers, then events that are in the bucket but outside the new time range will not be returned in search. If you widen them, Splunk will waste time looking in the bucket for events it will never find.
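To make the consequence concrete, here is a small Python sketch of that kind of check. It is not Splunk's actual code; it uses an ordinary interval-overlap test as a stand-in, and the function name and the toy timestamps are invented for the example.

```python
def bucket_may_match(bucket_earliest: int, bucket_latest: int,
                     search_earliest: int, search_latest: int) -> bool:
    """Return True if the bucket's named time range overlaps the search span."""
    return bucket_earliest <= search_latest and bucket_latest >= search_earliest


# Toy case: the bucket really holds events from t=100 to t=200.
search = (100, 120)

# With the true range in the name, the bucket is opened for this search.
print(bucket_may_match(100, 200, *search))

# Name narrowed to 150..200: the bucket is skipped, so the events
# between 100 and 120 are never returned.
print(bucket_may_match(150, 200, *search))

# Name widened to 0..1000: a search over 500..600 opens the bucket
# even though it holds nothing in that span.
print(bucket_may_match(0, 1000, 500, 600))
```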

ID numbers matter if you are merging databases or restoring buckets from an archive, and you must ensure that every bucket has a unique ID after any merge. Although ID numbers are generated sequentially by normal Splunk indexing, they do not have to be sequential, nor can you count on them remaining unchanged. A simple way to ensure unique IDs would be to append a distinct digit (or series of digits) to buckets from each specific source, so that buckets from different sources could not possibly match on their last digit.
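The digit-appending idea could look something like the Python sketch below. It only computes the new directory names and does not touch the filesystem; the function name, the source labels, and the sample bucket names are hypothetical.

```python
def retag_bucket(name: str, source_digits: str) -> str:
    """Append per-source digits to the ID field of a db_ bucket name."""
    if not (source_digits.isascii() and source_digits.isdigit()):
        raise ValueError("source tag must consist of digits only")
    prefix, latest, earliest, idnum = name.split("_")
    return f"{prefix}_{latest}_{earliest}_{idnum}{source_digits}"


# Two hypothetical sources that both contain a bucket with ID 7.
from_source_a = "db_1300000000_1290000000_7"
from_source_b = "db_1310000000_1305000000_7"

print(retag_bucket(from_source_a, "1"))  # ID becomes 71
print(retag_bucket(from_source_b, "2"))  # ID becomes 72
```

For this to guarantee uniqueness, every bucket from every source has to be retagged, and each source has to use a different tag of the same length. Otherwise an untouched bucket whose ID already ends in one of the tags could still collide.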

In general, you should not tune bucket sizes without extensive, deep knowledge of how indexing and searching operate. Most changes from the defaults will decrease search performance over that data, sometimes enormously, and will only very rarely produce any noticeable improvement. Use auto or auto_high_volume for maxDataSize, and accept or copy the defaults for most other parameters.

znaesh
Path Finder

Also, the bucket ID number should contain only digits, no letters, and its length is limited.

Chris_R_
Splunk Employee

These db_ directories are the warm buckets of each index. The times in their names are stored as UTC epoch seconds.

If you're using the default Splunk index, the buckets are stored in
$SPLUNK_HOME/var/lib/splunk/defaultdb/db

Each index keeps a maximum number of warm buckets, which is specified in your indexes.conf (defaults to 300). By default, Splunk sets the bucket size to 10GB on 64-bit systems and 750MB on 32-bit systems.

For further info on backup, retirement, and archiving best practices see:

http://www.splunk.com/wiki/Deploy:UnderstandingBuckets
http://docs.splunk.com/Documentation/Splunk/5.0/Indexer/Setaretirementandarchivingpolicy
http://www.splunk.com/wiki/Deploy:BestPracticesForBackingUp#How_data_moves_through_Splunk