Solved: How can I find all duplicate bucket IDs that are causing conflicts in my index?

jbsplunk · 11-21-2011

I've got an error like this:

ERROR IndexProcessor - caught exception for index=indexname during initialzation: 'Splunk has detected that a directory has been manually copied into its database, causing id conflicts [/opt/splunk/var/lib/splunk/indexname/db/db_epoch_epoch_1, /opt/splunk/var/lib/splunk/indexname/db/hot_v1_1].'.Disabling the index, please fix-up and run splunk enable index.

I try to fix it using the instructions here:

http://splunk-base.splunk.com/answers/23536/moving-indexes-to-a-new-splunk-server

But I keep finding more conflicts. How can I find all the bucket id conflicts and fix them?

Masa · 11-21-2011

Go to your $SPLUNK_DB directory
```
 # cd $SPLUNK_DB 
```

Run the following one-liner:

(The earliest and latest timestamps are stripped from the bucket names in the output; the buckets on disk are not changed.)

 # find . -maxdepth 3 -mindepth 3 -type d  | grep -P "db_\d{10}_|hot_" |  sed -e 's/\(db_\)[0-9]*_[0-9]*_\([0-9]*$\)/\1\2/' -e  's/\(hot\)_v1_\([0-9]*$\)/db_\2 \1/' | awk '{a[$1]++} END { for ( i in a ) print a[i], "\t", i}' | sort -rn  | grep -P "^([2-9] |[1-9][0-9]+)"

Look up the reported ID(s) in the index database directories. They can be hot, warm, or cold bucket IDs.
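To see what the two sed expressions do, here is the normalization step on its own as a small bash function. The name normalize_bucket and the sample bucket names are made up for illustration; the expressions are the escaped ones from the one-liner, which keep the index and db/colddb part of the path and drop the two timestamps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the name-normalizing part of the one-liner.
# warm/cold: .../db_<newest>_<oldest>_<id> -> .../db_<id>
# hot:       .../hot_v1_<id>               -> .../db_<id> hot
normalize_bucket() {
  sed -e 's/\(db_\)[0-9]*_[0-9]*_\([0-9]*$\)/\1\2/' \
      -e 's/\(hot\)_v1_\([0-9]*$\)/db_\2 \1/'
}

# Made-up warm and hot bucket that share id 7 in the same index.
printf '%s\n' \
  ./defaultdb/db/db_1321901234_1321801234_7 \
  ./defaultdb/db/hot_v1_7 | normalize_bucket
# ./defaultdb/db/db_7
# ./defaultdb/db/db_7 hot
```

Both lines now share the first field ./defaultdb/db/db_7, which is what the awk counter keys on, so the pair is reported with a count of 2.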

Here is example output from the full one-liner. I used var/lib/splunk in place of $SPLUNK_DB.

# find var/lib/splunk -maxdepth 3 -mindepth 3 -type d  | grep -P "db_\d{10}_|hot_" |  sed -e 's/\(db_\)[0-9]*_[0-9]*_\([0-9]*$\)/\1\2/' -e  's/\(hot\)_v1_\([0-9]*$\)/db_\2 \1/' | awk '{a[$1]++} END { for ( i in a ) print a[i], "\t", i}' | sort -rn  | grep -P "^([2-9] |[1-9][0-9]+)" 
2        var/lib/splunk/os/db/db_167
2        var/lib/splunk/os/db/db_111
2        var/lib/splunk/defaultdb/db/db_9
2        var/lib/splunk/defaultdb/db/db_7
2        var/lib/splunk/defaultdb/db/db_6
2        var/lib/splunk/defaultdb/db/db_4
2        var/lib/splunk/defaultdb/db/db_3
2        var/lib/splunk/defaultdb/db/db_2
2        var/lib/splunk/defaultdb/db/db_1
2        var/lib/splunk/defaultdb/db/db_0

Of course, you can't run this on Windows.
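One limitation of the one-liner is that it groups by full directory path, so an ID used once in db and once in colddb of the same index is not flagged. Below is a minimal bash/awk sketch that groups by index name and ID instead. It assumes the default <index>/db and <index>/colddb layout, non-clustered bucket names, and that an ID clash between db and colddb of one index is worth reporting; find_dup_bucket_ids is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list bucket IDs used more than once inside the same
# index, looking at hot/warm (db) and cold (colddb) buckets together.
# Assumes <root>/<index>/{db,colddb}/<bucket> and non-clustered names:
# db_<newest>_<oldest>_<id> or hot_v1_<id>.
find_dup_bucket_ids() {
  local root="${1:-.}"
  find "$root" -mindepth 3 -maxdepth 3 -type d |
    awk -F/ '
      {
        sub_dir = $(NF-1)                       # must be db or colddb
        if (sub_dir != "db" && sub_dir != "colddb") next
        name = $NF
        if (name !~ /^db_[0-9]+_[0-9]+_[0-9]+$/ && name !~ /^hot_v1_[0-9]+$/) next
        n = split(name, part, "_")
        key = $(NF-2) "\t" part[n]              # index name + bucket id
        count[key]++
        paths[key] = paths[key] " " $0
      }
      END {
        for (k in count)
          if (count[k] > 1) print count[k] "\t" k "\t" paths[k]
      }' |
    sort -k2,2 -k3,3n                           # by index, then numeric id
}
```

For example, find_dup_bucket_ids "$SPLUNK_DB" prints one line per clash: the count, the index, the ID, and the directories involved.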


vicbanta · 12-05-2016

There are a couple of escape characters missing in step 2 of the accepted answer.
This updated version ran without error for me:

find . -maxdepth 3 -mindepth 3 -type d | grep -P "db_\d{10}|hot" | sed 's/\(db_\)[0-9]*_[0-9]*_\([0-9]*$\)/\1\2/' | sed 's/\(hot\)_v1_\([0-9]*$\)/db_\2 \1/' | awk '{a[$1]++} END { for ( i in a ) print a[i], "\t", i}' | sort -rn | grep -P "^([2-9] |[1-9][0-9]+)"

wsnyder2 · 10-25-2013

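Here is an easy way to look for duplicates on Linux:
cd (the directory where all the indexes live)
ls -R | cut -d'_' -f4 | sort -n | uniq -c | awk '$1 > 1'
(The final awk keeps only IDs that occur more than once. Everything under that directory is counted together, so the same ID in two different indexes is reported as well.)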

sowings · 10-25-2013

Note that with a clustered index, you have to take a couple of other things into account. Each clustered indexer starts counting at 0 for new buckets, so you may see legitimate overlap. The bucket name also includes the originating server's GUID in the fifth position (cut -d'_' -f5).
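Following that note, a duplicate check for clustered buckets can key on the index, the ID (fourth field) and the GUID (fifth field) together, so that overlap between different origin indexers is ignored. A minimal sketch; find_dup_clustered_ids is a hypothetical name, and it assumes the five-field naming described above and an <index>/db or <index>/colddb layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report clustered bucket names that share index, id
# and origin GUID. Assumes <root>/<index>/<db|colddb>/<bucket> and names
# with five "_" fields: <prefix>_<newest>_<oldest>_<id>_<guid>.
# The same id with a different GUID is treated as normal overlap.
find_dup_clustered_ids() {
  local root="${1:-.}"
  find "$root" -mindepth 3 -maxdepth 3 -type d |
    awk -F/ '
      {
        n = split($NF, part, "_")
        if (n != 5) next                          # not a clustered bucket name
        key = $(NF-2) "\t" part[4] "\t" part[5]   # index, id, GUID
        count[key]++
      }
      END { for (k in count) if (count[k] > 1) print count[k] "\t" k }' |
    sort
}
```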

http://docs.splunk.com/Documentation/Splunk/6.0/Indexer/HowSplunkstoresindexes

yannK · 12-19-2012

Also, when I have a list of conflicts in splunkd.log on Linux, I use Splunk itself to generate the script that fixes them (by incrementing the bucket IDs and renaming the directories).

Example, adding 100 to the bucket ID:

index=_internal source=*splunkd.log*  "conflicts [" | rex "conflicts \[(?<path>[^,]+)," | rex field=path "(?<shortpath>.*)_\d+$" | rex field=path "_(?<id>\d+)$" | convert num(id) | eval id=id+100 | eval _raw="mv ".path." ".shortpath."_".id
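If you would rather stay in the shell, the same idea can be sketched with grep, sed and bash parameter expansion. Like the search, it takes the first path from each "conflicts [...]" list and adds an offset to its trailing ID; it only prints the mv commands so you can review them first. conflict_mv_cmds is a hypothetical name, and the pattern assumes the message format shown in the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read splunkd.log lines on stdin and print one mv
# command per distinct "conflicts [...]" message, renaming the first listed
# bucket to <same prefix>_<id + offset>. Nothing is renamed.
conflict_mv_cmds() {
  local offset="${1:-100}" path id
  grep -o 'conflicts \[[^],]*' |            # "conflicts [<first path>"
    sed 's/^conflicts \[//' |
    sort -u |                               # drop repeats of the same message
    while IFS= read -r path; do
      id="${path##*_}"                      # text after the last "_"
      case "$id" in ''|*[!0-9]*) continue ;; esac
      echo "mv $path ${path%_*}_$((id + offset))"
    done
}
```

For example, conflict_mv_cmds 100 < $SPLUNK_HOME/var/log/splunk/splunkd.log. Pick an offset larger than any ID already in the index so the new names cannot clash in turn.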

sdwilkerson · 11-21-2011

JBSplunk,

It looks like Masa contributed a great solution, but I wanted to share what I generally use as well, since the syntax is shorter (less to remember/paste).

I use Larry Wall's "rename" Perl script. (Remember, Larry Wall is the father of Perl.) This rename comes stock on Debian/Ubuntu, but it is NOT the same as the CentOS/RHEL rename. To use it on CentOS/RHEL, I download a fresh copy from the Internet.

Larry Wall's "rename" takes sed-style substitutions (s/old/new/), so the following will work:

cd $SPLUNK_DB/targetindex/db (or $SPLUNK_DB/targetindex/colddb), where targetindex is the name of the index you want to manipulate
START=XXX (where XXX is the bucket number to start counting from; the first bucket gets XXX+1)
Run the following: for i in `ls -rtd db_*`; do START=$(($START + 1)); rename -nv "s/\d+$/$START/" $i ;done
- The -n flag makes rename only print what it would do; remove it to actually rename the buckets.
- The ls with rtd args sorts in reverse order by time and lists the directories themselves rather than their contents, so the oldest bucket is listed first and the order stays roughly the way it would have been originally.
- The START counter lets you move through the buckets one at a time. This is important when you are doing both the hot/warm (db) and then the cold (colddb) directories, because the second pass needs to start at a specific number.
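For systems without the Perl rename, here is a sketch of the same oldest-first renumbering in plain bash. renumber_from and DRY_RUN are hypothetical names; run it inside one db or colddb directory. It skips a bucket whose new name already exists (mv would otherwise move one directory inside the other) and prints the last number used to stderr, so a second pass over colddb can be started from it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: renumber db_* buckets in the current directory,
# oldest first, giving them START+1, START+2, ...
# DRY_RUN=1 (the default) only prints the mv commands.
renumber_from() {
  local start="$1" f new
  for f in $(ls -rtd db_*); do
    start=$((start + 1))
    new="${f%_*}_$start"                    # swap the trailing id
    if [ -e "$new" ]; then                  # never move into an existing dir
      echo "skip $f: $new already exists" >&2
      continue
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "mv $f $new"
    else
      mv "$f" "$new"
    fi
  done
  echo "next START=$start" >&2              # value to pass to the next pass
}
```

Usage: cd $SPLUNK_DB/targetindex/db && renumber_from 1000, check the printed commands, then repeat with DRY_RUN=0.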

Sean

satyenshah · 09-03-2021

To manually merge buckets from multiple legacy indexers onto one new indexer, I used these commands, which work on RHEL 7/8:

1. On the legacy indexers, in indexes.conf, set maxVolumeDataSizeMB=400 for the warm volume to force all buckets to roll to cold.

2. On the legacy indexers, add this line to /etc/sudoers

johndoe ALL=NOPASSWD:/usr/bin/rsync

to enable passwordless sudo for rsync.

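3. On the new indexer, run this command to rsync the cold buckets from each legacy indexer into a subfolder on the new indexer:

sudo rsync --delete --compress-level=0 -aPe ssh --rsync-path="sudo rsync" johndoe@192.168.1.108:/splunk_cold/ /splunk_cold/idx8

In the above case, 192.168.1.108 is the address of legacy indexer #8. The trailing slash on the source copies the contents of /splunk_cold into idx8 (rather than creating idx8/splunk_cold), which is the layout step 4 expects.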

4. On the new indexer, run this command to renumber the buckets from each indexer:

for i in /splunk_cold/idx8/*/colddb ; do echo $i ; cd $i ; ls ; for f in ` ls -rtd db_* `; do jj=` echo $f | cut -d "_" -f 4 `; kk=$(($jj + 8000)) ; ff=` echo $f | sed -e "s/_$jj\$/_$kk/" ` ; mv $f $ff ; done ; done

In the above case, each bucket from indexer #8 with a 3-digit id=xxx is renamed to id=8xxx.
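The +8000 scheme only keeps indexers apart if every legacy ID is below 1000. A quick check is to print the highest bucket ID under each copied subfolder before renumbering. max_bucket_id is a hypothetical helper that assumes non-clustered names ending in the numeric ID.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the highest bucket id found under a directory.
# Assumes non-clustered names db_<newest>_<oldest>_<id>; names whose last
# "_" field is not numeric (e.g. clustered ones ending in a GUID) are ignored.
max_bucket_id() {
  find "${1:-.}" -type d -name 'db_*_*_*' |
    awk -F_ 'BEGIN { max = 0 }
             $NF ~ /^[0-9]+$/ && $NF + 0 > max { max = $NF + 0 }
             END { print max }'
}
```

For example, max_bucket_id /splunk_cold/idx8 should print 999 or less for the 8000 offset to keep indexer #8 within 8000 to 8999.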

5. On the new indexer, run this command to merge each subfolder into the parent folder:

cd /splunk_cold/idx8 ; find -type d -exec mkdir -vp "/splunk_cold"/{} \; -or -exec mv -nv {} "/splunk_cold"/{} \;
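Because mkdir -p succeeds on existing directories and mv -n silently skips files that already exist, a bucket directory present in both places would not be reported: same-named files stay behind while differently named ones are moved in. Here is a sketch of a pre-merge check that lists such bucket directories. merge_collisions is a hypothetical name, and the depth assumes the <index>/colddb/<bucket> layout used in step 4.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list bucket dirs under SRC that already exist at the
# same relative path under DST (e.g. SRC=/splunk_cold/idx8 DST=/splunk_cold).
merge_collisions() {
  local src="$1" dst="$2" rel
  (cd "$src" && find . -mindepth 3 -maxdepth 3 -type d -name 'db_*') |
    while IFS= read -r rel; do
      [ -e "$dst/$rel" ] && echo "${rel#./}"   # path relative to SRC
    done
  return 0
}
```

An empty result for merge_collisions /splunk_cold/idx8 /splunk_cold means the merge will not touch any existing bucket.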

sdwilkerson · 11-21-2011

Masa,
Good catch. Thanks.

Masa · 11-21-2011

Sean:

Thanks for sharing the alternative way. Just like Perl, there's more than one way to do it.

ls -rtd db_*

This has to be run in a specific directory such as defaultdb/db or defaultdb/colddb, and in that case it's a lot easier. The one-liner, by contrast, goes through all the index database directories at once.


jbsplunk · 11-21-2011

Thanks, this is exactly what I needed!
