Getting Data In

How do I index the part of a log file that was not indexed after a disk failure?

envato_dennis
New Member

We had a disk failure on our indexer. During this time, Splunk thought it was still indexing data. We had to stop Splunk, remount the disk, and start it again. For the period that the disk (containing one of our indexes) was offline, we now have a gap where we have no events.

The logs are still available on the application servers and they run universal forwarders.

I want to re-index just the missing 3-hour time period. If I push the whole log via oneshot (containing events before and after the disk outage), I will get duplicate events, just as I would if I deleted the _fishbucket on the forwarders. This is production data.

What are my options in this instance?

Thanks


aljohnson_splun
Splunk Employee

Something more selective than deleting the entire _fishbucket is the btprobe command:

splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file <source> --reset

You can read more about btprobe here.
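For a single forwarder, a rough sketch of the reset might look like the following (the monitored path is a placeholder, and it's generally safest to stop splunkd before touching the fishbucket). Keep in mind that resetting the entry causes the whole file to be re-read, so you'd still get duplicates for events outside the gap unless you trim the file first:

# stop the forwarder before modifying the fishbucket
$SPLUNK_HOME/bin/splunk stop

# reset the checkpoint for one monitored file so it is re-read from the start
$SPLUNK_HOME/bin/splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file /var/log/myapp/app.log --reset

$SPLUNK_HOME/bin/splunk start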

Please see @YannK 's answer here as well.

How many files are involved in that 3-hour window? Are they all within a single file? Hypothetically, you could parse out just the portion you want to re-index and re-index only that section. Slightly less than desirable, I'm sure 😛
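As a minimal sketch, assuming each event starts with a lexically sortable timestamp like YYYY-MM-DD HH:MM:SS (the paths, times, index, sourcetype, and host below are placeholders):

# extract only the 3-hour gap into a temporary file
awk '$1" "$2 >= "2016-05-10 02:00:00" && $1" "$2 < "2016-05-10 05:00:00"' /var/log/myapp/app.log > /tmp/app_gap.log

# oneshot just that extract so nothing outside the gap is duplicated
$SPLUNK_HOME/bin/splunk add oneshot /tmp/app_gap.log -index main -sourcetype myapp -host appserver01

Because only the extracted window is indexed, the events before and after the outage stay untouched.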


envato_dennis
New Member

Thanks for the reply.

Yes, the problem is that every host has at least 16 log files that need to be re-indexed, and we have around 30-40 hosts that we're really interested in.

I will investigate btprobe and report back.
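If btprobe ends up re-reading the whole files (and so re-creating duplicates), we may fall back to looping the extract-and-oneshot approach over each log on a host, roughly like this (the glob, times, and index are placeholders):

for f in /var/log/myapp/*.log; do
    out=/tmp/gap_$(basename "$f")
    # keep only the 3-hour gap from each file
    awk '$1" "$2 >= "2016-05-10 02:00:00" && $1" "$2 < "2016-05-10 05:00:00"' "$f" > "$out"
    # re-index just the extracted window, preserving the host field
    $SPLUNK_HOME/bin/splunk add oneshot "$out" -index main -host "$(hostname)"
done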
