How to explore the fishbucket to analyze file indexing

chris
Motivator

Hi

We recently had a problem with one type of our indexed log files suddenly being recognized as binary.

This is the message we saw in splunkd.log: WARN FileClassifierManager - Invalid file: /xy/mylog.log, reason: binary.

We don't know why this happened. It is an XML log and the format may have changed, but this type of file did get indexed before.

So we changed props.conf on the indexer and added the following parameter for our sources: NO_BINARY_CHECK = true. We already had this parameter previously: CHARSET = ISO-8859-2.

We only saw very few events and a lot of warning messages: WARN UTF8Processor - Using charset UTF-8 for events from 'source::xxx|host::xxx|remoteport::33270', as the monitor is believed over the raw text which may be ISO-8859-2

That made us think that our CHARSET setting (and therefore NO_BINARY_CHECK as well) was not taking effect. After reading this article in the wiki: http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F we moved the props.conf settings to our light forwarders.
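
For readers following along, the forwarder-side configuration described above might look roughly like the sketch below. The two settings are the ones named in the post; the [source::...] stanza pattern is an assumption taken from the path in the warning message and would need to match the files actually being monitored.

```ini
# props.conf on the light forwarder (sketch; the stanza pattern is assumed)
[source::/xy/mylog.log]
# Skip the binary-file check so the file is not rejected as binary
NO_BINARY_CHECK = true
# Character set of the raw log file
CHARSET = ISO-8859-2
```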

The ISO-8859-2 warnings disappeared and one log file of that type got indexed, but we have several such files. Some were missing.

I ended up deleting the fishbucket index on the light forwarder and all files are indexed properly now.

So I'm guessing that something in the fishbucket prevented those files from being indexed. After reading this (old) blog post from Andrea Longo: http://blogs.splunk.com/2008/08/14/what-is-this-fishbucket-thing/ I was hoping that I could search the _fishbucket index (on the light forwarder) and remove entries for the files that are not being indexed if I have a similar case in the future.

My first question is: is this a workable approach, or have I misunderstood the problem? Is there a better way to resolve such issues?

My second question is: the fishbucket index exists on all our instances, but it is empty (viewing indexes from the Splunk Manager). How do I enable it on the indexer, and is it possible to enable it and make it searchable on a SplunkLightForwarder somehow?

Thank you for helping me.

Edit: Enabling the following debug settings in $SPLUNK_HOME/etc/log.cfg helps show whether Splunk detects new data in a file: category.FileInputTracker=DEBUG, category.selectProcessor=DEBUG, category.TailingProcessor=DEBUG. This is documented in: http://www.splunk.com/wiki/Community:Troubleshooting_Monitor_Inputs
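
As a file fragment, the debug settings above might be laid out as in the sketch below. The three category lines come from the post. Placing them under a [splunkd] stanza is an assumption about the layout of log.cfg, and changes to this file typically take effect only after splunkd is restarted.

```ini
# $SPLUNK_HOME/etc/log.cfg (sketch; placement under [splunkd] is assumed)
[splunkd]
category.FileInputTracker=DEBUG
category.selectProcessor=DEBUG
category.TailingProcessor=DEBUG
```

Setting the categories back to their previous level afterwards keeps splunkd.log from growing quickly.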

1 Solution

gkanapathy
Splunk Employee

The big problem with the fishbucket stuff that Andrea wrote about is that it does not apply in 4.x and up. It's accurate if you have a 3.x forwarder, but 4.x no longer stores the data in a Splunk index (it wasn't a good idea in the first place, though it was convenient for some purposes), but rather in the splunk_private_db inside the fishbucket index location. You can kind of examine the data using the $SPLUNK_HOME/bin/btprobe tool, but it's not that helpful, in particular because we are now only storing the hash and position, and not recording any of the other information that used to be in the fishbucket index.

I think there might be some plans to add back some tools and info to restore some of this functionality, but you might want to file enhancement requests (ERs) for it.

gkanapathy
Splunk Employee

All you need to do is raise the maxKBps setting in limits.conf on the forwarder to increase that limit.
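
A minimal sketch of that change, assuming the setting lives in the [thruput] stanza of limits.conf on the forwarder. The value 512 is only an illustration and should be sized to the actual log volume.

```ini
# limits.conf on the forwarder (sketch; 512 is an illustrative value)
[thruput]
# Maximum forwarding rate in kilobytes per second; 0 removes the limit
maxKBps = 512
```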

chris
Motivator

Thank you for the quick reply. The main problem we had was that our forwarder was limited to sending at 256 KBps and the server sometimes needs slightly more than that, so at times we saw no new events from some logs for almost 30 minutes. It took us a while, but things look better now.

Lowell
Super Champion

That's a great question Chris. I'm looking forward to a great answer too; I'd like to understand the fishbucket index better as well.
