Getting Data In

How to reindex data from a forwarder

mataharry
Communicator

I have a Storm project and I want to clean it out and reindex only the last few days, plus some specific files.
For now, Splunk Universal Forwarders are monitoring my files.

I suppose the procedure is similar for Splunk Enterprise, when we clean an index, and for Storm, when we manually empty a project.

1 Solution

yannK
Splunk Employee

First of all, even before reindexing, you can configure Splunk to index only recent data using these two techniques:

  • for file monitoring, add the parameter ignoreOlderThan in inputs.conf
    It looks at the modtime of the files. Example: ignoreOlderThan=7d will index only files modified during the last 7 days. On Linux you can couple this with the touch command to change the modtime of a file and trigger the indexing.
    see http://docs.splunk.com/Documentation/Splunk/latest/admin/Inputsconf

  • for WinEventLogs, you can set the parameter current_only=1 in inputs.conf to exclude the historical logs and start collecting only from now on.
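
As a rough sketch of the two settings above, the script below writes a sample inputs.conf into a scratch directory and shows how touch moves an old file back inside a 7-day window. The monitored path /var/log/myapp, the sourcetype mysourcetype and the Security event log channel are placeholders, not values from the question, and the find call only imitates the modtime comparison; it is not what Splunk runs internally.

```shell
#!/bin/sh
# Scratch directory so the sketch touches nothing real.
workdir=$(mktemp -d)
conf="$workdir/inputs.conf"

# Hypothetical stanzas: the path, sourcetype and channel are placeholders.
cat > "$conf" <<'EOF'
[monitor:///var/log/myapp]
sourcetype = mysourcetype
ignoreOlderThan = 7d

[WinEventLog://Security]
current_only = 1
EOF

# Simulate an old file, then refresh its modtime with touch.
old_file="$workdir/old.log"
echo "sample event" > "$old_file"
touch -d '30 days ago' "$old_file"   # GNU touch syntax (Linux)
before=$(find "$workdir" -name old.log -mtime -7 | wc -l)
touch "$old_file"
after=$(find "$workdir" -name old.log -mtime -7 | wc -l)

echo "files inside the 7-day window before touch: $before, after touch: $after"
```

Depending on the Splunk version, a file that was already skipped because of ignoreOlderThan may only be checked again after the forwarder restarts, so a restart after the touch is a reasonable precaution.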


Now that you have set up your inputs to avoid flooding your instance, you can focus on how to force a Splunk instance to reindex a file that has already been indexed.

  • the radical method is to clean the fishbucket index. That removes the record of every file already read, so everything will be reindexed. Stop Splunk first.
      • on an indexer: splunk clean eventdata -index _fishbucket
      • on a forwarder: remove the folder $SPLUNK_HOME/var/lib/splunk/fishbucket

  • or selectively make the fishbucket forget a single file

    splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file $FILE --reset

  • manually reindex each file with the oneshot option

    ./splunk add oneshot "/path/to/my/file.log" -sourcetype mysourcetype

  • modify the first line of the files to reindex. By default Splunk checks the first 256 bytes of a file to tell files apart, so if you add a simple comment on the first line, the file will be detected as new and reindexed.

  • change the crcSalt: create a new input for a new folder, add all the correct sourcetypes, etc.,
    and set crcSalt to a static string, which will force a one-time reindexing.

    crcSalt= REINDEXMEPLEASE

    or add the option

    crcSalt = <SOURCE>

    then move or copy the files to be reindexed to the new folder. They will be detected as new, because the path is included in the CRC calculation. (PS: the source field will be different, of course.)
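
To see why both the first-line edit and the crcSalt option make a file look new, here is a small illustration. It uses cksum over the first 256 bytes as a stand-in for Splunk's internal CRC, which is an assumption made for the demo; the real algorithm and the exact way the salt is mixed in are not reproduced here. The helper head_crc and the file names are made up for the example.

```shell
#!/bin/sh
# Illustration only: cksum stands in for Splunk's internal CRC.
workdir=$(mktemp -d)
mkdir "$workdir/a" "$workdir/b"

# head_crc FILE [SALT]: checksum of an optional salt plus the first 256 bytes of FILE.
head_crc() {
    { printf '%s' "${2:-}"; head -c 256 "$1"; } | cksum | cut -d ' ' -f 1
}

printf 'line one\nline two\n' > "$workdir/a/app.log"
cp "$workdir/a/app.log" "$workdir/b/app.log"

# Same content, no salt: identical checksums, so the copy looks already indexed.
crc_a=$(head_crc "$workdir/a/app.log")
crc_b=$(head_crc "$workdir/b/app.log")

# Salting with the full path (the idea behind crcSalt = <SOURCE>) separates them.
salted_a=$(head_crc "$workdir/a/app.log" "$workdir/a/app.log")
salted_b=$(head_crc "$workdir/b/app.log" "$workdir/b/app.log")

# Prepending a comment line changes the first 256 bytes, hence the checksum.
{ echo '# reindex'; cat "$workdir/a/app.log"; } > "$workdir/a/app_edited.log"
crc_edited=$(head_crc "$workdir/a/app_edited.log")

echo "unsalted: $crc_a $crc_b"
echo "salted:   $salted_a $salted_b"
echo "edited:   $crc_edited"
```

The unsalted pair matches, while the salted pair and the edited file each produce a different value, which is the behaviour the two techniques rely on.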

see http://docs.splunk.com/Documentation/Splunk/latest/admin/Inputsconf
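
When only a handful of files need to be reindexed, the btprobe reset shown earlier can be wrapped in a loop. The sketch below is a dry run by default: it prints the commands and executes nothing unless RUN=1 is set. The install path /opt/splunkforwarder, the run and reset_file helpers, and the two log paths are assumptions for the example.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed. Set RUN=1 to execute them.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunkforwarder}"   # assumed install path
FISHBUCKET_DB="$SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db"

# run CMD...: execute the command when RUN=1, otherwise just print it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# reset_file FILE: make the forwarder forget FILE so the monitor input reads it again.
reset_file() {
    run "$SPLUNK_HOME/bin/splunk" cmd btprobe -d "$FISHBUCKET_DB" --file "$1" --reset
}

# Hypothetical list of files to reindex; replace with your own paths.
for f in /var/log/myapp/app.log /var/log/myapp/error.log; do
    reset_file "$f"
done
```

It is typically advised to stop the forwarder before resetting fishbucket entries and to start it again afterwards, so that the monitor input rereads the files.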


Remark: before reindexing, you may want to remove the existing data in Splunk to avoid duplicates.

Remark: if you are monitoring Windows logs (WinEventLog) or using modular inputs, the checkpoints are not stored in the fishbucket.
You need to clear the checkpoint files in $SPLUNK_HOME/var/lib/splunk/modinputs/
