I have a Storm project and I want to clean everything and reindex only the last few days, plus some specific files.
I have Splunk Universal forwarders monitoring my files for now.
I suppose the procedure is similar for Splunk Enterprise, when we clear an index, and for Storm, when we manually empty a project.
First of all, even before reindexing, you can configure Splunk to index only recent data using these two techniques:
For file monitoring, add the parameter ignoreOlderThan in inputs.conf. It looks at the modtime of the files. Example: ignoreOlderThan=7d will index only files modified during the last 7 days. On Linux you can couple this with the touch command to change the modtime of a file and trigger the indexing.
For WinEventLogs, you can set the parameter current_only=1 in inputs.conf to exclude the historical logs and start collecting only from now.
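To make the modtime idea concrete, here is a minimal shell sketch. It only approximates the 7-day window with find and is not what Splunk runs internally. The helper name recent_files and the demo file names are hypothetical, and touch -d with a relative date assumes GNU touch.

```shell
#!/bin/sh
# List files under a directory modified within the last N days.
# This approximates which files ignoreOlderThan=<N>d would still consider.
recent_files() {
    dir=$1
    days=$2
    find "$dir" -type f -mtime "-$days" | sort
}

# Demo in a throwaway directory (hypothetical file names).
demo_dir=$(mktemp -d)
touch "$demo_dir/new.log"
touch -d '30 days ago' "$demo_dir/old.log"   # GNU touch syntax (assumption)

recent_files "$demo_dir" 7    # lists only new.log

touch "$demo_dir/old.log"     # refresh the modtime to now
recent_files "$demo_dir" 7    # now lists both files
```

Running a check like this before enabling the input gives a rough idea of which files fall inside the window and which ones would need a touch.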
Now that you have set up your inputs to avoid flooding your instance, you can focus on how to force a Splunk instance to reindex a file that has already been indexed.
clean the whole fishbucket with the command
splunk clean eventdata -index _fishbucket
or, on a forwarder, by removing the folder $SPLUNK_HOME/var/lib/splunk/fishbucket
or selectively forget a single file from the fishbucket:
splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file $FILE --reset
manually reindex each file with the oneshot option:
./splunk add oneshot "/path/to/my/file.log" -sourcetype mysourcetype
modify the first line of the files to reindex. By default Splunk checks the first 256 characters of a file to tell files apart, so if you add a simple comment on the first line, the file will be detected as a new file and reindexed.
change the crcSalt, using a static string that will force a one-time reindexing.
or create a new input for a new folder, add all the correct sourcetypes, etc.,
or add the option
then move or copy the files to be reindexed to that folder. They will be detected as new, because the path will be considered in the CRC calculation. (PS: the source field will be different, of course.)
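The first-line trick mentioned above can be illustrated with a small shell sketch. It uses cksum over the first 256 bytes purely as a stand-in: this is an assumption for illustration, not the checksum Splunk computes internally. The helper names head_sum and prepend_comment and the marker text are hypothetical.

```shell
#!/bin/sh
# Checksum of the first 256 bytes of a file.
# Illustration only: cksum is not the algorithm Splunk uses internally.
head_sum() {
    head -c 256 "$1" | cksum | cut -d ' ' -f 1
}

# Prepend a comment line so the beginning of the file changes.
prepend_comment() {
    file=$1
    tmp=$(mktemp)
    { printf '# reindex marker\n'; cat "$file"; } > "$tmp"
    cat "$tmp" > "$file"    # rewrite in place to keep the original permissions
    rm -f "$tmp"
}

# Demo on a throwaway file.
demo=$(mktemp)
printf 'line one\nline two\n' > "$demo"
before=$(head_sum "$demo")
prepend_comment "$demo"
after=$(head_sum "$demo")
[ "$before" != "$after" ] && echo "head checksum changed"
```

Because the comparison only covers the beginning of the file, a one-line change at the top is enough to make the file look different, while appended lines at the end would not change this value once the file is longer than 256 bytes.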
Remark: before reindexing you may want to remove the existing data in Splunk to avoid duplicates.
You can use the | delete command to selectively hide some events. See http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Delete
Remark: if you are monitoring Windows logs (WinEventLog) or are using modular inputs, the counters are not in the fishbucket.
You need to clear the checkpoint files in
This method doesn't work with the Splunk 6 Forwarder, but I found that if you remove all directories in C:\Program Files\SplunkUniversalForwarder\var\lib\splunk, this will force Splunk to reindex all the Windows logs. You have to remove all of them.
Will any of these methods work for re-indexing the data from an API? Many of the resources I've found only mention log files when speaking of re-indexing. My data input is an API. I am able to clean the index for this API, but want to ensure I can re-index all the data.
jgreen12, please open a new question, as this question is answered (and very old).
It may or may not relate to modular inputs and there may be a checkpoint file keeping track of the data it has obtained but a new question would make more sense here rather than guessing...
Agreed. Also, you may need to check with the creator of that particular add-on. Once you create the new question thread, link to it here and we can jump over there.