I'm having problems getting Splunk to re-index data. Here are the steps I've taken:
1. Created a file data input monitoring a shared folder on another computer.
2. Indexed the data into the test index.
3. Checked the data and made sure everything was correct.
4. Disabled the data input.
5. Deleted the data in the test index using | delete.
6. In the CLI, stopped Splunk.
7. Ran:
    splunk clean eventdata -index test
    splunk start
8. Changed the data input to send to the main index.
9. Enabled the data input.
I was expecting the data to be re-indexed, but this hasn't happened.
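For reference, the CLI part of those steps can be written as a small shell function. This is a minimal sketch under a few assumptions: SPLUNK_HOME points at the Splunk install (the /opt/splunk default is an assumption), SPLUNK_BIN can be overridden for a dry run, and the function name clean_index is hypothetical. The -f flag skips the confirmation prompt of splunk clean. Note that this only empties the index; the fishbucket still records the monitored files as already read, which would explain why re-enabling the input does not bring them back.

```shell
#!/bin/sh
# Assumption: SPLUNK_HOME points at the Splunk install; SPLUNK_BIN may be
# overridden (e.g. SPLUNK_BIN=echo for a dry run).
SPLUNK_BIN="${SPLUNK_BIN:-${SPLUNK_HOME:-/opt/splunk}/bin/splunk}"

# Hypothetical helper: remove all events from one index.
# Splunk must be stopped before "clean eventdata" will run;
# -f skips the interactive confirmation prompt.
clean_index() {
    index_name="$1"
    "$SPLUNK_BIN" stop || return 1
    "$SPLUNK_BIN" clean eventdata -index "$index_name" -f || return 1
    "$SPLUNK_BIN" start
}

# This empties the index only. The fishbucket still remembers which
# monitored files were read, so they are not sent again.
```

Usage: clean_index test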
Clean the fishbucket.
Yes, but cleaning the fishbucket resets the status of all inputs, meaning that Splunk will re-index everything again, not just the one file or directory.
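A sketch of cleaning the whole fishbucket, with exactly that caveat: every monitored file gets read again. It assumes the fishbucket is stored in the _thefishbucket index (the name used in Splunk's default indexes.conf) and that Splunk has to be stopped while it is cleaned. The function name reset_fishbucket and the SPLUNK_BIN override are hypothetical.

```shell
#!/bin/sh
# Assumption: SPLUNK_HOME points at the Splunk install; SPLUNK_BIN may be
# overridden for a dry run.
SPLUNK_BIN="${SPLUNK_BIN:-${SPLUNK_HOME:-/opt/splunk}/bin/splunk}"

# Hypothetical helper. WARNING: this resets the read position of EVERY
# monitored file, so Splunk re-indexes all monitored data, not just one
# file or directory.
reset_fishbucket() {
    "$SPLUNK_BIN" stop || return 1
    # Assumption: the fishbucket index is named _thefishbucket.
    "$SPLUNK_BIN" clean eventdata -index _thefishbucket -f || return 1
    "$SPLUNK_BIN" start
}
```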
You might find some help in this answer (even though you wouldn't guess from the name). It shows how to eliminate a single file entry from the fishbucket. Since the fishbucket is where data files are "remembered," this should cause Splunk to forget that it once indexed this file.
I updated that post - because I was wrong! Gack!! So you might want to look again...
According to that post it is no longer possible to delete single files from the fishbucket.
"Splunk no longer lets you look at the fishbucket index. You cannot manage the specific records. The format is not published and the files are kept in binary. Sorry"
I'd love to just delete the entry from the fishbucket. However, how do I find out the file name to delete? There is other valid data in the fishbucket that I don't want to get rid of. Also, this data source is a directory with 1 file per entry. I want to re-index the directory.
The input you created remembers data files that have already been indexed, regardless of whether the index still exists or still contains the data. You need to create a new input exactly like the one you have, but with a slightly different name and pointing to the right index; then, poof, your data will be re-indexed.
Specify a new sourcetype name and give it a try.
Now that you mention it, yes. For a directory or file monitor the inputName is the path.
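Because the inputName of a monitor input is its path, the input corresponds to a [monitor://...] stanza in inputs.conf. The helper below is a hypothetical sketch that only prints such a stanza; the path, index, and sourcetype values are placeholders, and whether a new stanza alone triggers re-indexing is exactly what this thread is working out.

```shell
#!/bin/sh
# Hypothetical helper: print a monitor stanza for inputs.conf.
# For a file or directory monitor, the stanza name (the inputName)
# is the monitored path itself.
monitor_stanza() {
    path="$1"          # monitored file or directory (placeholder)
    index_name="$2"    # destination index, e.g. main
    sourcetype="$3"    # sourcetype to assign (placeholder)
    printf '[monitor://%s]\n' "$path"
    printf 'index = %s\n' "$index_name"
    printf 'sourcetype = %s\n' "$sourcetype"
    printf 'disabled = false\n'
}
```

Usage, assuming you manage configuration in system/local (placeholder values): monitor_stanza '/mnt/share/logs' main my_logs >> "$SPLUNK_HOME/etc/system/local/inputs.conf", then run "$SPLUNK_HOME/bin/splunk" restart.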
I was thinking last night that you should use crcSalt to re-index what is there, and then remove crcSalt (because it can cause trouble down the road).
To use crcSalt, you need to add this line to your input stanza:
crcSalt=
Here is what the documents say about using crcSalt=
Don't forget to delete the line and the original files after the original files are reindexed.
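The crcSalt value in the post above did not survive. A commonly used value is the literal string <SOURCE>, which tells Splunk to add the full source path to the CRC it computes for each file, so previously seen files look new. Assuming that value, here is a sketch that prints a monitor stanza with the salt added; the helper name and argument values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a monitor stanza with crcSalt added.
# ASSUMPTION: the salt is the literal string <SOURCE>, which Splunk
# replaces with the full source path when computing each file's CRC.
salted_monitor_stanza() {
    path="$1"          # monitored file or directory (placeholder)
    index_name="$2"    # destination index, e.g. main
    printf '[monitor://%s]\n' "$path"
    printf 'index = %s\n' "$index_name"
    printf 'crcSalt = <SOURCE>\n'
}
# As recommended above, remove the crcSalt line again once the files
# have been re-indexed.
```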
I'm not sure where to change the inputName, then. I'm using the Data Input - Files & Directories method to pull data off a network shared folder. Is the input name the path to the data? If so, would I need to change the log location?
Just to be clear, it is the inputName that is important. You need to give it a new inputName. I don't think you need to change anything else.
I can't help with the fishbucket part because I haven't done that yet, and you are correct: there is other information in there that you don't want to delete.
I had tried earlier to delete the input and then re-create it; however, I used the same host name. After reading your post, I deleted the input and re-created it with a different host name. The data still isn't being re-indexed. Do I need to create a new sourcetype?
Good to know and makes sense. I'll keep that in mind.
I am not sure what happened here, but I do know one thing. If you do -
splunk clean eventdata -index test
splunk start
| delete
first