I have a bit of an issue: I typo'd a path change this morning and ended up with about 8-10 hours of data indexed with the wrong hostname. I have the config fixed now, but I would like to go back and reindex the affected data.
My log entries look like this (from various sources):
Aug 31 10:12:04 host1.mydomain.com facilityname: actual log data
Aug 31 10:12:04 host2.mydomain.com facilityname: actual log data
Aug 31 10:12:04 host13.mydomain.com facilityname: actual log data
Aug 31 10:12:04 host10.mydomain.com facilityname: actual log data
Aug 31 10:12:04 host26.mydomain.com facilityname: actual log data
Aug 31 10:12:04 host32.mydomain.com facilityname: actual log data
The problem is that I had them indexed with the hostname taken from a segment of the file path, which caused all of the data to end up with host "2010".
I can find all of the affected data with this search:
host="2010" | rex "(?i)^(?:[^ ]* ){3}(?P<HOSTNAMEFROMLOG>[^ ]+)"
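To see what the rex above actually captures, here is a small Python sketch that applies the same pattern to a couple of the sample lines. Python's re module accepts the same (?i) flag and (?P<name>...) named-group syntax, so the pattern is copied unchanged; the function name host_from_log is just for illustration.

```python
import re

# Same pattern as the rex in the search above: skip three
# space-separated fields (month, day, time) and capture the 4th.
HOST_RE = re.compile(r"(?i)^(?:[^ ]* ){3}(?P<HOSTNAMEFROMLOG>[^ ]+)")


def host_from_log(line):
    """Return the hostname field of a syslog-style line, or None if it doesn't match."""
    m = HOST_RE.match(line)
    return m.group("HOSTNAMEFROMLOG") if m else None


samples = [
    "Aug 31 10:12:04 host1.mydomain.com facilityname: actual log data",
    "Aug 31 10:12:04 host13.mydomain.com facilityname: actual log data",
]
for line in samples:
    print(host_from_log(line))
```

One caveat: syslog pads single-digit days with an extra space ("Aug  1 10:12:04 ..."), and because [^ ]* can match an empty field, this pattern would then capture the time instead of the hostname. That won't bite for Aug 31, but it is worth keeping in mind if you reuse the regex.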
Is there a way to pipe this into some command that will reindex it with the correct hostname?
As gkanapathy said, you can't modify data that has already been indexed.
The easiest way is probably to first export this data with the export command (remember to stop Splunk first):
$SPLUNK_HOME/bin/splunk export eventdata main -dir /tmp/events -host 2010
This will create text files in the /tmp/events directory containing the original logs. Then you will have to delete the data and reindex it. Keep in mind that the delete command only marks the data so you don't see it in searches; it still stays in the index. If this data is in a separate index, or you don't have much data yet, it might be better to export everything, delete the index completely, and then reimport the data.
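Since delete can't be undone from search, it may be worth sanity-checking the export before deleting anything. Below is a minimal Python sketch that walks an export directory and counts events per hostname found in the log line itself. It assumes (not confirmed here) that the exported files hold one raw event per line in the same format as the samples in the question; count_hosts is a hypothetical helper name.

```python
import os
import re
from collections import Counter

# Same hostname pattern as the rex in the question (4th space-separated field).
HOST_RE = re.compile(r"(?i)^(?:[^ ]* ){3}(?P<HOSTNAMEFROMLOG>[^ ]+)")


def count_hosts(export_dir):
    """Count exported events per hostname taken from the raw log line.

    Assumption: each exported file contains one raw event per line.
    Lines that don't match the pattern are counted under None.
    """
    counts = Counter()
    for root, _dirs, files in os.walk(export_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # skip blank lines
                    m = HOST_RE.match(line)
                    counts[m.group("HOSTNAMEFROMLOG") if m else None] += 1
    return counts


if __name__ == "__main__":
    # /tmp/events is the export directory used in the command above.
    for host, n in count_hosts("/tmp/events").most_common():
        print(host, n)
```

If the totals roughly match what host="2010" returns in search, the export is probably complete; a large None count would suggest the exported format differs from the assumption above.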
There isn't a really easy way to "fix" data after it has been indexed. You either have to (1) fix the config, delete the incorrect data, and reload the existing data, or (2) do some kind of behind-the-scenes dump/fix/import processing using exporttool and importtool.
You may find this wiki entry helpful:
You cannot modify data that has already been indexed. If you still have the original data and can readily identify it, the easiest way to deal with this is to delete it using the delete search command and then reindex the files using splunk add oneshot. If the files aren't available, you will have to export the data from the existing index using $SPLUNK_HOME/bin/splunk export, $SPLUNK_HOME/bin/splunk search, or $SPLUNK_HOME/bin/exporttool, then delete and reimport the data.
If you set the sourcetype of this data to "syslog", we will automatically extract the hostname from the log lines into the host field. If you don't want to change your sourcetype, you can configure the same extraction for your own sourcetype in props.conf with:
[<sourcetype>]
TRANSFORMS-host = syslog-host
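Roughly speaking, an index-time host transform like this matches a regex against the raw event and, on a match, overwrites the event's host metadata with the captured name; on no match the host is left alone. The Python sketch below only illustrates that behaviour. Its regex is an assumption modelled on the classic "Mon DD HH:MM:SS hostname" syslog header and is not the regex Splunk ships for syslog-host; the event dict and apply_host_transform are hypothetical.

```python
import re

# Hypothetical approximation of a syslog header: "Mon DD HH:MM:SS hostname ".
# " +" allows the extra space syslog uses for single-digit days.
# NOT the actual regex behind Splunk's syslog-host transform.
SYSLOG_HOST_RE = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} (\S+) ")


def apply_host_transform(event):
    """Return a copy of event with host taken from the raw syslog header.

    Events whose raw text doesn't match keep their original host.
    """
    m = SYSLOG_HOST_RE.match(event["_raw"])
    if m is None:
        return dict(event)
    return {**event, "host": m.group(1)}


event = {
    "_raw": "Aug 31 10:12:04 host26.mydomain.com facilityname: actual log data",
    "host": "2010",
}
print(apply_host_transform(event)["host"])  # host26.mydomain.com
```

Because the host comes from each event's own header rather than from the file path, reindexing with a transform like this in place would assign the right host regardless of where the files live.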
This isn't an option for me, as I need to keep the sourcetype of the entries that are there.