At the moment, due to various sources/sourcetypes as well as historical hostname changes, we have a lot of "duplicate" hostnames listed under "hosts" in the Summary - Search view. For example, one host appears as linux33.ext in /var/log/cache.log and as linux33.local in /var/log/messages, but both names refer to the same machine.
Is there a way to have the Splunk indexer read a file in a format similar to this:
...and have Splunk show/record only the "actual_hostname" value for every time the indexer encounters one of the aliases?
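To make the idea concrete, here is a minimal Python sketch of the alias-to-hostname mapping described above. The file layout is an assumption made only for illustration (one tab-separated line per host: the actual hostname first, then its aliases), and `load_alias_map` is a hypothetical helper, not anything Splunk provides.

```python
import csv
import io


def load_alias_map(tsv_text):
    """Build {alias: actual_hostname} from TSV text.

    Assumed layout (hypothetical): actual_hostname<TAB>alias1<TAB>alias2...
    """
    alias_to_actual = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        actual = row[0].strip()
        # The actual hostname maps to itself so lookups are uniform.
        alias_to_actual[actual] = actual
        for alias in row[1:]:
            alias = alias.strip()
            if alias:
                alias_to_actual[alias] = actual
    return alias_to_actual


# Sample data using the hostnames mentioned in this thread.
SAMPLE = "linux33\tlinux33.ext\tlinux33.local\nlinux16\tlinux16.local\n"
alias_map = load_alias_map(SAMPLE)

# Unknown names fall through unchanged.
print(alias_map.get("linux33.ext", "linux33.ext"))
```

With this mapping, both linux33.ext and linux33.local resolve to linux33, which is the behavior I am hoping the indexer can apply.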
I have a combination of forwarder inputs and syslog inputs on the indexer, so I would like this processing to be done on the indexer itself.
Unless I misunderstand your question, I believe you can achieve this via lookups and then modifying the searches in your views, and/or fields you reference in your searches.
I created and loaded a lookup table that looks like this:
Then I ran a search that uses both fields, host and actualhost:
password fail* | table user, host, actualhost
You can see the results below. So you could modify views that use the host field and change them to use actualhost to get the values you are looking for. For example, I would modify my search above to use only actualhost:
password fail* | table actualhost
The summary view has a saved search running behind the scenes. You can identify and modify that saved search to show your actual hosts on the summary page too.
I would like to do this before indexing or at index time rather than at search time. For example:
1. A splunk indexer receives a log file entry "20101106 linux16.local rd server restarted!"
2. The indexer "rewrites" linux16.local to linux16 (please refer to the tsv file in my question)
3. The indexer saves the entry to an index file with host field = "linux16".
The same should apply to ALL other log types that the indexer can parse (syslog, Apache logs, etc.).
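The three steps above can be sketched in Python as follows. This only illustrates the behavior I am after and is not how Splunk works internally. `index_event` and the `ALIASES` dictionary are hypothetical, and the sketch assumes the hostname is the second whitespace-separated token, as in the sample log line.

```python
def index_event(raw_event, alias_map):
    """Return a record whose host field holds the canonical hostname.

    Assumption: the hostname is the second whitespace-separated token.
    """
    tokens = raw_event.split()
    host = tokens[1] if len(tokens) > 1 else None
    # Step 2: rewrite a known alias; unknown names pass through unchanged.
    canonical = alias_map.get(host, host)
    # Step 3: store the event with the canonical host field.
    return {"host": canonical, "raw": raw_event}


# Hypothetical alias table: alias -> actual hostname.
ALIASES = {"linux16.local": "linux16"}

# Step 1: the indexer receives a log entry.
event = index_event("20101106 linux16.local rd server restarted!", ALIASES)
print(event["host"])  # linux16
```

The original log line is kept as-is in this sketch; only the host field changes.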