I have been tasked with deploying Splunk for an organization that has an extensive syslog environment (multiple rsyslog and syslog-ng servers). The problem is their naming convention. Of the hundreds of syslog sources, only 60% follow one. The remainder may be (random) IPs, or a hostname that does not match its events. The logs are not stored in logical directories, and they are unwilling to make the changes necessary to "clean it up".
The regexes in inputs.conf are rapidly becoming ugly, and every change requires complete re-validation. What do others do in this situation to manage poor syslog naming conventions and still get the events into the proper indexes, without extensive use of regex in inputs.conf and without touching the syslog configuration?
This might not be exactly what you are looking for, but syslog-ng can manage lists that you can use in filters to classify your log messages (for example, to add specific message fields if the host/IP appears in a specific list) using the in-list() filter, or add metadata from files. Recent versions of syslog-ng Premium Edition (the commercial version of syslog-ng) can even send log messages to Splunk HEC directly.
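As a rough illustration of the list-based approach, here is a minimal sketch of a syslog-ng configuration fragment. The list file path, the filter and rewrite names, and the splunk_index field are all made up for the example; the list file is assumed to contain one hostname or IP per line.

```
# Hypothetical list file: one hostname or IP address per line.
filter f_firewalls {
    in-list("/etc/syslog-ng/lists/firewalls.list", value("HOST"));
};

# Add a custom name-value pair (hypothetical name: splunk_index)
# only to messages whose HOST appears in the list.
rewrite r_tag_firewalls {
    set("network_firewall", value("splunk_index") condition(filter(f_firewalls)));
};
```

The rewrite rule still has to be referenced from a log path to take effect. The point of the design is that onboarding a new odd-named source means adding a line to the list file rather than editing a regex.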
Thanks for the response. The challenges I must overcome are: no access to the syslog servers or the UF, and no modifications allowed to the syslog storage format. That means anything I do must be via an app on the Universal Forwarder, or on the indexers at parsing time.
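For the parsing-time route, one possible shape is an index-routing transform deployed in an app on the indexers (or a heavy forwarder, if one exists). This is only a sketch under assumptions: the syslog sourcetype, the host pattern, the stanza and class names, and the network_firewall index are all hypothetical, and the index must already exist. It does not remove regex entirely, but it moves the matching out of inputs.conf into one small stanza per class of host.

```
# props.conf (assumes the events arrive with sourcetype "syslog")
[syslog]
TRANSFORMS-route_by_host = route_firewall_hosts

# transforms.conf
[route_firewall_hosts]
# Match against the host metadata, which is stored as "host::<value>"
SOURCE_KEY = MetaData:Host
# Hypothetical pattern: hosts named fw-* or IPs in 10.20.30.0/24
REGEX = ^host::(?:fw-[a-z0-9-]+|10\.20\.30\.\d{1,3})$
# Override the destination index for matching events
DEST_KEY = _MetaData:Index
FORMAT = network_firewall
```

Note that index-time transforms like this are applied where parsing happens, so for ordinary unstructured syslog files they would not take effect on a Universal Forwarder.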