I am indexing a couple hundred Solaris 10 BSM audit files a day, converted to ASCII. Splunk handles the indexing and host assignment just fine, but the sourcetypes it assigns are all over the place. I currently let the sourcetype be assigned automatically at index time. I want to configure many field extractions, but building them on the automatically assigned sourcetypes would be illogical: every host ends up with 2-3 different sourcetypes, each of which would need its own copy of the extractions. And given the number of sources, assigning them by host doesn't make sense either.
For example, HostnameA's audit file not only gets a sourcetype named after a different host, but the events within that file are assigned, seemingly at random, to HostnameB-2 and HostnameB-3.
Hopefully this all makes sense; it is a rather frustrating issue. Any help would be greatly appreciated!
Set the sourcetypes manually. You can set them in inputs.conf. If that is not practical, put the sourcetype settings in a props.conf file in the same directory as the inputs.conf. Sourcetype assignment happens at input time; overriding it later, at index time, is more difficult.
If you are collecting all the logs by directory, your inputs.conf will contain a single monitor stanza that points at that directory and applies to every file in it.
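A minimal sketch of such a stanza, assuming a hypothetical directory `/var/audit/ascii` for the converted BSM files and a hypothetical sourcetype name `solaris_bsm`. When every file in the directory really is the same kind of data, setting `sourcetype` directly on the monitor stanza stops Splunk from guessing a sourcetype per file:

```ini
# inputs.conf
# Hypothetical path: substitute the directory that holds the ASCII BSM files.
[monitor:///var/audit/ascii]
# Assign one fixed sourcetype to everything read from this directory
# instead of letting Splunk learn one per file.
# "solaris_bsm" is a hypothetical name; any consistent name works.
sourcetype = solaris_bsm
```

Splunk .conf files only accept comments on their own lines starting with `#`, which is why the comments are not placed after the settings.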
Clearly, you can't just set the sourcetype here if there are several different sourcetypes in the single directory. In that case, you will need to assign them in props.conf instead, using `source::` stanzas that match on file name or path.
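A sketch of that props.conf approach, assuming the same hypothetical directory and hypothetical file-name patterns (`*.bsm`, `*.log`) that distinguish the kinds of files. Each `[source::...]` stanza matches a set of source paths and forces the sourcetype for them; in these patterns `*` matches anything except the path separator:

```ini
# props.conf, in the same directory as inputs.conf
# Hypothetical file-name patterns: adjust them to your own naming scheme.
[source::/var/audit/ascii/*.bsm]
sourcetype = solaris_bsm

# Hypothetical second kind of file living in the same directory.
[source::/var/audit/ascii/*.log]
sourcetype = other_audit_type
```

Once all the audit data carries one sourcetype, the field extractions only need to be defined once, under a `[solaris_bsm]` stanza, rather than once per automatically learned sourcetype.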
You should also look at the Splunk documentation page on pretrained sourcetypes. That page also links to "how to override automatic sourcetypes" if this explanation is not enough.