OK, I've got a stream of potentially over 100 different event formats that I want to send into Splunk. Inside each event I specify the sourcetype I'd like Splunk to use to process it — it's the only reliable way to determine the format of the rest of the fields, including things like the timestamp and searchable fields.
They come in over a TCP port where they get a sourcetype of GENERIC. The stanzas in props.conf and transforms.conf then fire and correctly change the sourcetype to the one embedded in the event. But none of the rules I've tried coding against the now-correctly-set sourcetype ever fire.
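For reference, my override is roughly the standard index-time sourcetype rewrite — a sketch, with the stanza names and embedded-sourcetype regex simplified from what I actually have:

```
# props.conf
[GENERIC]
TRANSFORMS-set_st = extract_embedded_sourcetype

# transforms.conf
[extract_embedded_sourcetype]
REGEX = sourcetype=(\S+)
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype
```

Searching on the rewritten sourcetype works fine afterwards; it's only the per-sourcetype parsing rules (timestamp, field extraction) that never apply.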
My conclusion after 2-3 days of poking around with Splunk and reading various questions and answers is that it's actually impossible. Can someone confirm that this is due to Splunk's design — basically, the transforms that process the sourcetype changes run too late in the pipeline for Splunk to go back and honor the timestamp and field extraction rules for the new sourcetype?
CLONE_SOURCETYPE = $1 in the sourcetype-setting transform doesn't seem to work (and even if it did, I suspect it wouldn't inject the cloned event back into the processing flow early enough to make a difference). There's no way to trigger a specific transform based on a REGEX match (and the best that could do is run CLONE_SOURCETYPE = FORMAT1).
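For what it's worth, the static form that transforms.conf does accept looks something like this (stanza name and regex are illustrative) — but it means hand-writing one stanza per format, which is what I'm trying to avoid:

```
# transforms.conf -- one clone stanza per known format
[clone_format1]
REGEX = sourcetype=FORMAT1
CLONE_SOURCETYPE = FORMAT1
```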
So, really, is one TCP port per sourcetype the only way to get this to work?
Even the SDKs don't seem to be able to intercept events before they are indexed.
Guess I'm going to get the award for most unwieldy app ever — 100+ TCP port definitions 😞 😞 😞
The sourcetype override (TRANSFORMS-...) actually happens after all the event-processing steps such as line breaking, timestamp recognition, etc., so the event never gets processed with the new sourcetype's settings. The TRANSFORMS step is the very last one before the event is sent to the index queue. So yes, your requirement is something that's not possible natively in Splunk, due to its design.
The other workaround you may want to try is to use syslog to have the event stream written to a file, then use a custom script to read/process the stream and re-write it into separate files, OR convert it into some common format.
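A minimal sketch of such a splitter, assuming the events carry a `sourcetype=<name>` marker (the regex, the GENERIC fallback bucket, and the one-file-per-sourcetype layout are all assumptions — adjust to your real format):

```python
import os
import re
from collections import defaultdict

# Hypothetical embedded-sourcetype marker; adjust to the real event format.
SOURCETYPE_RE = re.compile(r'sourcetype=(\S+)')

def split_events(lines):
    """Group raw event lines by their embedded sourcetype.

    Lines with no recognizable marker fall into the 'GENERIC' bucket.
    """
    buckets = defaultdict(list)
    for line in lines:
        m = SOURCETYPE_RE.search(line)
        buckets[m.group(1) if m else 'GENERIC'].append(line)
    return buckets

def write_buckets(buckets, dir_path):
    """Append each bucket to its own file, so Splunk can monitor the
    directory and assign the sourcetype per file via [monitor] stanzas."""
    for st, lines in buckets.items():
        with open(os.path.join(dir_path, f'{st}.log'), 'a') as f:
            f.write('\n'.join(lines) + '\n')

if __name__ == '__main__':
    demo = [
        'sourcetype=FORMAT1 ts=2016-01-01T00:00:00 msg=a',
        'sourcetype=FORMAT2 ts=2016-01-01T00:00:01 msg=b',
        'no marker here',
    ]
    print(sorted(split_events(demo)))  # ['FORMAT1', 'FORMAT2', 'GENERIC']
```

Each output file then gets its own sourcetype assignment at monitor time, so the normal timestamp and field-extraction rules apply.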
Ha! Logstash 😉 It should be able to spit out multiple files that Splunk can then monitor. More efficient to just live with a horde of TCP ports, though...
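Something along these lines, assuming the same `sourcetype=<name>` marker in the message (the port, output path, and field name are made up):

```
# Logstash pipeline sketch: split one TCP stream into per-sourcetype files
input { tcp { port => 5514 } }
filter {
  grok { match => { "message" => "sourcetype=%{NOTSPACE:st}" } }
}
output {
  file { path => "/var/log/split/%{st}.log" }
}
```

Splunk can then monitor /var/log/split/ and map each file to its sourcetype.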