Hey everyone! I have what I would consider a complex problem, and I was hoping to get some guidance on the best way to handle it.

We are attempting to log events from an OpenShift (Kubernetes) environment. So far, I've successfully gotten raw logs flowing from Splunk Connect for Kubernetes into our heavy forwarder via HEC, and from there into our indexer. The ingested data carries a bunch of metadata (pod name, container name, etc.) from this step. The problem is what to do with it from there.

In this particular configuration, the individual component logs are combined into a single stream, with a few fields of metadata prepended. After this metadata, the event matches exactly what I'd consider a "standard" event, i.e. something Splunk is used to processing. For example:

tomcat-access;console;test;nothing;127.0.0.1 - - [08/Dec/2021:13:25:21 -0600] "GET /idp/status HTTP/1.1" 200 3984

This is first semicolon-delimited and then space-delimited, as follows:

- "tomcat-access" is the name of the container component that generated the log
- "console" indicates the source (console or file name)
- "test" indicates the environment
- "nothing" indicates the user token
- Everything after the last semicolon is the real log. In this example, it matches a Tomcat access-log sourcetype.

Compare this to another line in the same log:

shib-idp;idp-process.log;test;nothing;2021-12-08 13:11:21,335 - 10.103.10.30 - INFO [Shibboleth-Audit.SSO:283] - 10.103.10.30|2021-12-08T19:10:57.659584Z|2021-12-08T19:11:21.335145Z|sttreic

- "shib-idp" is the name of the container component that generated the log
- "idp-process.log" is the source file in that component
- "test" is the environment
- "nothing" is the user token
- Everything after the last semicolon is the Shibboleth process log. Notably, this part uses pipes as delimiters.

The SCK components, as I have them configured now, ship all of these sources to "ocp:container:shibboleth" (or something like that).
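For reference, both example lines share the same shape - four semicolon-delimited segments, then the payload - so a single pattern can split any of them (the segment labels below are just names I made up):

```ini
# Shared prefix pattern (sketch, labels are mine):
#   ^([^;]+);([^;]+);([^;]+);([^;]+);(.*)$
#     $1 = component    (tomcat-access / shib-idp)
#     $2 = source       (console / idp-process.log)
#     $3 = environment  (test)
#     $4 = user token   (nothing)
#     $5 = the "real" log payload
```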
When they are shipped over, metadata is added for container_name, pod_name, and other CRI-based log data.

What I am aiming to do

I would like to use the semicolon-delimited parts of the event to tell the heavy forwarder which sourcetypes to apply. Ideally, I would like to cut down on writing my own sourcetypes and regexes, but I can do that if I must. So for the tomcat-access example above, I'd want:

- All the SCK / OpenShift related fields to stick with the event.
- The event to be chopped into 5 segments.
- The event type to be recognized from the first 2 fields (there is some duplication in the first field, so the second field would be the most important).
- The first 4 segments to be appended as field information (like "identifier" or "internal_source").
- The 5th segment to be handed off to another sourcetype for further processing (in this case, "tomcat_localhost_access" from "Splunk_TA_tomcat"). All the other fields would stay with the event as Splunk_TA_tomcat did its field extractions.

If this isn't possible, I could make a unique sourcetype transform for each event type - the source program has 8 potential sources - but that would involve quite a bit of duplication. Even as I type this out, I'm getting the sinking feeling that I'll need to just bite the bullet and make 8 different transforms. But one can hope, right?

Any help would be appreciated. I've gotten through Sysadmin and Data Admin training, but nothing more advanced than that. I suspect I'll need this pattern for other OpenShift logs of ours in the future, but I don't know at this stage.
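For context, here's roughly what I was imagining on the heavy forwarder - I have no idea if this is the right approach, and the stanza names and field names below are placeholders I made up (only "ocp:container:shibboleth" and "tomcat_localhost_access" come from my actual setup):

```ini
# props.conf (sketch) - applied to the sourcetype SCK assigns
[ocp:container:shibboleth]
TRANSFORMS-shib = shib_prefix_fields, shib_route_tomcat, shib_strip_prefix

# transforms.conf (sketch)
# 1) Write the first four segments as indexed fields (field names are placeholders)
[shib_prefix_fields]
REGEX = ^([^;]+);([^;]+);([^;]+);([^;]+);
FORMAT = component::$1 internal_source::$2 environment::$3 user_token::$4
WRITE_META = true

# 2) Route events whose first two segments identify Tomcat access logs
#    to the Splunk_TA_tomcat sourcetype
[shib_route_tomcat]
REGEX = ^tomcat-access;console;
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::tomcat_localhost_access

# 3) Strip the four-segment prefix so the TA's extractions see a clean event
[shib_strip_prefix]
REGEX = ^(?:[^;]+;){4}(.*)$
DEST_KEY = _raw
FORMAT = $1
```

If something like this is valid, I'd presumably still need a copy of the routing stanza per event type, but the field extraction and prefix stripping would only exist once.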