Olga Malita
The default behavior of the filelog receiver is to send log lines one by one. This is the desired behavior in most cases, but it becomes problematic whenever the information is too long to fit on a single line. The result is fragmented and incomplete log entries, making it challenging to analyze or interpret the logged data coherently.
In this blog, you’ll learn how to configure multilineConfigs in SOCK (the Splunk OpenTelemetry Collector for Kubernetes) to handle multiline log entries appropriately, ensuring that complex information is presented in a structured and readable manner.
Let’s look at an example of a Java exception sent to Splunk by SOCK:
If we click Event Actions > Show Source (with Wrap results enabled), we’ll see a log snippet like this:
So we can see that the one-by-one approach doesn’t work in this case.
Even though the recombine operator is the right tool for this job, we don’t need to prepare the config on our own, as this is already simplified by the multilineConfigs option in SOCK. Check out the example below, and read more about it in this documentation.
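As a reference point, here is a sketch of what such an entry can look like in the chart’s values file. It follows the multilineConfigs schema from the splunk-otel-collector-chart; the namespace, container name, and regex correspond to the example described below:

```yaml
logsCollection:
  containers:
    multilineConfigs:
      # Identify the log source by static names - no regex needed.
      - namespaceName:
          value: default
        containerName:
          value: java-app-container
        # A line NOT starting with whitespace begins a new log entry;
        # indented stack-trace lines are appended to the previous one.
        firstEntryRegex: ^[^\s]
```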
My Java pod comes from the deployment java-app-deployment (in the default namespace), and its container has the static name java-app-container, so the config above is enough to identify all the logs that must be processed and recombined according to the pattern. Alternatively, we could configure something like this:
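A sketch of the alternative, matching on the pod name instead. The podName value below is an assumed pattern: pods created by a deployment get dynamic suffixes, so the name has to be expressed as a regex with useRegexp enabled:

```yaml
logsCollection:
  containers:
    multilineConfigs:
      - namespaceName:
          value: default
        # Pod names include generated suffixes, e.g. java-app-deployment-7d9fd-abcde,
        # so a regex match is required here.
        podName:
          value: java-app-deployment-.*
          useRegexp: true
        firstEntryRegex: ^[^\s]
```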
Here we use podName as the main data point identifying the log source, but it is defined as a regex (because pod names contain dynamic IDs that are different every time the pod is recreated). Because regex matching is resource-intensive, we should avoid it whenever we can, so the first version of the config is much better than the second one.
The most important thing here is firstEntryRegex, which specifies the pattern a line must match to be treated as the first line of a block. In this case, the example is ^[^\s], which means “the line shouldn’t start with a whitespace character”.
Let’s see it on regex101:
Note that you don’t need to create a regex that matches the whole line - matching the beginning of it is enough.
Now we’re ready to apply the configuration and see the result in Splunk:
Finally, it looks exactly how it should.
Another example might be processing the output of a system command - curl, a powerful tool for making HTTP requests and retrieving data from URLs.
This is how it looks in Splunk:
The source is:
We can see that curl output blocks start with either the <HTML> or the <html> marker. The multiline config in this case might look like this:
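A sketch under the same chart schema as before; the namespace and container name here are assumptions for illustration:

```yaml
logsCollection:
  containers:
    multilineConfigs:
      - namespaceName:
          value: default
        containerName:
          value: curl-container   # hypothetical container name
        # A new entry starts at a line beginning with <html> or <HTML;
        # matching only the start of the line is sufficient.
        firstEntryRegex: (^<html>)|(^<HTML)
```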
As you can see, the firstEntryRegex is defined as (^<html>)|(^<HTML).
After applying the config, the result in Splunk changes to:
Logs often include a timestamp at the beginning of every line. This is good news whenever we want to process them with a multiline config, as timestamps are easy to express as regexes. The example below shows such a log of a Java application, sent to Splunk without any configured processing:
This is a similar case to the one above, but this time a simple ^[^\s] won’t work. Matching on leading whitespace groups the exception lines together, but it still doesn’t attach them to the proper beginning of the log entry - the timestamp.
Using regex101 we can create the following regex:
And incorporate it into the SOCK configuration:
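The exact regex depends on the application’s timestamp format. Assuming ISO-style timestamps such as 2023-11-07 14:21:05 (an assumption for illustration - adjust the pattern to your log format), the configuration could look like:

```yaml
logsCollection:
  containers:
    multilineConfigs:
      - namespaceName:
          value: default
        containerName:
          value: java-app-container
        # A new log entry starts at a line beginning with a timestamp
        # like "2023-11-07 14:21:05"; everything else (stack-trace lines,
        # wrapped messages) is appended to the previous entry.
        firstEntryRegex: ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
```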
After reloading, the logs are correctly grouped:
This article explored multiple ways of combining multiline log entries into one event. This is a useful feature: multiline logs are commonly encountered, and combining them correctly makes working with them much easier down the line.