
Combine Multiline Logs into a Single Event with SOCK: a Step-by-Step Guide for Newbies

sbylica
Splunk Employee

Olga Malita

The default behavior of the filelog receiver is to send log lines one by one. This is the desired behavior in most cases, but it becomes problematic whenever a piece of information is too long to fit on a single line. This can lead to fragmented and incomplete log entries, making it challenging to analyze or interpret the logged data coherently.

In this blog, you’ll learn how to configure multilineConfigs in SOCK (Splunk OpenTelemetry Collector for Kubernetes) to handle multiline log entries properly, ensuring that complex information is presented in a structured and readable manner.

Problem statement

Let’s look at an example of a Java exception sent to Splunk by SOCK:

[Screenshot: the Java exception in Splunk, split across multiple single-line events]

If we click Event Actions > Show Source (with Wrap results enabled), we’ll see a log snippet like this:

[Screenshot: the raw log source, with each line of the stack trace shown as a separate entry]

So we can see that the one-by-one approach doesn’t work in this case.

The solution - Java traces

Even though the recombine operator is the right tool for this job, we don’t need to write the config ourselves, as this is already simplified by the multilineConfigs option in SOCK. Check out the example below, and read more about it in the documentation.

[Screenshot: SOCK Helm values with a multilineConfigs entry matching the container by name]
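As a rough sketch of what the config in the screenshot looks like (a minimal illustration based on the SOCK Helm chart’s logsCollection.containers.multilineConfigs schema; the namespace and container names come from the example described below, and the exact field layout may differ across chart versions):

```yaml
logsCollection:
  containers:
    multilineConfigs:
      - namespaceName:
          value: default
        containerName:
          value: java-app-container
        # A new event starts at any line that does not begin with whitespace
        firstEntryRegex: ^[^\s].*
```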

My Java pod comes from the deployment java-app-deployment (in the default namespace), and its container’s static name is java-app-container, so the config above is enough to identify all the logs that must be processed and recombined according to the pattern. Alternatively, we could configure something like this:

[Screenshot: SOCK Helm values with a multilineConfigs entry matching the pod name by regex]
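A sketch of this variant (the pod-name pattern java-app-deployment-.* is my assumption, derived from the deployment name used in this example; verify the field names against your chart version):

```yaml
logsCollection:
  containers:
    multilineConfigs:
      - namespaceName:
          value: default
        podName:
          # Pod names carry a dynamic suffix, hence the regex match
          value: java-app-deployment-.*
          useRegexp: true
        firstEntryRegex: ^[^\s].*
```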

Here we use podName as the main data point identifying the log source, but it is defined as a regex (because pod names contain dynamic IDs that change every time the pod is recreated). Since regex matching is resource intensive, we should avoid it whenever we can, so the first version of the config is much better than the second one.

The most important setting here is firstEntryRegex, which specifies the pattern a line must match to be treated as the first line of a block. In this case, the example is ^[^\s], which means “the line shouldn’t start with a whitespace character”.

Let’s check it on regex101:

[Screenshot: regex101 showing ^[^\s] matching only lines that don’t begin with whitespace]

Note that you don’t need to create a regex that matches the whole line - matching its beginning is enough.
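To build intuition for how firstEntryRegex drives the grouping, here is a small Python sketch (my own illustration, not SOCK internals; the log lines are made up): a line matching the pattern starts a new event, and every following non-matching line is appended to the current one, just as the recombine operator does.

```python
import re

# First-entry pattern from the article: the line must not start with whitespace.
# Note that search() only needs to match the beginning of the line.
first_entry = re.compile(r"^[^\s]")

lines = [
    'Exception in thread "main" java.lang.RuntimeException: boom',
    "\tat com.example.App.run(App.java:42)",
    "\tat com.example.App.main(App.java:12)",
    "Next unrelated log line",
]

events, current = [], []
for line in lines:
    # A matching line closes the previous event and opens a new one
    if first_entry.search(line) and current:
        events.append("\n".join(current))
        current = []
    current.append(line)
if current:
    events.append("\n".join(current))

print(len(events))  # 2: the whole stack trace becomes a single event
```

The indented "at ..." frames fail the pattern, so they stay attached to the exception line above them.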

Now we’re ready to apply the configuration and see the result in Splunk:

[Screenshot: the Java exception in Splunk, now combined into a single event]

Finally, the exception appears as a single event, exactly as it should.

The solution - curl outputs

Another example is processing the output of a system command such as curl, a powerful tool for making HTTP requests and retrieving data from URLs.

This is how it looks in Splunk:

[Screenshot: curl output in Splunk, split across multiple events]

The source is:

[Screenshot: the raw curl output source, with each HTML line shown as a separate entry]

We can see that curl blocks start with either the <HTML> or the <html> marker. The multiline config in this case might look like this:

[Screenshot: SOCK Helm values with a multilineConfigs entry whose firstEntryRegex matches the HTML markers]
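A sketch of such an entry (the container name curl-container is hypothetical; only the firstEntryRegex is taken from the article):

```yaml
logsCollection:
  containers:
    multilineConfigs:
      - namespaceName:
          value: default
        containerName:
          value: curl-container   # hypothetical container name
        # A new event starts at a line beginning with <html> or <HTML
        firstEntryRegex: (^<html>)|(^<HTML)
```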

As you can see, the firstEntryRegex is defined as (^<html>)|(^<HTML).

After applying the config, the result in Splunk changes to:

[Screenshot: the curl output in Splunk, combined into a single event]

The solution - logs with timestamps

Logs often include a timestamp at the beginning of every line. This is good news whenever we want to process them with a multiline config, as timestamps are easy to express as regexes. The example below shows such a log of a Java application sent to Splunk without any configured processing:

[Screenshot: timestamped Java application logs in Splunk, each line sent as a separate event]

This is a similar case to the one above, but this time, the simple ^[^\s] won’t work. Matching on leading whitespace groups the exception lines together, but it still doesn’t attach them to the proper beginning of the log entry - the timestamp.
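To illustrate, a quick Python sanity check (the timestamp format ^\d{4}-\d{2}-\d{2} is a hypothetical example here, as are the sample lines - adjust the pattern to your own log format): a timestamp-based pattern starts a new event only at genuinely timestamped lines, which is exactly where ^[^\s] fell short.

```python
import re

# Hypothetical ISO-like timestamp pattern - adapt to the actual log format
first_entry = re.compile(r"^\d{4}-\d{2}-\d{2}")

# A timestamped line starts a new event...
assert first_entry.search("2024-04-24 11:12:58 ERROR Request failed")
# ...while exception lines, even non-indented ones, do not,
# so they stay grouped with the timestamped line above them.
assert not first_entry.search("java.lang.RuntimeException: boom")
assert not first_entry.search("\tat com.example.App.run(App.java:42)")
print("ok")
```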

Using regex101 we can create the following regex:

[Screenshot: regex101 showing a timestamp-matching regex against the sample log]

And incorporate it into the SOCK configuration:

[Screenshot: SOCK Helm values with a timestamp-based firstEntryRegex]
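A sketch of the resulting entry (the container name and the exact timestamp pattern are hypothetical placeholders; substitute the regex you built for your own log format):

```yaml
logsCollection:
  containers:
    multilineConfigs:
      - namespaceName:
          value: default
        containerName:
          value: java-app-container   # hypothetical container name
        # Hypothetical ISO-like timestamp pattern - adjust to your log format
        firstEntryRegex: ^\d{4}-\d{2}-\d{2}
```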

After reloading, the logs are correctly grouped:

[Screenshot: the logs in Splunk, correctly grouped starting at each timestamp]

Conclusion

 

This article explored multiple ways of combining multiline log entries into one. This is a useful feature, as such logs are commonly encountered, and combining them correctly makes them far easier to work with down the line.
