Getting Data In

Docker logs produced in raw

mazzy89
Engager

I have a Docker application which pushes its logs to Splunk. The Docker app uses the json-file log driver. The logs are read by the Universal Forwarder and forwarded to Splunk.

The logs appear like this:

{
  "log": "json here",
  "stream": "stdout",
  "time": "time here"
}
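For reference, each line the json-file driver writes is a self-contained JSON object, so a single well-formed line parses cleanly. A minimal sketch (the timestamp and message are illustrative):

```python
import json

# One raw line as written by Docker's json-file log driver
# (message and timestamp values here are made up).
line = '{"log":"2018-01-10 12:00:01.123 INFO starting server\\n","stream":"stdout","time":"2018-01-10T12:00:01.123Z"}'

event = json.loads(line)
print(event["stream"])  # stdout
print(event["log"])     # the original message, trailing newline included
```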

The problem is that when Docker produces logs very fast, Splunk is not able to parse them, and all the logs appear as raw text in Splunk.

Do you have any idea which parameters I might tune?

0 Karma

mattymo
Splunk Employee

Hi Mazzy,

That is what raw Docker logs look like... Can you elaborate on why you think Splunk is not keeping up?

Check out the props/transforms we published to GitHub:

https://github.com/splunk/docker-itmonitoring/tree/7.0.0-k8s

Basically, the approach I took is to use a "base" sourcetype that takes care of stripping the Docker JSON wrapper off the log and removing the escaping from the quotes:

[kubernetes]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
# remove docker json wrapper, then remove escapes from the quotes in the log message. 
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
# another experimental version of the sed.
#SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*)\\n","stream.*?([\n\r])/\1\2/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC
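The two SEDCMDs can be sanity-checked outside Splunk. Here is a rough Python equivalent of what they do to a single raw line (the sample message is made up):

```python
import re

# A raw json-file line: the inner message contains escaped quotes (\")
# and ends with an escaped newline (\n). Sample content is made up.
raw = '{"log":"level=info msg=\\"server started\\"\\n","stream":"stdout","time":"2018-01-10T12:00:01.123Z"}'

# SEDCMD-1_unjsonify: drop the Docker JSON wrapper, keep only the message
step1 = re.sub(r'\{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*', r'\1', raw)

# SEDCMD-2_unescapequotes: turn \" back into plain "
step2 = step1.replace('\\"', '"')

print(step2)  # level=info msg="server started"
```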

Then what you can do is use source-based props, which are applied AHEAD of this sourcetype, to apply container/app-specific log parsing (see http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Wheretofindtheconfigurationfiles ).

For example, here I use source-based props for all my orders containers to implement a custom line breaker for multiline log support.

[source::/var/log/containers/orders-(?!db-)*.log]
#SHOULD_LINEMERGE = true
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
#BREAK_ONLY_BEFORE = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}
LINE_BREAKER = ([\n\r]+){"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s
CHARSET = UTF-8
disabled = false
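To see what that LINE_BREAKER buys you, here is a rough Python approximation. Splunk breaks events at the capture group; a lookahead gives the same boundaries here. The sample lines are made up:

```python
import re

# Three raw json-file lines: a timestamped line, a continuation
# (stack-trace style) line with no timestamp, then the next event.
sample = (
    '{"log":"2018-01-10 12:00:01.123 INFO order received\\n","stream":"stdout","time":"t1"}\n'
    '{"log":"java.lang.RuntimeException: boom\\n","stream":"stdout","time":"t2"}\n'
    '{"log":"2018-01-10 12:00:02.456 INFO order shipped\\n","stream":"stdout","time":"t3"}'
)

# Break only where a new line starts with a timestamped log message,
# mirroring the LINE_BREAKER in the stanza above.
events = re.split(
    r'[\n\r]+(?=\{"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} '
    r'[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s)',
    sample,
)

print(len(events))  # 2 -- the exception line stays glued to the first event
```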

This way you can leverage the Splunk pipeline order of operations to hit the source-based props first, then pass the data through the kubernetes sourcetype (or whatever you'd like to call the sourcetype; I just happen to be working with k8s) to strip off the stuff you don't want, and then use your beloved TAs 🙂

Here is some great deep reading on what happens when in the Splunk indexing pipeline:

https://wiki.splunk.com/Community:HowIndexingWorks

- MattyMo
0 Karma

ShaneNewman
Motivator

You might create a new sourcetype definition for this dataset in the props.conf that lands on the indexers, and set it up something like this:

[docker:json]
NO_BINARY_CHECK=1
TIME_PREFIX = \"time\"\:\s+\"
MAX_TIMESTAMP_LOOKAHEAD = 200 (or larger)
TIME_FORMAT = %a %b %d %H:%M:%S %Y (for example)
TRUNCATE = 999999
BREAK_ONLY_BEFORE = ^\{\s+\"log\"
MUST_BREAK_AFTER = <timestamp_format_regex>:\s+\}

You will certainly have to update the regexes and such but that should get you most of the way there.
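A quick way to validate a regex like TIME_PREFIX before shipping it to the indexers is to run it over a sample event in Python. This sketch relaxes \s+ to \s* so it also matches compact single-line JSON; the timestamp value is made up:

```python
import re

# A pretty-printed docker/json event like the one in the question,
# with an illustrative ISO-8601 timestamp.
event = '{\n  "log": "json here",\n  "stream": "stdout",\n  "time": "2018-01-10T12:00:01.123Z"\n}'

# TIME_PREFIX tells Splunk where the timestamp starts; \s* (instead of
# \s+) also covers compact output with no space after the colon.
prefix = re.compile(r'"time":\s*"')

m = prefix.search(event)
# Read up to the closing quote, roughly what MAX_TIMESTAMP_LOOKAHEAD allows
timestamp = event[m.end():].split('"')[0]
print(timestamp)  # 2018-01-10T12:00:01.123Z
```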

0 Karma

outcoldman
Communicator

@mazzy89 could you share the code of the application you are referring to?

On a side note, have you seen our solution https://www.outcoldsolutions.com/ for sending logs and metrics to Splunk?

0 Karma