Docker logs produced in raw

mazzy89 — Tue, 16 Jan 2018 18:54:27 GMT

I have a Docker application which pushes its Docker logs to Splunk. The Docker app uses the json-file log driver. The logs are read by the Universal Forwarder and pushed to Splunk.

The logs look like this:

{
  "log": "json here",
  "stream": "stdout",
  "time": "time here"
}
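For readers unfamiliar with the wrapper, here is a minimal Python sketch of how one such record nests the application's own JSON inside the "log" field. The sample line and the `unwrap` helper are hypothetical, and the sketch assumes the json-file driver writes one wrapper object per line with the payload stored as an escaped string.

```python
import json

# Hypothetical sample line, shaped like the wrapper shown above.
raw_line = (
    '{"log":"{\\"level\\":\\"info\\",\\"msg\\":\\"started\\"}\\n",'
    '"stream":"stdout","time":"2018-01-16T18:54:27.000000000Z"}'
)

def unwrap(line):
    """Return (payload, stream, time) for one json-file driver record."""
    wrapper = json.loads(line)
    payload = wrapper["log"].rstrip("\n")
    try:
        # The application payload is itself JSON in this thread's case.
        inner = json.loads(payload)
    except ValueError:
        # Fall back to the plain string if the payload is not JSON.
        inner = payload
    return inner, wrapper["stream"], wrapper["time"]

inner, stream, ts = unwrap(raw_line)
```

The point is that the useful event is the inner payload; "stream" and "time" are metadata added by Docker.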

The problem is that when Docker produces logs very quickly, Splunk is not able to parse them, and all the logs then show up in Splunk as raw, unparsed text.

Do you have any idea which parameter I might tune?

Re: Docker logs produced in raw

outcoldman — Tue, 16 Jan 2018 21:08:24 GMT

@mazzy89 could you share the code of the application you are referring to?

On a side note, have you seen our solution https://www.outcoldsolutions.com/ to send logs and metrics to Splunk?

Re: Docker logs produced in raw

ShaneNewman — Tue, 16 Jan 2018 23:09:18 GMT

You might create a new sourcetype definition for this dataset in the props.conf that lands on the indexers and set it up something like this:

[docker:json]
NO_BINARY_CHECK=1
TIME_PREFIX = \"time\"\:\s+\"
MAX_TIMESTAMP_LOOKAHEAD = 200 (or larger)
TIME_FORMAT = %a %b %d %H:%M:%S %Y (for example)
TRUNCATE = 999999
BREAK_ONLY_BEFORE = ^\{\s+\"log\"
MUST_BREAK_AFTER = <timestamp_format_regex>:\s+\}

You will certainly have to update the regexes and the timestamp settings to match your data, but that should get you most of the way there.
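One regex that may need updating is TIME_PREFIX. The Python sketch below checks the posted pattern against two hypothetical samples: a pretty-printed record like the one in the question, and a compact single-line record with no space after the colon (assumed here to be how the json-file driver writes to disk). The `RELAXED` variant is an illustration, not part of the original suggestion.

```python
import re

# Same pattern as the posted TIME_PREFIX, without the redundant escapes.
AS_POSTED = re.compile(r'"time":\s+"')
# Hypothetical relaxed variant: whitespace after the colon is optional.
RELAXED = re.compile(r'"time":\s*"')

# Hypothetical samples; the timestamp value is illustrative.
compact = '{"log":"hello\\n","stream":"stdout","time":"2018-01-16T18:54:27.000000000Z"}'
pretty = (
    '{\n  "log": "hello\\n",\n  "stream": "stdout",\n'
    '  "time": "2018-01-16T18:54:27.000000000Z"\n}'
)

for name, sample in (("compact", compact), ("pretty", pretty)):
    print(name, bool(AS_POSTED.search(sample)), bool(RELAXED.search(sample)))
```

If your events arrive in the compact form, the `\s+` in the posted pattern would not match, so `\s*` is the safer choice.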

Re: Docker logs produced in raw

mattymo — Tue, 16 Jan 2018 23:43:49 GMT

Hi Mazzy,

Those are what raw docker logs look like... Can you elaborate on why you think Splunk is not keeping up???

Check out the props/transforms we published to GitHub:

https://github.com/splunk/docker-itmonitoring/tree/7.0.0-k8s

Basically the approach I took is to use a "base" sourcetype to take care of stripping the docker JSON cruft off the log and remove any random commenting:

[kubernetes]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
# remove docker json wrapper, then remove escapes from the quotes in the log message. 
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
# another experimental version of the sed.
#SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*)\\n","stream.*?([\n\r])/\1\2/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC
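To see what the two SEDCMD rules do to an event, here is a rough Python sketch that applies equivalent substitutions with `re.sub`. The sample event and the `strip_wrapper` helper are hypothetical, the leading brace is escaped for Python's regex engine, and Splunk's own sed handling may differ in edge cases.

```python
import re

# Python equivalents of SEDCMD-1_unjsonify and SEDCMD-2_unescapequotes.
UNJSONIFY = re.compile(r'\{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*')
UNESCAPE_QUOTES = re.compile(r'\\"')

def strip_wrapper(event):
    """Drop the Docker JSON wrapper, then unescape quotes in the message."""
    event = UNJSONIFY.sub(r'\1', event)
    return UNESCAPE_QUOTES.sub('"', event)

# Hypothetical raw event as written by the json-file driver.
sample = (
    r'{"log":"2018-01-16 18:54:27.123 INFO said \"hi\"\n",'
    r'"stream":"stdout","time":"2018-01-16T18:54:27.123456789Z"}'
)

cleaned = strip_wrapper(sample)
print(cleaned)
```

After both substitutions only the application's own log line remains, which is what downstream sourcetype-specific parsing expects.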

Then what you can do is use source based props that are placed AHEAD of this sourcetype to apply container/app specific log parsing (see http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Wheretofindtheconfigurationfiles )

For example, here I use source-based props for all my orders containers to implement a custom line breaker that gives me multiline log support.

[source::/var/log/containers/orders-(?!db-)*.log]
#SHOULD_LINEMERGE = true
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
#BREAK_ONLY_BEFORE = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}
LINE_BREAKER = ([\n\r]+){"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s
CHARSET = UTF-8
disabled = false
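The effect of that LINE_BREAKER can be approximated in Python. This is a sketch, not Splunk's implementation: it assumes the text matched by the first capture group is discarded and that the next event starts right after it. The sample lines and the `break_events` helper are hypothetical.

```python
import re

# Same pattern as the LINE_BREAKER above, with the brace escaped for Python.
LINE_BREAKER = re.compile(
    r'([\n\r]+)\{"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} '
    r'[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s'
)

def break_events(stream):
    """Split raw text into events at each LINE_BREAKER match."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(stream):
        events.append(stream[start:m.start(1)])  # text before the newline(s)
        start = m.end(1)                         # next event starts after them
    events.append(stream[start:])
    return events

# Hypothetical container log: an error, a continuation line, a new entry.
lines = [
    r'{"log":"2018-01-16 18:54:27.123 ERROR boom\n","stream":"stdout","time":"t1"}',
    r'{"log":"  at com.example.Orders.run\n","stream":"stdout","time":"t2"}',
    r'{"log":"2018-01-16 18:54:28.456 INFO ok\n","stream":"stdout","time":"t3"}',
]
events = break_events("\n".join(lines))
```

Only lines whose "log" value starts with a timestamp open a new event, so the continuation line stays attached to the error that precedes it.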

This way you can leverage the Splunk pipeline order of operations to hit the source-based props first, then pass the data through the kubernetes sourcetype (or whatever you'd like to call the sourcetype; I just happen to be working with k8s) to strip off the stuff you don't want, and then use your beloved TAs 🙂

Great deep reading on what happens, and when, in the Splunk indexing pipeline:

https://wiki.splunk.com/Community:HowIndexingWorks