Getting Data In

Docker logs produced in raw

mazzy89
Engager

I have a Docker application that pushes its logs to Splunk. The app uses the json-file log driver. The logs are read by the Universal Forwarder and forwarded to Splunk.

The logs look like this:

{
  "log": "json here",
  "stream": "stdout",
  "time": "time here"
}
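
For reference, a minimal Python sketch of how one json-file record nests the application's output inside the "log" field. The sample line is made up for illustration (compact, one record per line, with an escaped JSON payload and a trailing newline in "log"); only the three field names come from the example above.

```python
import json

# Hypothetical record in the json-file layout; the payload is invented.
raw_line = r'{"log":"{\"level\":\"info\",\"msg\":\"started\"}\n","stream":"stdout","time":"2018-01-22T10:15:30.123456789Z"}'


def unwrap(line):
    """Return (inner_message, stream, time) from one json-file record."""
    record = json.loads(line)
    # The driver keeps the line's trailing newline inside "log"; drop it.
    message = record["log"].rstrip("\n")
    return message, record["stream"], record["time"]


message, stream, timestamp = unwrap(raw_line)
inner = json.loads(message)  # the application's own JSON payload
print(stream, timestamp, inner["msg"])
```

The point is that the application's JSON arrives as an escaped string, so anything parsing the outer record still has to unwrap the inner message.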

The problem is that when Docker produces logs very quickly, Splunk is not able to parse them, and all the logs then show up as raw text in Splunk.

Do you have any idea which parameter I might tune?


mattymo
Splunk Employee

Hi Mazzy,

Those are what raw Docker logs look like. Can you elaborate on why you think Splunk is not keeping up?

Check out the props/transforms we published to GitHub:

https://github.com/splunk/docker-itmonitoring/tree/7.0.0-k8s

Basically, the approach I took is to use a "base" sourcetype that strips the Docker JSON wrapper off the log and unescapes the quotes in the message:

[kubernetes]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
# remove docker json wrapper, then remove escapes from the quotes in the log message. 
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
# another experimental version of the sed.
#SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*)\\n","stream.*?([\n\r])/\1\2/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC
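
To see what the two SEDCMD rules do to a single record, here is a rough Python sketch that applies the same substitutions in the same order. It is an approximation only: Python's re module stands in for Splunk's regex engine, the leading "{" is escaped to keep Python happy, and the sample line is invented.

```python
import re

# Hypothetical compact json-file record with escaped quotes in the message.
sample = r'{"log":"2018-01-22 10:15:30.123 INFO user=\"bob\" logged in\n","stream":"stdout","time":"2018-01-22T10:15:30.123456789Z"}'

# SEDCMD-1_unjsonify: keep only the text captured from the "log" value.
UNJSONIFY = re.compile(r'\{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*')
# SEDCMD-2_unescapequotes: turn \" back into ".
UNESCAPE_QUOTES = re.compile(r'\\"')


def strip_wrapper(line):
    """Apply the two substitutions in the order the stanza names them."""
    line = UNJSONIFY.sub(r"\1", line)
    return UNESCAPE_QUOTES.sub('"', line)


cleaned = strip_wrapper(sample)
print(cleaned)
```

The numeric prefixes in the SEDCMD names (1_, 2_) are what keep the unwrap step ahead of the unescape step in the stanza, and the sketch follows that order.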

Then you can use source-based props, which are applied AHEAD of this sourcetype, to do container- or app-specific log parsing (see http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Wheretofindtheconfigurationfiles).

For example, here I use source-based props for all my orders containers to implement a custom line breaker that gives multiline log support.

[source::/var/log/containers/orders-(?!db-)*.log]
#SHOULD_LINEMERGE = true
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
#BREAK_ONLY_BEFORE = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}
LINE_BREAKER = ([\n\r]+){"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s
CHARSET = UTF-8
disabled = false
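
As a rough illustration of what that LINE_BREAKER does, the Python sketch below splits a buffer wherever the pattern matches, discarding only the text in the first capture group (the newlines), which is how Splunk treats LINE_BREAKER. The three records and the Java-style class name are invented; Python's re is standing in for Splunk's regex engine.

```python
import re

# The LINE_BREAKER value from the stanza above; "{" is escaped for Python's re.
LINE_BREAKER = re.compile(
    r'([\n\r]+)\{"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} '
    r'[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s'
)


def break_events(data):
    """Split data into events; only capture group 1 is thrown away."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(data):
        events.append(data[start:match.start(1)])
        start = match.end(1)
    events.append(data[start:])
    return events


# Hypothetical records: an error, a continuation line, then a new entry.
lines = [
    r'{"log":"2018-01-22 10:15:30.123 ERROR request failed\n","stream":"stdout","time":"2018-01-22T10:15:30.123456789Z"}',
    r'{"log":"\tat com.example.Orders.handle(Orders.java:42)\n","stream":"stdout","time":"2018-01-22T10:15:30.123500000Z"}',
    r'{"log":"2018-01-22 10:15:31.456 INFO recovered\n","stream":"stdout","time":"2018-01-22T10:15:31.456000000Z"}',
]
events = break_events("\n".join(lines))
print(len(events))
```

Because the continuation record does not start with a timestamp, it stays attached to the error record, so the three Docker lines become two events.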

This way you can leverage the Splunk pipeline's order of operations to hit the source-based props first, then pass the data through the kubernetes sourcetype (or whatever you'd like to call the sourcetype; I just happen to be working with k8s) to strip off the stuff you don't want, and then use your beloved TAs 🙂

For a deep dive into what happens, and when, in the Splunk indexing pipeline, see:

https://wiki.splunk.com/Community:HowIndexingWorks

- MattyMo

ShaneNewman
Motivator

You might create a new sourcetype definition for this dataset in the props.conf that lands on the indexers and set it up something like this:

[docker:json]
NO_BINARY_CHECK=1
TIME_PREFIX = \"time\"\:\s+\"
# 200 or larger
MAX_TIMESTAMP_LOOKAHEAD = 200
# for example; adjust to the actual timestamp layout
TIME_FORMAT = %a %b %d %H:%M:%S %Y
TRUNCATE = 999999
BREAK_ONLY_BEFORE = ^\{\s+\"log\"
MUST_BREAK_AFTER = <timestamp_format_regex>:\s+\}

You will certainly have to update the regexes and such but that should get you most of the way there.
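
One regex that may need updating is TIME_PREFIX. The Python sketch below checks it against two invented events: a pretty-printed one like the sample in the question, and a compact one like the layout the SEDCMD patterns earlier in the thread assume. The relaxed pattern is only a suggestion and not something from the stanza above.

```python
import re

# TIME_PREFIX as written in the stanza: needs whitespace after the colon.
TIME_PREFIX = re.compile(r'\"time\"\:\s+\"')
# Hypothetical relaxed variant that also accepts no whitespace.
RELAXED_PREFIX = re.compile(r'"time":\s*"')

pretty = '{\n  "log": "json here",\n  "stream": "stdout",\n  "time": "2018-01-22T10:15:30.123456789Z"\n}'
compact = '{"log":"json here\\n","stream":"stdout","time":"2018-01-22T10:15:30.123456789Z"}'


def timestamp_after(prefix, event):
    """Return the text between the prefix match and the next quote, or None."""
    match = prefix.search(event)
    if match is None:
        return None
    end = event.index('"', match.end())
    return event[match.end():end]


for name, event in (("pretty", pretty), ("compact", compact)):
    print(name, timestamp_after(TIME_PREFIX, event), timestamp_after(RELAXED_PREFIX, event))
```

If the files on disk are compact, the original prefix would not match and timestamp extraction would fall back to other rules, so it is worth checking against a real line before relying on it.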


outcoldman
Communicator

@mazzy89 could you share the code of the application you are referring to?

On a side note, have you seen our solution at https://www.outcoldsolutions.com/ for sending logs and metrics to Splunk?
