(Trying to pull a few similar discussions together and record them for posterity.)
The current Docker Logging Driver for Splunk sends HTTP events to Splunk as single-line JSON events because Docker treats each line flushed to stdout / stderr as a separate event (thanks @halr9000). Example: access_combined. This means that you're in a pickle trying to sourcetype the "line" payload. How can this be addressed so that I can enjoy the power of my existing sourcetypes with this HTTP Event Collector payload from Docker?
The strongest solution is in the works! That is for the Docker Logging Driver for Splunk to transmit HTTP Event Collector in raw mode (rather than json), so the events won’t get surrounded by JSON and our normal field extraction stuff will work.
Our yogi @Michael Wilde has been tracking a PR with Docker for specifically this. If and when that's implemented, I hope to update this accordingly.
I'm dealing with problem #1 above. I'm running the Docker logging driver and getting Apache logs from my container. They're being sent to Splunk, but they are not being parsed into their respective fields. I have set splunk-format to raw in my container, but that doesn't seem to have helped. After a week of searching and digging around on Slack, I finally found someone who has described my problem exactly, but I still don't know what to do to resolve it. If anyone is still watching this thread, is there something you could do to help me here?
Thanks!
Whilst we wait for the ideal solution, here are some workarounds to consider:
"line"'s payload (similar to this) and rename to the desired sourcetype. With a sourcetype override, the field parsing of the sourcetype name is what is used.
Other ideas? Post 'em up!
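To make the override idea concrete, here is a minimal Python sketch of the decision logic only: look inside the JSON wrapper, test the "line" payload against a pattern, and pick a sourcetype name. It is not Splunk configuration and not how Splunk implements overrides. The function name `pick_sourcetype`, the rough regex, and the fallback value are assumptions made for illustration.

```python
import json
import re

# Rough pattern for an Apache combined access log line (illustrative, not exhaustive).
COMBINED_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[A-Z]+ \S+ [^"]*" \d{3} (?:\d+|-) "[^"]*" "[^"]*"$'
)


def pick_sourcetype(raw_event, default="json_no_timestamp"):
    """Return the sourcetype that the wrapped "line" payload looks like.

    Hypothetical helper: falls back to `default` when the event is not a
    JSON object or the line does not look like a combined access log.
    """
    try:
        line = json.loads(raw_event).get("line", "")
    except (json.JSONDecodeError, AttributeError):
        return default
    if isinstance(line, str) and COMBINED_RE.match(line):
        return "access_combined"
    return default
```

In a real deployment the equivalent match would live in Splunk's own index-time configuration; the sketch only shows why the decision has to be made on the "line" value rather than on the whole JSON event.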
Cross reference to this thread: https://answers.splunk.com/answers/390219/how-to-parse-docker-logs-with-multiple-events-from.html
Greetings from the future!
It is 2023 and the best way to handle Docker logs is the OTel agent, which has features to handle multiline log reassembly and much more!
This replaces the guidance I provided to use fluentd-hec, which still works but reaches end of support in January 2024.
I do not recommend using the Docker log driver or plugin at all.
If you read this and need help, hit me up on splk.it/slack usergroups or check out the OTel collector repo:
https://github.com/signalfx/splunk-otel-collector
Splunk team: do you have any working solution for supporting multi-line events, like Java stack traces?
It would be nice if Splunk EC could support a multiline configuration and give a more readable way to see error messages.
It is hard to see the proper error message with the current format, where each line is treated as a separate event in the Splunk UI.
Use our otel agent and you can use its recombine operator to solve for multiline logs.
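For anyone wondering what "recombining" means in practice, here is a small Python sketch of the general idea: treat a line that matches a "first line" pattern as the start of a new event and glue every following line onto it. This is not the OTel operator's implementation or configuration. The function name `recombine` and the timestamp pattern are assumptions for illustration, and a real agent would also flush the buffer after a timeout, which this sketch does not do.

```python
import re

# Assumption: a new event starts with a timestamp such as "2016-11-14 15:22:03".
FIRST_LINE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def recombine(lines, first_line=FIRST_LINE_RE):
    """Group consecutive lines into events.

    A line matching `first_line` starts a new event; any other line is
    appended to the event currently being built (e.g. stack trace frames).
    """
    buffer = []
    for line in lines:
        if first_line.match(line) and buffer:
            # A new event begins, so emit the one collected so far.
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        # Emit whatever is left when the input ends.
        yield "\n".join(buffer)
```

For example, feeding it a timestamped error line followed by two stack trace lines and then another timestamped line yields two events, the first of which contains the whole trace.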
Hi @hpant - Yes and No. From what I understand, the limitation is that Docker itself outputs each line of multiline events as single lines. Theoretically, you could stitch such single lines together in Splunk into a multiline event BUT that might not work when each line comes into Splunk with such atomicity (as in, delay between lines of the same stack trace such that they appear to want to be distinct).
Shout out to anyone who knows better than I to correct me on this.
Hi Splunk Team,
Is there any working solution for joining multi-line docker logs? If yes, please share the solution.
Yes, check out our otel agent!
Nothing at this time, but let me see if I can get the attention of people smarter than myself on the topic.
Hi Team,
I suggest looking at our https://hub.docker.com/r/splunk/fluentd-hec images, which we released as part of the Splunk Connect for Kubernetes project. (Hint: it's a gold mine, with lots of applications outside Docker/k8s too! It should simplify the major integrations into one connector.)
It allows dynamic sourcetyping through our jq_transformer plugin and has multi-line capabilities similar to Splunk "Line Breakers" via the concat plugin.
Here is a super simple config to PoC the idea. Basically you would return Docker to the default JSON driver and review its log rotation settings, then use our image to collect logs and metadata from /var/lib/docker/containers. I like this as a practitioner, because there is a clear demarc between us and Docker.
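To show what a file-based collector has to deal with, here is a minimal Python sketch that reads records in the format Docker's default json-file driver writes, where each line is a JSON object with "log", "stream" and "time" keys. It is not the fluentd configuration from the image, only an illustration of the input. The function name `read_json_file_log` and the output key names are assumptions.

```python
import json


def read_json_file_log(lines):
    """Yield one dict per record written by Docker's json-file log driver.

    Each input line is expected to look like:
    {"log": "text\\n", "stream": "stdout", "time": "..."}
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        yield {
            "line": record["log"].rstrip("\n"),  # drop the trailing newline Docker keeps
            "source": record.get("stream", ""),
            "time": record.get("time", ""),
        }
```

The function accepts any iterable of strings, so an open file handle on a container's log file works as input. Because every record is still a single line, multi-line reassembly has to happen in a later step.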
See https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent
I have demo configs here:
https://github.com/matthewmodestino/container_workshop/tree/master/fluentd_docker
If that's not your style, check out collectord https://collectord.io/
What are the requirements to run this image?
I am also struggling with the same issue as everyone here, i.e. multiline raw Docker logs for a PostgreSQL application. I wanted to give this solution a shot rather than writing multi-line parsing in props/transforms on my indexers.
I pulled down the image and when I run it, I get the following error message:
$ docker run splunk/fluentd-hec
bundler: failed to load command: fluentd (/usr/bin/fluentd)
/usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `initialize': No such file or directory @ rb_sysopen - /fluentd/etc/fluent.conf (Errno::ENOENT)
from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `open'
from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `build'
from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/supervisor.rb:634:in `configure'
from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/command/fluentd.rb:340:in `<top (required)>'
from /usr/share/gems/gems/fluentd-1.11.5/bin/fluentd:8:in `require'
from /usr/share/gems/gems/fluentd-1.11.5/bin/fluentd:8:in `<top (required)>'
from /usr/bin/fluentd:23:in `load'
from /usr/bin/fluentd:23:in `<top (required)>'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:58:in `load'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:58:in `kernel_load'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:23:in `run'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:478:in `exec'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:31:in `dispatch'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:25:in `start'
from /usr/local/share/gems/gems/bundler-2.2.31/exe/bundle:49:in `block in <top (required)>'
from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/friendly_errors.rb:103:in `with_friendly_errors'
from /usr/local/share/gems/gems/bundler-2.2.31/exe/bundle:37:in `<top (required)>'
from /usr/local/bin/bundle:23:in `load'
from /usr/local/bin/bundle:23:in `<main>'
Check out our otel image. It replaces fluentd-hec anyways.
How can the fields that are separated by a colon, like “line”, “tag” and “source”, be extracted automatically on source=http:docker for Docker logs when using the HTTP Event Collector? Also, if the Docker logs contain key-value pairs, how can those appear as fields in Splunk?
For example, the log has the following:
{ [-]
line: 2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1]
source: stdout
tag: abc02be1be4e
}
I need to see line, source and tag as fields, and the KV pairs should also show up as fields, like LOG, NAME, MSG and CLIENT.
Can this be done, and if so, how? We want a permanent solution so that it can be applied enterprise-wide.
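As an illustration of the extraction being asked for, here is a short Python sketch that keeps line, source and tag and also pulls the KEY=value pairs out of the "line" text. It only demonstrates the parsing logic on the sample event above. In Splunk this would be done with field extraction configuration rather than code, and the function name `extract_fields` and the regex are assumptions.

```python
import re

# KEY=value pairs, where a value ends at a comma, a closing bracket or whitespace.
KV_RE = re.compile(r"(\w+)=([^,\]\s]+)")


def extract_fields(event):
    """Return the event's own keys plus any KEY=value pairs found in "line"."""
    fields = dict(event)  # keeps line, source and tag as they are
    fields.update(KV_RE.findall(event.get("line", "")))
    return fields
```

Applied to the sample event, this returns line, source and tag together with LOG, NAME, TIME, MSG and CLIENT as separate keys.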
(Sounds like this is in regards to using the log driver so I've moved this comment to that solution rather than the answer related to more traditional approaches)
Odd that those fields are not already parsed for you. Does the sourcetype have a props.conf entry for KV_MODE = json? What is the sourcetype being used and where was it defined (by you or by an app from splunkbase)?
We have been using the default sourcetype, json_no_timestamp, which shows up under Data Inputs -> HTTP Event Collector.
I'm not sure if json_no_timestamp is an out-of-the-box sourcetype. What is the value of KV_MODE for that sourcetype (Settings -> Sourcetypes)? In fact, maybe provide a screenshot of that sourcetype's definition from the web UI (or btool, but not the conf file).
Also, this brings us to a point where we will need to start monitoring GC (garbage collection) and heap exhaustion in containers using Splunk. How do we do that?
Splunk is a dummy here and simply accepts whatever data the container sends, so you'll need the container to send that data as well. That means exposing the GC and heap data of the container to Splunk over the same driver.