sloshburch · 09-21-2016

(Trying to pull a few similar discussions together and record them for posterity.)

Challenge

The current Docker Logging Driver for Splunk sends HTTP events to Splunk as single-line JSON events because Docker treats each line flushed to stdout/stderr as a separate event (thanks @halr9000). Example:

1. Notice that these lines are really access_combined data, but because they arrive wrapped in JSON, we don't get the field parsing that comes with the out-of-the-box access_combined sourcetype. That leaves you in a pickle trying to sourcetype the line payload.
2. Multi-line events, like Java stack traces, arrive line by line with this implementation because the connection is not held open until the event finishes (thanks @Michael Wilde).

How can this be addressed so that I can use the power of my existing sourcetypes with this HTTP Event Collector payload from Docker?

sloshburch · 09-21-2016

Solution

The strongest solution is in the works! That is for the Docker Logging Driver for Splunk to send to the HTTP Event Collector in raw mode (rather than JSON), so the events won't be wrapped in JSON and our normal field extractions will work.
Our yogi @Michael Wilde has been tracking a PR with Docker for exactly this. If and when that's implemented, I hope to update this accordingly.
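To make the JSON-versus-raw difference concrete, here's a minimal Python sketch that builds (but does not send) both kinds of HTTP Event Collector request for one log line. The host, port 8088, and token are placeholders, and the JSON envelope only loosely mimics what the driver sends. The point is that with the /services/collector/raw endpoint the body is the bare line, so the sourcetype's normal extractions apply to it.

```python
import json
import urllib.parse
import urllib.request

# Placeholders (assumptions): replace with your HEC host, port, and token.
HEC_BASE = "https://splunk.example.com:8088"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def event_request(line: str, sourcetype: str) -> urllib.request.Request:
    """JSON mode: the line is wrapped in an envelope, so Splunk indexes JSON."""
    # The envelope loosely mimics the line/source fields the driver sends.
    body = json.dumps({"event": {"line": line, "source": "stdout"},
                       "sourcetype": sourcetype})
    return urllib.request.Request(
        HEC_BASE + "/services/collector/event",
        data=body.encode("utf-8"),
        headers={"Authorization": "Splunk " + HEC_TOKEN},
        method="POST",
    )


def raw_request(line: str, sourcetype: str) -> urllib.request.Request:
    """Raw mode: the body is the bare line; metadata goes in the query string."""
    query = urllib.parse.urlencode({"sourcetype": sourcetype})
    return urllib.request.Request(
        HEC_BASE + "/services/collector/raw?" + query,
        data=line.encode("utf-8"),
        headers={"Authorization": "Splunk " + HEC_TOKEN},
        method="POST",
    )
```

Passing either request to urllib.request.urlopen() would actually send it; against a real HEC you'd also need the token and TLS setup of your own environment.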

mpflugfelder · 02-28-2020

I'm dealing with problem #1 above. I'm running the Docker logging driver, getting Apache logs from my container, and they're being sent to Splunk; however, they are not being parsed into their respective fields. I have set splunk-format to raw in my container, but that doesn't seem to have helped. After a week of searching and digging around on Slack, I finally found someone who has described my problem exactly, but I still don't know how to resolve it. If anyone is still watching this thread, is there something you could do to help me here?

Thanks!

sloshburch · 09-21-2016

Workaround(s)

Whilst we wait for the ideal solution, here are some workarounds to consider:

Traditional UF Monitor: Have the container write its logs to a volume on the host and use a universal forwarder to pick them up from the host and send them to the indexer. It's not a bad idea to have a forwarder on the host anyway, so you can see things outside of the containers. Again, thank you to @Michael Wilde for this premise!
Sourcetype Override: Use props.conf and transforms.conf to override the sourcetype: rip out the line's payload (similar to this) and rename the event to the desired sourcetype. With a sourcetype override, the field parsing defined for the new sourcetype name is what gets used.
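Here's a minimal Python sketch of what the sourcetype override does to each event, for illustration only; in Splunk itself this is done with props.conf/transforms.conf regexes, not code. It assumes the driver's JSON envelope carries the line, source, and tag fields shown later in this thread, and the tag-to-sourcetype mapping is hypothetical.

```python
import json
from typing import Dict, Tuple

# Hypothetical routing table: container tag -> the sourcetype whose field
# extractions we want (e.g. access_combined for Apache access logs).
SOURCETYPE_BY_TAG: Dict[str, str] = {"abc02be1be4e": "access_combined"}


def override(event_json: str, fallback: str = "json_no_timestamp") -> Tuple[str, str]:
    """Return (new_raw, new_sourcetype) for one Docker logging-driver event.

    new_raw is the bare log line pulled out of the JSON envelope, i.e. what
    a transform that rewrites the raw event would leave behind. Events from
    unmapped tags keep their original JSON and the fallback sourcetype.
    """
    event = json.loads(event_json)
    sourcetype = SOURCETYPE_BY_TAG.get(event.get("tag", ""))
    if sourcetype is None or "line" not in event:
        return event_json, fallback
    return event["line"], sourcetype
```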

Other ideas? Post 'em up!

sloshburch · 09-21-2016

Cross reference to this thread: https://answers.splunk.com/answers/390219/how-to-parse-docker-logs-with-multiple-events-from.html

mattymo · 02-15-2023

Greetings from the future!

It is 2023, and the best way to handle Docker logs is the OTel agent, which has features to handle multiline log reassembly and much more!

This replaces the guidance I provided to use fluentd-hec, which still works but reaches end of support in January 2024.

I do not recommend using the Docker log driver or plugin at all.

If you read this and need help, hit me up on splk.it/slack usergroups or check out the OTel Collector repo:

https://github.com/signalfx/splunk-otel-collector

- MattyMo

hpant · 06-25-2017

Splunk team: do you have any working solution for supporting multi-line events, like Java stack traces?
It would be nice if Splunk HEC could support multiline configuration and make error messages more readable.

It is hard to read the error message in the current format, where each line is treated as a separate event in the Splunk UI.

mattymo · 02-15-2023

Use our OTel agent; its recombine operator can solve for multiline logs.

- MattyMo

sloshburch · 06-26-2017

Hi @hpant - Yes and no. From what I understand, the limitation is that Docker itself outputs each line of a multiline event as a separate single line. In theory, you could stitch those single lines back together into a multiline event in Splunk, BUT that might not work when each line arrives in Splunk as its own atomic event (as in, there is enough delay between lines of the same stack trace that they look like distinct events).

Shout out to anyone who knows better than I do to correct me on this.
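The stitching described above can be sketched in Python like this. It assumes each new event starts with a timestamp and uses a hypothetical 5-second flush window; it illustrates the same idea as Splunk line breaking, the fluentd concat plugin, or the OTel recombine operator mentioned in this thread, not their actual implementations. The flush window is exactly where the delay problem bites: a continuation line that arrives too late becomes its own event.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption: a new event starts with a timestamp like "2016-11-14 15:22:03";
# anything else (e.g. "\tat com.example..." or "Caused by: ...") is a continuation.
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def recombine(lines: Iterable[Tuple[float, str]],
              flush_after: float = 5.0) -> Iterator[str]:
    """Stitch (arrival_time, line) pairs into multiline events.

    An event ends when the next first-line match arrives, or when the gap
    since the previous line exceeds flush_after seconds. This batch sketch
    only notices a timeout when the next line arrives; a real collector
    would flush on a timer.
    """
    buffer: List[str] = []
    last_seen: Optional[float] = None
    for arrived, line in lines:
        timed_out = last_seen is not None and arrived - last_seen > flush_after
        if buffer and (FIRST_LINE.match(line) or timed_out):
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
        last_seen = arrived
    if buffer:
        yield "\n".join(buffer)
```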

nbharati · 04-22-2019

Hi Splunk Team,
Is there any working solution for joining multi-line docker logs? If yes, please share the solution.

mattymo · 02-15-2023

Yes, check out our otel agent!

- MattyMo

sloshburch · 06-07-2019

Nothing at this time, but let me see if I can get the attention of people smarter than myself on the topic.

mattymo · 06-07-2019

Hi Team,

I suggest looking at our https://hub.docker.com/r/splunk/fluentd-hec images we released as part of the Splunk Connect for Kubernetes project. (Hint: it's a gold mine, with lots of applications outside Docker/k8s too! It should simplify the major integrations into one connector.)

It allows dynamic sourcetyping through our jq_transformer plugin and has multi-line capabilities similar to Splunk "Line Breakers" via the concat plugin.

Here is a super simple config to PoC the idea. Basically, you would return Docker to the default json-file driver, review its log rotation settings, then use our image to collect logs and metadata from /var/lib/docker/containers. I like this as a practitioner because there is a clear demarc between us and Docker.

See https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent
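For reference, Docker's default json-file driver writes one JSON object per line with log, stream, and time keys (on a default install, under /var/lib/docker/containers/<id>/<id>-json.log). Here's a minimal Python sketch of parsing one such line into the message plus its metadata, roughly the first step a node logging agent performs before any multiline or sourcetype handling. The record type is hypothetical, for illustration only.

```python
import json
from typing import NamedTuple


class DockerLogRecord(NamedTuple):
    """Hypothetical container for one parsed json-file entry."""
    time: str      # timestamp Docker recorded for the line
    stream: str    # "stdout" or "stderr"
    message: str   # the application's line, trailing newline removed


def parse_json_file_line(raw: str) -> DockerLogRecord:
    """Parse one line of a json-file driver log into message plus metadata."""
    entry = json.loads(raw)
    return DockerLogRecord(
        time=entry["time"],
        stream=entry["stream"],
        message=entry["log"].rstrip("\n"),
    )
```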

I have demo configs here:
https://github.com/matthewmodestino/container_workshop/tree/master/fluentd_docker

https://github.com/matthewmodestino/container_workshop/blob/master/docs/01-docker-lab.md#splunk-flue...

If that's not your style, check out collectord https://collectord.io/

- MattyMo

fatemabwudel · 11-09-2021

What are the requirements to run this image?

I am also struggling with the same issue as everyone here, i.e. multiline raw Docker logs for a PostgreSQL application. I wanted to give this solution a shot rather than writing multi-line parsing in props/transforms on my indexers.

I pulled down the image, and when I run it, I get the following error message:

$ docker run splunk/fluentd-hec

bundler: failed to load command: fluentd (/usr/bin/fluentd)
/usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `initialize': No such file or directory @ rb_sysopen - /fluentd/etc/fluent.conf (Errno::ENOENT)
    from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `open'
    from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `build'
    from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/supervisor.rb:634:in `configure'
    from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/command/fluentd.rb:340:in `<top (required)>'
    from /usr/share/gems/gems/fluentd-1.11.5/bin/fluentd:8:in `require'
    from /usr/share/gems/gems/fluentd-1.11.5/bin/fluentd:8:in `<top (required)>'
    from /usr/bin/fluentd:23:in `load'
    from /usr/bin/fluentd:23:in `<top (required)>'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:58:in `load'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:58:in `kernel_load'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:23:in `run'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:478:in `exec'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:31:in `dispatch'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:25:in `start'
    from /usr/local/share/gems/gems/bundler-2.2.31/exe/bundle:49:in `block in <top (required)>'
    from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/friendly_errors.rb:103:in `with_friendly_errors'
    from /usr/local/share/gems/gems/bundler-2.2.31/exe/bundle:37:in `<top (required)>'
    from /usr/local/bin/bundle:23:in `load'
    from /usr/local/bin/bundle:23:in `<main>'

mattymo · 02-15-2023

Check out our otel image. It replaces fluentd-hec anyways.

- MattyMo

bhavesh91 · 12-01-2016

How can the colon-separated fields like “line”, “tag”, and “source” be extracted automatically for source=http:docker when sending Docker logs through the HTTP Event Collector? Also, if the Docker logs contain key-value pairs, how can those appear as fields in Splunk?

For example, the log has the following:

{
   line: 2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1]
   source: stdout
   tag: abc02be1be4e
}

I need to see line, source, and tag as fields, and the KV pairs should also show up as fields like LOG, NAME, MSG, and CLIENT.

Can this be done, and if so, how? We want a permanent solution that can be applied enterprise-wide.

sloshburch · 12-02-2016

(Sounds like this is in regards to using the log driver so I've moved this comment to that solution rather than the answer related to more traditional approaches)

Odd that those fields are not already parsed for you. Does the sourcetype have a props.conf entry for KV_MODE = json? What is the sourcetype being used, and where was it defined (by you or by an app from Splunkbase)?
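To make the expected result concrete, here's a minimal Python sketch of the two passes being asked for: a JSON pass (the job KV_MODE = json does) that yields line, source, and tag, then a key=value pass over the line payload. The KEY=value regex is an assumption fitted to the example above, and in Splunk the second pass would likely need its own search-time extraction rather than coming for free with KV_MODE = json.

```python
import json
import re
from typing import Any, Dict

# Assumption: KV pairs look like KEY=value, separated by commas and/or
# spaces, inside the "line" payload (as in the example above).
KV = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=([^,\]\s]+)")


def extract_fields(event_json: str) -> Dict[str, Any]:
    """JSON pass (top-level keys become fields), then a key=value pass over 'line'."""
    fields: Dict[str, Any] = dict(json.loads(event_json))
    for key, value in KV.findall(fields.get("line", "")):
        fields[key] = value
    return fields
```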

bhavesh91 · 12-02-2016

We have been using the default sourcetype, json_no_timestamp, which shows up under Data Inputs -> HTTP Event Collector.

sloshburch · 12-02-2016

I'm not sure if json_no_timestamp is an out-of-the-box sourcetype. What is the value of KV_MODE for that sourcetype (Settings -> Sourcetypes)? In fact, maybe provide a screenshot of that sourcetype's definition from the web UI (or btool, but not the conf file).

bhavesh91 · 12-04-2016

This also brings us to the point where we will need to start monitoring GC (garbage collection) and heap exhaustion in containers using Splunk. How do we do that?

sloshburch · 12-07-2016

Splunk is a dummy here: it simply accepts whatever data the container sends, so you'll need the container to send that data as well. That means exposing the container's GC and heap data to Splunk over the same driver.

Sourcetypes with Docker and HTTP Event Collector
