Getting Data In

Sourcetypes with Docker and HTTP Event Collector

sloshburch
Splunk Employee

(Pulling a few similar discussions together, recorded for posterity.)

Challenge

The current Docker Logging Driver for Splunk sends HTTP events to Splunk as single-line JSON events, because Docker treats each line flushed to stdout / stderr as a separate event (thanks @halr9000). Each event arrives shaped like:

  { "line": "<one log line from the container>", "source": "stdout", "tag": "<container ID>" }

  1. Ideally those events would be our access_combined data, but because the payload is JSON, we lose the field parsing that comes with the out-of-the-box access_combined sourcetype. This leaves you in a pickle trying to sourcetype the line payload.
  2. Multi-line events, like Java stack traces, arrive line by line with this implementation because the connection is not held open until the event finishes (thanks @Michael Wilde).

How can this be addressed to enjoy the power of my existing sourcetypes with this HTTP Event Collector payload from Docker?

1 Solution

sloshburch
Splunk Employee

Solution

The strongest solution is in the works! That is for the Docker Logging Driver for Splunk to transmit to HTTP Event Collector in raw mode (rather than JSON), so events won't get wrapped in JSON and our normal field extractions will work.
Our yogi @Michael Wilde has been tracking a PR with Docker for specifically this. If and when it's implemented, I hope to update this answer accordingly.


mpflugfelder
Engager

I'm dealing with problem #1 above. I'm running the Docker logging driver and getting Apache logs from my container; they're being sent to Splunk, but they are not being parsed into their respective fields. I have set the splunk-format to raw in my container, but that doesn't seem to have helped. After a week of searching and digging around on Slack, I finally found someone who has described my problem exactly, but I still don't know what to do to resolve it. If anyone is still watching this thread, is there something you could do to help me here?

Thanks!

sloshburch
Splunk Employee

Workaround(s)

Whilst we wait for the ideal solution, here are some workarounds to consider:

  1. Traditional UF Monitor: Have the container write its logs to a volume on the host and use a universal forwarder to pick them up from the host and move them to the indexer. Keep in mind that it's not a bad idea to have a forwarder on the host anyway so you can see things outside of the containers. Again, thank you to @Michael Wilde for this premise!
  2. Sourcetype Override: Use props and transforms to override the sourcetype: rip out the line's payload (similar to this) and rename to the desired sourcetype. With a sourcetype override, the field parsing of the new sourcetype is what gets applied.
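Option 2 might look something like this. This is a sketch only: the stanza names, the http:docker source, the regex, and the target access_combined sourcetype are assumptions to adapt to your environment, and as index-time transforms they only affect newly indexed data.

```
# props.conf -- on the parsing tier (indexers or heavy forwarders)
[source::http:docker]
TRANSFORMS-docker = docker_rip_line, docker_set_sourcetype

# transforms.conf
# Rip out the "line" payload and make it the event's _raw
[docker_rip_line]
REGEX = "line"\s*:\s*"(.*)"
DEST_KEY = _raw
FORMAT = $1

# Rename to the desired sourcetype so its field parsing applies
[docker_set_sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::access_combined
```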

Other ideas? Post 'em up!


mattymo
Splunk Employee

Greetings from the future!

It is 2023, and the best way to handle Docker logs is the OTel agent, which has features to handle multiline log reassembly and much more!

This replaces the guidance I provided to use fluentd-hec, which still works but will be end of support in January 2024.

I do not recommend using the Docker log driver or plugin at all.

If you read this and need help, hit me up on splk.it/slack usergroups or check out the OTel collector repo:

https://github.com/signalfx/splunk-otel-collector

- MattyMo

hpant
New Member

Splunk team: do you have any working solution for supporting multi-line events, like Java stack traces?
It would be nice if Splunk HEC could support a multiline configuration and make error messages more readable.

It is hard to see the proper error message with the current format, where each line is treated as a separate event in the Splunk UI.


mattymo
Splunk Employee

Use our OTel agent and its recombine operator to solve for multiline logs.
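A minimal sketch of that operator in the collector's filelog receiver. The file paths and the first-line regex here are assumptions; check the splunk-otel-collector docs for the exact fields supported by your version.

```yaml
receivers:
  filelog:
    include: [ /var/lib/docker/containers/*/*-json.log ]
    operators:
      - type: json_parser            # docker's json-file driver wraps each line in JSON
      - type: recombine              # stitch multiline events back together
        combine_field: attributes.log
        is_first_entry: attributes.log matches "^\\S"   # continuation lines start indented
        source_identifier: attributes["log.file.path"]
```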

- MattyMo

sloshburch
Splunk Employee

Hi @hpant - Yes and no. From what I understand, the limitation is that Docker itself emits each line of a multiline event as a separate line. Theoretically you could stitch such lines back together in Splunk into a multiline event, but that may not work when each line arrives with such atomicity (that is, with enough delay between lines of the same stack trace that they appear to be distinct events).

Shout out to anyone who knows better than I to correct me on this.


nbharati
New Member

Hi Splunk Team,
Is there any working solution for joining multi-line docker logs? If yes, please share the solution.


mattymo
Splunk Employee

Yes, check out our otel agent!

- MattyMo

sloshburch
Splunk Employee

Nothing at this time, but let me see if I can get the attention of people smarter than myself on the topic.


mattymo
Splunk Employee

Hi Team,

I suggest looking at the https://hub.docker.com/r/splunk/fluentd-hec images we released as part of the Splunk Connect for Kubernetes project. (Hint: it's a gold mine, with lots of application outside Docker/k8s too! It should simplify the major integrations into one connector.)

It allows dynamic sourcetyping through our jq_transformer plugin and has multi-line capabilities similar to Splunk "Line Breakers" via the concat plugin.

Here is a super simple config to PoC the idea: basically, you return Docker to the default JSON driver, review its log rotation settings, then use our image to collect the logs and metadata from /var/lib/containers. I like this as a practitioner because there is a clear demarcation between us and Docker.

See https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent

I have demo configs here:
https://github.com/matthewmodestino/container_workshop/tree/master/fluentd_docker

https://github.com/matthewmodestino/container_workshop/blob/master/docs/01-docker-lab.md#splunk-flue...

If that's not your style, check out collectord https://collectord.io/
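To give a feel for the concat idea before digging into the demo configs above, here is a minimal fluentd sketch. The paths, tags, and HEC endpoint are placeholders; the linked demo configs are the authoritative versions.

```
<source>
  @type tail
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/fluentd-containers.pos
  tag docker.*
  <parse>
    @type json
  </parse>
</source>

# Stitch continuation lines (e.g. stack traces) back into one event,
# similar in spirit to Splunk line breaking
<filter docker.**>
  @type concat
  key log
  multiline_start_regexp /^\S/
</filter>

<match docker.**>
  @type splunk_hec
  hec_host splunk.example.com
  hec_port 8088
  hec_token 00000000-0000-0000-0000-000000000000
</match>
```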

- MattyMo

fatemabwudel
Path Finder

What are the requirements to run this image?

I am also struggling with the same issue as everyone here, i.e. multiline raw Docker logs for a PostgreSQL application. I wanted to give this solution a shot rather than writing multi-line parsing in props/transforms on my indexers.

I pulled down the image, and when I run it, I get the following error message:
$ docker run splunk/fluentd-hec
bundler: failed to load command: fluentd (/usr/bin/fluentd)
/usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `initialize': No such file or directory @ rb_sysopen - /fluentd/etc/fluent.conf (Errno::ENOENT)
  from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `open'
  from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/config.rb:31:in `build'
  from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/supervisor.rb:634:in `configure'
  from /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/command/fluentd.rb:340:in `<top (required)>'
  from /usr/share/gems/gems/fluentd-1.11.5/bin/fluentd:8:in `require'
  from /usr/share/gems/gems/fluentd-1.11.5/bin/fluentd:8:in `<top (required)>'
  from /usr/bin/fluentd:23:in `load'
  from /usr/bin/fluentd:23:in `<top (required)>'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:58:in `load'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:58:in `kernel_load'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli/exec.rb:23:in `run'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:478:in `exec'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:31:in `dispatch'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/cli.rb:25:in `start'
  from /usr/local/share/gems/gems/bundler-2.2.31/exe/bundle:49:in `block in <top (required)>'
  from /usr/local/share/gems/gems/bundler-2.2.31/lib/bundler/friendly_errors.rb:103:in `with_friendly_errors'
  from /usr/local/share/gems/gems/bundler-2.2.31/exe/bundle:37:in `<top (required)>'
  from /usr/local/bin/bundle:23:in `load'
  from /usr/local/bin/bundle:23:in `<main>'
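The Errno::ENOENT in that trace means the container couldn't find a config at /fluentd/etc/fluent.conf, so at minimum the image needs a fluentd config mounted at that path. A sketch (the local file names and mounts are placeholders to adapt):

```
# Mount your own fluent.conf where the image expects it,
# plus the docker log directory it should read from
docker run -d \
  -v "$(pwd)/fluent.conf:/fluentd/etc/fluent.conf:ro" \
  -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
  splunk/fluentd-hec
```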


mattymo
Splunk Employee

Check out our otel image. It replaces fluentd-hec anyways.

- MattyMo

bhavesh91
New Member

How can fields separated by a colon, like "line", "tag", and "source", be extracted automatically on source=http:docker for Docker logs while using HTTP Event Collector? Also, if the Docker logs contain key-value pairs, how can those appear as fields in Splunk?

For example the log has the following :

{
   "line": "2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1]",
   "source": "stdout",
   "tag": "abc02be1be4e"
}

I need to see line, source, and tag as fields, and the KV pairs inside line should also show up as fields like LOG, NAME, MSG, and CLIENT.

Can this be done, and if so, how? We would want a permanent solution that can be applied enterprise-wide.


sloshburch
Splunk Employee

(Sounds like this is in regards to using the log driver so I've moved this comment to that solution rather than the answer related to more traditional approaches)

Odd that those fields are not already parsed for you. Does the sourcetype have a props.conf entry with KV_MODE = json? What sourcetype is being used, and where was it defined (by you, or by an app from Splunkbase)?
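For the KEY=VALUE pairs embedded in line, the usual approach is a search-time extraction: a transforms.conf stanza with REGEX, FORMAT = $1::$2, and MV_ADD = true, referenced by a REPORT- line in props.conf. The regex such a stanza would need can be sanity-checked outside Splunk; the sample line below is taken from the event in this thread, and the pattern itself is an assumption to tune for your data.

```python
import re

# Sample "line" payload from the Docker HEC event shown in this thread
line = "2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1]"

# Same pattern a transforms.conf stanza (REGEX=..., FORMAT=$1::$2, MV_ADD=true)
# would use to turn each KEY=VALUE into a Splunk field
kv_pattern = re.compile(r"(\w+)=([^,\]]+)")
fields = dict(kv_pattern.findall(line))
print(fields)  # {'LOG': 'debug', 'NAME': 'bhav', 'TIME': '1', 'MSG': 'Goodday', 'CLIENT': '127.0.0.1'}
```

In Splunk itself, the stanza would also set SOURCE_KEY = line so the regex runs against the extracted line field rather than _raw.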


bhavesh91
New Member

We have been using the default sourcetype, json_no_timestamp, which shows up under Data Inputs -> HTTP Event Collector.


sloshburch
Splunk Employee

I'm not sure json_no_timestamp is an out-of-the-box sourcetype. What is the value of KV_MODE for that sourcetype (Settings -> Sourcetypes)? In fact, maybe provide a screenshot of that sourcetype's definition from the web UI (or btool, but not the conf file).


bhavesh91
New Member

Also, this brings us to a point where we will need to start monitoring GC (garbage collection) and heap exhaustion in containers using Splunk. How do we do that?


sloshburch
Splunk Employee

Splunk is a dummy here, simply accepting whatever data the container sends, so you'll need the container to send that data as well. In other words, you'll need to expose the GC and heap data from inside the container to Splunk over the same driver.
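If the workload is a JVM, one way to do that is to have the JVM write GC activity to stdout, so the same logging driver forwards it like any other container log line. A sketch assuming JDK 9+ unified logging (app.jar is a placeholder):

```
# Send GC events to stdout so the Docker logging driver
# collects them like any other container log line
java -Xlog:gc*:stdout -jar app.jar
```

Stack traces from heap exhaustion (OutOfMemoryError) typically reach stderr, which the driver picks up the same way.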
