Getting Data In

Sourcetypes with Docker and HTTP Event Collector

Ultra Champion

(Trying to pull a few similar discussions together, recorded here for posterity)

Challenge

The current Docker Logging Driver for Splunk sends HTTP events to Splunk as single-line JSON events because Docker treats each line flushed to stdout/stderr as a separate event (thanks @halr9000). Example:
(screenshot "json_event": a single-line JSON event as delivered by the driver)

  1. Notice that the payload is ideally our access_combined data, but since the event arrives as JSON, we can't get the field parsing that comes with the out-of-the-box access_combined sourcetype. This puts you in a pickle trying to sourcetype the line payload.
  2. Multi-line events, like Java stack traces, arrive line by line with this implementation because the connection is not held open until the event finishes (thanks @Michael Wilde).
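For concreteness, here's a minimal Python sketch of the kind of wrapped event the driver produces. The field names (line, source, tag) follow the driver's JSON format as described above; the access_combined line itself is illustrative:

```python
import json

# A single HEC event as the Splunk logging driver wraps it: the real
# access_combined payload is buried inside the "line" field, so Splunk's
# out-of-the-box extractions for access_combined never see it.
event = json.loads(
    '{"line": "127.0.0.1 - - [14/Nov/2016:15:22:03 -0500] '
    '\\"GET / HTTP/1.1\\" 200 612", '
    '"source": "stdout", "tag": "abc02be1be4e"}'
)

# Only after unwrapping do we get at the access_combined data.
print(event["line"])
```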

How can this be addressed to enjoy the power of my existing sourcetypes with this HTTP Event Collector payload from Docker?

1 Solution

Ultra Champion

Solution

The strongest solution is in the works! That is for the Docker Logging Driver for Splunk to transmit to HTTP Event Collector in raw mode (rather than JSON), so the events won't get wrapped in JSON and our normal field extractions will work.
Our yogi @Michael Wilde has been tracking a PR with Docker specifically for this. If and when it's implemented, I hope to update this accordingly.
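Once raw mode lands in the driver, selecting it would look something like the following daemon configuration sketch (option names per the Docker Splunk logging driver; the URL, token, and sourcetype values are placeholders you'd replace with your own):

```json
{
  "log-driver": "splunk",
  "log-opts": {
    "splunk-url": "https://splunk.example.com:8088",
    "splunk-token": "00000000-0000-0000-0000-000000000000",
    "splunk-format": "raw",
    "splunk-sourcetype": "access_combined"
  }
}
```

The same options can be passed per container via `--log-opt` on `docker run` instead of globally in /etc/docker/daemon.json.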


Engager

I'm dealing with problem #1 above. I'm running the Docker logging driver and getting apache logs from my container, and they're being sent to Splunk, however they are not being parsed into their respective fields. I have set the splunk-format to raw in my container, but that doesn't seem to have helped. After a week of searching and digging around on Slack, I finally found someone who has described my problem exactly, but I still don't know what to do to resolve it. If anyone is still watching this thread, is there something you could do to help me here?

Thanks!

0 Karma

Ultra Champion

Workaround(s)

Whilst we wait for the ideal solution, here are some workarounds to consider:

  1. Traditional UF Monitor: Have the container write its logs to a volume on the host and use a universal forwarder to pick them up from the host and forward them to the indexer. Keep in mind that it's not a bad idea to have a forwarder on the host anyway so you can see things outside of the containers. Again, thank you to @Michael Wilde for this premise!
  2. Sourcetype Override: Use props and transforms to override the sourcetype, rip out the line's payload (similar to this), and rename to the desired sourcetype. With a sourcetype override, the field parsing associated with the new sourcetype name is what gets applied.
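As a sketch of workaround 2, assuming the events arrive under a sourcetype named httpevent (an assumption; check your HEC token's configuration) and the JSON envelope looks like {"line": "...", "source": "stdout", "tag": "..."}, an index-time props/transforms pair might look like this:

```
# props.conf -- applied to the sourcetype the HEC events arrive under
[httpevent]
TRANSFORMS-docker = docker_sourcetype_access, docker_extract_line

# transforms.conf
# Re-sourcetype events whose "line" payload looks like access_combined
[docker_sourcetype_access]
REGEX = "line":"\d{1,3}(?:\.\d{1,3}){3} -
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::access_combined

# Replace _raw with just the contents of the "line" field
[docker_extract_line]
REGEX = "line":"(.*?)","source"
DEST_KEY = _raw
FORMAT = $1
```

Both regexes here are assumptions about the envelope's field order and payload shape; adjust to your actual events. Once the sourcetype is rewritten at index time, the search-time extractions of the target sourcetype (access_combined here) apply as usual.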

Other ideas? Post 'em up!


New Member

Splunk team: do you have any working solution for supporting multi-line events, like Java stack traces?
It would be nice if Splunk HEC could support a multiline configuration and offer a more readable way to see error messages.

It is hard to read a proper error message with the current format, where each line is treated as a separate event in the Splunk UI.

0 Karma

Ultra Champion

Hi @hpant - Yes and no. From what I understand, the limitation is that Docker itself outputs each line of a multiline event as a single line. Theoretically, you could stitch such lines back together in Splunk into a multiline event, BUT that might not work when each line arrives with such atomicity (that is, enough delay between lines of the same stack trace that they appear to be distinct events).
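As a search-time sketch of that stitching idea, the transaction command can group lines from the same container back together. The sourcetype, grouping fields, timestamp regex, and maxpause value here are all assumptions about your data:

```
source="http:docker" sourcetype=httpevent
| transaction host source maxpause=5s
    startswith=eval(match(_raw, "^\d{4}-\d{2}-\d{2}"))
```

This treats any line beginning with a date as the start of a new logical event and folds subsequent lines (e.g. stack trace frames) into it, subject to the atomicity caveat above: if lines of one trace arrive too far apart, or interleaved with another container's output, the grouping can still break.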

Shout out to anyone who knows better than I to correct me on this.

0 Karma

New Member

Hi Splunk Team,
Is there any working solution for joining multi-line docker logs? If yes, please share the solution.

0 Karma

Ultra Champion

Nothing at this time but let me see if i can get the attention of people smarter than myself on the topic.

0 Karma

Splunk Employee

Hi Team,

I suggest looking at our https://hub.docker.com/r/splunk/fluentd-hec images, released as part of the Splunk Connect for Kubernetes project. (Hint: it's a gold mine, with lots of application outside docker/k8s too! It should simplify the major integrations into one connector.)

It allows dynamic sourcetyping through our jq_transformer plugin and has multi-line capabilities similar to Splunk "Line Breakers" via the concat plugin.

Here is a super simple config to PoC the idea. Basically, you would return Docker to the default JSON driver and review its log rotation settings, then use our image to collect logs and metadata from /var/lib/containers. I like this as a practitioner because there is a clear demarcation between us and Docker.
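For the multi-line piece specifically, a hedged sketch of what a concat-based fluentd filter might look like (parameter names per the fluent-plugin-concat plugin; the tag pattern, record key, and start regexp are assumptions about your pipeline):

```
# fluent.conf -- join continuation lines back into one record before
# they are sent to HEC. A new event starts when the line begins with a
# timestamp; anything else is appended to the previous record.
<filter docker.**>
  @type concat
  key log
  multiline_start_regexp /^\d{4}-\d{2}-\d{2}/
  flush_interval 5
</filter>
```

This plays the role of Splunk's line breaking, but upstream of indexing, which is what makes stack traces arrive as single events.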

See https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent

I have demo configs here:
https://github.com/matthewmodestino/container_workshop/tree/master/fluentd_docker

https://github.com/matthewmodestino/container_workshop/blob/master/docs/01-docker-lab.md#splunk-flue...

If that's not your style, check out collectord https://collectord.io/

0 Karma

New Member

How can the fields separated by colons, like "line", "tag", and "source", be extracted automatically on source=http:docker for Docker logs while using HTTP Event Collector? Also, if the Docker logs contain key-value pairs, how can those appear as fields in Splunk?

For example the log has the following :

{ [-] 
   line: 2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1] 
   source: stdout
   tag: abc02be1be4e 
}

I need to see line, source, and tag as fields, and along with that, the KV pairs should also show up as fields like LOG, NAME, MSG, and CLIENT.

Can this be done? If so, how? We would want a permanent solution so that it can be applied enterprise-wide.

0 Karma

Ultra Champion

(Sounds like this is in regards to using the log driver so I've moved this comment to that solution rather than the answer related to more traditional approaches)

Odd that those fields are not already parsed for you. Does the sourcetype have a props.conf entry for KV_MODE = json? What is the sourcetype being used and where was it defined (by you or by an app from splunkbase)?
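If the props entry is missing, a search-time sketch might look like this. The sourcetype name follows the json_no_timestamp mentioned below; the inner-KV regex is an assumption based on the sample event above ([LOG=debug, NAME=bhav, ...]):

```
# props.conf (search-time)
[json_no_timestamp]
# Parse the JSON envelope so line, source, tag become fields
KV_MODE = json
REPORT-inner_kv = docker_line_kv

# transforms.conf
# Pull KEY=value pairs such as LOG=debug, NAME=bhav out of the
# already-extracted "line" field; $1 becomes the field name.
[docker_line_kv]
SOURCE_KEY = line
REGEX = (\w+)=([^,\]]+)
FORMAT = $1::$2
```

Being search-time props/transforms, this can be deployed centrally (e.g. in an app pushed to search heads), which fits the "enterprise-wide" requirement.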

0 Karma

New Member

We have been using the default sourcetype - json_no_timestamp which shows up on the Data Inputs -> Http Event Collector.

0 Karma

Ultra Champion

I'm not sure if json_no_timestamp is an out-of-the-box sourcetype. What is the value of KV_MODE for that sourcetype (Settings -> Sourcetypes)? In fact, maybe provide a screenshot of that sourcetype's definition from the web UI (or btool output - but not the conf file).

0 Karma

New Member

Also, this brings us to a point where we will need to start monitoring GC (Garbage Collection) and heap exhaustion in containers using Splunk - how do we do that?

0 Karma

Ultra Champion

Splunk is a dummy here, simply accepting whatever data the container sends, so you'll need the container to send that data as well. In other words, you'll need to expose the GC and heap data from inside the container to Splunk over the same driver.
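For a JVM workload, one way to do that is simply to make the JVM print its GC activity to stdout, so the logging driver ships it like any other log line. A sketch of the container's start command (the app name is a placeholder; flag availability depends on your Java version):

```
# Java 8 and earlier: classic GC logging flags to stdout
java -verbose:gc -XX:+PrintGCDetails -jar app.jar

# Java 9+: unified logging, GC events to stdout
java -Xlog:gc* -jar app.jar
```

Heap exhaustion then shows up in the same stream (e.g. OutOfMemoryError stack traces), subject to the multi-line caveats discussed elsewhere in this thread.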

0 Karma


Path Finder

Keep in mind that raw events are only supported in Splunk 6.4 and onwards.

http://dev.splunk.com/view/event-collector/SP-CAAAE8Y
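For reference, sending a raw (non-JSON) event to the HEC raw endpoint looks something like the following sketch. The host, token, and log line are placeholders:

```
curl -k https://splunk.example.com:8088/services/collector/raw \
  -H "Authorization: Splunk 00000000-0000-0000-0000-000000000000" \
  -d '127.0.0.1 - - [14/Nov/2016:15:22:03 -0500] "GET / HTTP/1.1" 200 612'
```

Because the body is indexed as-is (no JSON envelope), the sourcetype's normal line breaking and field extractions apply, which is exactly what the raw-mode driver would give us.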

0 Karma

Ultra Champion

Yes! Thanks for adding that info!

0 Karma

Splunk Employee

Just to clarify: this solution solves challenge 1, not 2. Multi-line events like stack traces are still not handled properly as stderr/stdout streams from different containers are interleaved as they are aggregated by Docker logging driver.

Engager

When logs are being forwarded from the filesystem, the indexer is able to join lines, like stack traces, using the appropriate sourcetype. What is the indexer using to determine that the lines can be joined - is it the "source"? If so, is it possible to have the log driver stream the logs to the indexer with some unique identifier for the container source? Or am I misunderstanding the mechanics of the line joining?

0 Karma