(Trying to pull a few similar discussions together and record them for posterity)
The current Docker Logging Driver for Splunk sends HTTP events to Splunk as single-line JSON events because Docker treats each line flushed to stdout / stderr as a separate event (thanks @halr9000). Example: access_combined. This means that you're in a pickle trying to sourcetype the "line" payload. How can this be addressed so that I can enjoy the power of my existing sourcetypes with this HTTP Event Collector payload from Docker?
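To make the problem concrete, here is a minimal Python sketch of what the wrapping does to a log line. The envelope shape is an assumption modelled on the "line" / "source" / "tag" fields shown further down this thread, and the access log line itself is made up for illustration.

```python
import json

# Hypothetical event as the logging driver might send it: the original log
# line is nested under "line", next to Docker metadata.
wrapped = json.dumps({
    "line": '127.0.0.1 - - [14/Nov/2016:15:22:03 +0000] "GET / HTTP/1.1" 200 612',
    "source": "stdout",
    "tag": "abc02be1be4e",
})


def unwrap_line(event_json: str) -> str:
    """Return the original log line carried inside the JSON envelope."""
    return json.loads(event_json)["line"]


raw_line = unwrap_line(wrapped)
print(raw_line)
```

An existing sourcetype would expect to see only the unwrapped line, which is why the JSON envelope gets in the way of its field extractions.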
The strongest solution is in the works! That is for the Docker Logging Driver for Splunk to transmit HTTP Event Collector in raw mode (rather than json), so the events won’t get surrounded by JSON and our normal field extraction stuff will work.
Our yogi @Michael Wilde has been tracking a PR with Docker for specifically this. If and when that's implemented, I hope to update this accordingly.
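For anyone wondering what "raw mode (rather than json)" means at the HTTP level, here is a rough Python sketch that only builds the two kinds of request and sends nothing. The host, token, and helper name are hypothetical, the JSON envelope is an assumption based on the fields shown in this thread, and depending on the Splunk version the raw endpoint may also expect a request channel identifier, which is left out here.

```python
import json
from urllib.parse import urlencode


def build_hec_request(base_url, token, line, sourcetype, raw=True):
    """Return (url, headers, body) for one log line; nothing is sent."""
    headers = {"Authorization": f"Splunk {token}"}
    if raw:
        # Raw mode: the body is the log line itself, metadata goes in the URL.
        query = urlencode({"sourcetype": sourcetype})
        url = f"{base_url}/services/collector/raw?{query}"
        body = line.encode("utf-8")
    else:
        # JSON mode: the log line is nested inside an event envelope.
        url = f"{base_url}/services/collector/event"
        envelope = {"sourcetype": sourcetype, "event": {"line": line}}
        body = json.dumps(envelope).encode("utf-8")
    return url, headers, body


# Hypothetical host and token, for illustration only.
url, headers, body = build_hec_request(
    "https://splunk.example.com:8088", "TOKEN", "hello", "access_combined"
)
print(url)
print(body)
```

In raw mode the indexer sees the line exactly as the application wrote it, so the sourcetype's normal extractions have a chance to apply.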
I'm dealing with problem #1 above. I'm running the Docker logging driver and getting Apache logs from my container. They're being sent to Splunk, but they are not being parsed into their respective fields. I have set splunk-format to raw in my container, but that doesn't seem to have helped. After a week of searching and digging around on Slack, I finally found someone who has described my problem exactly, but I still don't know what to do to resolve it. If anyone is still watching this thread, is there something you could do to help me here?
Thanks!
Whilst we wait for the ideal solution, here are some workarounds to consider:
"line"'s payload (similar to this) and rename to the desired sourcetype. With a sourcetype override, the field parsing defined for the new sourcetype is what gets used.
Other ideas? Post 'em up!
Cross reference to this thread: https://answers.splunk.com/answers/390219/how-to-parse-docker-logs-with-multiple-events-from.html
Splunk team: do you have any working solution for supporting multi-line events, like Java stack traces?
It would be nice if the Splunk Event Collector could support a multi-line configuration and give a more readable way to see error messages.
It is hard to see the proper error message with the current format, where each line is treated as a separate event in the Splunk UI.
Hi @hpant - Yes and no. From what I understand, the limitation is that Docker itself outputs each line of a multiline event as a single line. Theoretically, you could stitch such single lines together in Splunk into a multiline event, BUT that might not work when each line comes into Splunk with such atomicity (as in, a delay between lines of the same stack trace such that they look like distinct events).
Shout out to anyone who knows better than I to correct me on this.
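As a rough illustration of what "stitching" means, here is a minimal Python sketch. It assumes every new event starts with a timestamp like the one in the sample log further down, and it ignores the timing problem described above (lines of one trace arriving with a delay), so treat it as a picture of the idea rather than a fix.

```python
import re

# Assumption: a new event starts with a timestamp like "2016-11-14 15:22:03".
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def stitch(lines):
    """Merge continuation lines into the event that precedes them."""
    events = []
    for line in lines:
        if EVENT_START.match(line) or not events:
            events.append(line)          # start of a new event
        else:
            events[-1] += "\n" + line    # continuation, e.g. a stack frame
    return events


sample = [
    "2016-11-14 15:22:03,779 ERROR Something failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Main.run(Main.java:42)",
    "2016-11-14 15:22:04,001 INFO Recovered",
]
for event in stitch(sample):
    print(event)
    print("---")
```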
Hi Splunk Team,
Is there any working solution for joining multi-line docker logs? If yes, please share the solution.
Nothing at this time, but let me see if I can get the attention of people smarter than myself on the topic.
Hi Team,
I suggest looking at our https://hub.docker.com/r/splunk/fluentd-hec images, which we released as part of the Splunk Connect for Kubernetes project. (Hint: it's a gold mine, with lots of applications outside docker/k8s too! It should simplify the major integrations into one connector.)
It allows dynamic sourcetyping through our jq_transformer plugin and has multi-line capabilities similar to Splunk "Line Breakers" via the concat plugin.
Here is a super simple config to PoC the idea. Basically you would return docker to the default JSON driver and review its log rotation settings, then use our image to collect logs and metadata from /var/lib/containers. I like this as a practitioner, because there is a clear demarc between us and docker.
See https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent
I have demo configs here:
https://github.com/matthewmodestino/container_workshop/tree/master/fluentd_docker
If that's not your style, check out collectord https://collectord.io/
How can the colon-separated fields like "line", "tag" and "source" be extracted automatically on source=http:docker for Docker logs when using the HTTP Event Collector? Also, if the Docker logs contain key-value pairs, how can those appear as fields in Splunk?
For example the log has the following :
{ [-]
line: 2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1]
source: stdout
tag: abc02be1be4e
}
I need to see line, source and tag as fields, and the KV pairs should also show up as fields, like LOG, NAME, MSG and CLIENT.
Can this be done, and if so, how? We would want a permanent solution so that it can be applied enterprise-wide.
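To pin down the desired result, here is a small Python sketch of the extraction being asked for, run against the sample event above. This is not Splunk configuration; the regex and the helper name are illustrative assumptions about how the key-value pairs are delimited.

```python
import re

# Assumption: pairs look like KEY=value and end at a comma or closing bracket.
KV_PAIR = re.compile(r"(\w+)=([^,\]]+)")


def extract_fields(event: dict) -> dict:
    """Return the top-level fields plus any KEY=value pairs found in 'line'."""
    fields = dict(event)
    for key, value in KV_PAIR.findall(event["line"]):
        fields[key] = value.strip()
    return fields


event = {
    "line": "2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1]",
    "source": "stdout",
    "tag": "abc02be1be4e",
}
fields = extract_fields(event)
for name in ("source", "tag", "LOG", "NAME", "TIME", "MSG", "CLIENT"):
    print(name, "=", fields[name])
```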
(Sounds like this is in regards to using the log driver so I've moved this comment to that solution rather than the answer related to more traditional approaches)
Odd that those fields are not already parsed for you. Does the sourcetype have a props.conf entry for KV_MODE = json? What is the sourcetype being used and where was it defined (by you or by an app from splunkbase)?
We have been using the default sourcetype - json_no_timestamp which shows up on the Data Inputs -> Http Event Collector.
I'm not sure if json_no_timestamp is an out-of-the-box sourcetype. What is the value of KV_MODE for that sourcetype (Settings -> Sourcetypes)? In fact, maybe provide a screenshot of that sourcetype's definition from the web UI (or btool - but not the conf file).
Also, this brings us to a point where we will need to start monitoring GC (garbage collection) and heap exhaustion in containers using Splunk - how do we do that?
Splunk is a dummy here and just simply accepting data the container sends so you'll need the container to send that data as well. So you'll need to expose data from the GC and heap of the container to Splunk over the same driver.
Keep in mind that raw events are only supported in Splunk 6.4 and onwards.
Yes! Thanks for adding that info!
Just to clarify: this solution solves challenge 1, not 2. Multi-line events like stack traces are still not handled properly, as stderr/stdout streams from different containers are interleaved when they are aggregated by the Docker logging driver.
When logs are forwarded from the filesystem, the indexer is able to join lines such as stack traces with the appropriate sourcetype. What is the indexer using to determine that the lines can be joined? Is it the "source"? If so, is it possible to have the log driver stream the logs to the indexer with some unique identifier for the container source? Or am I misunderstanding the mechanics of the line joining?
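To illustrate the idea of a per-container identifier, here is a minimal Python sketch that separates an interleaved stream by "tag" and "source" before any line joining would happen. It assumes each event carries the fields shown earlier in this thread; the second container ID and the log lines are made up, and this says nothing about how the indexer actually joins lines.

```python
from collections import defaultdict


def split_by_stream(events):
    """Group interleaved events by (tag, source), preserving arrival order."""
    streams = defaultdict(list)
    for event in events:
        streams[(event["tag"], event["source"])].append(event["line"])
    return dict(streams)


# Hypothetical interleaving of two containers' output.
interleaved = [
    {"tag": "abc02be1be4e", "source": "stderr", "line": "Exception in thread main"},
    {"tag": "ffe9d1c07a21", "source": "stdout", "line": "GET /health 200"},
    {"tag": "abc02be1be4e", "source": "stderr", "line": "    at com.example.Main.run(Main.java:42)"},
]
streams = split_by_stream(interleaved)
for key, lines in streams.items():
    print(key, lines)
```

Once the lines are separated per container and stream, the stack trace lines sit next to each other again and could be joined by whatever multi-line rule applies.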