Sourcetypes with Docker and HTTP Event Collector

sloshburch
Splunk Employee

(Trying to pull a few similar discussions together and recorded for posterity)

Challenge

The current Docker Logging Driver for Splunk sends HTTP events to Splunk as single-line JSON events because Docker treats each line flushed to stdout / stderr as a separate event (thanks @halr9000). Example:
[Image: json_event]

  1. Ideally those events would be access_combined data, but because the payload is JSON we don't get the field parsing that comes with the out-of-the-box access_combined sourcetype. This leaves you in a pickle trying to sourcetype the line payload.
  2. Multi-line events, like Java stack traces, arrive line by line with this implementation because the connection is not held open until the event finishes (thanks @Michael Wilde).

How can this be addressed to enjoy the power of my existing sourcetypes with this HTTP Event Collector payload from Docker?

sloshburch
Splunk Employee

Solution

The strongest solution is in the works! The plan is for the Docker Logging Driver for Splunk to send data to HTTP Event Collector in raw mode (rather than JSON), so the events won't get wrapped in JSON and our normal field extractions will work.
Our yogi @Michael Wilde has been tracking a PR with Docker for specifically this. If and when that's implemented, I hope to update this accordingly.
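
To make the JSON-versus-raw distinction concrete, here is a minimal Python sketch that builds (but does not send) one request for each HTTP Event Collector mode. The host, token, channel value, and helper names are placeholders invented for illustration, and the JSON envelope fields are only modeled on the `line` / `source` shape seen in this thread, not taken from the driver's source.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholders for illustration only; substitute your own HEC host and token.
HEC_BASE = "https://splunk.example.com:8088"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_json_request(line, sourcetype):
    """JSON mode: the log line is wrapped in an envelope, so Splunk sees JSON."""
    body = json.dumps({
        "event": {"line": line, "source": "stdout"},
        "sourcetype": sourcetype,
    }).encode("utf-8")
    return Request(
        HEC_BASE + "/services/collector/event",
        data=body,
        headers={"Authorization": "Splunk " + HEC_TOKEN},
        method="POST",
    )


def build_raw_request(line, sourcetype, channel):
    """Raw mode: the log line is the whole body, so the sourcetype's
    normal line-based parsing can apply to it."""
    query = urlencode({"sourcetype": sourcetype, "channel": channel})
    return Request(
        HEC_BASE + "/services/collector/raw?" + query,
        data=line.encode("utf-8"),
        headers={"Authorization": "Splunk " + HEC_TOKEN},
        method="POST",
    )


# Hypothetical access log line used only to compare the two bodies.
sample = '127.0.0.1 - - [14/Nov/2016:15:22:03 +0000] "GET / HTTP/1.1" 200 612'
json_req = build_json_request(sample, "access_combined")
raw_req = build_raw_request(sample, "access_combined", "11111111-2222-3333-4444-555555555555")
```

In the JSON request the access log line is buried inside `event.line`, while in the raw request the body is exactly the original line, which is why the existing sourcetype can work on it.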

bhavesh91
New Member

How can the colon-separated fields such as "line", "tag", and "source" be extracted automatically on source=http:docker for Docker logs when using HTTP Event Collector? Also, if the Docker logs contain key-value pairs, how can those appear as fields in Splunk?

For example the log has the following :

{ [-]
line: 2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1]
source: stdout
tag: abc02be1be4e
}

I need to see line, source, and tag as fields, and the KV pairs should also show up as fields, like LOG, NAME, MSG, and CLIENT.

Can this be done, and if so, how? We want a permanent solution so that it can be applied enterprise-wide.
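
For readers who want to see the extraction logic spelled out, here is a small Python sketch of what is being asked for: keep the top-level `line`, `source`, and `tag` fields, then pull the `KEY=value` pairs out of `line`. This is not Splunk configuration, only an illustration of the parsing; the regular expression and the `extract_fields` helper are assumptions based on the sample event above.

```python
import re

# Assumption: pairs look like KEY=value and are separated by commas,
# with the whole list closed by "]" as in the sample event.
KV_PATTERN = re.compile(r"(\w+)=([^,\]]+)")


def extract_fields(event):
    """Return the envelope fields plus any KEY=value pairs found in `line`."""
    fields = {k: event[k] for k in ("line", "source", "tag") if k in event}
    for key, value in KV_PATTERN.findall(event.get("line", "")):
        fields[key] = value.strip()
    return fields


event = {
    "line": "2016-11-14 15:22:03,779; [LOG=debug, NAME=bhav, TIME=1,MSG=Goodday, CLIENT=127.0.0.1]",
    "source": "stdout",
    "tag": "abc02be1be4e",
}
fields = extract_fields(event)
```

With the sample event, `fields` contains `line`, `source`, and `tag` alongside `LOG`, `NAME`, `TIME`, `MSG`, and `CLIENT`.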

dsmc_adv
Path Finder

Keep in mind that raw events are only supported in Splunk 6.4 and onwards.

http://dev.splunk.com/view/event-collector/SP-CAAAE8Y

sloshburch
Splunk Employee

Yes! Thanks for adding that info!

rarsan_splunk
Splunk Employee

Just to clarify: this solution solves challenge 1, not 2. Multi-line events like stack traces are still not handled properly, because stderr/stdout streams from different containers are interleaved as they are aggregated by the Docker logging driver.

bradmongeon
Engager

When logs are forwarded from the filesystem, the indexer is able to join lines like stack traces with the appropriate sourcetype. What is the indexer using to determine that the lines can be joined? Is it the "source"? If so, is it possible to have the log driver stream the logs to the indexer with some unique identifier for the container source? Or am I misunderstanding the mechanics of the line joining?

sloshburch
Splunk Employee

You can certainly try, but I believe this comes down to the way the payload is "cooked" or "parsed" by Splunk.

Ultimately, I believe you can do as you describe, but if there is too much delay or if you define the sourcetype wrong then it will not work. Today, Docker sends each line of the stack trace as an individual event.

Rumor is that Docker is exploring switching to a plugin model rather than a driver model for this, so maybe this will all change anyway.

If you try this out, let us know how it works.

sloshburch
Splunk Employee

@rarsan - Are you sure? I thought the logging driver sends data from the container itself and so different containers send different streams.

ErikAulin
Engager

Are there any updates on multi-line events? I searched around, but this is the closest post that discusses this.

sloshburch
Splunk Employee

@Michael Wilde - Is this because Docker still spits out each line individually or has this been adjusted on the docker side so as to send a multiline output as one event?

Michael_Wilde
Splunk Employee

Correct, @SloshBurch: log messages are lines. Docker won't solve this by nature. Multiline event aggregation isn't something many log tools other than Splunk do well. Our log driver would almost require a "little Splunk" inside it to properly aggregate events. There isn't a reason why a customer can't implement the HEC within their own app (running inside the container).

abedwards
New Member

@Michael Wilde can you clarify whether Splunk is working on a fix for issue #2 (i.e. multi-line stack traces)? The big problem with your statement "There isn't a reason why a customer can't implement the HEC within their own app (running inside the container)" is that nowadays, with so many Docker containers being published directly on Docker Hub etc., any of these applications that produce multi-line output don't work with your docker-splunk logging driver. It's not practical to get all these pre-built Docker images changed to add support for the Splunk HEC appenders. I think the only place capable of fixing this issue is the Splunk Docker logging driver itself. It would somehow need to aggregate the events there first before sending them to Splunk, or perhaps have some additional capabilities on the server side to merge them together, using the container ID to ensure logs from different containers aren't merged together.

In my case I have support for Docker under a Red Hat agreement as well as support for Splunk Enterprise. Where is the underlying issue being tracked? Is there a bug opened already for the Splunk Docker driver?

Do we need another new topic started for this second issue to track it, as it clearly isn't solved? 😞

Note, I don't see Docker itself ever being able to fix this issue, since stack traces will always span multiple lines. The only other thing I could think of would be for the logging drivers to emit a special character in place of the newline, but even then you would run into the issue that Docker cannot send long lines (I think it's a 16k limitation right now). We need a workable solution for this issue. Can Splunk help?
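
As a rough illustration of the "merge using the container ID" idea above, here is a minimal Python sketch. It is not how the Splunk logging driver or the indexer works. It assumes that a new event always starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS` and that every line arrives tagged with its container ID. The `merge_multiline` helper and the sample records are hypothetical.

```python
import re

# Assumption: a line that starts with this timestamp shape begins a new event;
# anything else is a continuation (e.g. a stack trace frame).
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def merge_multiline(records):
    """Merge continuation lines into the preceding event, per container.

    `records` is an iterable of (container_id, line) tuples in arrival order.
    Returns a list of (container_id, merged_event) tuples.
    """
    open_events = {}  # container_id -> lines of the event still being built
    merged = []
    for container_id, line in records:
        if EVENT_START.match(line) or container_id not in open_events:
            # A new event begins: flush whatever this container had open.
            if container_id in open_events:
                merged.append((container_id, "\n".join(open_events[container_id])))
            open_events[container_id] = [line]
        else:
            open_events[container_id].append(line)
    # Flush the events that were still open when the input ended.
    for container_id, lines in open_events.items():
        merged.append((container_id, "\n".join(lines)))
    return merged


# Interleaved lines from two hypothetical containers.
records = [
    ("aaa", "2016-11-14 15:22:03 ERROR boom"),
    ("bbb", "2016-11-14 15:22:03 INFO ok"),
    ("aaa", "    at com.example.Foo.bar(Foo.java:1)"),
    ("aaa", "2016-11-14 15:22:04 INFO recovered"),
]
events = merge_multiline(records)
```

Keying the buffer on the container ID keeps the stack trace frame attached to container `aaa` even though a line from `bbb` arrived in between. A real implementation would also need a flush timeout so that the last event of a quiet container is not held back indefinitely.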

sloshburch
Splunk Employee

Let me see if I can get the folks who are now working on Docker to elaborate on this.

sloshburch
Splunk Employee

Hmm @Michael Wilde - how would that be different from the HEC collecting the data outside the container (today's common method)?

It sounds like you are saying that Docker doesn't even provide a mechanism to change how it chunks what it writes to standard out.

sloshburch
Splunk Employee

From @dgladkikh: Raw format has been merged to master https://github.com/docker/docker/pull/25786

So it should be available in 1.13, and it is possible to try it now with experimental Docker or a custom build from master.
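
For anyone trying the raw format, here is a hedged Python sketch that assembles a `docker run` command line with the Splunk logging driver options. The `docker_run_args` helper, URL, token, and image name are placeholders made up for this example. The `splunk-url`, `splunk-token`, `splunk-format`, and `splunk-sourcetype` log options are the ones documented for the driver, but check them against the Docker version you are running.

```python
def docker_run_args(image, splunk_url, splunk_token, sourcetype=None):
    """Build the argument list for `docker run` using the raw Splunk format."""
    args = [
        "docker", "run",
        "--log-driver=splunk",
        "--log-opt", "splunk-url=" + splunk_url,
        "--log-opt", "splunk-token=" + splunk_token,
        "--log-opt", "splunk-format=raw",
    ]
    if sourcetype:
        args += ["--log-opt", "splunk-sourcetype=" + sourcetype]
    args.append(image)
    return args


# Placeholder values for illustration only.
cmd = docker_run_args(
    "nginx",
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    sourcetype="access_combined",
)
# The list can be handed to subprocess.run(cmd) on a host with Docker installed.
```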

sloshburch
Splunk Employee

Slight update that it sounds like 1.13 is more easily accessible these days and you can start using the new driver: http://blogs.splunk.com/2016/12/01/docker-1-13-with-improved-splunk-logging-driver

sloshburch
Splunk Employee

Looks like 1.13 just recently came out of beta!
