<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Docker logs produced in raw in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317079#M59258</link>
    <description>&lt;P&gt;Hi Mazzy, &lt;/P&gt;

&lt;P&gt;That is what raw Docker logs look like. Can you elaborate on why you think Splunk is not keeping up?&lt;/P&gt;

&lt;P&gt;Check out the props/transforms we published on GitHub:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://github.com/splunk/docker-itmonitoring/tree/7.0.0-k8s"&gt;https://github.com/splunk/docker-itmonitoring/tree/7.0.0-k8s&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Basically, the approach I took is to use a "base" sourcetype that strips the Docker JSON wrapper off the log and unescapes the quotes in the message:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[kubernetes]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
# remove docker json wrapper, then remove escapes from the quotes in the log message. 
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
# another experimental version of the sed.
#SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*)\\n","stream.*?([\n\r])/\1\2/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Then you can use source-based props that are placed AHEAD of this sourcetype to apply container/app-specific log parsing (see &lt;A href="http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Wheretofindtheconfigurationfiles"&gt;http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Wheretofindtheconfigurationfiles&lt;/A&gt;).&lt;/P&gt;

&lt;P&gt;For example, here I use source-based props for all my orders containers to implement a custom line breaker for multiline log support.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[source::/var/log/containers/orders-(?!db-)*.log]
#SHOULD_LINEMERGE = true
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
#BREAK_ONLY_BEFORE = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}
LINE_BREAKER = ([\n\r]+){"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s
CHARSET = UTF-8
disabled = false
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This way you can leverage the Splunk pipeline order of operations to hit the source-based props first, then pass the data through the kubernetes sourcetype (or whatever you'd like to call the sourcetype; I just happen to be working with k8s) to strip off the stuff you don't want, and then use your beloved TAs &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Here is some great in-depth reading on what happens, and when, in the Splunk indexing pipeline:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://wiki.splunk.com/Community:HowIndexingWorks"&gt;https://wiki.splunk.com/Community:HowIndexingWorks&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 16 Jan 2018 23:43:49 GMT</pubDate>
    <dc:creator>mattymo</dc:creator>
    <dc:date>2018-01-16T23:43:49Z</dc:date>
    <item>
      <title>Docker logs produced in raw</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317076#M59255</link>
      <description>&lt;P&gt;I have a Docker application which pushes Docker logs to Splunk. The Docker app uses the &lt;CODE&gt;json-file&lt;/CODE&gt; log driver. The logs are read by the Universal Forwarder and pushed to Splunk.&lt;/P&gt;

&lt;P&gt;The logs appear like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{
  "log": "json here",
  "stream": "stdout",
  "time": "time here"
}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The problem is that when Docker produces logs very quickly, Splunk is not able to parse them, and all the logs then show up raw (unparsed) in Splunk.&lt;/P&gt;

&lt;P&gt;Do you have any idea which parameter I might tune?&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jan 2018 18:54:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317076#M59255</guid>
      <dc:creator>mazzy89</dc:creator>
      <dc:date>2018-01-16T18:54:27Z</dc:date>
    </item>
    <item>
      <title>Re: Docker logs produced in raw</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317077#M59256</link>
      <description>&lt;P&gt;@mazzy89 could you share the code of the application you are referring to?&lt;/P&gt;

&lt;P&gt;On a side note, have you seen our solution &lt;A href="https://www.outcoldsolutions.com/"&gt;https://www.outcoldsolutions.com/&lt;/A&gt; for sending logs and metrics to Splunk?&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jan 2018 21:08:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317077#M59256</guid>
      <dc:creator>outcoldman</dc:creator>
      <dc:date>2018-01-16T21:08:24Z</dc:date>
    </item>
    <item>
      <title>Re: Docker logs produced in raw</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317078#M59257</link>
      <description>&lt;P&gt;You might create a new sourcetype definition for this dataset in the props.conf that lands on the indexers and set it up something like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[docker:json]
NO_BINARY_CHECK=1
TIME_PREFIX = \"time\"\:\s+\"
# 200 or larger
MAX_TIMESTAMP_LOOKAHEAD = 200
# for example; adjust to the actual timestamp format in your data
TIME_FORMAT = %a %b %d %H:%M:%S %Y
TRUNCATE = 999999
BREAK_ONLY_BEFORE = ^\{\s+\"log\"
MUST_BREAK_AFTER = &amp;lt;timestamp_format_regex&amp;gt;:\s+\}
&lt;/CODE&gt;&lt;/PRE&gt;
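
&lt;P&gt;If the files on disk hold one JSON object per line, which is how the &lt;CODE&gt;json-file&lt;/CODE&gt; driver normally writes them, a single-line variant might be enough. The sketch below is only a starting point: the stanza name is hypothetical, and the timestamp format assumes a value such as 2018-01-16T18:54:27.123456789Z in the "time" field, so check it against your own data.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# hypothetical stanza name; every value here is an assumption to validate
[docker:json:example]
# one event per physical line, no line merging
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# read the timestamp from the "time" field written by Docker
TIME_PREFIX = "time":"
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%9N
MAX_TIMESTAMP_LOOKAHEAD = 40
# the trailing Z in the timestamp means UTC
TZ = UTC
TRUNCATE = 999999
# search-time setting; it belongs in props.conf on the search head
KV_MODE = json
&lt;/CODE&gt;&lt;/PRE&gt;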

&lt;P&gt;You will certainly have to update the regexes and such but that should get you most of the way there.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jan 2018 23:09:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317078#M59257</guid>
      <dc:creator>ShaneNewman</dc:creator>
      <dc:date>2018-01-16T23:09:18Z</dc:date>
    </item>
    <item>
      <title>Re: Docker logs produced in raw</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317079#M59258</link>
      <description>&lt;P&gt;Hi Mazzy, &lt;/P&gt;

&lt;P&gt;That is what raw Docker logs look like. Can you elaborate on why you think Splunk is not keeping up?&lt;/P&gt;

&lt;P&gt;Check out the props/transforms we published on GitHub:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://github.com/splunk/docker-itmonitoring/tree/7.0.0-k8s"&gt;https://github.com/splunk/docker-itmonitoring/tree/7.0.0-k8s&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Basically, the approach I took is to use a "base" sourcetype that strips the Docker JSON wrapper off the log and unescapes the quotes in the message:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[kubernetes]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
# remove docker json wrapper, then remove escapes from the quotes in the log message. 
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
# another experimental version of the sed.
#SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*)\\n","stream.*?([\n\r])/\1\2/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC
&lt;/CODE&gt;&lt;/PRE&gt;
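
&lt;P&gt;To illustrate what the two SEDCMD rules do, here is a made-up event as the json-file driver would write it, followed by what should remain after each rule. The message text is an assumption for illustration only and is not taken from a real container.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# raw line on disk (hypothetical message)
{"log":"2018-01-16 23:43:49.123 INFO order \"42\" created\n","stream":"stdout","time":"2018-01-16T23:43:49.123456789Z"}
# after SEDCMD-1_unjsonify (wrapper and trailing \n removed)
2018-01-16 23:43:49.123 INFO order \"42\" created
# after SEDCMD-2_unescapequotes (\" becomes ")
2018-01-16 23:43:49.123 INFO order "42" created
&lt;/CODE&gt;&lt;/PRE&gt;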

&lt;P&gt;Then you can use source-based props that are placed AHEAD of this sourcetype to apply container/app-specific log parsing (see &lt;A href="http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Wheretofindtheconfigurationfiles"&gt;http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Wheretofindtheconfigurationfiles&lt;/A&gt;).&lt;/P&gt;

&lt;P&gt;For example, here I use source-based props for all my orders containers to implement a custom line breaker for multiline log support.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[source::/var/log/containers/orders-(?!db-)*.log]
#SHOULD_LINEMERGE = true
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
#BREAK_ONLY_BEFORE = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}
LINE_BREAKER = ([\n\r]+){"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s
CHARSET = UTF-8
disabled = false
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This way you can leverage the Splunk pipeline order of operations to hit the source-based props first, then pass the data through the kubernetes sourcetype (or whatever you'd like to call the sourcetype; I just happen to be working with k8s) to strip off the stuff you don't want, and then use your beloved TAs &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Here is some great in-depth reading on what happens, and when, in the Splunk indexing pipeline:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://wiki.splunk.com/Community:HowIndexingWorks"&gt;https://wiki.splunk.com/Community:HowIndexingWorks&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jan 2018 23:43:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Docker-logs-produced-in-raw/m-p/317079#M59258</guid>
      <dc:creator>mattymo</dc:creator>
      <dc:date>2018-01-16T23:43:49Z</dc:date>
    </item>
  </channel>
</rss>

