<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: INDEXED_EXTRACTIONS=json with transform in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/INDEXED-EXTRACTIONS-json-with-transform/m-p/143500#M29302</link>
    <description>&lt;P&gt;Hi kamermans - Did you have any luck with this? I am having a similar issue.&lt;/P&gt;</description>
    <pubDate>Tue, 23 Sep 2014 08:53:47 GMT</pubDate>
    <dc:creator>rturk</dc:creator>
    <dc:date>2014-09-23T08:53:47Z</dc:date>
    <item>
      <title>INDEXED_EXTRACTIONS=json with transform</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/INDEXED-EXTRACTIONS-json-with-transform/m-p/143499#M29301</link>
      <description>&lt;P&gt;I have JSON data prefixed by syslog that I would like to index using &lt;CODE&gt;INDEXED_EXTRACTIONS=json&lt;/CODE&gt;.  Here's an example of the data:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;May 13 10:26:42 ip-10-11-12-13 myapp-17: {"headers":{"Accept":"*\/*","Accept-Language":"en-gb,en;q=0.5","User-Agent":"Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko\/20100101 Firefox\/29.0"},"date":1399976802,"node":"ip-10-11-12-13","source":"myapp-17","client_ip":"17.18.19.20"}&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;I need to strip off the leader that syslog adds at the beginning of each line (everything before the first "{" character) and then process the rest of the event as JSON:&lt;BR /&gt;
&lt;CODE&gt;&lt;BR /&gt;
{&lt;BR /&gt;
    "client_ip": "17.18.19.20",&lt;BR /&gt;
    "date": 1399976802,&lt;BR /&gt;
    "headers": {&lt;BR /&gt;
        "Accept": "*/*",&lt;BR /&gt;
        "Accept-Language": "en-gb,en;q=0.5",&lt;BR /&gt;
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0"&lt;BR /&gt;
    },&lt;BR /&gt;
    "node": "ip-10-11-12-13",&lt;BR /&gt;
    "source": "myapp-17"&lt;BR /&gt;
}&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;
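&lt;P&gt;To make the intended transformation concrete, here is a minimal Python sketch of it, run outside Splunk. The function names &lt;CODE&gt;strip_syslog_prefix&lt;/CODE&gt; and &lt;CODE&gt;parse_event&lt;/CODE&gt; are made up for illustration, the sample line is a shortened version of the event above, and the sketch assumes the syslog leader never contains a "{" character:&lt;/P&gt;

&lt;PRE&gt;
```python
import json


def strip_syslog_prefix(line):
    """Return the part of the line starting at the first '{', or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None
    return line[start:]


def parse_event(line):
    """Drop the syslog leader, then decode the remainder as JSON."""
    payload = strip_syslog_prefix(line)
    if payload is None:
        raise ValueError("no JSON object found in line")
    return json.loads(payload)


# Shortened sample event (assumption: same shape as the real data).
sample = (
    r'May 13 10:26:42 ip-10-11-12-13 myapp-17: '
    r'{"headers":{"Accept":"*\/*"},"date":1399976802,"client_ip":"17.18.19.20"}'
)
event = parse_event(sample)
```
&lt;/PRE&gt;

&lt;P&gt;This is only the behaviour I am after, not something Splunk runs; the question is how to get the same effect at index time.&lt;/P&gt;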

&lt;P&gt;I have tried the following methods:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Removing the leader by pretending it is a line breaker:&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;&lt;CODE&gt;LINE_BREAKER=((?:^|\n).+?){&lt;BR /&gt;
SHOULD_LINEMERGE=false&lt;/CODE&gt;&lt;/P&gt;

&lt;OL start="2"&gt;
&lt;LI&gt;Removing the leader with SEDCMD:&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;&lt;CODE&gt;SEDCMD-StripHeader=s/^[^{]+//&lt;/CODE&gt;&lt;/P&gt;

&lt;OL start="3"&gt;
&lt;LI&gt;Removing the leader via a transform on _raw:&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;;transforms.conf&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;[StripSyslog]&lt;BR /&gt;
REGEX = ^[^{]+(.*)$&lt;BR /&gt;
FORMAT = $1&lt;BR /&gt;
DEST_KEY = _raw&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;;props.conf&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;TRANSFORMS-StripSyslog = StripSyslog&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;All of these methods work with &lt;CODE&gt;KV_MODE=json&lt;/CODE&gt;, but none of them work with &lt;CODE&gt;INDEXED_EXTRACTIONS=json&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;What I don't like about &lt;CODE&gt;KV_MODE=json&lt;/CODE&gt; is that my events lose their hierarchical nature, so the keys in the headers.* collection are mixed in with the other keys.  For example, with &lt;CODE&gt;INDEXED_EXTRACTIONS=json&lt;/CODE&gt; I can do &lt;CODE&gt;"headers.User-Agent"="Mozilla/*"&lt;/CODE&gt;.  More importantly, I can group these headers.* keys to determine their relative frequency, which is not possible with &lt;CODE&gt;KV_MODE=json&lt;/CODE&gt; since the keys are flattened.&lt;/P&gt;
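&lt;P&gt;As a rough illustration of the grouping I mean, here is a small Python sketch, run outside Splunk, that turns nested keys into dotted paths and counts how often each &lt;CODE&gt;headers.*&lt;/CODE&gt; key appears. The helper name &lt;CODE&gt;dotted_keys&lt;/CODE&gt; and the two sample events are hypothetical, and this is not how Splunk implements it:&lt;/P&gt;

&lt;PRE&gt;
```python
from collections import Counter


def dotted_keys(obj, prefix=""):
    """Yield a dotted key path for every leaf value in a nested dict."""
    for key, value in obj.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            # Recurse so that nested keys keep their parent as a prefix.
            yield from dotted_keys(value, path)
        else:
            yield path


# Hypothetical events shaped like the data in this question.
events = [
    {"headers": {"Accept": "*/*", "User-Agent": "Mozilla/5.0"}, "node": "ip-10-11-12-13"},
    {"headers": {"Accept": "*/*"}, "node": "ip-10-11-12-13"},
]

# Count only the keys that live under the "headers" collection.
header_counts = Counter(
    path
    for event in events
    for path in dotted_keys(event)
    if path.startswith("headers.")
)
```
&lt;/PRE&gt;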

&lt;P&gt;In the splunkd.log file I see this error:&lt;BR /&gt;
&lt;CODE&gt;07-15-2014 12:33:16.384 -0400 ERROR JsonLineBreaker - JSON StreamID: 0 having confkey=source::/tmp/myfile.gz|host::17-18-19-20|JsonSyslog|3 had parsing error: Unexpected character while looking for value: 'M'&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This tells me that the &lt;CODE&gt;JsonLineBreaker&lt;/CODE&gt; is probably trying to parse the line before applying any of the aforementioned transformations (the "M" is from "May 13 10:26:42...").&lt;/P&gt;

&lt;P&gt;Is there any way to apply a transformation before the &lt;CODE&gt;JsonLineBreaker&lt;/CODE&gt; kicks in, or perhaps to extend that class in order to strip the leader out?&lt;/P&gt;

&lt;P&gt;I am looking for a definitive answer here as the obvious workarounds (scripted input, change my data format, "sed -i" the file before input) are not great long-term.&lt;/P&gt;

&lt;P&gt;This is probably relevant to these other questions as well:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;A href="http://answers.splunk.com/answers/135469/escape-json-data-at-index-time"&gt;http://answers.splunk.com/answers/135469/escape-json-data-at-index-time&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="http://answers.splunk.com/answers/107488/is-it-possible-to-parse-an-extracted-field-as-json-if-the-whole-log-line-isnt-json"&gt;http://answers.splunk.com/answers/107488/is-it-possible-to-parse-an-extracted-field-as-json-if-the-whole-log-line-isnt-json&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="http://answers.splunk.com/answers/61235/kv_modejson-with-combined-json-textual-loglines"&gt;http://answers.splunk.com/answers/61235/kv_modejson-with-combined-json-textual-loglines&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 15 Jul 2014 16:59:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/INDEXED-EXTRACTIONS-json-with-transform/m-p/143499#M29301</guid>
      <dc:creator>kamermans</dc:creator>
      <dc:date>2014-07-15T16:59:32Z</dc:date>
    </item>
    <item>
      <title>Re: INDEXED_EXTRACTIONS=json with transform</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/INDEXED-EXTRACTIONS-json-with-transform/m-p/143500#M29302</link>
      <description>&lt;P&gt;Hi kamermans - Did you have any luck with this? I am having a similar issue.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Sep 2014 08:53:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/INDEXED-EXTRACTIONS-json-with-transform/m-p/143500#M29302</guid>
      <dc:creator>rturk</dc:creator>
      <dc:date>2014-09-23T08:53:47Z</dc:date>
    </item>
    <item>
      <title>Re: INDEXED_EXTRACTIONS=json with transform</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/INDEXED-EXTRACTIONS-json-with-transform/m-p/143501#M29303</link>
      <description>&lt;P&gt;Unfortunately, Splunk has no solution for your case.&lt;/P&gt;

&lt;P&gt;INDEXED_EXTRACTIONS is applied when the file is read and the events are parsed, which happens before transforms.conf is applied.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Sep 2014 00:16:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/INDEXED-EXTRACTIONS-json-with-transform/m-p/143501#M29303</guid>
      <dc:creator>Masa</dc:creator>
      <dc:date>2014-09-26T00:16:49Z</dc:date>
    </item>
  </channel>
</rss>

