<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Indexing a text string within a json field, submitted via HEC in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-a-text-string-within-a-json-field-submitted-via-HEC/m-p/441181#M76907</link>
    <description>&lt;P&gt;Hi Folks,&lt;/P&gt;

&lt;P&gt;I am trying to extract fields from a text string that is included in a JSON event submitted to Splunk via HEC. (Additionally, I am modifying its sourcetype and index based on a regex.)&lt;/P&gt;

&lt;P&gt;An example of the event:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"log":"2019-01-31 06:09:07:382+0000 pid=10  method=GET path=/ugc_service/ugcs/search format=json controller=ugcs action=search status=200 duration=124.48 view=15.78 db=3.25 es=100.61 time=20190131060907 service=rosi_ugc_service request_id=d79a3fd0175237c5d7d1 event=process_action.action_controller platform=keep params={\"page\":\"1\",\"per_page\":\"4\",\"query\":\"1380\",\"search_field\":\"%22\",\"sort\":\"most_recent\",\"_\":\"1482163239481\"} locale=en-US store=us x_forwarded_for=1.1.1.1, 2.2.2.2\n", "stream":"stdout", "time":"2019-01-31T06:09:07.382816774Z", "kubernetes":{"pod_name":"ugc-ugc-master-cc5b0195-unicorn-746bcc4765-sr697", "namespace_name":"rosi", "pod_id":"a134b3d9-1af0-11e9-bf52-005056b56c98", "labels":{"app":"ugc-unicorn", "pod-template-hash":"3026770321", "release":"ugc", "version":"ugc_master_cc5b0195"}, "annotations":{"cni.projectcalico.org/podIP":"10.52.4.36/32", "sidecar.istio.io/status":"{\\\"version\\\":\\\"89e068fb406338c02d8272a6459d5d4d7565ef2b228c8ad6778daffe24016ff0\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":null}"}, "host":"us-perf-kubewrk-003.atl01.example.com", "container_name":"k8s-ugc", "docker_id":"727ca8a5aefb82385222900ef822875f7983b23550f34816e916b9836648030f"}}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;When initially indexed, it creates all of the fields present in the JSON, both top-level and nested. However, I would like to "re-parse" the "log" field so that its key=value pairs are available as fields. I guess I could parse it all with a single REGEX, but that seems fragile if the developers add or reorder fields in the log.&lt;/P&gt;

&lt;P&gt;I have tried using REPORT-xxx with a transforms.conf stanza, without any success.&lt;/P&gt;

&lt;P&gt;As mentioned, I have been updating the sourcetype and index based on a regex match against the log, like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;props.conf
-----------------
[source::http:perform-k8s]
        TRANSFORMS-changesourcetype = kubernetes_change_sourcetype_access_combined, kubernetes_change_sourcetype_rosi_application
        TRANSFORMS-changeindex = kubernetes_change_index_access_combined, kubernetes_change_index_rosi_application


transforms.conf
----------------------
[kubernetes_change_index_access_combined]
DEST_KEY = _MetaData:Index
REGEX=^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
FORMAT = apache

[kubernetes_change_index_rosi_application]
DEST_KEY = _MetaData:Index
REGEX=^{"log":"\S+\s+\S+\s+pid=\d+\s+method=\S+\s+path=\S+
FORMAT = rosi

[kubernetes_change_sourcetype_access_combined]
DEST_KEY = MetaData:Sourcetype
REGEX=^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
FORMAT = sourcetype::k8s_access_combined

[kubernetes_change_sourcetype_rosi_application]
DEST_KEY = MetaData:Sourcetype
REGEX=^{"log":"\S+\s+\S+\s+pid=\d+\s+method=\S+\s+path=\S+
FORMAT = sourcetype::k8s_rosi_application
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This is using Splunk release 6.6.8, with the log events being submitted via fluent-bit. I am open to trying splunk-connect-for-kubernetes (&lt;A href="https://github.com/splunk/splunk-connect-for-kubernetes"&gt;https://github.com/splunk/splunk-connect-for-kubernetes&lt;/A&gt;); however, it feels as though I will need to deal with the same data format either way.&lt;/P&gt;

&lt;P&gt;If you got this far, thanks for reading &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;mgh&lt;/P&gt;</description>
    <pubDate>Fri, 01 Feb 2019 01:32:16 GMT</pubDate>
    <dc:creator>mgherman</dc:creator>
    <dc:date>2019-02-01T01:32:16Z</dc:date>
    <item>
      <title>Indexing a text string within a json field, submitted via HEC</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-a-text-string-within-a-json-field-submitted-via-HEC/m-p/441181#M76907</link>
      <description>&lt;P&gt;Hi Folks,&lt;/P&gt;

&lt;P&gt;I am trying to extract fields from a text string that is included in a JSON event submitted to Splunk via HEC. (Additionally, I am modifying its sourcetype and index based on a regex.)&lt;/P&gt;

&lt;P&gt;An example of the event:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"log":"2019-01-31 06:09:07:382+0000 pid=10  method=GET path=/ugc_service/ugcs/search format=json controller=ugcs action=search status=200 duration=124.48 view=15.78 db=3.25 es=100.61 time=20190131060907 service=rosi_ugc_service request_id=d79a3fd0175237c5d7d1 event=process_action.action_controller platform=keep params={\"page\":\"1\",\"per_page\":\"4\",\"query\":\"1380\",\"search_field\":\"%22\",\"sort\":\"most_recent\",\"_\":\"1482163239481\"} locale=en-US store=us x_forwarded_for=1.1.1.1, 2.2.2.2\n", "stream":"stdout", "time":"2019-01-31T06:09:07.382816774Z", "kubernetes":{"pod_name":"ugc-ugc-master-cc5b0195-unicorn-746bcc4765-sr697", "namespace_name":"rosi", "pod_id":"a134b3d9-1af0-11e9-bf52-005056b56c98", "labels":{"app":"ugc-unicorn", "pod-template-hash":"3026770321", "release":"ugc", "version":"ugc_master_cc5b0195"}, "annotations":{"cni.projectcalico.org/podIP":"10.52.4.36/32", "sidecar.istio.io/status":"{\\\"version\\\":\\\"89e068fb406338c02d8272a6459d5d4d7565ef2b228c8ad6778daffe24016ff0\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":null}"}, "host":"us-perf-kubewrk-003.atl01.example.com", "container_name":"k8s-ugc", "docker_id":"727ca8a5aefb82385222900ef822875f7983b23550f34816e916b9836648030f"}}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;When initially indexed, it creates all of the fields present in the JSON, both top-level and nested. However, I would like to "re-parse" the "log" field so that its key=value pairs are available as fields. I guess I could parse it all with a single REGEX, but that seems fragile if the developers add or reorder fields in the log.&lt;/P&gt;

&lt;P&gt;I have tried using REPORT-xxx with a transforms.conf stanza, without any success.&lt;/P&gt;
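
&lt;P&gt;For reference, a search-time REPORT extraction for key=value text like this is usually wired up roughly as in the sketch below. It is not the configuration that was actually tried: the stanza name rosi_log_kv and the class name rosi_kv are hypothetical, and the props.conf stanza assumes the rewritten sourcetype k8s_rosi_application from the transforms further down. REPORT settings are applied at search time, so they belong on the search head. SOURCE_KEY = log only works if the log field already exists when REPORT runs (for example as an indexed field); if it is produced by automatic search-time JSON extraction, which runs after REPORT, drop SOURCE_KEY so the regex runs against _raw.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;props.conf
-----------------
# Hypothetical search-time stanza, keyed on the rewritten sourcetype
[k8s_rosi_application]
REPORT-rosi_kv = rosi_log_kv

transforms.conf
----------------------
# Hypothetical stanza name; turns every key=value pair into a field
[rosi_log_kv]
# Assumption: "log" is available before REPORT runs; otherwise remove this line
SOURCE_KEY = log
REGEX = (\w+)=(\S+)
FORMAT = $1::$2
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Because the regex matches any key=value pair, it keeps working if fields are added or reordered. Values that contain spaces, such as x_forwarded_for=1.1.1.1, 2.2.2.2, are cut at the first space and would need a dedicated extraction.&lt;/P&gt;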

&lt;P&gt;As mentioned, I have been updating the sourcetype and index based on a regex match against the log, like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;props.conf
-----------------
[source::http:perform-k8s]
        TRANSFORMS-changesourcetype = kubernetes_change_sourcetype_access_combined, kubernetes_change_sourcetype_rosi_application
        TRANSFORMS-changeindex = kubernetes_change_index_access_combined, kubernetes_change_index_rosi_application


transforms.conf
----------------------
[kubernetes_change_index_access_combined]
DEST_KEY = _MetaData:Index
REGEX=^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
FORMAT = apache

[kubernetes_change_index_rosi_application]
DEST_KEY = _MetaData:Index
REGEX=^{"log":"\S+\s+\S+\s+pid=\d+\s+method=\S+\s+path=\S+
FORMAT = rosi

[kubernetes_change_sourcetype_access_combined]
DEST_KEY = MetaData:Sourcetype
REGEX=^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
FORMAT = sourcetype::k8s_access_combined

[kubernetes_change_sourcetype_rosi_application]
DEST_KEY = MetaData:Sourcetype
REGEX=^{"log":"\S+\s+\S+\s+pid=\d+\s+method=\S+\s+path=\S+
FORMAT = sourcetype::k8s_rosi_application
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This is using Splunk release 6.6.8, with the log events being submitted via fluent-bit. I am open to trying splunk-connect-for-kubernetes (&lt;A href="https://github.com/splunk/splunk-connect-for-kubernetes"&gt;https://github.com/splunk/splunk-connect-for-kubernetes&lt;/A&gt;); however, it feels as though I will need to deal with the same data format either way.&lt;/P&gt;

&lt;P&gt;If you got this far, thanks for reading &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;mgh&lt;/P&gt;</description>
      <pubDate>Fri, 01 Feb 2019 01:32:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-a-text-string-within-a-json-field-submitted-via-HEC/m-p/441181#M76907</guid>
      <dc:creator>mgherman</dc:creator>
      <dc:date>2019-02-01T01:32:16Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing a text string within a json field, submitted via HEC</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-a-text-string-within-a-json-field-submitted-via-HEC/m-p/441182#M76908</link>
      <description>&lt;P&gt;Have you tried “in”  with props.conf settings?&lt;/P&gt;

&lt;P&gt;For example:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;EXTRACT-anyname= REGEX in JSONFIELDNAME
&lt;/CODE&gt;&lt;/PRE&gt;
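
&lt;P&gt;As a more concrete sketch against the sample event: the stanza below assumes the rewritten sourcetype k8s_rosi_application from the question, the class name rosi_request is hypothetical, and the regex simply reuses the pid/method/path prefix of the log line. One caveat: the source field has to exist before EXTRACT is evaluated, so “in log” only helps if log is an indexed field or comes from an earlier extraction; fields produced by KV_MODE = json are extracted later in the search-time sequence.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;props.conf
-----------------
# Hypothetical search-time stanza; EXTRACT needs named capture groups
[k8s_rosi_application]
EXTRACT-rosi_request = pid=(?&amp;lt;pid&amp;gt;\d+)\s+method=(?&amp;lt;method&amp;gt;\S+)\s+path=(?&amp;lt;path&amp;gt;\S+) in log
&lt;/CODE&gt;&lt;/PRE&gt;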

&lt;P&gt;The “in” clause should apply the regex to the single JSON field you name, instead of to the entire _raw field.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Feb 2019 02:23:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-a-text-string-within-a-json-field-submitted-via-HEC/m-p/441182#M76908</guid>
      <dc:creator>jkat54</dc:creator>
      <dc:date>2019-02-01T02:23:20Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing a text string within a json field, submitted via HEC</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-a-text-string-within-a-json-field-submitted-via-HEC/m-p/441183#M76909</link>
      <description>&lt;P&gt;Hello @mgherman &lt;/P&gt;

&lt;P&gt;HTTP Event Collector expects all the values to be prepared in the right format. Fluent-bit does not do that; Splunk Connect for Kubernetes does solve this problem.&lt;BR /&gt;
We can also offer our solution, &lt;A href="https://www.outcoldsolutions.com/"&gt;https://www.outcoldsolutions.com/&lt;/A&gt;, with pre-built dashboards and alerts (&lt;A href="https://splunkbase.splunk.com/app/3743/"&gt;https://splunkbase.splunk.com/app/3743/&lt;/A&gt;); it solves the problem of forwarding logs in the expected format. You can also define field extractions at the source level by specifying annotations; see &lt;A href="https://www.outcoldsolutions.com/docs/monitoring-kubernetes/v5/annotations/#extracting-fields-from-the-container-logs"&gt;https://www.outcoldsolutions.com/docs/monitoring-kubernetes/v5/annotations/#extracting-fields-from-the-container-logs&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;We also have a performance comparison between collectord (our solution), fluent-bit, and fluentd: &lt;A href="https://www.outcoldsolutions.com/blog/2018-11-19-performance-collectord-fluentd-fluentbit/"&gt;https://www.outcoldsolutions.com/blog/2018-11-19-performance-collectord-fluentd-fluentbit/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 17 Feb 2019 16:46:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-a-text-string-within-a-json-field-submitted-via-HEC/m-p/441183#M76909</guid>
      <dc:creator>outcoldman</dc:creator>
      <dc:date>2019-02-17T16:46:50Z</dc:date>
    </item>
  </channel>
</rss>

