Indexing a text string within a json field, submitted via HEC

mgherman — Fri, 01 Feb 2019 01:32:16 GMT

Hi Folks,

I am trying to extract fields from a text string that is included in a JSON event submitted to Splunk via HEC. (Additionally, I am modifying its sourcetype and index based on a regex.)

An example of the event:

{"log":"2019-01-31 06:09:07:382+0000 pid=10  method=GET path=/ugc_service/ugcs/search format=json controller=ugcs action=search status=200 duration=124.48 view=15.78 db=3.25 es=100.61 time=20190131060907 service=rosi_ugc_service request_id=d79a3fd0175237c5d7d1 event=process_action.action_controller platform=keep params={\"page\":\"1\",\"per_page\":\"4\",\"query\":\"1380\",\"search_field\":\"%22\",\"sort\":\"most_recent\",\"_\":\"1482163239481\"} locale=en-US store=us x_forwarded_for=1.1.1.1, 2.2.2.2\n", "stream":"stdout", "time":"2019-01-31T06:09:07.382816774Z", "kubernetes":{"pod_name":"ugc-ugc-master-cc5b0195-unicorn-746bcc4765-sr697", "namespace_name":"rosi", "pod_id":"a134b3d9-1af0-11e9-bf52-005056b56c98", "labels":{"app":"ugc-unicorn", "pod-template-hash":"3026770321", "release":"ugc", "version":"ugc_master_cc5b0195"}, "annotations":{"cni.projectcalico.org/podIP":"10.52.4.36/32", "sidecar.istio.io/status":"{\\\"version\\\":\\\"89e068fb406338c02d8272a6459d5d4d7565ef2b228c8ad6778daffe24016ff0\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":null}"}, "host":"us-perf-kubewrk-003.atl01.example.com", "container_name":"k8s-ugc", "docker_id":"727ca8a5aefb82385222900ef822875f7983b23550f34816e916b9836648030f"}}

When initially indexed, it creates all of the fields present in the JSON, both top level and nested. However, I would like to "re-parse" the "log" field so that the key=value pairs are available as fields. I guess I could parse it all with a regex, but that seems a little fragile in the event that the developers add or reorder the fields in the log.

I have tried using REPORT-xxx in transforms.conf without any success.

As mentioned, I have been updating the sourcetype and index based on a regex match against the log, like this:

props.conf
-----------------
[source::http:perform-k8s]
TRANSFORMS-changesourcetype = kubernetes_change_sourcetype_access_combined, kubernetes_change_sourcetype_rosi_application
TRANSFORMS-changeindex = kubernetes_change_index_access_combined, kubernetes_change_index_rosi_application


transforms.conf
----------------------
[kubernetes_change_index_access_combined]
DEST_KEY = _MetaData:Index
REGEX=^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
FORMAT = apache

[kubernetes_change_index_rosi_application]
DEST_KEY = _MetaData:Index
REGEX=^{"log":"\S+\s+\S+\s+pid=\d+\s+method=\S+\s+path=\S+
FORMAT = rosi

[kubernetes_change_sourcetype_access_combined]
DEST_KEY = MetaData:Sourcetype
REGEX=^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
FORMAT = sourcetype::k8s_access_combined

[kubernetes_change_sourcetype_rosi_application]
DEST_KEY = MetaData:Sourcetype
REGEX=^{"log":"\S+\s+\S+\s+pid=\d+\s+method=\S+\s+path=\S+
FORMAT = sourcetype::k8s_rosi_application

This is using Splunk release 6.6.8, with the log events being submitted via fluent-bit. I am open to trying splunk-connect-for-kubernetes (https://github.com/splunk/splunk-connect-for-kubernetes); however, it feels as though I will need to deal with the same data format either way.

If you got this far, thanks for reading 🙂

mgh

Re: Indexing a text string within a json field, submitted via HEC

jkat54 — Fri, 01 Feb 2019 02:23:20 GMT

Have you tried “in” with props.conf settings?

For example:

EXTRACT-anyname = REGEX in JSONFIELDNAME

It should apply the regex to the single json field you provide it instead of applying to the entire _raw field.
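
A rough sketch of what that could look like for the rosi events. It assumes the sourcetype has already been rewritten to k8s_rosi_application as in the question, and that the JSON field is named log. The class name rosi_request and the captured field names (method, path, status) are placeholders, not anything Splunk requires. One caveat: at search time, inline EXTRACT settings are applied before automatic KV_MODE=json extraction, so this may only work if log already exists as a field at that point (for example, as an indexed field). I have not tested it against this data.

props.conf
-----------------
[k8s_rosi_application]
# Hypothetical example: apply the regex to the "log" field only, not to _raw
EXTRACT-rosi_request = \bmethod=(?<method>\S+)\s+path=(?<path>\S+).*?\bstatus=(?<status>\d+) in log

If the developers reorder method, path, and status, this particular regex would stop matching, so separate EXTRACT lines per key are likely more robust than one long pattern.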

Re: Indexing a text string within a json field, submitted via HEC

outcoldman — Sun, 17 Feb 2019 16:46:50 GMT

Hello @mgherman

HTTP Event Collector expects all the values to be prepared in the right format. Fluent-bit does not do that; splunk-connect-for-kubernetes does solve this problem.
We can also offer our solution https://www.outcoldsolutions.com/ with pre-built dashboards and alerts https://splunkbase.splunk.com/app/3743/; it solves the problem of forwarding logs in the expected format. You can also define field extractions at the source level by specifying annotations, see https://www.outcoldsolutions.com/docs/monitoring-kubernetes/v5/annotations/#extracting-fields-from-the-container-logs

We also have a performance comparison between collectord (our solution), fluent-bit, and fluentd: https://www.outcoldsolutions.com/blog/2018-11-19-performance-collectord-fluentd-fluentbit/
