Getting Data In

Indexing a text string within a json field, submitted via HEC


Hi Folks,

I am trying to extract fields from a text string that is included in a JSON event submitted to Splunk via HEC (and, additionally, I am modifying its sourcetype and index based on a regex).

An example of the event:

{"log":"2019-01-31 06:09:07:382+0000 pid=10  method=GET path=/ugc_service/ugcs/search format=json controller=ugcs action=search status=200 duration=124.48 view=15.78 db=3.25 es=100.61 time=20190131060907 service=rosi_ugc_service request_id=d79a3fd0175237c5d7d1 event=process_action.action_controller platform=keep params={\"page\":\"1\",\"per_page\":\"4\",\"query\":\"1380\",\"search_field\":\"%22\",\"sort\":\"most_recent\",\"_\":\"1482163239481\"} locale=en-US store=us x_forwarded_for=,\n", "stream":"stdout", "time":"2019-01-31T06:09:07.382816774Z", "kubernetes":{"pod_name":"ugc-ugc-master-cc5b0195-unicorn-746bcc4765-sr697", "namespace_name":"rosi", "pod_id":"a134b3d9-1af0-11e9-bf52-005056b56c98", "labels":{"app":"ugc-unicorn", "pod-template-hash":"3026770321", "release":"ugc", "version":"ugc_master_cc5b0195"}, "annotations":{"":"", "":"{\\\"version\\\":\\\"89e068fb406338c02d8272a6459d5d4d7565ef2b228c8ad6778daffe24016ff0\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":null}"}, "host":"", "container_name":"k8s-ugc", "docker_id":"727ca8a5aefb82385222900ef822875f7983b23550f34816e916b9836648030f"}}

When initially indexed, the event creates all of the fields present in the JSON, both top-level and nested. However, I would like to "re-parse" the "log" field so that its key=value pairs are available as fields. I guess I could parse it all with a regex, but that seems a little fragile in the event that the developers add or reorder the fields in the log.

I have tried using REPORT-xxx in props.conf (with a matching stanza in transforms.conf) without any success.
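Roughly what I tried, for reference (the stanza and class names here are just placeholders):

        # props.conf
        [k8s_rosi_application]
        REPORT-log_kv = extract_log_kv

        # transforms.conf
        [extract_log_kv]
        # Apply the key=value regex to the extracted "log" field, not to _raw
        SOURCE_KEY = log
        REGEX = (\w+)=(\S+)
        FORMAT = $1::$2
        MV_ADD = true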

As mentioned, I have been updating the sourcetype based on a regex match against the log, in props.conf like this:

        TRANSFORMS-changesourcetype = kubernetes_change_sourcetype_access_combined, kubernetes_change_sourcetype_rosi_application
        TRANSFORMS-changeindex = kubernetes_change_index_access_combined, kubernetes_change_index_rosi_application

and in transforms.conf:

        [kubernetes_change_index_access_combined]
        DEST_KEY = _MetaData:Index
        REGEX = ^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
        FORMAT = apache

        [kubernetes_change_index_rosi_application]
        DEST_KEY = _MetaData:Index
        FORMAT = rosi

        [kubernetes_change_sourcetype_access_combined]
        DEST_KEY = MetaData:Sourcetype
        REGEX = ^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
        FORMAT = sourcetype::k8s_access_combined

        [kubernetes_change_sourcetype_rosi_application]
        DEST_KEY = MetaData:Sourcetype
        FORMAT = sourcetype::k8s_rosi_application

This is using Splunk release 6.6.8, with the log events being submitted via fluent-bit. I am open to trying splunk-connect-for-kubernetes, however it feels as though I will need to deal with the same data format either way.

If you got this far, thanks for reading 🙂




Hello @mgherman

The HTTP Event Collector expects all of the values to be prepared in the right format. fluent-bit does not do that; Splunk Connect for Kubernetes does solve this problem.
We can also offer our solution, with pre-built dashboards and alerts, which solves the problem of forwarding logs in the expected format. You can also define field extractions at the source level by specifying annotations.

We also have a performance comparison between Collectord (our solution), fluent-bit, and fluentd.



Have you tried the "in" keyword with an EXTRACT setting in props.conf?

For example:
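Something along these lines (the field names and patterns below are just illustrations, not from your data model):

        # props.conf
        [k8s_rosi_application]
        EXTRACT-log_status = status=(?<log_status>\d+) in log
        EXTRACT-log_duration = duration=(?<log_duration>[\d.]+) in log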


It should apply the regex to the single JSON field you name, instead of applying it to the entire _raw event.
