Getting Data In

Indexing a text string within a json field, submitted via HEC

mgherman
Explorer

Hi Folks,

I am trying to extract fields from a text string that is included in a JSON event, submitted to Splunk via an HEC. (Additionally, modifying its sourcetype and index based on a regex.

An example of the event:

{"log":"2019-01-31 06:09:07:382+0000 pid=10  method=GET path=/ugc_service/ugcs/search format=json controller=ugcs action=search status=200 duration=124.48 view=15.78 db=3.25 es=100.61 time=20190131060907 service=rosi_ugc_service request_id=d79a3fd0175237c5d7d1 event=process_action.action_controller platform=keep params={\"page\":\"1\",\"per_page\":\"4\",\"query\":\"1380\",\"search_field\":\"%22\",\"sort\":\"most_recent\",\"_\":\"1482163239481\"} locale=en-US store=us x_forwarded_for=1.1.1.1, 2.2.2.2\n", "stream":"stdout", "time":"2019-01-31T06:09:07.382816774Z", "kubernetes":{"pod_name":"ugc-ugc-master-cc5b0195-unicorn-746bcc4765-sr697", "namespace_name":"rosi", "pod_id":"a134b3d9-1af0-11e9-bf52-005056b56c98", "labels":{"app":"ugc-unicorn", "pod-template-hash":"3026770321", "release":"ugc", "version":"ugc_master_cc5b0195"}, "annotations":{"cni.projectcalico.org/podIP":"10.52.4.36/32", "sidecar.istio.io/status":"{\\\"version\\\":\\\"89e068fb406338c02d8272a6459d5d4d7565ef2b228c8ad6778daffe24016ff0\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":null}"}, "host":"us-perf-kubewrk-003.atl01.example.com", "container_name":"k8s-ugc", "docker_id":"727ca8a5aefb82385222900ef822875f7983b23550f34816e916b9836648030f"}}

When initially indexed it creates all of the fields present in the JSON, both top level and nested, however I would like to "re-parse" the "log" field so that the key=value pairs are available. I guess I could parse it all with a REGEX , however that seems a little fragile, in the event that the developers add / reorder the fields in the log.

I have tried using REPORT-xxx in transforms.conf without any success.

As mentioned I have been updating the source type based on a regex match against the log like this:

props.conf
-----------------
[source::http:perform-k8s]
        TRANSFORMS-changesourcetype = kubernetes_change_sourcetype_access_combined, kubernetes_change_sourcetype_rosi_application
        TRANSFORMS-changeindex = kubernetes_change_index_access_combined, kubernetes_change_index_rosi_application


transform.conf
----------------------
[kubernetes_change_index_access_combined]
DEST_KEY = _MetaData:Index
REGEX=^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
FORMAT = apache

[kubernetes_change_index_rosi_application]
DEST_KEY = _MetaData:Index
REGEX=^{"log":"\S+\s+\S+\s+pid=\d+\s+method=\S+\s+path=\S+
FORMAT = rosi

[kubernetes_change_sourcetype_access_combined]
DEST_KEY = MetaData:Sourcetype
REGEX=^{"log":"\S+ \S+ \S+ \S* ?\[[^\]]+\] \\"[^"]*\\" \S+ \S+(?: \S+)?
FORMAT = sourcetype::k8s_access_combined

[kubernetes_change_sourcetype_rosi_application]
DEST_KEY = MetaData:Sourcetype
REGEX=^{"log":"\S+\s+\S+\s+pid=\d+\s+method=\S+\s+path=\S+
FORMAT = sourcetype::k8s_rosi_application

This is using Splunk release 6.6.8, with the log events being submitted via fluent-bit, I am open to trying splunk-connect-for-kubernetes (https://github.com/splunk/splunk-connect-for-kubernetes) however it feels as though I will need to deal wth the same data format either way.

If you got this far, thanks for reading 🙂

mgh

Tags (2)
0 Karma

outcoldman
Communicator

Hello @mgherman

HTTP Event Collector expects all the values to be prepared in the right format. Fluent-bit does not do that, connect for kubernetes does solve this problem.
We can also offer our solution https://www.outcoldsolutions.com/ with the pre-build dashboards and alerts https://splunkbase.splunk.com/app/3743/, it solves the problem of forwarding logs in the expected format. You can also define fields extraction on the source level by specifying the annotations, see https://www.outcoldsolutions.com/docs/monitoring-kubernetes/v5/annotations/#extracting-fields-from-t...

We have also a performance comparison between collectord (our solution), fluent-bit and fluentd https://www.outcoldsolutions.com/blog/2018-11-19-performance-collectord-fluentd-fluentbit/

0 Karma

jkat54
SplunkTrust
SplunkTrust

Have you tried “in” with props.conf settings?

For example:

EXTRACT-anyname= REGEX in JSONFIELDNAME

It should apply the regex to the single json field you provide it instead of applying to the entire _raw field.

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...