Excluding Special Characters from OTEL Splunk Format

padresman
Engager

We are having trouble excluding OTEL logs whose field names are in camelCase or whose values contain special characters. Fields without capitalization or special-character values are parsed correctly, but the others are not.

Here is the configuration we are using (the full YAML is attached; the key portion follows).

 filelog/kube-apiserver-audit-log:
        include:
        - /var/log/kubernetes/kube-apiserver.log
        include_file_name: false
        include_file_path: true
        operators:
        - id: extract-audit-group
          type: regex_parser
          regex: '\s*\"resourceGroup\"\s*\:\s*\"(?P<extracted_group>[^\"]+)\"\s*'
        - id: filter-group
          type: filter
          expr: 'attributes.extracted_beta == "batch"'
        - id: remove-extracted-group
          type: remove
          field: attributes.extracted_group
        - id: extract-audit-api
          type: regex_parser
          regex: '\"level\"\:\"(?P<extracted_audit_beta>[^\"]+)\"'
        - id: filter-api
          type: filter
          expr: 'attributes.extracted_audit_beta == "Metadata"'
        - id: remove-extracted-api
          type: remove
          field: attributes.extracted_api
        - id: extract-audit-verb
          type: regex_parser
          regex: '\"verb\"\:\"(?P<extracted_verb>[^\"]+)\"'
        - id: filter-verb
          type: filter
          expr: 'attributes.extracted_verb == "watch" || attributes.extracted_verb == "list"'
        - id: remove-extracted-verb
          type: remove
          field: attributes.extracted_verb

The filter on resourceGroup is not matching as expected, while the filters on verb and level are working.

Here is an example log that would be pulled in.

{"apiVersion":"batch/v1","component":"sync-agent","eventType":"MODIFIED","kind":"CronJob","level":"info","msg":"sent event","name":"agentupdater-workload","namespace":"vmware-system-tmc","resourceGroup":"batch","resourceType":"cronjobs","resourceVersion":"v1","time":"2024-03-14T18:17:11Z"}

mattymo
Splunk Employee

Hey @padresman 

I'll try your example. Be very careful that the field names in your expressions match the capture group names you use, since by default the captured value is stored as attributes.<capture group name>.

Also, make sure to select the Golang flavor when testing on regex101, though your regex appears to be fine.
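
Another way to check this outside the collector is to run the same patterns through Go's standard regexp package, which accepts RE2 syntax. The sketch below is only an illustration and not the collector's own code: namedCaptures is a hypothetical helper, while the patterns and the sample line are copied from the original post. Each named capture group it reports corresponds to the key you would expect to find under attributes.

```go
package main

import (
	"fmt"
	"regexp"
)

// sampleLog is the example entry from the original post.
const sampleLog = `{"apiVersion":"batch/v1","component":"sync-agent","eventType":"MODIFIED","kind":"CronJob","level":"info","msg":"sent event","name":"agentupdater-workload","namespace":"vmware-system-tmc","resourceGroup":"batch","resourceType":"cronjobs","resourceVersion":"v1","time":"2024-03-14T18:17:11Z"}`

// namedCaptures (hypothetical helper) returns the named capture groups of the
// first match of pattern in line, keyed by group name.
// It returns nil when the pattern does not match.
func namedCaptures(pattern, line string) map[string]string {
	re := regexp.MustCompile(pattern)
	m := re.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	out := map[string]string{}
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	// Raw string literals keep the backslashes untouched, the same way a
	// single-quoted YAML scalar does.
	patterns := []string{
		`\s*\"resourceGroup\"\s*\:\s*\"(?P<extracted_group>[^\"]+)\"\s*`,
		`\"level\"\:\"(?P<extracted_audit_beta>[^\"]+)\"`,
		`\"verb\"\:\"(?P<extracted_verb>[^\"]+)\"`,
	}
	for _, p := range patterns {
		fmt.Println(namedCaptures(p, sampleLog))
	}
}
```

With the sample entry, the first two patterns yield extracted_group=batch and extracted_audit_beta=info, and the verb pattern finds nothing because the sample has no verb key.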

It's also wise to iterate and NOT remove the fields you create at first, so you can see what they look like when they arrive at Splunk. That helps confirm each value is what you think it is.

- MattyMo

yuanliu
SplunkTrust

This is a little confusing. I do not see special characters in the field values of the provided sample, but I do see a mismatch between the operators that handle resourceGroup. I assume that extract-audit-group and filter-group are both intended to match resourceGroup. Is this correct? In the following snippet, extract-audit-group extracts a variable named extracted_group, whereas filter-group tests one named attributes.extracted_beta. Maybe filter-group should use attributes.extracted_group instead?

        - id: extract-audit-group
          type: regex_parser
          regex: '\s*\"resourceGroup\"\s*\:\s*\"(?P<extracted_group>[^\"]+)\"\s*'
        - id: filter-group
          type: filter
          expr: 'attributes.extracted_beta == "batch"'
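
This kind of mismatch can be caught mechanically. The following is a minimal sketch, not a feature of the collector: missingAttributes is a hypothetical helper that compares the named capture groups of one parser regex with the attributes.<name> references in one filter expression. It assumes the expression only uses simple dotted references, and it will also flag attributes that legitimately come from somewhere other than that regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// attrRef finds references such as attributes.extracted_group in an expression.
var attrRef = regexp.MustCompile(`attributes\.([A-Za-z_][A-Za-z0-9_]*)`)

// missingAttributes (hypothetical helper) lists the attribute names used in
// expr that parserRegex does not define as named capture groups.
func missingAttributes(parserRegex, expr string) []string {
	defined := map[string]bool{}
	for _, name := range regexp.MustCompile(parserRegex).SubexpNames() {
		if name != "" {
			defined[name] = true
		}
	}
	var missing []string
	for _, m := range attrRef.FindAllStringSubmatch(expr, -1) {
		if !defined[m[1]] {
			missing = append(missing, m[1])
		}
	}
	return missing
}

func main() {
	parserRegex := `\s*\"resourceGroup\"\s*\:\s*\"(?P<extracted_group>[^\"]+)\"\s*`

	// Expression from the original post: references a name the regex never sets.
	fmt.Println(missingAttributes(parserRegex, `attributes.extracted_beta == "batch"`))

	// Suggested fix: references the capture group the regex actually defines.
	fmt.Println(missingAttributes(parserRegex, `attributes.extracted_group == "batch"`))
}
```

The first call reports extracted_beta as undefined, and the second reports nothing.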

 


padresman
Engager

Thanks for the response yuanliu, much appreciated, and sorry for the confusion. You're right that those fields should match up - it should look like the following:

        - id: extract-audit-group
          type: regex_parser
          regex: '\"resourceGroup\"\:\"(?P<extracted_group>[^\"]+)\"'
        - id: filter-group
          type: filter
          expr: 'attributes.extracted_group == "batch"'
        - id: remove-extracted-group
          type: remove
          field: attributes.extracted_group

 

The id field can be named just about anything, so differences among the names there don't matter. We've gone through quite a few iterations of testing, which is why there was a discrepancy. Our testing has narrowed the problem down to two candidates: either the camelCase field name is causing a regex issue, or special characters within a value are (or both; my hunch is the camelCase, but we haven't had success with either). Running these patterns through an RE2 regex tester gives the results we expect, but the deployed OTEL collector does not.
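
Both hypotheses can be tested in isolation with Go's regexp package. This is a rough sketch under stated assumptions, not a reproduction of the collector: firstGroup is a hypothetical helper, the extracted_version group name is invented for the test, and spacedLog is a made-up variant of the sample with a space after each colon. It is included only because the corrected pattern above no longer tolerates whitespace around the colon.

```go
package main

import (
	"fmt"
	"regexp"
)

// sampleLog is the example entry from the original post.
const sampleLog = `{"apiVersion":"batch/v1","component":"sync-agent","eventType":"MODIFIED","kind":"CronJob","level":"info","msg":"sent event","name":"agentupdater-workload","namespace":"vmware-system-tmc","resourceGroup":"batch","resourceType":"cronjobs","resourceVersion":"v1","time":"2024-03-14T18:17:11Z"}`

// spacedLog is a hypothetical variant, not taken from the thread.
const spacedLog = `{"apiVersion": "batch/v1", "resourceGroup": "batch"}`

const (
	strictGroup   = `\"resourceGroup\"\:\"(?P<extracted_group>[^\"]+)\"`
	tolerantGroup = `\"resourceGroup\"\s*\:\s*\"(?P<extracted_group>[^\"]+)\"`
	versionValue  = `\"apiVersion\"\:\"(?P<extracted_version>[^\"]+)\"`
)

// firstGroup (hypothetical helper) compiles pattern and returns the first
// capture group of the first match in line. ok is false when the pattern is
// invalid or does not match.
func firstGroup(pattern, line string) (value string, ok bool) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", false
	}
	m := re.FindStringSubmatch(line)
	if len(m) < 2 {
		return "", false
	}
	return m[1], true
}

func main() {
	checks := []struct{ label, pattern, line string }{
		{"camelCase key", strictGroup, sampleLog},
		{"value containing '/'", versionValue, sampleLog},
		{"strict pattern, spaced input", strictGroup, spacedLog},
		{"tolerant pattern, spaced input", tolerantGroup, spacedLog},
	}
	for _, c := range checks {
		value, ok := firstGroup(c.pattern, c.line)
		fmt.Printf("%-32s matched=%t value=%q\n", c.label, ok, value)
	}
}
```

In this sketch the camelCase key and the value containing a slash both match, which suggests the regex engine itself is not the obstacle. The only failing case is the strict pattern against input with whitespace after the colon, so it may be worth checking whether the real kube-apiserver.log lines are formatted like the sample.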
