
Duplicate field values in Splunk events from Cribl

Raffaele53
Observer

Hello,

I’m using Cribl Cloud to pull JSON events from an Azure Event Hub and forward them to Splunk via HEC.

Each incoming event (on Cribl) contains a nested array field called records, for example:

{
  "records": [
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    },
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    },
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    }
  ],
  "_time": 1756902850.057,
  "cribl": "yes",
  "security_event_hub": "yes"
}

My goal is to split each element of the records array into a separate, flat event. Here’s what I’ve tried:

  • Unroll function (Cribl) on records to produce individual events

  • Flatten function (Cribl) to promote nested fields and delete records array
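For reference, the target shape of the Unroll-then-Flatten steps above can be sketched in plain JavaScript outside Cribl. This only models the desired output and is not Cribl's implementation; `unrollRecords` is a hypothetical helper name and the values are placeholders.

```javascript
// Hypothetical helper (not a Cribl API): split an event's `records` array
// into one flat event per element, copying the top-level fields onto each.
function unrollRecords(event) {
  if (!Array.isArray(event.records)) return [event];
  const { records, ...shared } = event;
  // Record fields are spread last, so they win on a name clash.
  return records.map((rec) => ({ ...shared, ...rec }));
}

const sample = {
  records: [
    { FileName: "xx1", Type: "xx" },
    { FileName: "xx2", Type: "xx" },
  ],
  _time: 1756902850.057,
  cribl: "yes",
  security_event_hub: "yes",
};

const flatEvents = unrollRecords(sample);
console.log(flatEvents);
```

Each resulting object carries every field exactly once and no longer has a `records` key, which is what Splunk should receive.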

In Splunk, each field’s values are duplicated (and sometimes triplicated), as shown here (the censored values are identical to one another):

Screenshot.png

 

I’ve identified that extracting nested values is causing this anomaly in Splunk.

I’ve tried numerous approaches to resolve it:

  • Replaced the Flatten function with an Eval expression like this (Cribl):
    Object.assign(__e, Object.assign({}, __e, __e.rec || {})); delete __e.rec; delete __e.records;

  • Tested various JavaScript snippets in Code functions (Cribl)

  • Used JSON Unroll and JSON Decode functions (Cribl)

  • Toggled KV_MODE, AUTO_KV_JSON, and INDEXED_EXTRACTIONS on Heavy Forwarders and Search Heads
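To check what the Eval expression in the first bullet does on its own, it can be run against a plain object. This is a sketch under assumptions: `__e` here is an ordinary object standing in for Cribl's event, and `rec` is assumed to be the field where Unroll places each array element.

```javascript
// Plain-object stand-in for Cribl's __e (the current event); values are placeholders.
const __e = { rec: { FileName: "xx", Type: "xx" }, records: [], cribl: "yes" };

// The expression from the post. The inner Object.assign builds a copy first;
// Object.assign(__e, __e.rec || {}) would have the same effect.
Object.assign(__e, Object.assign({}, __e, __e.rec || {}));
delete __e.rec;
delete __e.records;

console.log(Object.keys(__e));
```

In this model the result is a flat object with each field present once, so the expression by itself does not appear to create duplicate values.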

None of these solutions worked consistently; in some cases values were even triplicated.
Do you have any suggestions to resolve this issue?

Thank you in advance for any insights or working examples.


PickleRick
SplunkTrust

1. You're showing us only the resulting fields, not the raw message contents, so it's impossible to say what your _raw looks like. Typically, though, values get duplicated when the same fields are extracted both at index time (by indexed extractions) and at search time (by KV_MODE).

2. As @livehybrid said, the first thing to do is check what's going on in your Cribl: what it does to your data and how it sends it to Splunk. Most probably the fix belongs there. But that's beyond the scope of this forum.
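The duplication mechanism in point 1 can be modelled with a small JavaScript sketch. It is a simplified model, not Splunk's actual code: `mergeExtractions` is a hypothetical name, and the assumption is that a field extracted both at index time and at search time shows one value per extraction.

```javascript
// Simplified model (an assumption, not Splunk internals): values from the
// index-time and search-time extractions are both attached to the event,
// so a field present in both becomes multivalued.
function mergeExtractions(indexedFields, searchTimeFields) {
  const merged = {};
  for (const source of [indexedFields, searchTimeFields]) {
    for (const [name, value] of Object.entries(source)) {
      if (!merged[name]) merged[name] = [];
      merged[name].push(value);
    }
  }
  return merged;
}

const raw = '{"Type":"EmailUrlInfo","UrlLocation":"Body"}';
const searchTime = JSON.parse(raw); // what search-time JSON extraction would yield
const indexed = { Type: "EmailUrlInfo", UrlLocation: "Body" }; // same fields stored at index time

const both = mergeExtractions(indexed, searchTime);
const searchOnly = mergeExtractions({}, searchTime);
console.log(both.Type, searchOnly.Type);
```

With both extractions active, `Type` holds the same value twice; with only one of them it holds the value once.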


Raffaele53
Observer

 

Sorry, I forgot to include the raw event going out from Cribl and coming into Splunk.

Raw event extracted from Splunk search:

{"cribl":"yes","security_event_hub":"yes","NetworkMessageId":"xxx","ReportId":"xxx","Timestamp":"2025-09-04T07:04:14.0000000Z","Url":"xxx","UrlDomain":"xxx","UrlLocation":"Body","TimeGenerated":"2025-09-04T07:04:14.0000000Z","_ItemId":"xxx","TenantId":"xxx","_TimeReceived":"2025-09-04T07:07:00.1639029Z","_Internal_WorkspaceResourceId":"xxx","Type":"EmailUrlInfo"}

Each field and its value appear only once.

On Cribl, the event looks the same:

{
  "_time": 1756969933.064,
  "cribl": "yes",
  "security_event_hub": "yes",
  "NetworkMessageId": "xxx",
  "ReportId": "xxx",
  "Timestamp": "2025-09-04T07:08:47.0000000Z",
  "Url": "xxx",
  "UrlDomain": "xxx",
  "UrlLocation": "Body",
  "TimeGenerated": "2025-09-04T07:08:47.0000000Z",
  "_ItemId": "xxx",
  "TenantId": "xxx",
  "_TimeReceived": "2025-09-04T07:12:07.2054241Z",
  "_Internal_WorkspaceResourceId": "xxx",
  "Type": "EmailUrlInfo",
  "cribl_pipe": "Azure_Event_Hub_processing"
}


I tried different combinations of KV_MODE, AUTO_KV_JSON, and INDEXED_EXTRACTIONS:

  • On the Heavy Forwarder:

    • KV_MODE=JSON, then KV_MODE=none

    • INDEXED_EXTRACTIONS=JSON

    • AUTO_KV_JSON=none

    • and also all of them combined

  • Afterwards, I tried:

    • On the Search Head: KV_MODE=none and AUTO_KV_JSON=none

    • On the Heavy Forwarder: INDEXED_EXTRACTIONS=JSON

But nothing changed.

On Cribl, I also tried several functions to delete and recreate fields/values, but that didn’t work either.

Do you have any suggestions?

Thanks a lot!


PickleRick
SplunkTrust

If you have a JSON event, don't use indexed extractions.

Having said that, the raw event is one thing, but Cribl _might_ (I have no idea if it does in your case) be adding indexed fields anyway.
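One way that could happen, sketched in JavaScript: the HEC event endpoint accepts an optional `fields` object next to `event`, and whatever is placed in `fields` is stored as indexed fields. Whether Cribl fills `fields` for these events is an assumption, not something confirmed in this thread; `buildHecPayload` is a hypothetical helper for illustration only.

```javascript
// Hypothetical helper (not a Cribl or Splunk API): builds the JSON body for
// the HEC /services/collector/event endpoint from a flat event.
function buildHecPayload(flatEvent, sendIndexedFields) {
  const { _time, ...rest } = flatEvent;
  const payload = { time: _time, event: rest };
  // HEC stores everything under `fields` as indexed fields.
  if (sendIndexedFields) payload.fields = { ...rest };
  return payload;
}

const flatEvent = {
  _time: 1756969933.064,
  cribl: "yes",
  UrlLocation: "Body",
  Type: "EmailUrlInfo",
};

const rawOnly = buildHecPayload(flatEvent, false);
const rawPlusIndexed = buildHecPayload(flatEvent, true);
console.log(JSON.stringify(rawOnly));
console.log(JSON.stringify(rawPlusIndexed));
```

In the second payload every field exists both in the raw JSON and as an indexed field, which is the situation where search-time JSON extraction would show each value twice. Comparing the actual HEC request body against these two shapes would show which case applies.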

 


Raffaele53
Observer

I tried using Indexed Extractions to see if anything would change, but it didn’t work.

On the Cribl side, I only see the event going out and nothing else.

Thanks for the help—hopefully someone’s run into this before and can help me!


livehybrid
SplunkTrust

Hi @Raffaele53 

I'm not too familiar with breaking up events in Cribl, but if these are being sent to Splunk as parsed events, then the splitting should be done before they reach Splunk.

When you preview the output from Cribl, do you see only the raw JSON output, or do you also see the fields?

I'd start by making sure you are happy with the output from Cribl using the preview option, and then check that KV_MODE etc. is correct on the Splunk side to match as required.


Raffaele53
Observer

Hi,

Thanks for your reply.
I shared the raw events going out from Cribl and coming into Splunk in my other message.

On the Cribl side, I can only see the raw events being sent to Splunk, while field extraction can only be checked on the Splunk side.

It’s strange, because if I don’t extract fields from the nested "records" field, everything works correctly.
It’s as if Cribl leaves some old metadata in the events, referring to fields that were previously modified.
