Duplicate field values in Splunk events from Cribl

Raffaele53

Hello,

I’m using Cribl Cloud to pull JSON events from an Azure Event Hub and forward them to Splunk via HEC.

Each incoming event (on Cribl) contains a nested array field called records, for example:

{
  "records": [
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    },
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    },
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    }
  ],
  "_time": 1756902850.057,
  "cribl": "yes",
  "security_event_hub": "yes"
}

My goal is to split each element of the records array into a separate, flat event. Here’s what I’ve tried:

Unroll function (Cribl) on records to produce individual events
Flatten function (Cribl) to promote nested fields and delete records array
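
For readers following along, the intended result of those two steps can be sketched in plain JavaScript. This is only an illustration of the target shape, not Cribl's Unroll or Flatten implementation; `unrollRecords` is a hypothetical helper name and the sample values are made up.

```javascript
// Plain-JavaScript illustration of the goal: one flat event per element of
// "records". Not Cribl code; unrollRecords is a hypothetical helper.
function unrollRecords(event) {
  const { records, ...parent } = event;
  // Nothing to split: pass the event through unchanged.
  if (!Array.isArray(records)) return [event];
  // One output event per element; the element's keys are promoted to the
  // top level and the original "records" array is dropped.
  return records.map((rec) => ({ ...parent, ...rec }));
}

// Made-up sample shaped like the event in the question.
const incoming = {
  records: [
    { FileName: "a", FileType: "xx" },
    { FileName: "b", FileType: "xx" },
  ],
  _time: 1756902850.057,
  cribl: "yes",
  security_event_hub: "yes",
};

const flatEvents = unrollRecords(incoming);
console.log(JSON.stringify(flatEvents, null, 2));
```

Each resulting event carries the parent fields (`_time`, `cribl`, `security_event_hub`) plus the fields of exactly one record.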

In Splunk, each field's value appears duplicated (and sometimes triplicated), as shown here (the censored values are identical to each other):

I’ve identified that extracting nested values is causing this anomaly in Splunk.

I’ve tried numerous approaches to resolve it:

Replaced the Flatten function with an Eval expression like this (Cribl):
Object.assign(__e, Object.assign({}, __e, __e.rec || {})); delete __e.rec; delete __e.records;
Tested various JavaScript snippets in Code functions (Cribl)
Used JSON Unroll and JSON Decode functions (Cribl)
Toggled KV_MODE, AUTO_KV_JSON, and INDEXED_EXTRACTIONS on Heavy Forwarders and Search Heads
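
To make the Eval expression above easier to reason about, here is a minimal sketch of what it does to a single event, run against a plain object instead of Cribl's `__e`. It assumes the Unroll step left each element in a field named `rec`, as the expression implies; `promoteRec` is a hypothetical name. The inner `Object.assign({}, __e, __e.rec || {})` in the original is equivalent to copying `rec` straight onto the event, so the sketch does only that.

```javascript
// `e` stands in for Cribl's __e event object; here it is a plain object.
// promoteRec is a hypothetical helper, not a Cribl function.
function promoteRec(e) {
  // Copy every key of the unrolled element onto the event itself.
  Object.assign(e, e.rec || {});
  // Remove the nested containers so only top-level fields remain.
  delete e.rec;
  delete e.records;
  return e;
}

// Made-up sample: one unrolled element plus the leftover array.
const unrolled = {
  rec: { FileName: "xx", FileType: "xx" },
  records: [{ FileName: "xx", FileType: "xx" }],
  cribl: "yes",
};

const promoted = promoteRec(unrolled);
console.log(Object.keys(promoted));
```

After this step the event object itself holds each field once, so any duplication seen in Splunk would have to come from how the event is serialized or extracted later.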

None of these solutions work consistently; in some cases values were even triplicated.
Do you have any suggestions to resolve this issue?

Thank you in advance for any insights or working examples.

PickleRick

1. You're showing us only the resulting fields, not the raw message contents, so it's impossible to say what your _raw looks like. Typically, though, values get duplicated when the same fields are extracted both at index time (by indexed extractions) and at search time (by KV_MODE).
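
A toy model of that effect, for illustration only: it is not Splunk code, and `mergeExtractions` is a hypothetical name. It assumes the search results simply collect every value each extraction pass produces for a field, without de-duplication.

```javascript
// Toy model, not Splunk code: combine the fields produced by two
// independent extraction passes into one multivalue field map.
function mergeExtractions(indexTimeFields, searchTimeFields) {
  const merged = {};
  for (const source of [indexTimeFields, searchTimeFields]) {
    for (const [name, value] of Object.entries(source)) {
      // Every pass contributes its value; nothing is de-duplicated.
      if (!merged[name]) merged[name] = [];
      merged[name].push(value);
    }
  }
  return merged;
}

// Made-up sample: "Type" is extracted by both passes, "UrlLocation" by one.
const indexTime = { Type: "EmailUrlInfo" };
const searchTime = { Type: "EmailUrlInfo", UrlLocation: "Body" };

const fields = mergeExtractions(indexTime, searchTime);
console.log(fields);
```

In this model a field extracted twice ends up with two identical values, which matches the symptom described in the question; a third extraction source would give three.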

2. As @livehybrid said, the first thing to do is check what Cribl does to your data and how it sends it to Splunk, and most probably fix it there. But that's beyond the scope of this forum.

Raffaele53

Sorry, I forgot to include the raw event going out from Cribl and coming into Splunk.

Raw event extracted from Splunk search:

{"cribl":"yes","security_event_hub":"yes","NetworkMessageId":"xxx","ReportId":"xxx","Timestamp":"2025-09-04T07:04:14.0000000Z","Url":"xxx","UrlDomain":"xxx","UrlLocation":"Body","TimeGenerated":"2025-09-04T07:04:14.0000000Z","_ItemId":"xxx","TenantId":"xxx","_TimeReceived":"2025-09-04T07:07:00.1639029Z","_Internal_WorkspaceResourceId":"xxx","Type":"EmailUrlInfo"}

Each field and its value appear only once.

On Cribl, the event looks the same:

{
  "_time": 1756969933.064,
  "cribl": "yes",
  "security_event_hub": "yes",
  "NetworkMessageId": "xxx",
  "ReportId": "xxx",
  "Timestamp": "2025-09-04T07:08:47.0000000Z",
  "Url": "xxx",
  "UrlDomain": "xxx",
  "UrlLocation": "Body",
  "TimeGenerated": "2025-09-04T07:08:47.0000000Z",
  "_ItemId": "xxx",
  "TenantId": "xxx",
  "_TimeReceived": "2025-09-04T07:12:07.2054241Z",
  "_Internal_WorkspaceResourceId": "xxx",
  "Type": "EmailUrlInfo",
  "cribl_pipe": "Azure_Event_Hub_processing"
}

I tried different combinations of KV_MODE, AUTO_KV_JSON, and INDEXED_EXTRACTIONS:

On the Heavy Forwarder:
- KV_MODE=JSON, then KV_MODE=none
- INDEXED_EXTRACTIONS=JSON
- AUTO_KV_JSON=none
- and also all of them combined
Afterwards, I tried:
- On the Search Head: KV_MODE=none and AUTO_KV_JSON=none
- On the Heavy Forwarder: INDEXED_EXTRACTIONS=JSON

But nothing changed.

On Cribl, I also tried several functions to delete and recreate fields/values, but that didn’t work either.

Do you have any suggestions?

Thanks a lot!

PickleRick

If you have a json event, don't use indexed extractions.

Having said that, the raw event is one thing, but Cribl _might_ (I have no idea if it does in your case) be adding indexed fields anyway.
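
To illustrate what "adding indexed fields" could look like on the wire: the HEC event endpoint (`/services/collector/event`) accepts a JSON body with an `event` key and an optional `fields` key, and anything under `fields` is indexed at index time. The sketch below only builds two such bodies for comparison and sends nothing. `buildHecPayload` is a hypothetical helper, and whether Cribl actually populates `fields` in this setup is an assumption that would need to be checked on the Cribl side.

```javascript
// Hypothetical helper: shapes an event into a JSON body for the HEC
// /services/collector/event endpoint. It does not send anything.
function buildHecPayload(event, includeIndexedFields) {
  const { _time, ...rest } = event;
  const payload = { time: _time, event: rest };
  if (includeIndexedFields) {
    // "fields" asks Splunk to index these key/value pairs at index time,
    // in addition to whatever search-time extraction finds in "event".
    payload.fields = { ...rest };
  }
  return payload;
}

// Made-up sample based on the flat event shown earlier in the thread.
const flatEvent = {
  _time: 1756969933.064,
  cribl: "yes",
  Type: "EmailUrlInfo",
  UrlLocation: "Body",
};

const withFields = buildHecPayload(flatEvent, true);
const withoutFields = buildHecPayload(flatEvent, false);
console.log(JSON.stringify(withFields));
console.log(JSON.stringify(withoutFields));
```

In the first body every field is present twice, once inside `event` and once under `fields`. If the sourcetype also has search-time JSON extraction enabled, that would be one plausible way to get duplicated values even though _raw shows each field once.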

Raffaele53

I tried using Indexed Extractions to see if anything would change, but it didn’t work.

On the Cribl side, I only see the event going out and nothing else.

Thanks for the help—hopefully someone’s run into this before and can help me!

livehybrid

Hi @Raffaele53

I'm not too familiar with breaking up events in Cribl, but if these are being sent to Splunk as parsed events, then the splitting should be done before the data reaches Splunk.

When you preview the output from Cribl, do you see only the raw JSON output, or do you also see the fields?

I'd start by making sure you are happy with the output from Cribl using the preview option, and then check that KV_MODE etc. is set correctly on the Splunk side to match.

Raffaele53

Hi,

Thanks for your reply.
I shared the raw events going out from Cribl and coming into Splunk in my other message.

On the Cribl side, I can only see the raw events being sent to Splunk, while field extraction can only be checked on the Splunk side.

It’s strange because if I don’t extract fields from the nested "records" field, everything works correctly.
It’s as if Cribl leaves some old metadata in the events, referring to fields that were previously modified.
