Getting Data In

Json without duplicates

Dherom
New Member

Good afternoon guys,

We need help.

We have a JSON file in which duplicate events are written.

We want to know how to have a primary key so that it does not index those duplicates and is not in the Splunk index.

{
"security": {
"notices": [
{
"rss_published": "2019-02-12T13:33:31.000Z",
"rss_message": "Email provider VFEmail has suffered what the company is calling \"catastrophic destruction\" at the hands of an as-yet unknown intruder who trashed all of the company's primary and backup data in the United States. The firm's founder says he ....",
"rss_fuente": "rss_krebsonsecurity",
"rss_title": "Email Provider VFEmail Suffers \u2018Catastrophic\u2019 Hack",
"rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
}
]
}
}
{
"security": {
"notices": [
{
"rss_published": "2019-02-12T13:33:31.000Z",
"rss_message": "Email provider VFEmail has suffered what the company is calling \"catastrophic destruction\" at the hands of an as-yet unknown intruder who trashed all of the company's primary and backup data in the United States. The firm's founder says he ....",
"rss_fuente": "rss_krebsonsecurity",
"rss_title": "Email Provider VFEmail Suffers \u2018Catastrophic\u2019 Hack",
"rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
}
]
}
}
{
"security": {
"notices": [
{
"rss_published": "2019-02-12T11:33:54.000Z",
"rss_message": "El fallo afecta a otros productos derivados de Docker que usan runc y al propio LXC, permitiendo acceder a la m\u00e1quina host con permisos de superusuario. Los investigadores Adam Iwaniuk y Borys Pop\u0142awski han descubierto una vulnerabilidad en....",
"rss_fuente": "rss_hispasec",
"rss_title": "Vulnerabilidad en runc permite escapar de contenedor Docker con permisos root",
"rss_link": "https://unaaldia.hispasec.com/2019/02/vulnerabilidad-en-runc-permite-escapar-de-contenedor-docker-co..."
}
]
}
}
thank you!

Tags (2)
0 Karma

Dherom
New Member

But there is no method that at the time of indexing look at two fields of the json and make a hash or something so that these duplicates do not exist

0 Karma

jluo_splunk
Splunk Employee
Splunk Employee

There may be something possible using the DSP beta, but at this point in time, it would be much less efficient to do it inside of Splunk - you would potentially cause some amount of ingestion latency.

0 Karma

woodcock
Esteemed Legend

You can use Cribl to preprocess this... @clintsharp @dritan

0 Karma

jluo_splunk
Splunk Employee
Splunk Employee

I think you'd be better off doing this at the source rather than in Splunk. Is it possible to write a script to cleanse the data before it's written to a file that Splunk monitors?

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Painting a Clearer Picture: Creating Cross-Domain Visibility with AI Canvas

    Thursday, June 25, 2026  |  11AM PDT / 2PM EDT  Duration: 1 Hour (Includes live Q&A) Register to ...

Analytics Workspace deprecation

As of Splunk Cloud Platform 10.4.2604 and Splunk Enterprise 10.4, Analytics Workspace is now deprecated. ...

Splunk Developer Day Recap: Building, Publishing, and Growing on the Splunk Platform

Splunk Developer Day brought the Splunk developer community together for a practical look at what it means to ...