JSON without duplicates

Dherom
New Member

Good afternoon guys,

We need help.

We have a JSON file to which duplicate events are written.

We would like to know how to define a primary key so that Splunk does not index those duplicates.

{
"security": {
"notices": [
{
"rss_published": "2019-02-12T13:33:31.000Z",
"rss_message": "Email provider VFEmail has suffered what the company is calling \"catastrophic destruction\" at the hands of an as-yet unknown intruder who trashed all of the company's primary and backup data in the United States. The firm's founder says he ....",
"rss_fuente": "rss_krebsonsecurity",
"rss_title": "Email Provider VFEmail Suffers \u2018Catastrophic\u2019 Hack",
"rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
}
]
}
}
{
"security": {
"notices": [
{
"rss_published": "2019-02-12T13:33:31.000Z",
"rss_message": "Email provider VFEmail has suffered what the company is calling \"catastrophic destruction\" at the hands of an as-yet unknown intruder who trashed all of the company's primary and backup data in the United States. The firm's founder says he ....",
"rss_fuente": "rss_krebsonsecurity",
"rss_title": "Email Provider VFEmail Suffers \u2018Catastrophic\u2019 Hack",
"rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
}
]
}
}
{
"security": {
"notices": [
{
"rss_published": "2019-02-12T11:33:54.000Z",
"rss_message": "El fallo afecta a otros productos derivados de Docker que usan runc y al propio LXC, permitiendo acceder a la m\u00e1quina host con permisos de superusuario. Los investigadores Adam Iwaniuk y Borys Pop\u0142awski han descubierto una vulnerabilidad en....",
"rss_fuente": "rss_hispasec",
"rss_title": "Vulnerabilidad en runc permite escapar de contenedor Docker con permisos root",
"rss_link": "https://unaaldia.hispasec.com/2019/02/vulnerabilidad-en-runc-permite-escapar-de-contenedor-docker-co..."
}
]
}
}
Thank you!


Dherom
New Member

But is there no method that, at index time, looks at two fields of the JSON and computes a hash (or something similar) so that these duplicates are never indexed?
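Computing such a key is simple in a preprocessing step outside Splunk. A minimal Python sketch, assuming the two fields are rss_link and rss_published (the post does not say which two) and that each notice has already been parsed into a dict; KEY_FIELDS and notice_key are hypothetical names:

```python
import hashlib

# Hypothetical choice of key fields: the post only says "two fields".
KEY_FIELDS = ("rss_link", "rss_published")


def notice_key(notice: dict, fields=KEY_FIELDS) -> str:
    """Return a SHA-256 hex digest built from the chosen fields of one notice."""
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc") differ.
    raw = "\x1f".join(str(notice.get(field, "")) for field in fields)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


a = {
    "rss_published": "2019-02-12T13:33:31.000Z",
    "rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/",
}
b = dict(a)  # an exact duplicate, like the first two events in the sample
c = dict(a, rss_published="2019-02-12T11:33:54.000Z")

print(notice_key(a) == notice_key(b))  # True
print(notice_key(a) == notice_key(c))  # False
```

The key does nothing by itself: something still has to compare it against keys already seen before the event is written to the file Splunk reads.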


jluo_splunk
Splunk Employee

Something may be possible with the DSP (Data Stream Processor) beta, but for now doing this inside Splunk would be much less efficient and could add some ingestion latency.


woodcock
Esteemed Legend

You can use Cribl to preprocess this. @clintsharp @dritan


jluo_splunk
Splunk Employee

I think you'd be better off doing this at the source rather than in Splunk. Is it possible to write a script to cleanse the data before it's written to a file that Splunk monitors?
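A minimal sketch of such a cleansing script in Python, under several assumptions: the raw feed is written as concatenated JSON objects like the sample in the question, a duplicate is identified by rss_link plus rss_published, and keys already seen are kept in a plain-text state file between runs. All function names and the file layout are hypothetical. Each surviving notice is re-wrapped in the original security.notices envelope so the field names Splunk extracts stay the same.

```python
import hashlib
import json
from pathlib import Path
from typing import Iterable, Iterator, List, Set

# Hypothetical key fields; adjust to whatever uniquely identifies a notice.
KEY_FIELDS = ("rss_link", "rss_published")


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield each top-level object from text containing concatenated JSON objects."""
    decoder = json.JSONDecoder()
    pos, length = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def notice_key(notice: dict) -> str:
    """SHA-256 fingerprint of the key fields of one notice."""
    raw = "\x1f".join(str(notice.get(field, "")) for field in KEY_FIELDS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def dedupe(text: str, seen: Set[str]) -> List[dict]:
    """Return notices whose key is not in `seen`, adding their keys to `seen`."""
    unique = []
    for obj in iter_json_objects(text):
        for notice in obj.get("security", {}).get("notices", []):
            key = notice_key(notice)
            if key not in seen:
                seen.add(key)
                unique.append(notice)
    return unique


def to_ndjson(notices: Iterable[dict]) -> str:
    """One event per line, re-wrapped in the original security.notices envelope."""
    return "".join(
        json.dumps({"security": {"notices": [n]}}, ensure_ascii=False) + "\n"
        for n in notices
    )


def run(src: Path, dst: Path, state: Path) -> int:
    """Append unseen notices from src to dst (the file Splunk would monitor)."""
    seen = set(state.read_text(encoding="utf-8").split()) if state.exists() else set()
    unique = dedupe(src.read_text(encoding="utf-8"), seen)
    with dst.open("a", encoding="utf-8") as out:
        out.write(to_ndjson(unique))
    state.write_text("\n".join(sorted(seen)), encoding="utf-8")
    return len(unique)


# Demo on in-memory data shaped like the sample: two identical events, one different.
krebs = "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
dup = {"security": {"notices": [{"rss_published": "2019-02-12T13:33:31.000Z", "rss_link": krebs}]}}
other = {"security": {"notices": [{"rss_published": "2019-02-12T11:33:54.000Z", "rss_link": "hypothetical-other-link"}]}}
sample = "\n".join(json.dumps(o) for o in (dup, dup, other))

seen: Set[str] = set()
first_pass = dedupe(sample, seen)
print(len(first_pass))            # 2
print(len(dedupe(sample, seen)))  # 0, every key was already seen
```

With Splunk's monitor input pointed at dst instead of the raw file, duplicates never reach the index. The state file grows with every unique notice, so on a long-running feed it may need periodic pruning.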
