Getting Data In

JSON without duplicates

Dherom
New Member

Good afternoon guys,

We need help.

We have a JSON file to which duplicate events are being written.

We want to know how to define something like a primary key so that Splunk does not index those duplicates and they never end up in the Splunk index.

{
  "security": {
    "notices": [
      {
        "rss_published": "2019-02-12T13:33:31.000Z",
        "rss_message": "Email provider VFEmail has suffered what the company is calling \"catastrophic destruction\" at the hands of an as-yet unknown intruder who trashed all of the company's primary and backup data in the United States. The firm's founder says he ....",
        "rss_fuente": "rss_krebsonsecurity",
        "rss_title": "Email Provider VFEmail Suffers \u2018Catastrophic\u2019 Hack",
        "rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
      }
    ]
  }
}
{
  "security": {
    "notices": [
      {
        "rss_published": "2019-02-12T13:33:31.000Z",
        "rss_message": "Email provider VFEmail has suffered what the company is calling \"catastrophic destruction\" at the hands of an as-yet unknown intruder who trashed all of the company's primary and backup data in the United States. The firm's founder says he ....",
        "rss_fuente": "rss_krebsonsecurity",
        "rss_title": "Email Provider VFEmail Suffers \u2018Catastrophic\u2019 Hack",
        "rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
      }
    ]
  }
}
{
  "security": {
    "notices": [
      {
        "rss_published": "2019-02-12T11:33:54.000Z",
        "rss_message": "El fallo afecta a otros productos derivados de Docker que usan runc y al propio LXC, permitiendo acceder a la m\u00e1quina host con permisos de superusuario. Los investigadores Adam Iwaniuk y Borys Pop\u0142awski han descubierto una vulnerabilidad en....",
        "rss_fuente": "rss_hispasec",
        "rss_title": "Vulnerabilidad en runc permite escapar de contenedor Docker con permisos root",
        "rss_link": "https://unaaldia.hispasec.com/2019/02/vulnerabilidad-en-runc-permite-escapar-de-contenedor-docker-co..."
      }
    ]
  }
}
Thank you!

Dherom
New Member

But is there no method that, at indexing time, looks at two fields of the JSON and computes a hash or something similar, so that these duplicates are never indexed?

jluo_splunk
Splunk Employee

There may be something possible using the DSP beta, but at this point in time it would be much less efficient to do it inside Splunk: you would potentially introduce some amount of ingestion latency.

woodcock
Esteemed Legend

You can use Cribl to preprocess this... @clintsharp @dritan

jluo_splunk
Splunk Employee

I think you'd be better off doing this at the source rather than in Splunk. Is it possible to write a script to cleanse the data before it's written to a file that Splunk monitors?
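
As a rough illustration of that kind of cleansing script, here is a minimal Python sketch. It assumes the input looks like the sample in the question (concatenated JSON objects with a "security" / "notices" list) and that "rss_link" plus "rss_published" are enough to identify a notice; that choice of key fields, and the helper names iter_json_objects, notice_key and dedupe_events, are assumptions made for this example, not anything Splunk provides.

```python
import hashlib
import json


def iter_json_objects(text):
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def notice_key(notice, fields=("rss_link", "rss_published")):
    """Hash the chosen fields into a stable key (the field choice is an assumption)."""
    raw = "\x1f".join(str(notice.get(f, "")) for f in fields)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def dedupe_events(text, seen=None):
    """Return only events containing notices whose key has not been seen yet."""
    seen = set() if seen is None else seen
    unique = []
    for event in iter_json_objects(text):
        notices = event.get("security", {}).get("notices", [])
        fresh = []
        for notice in notices:
            key = notice_key(notice)
            if key not in seen:
                seen.add(key)
                fresh.append(notice)
        if fresh:
            unique.append({"security": {"notices": fresh}})
    return unique
```

The script would write the result of dedupe_events (for example one json.dumps(event) per line) to the file that Splunk monitors, instead of letting the raw feed go there directly. To catch duplicates across runs, the seen set would have to be persisted somewhere, such as a small file of hashes.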
