JSON without duplicates

Dherom
New Member

Good afternoon guys,

We need help.

We have a JSON file in which duplicate events are written.

We want to know how to define something like a primary key so that Splunk does not index those duplicates and they never end up in the Splunk index.

{
"security": {
"notices": [
{
"rss_published": "2019-02-12T13:33:31.000Z",
"rss_message": "Email provider VFEmail has suffered what the company is calling \"catastrophic destruction\" at the hands of an as-yet unknown intruder who trashed all of the company's primary and backup data in the United States. The firm's founder says he ....",
"rss_fuente": "rss_krebsonsecurity",
"rss_title": "Email Provider VFEmail Suffers \u2018Catastrophic\u2019 Hack",
"rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
}
]
}
}
{
"security": {
"notices": [
{
"rss_published": "2019-02-12T13:33:31.000Z",
"rss_message": "Email provider VFEmail has suffered what the company is calling \"catastrophic destruction\" at the hands of an as-yet unknown intruder who trashed all of the company's primary and backup data in the United States. The firm's founder says he ....",
"rss_fuente": "rss_krebsonsecurity",
"rss_title": "Email Provider VFEmail Suffers \u2018Catastrophic\u2019 Hack",
"rss_link": "https://krebsonsecurity.com/2019/02/email-provider-vfemail-suffers-catastrophic-hack/"
}
]
}
}
{
"security": {
"notices": [
{
"rss_published": "2019-02-12T11:33:54.000Z",
"rss_message": "El fallo afecta a otros productos derivados de Docker que usan runc y al propio LXC, permitiendo acceder a la m\u00e1quina host con permisos de superusuario. Los investigadores Adam Iwaniuk y Borys Pop\u0142awski han descubierto una vulnerabilidad en....",
"rss_fuente": "rss_hispasec",
"rss_title": "Vulnerabilidad en runc permite escapar de contenedor Docker con permisos root",
"rss_link": "https://unaaldia.hispasec.com/2019/02/vulnerabilidad-en-runc-permite-escapar-de-contenedor-docker-co..."
}
]
}
}
Thank you!

Dherom
New Member

But is there no method that, at index time, looks at two fields of the JSON and computes a hash or something similar, so that these duplicates are never indexed?

jluo_splunk
Splunk Employee

Something may be possible using the DSP beta, but at this point it would be much less efficient to do this inside Splunk, because it could add some ingestion latency.

woodcock
Esteemed Legend

You can use Cribl to preprocess this... @clintsharp @dritan

jluo_splunk
Splunk Employee

I think you'd be better off doing this at the source rather than in Splunk. Is it possible to write a script to cleanse the data before it's written to a file that Splunk monitors?
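
A minimal sketch of such a cleansing script, in Python, might look like the following. It is not a Splunk feature, just one way to do it at the source. It assumes the file holds concatenated JSON objects shaped like the sample in the question, and it assumes that `rss_link` plus `rss_published` are the two fields that identify a notice (pick whichever fields actually act as your key). The function names `iter_json_objects`, `notice_key`, and `dedupe` are made up for this example.

```python
import hashlib
import json


def iter_json_objects(text):
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def notice_key(event, fields=("rss_link", "rss_published")):
    """Hash the chosen fields of every notice in the event into one hex digest.

    The field names are an assumption based on the sample events.
    """
    notices = event["security"]["notices"]
    parts = [str(n.get(f, "")) for n in notices for f in fields]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def dedupe(text, seen=None):
    """Return the events whose key is not in `seen`; `seen` is updated in place."""
    seen = set() if seen is None else seen
    unique = []
    for event in iter_json_objects(text):
        key = notice_key(event)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

You would run `dedupe` on the raw feed and write only the returned events (for example with `json.dumps`) to the file that Splunk monitors. To catch duplicates across runs, the `seen` set would have to be saved somewhere between runs, which this sketch leaves out.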
