Splunk Search

Can the dedup command run within props.conf?

luck123813
Explorer

Hey everyone,

I have an issue where I am ingesting data via REST API, but I am getting a lot of duplicate data into the index. The root cause seems to be the table the API sources from, so in the meantime I have to dedup the results at search time:

index=index1 sourcetype=dataset1 | dedup data_id | table column_1, column_2, column_3

My question is: is there a way to run the dedup command within the props.conf file?
I have read that I could use an eval with mvdedup(value), but that only dedups values within a single multivalue field, and I would need to dedup across events.

Any thoughts?

1 Solution

nickhills
Ultra Champion

If I understand your problem and my assumptions are correct, dedup will likely not help you.

My first assumption is that you have a rest API method which polls a webservice on an interval and imports a number of events.
My second assumption is that on each subsequent poll, you are bringing in events which have already been collected.

Dedup is used to remove duplicates within a search's result "stream" - its concept could be useful if a single poll contained duplicated events, but it cannot evaluate a set of new events against those previously indexed.

Ideally your poll (and the API) would allow you to maintain a checkpoint of the last event you imported, and on subsequent polls, only collect events following that checkpoint.

That does rely on the API giving you a sequential record ID (with which you would handle the checkpointing logic) or its own checkpointing function.
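The checkpointing pattern described above can be sketched roughly as follows. This is a minimal illustration, not the modular input's actual code - `fetch_records`, the `data_id` field, and the checkpoint file path are all assumptions standing in for whatever your API and input provide:

```python
import json
import os

# Hypothetical location for the checkpoint state; a real modular input
# would typically use the checkpoint directory Splunk hands it.
CHECKPOINT_FILE = "rest_checkpoint.json"

def load_checkpoint():
    """Return the last record ID indexed, or 0 if no checkpoint exists yet."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f).get("last_id", 0)
    return 0

def save_checkpoint(last_id):
    """Persist the highest record ID seen so far."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_id": last_id}, f)

def poll(fetch_records):
    """Fetch only records newer than the checkpoint and emit them.

    fetch_records(since_id) is a placeholder for your API call; it is
    assumed to return dicts carrying a monotonically increasing data_id.
    """
    last_id = load_checkpoint()
    new_events = [r for r in fetch_records(last_id) if r["data_id"] > last_id]
    for event in new_events:
        print(json.dumps(event))  # emit each new event (e.g. on stdout)
    if new_events:
        save_checkpoint(max(r["data_id"] for r in new_events))
    return new_events
```

On each interval you call poll(); only records with an ID above the stored checkpoint are emitted, so re-polling the same table does not re-index old rows.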

If my comment helps, please give it a thumbs up!


luck123813
Explorer

Currently, I am using the REST API Modular Input from Splunkbase. Is there any way I can manually set a checkpoint, as you mentioned, within this REST API Modular Input (at the UI)?

I also tried to run a script (external to Splunk), in which the API writes to a file, and it also produces the same data each time it is run. @nickhillscpl
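If the external-script route is workable, one option is to keep a small state file of IDs already written, and filter new rows against it before writing. This is a minimal sketch under the assumption that each record carries a unique data_id field; the state file name is hypothetical:

```python
import json
import os

# Hypothetical state file kept alongside the script between runs.
SEEN_IDS_FILE = "seen_ids.json"

def filter_new(rows):
    """Drop rows whose data_id was already written on a previous run."""
    seen = set()
    if os.path.exists(SEEN_IDS_FILE):
        with open(SEEN_IDS_FILE) as f:
            seen = set(json.load(f))
    # Keep only rows not seen before, then record their IDs for next time.
    fresh = [r for r in rows if r["data_id"] not in seen]
    seen.update(r["data_id"] for r in fresh)
    with open(SEEN_IDS_FILE, "w") as f:
        json.dump(sorted(seen), f)
    return fresh
```

Running the script repeatedly against the same API response then appends each record to the output file only once, so the duplicates never reach the index.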

