Hey everyone,
I have an issue where I am ingesting data via a REST API, but I am getting a lot of duplicate data in the index. The issue seems to reside in the table the API sources from, so in the meantime I have to dedup the results:
index=index1 sourcetype=dataset1 | dedup data_id | table column_1, column_2, column_3
My question is: is there a way to run the dedup command from within the props.conf file?
I have read that I could use an eval field=mvdedup(field) command, but I would need to dedup across events, not just within a single multivalue field.
Any thoughts?
If I understand your problem and my assumptions are correct, dedup will likely not help you.
My first assumption is that you have a REST API input which polls a web service on an interval and imports a number of events.
My second assumption is that on each subsequent poll, you are bringing in events which have already been collected.
Dedup is used to remove duplicates within a "stream" of search results - the concept could be useful if a single poll contained duplicated events, but it cannot evaluate a set of new events against those previously indexed.
Ideally your poll (and the API) would allow you to maintain a checkpoint of the last event you imported, and on subsequent polls, only collect events following that checkpoint.
That does rely on the API giving you a sequential record ID (with which you would handle the checkpointing logic yourself) or its own checkpointing function.
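To illustrate the idea, here is a minimal sketch of the checkpointing pattern as a standalone Python poller. It assumes the API accepts a since_id parameter and returns JSON records with a sequential data_id field; the endpoint and both names are placeholders for whatever your API actually offers.

import json
import os

import requests  # third-party library: pip install requests

CHECKPOINT_FILE = "last_id.txt"              # where the checkpoint is persisted
API_URL = "https://example.com/api/events"   # hypothetical endpoint

def load_checkpoint():
    # Return the last record id we indexed, or 0 on the very first poll.
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return int(f.read().strip())
    return 0

def save_checkpoint(last_id):
    # Persist the highest id seen so the next poll resumes after it.
    with open(CHECKPOINT_FILE, "w") as f:
        f.write(str(last_id))

def poll():
    last_id = load_checkpoint()
    # Hypothetical: ask the API only for records newer than the checkpoint.
    resp = requests.get(API_URL, params={"since_id": last_id}, timeout=30)
    resp.raise_for_status()
    events = resp.json()

    for event in events:
        print(json.dumps(event))  # emit to stdout for Splunk to index

    if events:
        save_checkpoint(max(e["data_id"] for e in events))

if __name__ == "__main__":
    poll()

Each run only requests records after the last id it saw, so reruns never re-import events that are already in the index.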
Currently, I am using the REST API Modular Input from Splunkbase. Is there any way I can manually set a checkpoint, as you mentioned, within this REST API Modular Input (in the UI)?
I also tried running a script external to Splunk, in which the API writes to a file, but it also produces the same data each time it is run. @nickhillscpl
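Would it make sense for that script to filter out records it has already written before Splunk monitors the file? Something like this rough sketch, where data_id and the file names are just placeholders for my actual fields:

import json

SEEN_FILE = "seen_ids.txt"    # ids already written in previous runs
OUTPUT_FILE = "events.log"    # the file Splunk monitors

def load_seen():
    # Load the ids written by earlier runs; empty set on the first run.
    try:
        with open(SEEN_FILE) as f:
            return set(f.read().split())
    except FileNotFoundError:
        return set()

def write_new_events(events):
    # Append only events whose data_id has not been written before.
    seen = load_seen()
    with open(OUTPUT_FILE, "a") as out, open(SEEN_FILE, "a") as seen_log:
        for event in events:
            key = str(event["data_id"])
            if key in seen:
                continue  # duplicate from a previous run: skip it
            out.write(json.dumps(event) + "\n")
            seen_log.write(key + "\n")
            seen.add(key)

if __name__ == "__main__":
    # Hypothetical sample payload; in practice this comes from the API call.
    sample = [{"data_id": 1, "column_1": "a"}, {"data_id": 2, "column_1": "b"}]
    write_new_events(sample)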