We would like to use Splunk and DB Connect to archive unique records for several datasets. Each of these datasets is produced daily through batch processing, and over 95% of each day's records are unchanged from the previous run. We would like to collect and index only the new and changed records from each daily run.
Has anyone created an effective, working method to do this type of filtering?
In each case, the inputs that need to be filtered are single table/view database sources. Therefore, we need a "blended" rising column (built from multiple column values) that can distinguish new or changed records in the daily batch from records already indexed in Splunk.
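One common way to get a "blended" rising column is to synthesize a single, monotonically increasing value inside the input's SQL query, then use that synthetic column as the rising column. The sketch below is a hypothetical example (the table and column names `daily_batch`, `batch_date`, and `seq_id` are assumptions, not from the original post); DB Connect substitutes the last checkpoint value for the `?` placeholder:

```sql
-- Blend a date and a sequence number into one sortable string.
-- Fixed-width padding matters: the comparison is lexicographic,
-- so every component must be zero-padded to a constant width.
SELECT d.*,
       CONCAT(CAST(d.batch_date AS CHAR(10)),
              LPAD(CAST(d.seq_id AS CHAR), 12, '0')) AS blended_key
FROM daily_batch d
WHERE CONCAT(CAST(d.batch_date AS CHAR(10)),
             LPAD(CAST(d.seq_id AS CHAR), 12, '0')) > ?
ORDER BY blended_key ASC
```

Note that this only detects records whose blended key advances; it will not catch an in-place update that leaves both components unchanged, so the components you blend must themselves change whenever the record does (e.g. a last-modified timestamp).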
I would create a separate DB Connect input for each rising column you want to track. Each input can run the same basic query, but with a different rising column configured.
For example, if you use data from tables A, B, and C in your query, could you have one input that will ingest new/updated records from table A, and another input to ingest new/updated records from table B?
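As a sketch of that idea, the two inputs below share the same join but rise on different columns. All table and column names (`table_a`, `table_b`, `updated_at`) are hypothetical; the `?` is DB Connect's checkpoint placeholder:

```sql
-- Input 1: rises on table A's last-modified timestamp,
-- so it picks up rows where A changed.
SELECT a.*, b.status
FROM table_a a
JOIN table_b b ON b.a_id = a.id
WHERE a.updated_at > ?
ORDER BY a.updated_at ASC

-- Input 2: same query shape, but rises on table B's timestamp,
-- so it picks up rows where B changed.
SELECT a.*, b.status
FROM table_a a
JOIN table_b b ON b.a_id = a.id
WHERE b.updated_at > ?
ORDER BY b.updated_at ASC
```

One consequence of this design: a row whose change touches both tables can be ingested by both inputs, so you may still want a dedup step at search time (e.g. `dedup` on a record key) or an index-time strategy that tolerates duplicates.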