Splunk Search

Lookup Tables - Dedup

genesiusj
Builder

Hello,
I Googled and checked several answer posts, but perhaps I am not wording it correctly in the search engines.

I have a lookup table and I want to remove duplicates from the table itself, not just when the table is being used.
There are 3 fields: ACCT, AUID, ADDR.
It is quite possible that a user may log in from another PC, so I need to keep entries where the ACCT and AUID are the same but the ADDR is different. I am using append=true in my outputlookup command to add new entries. The issue is that all entries are being added to the lookup, including rows that duplicate the values of all 3 fields.
Here is my SPL (which is running in a dashboard).

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
    AND acct="$userId_tok$"
| stats count by acct, auid, addr 
| fields acct, auid, addr 
| head limit=0 
| table acct, auid, addr --> 
| rename acct AS ACCT, auid AS AUID, addr AS ADDR 
| table ACCT, AUID, ADDR 
| outputlookup myAAAlookup.csv append=true

I am aware that I can run this to remove duplicates at search time.

| inputlookup myAAAlookup.csv 
| dedup ACCT,AUID,ADDR
| outputlookup myAAAlookup.csv append=true

However, I want to remove all duplicate entries from the lookup table itself. The table should contain only 5 rows at this stage of testing. Instead, there are over 300 duplicate rows, and the number grows each time the dashboard is run.

Thanks and God bless,
Genesius

0 Karma
1 Solution

cmerriman
Super Champion

Add the inputlookup command to your saved search to dedup before you output.
Run it without the outputlookup command first for testing purposes.

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
     AND acct="$userId_tok$"
 | stats count as _count by acct, auid, addr 
 | rename acct AS ACCT, auid AS AUID, addr AS ADDR 
 | inputlookup myAAAlookup.csv append=true
 | dedup ACCT AUID ADDR
 | outputlookup myAAAlookup.csv append=true
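
For the rows that are already duplicated in the file, a one-time cleanup could look like the sketch below. It relies on outputlookup's default of append=false, which overwrites the lookup file with the search results instead of adding to it. The file and field names are the ones from the question.

```spl
| inputlookup myAAAlookup.csv
| dedup ACCT AUID ADDR
| outputlookup myAAAlookup.csv
```

With append=true on the final outputlookup, the deduplicated rows would be added to a file that still holds every duplicate, so the file would keep growing. It is probably worth running the first two lines on their own and checking the row count before letting the third line overwrite the file.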

genesiusj
Builder

@cmerriman
Thank you for your reply.
I have a couple of questions.

  1. What is _count?
  2. I understand "append=true" for inputlookup. Why is it used on the outputlookup?

Thanks and God bless,
Genesius
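
On the two questions, as far as I can tell: "count as _count" only renames the count column that stats produces. Splunk treats field names with a leading underscore as internal fields and hides them from the default results display, so the apparent intent is to keep the count out of the lookup. I am not certain that outputlookup drops such fields in every version, so the sketch below removes the count explicitly instead. As for append=true on outputlookup, it does not look necessary here. Once inputlookup append=true has added the existing lookup rows to the new results and dedup has removed the repeats, the result set is already the complete table, so writing it with the default overwrite behavior should be enough. Appending it would add the whole table to the file again.

```spl
index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log"
    AND acct="$userId_tok$"
| stats count by acct, auid, addr
| rename acct AS ACCT, auid AS AUID, addr AS ADDR
| fields ACCT AUID ADDR
| inputlookup append=true myAAAlookup.csv
| dedup ACCT AUID ADDR
| outputlookup myAAAlookup.csv
```

This is a sketch under the assumption that the lookup holds only the three fields ACCT, AUID, and ADDR. Running it without the final outputlookup line first shows exactly which rows would be written.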

0 Karma