Solved: Lookup Tables - Dedup

genesiusj · ‎10-04-2019

Hello,
I Googled and checked several answer posts, but perhaps I am not wording it correctly in the search engines.

I have a lookup table and I want to remove duplicates from the table itself. Not just when the table is being used.
There are 3 fields: ACCT, AUID, ADDR.
It is quite possible that a user may login from another PC, so I need to keep entries where the ACCT and AUID are the same but the ADDR is different. I using append=true in my outputlookup command to add new entries. Issue is, all entries are being added to the lookup, including those containing duplicate values of those 3 fields.
Here is my SPL (which is running in a dashboard).

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
    AND acct="$userId_tok$"
| stats count by acct, auid, addr 
| fields acct, auid, addr 
| head limit=0 
| table acct, auid, addr --> 
| rename acct AS ACCT, auid AS AUID, addr AS ADDR 
| table ACCT, AUID, ADDR 
| outputlookup myAAAlookup.csv append=true

I am aware that I can run this to remove duplicates at search time.

| inputlookup myAAAlookup.csv 
| dedup ACCT,AUID,ADDR
| outputlookup myAAAlookup.csv append=true

However, I want to remove all duplicate entries from the lookup table itself. The table should contain only 5 rows at this time of testing. Instead, there are over 300 duplicate rows, and growing each time the dashboard is run.

Thanks and God bless,
Genesius

cmerriman · ‎10-04-2019

Add the inputlookup command to your saved search to dedup before you output.
Run it without the outputlookup command first for testing purposes.

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
     AND acct="$userId_tok$"
 | stats count as _count by acct, auid, addr 
 | rename acct AS ACCT, auid AS AUID, addr AS ADDR 
 | inputlookup myAAAlookup.csv append=true
 | dedup ACCT AUID ADDR
 | outputlookup myAAAlookup.csv append=true

View solution in original post

cmerriman · ‎10-04-2019

Add the inputlookup command to your saved search to dedup before you output.
Run it without the outputlookup command first for testing purposes.

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
     AND acct="$userId_tok$"
 | stats count as _count by acct, auid, addr 
 | rename acct AS ACCT, auid AS AUID, addr AS ADDR 
 | inputlookup myAAAlookup.csv append=true
 | dedup ACCT AUID ADDR
 | outputlookup myAAAlookup.csv append=true

genesiusj · ‎10-08-2019

@cmerriman
Thank you for your reply.
I have a couple of questions.

What is _count?
I understand "append=true" for inputlookup. Why is it used on the outputlookup?

Thanks and God bless,
Genesius

Lookup Tables - Dedup

New This Month in Splunk Observability Cloud - Metrics Usage Analytics, Enhanced K8s ...

Alerting Best Practices: How to Create Good Detectors

Discover Powerful New Features in Splunk Cloud Platform: Enhanced Analytics, ...