Splunk Search

Lookup Tables - Dedup

genesiusj
Builder

Hello,
I Googled and checked several answer posts, but perhaps I am not wording it correctly in the search engines.

I have a lookup table and I want to remove duplicates from the table itself, not just when the table is being used.
There are 3 fields: ACCT, AUID, ADDR.
It is quite possible that a user may log in from another PC, so I need to keep entries where the ACCT and AUID are the same but the ADDR is different. I am using append=true in my outputlookup command to add new entries. The issue is that all entries are being added to the lookup, including those containing duplicate values of those 3 fields.
Here is my SPL (which is running in a dashboard).

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
    AND acct="$userId_tok$"
| stats count by acct, auid, addr 
| fields acct, auid, addr 
| head limit=0 
| table acct, auid, addr 
| rename acct AS ACCT, auid AS AUID, addr AS ADDR 
| table ACCT, AUID, ADDR 
| outputlookup myAAAlookup.csv append=true

I am aware that I can run this to remove duplicates at search time.

| inputlookup myAAAlookup.csv 
| dedup ACCT,AUID,ADDR
| outputlookup myAAAlookup.csv append=true

However, I want to remove all duplicate entries from the lookup table itself. The table should contain only 5 rows at this stage of testing. Instead, there are over 300 duplicate rows, and the number grows each time the dashboard is run.

Thanks and God bless,
Genesius

1 Solution

cmerriman
Super Champion

Add the inputlookup command to your saved search to dedup before you output.
Run it without the outputlookup command first for testing purposes.

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
     AND acct="$userId_tok$"
 | stats count as _count by acct, auid, addr 
 | rename acct AS ACCT, auid AS AUID, addr AS ADDR 
 | inputlookup myAAAlookup.csv append=true
 | dedup ACCT AUID ADDR
 | outputlookup myAAAlookup.csv append=true
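
For the test run mentioned above, a preview along these lines might help (a sketch only, not part of the original answer). It is the same pipeline with the final outputlookup replaced by table, so nothing is written to the lookup. The $userId_tok$ token is only filled in by the dashboard, so in the Search app it would need to be replaced with a literal account name.

```spl
index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log"
    AND acct="$userId_tok$"
| stats count as _count by acct, auid, addr
| rename acct AS ACCT, auid AS AUID, addr AS ADDR
| inputlookup myAAAlookup.csv append=true
| dedup ACCT AUID ADDR
| table ACCT AUID ADDR
```

If this returns one row per ACCT/AUID/ADDR combination, the write step can be added back.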

genesiusj
Builder

@cmerriman
Thank you for your reply.
I have a couple of questions.

  1. What is _count?
  2. I understand "append=true" for inputlookup. Why is it used on the outputlookup?

Thanks and God bless,
Genesius

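A hedged note on the second question, since it affects the result. outputlookup overwrites the lookup file by default, and append=true makes it add rows to whatever the file already holds. The search above has already pulled the existing rows in with inputlookup append=true and deduplicated the combined set, so writing that set back with append=true would put it on top of the existing file again. A minimal sketch of the variant that overwrites instead, assuming the lookup should hold only the three fields:

```spl
index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log"
    AND acct="$userId_tok$"
| stats count as _count by acct, auid, addr
| rename acct AS ACCT, auid AS AUID, addr AS ADDR
| inputlookup myAAAlookup.csv append=true
| dedup ACCT AUID ADDR
| table ACCT AUID ADDR
| outputlookup myAAAlookup.csv
```

On the first question: field names that begin with an underscore are treated by Splunk as internal fields and are hidden from the default results display, which is presumably why the count was named _count. The explicit table before outputlookup is an addition here, not part of the original answer; it makes the columns written to the file unambiguous.
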