Splunk Search

Lookup Tables - Dedup

genesiusj
Builder

Hello,
I Googled and checked several answer posts, but perhaps I am not wording it correctly in the search engines.

I have a lookup table and I want to remove duplicates from the table itself. Not just when the table is being used.
There are 3 fields: ACCT, AUID, ADDR.
It is quite possible that a user may login from another PC, so I need to keep entries where the ACCT and AUID are the same but the ADDR is different. I using append=true in my outputlookup command to add new entries. Issue is, all entries are being added to the lookup, including those containing duplicate values of those 3 fields.
Here is my SPL (which is running in a dashboard).

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
    AND acct="$userId_tok$"
| stats count by acct, auid, addr 
| fields acct, auid, addr 
| head limit=0 
| table acct, auid, addr --> 
| rename acct AS ACCT, auid AS AUID, addr AS ADDR 
| table ACCT, AUID, ADDR 
| outputlookup myAAAlookup.csv append=true

I am aware that I can run this to remove duplicates at search time.

| inputlookup myAAAlookup.csv 
| dedup ACCT,AUID,ADDR
| outputlookup myAAAlookup.csv append=true

However, I want to remove all duplicate entries from the lookup table itself. The table should contain only 5 rows at this time of testing. Instead, there are over 300 duplicate rows, and growing each time the dashboard is run.

Thanks and God bless,
Genesius

0 Karma
1 Solution

cmerriman
Super Champion

Add the inputlookup command to your saved search to dedup before you output.
Run it without the outputlookup command first for testing purposes.

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
     AND acct="$userId_tok$"
 | stats count as _count by acct, auid, addr 
 | rename acct AS ACCT, auid AS AUID, addr AS ADDR 
 | inputlookup myAAAlookup.csv append=true
 | dedup ACCT AUID ADDR
 | outputlookup myAAAlookup.csv append=true

View solution in original post

cmerriman
Super Champion

Add the inputlookup command to your saved search to dedup before you output.
Run it without the outputlookup command first for testing purposes.

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
     AND acct="$userId_tok$"
 | stats count as _count by acct, auid, addr 
 | rename acct AS ACCT, auid AS AUID, addr AS ADDR 
 | inputlookup myAAAlookup.csv append=true
 | dedup ACCT AUID ADDR
 | outputlookup myAAAlookup.csv append=true

genesiusj
Builder

@cmerriman
Thank you for your reply.
I have a couple of questions.

  1. What is _count?
  2. I understand "append=true" for inputlookup. Why is it used on the outputlookup?

Thanks and God bless,
Genesius

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Community Content Calendar, September edition

Welcome to another insightful post from our Community Content Calendar! We're thrilled to continue bringing ...

Splunkbase Unveils New App Listing Management Public Preview

Splunkbase Unveils New App Listing Management Public PreviewWe're thrilled to announce the public preview of ...

Leveraging Automated Threat Analysis Across the Splunk Ecosystem

Are you leveraging automation to its fullest potential in your threat detection strategy?Our upcoming Security ...