Splunk Search

Lookup Tables - Dedup

genesiusj
Builder

Hello,
I Googled and checked several answer posts, but perhaps I am not wording it correctly in the search engines.

I have a lookup table and I want to remove duplicates from the table itself. Not just when the table is being used.
There are 3 fields: ACCT, AUID, ADDR.
It is quite possible that a user may login from another PC, so I need to keep entries where the ACCT and AUID are the same but the ADDR is different. I using append=true in my outputlookup command to add new entries. Issue is, all entries are being added to the lookup, including those containing duplicate values of those 3 fields.
Here is my SPL (which is running in a dashboard).

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
    AND acct="$userId_tok$"
| stats count by acct, auid, addr 
| fields acct, auid, addr 
| head limit=0 
| table acct, auid, addr --> 
| rename acct AS ACCT, auid AS AUID, addr AS ADDR 
| table ACCT, AUID, ADDR 
| outputlookup myAAAlookup.csv append=true

I am aware that I can run this to remove duplicates at search time.

| inputlookup myAAAlookup.csv 
| dedup ACCT,AUID,ADDR
| outputlookup myAAAlookup.csv append=true

However, I want to remove all duplicate entries from the lookup table itself. The table should contain only 5 rows at this time of testing. Instead, there are over 300 duplicate rows, and growing each time the dashboard is run.

Thanks and God bless,
Genesius

0 Karma
1 Solution

cmerriman
Super Champion

Add the inputlookup command to your saved search to dedup before you output.
Run it without the outputlookup command first for testing purposes.

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
     AND acct="$userId_tok$"
 | stats count as _count by acct, auid, addr 
 | rename acct AS ACCT, auid AS AUID, addr AS ADDR 
 | inputlookup myAAAlookup.csv append=true
 | dedup ACCT AUID ADDR
 | outputlookup myAAAlookup.csv append=true

View solution in original post

cmerriman
Super Champion

Add the inputlookup command to your saved search to dedup before you output.
Run it without the outputlookup command first for testing purposes.

index="linuxevents" AND host=rub.us AND source="/var/log/audit/audit.log" 
     AND acct="$userId_tok$"
 | stats count as _count by acct, auid, addr 
 | rename acct AS ACCT, auid AS AUID, addr AS ADDR 
 | inputlookup myAAAlookup.csv append=true
 | dedup ACCT AUID ADDR
 | outputlookup myAAAlookup.csv append=true

genesiusj
Builder

@cmerriman
Thank you for your reply.
I have a couple of questions.

  1. What is _count?
  2. I understand "append=true" for inputlookup. Why is it used on the outputlookup?

Thanks and God bless,
Genesius

0 Karma
Get Updates on the Splunk Community!

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

This challenge was first posted on Slack #puzzles channelFor a previous puzzle, I needed a set of fixed-length ...

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

  🚀 Your data just got a serious AI upgrade — are you ready? Say hello to the Agentic Era with the ...

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...