Splunk Search

Why am I still seeing multiple entries in my results from a lookup table that should have been filtered out by another lookup?

lbogle
Contributor

Hello Splunkers,

I have what I think should be an easy question, but I'm not able to make it happen.
I have two lookup tables, each with one field listing host names. The field is named "computer" and the host names are all basic alphanumeric values like "pc-username", "pc-username2", etc.

| inputlookup filename.csv  
| inputlookup filename1.csv 

Filename.csv has 700 entries with hostname values (all lower case)
Filename1.csv has about 13,000 entries with hostname values (all lower case)

I would like to filter out all the values in filename1.csv from filename.csv.
I am doing this with the following search:

| inputlookup filename.csv  | fields computer  | search NOT [|inputlookup filename1.csv | fields computer] 

This drastically reduces the number of computers down to about 611; however, when I spot check against the lookup table, there are still multiple entries from filename1.csv in the resulting report. Both lookup tables have been deduplicated.

What am I missing?

Thanks for your help!

1 Solution

tom_frotscher
Builder

Hi, I think the solution is that a subsearch only returns 10,000 results by default. So the hostnames you "spot check" may be among the roughly 3,000 hosts that are not returned by the subsearch and therefore never filtered out.

You can set the limit to a higher number and your search will work. Here is some additional info on how to do it:

About Subsearches
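
For example, appending format maxresults to the end of the subsearch raises the cap for that one search without editing any config files. This is a sketch based on the search from the question, where 500000 is just an example value:

| inputlookup filename.csv | fields computer | search NOT [| inputlookup filename1.csv | fields computer | format maxresults=500000]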

Greetings

Tom



lbogle
Contributor

Gooooood....
I added the following entries to my limits.conf file

[inputproc]
file_tracking_db_threshold_mb = 500
[searchresults]
maxresultrows=500000
maxout=500000
subsearch_maxout=500000

and the results still weren't coming back correctly.

Then I saw the line about piping format maxresults=500000 onto the end of the subsearch, and BAM! Totally different results. I will spot check and let you know, but this looks like a much more accurate number based on my knowledge of the contents of each file, and I suspect this was the missing piece of the puzzle.

Faith in Splunk restored.
Carry on.
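
For reference: the stanzas above target [inputproc] and [searchresults], but on many Splunk versions the subsearch output cap lives in its own [subsearch] stanza in limits.conf. This is a sketch, not a verified config for every version, so check the limits.conf reference for yours:

[subsearch]
maxout = 500000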


markthompson
Builder

Would it not be easier to use the | dedup computer functionality? It removes duplicates from your results. Try running this at the end of your search:

dedup computer

lbogle
Contributor

Hi Mark,
Thank you for your reply. It would be, but deduplicating the files is not my end goal; I am trying to filter one lookup table based on the contents of another. I mentioned the dedup only to communicate that the .csv files were deduplicated before being ingested into Splunk as lookup tables, so duplicate rows shouldn't be the cause of the issue here.
Thank you for taking the time to reply, Mark.
I appreciate it.
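
For anyone landing here later, there is also a way to do this filtering without a subsearch at all, which sidesteps the subsearch result limit entirely. This is a sketch that assumes filename1.csv can be referenced by the lookup command (on some versions this requires a lookup definition rather than the raw file name), and in_filename1 is a hypothetical field name used only to mark rows that also appear in filename1.csv:

| inputlookup filename.csv | lookup filename1.csv computer OUTPUT computer AS in_filename1 | where isnull(in_filename1)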
