Solved: Why am I still seeing multiple entries in my results from a lookup table that should have been filtered out by another lookup?

lbogle · 02-25-2015

Hello Splunkers,

I have what I think should be an easy question, but I'm not able to make it happen.
I have two lookup tables, each with one field listing host names. The field is named "computer" and the host names are all basic alphanumeric values such as "pc-username" or "pc-username2".

| inputlookup filename.csv  
| inputlookup filename1.csv

filename.csv has 700 entries with hostname values (all lower case)
filename1.csv has about 13,000 entries with hostname values (all lower case)

I would like to filter out all the values in filename1.csv from filename.csv.
I am doing this with the following search:

| inputlookup filename.csv  | fields computer  | search NOT [|inputlookup filename1.csv | fields computer]

This drastically reduces the number of computers from 13,000 down to about 611; however, when I spot check against the lookup table, there are still multiple entries from filename1.csv in the resulting report. Both lookup tables have been deduplicated.

What am I missing?

Thanks for your help!

tom_frotscher · 02-26-2015

Hi, I think the answer is that a subsearch only returns 10,000 results by default. So the hostnames you "spot check" may be among the roughly 3,000 hosts left over that are not returned by the subsearch.
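
A quick way to check whether this limit is in play is to count the rows in the second lookup and compare the number against 10,000. A minimal sketch, using the file name from the question:

```
| inputlookup filename1.csv
| stats count
```

If the count is above 10,000, the rows beyond the limit would never reach the NOT filter, which would be consistent with the leftover entries described in the question.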

You can set the limit to a higher number and your search will work. Here is some additional info on how to do it:

About Subsearches

Greetings

Tom

lbogle · 02-26-2015

Gooooood....
I added the following entries to my limits.conf file:

[inputproc]
file_tracking_db_threshold_mb = 500
[searchresults]
maxresultrows=500000
maxout=500000
subsearch_maxout=500000

and nothing was coming back successfully.

Then I saw the line about piping format maxresults=500000 onto the end of the subsearch and BAM! Totally different results. I will spot check and let you know. This seems to be a more accurate number, though, based on what I know of the contents of each file, and I suspect this was the missing piece of the puzzle.
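
For anyone following along, the search with format appended to the subsearch would look roughly like this. It is a sketch reconstructed from the description above rather than a copy of the exact search that was run; the lookup and field names come from the original question, and 500000 is simply the value mentioned in this post, not a recommended setting.

```
| inputlookup filename.csv
| fields computer
| search NOT [| inputlookup filename1.csv | fields computer | format maxresults=500000]
```

The explicit format call at the end of the subsearch replaces the implicit one Splunk would otherwise add, which is where the higher maxresults value takes effect.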

Faith in Splunk restored.
Carry on.

markthompson · 02-26-2015

Would it not be easier to use the | dedup computer functionality? It removes duplicates from your results. Try running that at the end of your search:

dedup computer

lbogle · 02-26-2015

Hi Mark,
Thank you for your reply. It would be, but deduplicating the files is not my end goal. I am trying to filter a lookup table based on the contents of another lookup table. I mentioned that the .csv files were deduplicated before the lookup tables were ingested into Splunk to rule out duplicates as a cause of the results here.
Thank you for taking the time to reply, Mark.
I appreciate it.
