Re: results are being truncated in join query

manishyadav91 · ‎09-25-2019

Problem:

i have 200000 splunk events from which i only want 15000 events ( like vlookup in excel)

Splunk events contain(200000 hosts ) : host version kernel
lookuptable contain (15000 hosts): host window
i only want result for the host listed in lookup table(15000)

i tried using join but it truncates the result .

gcusello · ‎09-26-2019

Hi manishyadav91,
In subsearches you have the limit of 50,000 results so you must put the search as first.
Then to use the join command with many events isn't a good idea becaus your search will be very slow.
So you could change your search like the following:

index=server_info platform=redhat message="SYSINFO*" host="*"  [ | inputlookup "host.csv" | rename HOST as host | fields host ]
| lookup host.csv HOST AS host OUTPUT version kernel window 
| table host version kernel window

Bye.
Giuseppe

manishyadav91 · ‎09-28-2019

Hi Giuseppe,

Search 1 : index=server_info platform=redhat message="SYSINFO*" host="*"

| dedup host | table host Window kernel version
results 29467

Search 2 : | inputlookup Host-Q3.csv | table host Window
Result: 15679

Search combined : index=cba_chef platform=redhat message="SYSINFO*" host="*"

| lookup Host-Q3.csv host OUTPUT Window | where Window != " " | dedup host |
table host Window kernel version
results 3599

i dont know why the results are not complete, ideally combined searc should give 15000 events but it doesn't , i have tried all the solutions listed below but same results .

is there any other way to to search only limited events/host from the whole load of events.

starcher · ‎09-26-2019

Do not inputlookup and join. Do the search and use lookup as a lookup. You are breaking map reduce in several ways plus hitting limits.

SearchHere | lookup host.csv HOST as host OUTPUT HOST as isFound | where isnotnull(isFound) | stats count by host version kernel window | fields - count

HiroshiSatoh · ‎09-25-2019

The default for subsearch is as follows.
I think that it is good to use the LOOKUP command by setting the LOOKUP definition.

https://docs.splunk.com/Documentation/Splunk/7.3.1/Search/Aboutsubsearches

[subsearch]
maxout = <integer>

Maximum number of results to return from a subsearch.
This value cannot be greater than or equal to 10500.
Defaults to 10000.
maxtime = <integer>

Maximum number of seconds to run a subsearch before finalizing
Defaults to 60.
ttl = <integer>

Time to cache a given subsearch's results, in seconds.
Do not set this below 120 seconds.
Defaults to 300.

manishyadav91 · ‎09-25-2019

yes , the default maxout is 50000 but my search output is 2450000. so i get the first 50000 which includes non relevant events.

HiroshiSatoh · ‎09-25-2019

Is there a problem with using the lookup command?

 index=server_info platform=redhat message="SYSINFO*" host="*"
|lookup host.csv host OUTPUT XXX,XXX,XXX

manishyadav91 · ‎09-25-2019

index=server_info platform=redhat message="SYSINFO*" host="*" |lookup host.csv host OUTPUT Window | table host Window kernel version |where Window != " "

i tried this too, it only give partial results 4400 instead of 15000 host in csv lookup table.

HiroshiSatoh · ‎09-25-2019

Please double-check the events that have not been set up and the contents of the lookup.

Or ...
LOOKUP does not exclude events. In the case of a distributed configuration, the LOOKUP table may not be synchronized on the indexer side.

Try setting "local=true"

Optional arguments
local
Syntax: local=
Description: If local=true, forces the lookup to run on the search head and not on any remote peers.
Default: false

manishyadav91 · ‎09-25-2019

Can this be done in any other way ?

HiroshiSatoh · ‎09-25-2019

It will synchronize after a while. See the following blog for commands.

https://www.splunk.com/blog/2017/06/08/syncing-lookups-using-pure-spl.html

results are being truncated in join query

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics!

New in Observability Cloud - Explicit Bucket Histograms