Problem:
i have 200000 splunk events from which i only want 15000 events ( like vlookup in excel)
Splunk events contain(200000 hosts ) : host version kernel
lookuptable contain (15000 hosts): host window
i only want result for the host listed in lookup table(15000)
i tried using join but it truncates the result .
Query : | inputlookup "host.csv" | rename HOST as host | join host [search index=server_info platform=redhat message="SYSINFO*" host="*" ] | table host version kernel window | where window != " "
Hi manishyadav91,
In subsearches you have the limit of 50,000 results so you must put the search as first.
Then to use the join command with many events isn't a good idea becaus your search will be very slow.
So you could change your search like the following:
index=server_info platform=redhat message="SYSINFO*" host="*" [ | inputlookup "host.csv" | rename HOST as host | fields host ]
| lookup host.csv HOST AS host OUTPUT version kernel window
| table host version kernel window
Bye.
Giuseppe
Hi Giuseppe,
Search 1 : index=server_info platform=redhat message="SYSINFO*" host="*"
| dedup host | table host Window kernel version
results 29467
Search 2 : | inputlookup Host-Q3.csv | table host Window
Result: 15679
Search combined : index=cba_chef platform=redhat message="SYSINFO*" host="*"
| lookup Host-Q3.csv host OUTPUT Window | where Window != " " | dedup host |
table host Window kernel version
results 3599
i dont know why the results are not complete, ideally combined searc should give 15000 events but it doesn't , i have tried all the solutions listed below but same results .
is there any other way to to search only limited events/host from the whole load of events.
Do not inputlookup and join. Do the search and use lookup as a lookup. You are breaking map reduce in several ways plus hitting limits.
SearchHere | lookup host.csv HOST as host OUTPUT HOST as isFound | where isnotnull(isFound) | stats count by host version kernel window | fields - count
The default for subsearch is as follows.
I think that it is good to use the LOOKUP command by setting the LOOKUP definition.
https://docs.splunk.com/Documentation/Splunk/7.3.1/Search/Aboutsubsearches
[subsearch]
maxout = <integer>
Maximum number of results to return from a subsearch.
This value cannot be greater than or equal to 10500.
Defaults to 10000.
maxtime = <integer>
Maximum number of seconds to run a subsearch before finalizing
Defaults to 60.
ttl = <integer>
Time to cache a given subsearch's results, in seconds.
Do not set this below 120 seconds.
Defaults to 300.
yes , the default maxout is 50000 but my search output is 2450000. so i get the first 50000 which includes non relevant events.
Is there a problem with using the lookup command?
index=server_info platform=redhat message="SYSINFO*" host="*"
|lookup host.csv host OUTPUT XXX,XXX,XXX
index=server_info platform=redhat message="SYSINFO*" host="*" |lookup host.csv host OUTPUT Window | table host Window kernel version |where Window != " "
i tried this too, it only give partial results 4400 instead of 15000 host in csv lookup table.
Please double-check the events that have not been set up and the contents of the lookup.
Or ...
LOOKUP does not exclude events. In the case of a distributed configuration, the LOOKUP table may not be synchronized on the indexer side.
Try setting "local=true"
Optional arguments
local
Syntax: local=
Description: If local=true, forces the lookup to run on the search head and not on any remote peers.
Default: false
Can this be done in any other way ?
It will synchronize after a while. See the following blog for commands.
https://www.splunk.com/blog/2017/06/08/syncing-lookups-using-pure-spl.html