i have 200000 splunk events from which i only want 15000 events ( like vlookup in excel)
Splunk events contain(200000 hosts ) : host version kernel
lookuptable contain (15000 hosts): host window
i only want result for the host listed in lookup table(15000)
i tried using join but it truncates the result .
Query : | inputlookup "host.csv" | rename HOST as host | join host [search index=server_info platform=redhat message="SYSINFO*" host="*" ] | table host version kernel window | where window != " "
In subsearches you have the limit of 50,000 results so you must put the search as first.
Then to use the join command with many events isn't a good idea becaus your search will be very slow.
So you could change your search like the following:
index=server_info platform=redhat message="SYSINFO*" host="*" [ | inputlookup "host.csv" | rename HOST as host | fields host ] | lookup host.csv HOST AS host OUTPUT version kernel window | table host version kernel window
Search 1 : index=server_info platform=redhat message="SYSINFO*" host="*"
| dedup host | table host Window kernel version
Search 2 : | inputlookup Host-Q3.csv | table host Window
Search combined : index=cba_chef platform=redhat message="SYSINFO*" host="*"
| lookup Host-Q3.csv host OUTPUT Window | where Window != " " | dedup host |
table host Window kernel version
i dont know why the results are not complete, ideally combined searc should give 15000 events but it doesn't , i have tried all the solutions listed below but same results .
is there any other way to to search only limited events/host from the whole load of events.
Do not inputlookup and join. Do the search and use lookup as a lookup. You are breaking map reduce in several ways plus hitting limits.
SearchHere | lookup host.csv HOST as host OUTPUT HOST as isFound | where isnotnull(isFound) | stats count by host version kernel window | fields - count
The default for subsearch is as follows.
I think that it is good to use the LOOKUP command by setting the LOOKUP definition.
[subsearch] maxout = <integer> Maximum number of results to return from a subsearch. This value cannot be greater than or equal to 10500. Defaults to 10000. maxtime = <integer> Maximum number of seconds to run a subsearch before finalizing Defaults to 60. ttl = <integer> Time to cache a given subsearch's results, in seconds. Do not set this below 120 seconds. Defaults to 300.
index=server_info platform=redhat message="SYSINFO*" host="*" |lookup host.csv host OUTPUT Window | table host Window kernel version |where Window != " "
i tried this too, it only give partial results 4400 instead of 15000 host in csv lookup table.
Please double-check the events that have not been set up and the contents of the lookup.
LOOKUP does not exclude events. In the case of a distributed configuration, the LOOKUP table may not be synchronized on the indexer side.
Try setting "local=true"
Description: If local=true, forces the lookup to run on the search head and not on any remote peers.