Hi all, getting to grips with SPL and would be forever grateful if someone could lend their brain for the below:
I've got the lookup in the format below:
(Fields) --> host, os, os version
-----------------------------------------
(Values) ---> Server01, Windows, Windows Server 2019
But in my case, this lookup has 3000 field values, I want to know their source values in Splunk (This lookup was generated by a match condition with another, so I KNOW that these hosts are present in my Splunk env)
I basically need a way to do the following:
"| tstats values(sources) where index=* host=(WHATEVER IS IN MY LOOKUP HOST FIELD) by index, host"
But i can't seem to find a way, I did try to originally match the below:
| tstats values(source) where index=* by host, index
| join type=inner host
| [|inputlookup mylookup.csv | fields host | dedup host]
But my results were too large to handle by Splunk, plz help
Forget join - that is not a splunk way of doing things.
Use either a subsearch or a lookup - they may perform differently depending on data volumes, but you can do this
Subsearch method
| tstats values(sources) where index=*
[ | inputlookup mylookjup.csv | fields host | dedup host ]
by index, host
The subsearch will effectively return with ( host=x OR host=y OR host=z...) which is then used in the outer search.
Lookup method
| tstats values(sources) where index=* by index, host
| lookup mylookjup.csv host
This gets ALL the data from the indexes and then does the lookup to get the OS details.
You can always do
| where isnull(os)
which will then show those hosts that do not exist in the lookup that are found in the data.
Note that the lookup CSV will be case sensitive - if you want to make it insensitive, make a lookup definition and configure it as case insensitive
Appreciate your speedy reply!
I've just checked to see how by "subsearch" would run, I've changed my time picker to 30 mins and it hasn't run anything, its been stuck at 0 of 0 events matched with bar flashing....
Also, the second way "lookup" uses the same concept of my previous search, I will most likely run into a "VV data is too large for serialization" error
The reason why your subsearch is taking a long time is _probably_ due to the volume of hosts, because using a large X=Y OR A=B OR C=D expression in the search can be very slow to get parsed and setup, hence the lookup option can often be the better option.
The second way is fundamentally on a different planet to your concept of the previous search. Using join in itself is limited and using join+inputlookup is a completely wrong way to use lookups. The lookup command is designed to enrich data with results from a lookup. If a result cannot be found in the lookup, you will not get results from the lookup and you can validate that state.
Have you tried it?
I've just checked to see how by "subsearch" would run, I've changed my time picker to 30 mins and it hasn't run anything, its been stuck at 0 of 0 events matched with
The question is: have you manually verified that your data in the new time period actually contain matching hosts? Here is a quick way to confirm that the subsearch method works.
1. Select a couple hosts that you know some events match in this time period match. Say, host1 and host2. Run this search with the chosen time picker.
| tstats values(source)
where index=* (host="host1" OR host="host2")
earliest=-30m@h latest=-0@h
by host, index
If this search gives you no output, you need to find another couple of hosts till you can get non-zero output. (Hint: it is best to run all tests with a time boundaries where you know your tests will not cross. I would suggesting using fixed earliest/latest rather than time picker. For example, earliest=-30m@h latest=-0m@h)
2. Make sure that host1 and host2 exist in mylookup.csv with this search
| inputookup mylookup.csv where host IN (host1, host2)
| fields host
| dedup host
| format
As @bowesmana explained, the output should be something like
search |
( ( host="host1" ) OR ( host="host2" ) ) |
If your result is different, that means mylookup.csv does not host1 and host2. You then need to redesign/repopulate your lookup table.
3. Run the following combined search
| tstats values(source)
where index=*
[inputookup mylookup.csv where host IN (host1, host2)
| fields host
| dedup host]
earliest=-30m@h latest=-0@h
by host, index
This search should give you the exact same results as the first one.
Then, you just remove there where clause in the subsearch.
Also, the second way "lookup" uses the same concept of my previous search, I will most likely run into a "VV data is too large for serialization" error
Why do you say @bowesmana's lookup method uses the same concept as your join method? Have you tried it? It is totally different because it doesn't involve join. Are you suggesting that | tstats values(source) where index=* by host, index always gives you that error? What about | tstats values(source) where index=* by host? What about | tstats values(source) where index=*? If these searches give you error, you may have some fundamental problem in your indexer. No amount of SPL can save the day.
Given the chance, however, I would use the subsearch method because it is the fastest.