Hello,
I would like to ask, if it is possible to pass a time restriction to a subsearch of an join ? Unfortunately I did not find anything fitting in the forum.
In my specific case I would like to enrich the results of search1 with the last event of search2, in which the ID is equal and the timestamp of search2 is not more than 5 minutes before the timestamp of search1.
index="summary_index" search_name="search1" ...|fields _time ID ...
|join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID [search index="summary_index" search_name="search2" |fields ...]
Does someone have an idea?
Thanks in advance!
As @ITWhisperer already pointed out, the subsearch is evaluated first, so you can't affect it from within the outer search. You could try to reverse the order (do the subsearch as main search and vice-versa) or use map to spawn subsearch for each outer search result separately but that's a very ugly an inefficient solution.
The best way would be if you could rephrase your search to not use subsearches.
Hallo @PickleRick ,
thanks for your answer. Is there also no way to use a where-clause?
The outer search has a very low number of results. Maybe the inefficient solution would also be fine for me. How would a solution like this be designed?
https://docs.splunk.com/Documentation/Splunk/8.2.3/SearchReference/Map
But that's a very, very ugly solution. And very rarely it's the right way to go.
It's hard to be specific if we don't know what is your particular use case but there is probably a way to do it with where (to compare two timestamp fields of a result set and filter based on that information).
Thank you, in this case I will avoid the map.
The usecase is actually quite simple. The first search (search1) gets me error messages. They have a time and an ID. I have another search (search2) which gives me informations about operations made. They have the same ID (because the ID is referenced to the device).
index="summary_index" search_name="search1" ...|fields _time ID ...
|join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID [search index="summary_index" search_name="search2" |fields ...]
so basically I want to use the current where clause, in which the ID are matched, with something where there is a time criterium. I thought about something similar to (I know, the syntax is wrong, sorry for this)
join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID and (L._time-R._time)<300
Unless you have many-to-many relationship, which would make it a bit more complicated, the possible solution is to use eventstats.
Just get all your events into one set of results either with a simple alternative (search conditions 1) OR (search conditions 2) or with append/multisearch. Then you might need some trickery to set distinct fields for results of search1 vs search2 (if they "natively" have separate sets of fields you're good to go on the fly, but otherwise you might need some conditional evaluation. And finally you just call eventstats and where.
For example (I assume that both errors and ops indexes have events that have ID as well as message field):
index=errors OR index=ops
| eval opsname=if(index="ops",message,null())
| eval message=if(index="ops",null(),message)
| eventstats values(opsname) as opsname values(_time) as opstime by ID
| search message=*
| where opstime-_time<=300
That's without using append or multisearch.
You can also rephrase it to use a subsearch to generate some subset of your data to analyze:
index=errors
| fields _time message ID
| append
[ index=ops
| rename message as opsmessage
| fields opsmessage ID _time ]
| eventstats values(opsname) as opsname values(_time) as opstime by ID
| search message=*
| where opstime-_time<=300
EDIT: sorry for ommiting that, you'll also have to rename _time to opstime otherwise the stats won't work properly. But you should get my drift.
Essentially, this cannot be done with a join - information passes right to left as the subsearch is executed before being joined to events pipeline.
Thanks for the reply @ITWhisperer
Is there also no where clause that I could use in the join?
Or is there an alternative to the left join for enriching the data of search 1 with data of search 2 in case that the criteria are met (older than the event, but max 5 mins older and containing the same ID)?
You might be able to append the second search, sort by time and do some stats-style commands to gather the correlated events together. It is hard to say without a detailed view of the events you are dealing with.
You can use transactions to do this. However there are limits to transactions -- how large are the datasets and the number results you are expecting?
Thanks for your answer @johnhuang
The search results represent an error case which is quite rare. So the dataset itself is large (10k/day), but the expected result of the outer search is <10/day. The search will run on daily basis with the timespan of 1d.