Passing a time restriction while using join

blablabla · ‎12-03-2021

Hello,

I would like to ask, if it is possible to pass a time restriction to a subsearch of an join ? Unfortunately I did not find anything fitting in the forum.

In my specific case I would like to enrich the results of search1 with the last event of search2, in which the ID is equal and the timestamp of search2 is not more than 5 minutes before the timestamp of search1.

index="summary_index" search_name="search1" ...|fields _time ID ...
|join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID [search index="summary_index" search_name="search2" |fields ...]

Does someone have an idea?

Thanks in advance!

PickleRick · ‎12-03-2021

As @ITWhisperer already pointed out, the subsearch is evaluated first, so you can't affect it from within the outer search. You could try to reverse the order (do the subsearch as main search and vice-versa) or use map to spawn subsearch for each outer search result separately but that's a very ugly an inefficient solution.

The best way would be if you could rephrase your search to not use subsearches.

blablabla · ‎12-08-2021

Hallo @PickleRick ,

thanks for your answer. Is there also no way to use a where-clause?

The outer search has a very low number of results. Maybe the inefficient solution would also be fine for me. How would a solution like this be designed?

PickleRick · ‎12-08-2021

https://docs.splunk.com/Documentation/Splunk/8.2.3/SearchReference/Map

But that's a very, very ugly solution. And very rarely it's the right way to go.

It's hard to be specific if we don't know what is your particular use case but there is probably a way to do it with where (to compare two timestamp fields of a result set and filter based on that information).

blablabla · ‎12-08-2021

Thank you, in this case I will avoid the map.

The usecase is actually quite simple. The first search (search1) gets me error messages. They have a time and an ID. I have another search (search2) which gives me informations about operations made. They have the same ID (because the ID is referenced to the device).

index="summary_index" search_name="search1" ...|fields _time ID ...
|join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID [search index="summary_index" search_name="search2" |fields ...]

so basically I want to use the current where clause, in which the ID are matched, with something where there is a time criterium. I thought about something similar to (I know, the syntax is wrong, sorry for this)

join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID and (L._time-R._time)<300

PickleRick · ‎12-08-2021

Unless you have many-to-many relationship, which would make it a bit more complicated, the possible solution is to use eventstats.

Just get all your events into one set of results either with a simple alternative (search conditions 1) OR (search conditions 2) or with append/multisearch. Then you might need some trickery to set distinct fields for results of search1 vs search2 (if they "natively" have separate sets of fields you're good to go on the fly, but otherwise you might need some conditional evaluation. And finally you just call eventstats and where.

For example (I assume that both errors and ops indexes have events that have ID as well as message field):

index=errors OR index=ops
| eval opsname=if(index="ops",message,null())
| eval message=if(index="ops",null(),message)
| eventstats values(opsname) as opsname values(_time) as opstime by ID
| search message=*
| where opstime-_time<=300

That's without using append or multisearch.

You can also rephrase it to use a subsearch to generate some subset of your data to analyze:

index=errors
| fields _time message ID
| append
  [ index=ops
    | rename message as opsmessage 
    | fields opsmessage ID _time ]
| eventstats values(opsname) as opsname values(_time) as opstime by ID
| search message=*
| where opstime-_time<=300

EDIT: sorry for ommiting that, you'll also have to rename _time to opstime otherwise the stats won't work properly. But you should get my drift.

ITWhisperer · ‎12-03-2021

Essentially, this cannot be done with a join - information passes right to left as the subsearch is executed before being joined to events pipeline.

blablabla · ‎12-03-2021

Thanks for the reply @ITWhisperer

Is there also no where clause that I could use in the join?

Or is there an alternative to the left join for enriching the data of search 1 with data of search 2 in case that the criteria are met (older than the event, but max 5 mins older and containing the same ID)?

ITWhisperer · ‎12-03-2021

You might be able to append the second search, sort by time and do some stats-style commands to gather the correlated events together. It is hard to say without a detailed view of the events you are dealing with.

johnhuang · ‎12-03-2021

You can use transactions to do this. However there are limits to transactions -- how large are the datasets and the number results you are expecting?

blablabla · ‎12-03-2021

Thanks for your answer @johnhuang

The search results represent an error case which is quite rare. So the dataset itself is large (10k/day), but the expected result of the outer search is <10/day. The search will run on daily basis with the timespan of 1d.

Passing a time restriction while using join

join

subsearch

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability

Join the Conversation

Passing a time restriction while using join

join

subsearch

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability