Splunk Search

Passing a time restriction while using join

blablabla
Path Finder

Hello,

I would like to ask, if it is possible to pass a time restriction to a subsearch of an join ? Unfortunately I did not find anything fitting in the forum.

In my specific case I would like to enrich the results of search1 with the last event of search2, in which the ID is equal and the timestamp of search2 is not more than 5 minutes before the timestamp of search1.

 

index="summary_index" search_name="search1" ...|fields _time ID ...
|join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID [search index="summary_index" search_name="search2" |fields ...]

 

 Does someone have an idea?

Thanks in advance!

Labels (2)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

As @ITWhisperer already pointed out, the subsearch is evaluated first, so you can't affect it from within the outer search. You could try to reverse the order (do the subsearch as main search and vice-versa) or use map to spawn subsearch for each outer search result separately but that's a very ugly an inefficient solution.

The best way would be if you could rephrase your search to not use subsearches.

0 Karma

blablabla
Path Finder

Hallo @PickleRick ,

thanks for your answer. Is there also no way to use a where-clause?

The outer search has a very low number of results. Maybe the inefficient solution would also be fine for me. How would a solution like this be designed?

0 Karma

PickleRick
SplunkTrust
SplunkTrust

https://docs.splunk.com/Documentation/Splunk/8.2.3/SearchReference/Map

But that's a very, very ugly solution. And very rarely it's the right way to go.

It's hard to be specific if we don't know what is your particular use case but there is probably a way to do it with where (to compare two timestamp fields of a result set and filter based on that information).

0 Karma

blablabla
Path Finder

Thank you, in this case I will avoid the map.

The usecase is actually quite simple. The first search (search1) gets me error messages. They have a time and an ID. I have another search (search2) which gives me informations about operations made. They have the same ID (because the ID is referenced to the device).

index="summary_index" search_name="search1" ...|fields _time ID ...
|join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID [search index="summary_index" search_name="search2" |fields ...]

so basically I want to use the current where clause, in which the ID are matched, with something where there is a time criterium. I thought about something similar to (I know, the syntax is wrong, sorry for this)

join type=left left=L right=R usetime=true earlier=true where L.ID=R.ID and (L._time-R._time)<300

 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Unless you have many-to-many relationship, which would make it a bit more complicated, the possible solution is to use eventstats.

Just get all your events into one set of results either with a simple alternative (search conditions 1) OR (search conditions 2) or with append/multisearch. Then you might need some trickery to set distinct fields for results of search1 vs search2 (if they "natively" have separate sets of fields you're good to go on the fly, but otherwise you might need some conditional evaluation. And finally you just call eventstats and where.

For example (I assume that both errors and ops indexes have events that have ID as well as message field):

index=errors OR index=ops
| eval opsname=if(index="ops",message,null())
| eval message=if(index="ops",null(),message)
| eventstats values(opsname) as opsname values(_time) as opstime by ID
| search message=*
| where opstime-_time<=300

That's without using append or multisearch.

You can also rephrase it to use a subsearch to generate some subset of your data to analyze:

index=errors
| fields _time message ID
| append
[ index=ops
| rename message as opsmessage
| fields opsmessage ID _time ]
| eventstats values(opsname) as opsname values(_time) as opstime by ID
| search message=*
| where opstime-_time<=300

EDIT: sorry for ommiting that, you'll also have to rename _time to opstime otherwise the stats won't work properly. But you should get my drift.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Essentially, this cannot be done with a join - information passes right to left as the subsearch is executed before being joined to events pipeline.

0 Karma

blablabla
Path Finder

Thanks for the reply @ITWhisperer 

Is there also no where clause that I could use in the join?

Or is there an alternative to the left join for enriching the data of search 1 with data of search 2 in case that the criteria are met (older than the event, but max 5 mins older and containing the same ID)?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

You might be able to append the second search, sort by time and do some stats-style commands to gather the correlated events together. It is hard to say without a detailed view of the events you are dealing with.

0 Karma

johnhuang
Motivator

You can use transactions to do this. However there are limits to transactions -- how large are the datasets and the number results you are expecting?

0 Karma

blablabla
Path Finder

Thanks for your answer @johnhuang 

The search results represent an error case which is quite rare. So the dataset itself is large (10k/day), but the expected result of the outer search is <10/day. The search will run on daily basis with the timespan of 1d.

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...