Splunk Search

Nested searches

seanlon11
Path Finder

Is there a way to pass parameters from one search to another search?

Scenario: Our WebSphere servers will sometimes receive a java.lang.OutOfMemoryError, and probably 90% of the time it will heal itself. The way to tell if it healed itself or not is to wait a few minutes after the initial OutOfMemoryError and then search to see if the server re-started.

1st query: host="was0" java.lang.OutOfMemoryError

2nd query: host="specificHostFromFirstQuery" "Start Display Current Environment"

In the 2nd query, I'd like to know the specific host that failed from the 1st query so I can sleep for a few minutes and then run the 2nd query.

So far, I have started to do this by having the 1st query kick off a script on my Unix server. The script sleeps for a few minutes and then runs the 2nd query for all servers; I then grep through the data to see if I need to send an email.

Any thoughts?

Thanks, Sean

1 Solution

sideview
SplunkTrust

I don't think you actually need a subsearch here. There are a lot of things that only subsearches can do, but I don't think your case is one of them (and your case is a pretty cool example). It's good practice to avoid subsearches wherever possible.
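
For comparison, the literal answer to "pass the host from one search to another" is a subsearch. A minimal sketch, assuming both messages are indexed with the same host field: the inner search returns the hosts that logged the error, and Splunk expands them into host=... terms for the outer search.

```spl
"Start Display Current Environment"
    [ search java.lang.OutOfMemoryError
      | dedup host
      | fields host ]
```

This version has no way to say "a few minutes after the error", which is one reason the single search below is likely a better fit.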

Here's one way to do it:

java.lang.OutOfMemoryError OR "Start Display Current Environment" | transaction host maxspan=5m startswith="OutOfMemoryError" | search java.lang.OutOfMemoryError NOT "Start Display Current Environment"

  • Dissecting this step by step, the first search command gets you a big interleaved search result set.
  • Then the transaction rolls everything up so that each row in the results comes from a single host, but the 'event' text is now one big multiline string of all its events, always starting with the OutOfMemoryError and never spanning more than 5 minutes total.
  • Then at the end we simply filter out the hosts where it did actually restart.
  • NOTE: I actually have to include the java.lang.OutOfMemoryError term again in the final search. The reason is that if a transaction terminates because it reaches the maxspan, it'll actually start a new transaction right then and there, which will have random events in it but no java.lang.OutOfMemoryError. Filtering again by OutOfMemoryError just filters that noise away.
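
Laid out one stage per line, the search above can be sketched as follows. The trailing table is an addition for readability and is not part of the original search; duration and eventcount are fields that the transaction command adds to each result.

```spl
java.lang.OutOfMemoryError OR "Start Display Current Environment"
| transaction host maxspan=5m startswith="OutOfMemoryError"
| search java.lang.OutOfMemoryError NOT "Start Display Current Environment"
| table _time host duration eventcount
```

Each remaining row should be a host that logged the error and did not log a restart within the 5-minute window. If this is saved as an alert, the time range probably needs to end a few minutes in the past so that recent errors have had time to be followed by a restart.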

Hopefully that'll work.

As an aside, this approach opens up several other avenues: notably if you put this before the transaction,

eval errorTime = if(searchmatch("OutOfMemoryError"), _time, null()) | eval restartTime = if(searchmatch("Start Display Current Environment"), _time, null())

and then after the transaction and search you put eval timeBeforeRestart = restartTime - errorTime, then you can graph timeBeforeRestart by host, which could be a useful trick.
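
Put together, the timing variant might look like the sketch below. Two assumptions are baked in. First, the final filter is changed to keep only the transactions that do contain a restart, since the NOT filter from the main search would leave restartTime empty. Second, the stats line is just one possible way to summarize by host. null() is the eval function that returns a null value.

```spl
java.lang.OutOfMemoryError OR "Start Display Current Environment"
| eval errorTime = if(searchmatch("OutOfMemoryError"), _time, null())
| eval restartTime = if(searchmatch("Start Display Current Environment"), _time, null())
| transaction host maxspan=5m startswith="OutOfMemoryError"
| search java.lang.OutOfMemoryError "Start Display Current Environment"
| eval timeBeforeRestart = restartTime - errorTime
| stats avg(timeBeforeRestart) by host
```

The result is in seconds, because _time is an epoch timestamp. If a transaction happens to contain more than one restart message, restartTime becomes multivalued and the subtraction may come back empty for that row.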

rayfoo
Path Finder

Do provide more info and samples of the log events that you want to look out for. Are there only two cases that can occur (only OutOfMemoryError, or OutOfMemoryError followed by "Start Display Current Environment")? Would the OutOfMemoryError repeat itself indefinitely before the restart? etc...

0 Karma

seanlon11
Path Finder

Thanks for the help. I will test and let you know how it goes.

0 Karma

sideview
SplunkTrust

Ha, that's a good point. I initially wasn't using maxspan, and was then searching for restartDuration<300, but the maxspan was simpler. I'll go back and delete them. Thanks.

0 Karma

rayfoo
Path Finder

Why eval errorTime and restartTime if they are never used subsequently? Just curious 🙂

0 Karma