Solved: Nested Searches Part 2

deeboh · ‎06-02-2010

There is a post regarding nested searches which got me thinking about a problem I've been having. I have a very heterogeneous collection of log data, ranging from Apache logs to Tomcat logs to homegrown logs. The issue I'm having is how to pipe the output of a subsearch into another search across a different sourcetype. Or would it be better to use transaction to auto-magically find the correlated data I'm looking for? Here's what I'm trying to do.

I can search my Apache access logs (sourcetype=access) to find errors we've thrown to our clients (error=blah). I then want to search our proprietary application logs (sourcetype=events) to find details for the same error (code=blah). Finally, because there may be multiple sourcetype=events entries which match code=blah, I want to find which of them generated the error that was displayed in sourcetype=access (error=blah). Just to throw a further wrench into the works, I want to dive deeper and search sourcetype=log4j for any other details I can use to get a picture of the events leading up to the sourcetype=access error (error=blah). I don't provide an error field for sourcetype=log4j because there isn't one :D.

I have three searches I use to solve the above problem, but for the sake of my sanity I'm asking if I can nest these searches or somehow pipe the results into another search.

First, find the errors I care about in sourcetype=access:

sourcetype=access [ search sourcetype="access" Error="532" | fields client_ip_logformat_token,date_hour,date_minute,date_second ] | stats count by post_authid,date_hour,date_minute,date_second

Next, manually take the post_authid, date_hour, date_minute and date_second (+/- 1 second) values and search event_log:

sourcetype=event_log UserID=(numeric value) date_hour=(hr) date_minute=(min) date_second=(second-1)

(It would be cool to create a form-searchable dashboard, but I'll work on that later.)

Finally, search the Tomcat logs:

sourcetype=log4j authid=(numeric value) date_hour=(hr) date_minute=(min) date_second=(second-1)

I've tried searches along these lines, but I have no clue what I'm looking at: (sourcetype=event_log OR sourcetype=access) 532 | eval code=Error | transaction code maxspan=1s

code is the field extraction for sourcetype=event_log and Error is the field extraction for sourcetype=access. I figured tying them together would get me the result I wanted... Nope.

I've recently been trying in vain to pipe subsearch results into a new search across sourcetypes... No love.

Thanks in advance,

Deeboh

Lowell · ‎06-02-2010

Some random thoughts, in no particular order.

Double check your eval field renaming approach (eval code=Error). I would try either rename Error as code or eval code=coalesce(Error,code), since you may only want to conditionally rename a field (if you have the "Error" field in some events and "code" in other events).
Instead of trying to pull together events based on date_* fields, you really should look into using transactions, which have a built-in time correlation effect.
You may find that using a dashboard with multiple panels would be a simpler approach to correlating this information manually. (Maybe not, just a thought.)
If you want to jump into some really advanced XML, you can really customize the drill-down actions in Splunk 4.1. Take a look at the 3-level drill-down provided in the search app: http://localhost:8000/en-US/app/search/indexing_volume (click on an index, then on a time in the timechart, and you will see those events load). This is a really cool way to drill down into your log data, but it does take work to set up.
Consider using workflow actions (based on eventtypes or field combinations) to let the user drill down from one search to another. See "Create workflow actions in Splunk Web".
Combine all your sourcetypes into one big single search and use transaction to pull it all together. You hinted at this in your question. I made some notes in the following section...
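
To make the first point concrete, here is a rough, untested sketch of the coalesce approach. It reuses the Error and code field names and the 532 value from the question; the table command at the end is only there so the result is easy to eyeball. As I understand it, a plain eval code=Error clears code on any event that has no Error field, which is what coalesce avoids:

```spl
(sourcetype=access OR sourcetype=event_log) 532
| eval code=coalesce(Error, code)
| table _time, sourcetype, Error, code
```

If every row shows a populated code column regardless of sourcetype, the field is ready to be used as a transaction key.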

For an all-in-one type of search, you may find it useful to mentally split your search into separate phases. Each phase will contain at least one search command, but often more than one. It's generally helpful to develop your search starting with the first phase, and not to move on to the next until you're sure the current phase is working properly. Trying to write this all in one shot will just cause lots of frustration. Hopefully you'll have a working search when you're done, but it doesn't always work out that way. (Hints: Sometimes I find it helpful to take a portion of the search and run it in a separate browser session, then copy it back into my full search once I have the desired change. Also, if you find some part of the search isn't working the way you expect, it's often easier to build a simple test case with a simpler search, make sure you understand how and why it works the way it does, and then go back to your massive search and adjust accordingly.)

Here are some suggested phases for this type of search:

Determine your base search criteria - This should include all host restrictions and all the sourcetypes you are looking for, possibly leveraging eventtypes if you have them set up. Tags are also helpful in this step.
Common field tweaking - This is where you will have to do any field extractions, field renaming, or other search-based field manipulation necessary. The idea here is to make sure that all of your events have enough common values to build a transaction. In your example this will probably be something like 'clientip' and possibly 'sessionid'. Some examples of things that would need to be fixed in this phase: (1) different events have different field names for the same value; (2) some values appear in different cAsE, or you see "True" when other times you get a "1"; (3) you have to conditionally create a field based on a search term. All of these can easily be accomplished with the eval command. (If you really want to get fancy, you can even compensate for a fixed clock drift between your hosts in this phase.)
Build the transaction - Use the transaction command to correlate all your related events into transactions.
Post-transaction search - Use a search command to filter out non-error events. You could also put some kind of statistical analysis here if you want (e.g. a stats- or chart-like command).
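
Putting the phases together, a search along these lines might be a starting point. This is only a sketch, not something I have run against your data. It assumes that post_authid (access), UserID (event_log) and authid (log4j) from your searches all hold the same user identifier. The combined field name user and the maxspan=5s window are placeholders I made up for illustration, so tune them to your data:

```spl
(sourcetype=access OR sourcetype=event_log OR sourcetype=log4j)
| eval code=coalesce(Error, code)
| eval user=coalesce(post_authid, UserID, authid)
| transaction user maxspan=5s
| search code=532
| table _time, user, code, duration, eventcount
```

Line 1 is the base search, the two eval lines are the common field tweaking, transaction builds the correlation, and the trailing search keeps only transactions that contain the error. Because the transaction is keyed on the user rather than the error code, the log4j events come along even though they have no error field. The duration and eventcount fields are added by the transaction command itself.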

deeboh · ‎01-22-2011

Hey Lowell, my journey with the transaction command and my above dilemma has been a fun bit of investigation. eval and transaction are my friends, no longer to be feared. Thanks again for your comments and suggestions.

deeboh · ‎06-08-2010

Dang, the indexing_volume search is sick! I'll play around with that when I get more free time. However, I've been tinkering with the other suggestions and I'm making some headway. I need to read more documentation and examples of the transaction command to get the hang of how it relates to my results.

I'll keep you posted.

deeboh · ‎06-04-2010

Thanks Lowell. I'll chew on these suggestions for a few days. You correctly grok'd that I'm probably trying to bite off more than I can chew. However I'll continue to tinker until I find something workable. Thanks for the head start on ideas.
