Splunk Search

Threaded-applog event coalescing and searching

New Member

I have a multithreaded application that writes out intermingled logs and having performance issues searching with transactions, and looking for a simpler way to coalesce events.

simplified mylog.log example:

10:00 thread=50 start 
10:01 thread=51 start 
10:02 thread=50 dosomething
10:03 thread=51 dosomethingelse
10:03 thread=50 error
10:04 thread=50 end
10:05 thread=51 end

if I search with sourcetype=mylogtype | transaction tid startswith="start" | search error, the result will be correct but it is very slow, especially compared with just "sourcetype=mylog error". I think real issue here is that events are single-line and only joined later, but I don't see how intermingled log files like this can have the lines combined into single events appropriately.

Tags (2)
0 Karma

Splunk Employee
Splunk Employee

Here you go:

  • sourcetype=mylogtype [ search error sourcetype=mylogtype | dedup thread | table thread ] | transaction thread startswith="start" | search error`

The subsearch between the square brackets retrieves all the 'thread' (i.e., tid) values and generates an OR expression of (thread=val1 OR thread=val2 ...). Given your sample data, after the subsearch runs, the search would look like this:

  • sourcetype=mylogtype (thread=50) | transaction thread startswith="start" | search error`

and the results would be one transaction:

  • 10:00 thread=50 start
  • 10:02 thread=50 dosomething
  • 10:03 thread=50 error
  • 10:04 thread=50 end

Two more notes:

  • the final "| search error" is required because it's possible, and actually likely, that thread values are reused else where, and not all thread=50 transactions have errors. Essentially the subsearch gets you a set of candidate events, then they are put into transactions, and finally those transactions that don't have 'error' are filtered out. This will be massively faster because you're only looking at the events that have the 'thread' values that also have errors.

  • In the 4.2 timeframe there will be a search command to do all this for you (e.g. "| searchtxn mythreaddef error OR warn")

Motivator

You might want to look into the map and localize commands here, especially of the typical number of error events tends to be low. That should let you find the errors first, then you can look for the related events and build your transactions.

One other thing that can help improve the speed of transaction searches is to tune the values of maxspan and maxpause (link). Setting these to lower values can speed up the search, at the cost of losing any events that are outside the maximum time intervals.

0 Karma