I have a server with 16 threads executing stuff, and want to detect thread starvation (all threads busy, stopping all other activity). I have logs that get written at the end of every execution, with the execution time. With that I can build a table with threadNumber, start time, end time, and duration.
Using the Timeline visualization plugin, I can chart the activity of all threads as 16 lines that show many horizontal bars representing an execution, and see when that starvation happened (16 simultaneous horizontal bars stacking up), provided I make the time-frame small enough or else I'd get drowned in data. It works, but I can have systems with 32 or more threads, so it gets unreadable quickly.
I'd like to find a way to get the count of overlapping intervals and maybe chart it, so you'd see the number of busy threads slowly climbing to 16 and staying stuck there for a few minutes until one of the 16 finally completes and starts picking up the backlog.
Any ideas welcome!
Have you looked at the concurrency command? Concurrency measures the number of events which have spans that overlap with the start of each event.
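To make that definition concrete, here is a minimal Python sketch (outside Splunk) of the same idea: for each event, count how many spans cover that event's start, itself included. The function name `concurrency_at_starts` and the sample data are made up for illustration, and treating each span as half-open (start included, end excluded) is an assumption, not something confirmed about how the command handles boundaries.

```python
def concurrency_at_starts(events):
    """For each (start, duration) event, count the events whose span
    covers this event's start time (the event itself is included).

    Assumption: spans are half-open, [start, start + duration).
    """
    counts = []
    for start, _ in events:
        # An event (s, d) is counted when its span contains `start`.
        covering = sum(1 for s, d in events if s <= start < s + d)
        counts.append(covering)
    return counts


# Hypothetical sample: (start, duration) pairs in seconds.
sample = [(0, 10), (2, 5), (4, 1), (8, 3)]
print(concurrency_at_starts(sample))  # [1, 2, 3, 2]
```

Note that the count is only taken at event starts, so it says nothing about moments when no event begins.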
Thanks for that command, I did not know about it, even after some searching. I get approximate results (sometimes 18 busy threads out of 16, because of bucket size), but it gives you an idea that something is wrong when you are around the 16 mark.
Here is something that worked for me:
search ExecutionTime>2 | eval _time = _time - round(ExecutionTime) | concurrency duration=ExecutionTime | timechart max(concurrency) span=1m
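If you want to cross-check the approximate numbers on an exported sample, a sweep over start/end points gives an exact count of busy threads. This is only a sketch under assumptions: it runs outside Splunk, the function name `busy_thread_counts` is hypothetical, each row is taken to be (end time, execution time) in seconds because the log is written at the end of an execution, and an execution ending at the same instant another starts is not counted as overlapping.

```python
def busy_thread_counts(executions):
    """Return [(timestamp, busy_count)] at every point where the
    number of running executions changes.

    executions: list of (end_time, execution_time) in seconds.
    Assumption: the logged time is the END of the execution.
    """
    deltas = []
    for end, duration in executions:
        deltas.append((end - duration, 1))   # execution starts
        deltas.append((end, -1))             # execution ends
    # Sort by time; at equal timestamps process ends (-1) before
    # starts (+1) so back-to-back runs do not count as overlapping.
    deltas.sort(key=lambda d: (d[0], d[1]))

    busy = 0
    timeline = []
    for timestamp, delta in deltas:
        busy += delta
        timeline.append((timestamp, busy))
    return timeline


# Hypothetical sample: one thread runs 0-10 then 10-20, another 5-15.
timeline = busy_thread_counts([(10, 10), (20, 10), (15, 10)])
print(max(count for _, count in timeline))  # 2
```

With exact start times the peak can never exceed the real thread count, so a peak above 16 in the chart would point at rounding or bucketing rather than at the data.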