I am working on a call centre solution in which alerts (dropped calls, email queues building up, average call length too long, etc.) are raised and displayed to a set of team leaders in a panel on a common Splunk application. When the problem goes away, the alert status goes 'green' and the alert should disappear from the display panel.
I want a team leader to be able to say that they're taking responsibility for the alert, so that no-one else has to concern themselves with it, and for this information to be propagated to all users.
I would expect there to be 5-20 alerts active at any one time (in theory there could be a few hundred, but this would represent Armageddon). What approach would people take to designing this solution? Is it practical (say) to hold the alert information in a transient CSV file, and to capture an owner's decision to take responsibility for fixing the problem from an individual screen? Could I use inputcsv and outputcsv to control this mechanism, and would the status be propagated consistently across the system?
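To make the idea concrete, here is a rough sketch of the read-modify-write cycle I have in mind, written in plain Python rather than SPL purely for illustration. It is not how inputcsv and outputcsv behave internally; the field names (alert_id, status, owner) and the helper functions are all hypothetical, and I am assuming one small CSV file holds every active alert.

```python
import csv
import os
import tempfile

# Hypothetical column layout for the transient alert file.
FIELDS = ["alert_id", "status", "owner"]


def read_alerts(path):
    """Load every alert row from the CSV file as a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def write_alerts(path, rows):
    """Rewrite the whole file via a temp file, then swap it into place,
    so a reader never sees a half-written CSV."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_path, path)


def claim_alert(path, alert_id, user):
    """Record `user` as owner of an active, unowned alert.
    Returns True if the claim was recorded, False otherwise."""
    rows = read_alerts(path)
    claimed = False
    for row in rows:
        if (row["alert_id"] == alert_id
                and row["status"] != "green"
                and not row["owner"]):
            row["owner"] = user
            claimed = True
    if claimed:
        write_alerts(path, rows)
    return claimed


def clear_green(path):
    """Drop alerts whose status has gone green and return what remains."""
    rows = [r for r in read_alerts(path) if r["status"] != "green"]
    write_alerts(path, rows)
    return rows
```

Even in this toy form, two team leaders claiming the same alert at the same moment could both read the file before either writes it, so the last writer would win. That race is the heart of my question about whether the status would be propagated consistently.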