My logs output two consecutive lines in the case of a connection timeout:
... CONNECTION-x.x.x.x:y: connect() timeout
... [service_name] tearing down tcp connection [x.x.x.x.y]
Here x.x.x.x:y is the ip:port and service_name is some string. How do I put together a Splunk query that produces a table of the number of timeouts for each service_name?
In the "better late than never" category of answers (and I realize this answer might not have been available in previous versions of Splunk)...
It's unclear from the original question whether the "ip:port" belongs to the service or to the client.
If it belongs to the service, then the ip:port in each timeout line uniquely identifies the service, so all that needs to be done is to count the timeouts per ip:port and then map in the service name from the teardown lines:
| makeresults | eval data="CONNECTION-1.1.1.1:1: connect() timeout,[service_with_2_timeouts] tearing down tcp connection [1.1.1.1.1],CONNECTION-1.1.1.2:2: connect() timeout,[service_with_1_timeout] tearing down tcp connection [1.1.1.2.2],[service_with_no_timeouts] tearing down tcp connection [1.1.1.3.3],CONNECTION-1.1.1.1:1: connect() timeout,[service_with_2_timeouts] tearing down tcp connection [1.1.1.1.1]"
| eval mvdata=split(data,",")
| mvexpand mvdata
``` Everything above this is to generate sample data ```
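``` Flag the timeout lines so they can be summed later ```
| eval is_timeout=if(like(mvdata,"%connect() timeout%"),1,0)
``` Pull ip and port from the timeout lines ```
| rex field=mvdata "CONNECTION-(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?<port>\d+): connect\(\) timeout"
``` Pull service_name, ip and port from the teardown lines (the port follows a dot here, not a colon) ```
| rex field=mvdata "\[(?<service_name>[^\]]+)\] tearing down tcp connection \[(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.(?<port>\d+)\]"
``` One row per ip:port, carrying the service name and its timeout count ```
| stats first(service_name) as service_name, sum(is_timeout) as timeout_count by ip, port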
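The search above returns one row per ip:port. If a service can listen on more than one ip:port, a second stats can roll those rows up to one row per service_name, which is closer to the table the question asks for. The following is only a sketch: the sample data and the names service_a and service_b are made up for illustration, and it still assumes the ip:port belongs to the service.

```
| makeresults
| eval data="CONNECTION-1.1.1.1:1: connect() timeout,[service_a] tearing down tcp connection [1.1.1.1.1],CONNECTION-1.1.1.2:2: connect() timeout,[service_a] tearing down tcp connection [1.1.1.2.2],[service_b] tearing down tcp connection [1.1.1.3.3]"
| eval mvdata=split(data,",")
| mvexpand mvdata
| eval is_timeout=if(like(mvdata,"%connect() timeout%"),1,0)
| rex field=mvdata "CONNECTION-(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?<port>\d+): connect\(\) timeout"
| rex field=mvdata "\[(?<service_name>[^\]]+)\] tearing down tcp connection \[(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.(?<port>\d+)\]"
| stats first(service_name) as service_name, sum(is_timeout) as timeout_count by ip, port
| stats sum(timeout_count) as timeout_count by service_name
```

In this made-up data service_a appears on two ip:port pairs with one timeout each, so the final stats is what combines them into a single row.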
If, on the other hand, the "ip:port" belongs to the client accessing the service, this is a bit more complicated, with too many potential solutions depending on details not available here.
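Purely as one illustration of the client case, and not a general solution: if the teardown line always comes immediately after its timeout line (no interleaving from other connections), streamstats can carry the previous event's timeout flag forward onto the teardown line, and the count can then be taken by service_name directly. The sample data and service names below are hypothetical. On real events, which Splunk returns newest first by default, something like `| sort 0 _time` would be needed before the streamstats so that "previous" means "earlier".

```
| makeresults
| eval data="CONNECTION-1.1.1.1:1: connect() timeout,[service_a] tearing down tcp connection [1.1.1.1.1],[service_b] tearing down tcp connection [1.1.1.2.2],CONNECTION-1.1.1.1:1: connect() timeout,[service_b] tearing down tcp connection [1.1.1.1.1]"
| eval mvdata=split(data,",")
| mvexpand mvdata
| eval is_timeout=if(like(mvdata,"%connect() timeout%"),1,0)
| rex field=mvdata "\[(?<service_name>[^\]]+)\] tearing down tcp connection"
| streamstats current=f window=1 last(is_timeout) as prev_is_timeout
| where isnotnull(service_name)
| eval timed_out=if(prev_is_timeout=1,1,0)
| stats sum(timed_out) as timeout_count by service_name
```

Here the same client ip:port times out against two different services, so grouping by ip and port would no longer work. The teardown line is the only place the service name appears, which is why the flag is copied onto it before counting.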