In my system, each transaction is logged to the file transaction.log as a sequence of request/response events, for example:
event_1 (request)
2017-02-14 11:37:50,103
requestID=ID-4eb3c96facf7-34169-1486981992587-0-133
type=request
event_2 (response)
2017-02-14 11:37:51,343
requestID=ID-4eb3c96facf7-34169-1486981992587-0-133
type=response
transactionID=1000
...
event_N-1 (request)
2017-02-14 11:36:12,444
requestID=ID-4eb3c96facf7-34169-1486981992587-0-4000
type=request
event_N (response)
2017-02-14 11:38:44,234
requestID=ID-4eb3c96facf7-34169-1486981992587-0-4000
type=response
transactionID=1000
So a request is linked to its response by requestID, which is new for each request/response pair, while transactionID appears only in response events. So if I write
source=transaction.log | transaction transactionID
it collects only the response events. If I write
source=transaction.log | transaction requestID | transaction transactionID
it produces the correct result, but if a transaction stops after the first request/response pair, the field "duration" will be 0.
I think a better solution would be to somehow eval transactionID into the request event, something like this:
source=transaction.log | eval trx=[search source=transaction.log type=response requestID=$requestID | return $transactionID] | transaction trx
but at the moment I have no idea how to write the correct SPL.
If you're mostly looking for the duration, you can skip transaction and proceed right to stats:
...
| stats min(_time) as start max(_time) as end values(transactionID) as transactionID by requestID
| stats min(start) as start max(end) as end by transactionID
| eval duration = end - start
The first stats will compute the start and end for each request and note down the transaction ID; the second stats will compute the overall start and end for each transaction, and the eval will give you your duration. Other interesting fields can be carried through the stats calls using first(), values(), max(), etc., depending on the field and use case.
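To see why the two-stage stats works, the same aggregation logic can be modeled outside Splunk. The sketch below is a hypothetical Python analogue (the event dictionaries and IDs are made up for illustration): the first pass groups by requestID, taking min/max of _time and picking up the transactionID from the response; the second pass groups those rows by transactionID and takes the overall min/max.

```python
from collections import defaultdict

# Hypothetical parsed events; in the real search these come from transaction.log.
# transactionID is only present on response events, as in the log above.
events = [
    {"_time": 100.0, "requestID": "req-133",  "transactionID": None},    # request
    {"_time": 101.2, "requestID": "req-133",  "transactionID": "1000"},  # response
    {"_time": 200.0, "requestID": "req-4000", "transactionID": None},    # request
    {"_time": 205.5, "requestID": "req-4000", "transactionID": "1000"},  # response
]

# First stats: min(_time), max(_time), values(transactionID) by requestID.
by_request = defaultdict(lambda: {"start": float("inf"), "end": float("-inf"), "transactionID": None})
for e in events:
    row = by_request[e["requestID"]]
    row["start"] = min(row["start"], e["_time"])
    row["end"] = max(row["end"], e["_time"])
    if e["transactionID"] is not None:
        row["transactionID"] = e["transactionID"]

# Second stats: min(start), max(end) by transactionID.
by_trx = defaultdict(lambda: {"start": float("inf"), "end": float("-inf")})
for row in by_request.values():
    trx = by_trx[row["transactionID"]]
    trx["start"] = min(trx["start"], row["start"])
    trx["end"] = max(trx["end"], row["end"])

# eval duration = end - start
durations = {trx_id: t["end"] - t["start"] for trx_id, t in by_trx.items()}
print(durations)  # transaction "1000" spans from the first request to the last response
```

Note how the first grouping is what carries the transactionID over from the response onto the request's time range, which is exactly the linking step the question was asking about.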
The sparkline would need the timestamp, so you'd have to yield some _time values in your earlier stats calls. stats can do that too: values(field) as field, avg(field) as field, etc.
Is it possible to use sparkline after that? Something like this:
| stats min(_time) as start max(_time) as end values(transactionID) as transactionID by requestID
| stats min(start) as start max(end) as end by transactionID
| eval duration = end - start
| stats sparkline count by requestID
Splunk shows just a zero line instead of a real sparkline.
stats is usually very fast; transaction does lots of extra things you may or may not need. Ultimately it depends on your data, your overall use case, other processing, etc.
Did you run the stats on your data? Its main purpose is to link the events by requestID and then by transactionID.
Yes, your example calculates the duration perfectly. But I need the duration together with other fields that come from the request/response events; transaction did that for me. Is there some way to use filldown in such SPL:
... | filldown transactionID | transaction transactionID
?
It works, but the transaction events have to come strictly one after another for filldown to work correctly.
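That ordering constraint can be seen in a small model. The sketch below is a hypothetical Python analogue of filldown (assuming, as Splunk does by default, that events arrive newest-first, so each response and its transactionID precede the matching request in the stream): it simply copies the last non-null value forward, so as soon as two transactions interleave, a request inherits the wrong transactionID.

```python
def filldown(values):
    # Copy the last non-None value forward, mimicking SPL's filldown.
    last, out = None, []
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

# Strictly alternating response/request pairs (newest first): filldown works.
alternating = ["2000", None, "1000", None]
print(filldown(alternating))   # each request gets its own transactionID

# Interleaved transactions: the request belonging to transaction 2000
# now inherits transactionID 1000, which is exactly the failure mode.
interleaved = ["2000", "1000", None, None]
print(filldown(interleaved))
```

This is why the filldown variant only behaves when the pairs never overlap in time, whereas the stats approach keys on requestID and is order-independent.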
No need for that; the stats command handles it. Since, given their format, your requestIDs are likely to be unique across the whole dataset, and since there are only two events in each transaction, you don't need the transaction command to figure out the start and end points.
The only caveat is coding around any transactions that are still open, so add this at the end:
| where duration != 0
You're better off going by a count, or even checking that all parts of the transaction are there; going by duration is dangerous, because depending on the circumstances a complete transaction might have identical timestamps all round.
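The count-based completeness check suggested above can be sketched as follows; this is a hypothetical Python model over made-up (requestID, type) pairs, not Splunk itself. A transaction pair is complete only when both its request and its response were seen, regardless of whether their two timestamps happen to be identical.

```python
from collections import Counter

# Hypothetical (requestID, type) pairs parsed from the log.
events = [
    ("req-133", "request"), ("req-133", "response"),  # complete, even with equal timestamps
    ("req-134", "request"),                           # still open: no response yet
]

# Count events per requestID; a complete pair has exactly two events.
counts = Counter(req_id for req_id, _ in events)
complete = {req_id for req_id, n in counts.items() if n == 2}
print(complete)
```

In SPL terms this corresponds to carrying a count through the first stats call and filtering on it, rather than filtering on duration != 0.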
But I still have to evaluate transactionID in the request event; it appears only in the response event, and the two are linked by requestID.
It's very interesting! Will it work faster than the transaction-based duration?