A particular public transit line is served by, say, N vehicles concurrently at any given time in the range [0,M] where M is the maximum number that ever serve the line concurrently. From the time the line starts, say 6am, vehicles come onto the line one by one until they reach M. Towards the end, say 9pm, vehicles start to leave the line one by one until N again reaches 0. Graphically, it could look like:
AAAAAAAAAAAAAAAAA
BBBBBBBBBBBBB
CCCCCCCCCCCCCCCC
Time progresses from left-to-right. Vehicle "A" is the first to go into service and, in the time before "B" does, N is 1. Once "B" goes into service, N increases to 2; and to 3 for "C". At some later time, "B" is the first to leave service, so N decreases to 2; and so on until N once again reaches 0.
The actual log data looks like:
2016-08-29 17:39:59 GMT vehicle_id="1007" speed=12 distance=192 stop="5240"
2016-08-29 17:40:26 GMT vehicle_id="1008" speed=16 distance=246 stop="4504"
2016-08-29 17:40:50 GMT vehicle_id="1015" speed=0 distance=125 stop="35184"
2016-08-29 17:41:06 GMT vehicle_id="1009" speed=19 distance=280 stop="4530"
2016-08-29 17:41:48 GMT vehicle_id="1008" speed=12 distance=66 stop="4534"
2016-08-29 17:41:51 GMT vehicle_id="1006" speed=14 distance=210 stop="5240"
2016-08-29 17:42:11 GMT vehicle_id="1009" speed=15 distance=272 stop="5174"
2016-08-29 17:42:09 GMT vehicle_id="1006" speed=0 distance=97 stop="5240"
2016-08-29 17:42:32 GMT vehicle_id="1009" speed=4 distance=105 stop="5174"
2016-08-29 17:42:21 GMT vehicle_id="1015" speed=0 distance=105 stop="35184"
Note that it's possible for a vehicle to be temporarily taken out of service mid-day. For example, if "B" were taken out of service temporarily, then it would look like:
BBBBB BBBB
And N would decrease by 1 during the gap. In order for a gap to be considered a gap, there would be a gap in time when there are no log entries from "B". How long the gap has to be in order to be considered a gap should ideally be tunable, e.g., 30 minutes.
I want to plot N (the number of vehicles concurrently on the line) over time from the start of service to the end per day. How can I do that?
What does the value for stop
represent? Would it be accurate to assume duration=distance/speed? Regardless, here's a thought...
base search | eval duration=distance/speed | eval starttime=_time | eval endtime=starttime+duration | eval per_hr=mvrange(starttime, endtime, "1h") | mvexpand per_hr | eval per_hr=strftime(per_hr, "%H") | stats dc(vehicle_id) as vehicles by per_hr
*UPDATED*
... | timechart span=G dc(vehicle_id) as vehicles
Not writing out queries at this stage, but I think the only way you could really get 'N' is to apply some knowledge to the query in relation to the stop value. By this I mean, unless it is reflected in some way in the data (not described) you would have to know when a vehicle should reasonably expect to have reported at it's subsequent stop. So the most recent timestamp is valid unless that timestamp exceeds timestamp + maximum journey time to stop+1. If it exceeds the value it must be assumed it has been removed from service.
What does the value for stop
represent? Would it be accurate to assume duration=distance/speed? Regardless, here's a thought...
base search | eval duration=distance/speed | eval starttime=_time | eval endtime=starttime+duration | eval per_hr=mvrange(starttime, endtime, "1h") | mvexpand per_hr | eval per_hr=strftime(per_hr, "%H") | stats dc(vehicle_id) as vehicles by per_hr
*UPDATED*
... | timechart span=G dc(vehicle_id) as vehicles
What does the value for
stop
represent?
stop
is an arbitrary identifier for a physical bus stop.
Would it be accurate to assume duration=distance/speed?
No. distance
is how many feet the vehicle is from its physically closest stop (that may either be the stop that it just past by or the stop it's approaching, whichever is closer); speed
is the vehicle's speed at the time the GPS transponder on board the vehicle transmitted its data to be logged. Hence duration
as you've written it is incorrect.
I included distance
, speed
, and stop
for completeness, but I can't see any way they could possibly be part of a correct answer.
I would think the only things that matter are the timestamp and the vehicle_id
. For example, if there is a log entry at T1 for vehicle "A" and also again at T2 for "A" and T2 – T1 <= G (where G is a tunable "gap" threshold), then we will assume that vehicle "A" was in service for the entire time between T1 and T2.
So if you set the span
for timechart
to be >=G
, a dc
would give you # vehicles on the road as (may need to filter our speed=0). In other words, why not
speed>0 | timechart span=G dc(vehicle_id) as vehicles
Vehicles should not be filtered based on their speed. If a vehicle is idling at either a bus stop (allowing passengers to board or alight) or at a red light, its speed will be 0, but it is still "in service" serving the line.
That aside, your timechart
yields results that look like I would expect them to look. Thanks!
Awesome! Remember to accept the answer to close out this question
If there'a any way for you to convert your timechart
comment into the actual answer (rather than a comment on the previous answer), that would benefit future readers.
I've updated the original answer