Impala logs contain data at the query, server, and pool (global, shared) levels. The data is mixed together in the log entries for each query, and the only way to extract it is to combine the rows from a single query into a transaction and pull the values out from there.

Using timechart with count( query_id ), last( reserved ), last( max_mem ) gives a nice, accurate time-sample of the state of one server. The query_id counts and reserved values can then be combined across servers to produce useful pool-level values: the number of running queries and the total reserved memory allocated in the pool.

The max_mem value is per-pool and will have a single value (e.g., 123456) across all hosts in the pool. It may change over time, but at any given moment every pool member shows the same value. So last( max_mem ) will be identical for each host in the pool, because the value is common to all pool members and will invariably be the same in log transactions acquired from the various hosts.

If there were some way to simply take values( max_mem-* ) from the timechart, I'd have a valid sample of the pool-level max_mem value. There may be a way to do this with an addcols, or there may be a keyword I don't know about, similar to "addcols fieldname=x x-*", that will summarize multiple fields into a single value; if so, that command would work.

I cannot be the first person trying to summarize values from multiple levels of a hierarchy of data in a timeslice, can I?
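To make the intended roll-up concrete, here is a minimal Python sketch of the logic outside of Splunk. It is not SPL and not a Splunk API; the function name `summarize_pool`, the record keys (`time`, `host`, `query_id`, `reserved`, `max_mem`), and the 60-second span are all assumptions made for illustration. It assumes each input record is one already-assembled query transaction.

```python
def summarize_pool(samples, span=60):
    """Roll per-host query transactions up to pool level per time bucket.

    samples: iterable of dicts with the (assumed) keys
             time (epoch seconds), host, query_id, reserved, max_mem.
    """
    # Step 1: per (bucket, host) state, mirroring
    # count(query_id), last(reserved), last(max_mem) for one server.
    per_host = {}
    for s in sorted(samples, key=lambda s: s["time"]):
        bucket = s["time"] - s["time"] % span
        state = per_host.setdefault(
            (bucket, s["host"]),
            {"count": 0, "reserved": 0, "max_mem": None},
        )
        state["count"] += 1
        state["reserved"] = s["reserved"]  # later rows overwrite: last()
        state["max_mem"] = s["max_mem"]

    # Step 2: combine hosts into one pool row per bucket.
    pool = {}
    for (bucket, _host), state in per_host.items():
        row = pool.setdefault(
            bucket,
            {"running_queries": 0, "reserved_total": 0, "max_mem": set()},
        )
        row["running_queries"] += state["count"]
        row["reserved_total"] += state["reserved"]
        # Collapse the per-host copies; one distinct value is expected.
        row["max_mem"].add(state["max_mem"])

    return {
        bucket: {**row, "max_mem": sorted(row["max_mem"])}
        for bucket, row in sorted(pool.items())
    }
```

The per-host counts and reserved values are summed, while max_mem is collapsed to its distinct values. If the pool behaves as described, that list holds exactly one entry per bucket; more than one would mean the pool setting changed inside the bucket.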