Gesture Architecture (2017–2019)
Designing Gestures for Data Editing describes a conceptual model for the specification of data editing gestures. In most interactive systems, basic event handling is implemented as a state machine (usually a very simple one) using procedural code. Touch support typically involves two levels of state, one to track low-level touches and one to track high-level swipes, pinches, etc. Our model diverges radically from the state machine model. It treats gesture handling as a problem of pattern matching against interleaved event streams. Following this approach, one can specify a gesture using a small set of straightforward statements in a declarative language. For any given gesture, the statements describe (1) an event pattern, (2) a geometric aggregate, (3) a visual encoding/mapping, and (4) a data set transformation. Using these statements, one can (1) detect a partial or complete gesture, (2) extract its partial or complete shape, (3) render that shape as visual feedback, and (4) transform visually encoded data as a function of its geometric relationships to that shape.
While almost certainly less expressive in general than procedural state machine event handling, declarative gestures are very well suited to visualization design under the standard visualization pipeline model. To explore this hypothesis, we have developed a gesture architecture based on the above model, and are working to integrate it into our existing Improvise visualization system. The structure of the architecture is shown below.
In the architecture, visualization designers specify gestures in the same declarative language that they use to specify data transformation, visual mapping, and rendering. A DataGesture consists of a data transformation expression for each of the four operations. Each View in a visualization can bind to any number of DataGestures defined by the designer. Individual expressions can be bundled in multiple DataGestures, and multiple Views can bind to any given DataGesture. As a result, gesture statements can be readily mixed and matched across multiple views. Improvise already provides the necessary support to edit such expressions and bind them to views dynamically.
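To make this structure concrete, here is a minimal Java sketch of the DataGesture/View relationship; the class names follow the description above, but the string-valued expression fields and the bind methods are illustrative assumptions rather than Improvise's actual API:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: a DataGesture bundles the four declarative expressions, shown
    // here as opaque expression strings in Improvise's expression language.
    final class DataGesture {
        final String patternExpression;     // (1) event pattern
        final String geometryExpression;    // (2) geometric aggregate
        final String decorationExpression;  // (3) visual encoding of the gesture shape
        final String annotationExpression;  // (4) data set transformation

        DataGesture(String pattern, String geometry, String decoration, String annotation) {
            this.patternExpression = pattern;
            this.geometryExpression = geometry;
            this.decorationExpression = decoration;
            this.annotationExpression = annotation;
        }
    }

    // Sketch: a View binds to any number of DataGestures, and the same
    // DataGesture instance may be bound by several Views.
    final class View {
        private final List<DataGesture> boundGestures = new ArrayList<>();

        void bind(DataGesture gesture)    { boundGestures.add(gesture); }
        void unbind(DataGesture gesture)  { boundGestures.remove(gesture); }
        List<DataGesture> boundGestures() { return boundGestures; }
    }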
On the other hand, on-the-fly gesture design calls for careful dynamic management of event streams and processing state for a potentially large number of bound gestures. A typical multiple view visualization with tens of views could potentially require instantiation of hundreds of DataGestures. (Note that defining more than a few special-purpose gestures from scratch will rarely be necessary in practice. Improvise provides a feature to load a pre-defined library of expressions. This allows easy reuse and variation of the most common visualization gestures, such as zooming with the keyboard and lasso selection with the mouse, during design.) The architecture is structured to share both raw and filtered event streams across gestures for the sake of time and space efficiency.
Each View delegates its gesture handling to a GestureProxy that encapsulates all gesture processing for that View. The GestureProxy evaluates the four expressions of the DataGesture to map inputs into outputs along the gesture processing pipeline. Because Improvise is a live design system, each GestureProxy must reconstruct its pipeline in response to any changes to gesture expressions. We currently rebuild the entire pipeline, including raw and filtered stream dependencies. Although in theory this approach could wipe out an in-progress gesture that a user wishes to continue, it is unlikely to happen often in practice, and even if it did, it would be the user making the choice to transition to visualization design.
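A minimal Java sketch of how stream sharing and wholesale rebuilding might fit together; the AccumulatorRegistry, its keying scheme, and the rebuild method are assumptions for illustration, not part of Improvise:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch: a placeholder accumulator holding the events that pass one
    // (stream, predicate) filter.
    final class EventAccumulator {
        final List<Object> filteredEvents = new ArrayList<>();
    }

    // Sketch: accumulators are shared across proxies by keying them on the
    // (stream, predicate) pair, so an identical filter is evaluated only once.
    final class AccumulatorRegistry {
        private final Map<String, EventAccumulator> shared = new HashMap<>();

        EventAccumulator accumulatorFor(String stream, String predicate) {
            return shared.computeIfAbsent(stream + "|" + predicate,
                                          key -> new EventAccumulator());
        }
    }

    // Sketch: on any change to its gesture expressions, the proxy discards its
    // whole pipeline and reassembles it from shared accumulators.
    final class GestureProxy {
        private final AccumulatorRegistry registry;
        private final List<EventAccumulator> pipelineInputs = new ArrayList<>();

        GestureProxy(AccumulatorRegistry registry) { this.registry = registry; }

        void rebuild(List<String[]> streamPredicatePairs) {
            pipelineInputs.clear();  // rebuild the entire pipeline from scratch
            for (String[] pair : streamPredicatePairs) {
                pipelineInputs.add(registry.accumulatorFor(pair[0], pair[1]));
            }
        }
    }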
A pattern expression describes an EventPattern. An EventPattern is simply a sequence of EventSegments, each described in terms of a source of raw events (EventStream), a filter to apply to it (EventPredicate), and how many times that type of filtered event can or should recur in that portion of the pattern (EventQuantifier). EventStreams and EventQuantifiers are simple enumerations, allowing a designer to simply select them from a list. EventPredicates are simple boolean expressions defined in terms of an event's attributes (point on the screen, keyboard modifier attributes, etc.). For instance, the designer might define an EventSegment as (SwingMouseClick, if(ShiftDown), OnceOrMore) for a subsequence consisting of one or more shift-click events in the View.
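The following Java sketch shows one plausible shape for these structures; the enum values mirror the examples above, while the use of java.util.function.Predicate and the constructor signatures are assumptions:

    import java.awt.event.InputEvent;
    import java.util.List;
    import java.util.function.Predicate;

    // Sketch: streams and quantifiers as simple enumerations the designer can
    // pick from a list.
    enum EventStream { SwingMouseClick, SwingMousePressed, SwingMouseDragged, SwingMouseReleased }
    enum EventQuantifier { Once, NoneOrMore, OnceOrMore }

    // Sketch: one segment of a pattern, with the predicate as a boolean test
    // over the event's attributes.
    final class EventSegment {
        final EventStream stream;
        final Predicate<InputEvent> predicate;
        final EventQuantifier quantifier;

        EventSegment(EventStream stream, Predicate<InputEvent> predicate, EventQuantifier quantifier) {
            this.stream = stream;
            this.predicate = predicate;
            this.quantifier = quantifier;
        }
    }

    // Sketch: an EventPattern is just an ordered sequence of EventSegments.
    final class EventPattern {
        final List<EventSegment> segments;
        EventPattern(List<EventSegment> segments) { this.segments = segments; }
    }

    final class PatternExamples {
        // The subsequence from the text: (SwingMouseClick, if(ShiftDown), OnceOrMore)
        static final EventSegment SHIFT_CLICKS = new EventSegment(
                EventStream.SwingMouseClick,
                InputEvent::isShiftDown,
                EventQuantifier.OnceOrMore);
    }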
The GestureProxy extracts a list of EventStream-EventPredicate pairs from the sequence of EventSegments. For each unique pair it creates an EventAccumulator that maintains a list of events from the stream that pass the predicate. From the EventAccumulators it creates an EventCombinator that interleaves the events from all of them. As a result, the EventCombinator accumulates all and only the events that are pertinent to its gesture. Upon receiving each event, the EventCombinator applies the EventPattern to the accumulated event list, producing a sequence of GestureMatches. Any GestureMatches are passed down the pipeline in order. If a GestureMatch is complete, events up through its end are forgotten by the EventCombinator. If there is a most recent, incomplete GestureMatch, it will contain only match information up through the most recent event, and the EventCombinator will remember events from its start to match against as the gesture completes (or not).
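In code, this accumulate/match/prune cycle might look roughly like the following Java sketch; TaggedEvent, the matcher stub, and the pruning index are stand-ins for the real pipeline rather than its actual implementation:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: an event annotated with which EventSegment's stream and
    // predicate admitted it.
    final class TaggedEvent {
        final long timestamp;     // arrival time, used to interleave streams
        final int segmentIndex;   // index of the admitting EventSegment

        TaggedEvent(long timestamp, int segmentIndex) {
            this.timestamp = timestamp;
            this.segmentIndex = segmentIndex;
        }
    }

    final class EventCombinator {
        private final List<TaggedEvent> accumulated = new ArrayList<>();

        // Called whenever one of the EventAccumulators admits a new event.
        void onEvent(TaggedEvent event) {
            accumulated.add(event);               // interleave in arrival order
            int end = matchAndEmit(accumulated);  // apply the EventPattern
            if (end >= 0) {
                // Forget events up through the end of the last complete GestureMatch;
                // events of a trailing incomplete match stay for future passes.
                accumulated.subList(0, end + 1).clear();
            }
        }

        // Stub: match the EventPattern against the event list, pass any
        // GestureMatches down the pipeline in order, and return the index of
        // the last event consumed by a complete match (-1 if none).
        private int matchAndEmit(List<TaggedEvent> events) { return -1; }
    }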
We currently implement pattern matching by encoding the pattern as a regular expression, using the regular expression capabilities built into Java. The EventCombinator's accumulated event list is encoded as a string to match against. The regular expression is encoded in a way that prohibits overlapping matches. Perhaps surprisingly, the architecture does not require that any pattern expression guarantee termination for a gesture. It is left up to the visualization designer to specify gestures that co-exist well as a useful and usable interaction suite in each View. It is also left to the designer to incorporate terminating sentinels in final EventSegments where needed. There are a variety of ways to do this using combinations of predicates and quantifiers. Even so, very large accumulations are allowed, and the possibility of unbounded accumulation of event streams is problematic in principle. We currently rely on the user as human-in-the-loop to preclude accumulations of a size (likely at least thousands of events over tens of seconds) that might unduly overwhelm computational resources.
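The exact encoding is not spelled out above, so the following runnable Java sketch uses an assumed single-character alphabet (P for pressed, D for dragged, R for released) purely to illustrate the idea; note that Matcher.find() already yields non-overlapping matches:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexGestureSketch {
        public static void main(String[] args) {
            // (Once(MousePressed), NoneOrMore(MouseDragged), Once(MouseReleased))
            // encoded with one capturing group per EventSegment.
            Pattern lasso = Pattern.compile("(P)(D*)(R)");

            // Hypothetical accumulated event list: one finished lasso followed
            // by one still in progress.
            String events = "PDDDRPDD";

            Matcher m = lasso.matcher(events);
            while (m.find()) {
                System.out.printf("complete match: press at %d, %d drag(s), release at %d%n",
                        m.start(1), m.group(2).length(), m.start(3));
            }
            // The trailing incomplete "PDD" could be detected separately, e.g.
            // by re-matching the tail with a relaxed pattern such as "(P)(D*)(R)?".
        }
    }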
The remainder of the pipeline is substantially simpler. Each GestureMatch contains a list of the events involved in a match. Events are numbered by the matching EventSegment in which they happened (using regular expression match groups). The GestureProxy evaluates the geometry expression to map each GestureMatch into one or more GestureGeometries. Using the existing declarative language in Improvise, the expression can map individual and aggregated event attributes into common 2-D geometric objects such as points, line segments, circles, rectangles, polygons, simplices, lines, rays, cones, etc. Operators like the conditional cascade can be used to generate appropriate geometries from the incoming events in an incomplete GestureMatch. For instance, a lasso matched by the pattern (Once(MousePressed), NoneOrMore(MouseDragged), Once(MouseReleased)) can be calculated as lasso = polyline(point@1 + points@2 + point@3).
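Reduced to plain Java, the lasso geometry amounts to something like the sketch below, assuming the GestureMatch hands back the event locations grouped by segment (press, drags, and release, with the release absent while the gesture is still in progress):

    import java.awt.geom.Path2D;
    import java.awt.geom.Point2D;
    import java.util.List;

    final class LassoGeometry {
        // lasso = polyline(point@1 + points@2 + point@3): an open polyline
        // through the press point, the drag points, and (if the gesture is
        // complete) the release point.
        static Path2D.Double polyline(Point2D press, List<Point2D> drags, Point2D releaseOrNull) {
            Path2D.Double path = new Path2D.Double();
            path.moveTo(press.getX(), press.getY());
            for (Point2D p : drags) {
                path.lineTo(p.getX(), p.getY());
            }
            if (releaseOrNull != null) {
                path.lineTo(releaseOrNull.getX(), releaseOrNull.getY());
            }
            return path;
        }
    }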
The GestureProxy then evaluates the decoration expression to map each GestureGeometry into one or more GestureDecorations. These expressions are identical to the existing visual encoding expressions used in Improvise to map each View's data set into a set of view-specific Glyphs for drawing, except that the results are drawn as additional graphics that "decorate" the View with its interactive state by visualizing the geometry of the gesture itself. For instance, the evolving lasso above could be drawn as decoration = oval(point@1) + ovals(points@2) + oval(point@3) + polyline(lasso) + linesegment(point@1, cascade(point@3, points@2.last)) to show the vertices, outline, and closing edge. The decoration expression also describes how long to display the geometry of a gesture after it completes.
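As an analogy only (the real decorations go through Improvise's glyph rendering, not raw Java2D), the lasso decoration corresponds roughly to:

    import java.awt.Graphics2D;
    import java.awt.geom.Ellipse2D;
    import java.awt.geom.Line2D;
    import java.awt.geom.Path2D;
    import java.awt.geom.Point2D;
    import java.util.List;

    final class LassoDecoration {
        static void draw(Graphics2D g, Point2D press, List<Point2D> drags,
                         Point2D releaseOrNull, Path2D lasso) {
            g.draw(lasso);                                            // polyline(lasso)
            drawVertex(g, press);                                     // oval(point@1)
            for (Point2D p : drags) drawVertex(g, p);                 // ovals(points@2)
            if (releaseOrNull != null) drawVertex(g, releaseOrNull);  // oval(point@3)

            // linesegment(point@1, cascade(point@3, points@2.last)): close back to
            // the release point if it exists, otherwise to the latest drag point.
            Point2D closing = (releaseOrNull != null) ? releaseOrNull
                    : (drags.isEmpty() ? press : drags.get(drags.size() - 1));
            g.draw(new Line2D.Double(press, closing));
        }

        private static void drawVertex(Graphics2D g, Point2D p) {
            g.fill(new Ellipse2D.Double(p.getX() - 2, p.getY() - 2, 4, 4));
        }
    }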
Finally, the GestureProxy evaluates the annotation expression to map each GestureGeometry into one or more data changes. These expressions can also induce changes to the interaction parameters of the visualization. The declarative language in Improvise supports creation and modification of records and fields in tabular data. The Museum Collection Example visualization uses these capabilities to support its data editing features. The visualization in our 2019 "Visualizing Oscillating Streams" paper (see bottom) similarly uses these capabilities to support annotation for the purposes of visual scenting. However, neither of these visualizations uses geometries as inputs to its annotation expressions. Once we have completed integration of the gesture architecture into Improvise, we will proceed to extend the language library to include a wide range of geometry operators in support of Geometric Data Editing.
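As a hypothetical example of geometric data editing with a completed lasso, records whose plotted coordinates fall inside the closed gesture shape could have a field set; the record representation and field names below are illustrative only, not Improvise's data model:

    import java.awt.geom.Path2D;
    import java.util.List;
    import java.util.Map;

    final class LassoAnnotation {
        // Sketch: the "annotation" step as a data change applied to every
        // record enclosed by the (closed) lasso geometry.
        static void tagEnclosedRecords(Path2D closedLasso,
                                       List<Map<String, Object>> records,
                                       String xField, String yField, String tagField) {
            for (Map<String, Object> record : records) {
                double x = ((Number) record.get(xField)).doubleValue();
                double y = ((Number) record.get(yField)).doubleValue();
                if (closedLasso.contains(x, y)) {
                    record.put(tagField, Boolean.TRUE);
                }
            }
        }
    }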