Applications Area Working GroupF. Echtler
Internet-DraftRegensburg University
Intended status: InformationalNovember 7, 2012
Expires: May 11, 2013 

GISpL: Gestural Interface Specification Language


This document introduces GISpL, the Gestural Interface Specification Language. GISpL enables the unambiguous description of gestures used in human-computer interfaces. This includes gestures on touch and multi-touch screens, with digital pens, with hand-held controllers or in free air. A matching engine analyzes motion data produced by the input device(s) and triggers registered gestures.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on May 11, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents ( in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1.  Introduction

This document describes GISpL (Gestural Interface Specification Language), a formal language for describing human-computer interfaces which use gestures. The term "gesture" is used in a very wide sense here, meaning any motion of the user which can be captured by an input device.

1.1.  Motivation

As novel types of human-computer interfaces such as multi-touch screens, digital pens, hand-held controllers or even free-air gestures become more and more common, so does the number of special-purpose applications built to interact with these devices.

Using a dedicated formal language to describe the various types of gestural interaction which are possible with an application has advantages for various distinct groups:

End users of gestural applications benefit in two ways: the behavior of such applications becomes more consistent across different platforms and the users are given the option to modify the behavior to suit their needs.
When the behavior of a gestural interface is specified formally, it is easier for user interface researchers to compare different interfaces. Also, prototypical implementations of such interfaces are easier to modify and adapt during the design, development and evaluation cycle.
Since the formal specification enables a dedicated matching engine to perform the actual detection and matching of gestures, developers are spared tedious tasks such as constant reimplementation of basic gesture matching algorithms.

1.2.  Design Considerations

GISpL serves two purposes:

Moreover, GISpL should be usable across a very wide range of platforms, even unconventional ones. One important example is the Firefox web browser which has recently acquired the capability to deliver multi-touch events to web applications.

Consequently, the requirements regarding GISpL are:

Therefore, the decision was made to base GISpL on JSON (Crockford, D., “The application/json Media Type for JavaScript Object Notation (JSON),” July 2006.) [RFC4627]. Compared to XML, JSON gives a better balance between readability for humans and code size. It is supported in nearly all programming languages and platforms.

2.  Specification

GISpL consists of three core elements: regions, gestures and features.

For the full formal ABNF specifictation, please see the appendix. This section will use a relaxed syntax for easier readability, with rule names enclosed by <angle braces>. JSON-related markup (i.e., { } [ ] : , " ) will be reproduced verbatim.

2.1.  Definitions

In the context of GISpL, the terms input object and input event will be used. An input object is any physical object which is detectable by the input hardware, such as a mouse, a Wiimote or the user's hand. Input objects are classified into one of 32 categories as given in Appendix B (Input Event Types). Every input object is given a unique ID number according to the capabilities of the hardware. I.e., a uniquely identifiable tangible object always has the same ID, while anonymous touch points on an interactive surface will have increasing IDs so that each touch point is uniquely identifiable during the duration of the touch. An input event is a single spatial measurement regarding the position (and optionally, orientation, dimensions etc.) of an input object as captured by one or more of the sensors used.

2.2.  Region

Regions define spatial areas in which a certain set of gestures is valid and which capture motion data that falls within their boundaries. Regions are defined in one of two reference coordinate systems:

Regions are managed in an ordered list. Arriving input events are checked against this list, starting from the first element. If the position of the input event falls within the boundaries of the region and if the bit in the filter bitmask which corresponds to the category of the input event is set, the the input event is captured by the region. Captured events and their history are subsequently used to check whether one or more of the gestures attached to this region have been triggered.

point     = [ <number>, <number>, <number> ]
pointlist = [ null / <point> *( , <point> ) ]

regionflags = "poly" / "hull"

region = {

2.3.  Gesture

Gestures appear in two variants: either as gesture specifications (describing what motions the user has to execute for a certain effect) or as gesture events (describing a. the fact that a matching motion event just took place and b. the motion data which triggered the event).

The "duration" parameter specifies how far back the history of input events for the containing region should be considered for this gesture. The first value determines the starting point in the history, counting backwards from the present. For example, a value of [ 5.0 ] means "the last five seconds". The optional second value determines the end point. E.g. a value of [ 5.0, 3.0 ] means "between 3 and 5 seconds earlier". When values are not specified as floating point numbers, but as integers, the unit changes from seconds to "ticks", i.e. sensor readings. The special value of [ -1 ] means "the entire history". If no duration value is given, the default is [ 1 ], meaning "the last sensor reading".

Whether a gesture object is a gesture event can be determined from looking at the flags. Should they contain the string "result", then it is a gesture event and the features composing this gesture will contain valid data in their "result" array. Otherwise, the object is a gesture specification and the features will contain valid data in their "constraints" array.

When a gesture has the "oneshot" flag, then it can only be triggered once by a given set of input IDs. Repeated triggering is only possible when the set of IDs captured by the containing region changes.

When a gesture has the "sticky" flag, then once this gesture has been triggered for the first time, all input events with participating IDs will continue to be sent to the original region, even if they subsequently leave the original boundaries.

When a gesture has the "default" flag set, it is added to a pool of default gestures. When a gesture specification with an empty feature list is encountered, then the name is looked up in the default gesture pool and if a match is found, the feature list is copied from the default gesture. Otherwise, the empty gesture is ignored.

When a gesture has the "bubble" flag set, then the result gesture will be sent to all regions that have been crossed by participating input events, even if the gesture itself has also been flagged as "sticky".

gesturelist  = [ null/<gesture> *( , <gesture> ) ]
gestureflag  = "oneshot" / "sticky" / "default" / "bubble" / "result"
gestureflags = [ null/<gestureflag> (* , <gestureflag> ) ]

duration     = [ <number> (, <number> ) ]

gesture = {

2.4.  Feature

are the building blocks of gestures and are atomic mathematical properties of the raw motion data, such as the average motion vector or the diameter of the convex hull of all input object locations.

featurelist = [ null/<feature> *( , <feature> ) ]
featureitem = <point>/<number>
itemlist    = [ null/<featureitem>  *( , <featureitem>  ) ]

feature = {

A feature is classified by its type, a filter bitmask for input events as described in Section 2.2 (Region), a type-specific list of constraints and a type-specific list of results. Optionally, a duration value can also be specified which will override the duration value specified for the whole gesture. Depending on whether the containing gesture is a "result" gesture or not (see Section 2.3 (Gesture)), either the constraint list or the result list will be empty. All temporal relations (such as motion vector) are expressed relative to a time unit of one sensor frame, i.e. for a sensor setup running at 60 Hz, the temporal unit will be 16.66 ms.

Features are divided into two groups. The first one are single-match features, which will only generate a single result instance regardless of the number of input objects.

Feature TypeConstraint TypesResult TypesDescription
Motion <point>, <point> <point> Average motion vector, lower and upper limits per component
Rotation <number>, <number> <number> Relative rotation angle in rad, lower and upper limits
Scale <number>, <number> <number> Relative size change, lower and upper limits
Path <point> *(, <point>) <number> Match for template path
Count <number>, <number> <number> Number of input objects, lower and upper limits
Delay <number>, <number> <number> Number of frames since first object entered, lower and upper limits

Single-Match Features

 Table 1 

The second group of features are multi-match features, which will potentially generate one result instance for each input object (depending on the constraint values).

Feature TypeConstraint TypesResult TypesDescription
ObjectID <number>, <number> <number> Match range of unique IDs (e.g. for tangible objects)
ObjectParent <number>, <number> <number> Match range of unique parent IDs (e.g. for fingers belonging to specific user)
ObjectPosition <point>, <point> <point> Position, lower and upper limits per component
ObjectDimension <point>, <point>, <point>, <point>, <number>, <number> <point>, <point>, <number> Object dimensions, lower and upper limits per component (major axis, minor axis, size)
ObjectGroup <number>, <number>, <number> <number>, <point> Match group of input objects (min. count, max. count, max. radius). Result = count/centroid

Multi-Match Features

 Table 2 

3.  Runtime Behavior

This section details the behavior of the gesture matching engine.

The following steps are performed every time an input event has been received:

  1. ...
  2. ...

The following steps are performed every time after a full set of input events has been received:

  1. ...
  2. ...

4.  Examples

For illustration purposes, this section gives some usage examples of GISpL.

4.1.  Motion

The following GISpL example describes a simple drag gesture with an arbitrary pointing device. The <filters;> token is set to allow any type ID representing a pointing device or touch input event.


4.2.  Horizontal two-finger swipe

The following GISpL example describes a quick horizontal swipe with two fingers. The <filters> token contains a value of 2, representing a bitmask filtering for "generic touch" events.


4.3.  React to tap by specific user

The following GISpL example describes a tap event which can only be triggered by a specific user with numeric ID 123. For this to work, the input sensor hardware has to support user identification. An example is the DiamondTouch system.


4.4.  React to tagged object

The following GISpL example describes an event which is triggered by one of two tagged objects with IDs 320 or 321. The filters are set to consider only tagged objects for this feature.


4.5.  Double click

The following GISpL example describes a double click. The filters are set to consider only mouse pointers for this feature, while the duration values select specific ranges in the input history.

    { "type":"Count", "constraints":[0,0], "duration":[150,100], "result":[] },
    { "type":"Count", "constraints":[1,1], "duration":[100, 50], "result":[] },
    { "type":"Count", "constraints":[0,0], "duration":[ 50,  1], "result":[] },
    { "type":"Count", "constraints":[1,1], "duration":[      1], "result":[] }

5.  Security Considerations

Gesture events, if intercepted while in transit over the network, may disclose private data about the users' actions. Consequently, they should preferably be transmitted over secure channels such as private networks, VPNs or SSL-encrypted links.

6.  IANA Considerations

This document has no actions for IANA.

7.  References

7.1. Normative References

[RFC5234] Crocker, D. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” STD 68, RFC 5234, January 2008 (TXT).

7.2. Informative References

[RFC4627] Crockford, D., “The application/json Media Type for JavaScript Object Notation (JSON),” RFC 4627, July 2006 (TXT).

Appendix A.  Formal GISpL Specification

The following formal specification of GISpL is given in ABNF (Crocker, D. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” January 2008.) [RFC5234]. JSON-related rule definitions (e.g. <number>, <string> ...) are to be found in [RFC4627] (Crockford, D., “The application/json Media Type for JavaScript Object Notation (JSON),” July 2006.).

quote = quotation-mark

point = begin-array
        number value-separator number value-separator number

item  = point / number

itemlist    = begin-array item    *( value-separator item    ) end-array
pointlist   = begin-array point   *( value-separator point   ) end-array
gesturelist = begin-array gesture *( value-separator gesture ) end-array
featurelist = begin-array feature *( value-separator feature ) end-array

filters     = quote "filter" quote name-separator number value-separator
regionid    = quote "id" quote name-separator string value-separator
regionflags = quote "flags" quote name-separator
              ( ( quote "poly" quote ) / ( quote "hull" quote ) )

duration     = begin-array number ( value-separator number ) end-array

gesturename  = quote "name" quote name-separator string value-separator
gestureflag  = ( quote "oneshot" quote ) / ( quote "default" quote )
gestureflags = quote "flags" quote name-separator
               begin-array gestureflag
               *( value-separator gestureflag )
               end-array value-separator

featuretype  = quote "type" quote name-separator string value-separator
results      = quote "result" quote name-separator itemlist
constraints  = quote "constraints" quote name-separator
               itemlist value-separator

feature      = begin-object
               featuretype filters duration constraints results

gesture      = begin-object
               gesturename gestureflags duration featurelist

region       = begin-object
               regionid regionflags filters
               pointlist value-separator gesturelist
 Figure 1 

Appendix B.  Input Event Types

This list of type IDs is taken from TUIO 2.0 specification. Consequently, TUIO 2.0 events can be directly fed into the gesture recognition engine. A <filters> token contains an integer representing a bitmask with the corresponding bit set for those input types which are to be captured.

undefined or unknown pointer
default ID for an unknown finger (= right index finger ID)
fingers of the right hand starting with the index finger (index, middle, ring, little, thumb)
same sequence for the left hand
laser pointer
WiiMote or similar device
generic object
tagged object (uniquely identifiable)
right hand pointing
right hand open
right hand closed
left hand pointing
left hand open
left hand closed
right foot
left foot

Author's Address

  Florian Echtler
  Regensburg University
  Regensburg ...
Phone:  +49 .. .... 3842