United States Patent Application
20020015064
Kind Code
A1
Robotham, John S. ; et al.
February 7, 2002
Gesture-based user interface to multi-level and multi-modal sets of bit-maps
Abstract
A method of navigating within a plurality of bit-maps through a client user interface, comprising the steps of displaying at least a portion of a first one of the bit-maps on the client user interface, receiving a gesture at the client user interface, and in response to the gesture, altering the display by substituting at least a portion of a different one of the bit-maps for at least a portion of the first bit-map.
Inventors:
Robotham; John S. (Belmont, MA), Johnson; Charles Lee (Newton, MA)
Correspondence Name and Address:
Mirick, O'Connell 100 Front St.
Brian M. Dingman
Worcester
MA
01608
US
Series Code:
726163
Filed:
November 29, 2000
U.S. Current Class:
345/863;
345/854
U.S. Class at Publication:
345/863;
345/854
Intern'l Class:
G06F 003/00
Claims
What is claimed is:
1. A method of navigating within a plurality of bit-maps through a client user interface, comprising the steps of: displaying at least a portion of a first one of the bit-maps on the client user interface; receiving a gesture at the client user interface; and in response to the gesture, altering the display by substituting at least a portion of a different one of the bit-maps for at least a portion of the first bit-map.
2. The method of claim 1 wherein the bit-maps depict common subject matter at different resolutions.
3. The method of claim 1 wherein the gesture comprises a location gesture.
4. The method of claim 3 wherein the location gesture comprises a sequence of at least one client event.
5. The method of claim 3 wherein the gesture comprises at least one of a move and a hover.
6. The method of claim 5 wherein the user interface comprises a pointing device.
7. The method of claim 6 wherein the move gesture comprises a pointing device start location on the client interface and a pointing device end location on the client interface.
8. The method of claim 6 wherein the hover gesture comprises a hover start event followed by the pointing device remaining relatively still for at least a predetermined time interval.
9. The method of claim 1 wherein the gesture comprises a selection gesture.
10. The method of claim 9 wherein the gesture comprises at least one of a swipe, a drag, a pick, a tap, a double-tap, and a hold.
11. The method of claim 10 wherein the user interface comprises a pointing device.
12. The method of claim 11 wherein the swipe gesture comprises a pointing device movement of at least a certain distance within no more than a predetermined time.
13. The method of claim 12 wherein the swipe gesture further comprises a pointing device movement in a particular determined direction across the user interface.
14. The method of claim 12 wherein the swipe gesture further comprises a pointing device movement that begins within the client device viewport, and ends outside of the client device viewport.
15. The method of claim 11 wherein the drag gesture comprises a pointing device movement of at least a certain distance within no more than a predetermined time.
16. The method of claim 11 wherein the hold gesture comprises a hold start event followed by the pointing device remaining relatively still within a predetermined hold region for at least a predetermined hold time interval.
17. The method of claim 16 wherein the pick gesture comprises the pointing device continuing to remain relatively still within a predetermined hold region for at least a predetermined pick time interval beyond the hold time interval.
18. The method of claim 11 wherein the tap gesture comprises two sequential pointing device selection actions without substantial motion of the pointing device.
19. The method of claim 18 wherein the double tap gesture comprises four sequential pointing device selection actions, without substantial motion of the pointing device, within a predetermined double tap time.
20. The method of claim 1 wherein one bit-map includes a source visual content element rasterized into a bit-map representation through a first rasterizing mode and at least one other bit-map includes the source visual content element rasterized into a bit-map representation through a second rasterizing mode.
21. The method of claim 20 wherein the first and second rasterizing modes can differ from one another by at least one of a difference in a parameter of the rasterizing function, a difference in rasterizing algorithm, a difference in a parameter of a transcoding step, a difference in transcoding algorithm, and the insertion of at least one transcoding step before the rasterizing.
22. The method of claim 1 further including creating at least one correspondence map to map between corresponding parts of different bit-maps, to allow correspondences to be made between related areas of related bit-maps.
23. The method of claim 22 wherein a correspondence map is a source to source map that maps the correspondences from one source to another related source.
24. The method of claim 22 wherein a correspondence map is a source to raster map that maps the correspondences from a source element to a rasterized representation of that source element.
25. The method of claim 22 wherein a correspondence map is a raster to source map that maps the correspondences from a rasterized representation of a source element to that source element.
26. The method of claim 22 wherein a correspondence map is a raster to raster map that maps corresponding pixel regions within the raster representations.
27. The method of claim 20 wherein a first rasterizing mode is a rasterization and another rasterizing mode comprises a transcoding step.
28. The method of claim 27 further including an intermediate transcoding step to extract text-related aspects of the source visual content element and store them in a transcoded representation.
29. The method of claim 1 wherein one bit-map includes a source visual content element rasterized into a bit-map representation through one rasterizing mode, to accomplish an overview representation.
30. The method of claim 29 wherein another bit-map includes a text-related summary extraction of a source visual content element from the overview representation.
31. The method of claim 30 wherein the text-related summary extraction is displayed separately from the overview representation on the client user interface display.
32. The method of claim 31 wherein the text-related summary extraction is displayed over the portions of the overview representation containing the extracted source visual content element.
33. The method of claim 32 wherein the text-related summary extraction is displayed apart from the portions of the overview representation containing the extracted source visual content element.
34. The method of claim 1 wherein the method is accomplished in a client-server environment.
35. A system for navigating within a plurality of bit-maps comprising: a client user interface for entry of user interface events; a client display for displaying at least a portion of a first one of the bit-maps; and a client processor in communication with the client user interface and the client display, the client processor detecting a user interface event and determining a gesture type in response thereto, the client processor altering the display of the at least a portion of a first one of the bit-maps by substituting at least a portion of a different one of the bit-maps for at least a portion of the first bit-map.
Description
CROSS REFERENCE TO RELATED APPLICATION
[0001] This application claims priority of Provisional Application Ser. No. 60/223,251, filed on Aug. 7, 2000, and of Provisional application Ser. No. 60/229,641, filed on Aug. 31, 2000, and of a Provisional application entitled "Remote Browser Systems Using Server-Side Rendering", filed on Oct. 30, 2000, attorney docket number ZFR-001PR2.
BACKGROUND OF THE INVENTION
[0002] User Interface Actions
[0003] A client is a device with a processor and a bit-map display that supports a user interface. When a bit-map is displayed on the client's bit-map display, the client can support one or more user interface action(s) associated with the bit-map. These user interface actions provide input to the software function(s) generating the bit-map display. A user interface action can have an associated pixel location on the client display device, or it can be independent of any specific location.
[0004] A pointing device is commonly used to express the location of pixel(s) on the client display. Examples of pointing devices include a mouse, pen, touch-sensitive or pressure-sensitive surface, joystick, and the arrow buttons on a keyboard. Key presses on an alphanumeric keyboard (other than arrow keys) are typically location-independent, although an associated location of an input field may have been previously established.
[0005] For a given bit-map, a location-specific action can be a direct action or indirect action. A direct action is directly associated with a location on the given bit-map, while an indirect action is associated with a pixel region other than the given bit-map.
[0006] Direct actions allow the user to interact with the bit-map itself. For example, a typical paint program allows the user to "draw" on a bit-map and directly change the bit-map pixel values of the associated pixels. The bit-map can include rasterized representations of visual controls (or widgets) directly embedded into the given bit-map. In this case, direct actions can be associated with the embedded visual controls (or widgets). A hyperlink can be considered a special type of visual control, typically with a visual appearance of rasterized text or a bit-map image.
[0007] The software processing the direct action can provide visual feedback, either within or outside the given bit-map. For example, a cursor can be painted "over" the bit-map at the current location, or selected bit-map pixel intensities and/or colors can be changed to highlight the current location on the bit-map, or an (X,Y) location can be displayed either over or outside the bit-map.
[0008] Indirect actions are associated with a pixel location other than the bit-map itself. This includes interactions with areas of the client display not allocated for the bit-map (including window borders or other "decorations" around the given bit-map area), or interactions with pixel regions that may occlude some portion(s) of the bit-map but are not directly embedded within the bit-map. For example, menus, scroll bars, visual controls (or widgets) not embedded within the bit-map, tool palettes and pop-up dialog boxes are all commonly used to implement indirect actions.
[0009] The software processing indirect actions can provide visual feedback, such as displaying a cursor, highlighting a menu item, or visually simulating a user interface action on a visual control. Visual feedback for indirect actions can also include changes to the given bit-map.
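By way of illustration only, the distinction between direct and indirect actions can be sketched in Python as a simple hit test. The names `Rect`, `classify_action` and `occluders` are hypothetical and are introduced here only for the example; the sketch assumes rectangular pixel regions and ignores visual feedback.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rect:
    """A pixel region on the client display (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def classify_action(bitmap_area: Rect, px: int, py: int,
                    occluders: Iterable[Rect] = ()) -> str:
    """Classify a location-specific action relative to one bit-map."""
    # Regions that cover the bit-map but are not embedded in it (menus,
    # pop-up dialog boxes, tool palettes) receive indirect actions.
    if any(region.contains(px, py) for region in occluders):
        return "indirect"
    # Otherwise the action is direct only if it lands on the bit-map itself.
    return "direct" if bitmap_area.contains(px, py) else "indirect"
```

For example, `classify_action(Rect(10, 10, 100, 50), 20, 20)` yields `"direct"`, while the same location under a pop-up region passed in `occluders` yields `"indirect"`.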
[0010] Bit-Map Pixel Representations
[0011] Generally, bit-maps are displayed according to a single representation. While the bit-map might be scaled and/or clipped for display purposes, the underlying representation remains the same. The scaled and/or clipped versions are not maintained as a set, there are no data structures to maintain correspondences between and among the versions, and there are no user interface gestures to select and display a particular version within the set. Any scaling and/or clipping functions for display purposes are done dynamically and the intermediate results are usually not saved for future use.
[0012] When manipulating the scaling and/or clipping of a single bit-map pixel representation, the gestures are typically based on indirect actions rather than direct actions. These indirect actions include menu selections, pop-up dialog boxes with visual controls that select the desired level, scroll bars, or separate visual controls displayed within or outside the bit-map's window border.
[0013] Clipping is often done through indirect user actions on horizontal or vertical scroll bars, placed as user interface "decorations" around the bit-map's display area. Some user interfaces provide clipping through gestures to directly "drag" a bit-map around within a given display area (or window). Scaling is typically done by indirect actions to adjust a scaling factor as a percentage of the bit-map's pixel resolution. Sometimes visual controls are placed below the bit-map's display area which support indirect actions to toggle "zoom in" or "zoom out" functions.
[0014] Dynamic "zoom in" or "zoom out" over a selected portion of a bit-map has been provided by gestures that combine an indirect action to select a "magnifying glass" tool and a direct action of moving the tool over the bit-map. The "zoom factor" is often adjusted through +/- key presses. Note that "zoom in" and "zoom out" are typically implemented through pixel replication or decimation on a per-pixel basis, rather than a filtered scaling that computes each resulting pixel from a surrounding neighborhood of source pixels.
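The difference between per-pixel replication or decimation and filtered scaling can be sketched as follows. This is a minimal Python illustration, assuming a bit-map stored as a list of rows of grayscale integers and an integer zoom factor. The box (averaging) filter is only one possible filter, chosen here for brevity, and all function names are hypothetical.

```python
def decimate(bitmap, factor):
    """Zoom out by keeping every factor-th pixel; no filtering is applied."""
    return [row[::factor] for row in bitmap[::factor]]


def replicate(bitmap, factor):
    """Zoom in by repeating each pixel factor times in both directions."""
    out = []
    for row in bitmap:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def box_filter_downscale(bitmap, factor):
    """Zoom out by averaging each factor x factor neighborhood of source pixels.

    Rows and columns that do not fill a complete neighborhood are dropped.
    """
    height, width = len(bitmap), len(bitmap[0])
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [bitmap[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```

On a 2 x 2 checkerboard of values 0 and 255, decimation by 2 keeps only the top-left pixel (0), whereas the box filter produces the mid-gray value 127, which is why filtered scaling generally preserves more of the source content at reduced resolutions.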
[0015] Icons
[0016] Icons (also called "thumbnails") are commonly used to represent a software function and/or an element of visual content (such as a document). An icon can be generated as a scaled version of a rendered representation of the associated visual content element. A double-click is commonly used as a direct action gesture on the icon, to launch software associated with the icon and view the associated visual content element (if any).
[0017] The generation of an icon from a rendered visual content element is typically done as a software service, with little or no user control over scaling and/or clipping functions. In some systems, the user can choose which page of a multi-page document to represent in the icon. User control over scaling, if available, is typically limited to choosing from a limited set of available icon pixel resolutions.
[0018] There is typically no direct action gesture to access the associated icon from the rendered visual content element. Often there is no user interface mechanism whatsoever to access the icon from a display of the rendered visual content element. When available, this access is typically done through one or more indirect actions, such as a menu pick or selecting a visual control displayed within an associated window border.
[0019] An icon/document pair is not a multi-level set of bit-map pixel representations, as defined herein. The icon/document pair is not maintained as a set. Once the icon is generated, it is typically maintained and stored separately from the associated visual content element. The icon will contain data to identify the associated visual content element, but the associated visual content element will not contain data to identify any or all associated icons. Often the icon is maintained as an independent visual content element, and multiple independent icons can be generated from a single associated visual content element.
[0020] Location correspondence information is not maintained within an icon/document pair. When a specific pixel location is selected within an icon, there is no information maintained to determine the corresponding pixel location(s) within the rendered visual content element. Since there is no information, there are also no gestures within the prior art to make such a location-specific selection.
[0021] Location correspondence information is not maintained from the rendered visual content element to the corresponding pixel location(s) on the icon, and there are no location-specific gestures from the rendered visual content element to pixel location(s) on a corresponding icon.
[0022] Background Summary
[0023] Bit-maps often have a pixel resolution greater than the pixel resolution of the allocated display area. Therefore, improved gestures to support scaling and/or clipping are highly desirable. There is a need for improved gestures that emphasize direct actions, require less user movement and/or effort, and are based on a more intuitive model that connects the gestures to corresponding software processing functions. This is particularly true for new classes of intelligent devices with limited screen display areas, such as personal digital assistant (PDA) devices and cellular telephones with bit-map displays.
[0024] Furthermore, the processing power to support scaling (particularly high-quality filtered scaling) can be greater than certain client devices can provide while still maintaining rapid responsiveness to user actions. Therefore, there is a need to provide pre-scaled representations that are stored as a set. Finally, there is a need for improved gestures that allow the user to work directly with a multi-level or multi-modal set of bit-maps, and easily move among and between the different representation levels or modes, taking advantage of the correspondences (such as pixel location correspondences) between levels or modes.
SUMMARY OF THE INVENTION
[0025] Overview of the Invention
[0026] The client displays one or more levels of the multi-level or multi-modal set of bit-map pixel representations on its bit-map display device. The multi-level set can be derived from any input bit-map pixel representation, including but not limited to images, rendered representations of visual content, and/or frame-buffers. The multi-modal set can be derived from different renderings of the same visual content element. One or more of these renderings can be transformed into a multi-level set. Consequently, a multi-level set can be a member of a multi-modal set. The client then interprets certain user interface actions as gestures that control the navigation through and/or interaction with the multi-level or multi-modal set.
[0027] Client Device
[0028] A client device provides a user interface to a multi-level or multi-modal set of bit-map pixel representations. The client device can be a personal computer, hand-held device such as a PalmPilot or other personal digital assistant (PDA) device, cellular telephone with a bit-map display, or any other device or system with a processor, memory and bit-map display.
[0029] Displaying a Bit-Map Pixel Representation
[0030] A client device with a bit-map display device is capable of displaying one or more bit-map pixel representations. A bit-map pixel representation (or "bit-map") is an array of pixel values. A bit-map can represent any or all of the following:
[0031] a) one or more image(s),
[0032] b) rendered visual content, and/or
[0033] c) a frame-buffer captured from:
[0034] i) the output of an application, application service or system service,
[0035] ii) a "window", using a windowing sub-system or display manager,
[0036] iii) some portion (or all) of a computer "desktop".
[0037] An image is a bit-map data structure with a visual interpretation. An image is one type of visual content. Visual content is data and/or object(s) that can be rendered into one or more bit-map pixel representation(s). A frame-buffer is the bit-map output from one or more software function(s). A frame-buffer has a data structure specifically adapted for display on a bit-map display device. Visual content can be rendered into one or more image(s) or frame-buffer(s). Frame-buffers can be stored as images.
[0038] The terms "render" and "rendering" are used herein to mean the creation of a raster (bit-map pixel) representation from a source visual content element. If the source visual content element is already in a raster form, the rendering function can be the identity function (a 1:1 mapping) or include one or more pixel transform function(s) applied to the source raster. "Render" and "rendering" are used herein interchangeably with the terms "rasterize" and "rasterizing", respectively.
[0039] The term "transcoding" is used herein to mean the source transformation of a source visual content element into a derived visual content element. The output of a transcoding function is a representation in a source format. A source format is an encoding of visual content other than a bit-map, although it may include an encapsulation of a bit-map or a reference to a bit-map. HTML (hypertext markup language) is an example of a source format. A source format requires a rasterizing (or rendering step) to be displayed as a fully rasterized (bit-map) representation.
[0040] Examples of visual content include electronic documents (such as word-processing documents), spreadsheets, Web pages, electronic forms, electronic mail ("e-mail"), database queries and results, drawings, presentations, images and sequences of images.
[0041] Each element of visual content ("visual content element") can have one or more constituent components, with each component having its own format and visual interpretation. For example, a Web page is often built from multiple components that are referenced in its HTML, XML or similar coding. Another example is a compound document with formatted text, an embedded spreadsheet and embedded images and graphics.
[0042] The constituent component(s) of a visual content element can be retrieved from a file system or database, or dynamically generated (computed as needed). When using object-oriented technologies to define object components and their behaviors, a constituent component can be (but is not required to be) an object. The data (and/or methods) for the constituent component(s) can be stored locally on one computer system or accessed from any number of other computer or file systems.
[0043] Rasterizing (or rendering) is a function for converting a visual content element from the data (and/or object) format(s) of its constituent component(s) into a bit-map pixel representation.
[0044] The display of rasterized visual content can be presented on an entire display screen, or within a "window" or "icon" that uses a sub-region of the display screen. Computer "desktops" are visual metaphors for accessing and controlling the computer system, typically using windows and icons to display rendered representations of multiple visual content elements.
[0045] Any bit-map generated for output to a display screen can be captured as a frame-buffer. A frame-buffer can represent any portion of rasterized visual content (including rendered images), or any portion of a window or computer desktop. A frame-buffer is a bit-map intended for display on a bit-map display device. When a frame-buffer is captured, it can be saved as an image or some other type of visual content element. A remote frame-buffer system transmits a frame-buffer from one computer system to another, for eventual display on a remote system's bit-map display device.
[0046] Gestures
[0047] A gesture is a semantic interpretation of one or more user interface actions. A gesture has an implied semantic meaning, which can be interpreted by the software receiving user input. The software determines how to interpret the user interface action(s) into associated gesture(s).
[0048] The interpretations can differ based on modifiers. A modifier can be set within a specific user interface action, by a previous user interface action, or by software. For example, a simultaneous button press can be used to modify the meaning of a movement or selection over some portion of the bit-map. The level of pressure on a pressure-sensitive surface can also be used as a modifier. The interpretation of a modifier can be set by user preference, typically through previous user interface action(s), or set by software.
[0049] When the gesture involves more than one user interface action, the sequencing of the actions can carry semantic information. This allows two or more actions in different sequences to be interpreted as different gestures. For example, a movement followed by a selection might be interpreted as a different gesture from a selection followed by a movement.
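As a purely illustrative Python sketch, sequence-sensitive interpretation with an optional modifier might be expressed as a lookup keyed on the ordered actions. The gesture names in the table (`"point-then-pick"`, `"drag"`), the action names, and the function `interpret` are hypothetical labels chosen for the example and are not definitions taken from this disclosure.

```python
from typing import Optional, Sequence

# Hypothetical gesture table. The keys are ordered action sequences, so the
# same two actions performed in a different order map to a different gesture.
GESTURES = {
    ("move", "select"): "point-then-pick",
    ("select", "move"): "drag",
}


def interpret(actions: Sequence[str], modifier: Optional[str] = None) -> str:
    """Interpret an ordered sequence of user interface actions as a gesture."""
    base = GESTURES.get(tuple(actions), "unrecognized")
    # A modifier (for example, a simultaneous button press) changes the
    # interpretation of an otherwise identical action sequence.
    if modifier is not None and base != "unrecognized":
        return f"{modifier}+{base}"
    return base
```

Here `interpret(["move", "select"])` and `interpret(["select", "move"])` return different gestures, and passing `modifier="button"` yields a third interpretation of the same sequence.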
[0050] When a gesture is composed of multiple actions, direct actions can be combined with indirect actions. For example, an indirect selection within an external menu, dialog box or visual control can be combined with a direct movement or selection on the bit-map.
[0051] While gestures are commonly used to express semantic intent within user interfaces, they can vary widely in their ease of expression and applicability. Gestures to express similar semantic intents can vary in the number and sequence of actions required for a gesture, and the amount of effort and/or movement required on the part of the user. For example, a sequence of direct actions typically takes less movement and effort than a combination of direct and indirect actions.
[0052] Gestures can also vary in their appropriateness to the semantic intent being expressed. The appropriateness of a gesture depends on a shared mental model, between the user and the software designer, of the gesture and its meaning. Within a set of gestures, each new gesture is appropriate if it fits within the shared model or readily extends the model. When the shared mental model is easily understood, and the set of gestures readily fits within (and/or extends) this model, then the user interface is generally considered more "intuitive".
[0053] For example, a gesture that traces a check mark to signify "OK" and a gesture that traces an "X" to signify "NO" are based on familiar paper-and-pencil symbols. But reversing the meanings of these gestures would be very confusing (counter-intuitive) to most users. Another example is the use of platform-specific "style guide" conventions which define certain gestures and their meanings on a class of client devices. Sometimes it is appropriate to follow these conventions, other times not. Following a style guide makes the gestures more compatible with other user interfaces on the same platform, but breaking the style guide can often create a more intuitive user interface within a given application domain.
Multi-Level Set of Bit-Map Pixel Representations
[0054] Multi-level sets of bit-map pixel representations have been used primarily within the technical domain of image processing, and not for more general-purpose display of visual content (such as Web pages, word processing documents, spreadsheets or presentation graphics).
[0055] In a multi-level set, each version within the set is related to an input bit-map pixel representation, and represents a scaled (possibly 1:1) version of some portion of the input bit-map pixel representation. In a multi-level set:
[0056] a) the scaled and/or clipped versions are maintained as a set,
[0057] b) there are data structures to maintain correspondences between and among the versions, and
[0058] c) the correspondence data structures support mapping from pixel location(s) in one version to the corresponding pixel location(s) in at least one other version within the set.
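One possible form of such a correspondence data structure is sketched below in Python. It assumes each version is described only by a uniform scale factor relative to the input bit-map and by the top-left corner of its clipped region in input coordinates. The names `Level`, `MultiLevelSet` and `map_location` are hypothetical, and a practical implementation could store the correspondences quite differently.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Level:
    """One scaled and/or clipped version of the input bit-map (hypothetical)."""
    scale: float                       # relative to the input bit-map; 1.0 is 1:1
    origin: Tuple[int, int] = (0, 0)   # top-left of the clipped region, input coords


class MultiLevelSet:
    """Keeps the versions together as a set and maps locations between them."""

    def __init__(self, levels: List[Level]) -> None:
        self.levels = list(levels)

    def map_location(self, src: int, dst: int, x: int, y: int) -> Tuple[int, int]:
        """Map pixel (x, y) in version src to the corresponding pixel in dst."""
        s, d = self.levels[src], self.levels[dst]
        # First recover the location on the input bit-map.
        in_x = s.origin[0] + x / s.scale
        in_y = s.origin[1] + y / s.scale
        # Then project it into the destination version. The result can fall
        # outside the destination if that version is clipped.
        return (math.floor((in_x - d.origin[0]) * d.scale),
                math.floor((in_y - d.origin[1]) * d.scale))
```

With a quarter-scale overview at index 0 and a 1:1 version at index 1, `map_location(0, 1, 10, 20)` returns `(40, 80)`, and the reverse mapping returns `(10, 20)`.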
[0059] Novel techniques for using a multi-level or multi-modal set of bit-map pixel representations are described in the co-pending Provisional patent application "Content Browsing Using Rasterized Representations", Provisional application Ser. No. 60/223,251, filed Aug. 7, 2000, and the related non-Provisional application filed on even date herewith (attorney docket no ZFR-001), entitled "Visual Content Browsing Using Rasterized Representations", and Provisional application Ser. No. 60/229,641, filed Aug. 31, 2000, all of which are incorporated herein by reference.
[0060] User interfaces for manipulating a multi-level set of bit-map representations have favored indirect actions over direct actions. Often, there are no specific user interface gestures that reflect the relationships between members of a set. For example, there are typically no specific gestures for switching between representation levels within a given set. Instead, each member of the set is treated as a separate bit-map and the indirect gestures for displaying each level are the same as for selecting any bit-map (within or outside a given set). These indirect gestures are typically provided through menu selections or external visual controls (e.g. tool bars) coupled with pop-up dialog boxes to select the bit-map to display.
[0061] Methods are disclosed for a gesture-based user interface to multi-level and multi-modal sets of bit-map pixel representations.
[0062] A client device provides a user interface to a multi-level or multi-modal set of bit-map pixel representations. In a multi-level set, an input bit-map pixel representation is transformed through one or more pixel transform operation(s) into a set of at least two derived bit-map pixel representations. Each level represents a scaled (possibly 1:1) view of the input bit-map pixel representation.
[0063] The representation levels in a multi-level set are ordered by the relative resolution of the derived bit-map pixel representation in comparison to the equivalent region of the input bit-map. The ordering is from lowest relative pixel resolution to highest. Applying different scaling factors (including 1:1) during the pixel transformation operation(s) creates the different relative pixel resolution levels.
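The derivation and ordering of representation levels can be sketched as follows. This Python example is a minimal illustration, assuming a bit-map stored as a list of pixel rows and using nearest-neighbor sampling as the pixel transform purely for brevity. The function names are hypothetical.

```python
def scale_nearest(bitmap, factor):
    """Scale a bit-map by sampling the nearest source pixel (illustrative only)."""
    src_h, src_w = len(bitmap), len(bitmap[0])
    out_h = max(1, round(src_h * factor))
    out_w = max(1, round(src_w * factor))
    return [[bitmap[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
             for x in range(out_w)]
            for y in range(out_h)]


def build_multi_level_set(bitmap, factors):
    """Return [(factor, derived_bitmap), ...] from lowest to highest resolution."""
    if len(factors) < 2:
        # A multi-level set holds at least two derived representations.
        raise ValueError("at least two scaling factors are required")
    return [(factor, scale_nearest(bitmap, factor)) for factor in sorted(factors)]
```

Calling `build_multi_level_set(bitmap, [1.0, 0.5])` returns the half-resolution level first and the 1:1 level second, regardless of the order in which the factors are supplied.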
[0064] In a multi-modal set, multiple rendering modes generate multiple bit-map representations of a source visual content element. The resulting bit-map representations are associated into a multi-modal set. A multi-modal set can include one or more multi-level representations.
[0065] The representations in a multi-modal set are grouped by rasterizing mode. For any given rasterizing mode, there can be multi-level representations that are internally ordered by relative pixel resolution. There can also be partial representations within a multi-modal or multi-level set, representing a partial subset of the source visual content element or original input bit-map.
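The grouping described above might be held in a structure along the following lines. This Python sketch is illustrative only: the class `MultiModalSet`, its methods, and the mode names used in the example are hypothetical, and partial representations are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Bitmap = List[List[int]]


class MultiModalSet:
    """Groups bit-maps by rasterizing mode, ordered by relative resolution."""

    def __init__(self) -> None:
        self._by_mode: Dict[str, List[Tuple[float, Bitmap]]] = defaultdict(list)

    def add(self, mode: str, relative_resolution: float, bitmap: Bitmap) -> None:
        levels = self._by_mode[mode]
        levels.append((relative_resolution, bitmap))
        # Keep each mode's levels ordered from lowest to highest resolution.
        levels.sort(key=lambda entry: entry[0])

    def modes(self) -> List[str]:
        return sorted(self._by_mode)

    def levels(self, mode: str) -> List[Tuple[float, Bitmap]]:
        # A mode with a single entry is simply a one-level representation.
        return list(self._by_mode.get(mode, []))
```

For instance, adding a 1:1 and a quarter-scale bit-map under a hypothetical `"graphical"` mode and a single bit-map under a hypothetical `"text"` mode gives two groups, with the `"graphical"` group ordered quarter-scale first.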
[0066] The user interface gestures allow the user to control various aspects of navigating and/or browsing through the multi-level or multi-modal set of bit-maps. This includes gestures to control the process of:
[0067] a) panning across one or more bit-map(s) in the multi-level or multi-modal set,
[0068] b) scrolling across one or more bit-map(s) in the multi-level or multi-modal set,
[0069] c) moving to a location on one or more bit-map(s) in the multi-level or multi-modal set,
[0070] d) selecting a location on one or more bit-map(s) in the multi-level or multi-modal set,
[0071] e) selecting or switching from one representation level to another within the multi-level or multi-modal set of bit-maps, and/or
[0072] f) changing the input mode associated with one or more bit-map(s) in the multi-level or multi-modal set.
[0073] Applications of the present invention for multi-level or multi-modal representations of various types of visual content including Web pages, e-mail attachments, electronic documents (including word processing documents and spreadsheets), electronic forms, database queries and results, drawings, presentations, images and sequences of images are presented. Applications for multi-level representations of frame buffers captured from user interfaces, windowing systems, and/or computer "desktops" are also presented.
[0074] The applications can be provided on a variety of devices including personal computers (PCs), handheld devices such as personal digital assistants (PDAs) like the PalmPilot, or cellular telephones with bit-map displays. A variety of user interface styles, including mouse/keyboard and pen-based user interface styles, can be supported. The present invention has particular advantages in pen-based handheld devices (including PDAs and cellular telephones with bit-map displays).
[0075] The present invention provides new methods to work more effectively and/or more conveniently with a multi-level or multi-modal set of bit-map pixel representations. The user is no longer constrained to working with a single level or mode at a time. Neither is the user limited to prior methods for working with a multi-level or multi-modal set of bit-maps, where the gestures of the present invention are not available within the user interface.
[0076] Client Display Surfaces
[0077] The client's bit-map display allows the client to provide visual output, represented as two-dimensional bit-maps of pixel values. The client's bit-map display device is typically refreshed from its bit-map display memory. The pixel values stored within the display memory are logically arranged as a two-dimensional array of pixels, which are displayed on the bit-map display device. Client software can write directly into the bit-map display memory, or work cooperatively with a window subsystem (or display manager) that mediates how the bit-map display memory is allocated and used.
[0078] A display surface is an abstraction of a two-dimensional array of bit-map pixels. The client application, application service or system service writes its output pixel values into one or more client display surfaces.
[0079] The client displays the multi-level or multi-modal set of bit-maps using one or more client display surface(s). The client display function maps pixels from one or more representation level(s) or rasterized mode(s) into an allocated client display surface. The client display surface is then viewed on the client's bit-map display device, as further described below in the section "Client Viewports". The mapping to the display surface can include optional clipping and/or scaling. Clipping selects certain pixel region(s) of the representation level(s) or rasterized mode(s). Scaling transforms the selected pixels to a scaled bit-map pixel representation.
[0080] The client display function controls how the client display surface is generated. Along with the pixels mapped from the multi-level or multi-modal set of bit-maps, the client display function can add additional pixels to a client display surface. These additional pixels can represent window borders, rendered (and rasterized) visual controls, or other bit-maps being displayed within a given client display surface. These additional pixels can be adjacent to the pixels mapped from the multi-level or multi-modal set and/or generated as one or more overlay(s) over the pixels mapped from the multi-level or multi-modal set.
[0081] When a pixel location is given in terms of a client display surface, the client maps this back to the associated pixel(s) of a representation from the multi-level or multi-modal set being displayed. The client is responsible for maintaining this mapping, which is the inverse of the mapping used to generate the client display surface. If the pixel on the client display surface is not related to a bit-map pixel representation of the multi-level or multi-modal set (e.g. it represents a window border or additional visual control), then the mapping is null.
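The inverse mapping described above can be sketched as follows. This is a minimal illustration, assuming a single rectangular clip, a uniform scaling factor and a fixed offset for the mapped pixels on the display surface; the class name `SurfaceMapping` and all of its parameters are hypothetical and do not come from the described embodiment.

```python
class SurfaceMapping:
    """Maps one clipped, scaled region of a representation onto a display surface."""

    def __init__(self, clip_x, clip_y, clip_w, clip_h, scale, offset_x, offset_y):
        self.clip_x, self.clip_y = clip_x, clip_y          # clip origin in the representation
        self.clip_w, self.clip_h = clip_w, clip_h          # clip size in the representation
        self.scale = scale                                  # surface pixels per representation pixel
        self.offset_x, self.offset_y = offset_x, offset_y  # where mapped pixels start on the surface

    def to_surface(self, rx, ry):
        """Forward mapping: representation pixel -> surface pixel, or None if clipped out."""
        inside = (self.clip_x <= rx < self.clip_x + self.clip_w and
                  self.clip_y <= ry < self.clip_y + self.clip_h)
        if not inside:
            return None
        return (self.offset_x + int((rx - self.clip_x) * self.scale),
                self.offset_y + int((ry - self.clip_y) * self.scale))

    def to_representation(self, sx, sy):
        """Inverse mapping: surface pixel -> representation pixel, or None (null mapping)."""
        lx, ly = sx - self.offset_x, sy - self.offset_y
        if lx < 0 or ly < 0:
            return None  # e.g. a window border to the left of or above the mapped pixels
        rx = self.clip_x + int(lx / self.scale)
        ry = self.clip_y + int(ly / self.scale)
        if rx >= self.clip_x + self.clip_w or ry >= self.clip_y + self.clip_h:
            return None  # beyond the mapped pixels, e.g. an added visual control
        return (rx, ry)


# A 200x100 clip shown at half scale, placed at (4, 20) on the surface.
mapping = SurfaceMapping(100, 50, 200, 100, 0.5, 4, 20)
```

With this mapping, a surface location inside the border area, such as (2, 10), yields a null mapping, while (54, 40) maps back to representation pixel (200, 90).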
[0082] A single client display surface can include pixels mapped from multiple representation levels of a multi-level set. However, in an illustrative embodiment, each client display surface includes pixels mapped from only one representation of a multi-level or multi-modal set (along with any other additional pixels generated by the client display function). This makes it easier for the user to mentally associate a given client display surface with a single representation of a multi-level or multi-modal set.
[0083] Display Surface Attributes
[0084] The primary attributes of a display surface are its pixel resolution, pixel aspect ratio and pixel format. Pixel resolution can be expressed as the number of pixels in the horizontal and vertical dimensions. For example, a 640×480 bit-map is a rectangular bit-map with 640 pixels in the horizontal dimension and 480 pixels in the vertical dimension.
[0085] The pixel aspect ratio determines the relative density of pixels as drawn on the display surface in both the horizontal and vertical dimensions. Pixel aspect ratio is typically expressed as a ratio of horizontal density to vertical density. For example, a 640×480 bit-map drawn with a 4:3 pixel aspect ratio will appear to be a square on the drawing surface, while the same bit-map drawn with a 1:1 pixel aspect ratio will appear to be a rectangle with a width to height ratio of 640:480 (or 4:3).
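The arithmetic behind this example can be checked with a short sketch. It assumes that the apparent extent along an axis is the pixel count divided by the pixel density on that axis; the function name `apparent_shape` is hypothetical.

```python
from fractions import Fraction


def apparent_shape(width_px, height_px, h_density, v_density):
    """Return the apparent width:height ratio of a bit-map as a reduced Fraction.

    h_density:v_density is the pixel aspect ratio (horizontal to vertical density).
    """
    apparent_w = Fraction(width_px, h_density)   # 640 / 4 = 160 units wide
    apparent_h = Fraction(height_px, v_density)  # 480 / 3 = 160 units tall
    return apparent_w / apparent_h


square = apparent_shape(640, 480, 4, 3)     # Fraction(1, 1): appears square
rectangle = apparent_shape(640, 480, 1, 1)  # Fraction(4, 3): appears 4:3
```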
[0086] Pixel aspect ratio can also be expressed as the "dots per inch" (or similar measure) in both the horizontal and vertical dimensions. This provides the physical dimensions of a pixel on the display surface, while the ratio only describes the relative dimensions. Some rendering algorithms take into account the physical dimensions and pixel density of the display surface, others use the aspect ratio (with or without the physical dimensions), and still others render the same results regardless of the aspect ratio or physical dimensions.
[0087] Pixel format describes how each pixel is represented in the bit-map representation. This includes the number of bits per pixel, the tonal range represented by each pixel (bi-tonal, grayscale, or color), and the mapping of each pixel value into a bi-tonal, grayscale or color value. A typical bit-map pixel representation uses the same pixel format for each pixel in the bit-map, although it is possible to define a bit-map where the pixel format differs between individual pixels. The number of bits per pixel defines the maximum number of possible values for that pixel. For example, a 1-bit pixel can only express two values (0 or 1), a 2-bit pixel can express four different values, and so on.
[0088] The tonal range determines if the pixel values should be interpreted as bi-tonal values, grayscale values or color values. Bi-tonal has only two possible values, usually black or white. A grayscale tonal range typically defines black, white, and values of gray between. For example, a 2-bit grayscale pixel might define values for black, dark gray, light gray and white. A color tonal range can represent arbitrary colors within a defined color space. Some pixel formats define a direct mapping from the pixel value into a color value. For example, a 24-bit RGB color pixel may have three 8-bit components, each defining a red, green, and blue value. Other pixel formats define a color map, which uses the pixel value as an index into a table of color values.
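The two styles of pixel format described above (direct mapping and color map) can be sketched as follows. The 0xRRGGBB component order and the evenly spaced gray values are assumptions made for illustration; the source does not prescribe either.

```python
def max_values(bits_per_pixel):
    """Number of distinct values a pixel of the given depth can express."""
    return 1 << bits_per_pixel


def unpack_rgb24(pixel):
    """Direct mapping: split a 24-bit value (assumed 0xRRGGBB) into three 8-bit components."""
    return ((pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF)


# Color map for a 2-bit grayscale format: black, dark gray, light gray, white.
# The specific gray levels are an assumption.
GRAY_2BIT = [(0, 0, 0), (85, 85, 85), (170, 170, 170), (255, 255, 255)]


def lookup_indexed(pixel, color_map):
    """Indexed mapping: the pixel value is an index into a table of color values."""
    return color_map[pixel]
```

For example, `unpack_rgb24(0xFF8000)` gives the components (255, 128, 0), and `lookup_indexed(2, GRAY_2BIT)` gives the light gray entry.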
[0089] The pixel format can also define other per-pixel data, such as an alpha value. The alpha value provides the "transparency" of the pixel, for combining this pixel value with another related pixel value. If the rendering function combines multiple bit-map pixel representations into a single bit-map pixel representation, the alpha values of each pixel can be used to determine the per-pixel blending. In rendering of three-dimensional data into a bit-map pixel representation, the pixel format may define a depth value per pixel. When other per-pixel data is required, this can also be defined in the pixel format.
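One common way to use a per-pixel alpha value for blending is sketched below. The source does not specify a blending formula; this sketch assumes an 8-bit alpha where 255 means the source pixel is fully opaque, and the function name `blend` is hypothetical.

```python
def blend(src, dst, alpha):
    """Blend color `src` over color `dst` using an 8-bit alpha (assumed 255 = opaque).

    Each component is a rounded weighted average of the two inputs.
    """
    return tuple((s * alpha + d * (255 - alpha) + 127) // 255
                 for s, d in zip(src, dst))


opaque = blend((255, 0, 0), (0, 0, 255), 255)       # source fully covers destination
transparent = blend((255, 0, 0), (0, 0, 255), 0)    # destination shows through
half = blend((255, 0, 0), (0, 0, 255), 128)         # roughly equal mix
```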
[0090] Client Viewports
[0091] A display surface can be allocated directly within the bit-map display memory, or allocated outside the bit-map display memory and mapped into a client viewport.
[0092] A client viewport is an allocated pixel region within the bit-map display memory. A client viewport can be the entire display region, or a subset. Client viewports are a convenient way for a window subsystem (or display manager) to mediate how different software applications, application services and/or system services share the bit-map display device. The window subsystem (or display manager) can determine which client viewport(s) are visible, how each is mapped to the actual bit-map display device, and manage any overlapping between viewports.
[0093] Each display surface is painted into one or more client viewport(s). The painting function selects which portion(s) of the client display surface should be realized within each client viewport. The painting function provides a level of indirection between a client display surface and a client viewport, which is the basis for most windowing or display management schemes.
[0094] If the client display surface is allocated directly within the display memory, then the client display surface and the client viewport share the same data structure(s). In this case, the painting process is implicitly performed while writing the output pixels to the display surface.
[0095] The painting function (see FIG. 1) maps the display surface to the bit-map display device. In the simplest case, this is a direct 1:1 mapping. The mapping function can include these optional steps:
[0096] a) clipping the display surface to the assigned output area on the actual bit-map display device,
[0097] b) performing simple pixel replication ("zoom") or pixel decimation ("shrink") operations,
[0098] c) translating the pixel format to the native pixel format of the bit-map display device, and/or
[0099] d) transferring the pixels to the client viewport(s) allocated for viewing the display surface
[0100] The optional clipping function (1-2) selects one or more sub-region(s) (1-3) of the rendered display surface that correspond(s) to the "client viewport" (1-7): the assigned output area on the actual bit-map display device. Clipping is used when the pixel resolution of the rendered display surface is greater than the available pixels in the client viewport. Clipping is also used to manage overlapping windows in a windowing display environment.
[0101] Clipping is simply a selection function. Clipping does not resize the display surface (or any sub-region of the display surface), nor does it re-compute any pixels in the display surface. Resizing (other than shrink or zoom) or re-computing are considered bit-map conversion operations, and are therefore part of a rendering or pixel transform function and not the painting function.
[0102] Optional pixel zoom or shrink are simple pixel replication or pixel decimation operations (1-4), performed on one or more selected sub-region(s) of the clipped display surface. Zoom and shrink are done independently on each selected pixel. They do not require averaging among pixels or re-computing any pixels in the display surface, which are bit-map conversion operations that are not part of the painting function. In FIG. 1, there is no pixel zoom or shrink performed, so the clipped sub-region after pixel replication or decimation (1-5) is the same as the input clipped sub-region (1-3).
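Pixel replication and pixel decimation can be sketched as below. The sketch assumes a bit-map stored as a list of rows and an integer factor; the function names are hypothetical. Note that neither function averages or re-computes pixel values.

```python
def zoom(bitmap, factor):
    """Pixel replication: each pixel becomes a factor x factor block of copies."""
    out = []
    for row in bitmap:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def shrink(bitmap, factor):
    """Pixel decimation: keep every factor-th pixel in each dimension."""
    return [row[::factor] for row in bitmap[::factor]]


small = [[1, 2],
         [3, 4]]
zoomed = zoom(small, 2)
restored = shrink(zoomed, 2)
```

Here `zoomed` is a 4×4 bit-map in which each original pixel appears as a 2×2 block, and decimating it by the same factor returns the original 2×2 bit-map.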
[0103] Optional pixel format translation (1-6) is a 1:1 mapping between the pixel format of each pixel in the display surface and the pixel format used by the actual bit-map display device. Pixel format translation is often done through a look-up table. Pixel format translation does not re-compute the pixel values of the display surface, although it may effectively re-map its tonal range. Any translation operation more complex than a simple 1:1 mapping of pixel formats should be considered a bit-map conversion operation, which is not part of the painting function.
[0104] The final optional step in the painting function is the pixel transfer (1-8) to the client viewport (1-9): the allocated pixel region within the display memory for the bit-map display device. If the display surface was directly allocated within that display memory, this step is not required. Pixel transfer is typically done through one or more bit block transfer ("bit blt") operation(s).
[0105] Note that the ordering of the optional steps in the painting function can be different than that presented in FIG. 1 and the above description. For example, the optional pixel translation might be done before optional clipping. Also note that a display surface can be painted into multiple client viewport(s), each with its own clipping, pixel format translation and/or pixel transfer parameters.
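A painting function that combines clipping, pixel format translation and pixel transfer can be sketched as follows. This is an illustration only, assuming bit-maps stored as lists of rows, a look-up table for the 1:1 format translation, and a clip rectangle that lies within both the surface and the display memory; the name `paint` and its parameters are hypothetical.

```python
def paint(surface, clip, lut, display, dest):
    """Paint a clipped region of `surface` into `display` memory at `dest`.

    surface, display: lists of rows of pixel values.
    clip: (x, y, width, height) sub-region of the surface (assumed in bounds).
    lut: look-up table for 1:1 pixel format translation, or None to skip it.
    dest: (x, y) origin of the client viewport within display memory.
    """
    cx, cy, cw, ch = clip
    dx, dy = dest
    for i in range(ch):
        row = surface[cy + i][cx:cx + cw]          # clipping: selection only
        if lut is not None:
            row = [lut[pixel] for pixel in row]    # pixel format translation
        display[dy + i][dx:dx + cw] = row          # pixel transfer
    return display


surface = [[0, 1, 2, 3],
           [3, 2, 1, 0],
           [1, 1, 2, 2]]
display = [[9] * 5 for _ in range(4)]
gray2_to_bitonal = [0, 0, 1, 1]  # hypothetical 2-bit grayscale to bi-tonal table
painted = paint(surface, (1, 0, 2, 2), gray2_to_bitonal, display, (2, 1))
```

Calling `paint` again with a different `clip`, `lut` or `dest` paints the same display surface into another client viewport with its own parameters.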
[0106] User Interface Actions and Events
[0107] User interface actions are typically reported to the client as "events". A user interface event is a software abstraction that represents the corresponding user interface action. An event informs the client that the action has occurred. The client can respond to the event, or ignore it. User interface events typically provide (or provide access to) event-related information. This can include information about the event source, along with other details such as the pixel location associated with the event.
[0108] Along with user interface events, the client can process other types of events. For example, timer events can signal that a specified time interval has completed. Other software running on the client device, or communicating with the client device, can generate events. Events can be triggered by other events, or aggregated into semantically "higher-level" events. For example, a "mouse click" event is typically aggregated from two lower-level events: a mouse button press and mouse button release.
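The aggregation of lower-level events into a higher-level event can be sketched as below. The event names are hypothetical, and the sketch makes the simplifying assumption that any release following a press forms a click, regardless of intervening events.

```python
def aggregate_clicks(events):
    """Return the event sequence with a 'click' added after each press/release pair."""
    out = []
    pressed = False
    for name in events:
        out.append(name)
        if name == "button_press":
            pressed = True
        elif name == "button_release":
            if pressed:
                out.append("click")  # higher-level event built from the pair
            pressed = False
    return out


stream = aggregate_clicks(["move", "button_press", "button_release", "button_release"])
```

In this example only the first release produces a click, because the second release has no matching press.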
[0109] The client software will typically have one or more "event loops". An event loop is a set of software instructions that waits for events (or regularly tests for events), and then dispatches to "event handlers" for processing selected types of events. Events and event loops will be used as the framework for discussing the processing of user interface actions. However, any software mechanism that is capable of reporting user interface actions and responding to these actions can be used as an alternative to event-based processing.
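A minimal event loop with handler dispatch might look like the following sketch. It processes only the events already queued rather than blocking for new ones, and the class name, method names and event types are all hypothetical.

```python
from collections import deque


class EventLoop:
    """Queues events and dispatches them to handlers registered by event type."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}

    def on(self, event_type, handler):
        """Register a handler for one type of event."""
        self.handlers.setdefault(event_type, []).append(handler)

    def post(self, event_type, **info):
        """Report an event together with its event-related information."""
        self.queue.append((event_type, info))

    def run_pending(self):
        """Dispatch all queued events; events with no handler are ignored.

        Returns the number of handler calls made.
        """
        calls = 0
        while self.queue:
            event_type, info = self.queue.popleft()
            for handler in self.handlers.get(event_type, []):
                handler(info)
                calls += 1
        return calls


loop = EventLoop()
seen = []
loop.on("pen_down", lambda info: seen.append((info["x"], info["y"])))
loop.post("pen_down", x=5, y=6)
loop.post("timer")  # no handler registered, so this event is ignored
handled = loop.run_pending()
```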
[0110] There are two primary types of user interface events:
[0111] a) location events: events which define the location of a pointing device on a client display surface
[0112] b) selection events: events which define a selection action associated with a client display surface
[0113] In a location event, the pointing device is typically a mouse, pen, touch-pad or similar locating device. The location is typically an (X,Y) pixel location on the client display surface. This may be captured initially as an (X,Y) pixel location on the client viewport on the client's bit-map display device, which is then mapped to an (X,Y) pixel location on the associated client display surface. If the location on the client display surface is currently not being displayed within the client viewport, the client device may pan, scroll, tile or otherwise move the client viewport to include the selected location.
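The viewport-to-surface mapping and the panning step can be sketched as follows. The sketch assumes the surface is painted 1:1 into the viewport (no zoom or shrink) and that the viewport position is tracked as an origin on the display surface; the function names are hypothetical.

```python
def viewport_to_surface(vx, vy, origin):
    """Map a viewport pixel location to the display surface, given the viewport origin."""
    ox, oy = origin
    return (vx + ox, vy + oy)


def pan_to_include(origin, viewport_size, surface_size, location):
    """Return a viewport origin that makes `location` visible, moving as little as possible."""
    new_origin = []
    for o, v, s, p in zip(origin, viewport_size, surface_size, location):
        if p < o:
            o = p              # location is before the visible range
        elif p >= o + v:
            o = p - v + 1      # location is past the visible range
        o = max(0, min(o, max(0, s - v)))  # keep the viewport on the surface
        new_origin.append(o)
    return tuple(new_origin)


surface_location = viewport_to_surface(10, 20, (100, 200))
panned = pan_to_include((0, 0), (160, 160), (640, 480), (200, 50))
```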
[0114] The client device may also define other user interface actions that generate location events. For example, moving a scroll bar outside the client viewport might generate a location event on the client display surface. Another example might be a client timer event that automatically generates a location event.
[0115] In a selection event, a selection action is associated with the client display surface. While many selection actions also have an explicit or implicit (X,Y) pixel location on the client display surface, this is not required of all selection events. If there is an (X,Y) pixel location, this may also have been initially an (X,Y) location on the client viewport which is mapped to the client display surface. Selection events are typically generated by user interface actions where the user has made a choice to start, continue or end a selection action. Examples include mouse-button state changes (mouse-button up/mouse-button down, or combined mouse click), pen state changes (pen up/pen down, or combined pen "tap"), or key state changes (key up/key down, or combined key press).
[0116] Movements of a pointing device can be reported as selection events, if there is an appropriate selection modifier during the movement. For example, a mouse move with a simultaneous mouse-button press can be reported as a selection event. Similarly, a pen movement with the pen down (e.g. applying pressure to a pressure-sensitive surface) can be reported as a selection event. These selection events have an associated pointing device location. Each client implementation determines which selection modifiers are associated with selection events, and how to report the selection modifiers as data elements within an event data structure.
[0117] The client device may also define other user interface actions that generate selection events. For example, clicking within a certain sub-region of a separate client viewport might generate a selection event on a client display surface. Another example might be a client timer event that automatically generates a selection event.
[0118] Multi-Level Set of Bit-Map Pixel Representations
[0119] An input bit-map pixel representation is transformed through one or more pixel transform operation(s) into a multi-level set of at least two derived bit-map pixel representations. Each representation level represents a scaled (possibly 1:1) view of the input bit-map pixel representation. Methods for generating a multi-level set of bit-map pixel representations are further described in the co-pending patent application "Visual Content Browsing Using Rasterized Representations" (Attorney Docket No. ZFR-001), filed Nov. 29, 2000, incorporated herein by reference.
[0120] The representation levels are ordered by the relative resolution of the derived bit-map pixel representation in comparison to the equivalent region of the input bit-map. The ordering is from lowest relative pixel resolution to highest. Applying different scaling factors (including 1:1) during the pixel transformation operation(s) creates the different relative pixel resolution levels.
[0121] Each representation level provides a scaled (possibly 1:1) view of at least one common selected region of the input bit-map pixel representation. The common selected region can be the entire input bit-map pixel representation, or one or more sub-region(s) of the input bit-map. The scaling factor applied to the common selected region is the one used to order the levels by relative pixel resolution. In an illustrative embodiment, each level has a different scaling factor, and therefore a different relative pixel resolution.
[0122] Also in an illustrative embodiment, a scaling factor is consistently applied within a given level of a multi-level set. All views of the input bit-map within a given level, whether within or outside the common selected region, use the same scaling factor. This makes it easier for the user to perceive the intended proportions and overall layout of the input bit-map, as displayed within a given level.
[0123] In an illustrative embodiment, the view of the common selected region is at least 1/2 of each representation level in both the vertical and horizontal pixel dimensions. This degree of commonality allows the user to more easily maintain a mental image of the relationships between the different levels of the multi-level set. If the representation level is a partial representation (a pixel sub-region of an equivalent full representation), then this commonality requirement is instead applied to the equivalent full representation.
[0124] The multi-level set consists of at least two bit-map pixel representations derived from the input bit-map pixel representation. One of these derived representations can be the input bit-map, or a copy of the input bit-map.
[0125] The representation levels are:
[0126] 1) an overview representation: providing a reduced scaled view of the common selected region at a pixel resolution that provides at least an iconic view (at least 10×10 pixels) of the common selected region, but at no more than one-half the pixel resolution of the common selected region in at least one dimension (the overview representation is between 96×96 and 320×320 pixels in an illustrative embodiment),
[0127] 2) an optional intermediate representation: providing a scaled (possibly 1:1) view of the common selected region at a pixel resolution suitable for viewing and/or navigating the major viewable elements of the common selected region, and of a higher pixel resolution in at least one dimension from the view of the common selected region in the overview representation,
[0128] 3) a detail representation: providing a scaled (possibly 1:1) view of the common selected region at a pixel resolution that presents most of the viewable features and elements of the common selected region, at a higher resolution in at least one dimension from the overview representation and (if an intermediate representation is present) at a higher resolution in at least one dimension from the view of the common selected region in the intermediate representation (between 640×480 and 1620×1280 pixels in an illustrative embodiment)
[0129] While the intermediate representation is entirely optional, it is also possible within the present invention to have multiple levels of intermediate representation. Each of these optional levels presents a scaled (possibly 1:1) view of the common selected region at a pixel resolution that is higher in at least one dimension from the preceding intermediate representation.
[0130] If there are multiple intermediate representation levels, the lowest level of intermediate representation has a view of the common selected region at a higher pixel resolution (in at least one dimension) from the view of the common selected region in the overview representation. Also, the highest level of intermediate representation has a view of the common selected region at a lower pixel resolution (in at least one dimension) from the view of the common selected region in the detail representation.
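The ordering of levels and the bounds on the overview representation can be sketched as below. The sketch assumes the common selected region is the entire input bit-map and that each level has a single uniform scaling factor; it computes level dimensions only and does not perform the pixel transforms themselves. The function names are hypothetical.

```python
def level_sizes(input_size, scale_factors):
    """Return (scale, (width, height)) per level, ordered lowest resolution first."""
    w, h = input_size
    levels = [(s, (max(1, round(w * s)), max(1, round(h * s))))
              for s in scale_factors]
    return sorted(levels, key=lambda level: level[0])


def is_valid_overview(input_size, overview_size):
    """Check the overview bounds: at least iconic (10x10 pixels), and no more than
    one-half the input pixel resolution in at least one dimension."""
    (w, h), (ow, oh) = input_size, overview_size
    iconic = ow >= 10 and oh >= 10
    reduced = ow <= w / 2 or oh <= h / 2
    return iconic and reduced


levels = level_sizes((1024, 768), [1.0, 0.125, 0.5])
overview_size = levels[0][1]
```

For a 1024×768 input, the three factors give an overview of 128×96, an intermediate level of 512×384 and a 1:1 detail level, in that order.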
[0131] A derived representation can be based on a clipped version of the input bit-map pixel representation. Clipping can be used to remove:
[0132] a) unneeded region(s) of the input bit-map pixel representation (such as "white space"),
[0133] b) unwanted region(s) (such as advertising banners), and/or
[0134] c) region(s) that are considered less important (such as the lower or lower right portion of a Web page)
[0135] Different levels of the multi-level set can apply different clipping algorithms, provided that at least a portion of a common selected region is included in all representation levels. In an illustrative embodiment, a clipped region used for the overview representation is the same as, or a proper subset of, the corresponding region used for the detail representation. Also in an illustrative embodiment, a similar rule is applied between the overview representation and any optional intermediate representation(s), and between any optional intermediate representation(s) and the detail representation. This reduces the complexity of mapping (mentally or computationally) between representation levels. When a given level is a partial representation, this clipping rule is applied to the equivalent full representation.
[0136] The derived representations can differ in their pixel aspect ratios, tonal ranges, and/or pixel formats. For example, the overview representation might have a pixel aspect ratio matched to the client viewport while the detail representation has a pixel aspect ratio closer to the original input bit-map. In an illustrative embodiment, any and all pixel scaling operations applied at any given level use the same scaling factor.
[0137] FIG. 2 shows an example of an input bit-map pixel representation (2-1) for a Web page and a set of derived representations: a sample overview representation (2-2), a sample intermediate representation (2-3), and a sample detail representation (2-4). FIG. 3 is an example of a rendered spreadsheet, with an input bit-map pixel representation (3-1), a sample overview representation (3-2) and a sample detail representation (3-3).
[0138] FIG. 4 shows an example of displaying two levels of transformed representations on a client device. These are taken from a PalmPilot emulator that runs on a personal computer, which emulates how the representations would appear on an actual PalmPilot device. FIG. 4 shows a sample overview representation (4-1) and a clipped region of a sample detail representation (4-2), as displayed within an allocated client viewport.
[0139] If a representation does not fit within the client viewport of the client device's display, the client paints a sub-region of the associated client display surface through a clipping operation. In this case, the client display surface can be treated as a set of tiled images. The tiles are constructed such that each tile fits into the client viewport of the display device, and the client device switches between tiles or scrolls across adjacent tiles based on user input.
[0140] In an illustrative embodiment, the overview representation should be displayable in its entirety within an allocated client viewport of 140×140 pixels or greater (and thus is a single tile). Also in an illustrative embodiment, an optional lowest level intermediate representation should have no more than four tiles in each dimension within an allocated client viewport of 140×140 pixels or greater.
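Treating a client display surface as a set of tiles can be sketched as follows. The sketch assumes tiles the size of the client viewport, laid out from the top-left corner, with edge tiles clipped to the surface; the function names are hypothetical.

```python
import math


def tile_grid(surface_size, viewport_size):
    """Return the number of tiles (columns, rows) needed to cover the surface."""
    sw, sh = surface_size
    vw, vh = viewport_size
    return (math.ceil(sw / vw), math.ceil(sh / vh))


def tile_rect(surface_size, viewport_size, col, row):
    """Return (x, y, width, height) of one tile, clipped to the surface edge."""
    sw, sh = surface_size
    vw, vh = viewport_size
    x, y = col * vw, row * vh
    return (x, y, min(vw, sw - x), min(vh, sh - y))


grid = tile_grid((640, 480), (160, 160))          # a detail-sized surface
single = tile_grid((140, 100), (140, 140))        # an overview that fits in one tile
edge = tile_rect((600, 480), (160, 160), 3, 2)    # a partial tile at the right edge
```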
[0141] Multi-Modal Set of Bit-Map Pixel Representations
[0142] A source visual content element is rasterized into two or more bit-map representations through at least two different rasterizing modes. One rasterizing mode can differ from another through any or all of the following:
[0143] 1. differences in the parameter(s) to the rasterizing (or rendering) function,
[0144] 2. differences in rasterizing (or rendering) algorithms,
[0145] 3. insertion of one or more transcoding step(s) before the rasterizing (or rendering) function,
[0146] 4. differences in the parameter(s) used in a transcoding step, and/or
[0147] 5. differences in transcoding algorithm(s) used in a transcoding step.
[0148] For example, the expected or preferred horizontal dimension of the client viewport can be a parameter to a rasterizing function. One rasterizing mode can generate a display surface optimized for a display viewport with 1024 pixels in the horizontal dimension, while another rasterizing mode generates a display surface that is optimized for a display viewport with 160 pixels in the horizontal dimension. Another example is a parameter that controls the point size of a text component. The text component can be rasterized in one mode with 10 point Times Roman type, and in another mode with 12 point Arial type.
[0149] Different rasterizing (or rendering) algorithms can produce different bit-map pixel representations, often with different layouts. For example, one rendering mode can use a rasterizing algorithm that intermixes the layout of text and non-text components (such as images or tables), like a typical layout of a Web page on a PC. Another mode can use a rasterizing algorithm where each text component is visually separated in the layout from non-text components (such as images or tables).
[0150] Two different rendering algorithms can generate different representations of the same visual component. For example, one can be capable of generating a fully graphical representation of an HTML table while the other renders a simplified text-oriented representation of the same table. Some rendering algorithms are not capable of rasterizing certain types of visual components, and will either not include them in the rasterized representation or include some type of substitute place-holder representation. These algorithms produce a different rasterized representation from an algorithm that can fully render the same visual components.
[0151] Transcoding is a function that converts a visual content element from one source format to another, before a rasterizing (or rendering) function is performed. The transcoding function can include filtering or extractive steps, where certain types of encoded content are converted, transformed or removed from the derived source representation. Transcoding can also perform a complete translation from one source encoding format to another. Transcoding can be loss-less (all of the visually significant encoding and data are preserved) or lossy (some portions are not preserved).
[0152] For example, an HTML document can be rendered by an HTML rendering function in one rasterizing mode. This HTML source can also be transcoded to a WML (Wireless Markup Language) format and then rasterized by a WML rendering function in a second rasterizing mode. The two different representations can be associated as a multi-modal set, based on their relationship to the original HTML-encoded visual content element.
[0153] Transcoding can also be used to generate a different version of the source visual content element using the same encoding format as the original. For example, an HTML document can be transcoded into another HTML document, while changing, translating or removing certain encoded data. For example, references to unwanted or objectionable content can be removed, automatic language translation can be applied to text components, or layout directives can be removed or changed to other layout directives.
[0154] FIG. 17 illustrates an example of a multi-modal set of bit-map pixel representations. In this example, the source visual content element (17-1) is:
[0155] a) rasterized (17-2) to a multi-level set (17-3),
[0156] b) transcoded (17-4) to a derived source format (17-5) which is then rasterized (17-6) to a bit-map representation (17-7), and
[0157] c) rasterized (17-8) using a different rasterizing algorithm to produce an alternative bit-map representation (17-9).
[0158] Correspondence Maps for Multi-Level and Multi-Modal Sets
[0159] In a multi-level or multi-modal set, a correspondence map can be created to map between corresponding parts of the different representations. This correspondence map assists in providing functions that require mappings between representations, such as supporting a user interface that selects or switches between the different representations. For example, the correspondence map can allow the user to select a pixel region on one rendered representation and then view the corresponding region rendered from a different representation. A reverse mapping (from the second representation to the first) can also be generated.
[0160] There are four types of possible correspondence maps, based on the type of each representation being mapped. A representation can be a "source" or a "raster". A source representation encodes the visual content in a form suitable for eventual rasterizing (or rendering). An HTML document, or Microsoft Word document, is an example of a source representation. A transcoding operation takes a source representation as input and generates a transcoded source representation as output.
[0161] A "raster" representation is a bit-map pixel representation of rasterized (or rendered) visual content. A raster can be the bit-map pixel output of a rasterizing (or rendering) process, but it can also be any other bit-map pixel representation (such as an image or frame buffer).
[0162] The four types of correspondence maps are:
[0163] a) Source-to-source: This maps the correspondences from one source to another related source. These correspondences can be positional (corresponding relative positions within the two sources) and/or structural (corresponding structural elements within the two sources). Source-to-source maps are typically used to map between a transcoded visual content element and its original source.
[0164] b) Source-to-raster: This maps the correspondences from a source element to a rendered representation of that source. Each entry in the map provides a positional and/or structural reference to the source representation, along with a corresponding pixel region within the raster representation. A source-to-raster correspondence map can be generated as a by-product of a rendering function. Some rendering functions provide programmatic interfaces that provide source-to-raster or raster-to-source mappings.
[0165] c) Raster-to-source: This is the inverse of a source-to-raster mapping.
[0166] d) Raster-to-raster: This is a mapping between corresponding pixel regions within two related raster representations. If the corresponding pixel regions are related through one or more transform operations (such as scaling), then these transform operations can be referenced within the correspondence map.
[0167] A correspondence map allows correspondences to be made between related areas of different (but related) representations. Correspondence maps support functions such as switching or selecting between related representations, based on a "region of interest" selected within one representation. Correspondence maps are also used to process user input gestures, when a pixel location on one raster representation must be related to a different (but related) raster or source representation.
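For two raster representations related only by scaling, a raster-to-raster correspondence can be sketched as a scaling of pixel regions. This is an illustration under that assumption; the function name `map_region` is hypothetical, and a real map could also reference clipping or other transforms.

```python
def map_region(region, scale_x, scale_y):
    """Map a pixel region (x, y, width, height) to a related raster scaled by the given factors.

    The result is truncated to whole pixels and is at least one pixel in each dimension.
    """
    x, y, w, h = region
    return (int(x * scale_x), int(y * scale_y),
            max(1, int(w * scale_x)), max(1, int(h * scale_y)))


# A region of interest selected on a 160x120 overview, viewed on a 640x480 detail level.
detail_region = map_region((10, 20, 30, 15), 4, 4)
# The reverse mapping uses the inverse scaling factors.
overview_region = map_region(detail_region, 0.25, 0.25)
```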
[0168] Some source formats define a formal data representation of their contents, including layout directives encoded within the contents. Source-to-source, source-to-raster or raster-to-source correspondence maps can be statically or dynamically derived through appropriate software interfaces to such a data representation.
[0169] For example, the HTML specification defines a Document Object Model (DOM). Both Microsoft's Internet Explorer and Netscape's Navigator software products support their own variants of a DOM and provide software interfaces to the DOM. Internet Explorer also provides interfaces to directly map between a rendered (rasterized) representation of a visual content element and the DOM. These types of interfaces can be used instead of, or in addition to, techniques that map raster-to-source (or source-to-raster) correspondences through software interfaces that simulate user interface actions on a rasterized (or rendered) proxy display surface.
[0170] FIG. 18 illustrates examples of correspondence mapping. An entry in a raster-to-raster map is shown as 18-1, between an overview representation and a detail representation of a multi-level set. An entry in a raster-to-source map (18-2) maps the detail representation to the corresponding segment of the source visual content element. This, in turn, is mapped by an entry in a source-to-raster map (18-3) to a text-related rendering of the visual content element.
[0171] It is possible to "chain" related correspondence maps. For example, consider a source visual content element that is rendered first to one raster representation and then transcoded to a second source representation. When the transcoded source representation is rendered, the rendering process can generate its own correspondence map. In this example, chaining can be used to determine correspondences (if any) between the first raster representation and the second (transcoded) raster representation. The second raster-to-source map can be chained to the transcoded source-to-source map, which in turn can be chained to the first source-to-raster map.
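The chaining described above can be sketched as a sequence of lookups, one per map. This is an illustration only: each correspondence map is assumed to be a simple dictionary, and the region and paragraph identifiers are hypothetical.

```python
from typing import Dict, Hashable, Optional, Sequence

CorrespondenceMap = Dict[Hashable, Hashable]


def chain_lookup(maps: Sequence[CorrespondenceMap],
                 key: Hashable) -> Optional[Hashable]:
    """Follow a key through each map in order; None if any link is missing."""
    current = key
    for mapping in maps:
        if current not in mapping:
            return None  # no correspondence at this stage of the chain
        current = mapping[current]
    return current


# Second (transcoded) raster -> transcoded source
second_raster_to_source = {"region_B1": "t_para_1", "region_B2": "t_para_2"}
# Transcoded source -> original source
transcoded_to_original = {"t_para_1": "para_1", "t_para_2": "para_2"}
# Original source -> first raster (para_2 was never rendered here)
first_source_to_raster = {"para_1": "region_A7"}

chain = [second_raster_to_source, transcoded_to_original, first_source_to_raster]
found = chain_lookup(chain, "region_B1")
missing = chain_lookup(chain, "region_B2")
```

A chain yields a result only when every intermediate map has an entry, which is why the text above qualifies the correspondences with "if any".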
[0172] Correspondence maps have an implicit "resolution", related to the density of available mapping data. At a high "resolution", there are a relatively high number of available mappings. A low "resolution" correspondence map has relatively fewer available mappings. The "resolution" determines the accuracy of the mapping process between a given place within one representation and the corresponding place within a different representation.
[0173] The density of the mappings can vary across different parts of the different representations, which results in variable "resolution" of correspondence mappings. The client (or server) can interpolate between entries in the correspondence map, in order to improve the perceived "resolution" of the mapping process. A technique such as location sampling (as described in the section "Server-Side Location Sampling") can be used to initially populate or increase the density of a correspondence map.
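One way interpolation between map entries could work is shown in the sketch below. It assumes, purely for illustration, a one-dimensional map of vertical positions (overview y to detail y) with distinct keys sorted in ascending order; the function name interpolate is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def interpolate(entries: List[Tuple[float, float]],
                y: float) -> Optional[float]:
    """Linearly interpolate a detail position for an overview position.

    `entries` holds (overview_y, detail_y) pairs sorted by overview_y,
    with no duplicate overview_y values. Positions outside the mapped
    range return None rather than being extrapolated.
    """
    if not entries:
        return None
    keys = [k for k, _ in entries]
    if y < keys[0] or y > keys[-1]:
        return None
    i = bisect_right(keys, y)
    if i == len(entries):
        return entries[-1][1]  # y equals the last key exactly
    (k0, v0), (k1, v1) = entries[i - 1], entries[i]
    t = (y - k0) / (k1 - k0)
    return v0 + t * (v1 - v0)


sparse_map = [(0.0, 0.0), (100.0, 400.0), (200.0, 1000.0)]
mid_first = interpolate(sparse_map, 50.0)
mid_second = interpolate(sparse_map, 150.0)
```

Adding entries to sparse_map (for example through location sampling) narrows the gaps that must be interpolated.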
[0174] There can be some areas of a given representation with no direct correspondence to a different representation. This occurs, for example, when an intermediate transcoding removes some of the visual content data from the transcoded representation. These areas of no direct correspondence can be either handled through an interpolation function, or treated explicitly as areas with no correspondence.
[0175] In a client/server configuration of the present invention, correspondence map(s) can be transmitted from the server to the client as required. This allows the client to directly handle mapping functions, such as user requests that select or switch between representations. The correspondence map(s) can include reverse mappings, if appropriate, and can be encoded for efficient transmittal to the client.
[0176] To improve perceived user responsiveness, a correspondence map can be separated into multiple segments, based on sections of the mapped content and/or multiple "resolution" levels. When segmenting into multiple "resolution" levels, a lower "resolution" map is created and is then augmented by segments that provide additional "resolution" levels. Segmenting can be done such that a smaller map is first generated and/or transmitted to the client. Subsequent segments of the map can be generated and/or transmitted later, or not at all, based on the relative priority of each segment using factors such as current or historical usage patterns, client requests and/or user preferences.
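A minimal sketch of segmenting a map into "resolution" levels follows. The power-of-two striding scheme and the function name segment_by_resolution are assumptions made for illustration; the specification does not prescribe a particular segmentation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def segment_by_resolution(entries: Sequence[T], levels: int) -> List[List[T]]:
    """Split ordered map entries into `levels` segments.

    Segment 0 is the coarsest map (every 2**(levels-1)-th entry). Each
    later segment holds the entries that refine the previous levels.
    Every entry lands in exactly one segment.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    segments: List[List[T]] = [[] for _ in range(levels)]
    for index, entry in enumerate(entries):
        for level in range(levels):
            stride = 2 ** (levels - 1 - level)
            if index % stride == 0:
                segments[level].append(entry)
                break
    return segments


segments = segment_by_resolution(list(range(8)), 3)
```

Segment 0 could be transmitted first, with the remaining segments sent later (or not at all) according to their relative priority.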
[0177] Multi-Modal Combination of Rasterizing and Text-Related Transcoding
[0178] In an illustrative embodiment of the present invention, rasterizing of a visual content element is combined with a transcoding step, in order to provide an alternative representation of the text-related content within a visual content element. This combination creates a multi-modal set, where a text-related representation is used either instead of, or in addition to, the initial rasterized representation.
[0179] Since text is often an important part of a visual content element, this combination allows text-related aspects to be viewed, navigated and manipulated separately through a client viewport and/or user interface optimized for text. The multi-modal combination of rasterizing and transcoding preserves, and takes advantage of, the correspondences between the text and the overall design and layout of the content (including the relationships between the text and non-text aspects of the visual content).
[0180] FIG. 19 shows an example of combining rasterizing and text-related transcoding. A rasterized overview representation of a Web page is shown in 19-1. A rasterized detail representation of the same Web page is shown in 19-2. Note that the detail representation is presented within a client viewport, and the user can pan or scroll within the viewport to see the entire detail representation. A text-related version of the same Web page is shown in 19-3, this time with word-wrapping and a scroll bar for scrolling through the text.
[0181] When combining rasterizing and text-related transcoding, an intermediate transcoding step can extract the text-related aspects of the visual content and store these in a transcoded representation. The transcoded text-related content can then be rasterized (or rendered). If a server performs the transcoding function and a client performs the rasterizing (or rendering) of the transcoded content, then the transcoded content can be transmitted to the client for eventual rasterizing (or rendering) by the client.
[0182] The text-related aspects of the visual content can include the relevant text and certain attributes related to the text. Text-related attributes can include appearance attributes (such as bold, italic and/or text sizing), structural attributes (such as "new paragraph" or "heading" indicators), and/or associated hyper-links (such as HTML "anchor" tags). Text-related formatting, such as lists and tables (e.g. HTML tables) can also be included in the text-related transcoding. The transcoded text-related content can be represented in any suitable format including text strings, Microsoft Rich Text Format (RTF), HTML, Compact HTML, XHTML Basic, or Wireless Markup Language (WML).
[0183] The text-related transcoding can be done as part of a more general transcoding function that supports additional structural attributes beyond those that are text-related. In other cases, an alternate version of the visual content element may already be available that is more suitable for text-related rendering and can be used instead of transcoding. The text-related rendering can be restricted to rendering only text-related attributes, or it can support additional structural attributes. These can include forms (e.g. HTML forms) or other specifications for visual controls that will be rendered into the text-related rendering.
[0184] In this illustrative embodiment, the server-side or client-side rasterizing function generates one or more bit-map pixel representation(s) of the visual content and its associated layout. This is combined with rendering that is limited to text-related aspects of the visual content. If multiple rasterized representations are generated from the results of the initial rasterizing function, this can be a multi-level set of bit-map pixel representations.
[0185] By rendering the text separately, the text rendering function can optimize the readability and usability of the visual content's text-related aspects. This includes providing appropriate word-wrapping functions tailored to the client viewport being used to view the rendered text representation. Text rendering can also support user control over text fonts and/or font sizes, including customization to the user's preferences.
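Word-wrapping tailored to a client viewport can be sketched with Python's standard textwrap module. The sketch assumes a fixed-width font, so that the viewport width in pixels converts directly to a column count; the function name and parameters are hypothetical.

```python
import textwrap
from typing import List


def wrap_for_viewport(text: str, viewport_width_px: int,
                      char_width_px: int) -> List[str]:
    """Wrap text to the number of fixed-width characters that fit."""
    columns = max(1, viewport_width_px // char_width_px)
    return textwrap.wrap(text, width=columns)


lines = wrap_for_viewport("The quick brown fox jumps", 60, 6)
```

A change in the user's font size would change char_width_px, and the same text would simply be wrapped again for the new column count.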
[0186] During the transcoding process, one or more correspondence map(s) can be generated to map between the initial rasterized representation(s) and the text-related transcoding of the visual content (raster-to-source and/or source-to-raster maps). A correspondence map assists in providing a user interface that selects or switches between the text representation and the rasterized representation(s). A correspondence map can also allow the user to select a pixel region on a rasterized representation and then view the associated text (as rendered from the text-related transcoding). Reverse mapping, from the rendered text to an associated pixel region within a rasterized representation, is also possible.
[0187] If a server performs the transcoding function and a client performs the rendering of the transcoded content, the relevant correspondence map(s) from the initial rasterized representation(s) to the text-related representation can be transmitted from the server to the client. This allows the client to directly handle user requests that switch between representations. If a reverse-mapping (from text-based transcoding to rasterized version) is supported, this can also be transmitted to the client. There can also be a mapping generated between the text-based transcoding and its rendered bit-map pixel representation, as part of the rasterizing (or rendering) function applied to the transcoded source representation.
[0188] For example, text-related transcoding on a server can include information that a region of text has an associated hyper-link, but the server can retain the data that identifies the "target" of the hyper-link (such as the associated URL) while sending the client a more compact identifier for the "target" information. This reduces the amount of data transmitted to the client and simplifies the client's required capabilities. In this example, the client sends hyper-link requests to the server with the server-supplied identifier, so that the server can access the associated data and perform the hyper-linking function.
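The server-side handling of hyper-link "targets" in this example might be sketched as a small lookup table. The class name LinkTable, the use of sequential integers as identifiers, and the example URLs are all assumptions made for illustration.

```python
from typing import Dict, List


class LinkTable:
    """Server-side table mapping compact identifiers to link targets."""

    def __init__(self) -> None:
        self._targets: List[str] = []
        self._ids: Dict[str, int] = {}

    def register(self, url: str) -> int:
        """Return a compact identifier for a target, reusing known ones."""
        if url not in self._ids:
            self._ids[url] = len(self._targets)
            self._targets.append(url)
        return self._ids[url]

    def resolve(self, link_id: int) -> str:
        """Look up the full target when the client sends an identifier."""
        return self._targets[link_id]


table = LinkTable()
first_id = table.register("http://example.com/a")
second_id = table.register("http://example.com/b")
repeat_id = table.register("http://example.com/a")
resolved = table.resolve(second_id)
```

Only the integer identifiers would travel to the client; the full URLs stay on the server.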
[0189] If at least one of the initial rasterized representation(s) is at a lower relative pixel resolution (such as an overview representation), then multi-level browsing can be provided between this rasterized representation and the rendered text-related representation. The text-related representation can be used instead of, or in addition to, an initially rasterized representation at a higher relative pixel resolution (such as a detail representation).
[0190] In an illustrative embodiment, at least one initially rasterized representation is used as the overview representation. This overview representation acts as an active navigational map over the text-related representation, in addition to acting as a map over any other rasterized representations at higher relative pixel resolutions. A pixel region selection within the overview representation can be used to select or switch to a corresponding part of the rendered text-related representation. The appropriate correspondence maps can also be used to select or switch between the rendered text-related representation and a corresponding pixel region of a rasterized representation (such as a detail representation).
[0191] Multi-Modal Combination of Rasterizing with a Text-Related Summary Extraction
[0192] When an overview representation is displayed in a client viewport, this display can be supplemented with additional information taken from a text-related summary extraction of the associated visual content element. The summary extraction is a transcoding function that extracts text-related data providing summary information about the visual content element. In one embodiment, this includes any titles; "header" text elements; and text-related representations of hyperlinks. A correspondence map can be generated between the summary information and the overview representation.
[0193] In response to a user request for summary information at a specified pixel location, the corresponding summary text can be rendered and displayed in the client viewport. As a result, the extracted summary text is "revealed" to the user while selecting or moving across the overview representations based on correspondence map data. The "revealed" text can be rendered and displayed in a pop-up window over the client viewport, or in a designated location within the client viewport. The client can provide a mechanism to select and process a "revealed" hyperlink. The client can then switch the client viewport to display a rasterized representation of the hyperlink's "target" visual content element.
[0194] The summary representation is typically much smaller than either a text-related transcoding of the entire visual content element or a detail level rasterization of the visual content element. This is well suited for implementations where a server generates the summary representation and transmits this to the client. In this case, the client can request the server to send the entire associated correspondence map, or make individual requests for correspondence data as required. If the server performs the summary extraction, it can encode hyperlink "targets" as more compact identifiers known to the server, to further reduce the size of the summary representation transmitted to the client.
[0195] Partial Representations
[0196] In both a multi-level and a multi-modal set, a representation can be a partial representation. A partial representation is the result of a selection operation. The selection can be applied either in source form to the source visual content element, or in raster form to a rasterized representation. A selection in source form can be applied during a transcoding function or within the rasterizing (or rendering) function. A selection in raster form can be applied after the rasterizing (or rendering) function.
[0197] The selection function, and its results, can be reflected in the appropriate correspondence map(s). The correspondence map can have entries for the selected portion of the source or raster, but no entries for those portions of the associated source or raster excluded from the selection.
[0198] When only a partial representation is available for a given mode or given level of a multi-level set, then the remaining portions outside the selection are null. These null areas can either be not displayed, or displayed with a special "null representation" (such as white, gray or some special pattern). When multiple partial representations are available for the same mode, or for the same level of a multi-level set, they can be combined into a composite representation (in either raster or source form, as appropriate).
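Combining partial representations in raster form can be sketched as follows. The sketch assumes, for illustration only, that each partial representation is a grid of the same size in which None marks a null pixel, and that earlier partials take precedence where selections overlap.

```python
from typing import List, Optional

Raster = List[List[Optional[int]]]


def composite(partials: List[Raster]) -> Raster:
    """Merge same-sized partial rasters; earlier partials take precedence."""
    height = len(partials[0])
    width = len(partials[0][0])
    result: Raster = [[None] * width for _ in range(height)]
    for partial in partials:
        for y in range(height):
            for x in range(width):
                if result[y][x] is None:
                    result[y][x] = partial[y][x]
    return result


def fill_null(raster: Raster, null_value: int) -> List[List[int]]:
    """Replace remaining null pixels with a 'null representation' value."""
    return [[null_value if p is None else p for p in row] for row in raster]


part_a: Raster = [[1, None], [None, None]]
part_b: Raster = [[None, 2], [None, None]]
merged = composite([part_a, part_b])
displayed = fill_null(merged, 128)  # 128 stands in for a gray pixel
```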
[0199] Partial representations, and composite partial representations, can save processing, communications and/or storage resources. They represent the portion of the visual content element or input bit-map representation of interest to the user, without having to generate, transmit and/or store those portions not needed.
[0200] By providing a user interface to these partial and composite partial representations, the present invention makes these advantages available within the context of a consistent set of user interface gestures. These gestures provide easy and consistent user access to full representations, partial representations and composite partial representations within a multi-level or multi-modal set. They also provide new means to specify, generate and/or retrieve partial or composite partial representations based on gestures applied to related full, partial or composite partial representations within a multi-level or multi-modal set.
[0201] Partial and composite partial representations provide significant advantages in configurations where the client has limited processing, power and/or storage resources. This is the case for most handheld devices such as Personal Digital Assistants (PDAs, like the PalmPilot or PocketPC) or cellular telephones with bit-map displays. Partial representations also provide advantages when a representation is being sent from a client to a server over a communications link with limited bandwidth, such as a serial communications port or the current cellular telephone network.
[0202] Pointing Devices
[0203] The gestures require that the client device support at least one pointing device, for specifying one or more pixel location(s) on the client's bit-map display device. Commonly used pointing devices include:
[0204] a) a mouse,
[0205] b) a "pen" or stylus (typically used with an input tablet or pressure-sensitive display screen),
[0206] c) a pressure-sensitive surface (such as a touch-pad or pressure-sensitive display screen) which may or may not use a pen or stylus,
[0207] d) a joystick,
[0208] e) the "arrow" keys on a keyboard.
[0209] There are numerous types and variations of these devices, and any that supplies pointing functionality can be used.
[0210] Voice-activated, breath-activated, haptic (touch-feedback), eye-tracking, motion-tracking or similar devices can all provide pointing functionality. These alternative input modalities have particular significance to making the present invention accessible to persons with physical handicaps. They can also be used in specialized applications that take advantage of the present invention.
[0211] Some gestures combine a selection action with a location specification. The selection action can be provided by:
[0212] a) a button press on a mouse device,
[0213] b) a press of a pen or stylus on an appropriate surface,
[0214] c) a press on a touch-sensitive surface,
[0215] d) a keyboard button press,
[0216] e) a physical button press on the client device (or device that communicates with the client device), or
[0217] f) any other hardware and/or software that can provide or simulate a selection action.
[0218] Keyboard/Mouse and Pen-Based Interface Styles
[0219] Illustrative embodiments of the present invention can support gestures for two user interface styles: "keyboard/mouse" and "pen-based". For purposes of describing an illustrative embodiment, the following distinctions are made between the "keyboard/mouse" and "pen-based" user interface styles:
[0220] a) in the "keyboard/mouse" user interface, the pointing device has one or more integrated button(s), and the state of each button can be associated with the current location of the pointing device,
[0221] b) in the "pen-based" user interface, the pointing device can report both its location and an associated state that differentiates between at least two modes (pen-up and pen-down),
[0222] c) in the "keyboard/mouse" user interface, alphanumeric input can be entered through a keyboard or keypad,
[0223] d) in the "pen-based" user interface, alphanumeric input can be entered through gestures interpreted by a handwriting recognition function (such as the Graffiti system on a PalmPilot).
[0224] In a pen-based device with a pressure-sensitive surface, the pen modes are typically related to the level of pen pressure on the surface. Pen-down means that the pressure is above a certain threshold; pen-up means that the pressure is below the threshold (or zero, no pressure). Some pen-based devices can differentiate between no pressure, lighter pressure and heavier pressure. In this case, a lighter pressure can correspond to location mode, while a heavier pressure can correspond to selection mode. Some pen-based devices can differentiate between three or more levels of pressure, and the client can determine which level(s) correspond to location and selection modes.
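The relationship between pen pressure and pen mode on a device that distinguishes no pressure, lighter pressure and heavier pressure can be sketched as below. The pressure scale (0.0 to 1.0) and both threshold values are hypothetical; an actual client would choose levels suited to its hardware.

```python
from enum import Enum


class PenMode(Enum):
    UP = "pen-up"
    LOCATION = "location"
    SELECTION = "selection"


def classify_pressure(pressure: float,
                      light_threshold: float = 0.1,
                      heavy_threshold: float = 0.6) -> PenMode:
    """Map a normalized pressure reading to a pen mode.

    Thresholds are illustrative assumptions, not values from the
    specification.
    """
    if pressure < light_threshold:
        return PenMode.UP
    if pressure < heavy_threshold:
        return PenMode.LOCATION
    return PenMode.SELECTION


modes = [classify_pressure(p) for p in (0.0, 0.3, 0.9)]
```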
[0225] It is possible to emulate a mouse with a pen, or a pen with a mouse. It is also possible to emulate either a pen or mouse with any other pointing device. For example, a finger pressing on a touch-sensitive screen can emulate most pen functions. A keyboard can be emulated by displaying a keypad on the display screen, with the user selecting the appropriate key(s) using a pointing device.
[0226] Therefore, the distinctions between "keyboard/mouse" and "pen-based" are not about the physical input devices but instead about the user interface style(s) implemented by client software. The client software can blend these styles as appropriate, or support a subset of features from either style. The style distinctions are simply a way to clarify different gestures and their meanings within an illustrative embodiment.
[0227] Personal computers (PCs), intelligent terminals (with bit-map displays), and similar devices typically support a keyboard/mouse interface style. The mouse is the primary pointing device, with one or more selection button(s), while the keyboard provides alphanumeric input. The keyboard can also provide specialized function keys (such as a set of arrow keys), which allow the keyboard to be used as an alternate pointing device.
[0228] In a pen-based user interface, the primary pointing device is a pen (or stylus) used in conjunction with a location-sensitive (typically pressure-sensitive) surface. The surface can be a separate tablet, or a pressure-sensitive display screen. Handheld devices, such as a personal digital assistant (PDA) like the PalmPilot, typically support a pen-based user interface style. Cellular telephones with bit-map displays can combine a pen-based user interface style with a telephone keypad.
[0229] A pen-based user interface can support alphanumeric data entry through any or all of the following:
[0230] a) an alphanumeric keyboard or keypad,
[0231] b) handwriting recognition of pen gestures (e.g. the Graffiti system on a PalmPilot), and/or
[0232] c) displaying a keypad on the display screen and allowing the user to select the appropriate key(s).
[0233] A single client device can support various combinations of keyboard/mouse and pen-based user interface styles. If a client device supports multiple simultaneous pointing devices (physical or virtual), it can provide a means to determine which is the relevant pointing device at any given time for interpreting certain gestures of the present invention.
[0234] Interpreting Events as Gestures
[0235] User interface actions are typically reported to the client as user interface events. Location events specify one or more location(s) on a client viewport. Pointing devices can generate location events. Selection events specify a selection action, and may also provide one or more associated location(s). When a pointing device generates a selection event, it typically also provides location information.
[0236] As the client processes these events, it interprets some subset of these events as gestures. A gesture is interpreted from a sequence of one or more events. The gesture is determined by the ordering of these events, the information associated with each event (such as location information) and the relative timing between events.
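A minimal sketch of interpreting a sequence of location events as a gesture is given below. It distinguishes a "move" from a "hover" using the event locations and the relative timing between events. The event type, the function name and the hover duration and radius are hypothetical values chosen for illustration, not definitions from this specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LocationEvent:
    x: int
    y: int
    t: float  # time of the event, in seconds


def interpret(events: List[LocationEvent],
              hover_seconds: float = 0.5,
              hover_radius: int = 3) -> str:
    """Classify an ordered event sequence as 'move', 'hover' or 'none'."""
    if not events:
        return "none"
    first, last = events[0], events[-1]
    stayed_close = all(
        abs(e.x - first.x) <= hover_radius
        and abs(e.y - first.y) <= hover_radius
        for e in events
    )
    if not stayed_close:
        return "move"  # the pointer left the starting neighborhood
    if last.t - first.t >= hover_seconds:
        return "hover"  # the pointer stayed in place long enough
    return "none"  # too little information to call it a gesture yet


hover = interpret([LocationEvent(10, 10, 0.0), LocationEvent(11, 10, 0.6)])
move = interpret([LocationEvent(10, 10, 0.0), LocationEvent(50, 10, 0.1)])
undecided = interpret([LocationEvent(10, 10, 0.0)])
```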
[0237] Gesture-Based User Interface
[0238] The user interface gestures allow the user to control various aspects of navigating and/or browsing through the multi-level and/or multi-modal sets of bit-maps. This includes gestures to control the process of:
[0239] a) panning across one or more bit-map(s) in the multi-level or multi-modal set,
[0240] b) scrolling across one or more bit-map(s) in the multi-level or multi-modal set,
[0241] c) moving to a location on one or more bit-map(s) in the multi-level or multi-modal set,
[0242] d) selecting a location on one or more bit-map(s) in the multi-level or multi-modal set,
[0243] e) selecting or switching from one representation level to another within the multi-level or multi-modal set of bit-maps, and/or
[0244] f) changing the input mode associated with one or more bit-map(s) in the multi-level or multi-modal set.
[0245] The client device can maintain the multi-level or multi-modal set as one or more client display surface(s). In an illustrative embodiment, each level and each mode is maintained as a separate client display surface. The client can allocate one or more client viewport(s) for displaying the contents of the client display surface(s). If a client display surface is directly allocated within the display memory of the client's bit-map display device, then this client display surface and its associated viewport share the same underlying data structure(s).
[0246] Based on user input at the client device, the client device paints one or more client display surface(s) into its client viewport(s), and thus displays one or more of the bit-map representation(s) on its display screen. In an illustrative embodiment, the client device can display pixels from one or more representation levels or modes at any given time, by displaying selected portions of multiple display surfaces (one per representation level) in multiple client viewports (one viewport per display surface).
[0247] In an illustrative embodiment, two or more client viewports can be displayed simultaneously on the client's bit-map display device, or a user interface provided to switch between client viewports. The decision to display multiple viewports simultaneously is based on client device capabilities, the number of pixels available in the client bit-map display device for the client viewport(s), software settings and user preferences.
[0248] In an illustrative embodiment, when the overview representation of a multi-level set is being displayed, the client displays as much of this representation as possible within a client viewport that is as large as possible (but no larger than required to display the entire overview representation). This gives the overview representation precedence over display of any sub-region(s) of different representation level(s) or representation mode(s). This is to maintain the advantages of viewing and working with as much of the overall layout as possible at the overview level.
[0249] In an illustrative embodiment, the client device can divide a representation into multiple tiles, where the tile size is related to the size of a client viewport. The client device can provide a user interface to select or switch between tiles, pan across adjacent tiles, and/or scroll across adjacent tiles.
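Dividing a representation into tiles whose size is related to the client viewport can be sketched as follows. The sketch assumes, for illustration, that the tile size equals the viewport size and that tiles are returned in row-major order as (left, top, width, height) tuples; the function name is hypothetical.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (left, top, width, height)


def tile_regions(rep_width: int, rep_height: int,
                 tile_width: int, tile_height: int) -> List[Tile]:
    """Cover a representation with tiles; edge tiles are clipped to fit."""
    tiles: List[Tile] = []
    for top in range(0, rep_height, tile_height):
        for left in range(0, rep_width, tile_width):
            tiles.append((left, top,
                          min(tile_width, rep_width - left),
                          min(tile_height, rep_height - top)))
    return tiles


# A 300x200 representation shown through a 160x160 viewport.
tiles = tile_regions(300, 200, 160, 160)
```

Adjacent entries in the returned list share an edge, so panning or scrolling across adjacent tiles amounts to moving between neighboring list positions.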
[0250] Unified Set of Gestures
[0251] The present invention provides a unified set of gestures that support navigation through and/or interaction with the multi-level or multi-modal set of bit-maps. Within the unified set of gestures, there are three general classes of gestures: location gestures, selection gestures and input-mode gestures. Location and selection gestures are described in the sections "Location Gestures" and "Selection Gestures", while input-mode gestures are described below in the section "Special Input Modes and Input-Mode Gestures".
[0252] These gestures can be implemented in different ways on different clients. Some clients will implement only a subset of the gestures, or assign different meanings to certain gestures. An implementation in accordance with the present invention can:
[0253] a) support at least one "swipe" or "drag" gesture (as defined below in the "Selection Gestures" section), and
[0254] b) interpret this swipe or drag gesture as a switch or selection from one level of a multi-level set of bit-maps to another level within the same multi-level set, or as a switch or selection from one modal representation to another within the same multi-modal set.
[0255] Advantages of the Unified Set of Gestures
[0256] The unified set of gestures provides new ways to navigate through and/or interact with a multi-level or multi-modal set of bit-map pixel representations. Compared to indirect actions such as scroll bars, menu selections and pop-up "zoom" dialog boxes, the unified gestures provide direct actions that allow the user to keep the pointing device in the same part of the screen where the bit-map is being displayed. Back-and-forth movements to various auxiliary menus, visual controls or tools are minimized or eliminated. The unified gestures greatly reduce the amount that the pointing device (e.g. mouse or pen) has to be moved, and hence greatly improve ease of use.
[0257] The user is saved the tedium (and repetitive stress) of back-and-forth movements to scroll bars painted around the perimeter of the client viewport, scrolling across the bit-map to find the region of interest. Instead, the user has direct access through swipe gestures to higher resolution or different modal versions of the region of interest. The user also has direct access to overview (or intermediate) versions that show the overall layout of the input bit-map, without having to assemble a mental image by scrolling through a single representation.
[0258] The unified set of gestures is particularly advantageous when using a hand-held device such as a personal digital assistant (PDA) like a PalmPilot or cellular telephone with a bit-map display. In these devices, the bit-map display area is relatively small compared to a standard personal computer (PC), and a pen-based user interface style is typically preferred over a mouse/keyboard user interface style. The unified set of gestures provides a better means to control the interaction with and/or navigation of any input bit-map that has a resolution greater than the bit-map display resolution, and does this in a way that maximizes the utility of a pen-based user interface style. Certain control gestures typically used within a mouse/keyboard user interface style (particularly those that assume two-handed operation) are not available with a pen-based handheld device, but can be provided with the unified set of gestures.
[0259] These advantages can be grouped into two major categories. The first category consists of advantages from working with a multi-level or multi-modal set of bit-map pixel representations versus working with only a single bit-map pixel representation. The unified set of gestures makes working with a multi-level or multi-modal set simple and practical. The second major category consists of those advantages over previous methods of working with multi-level bit-map pixel representations. The unified set of gestures makes it more efficient and easier to work with multi-level bit-maps.
[0260] The advantages versus using a single bit-map pixel representation are as follows:
[0261] First, the overview representation is small enough to rapidly download (if supplied by a server), rapidly process on the client device and rapidly display on the client device's bit-map display. This increases perceived user responsiveness. If the user decides, based on viewing the overview representation, that the intermediate and/or detail representation(s) are not needed, then some or all of the processing and display time for these other representation level(s) can be avoided. This further increases perceived user responsiveness, while reducing client processing and client power requirements.
[0262] Second, the overview representation is typically small enough to fit entirely within the allocated client viewport on most client devices. This provides the user with a single view of the overall layout of the input bit-map pixel representation. Even if the overview representation cannot fit entirely into the client viewport, it is small enough so that the user can rapidly gain a mental image of the overall layout by scrolling, panning and/or tiling of the overview representation.
[0263] Third, the overview representation provides a convenient means of navigating through the input bit-map pixel representation. The user can select those areas to be viewed at a higher resolution (an intermediate representation and/or detail representation), or to be viewed in a different modal representation (such as a text-related rendering with scrolling and word-wrap optimized for the current client viewport). This saves the user considerable time in panning, scrolling and/or tiling through a single full-resolution rendered representation. This also allows the user to choose the most appropriate modal representation of the detail, by selecting a "region of interest" from the overview or intermediate level, and move back and forth quickly and easily between both levels and modes.
[0264] Fourth, the user can optionally make selections or perform other user actions directly on the overview representation. This can be an additional convenience for the user, particularly on client devices with a relatively low-resolution bit-map display (such as a PDA device or cellular telephone with a bit-map display). If the intermediate and/or detail representation(s) have not been fully processed, perceived user responsiveness is improved by allowing user actions on the overview representation overlapped with processing the other representation(s).
[0265] Fifth, the optional intermediate representation(s) provide many of the advantages of the overview representation while providing increased level(s) of detail.
[0266] Sixth, the detail representation provides sufficient detail to view and use most (if not all) aspects of the input bit-map pixel representation. A system implemented in accordance with the present invention lets the user easily switch back and forth among the representation levels, allowing the user to take advantage of working at all available levels. The user is not constrained to work at a single level of detail, but can move relatively seamlessly across levels, while the system maintains the coherency of visual representation and user actions at the different levels.
[0267] Seventh, a multi-modal set of representations allows the user to select and view a rasterized representation of a source visual content element using whatever mode is the most convenient, most efficient, and/or most useful. The present invention provides a set of direct gestures that access the underlying correspondences being maintained between the different modal representations. By combining multi-modal with multi-level, selecting a "region of interest" from an overview in one mode and then viewing the corresponding detail within another mode can be accomplished through a single swipe or "overview drag" gesture.
[0268] The advantages over previous methods of working with multi-level bit-map pixel representations are as follows:
[0269] First, unified gestures that combine location specification with selection properties reduce the number of required gestures. This saves time and can considerably reduce user fatigue (including reduction of actions that can lead to repetitive stress injuries). For example, a "swipe" that moves up or down a level while simultaneously defining a "region of interest" can be compared favorably to any or all of the following methods:
[0270] a) moving the location from the client viewport to a menu, selecting a menu item to specify scaling, and then scrolling the scaled viewport to the desired region of interest,
[0271] b) moving the location from the client viewport to a menu, selecting a menu item that generates a pop-up dialog box to control scaling, moving the location to the dialog box, selecting one or more scaling control(s) in the pop-up dialog box, and then scrolling the scaled viewport to the desired region of interest,
[0272] c) moving the location from the client viewport to a user interface control (or widget) outside the client viewport that controls scaling, selecting the appropriate control (or widget), possibly dragging the control (or widget) to make the appropriate level selection, and then scrolling the scaled viewport to the desired region of interest, and
[0273] d) moving the location from the client viewport to an external tool palette that defines a "zoom" tool, selecting the "zoom" tool, moving the location back to the client viewport, dragging the zoom tool across the region of interest, and moving the location back to the tool palette to de-select the "zoom" tool.
[0274] Second, unified gestures provide a uniform method of moving up and down the levels of a multi-level set of bit-map pixel representations. In conventional icon/document pairs, there are only two levels: an icon that is a reduced scale version of a bit-map pixel representation, and a full-scale version of the bit-map pixel representation. One set of user interface gestures selects the full-scale version from the icon, while a completely different set of gestures creates an icon from the full-scale version. There are typically no intermediate levels, or gestures for selecting or switching to an intermediate level. There are typically no gestures for selecting the region of interest within the icon representation and only displaying the region of interest of the full-scale version within a client viewport. Similarly, there are typically no gestures for displaying only a region of interest within the lower level (icon) representation.
[0275] Third, unified gestures provide a uniform method of moving up and down the levels within a single client viewport. In the typical pairing of an icon and a full-scale bit-map version, the icon and full-scale bit-map are displayed in separate client viewports. There is no notion of sharing a single client viewport between the icon and full-scale version, and then switching between the two. Even when the user interface provides switching between levels within a single client viewport, this switching is done through one of the methods described above. These methods not only take more steps, they are often not uniform. Different menu items or visual controls (or widgets) are required to move down a level compared to those required to move up a level. Often there is not even a gesture to move up or down a level; instead, the user must explicitly choose a level (or zoom factor).
[0276] Fourth, the unified set of gestures provides methods to use similar gestures to not only move up or down representation levels but also perform other actions associated with the bit-map pixel representation. For example, a "swipe" up can move to a less detailed (lower) level, and a "swipe" down can move to a more detailed (higher) level. In the same example, a horizontal "swipe" can perform a selection action at the current level (such as displaying a menu or displaying additional information about the visual content element). This unifies the level-oriented navigational gestures with a different type of common gesture. A "drag" along a similar path can activate a panning or scrolling navigational operation within the current level, instead of requiring an entirely different navigational paradigm for pan/scroll as compared to zoom. A "tap" at the same location can activate the same selection action as a "swipe", or activate a different context-dependent action.
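The example mapping in the preceding paragraph can be illustrated with a minimal sketch. It assumes the gesture has already been classified as a "swipe", "drag" or "tap" (how that classification is made is not covered here), and that screen Y coordinates grow downward so that a swipe up has a negative vertical displacement. The names Action and action_for are hypothetical and used only for illustration.

```python
from enum import Enum


class Action(Enum):
    LESS_DETAIL = "move to a less detailed (lower) level"
    MORE_DETAIL = "move to a more detailed (higher) level"
    SELECT = "selection action at the current level"
    PAN = "pan or scroll within the current level"


def action_for(kind: str, dx: float, dy: float) -> Action:
    """Map an already-classified gesture to the example actions.

    kind   -- 'swipe', 'drag' or 'tap' (classification is assumed done)
    dx, dy -- displacement from start to end location; dy < 0 means
              upward (assumed screen convention, Y grows downward)
    """
    if kind == "drag":
        # A drag pans or scrolls within the current level.
        return Action.PAN
    if kind == "tap":
        # A tap can activate the same selection action as a swipe.
        return Action.SELECT
    if kind == "swipe":
        if abs(dx) > abs(dy):
            # Predominantly horizontal swipe: selection action.
            return Action.SELECT
        # Predominantly vertical swipe: change representation level.
        return Action.LESS_DETAIL if dy < 0 else Action.MORE_DETAIL
    raise ValueError(f"unknown gesture kind: {kind!r}")
```

In this sketch a single function covers level changes, selection and panning, which mirrors the point that one family of similar gestures replaces separate paradigms for zoom and pan/scroll.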
BRIEF DESCRIPTION OF THE DRAWINGS
[0277] Other objects, features and advantages will occur to those skilled in the art from the following description of the preferred embodiments, and the accompanying drawings, in which:
[0278] FIG. 1 is a schematic diagram of a display surface painting function used in an embodiment of the invention;
[0279] FIG. 2A is a view of an input bit-map pixel representation of a web page according to this invention;
[0280] FIG. 2B is a sample overview representation of the web page shown in FIG. 2A;
[0281] FIG. 2C is a sample intermediate representation of the web page of FIGS. 2A and 2B;
[0282] FIG. 2D is a sample detail representation of the web page of FIGS. 2A, 2B and 2C;
[0283] FIG. 3A is a view of an input bit-map pixel representation of a spreadsheet according to this invention;
[0284] FIG. 3B is a sample overview representation of the spreadsheet shown in FIG. 3A;
[0285] FIG. 3C is a sample production representation of the spreadsheet of FIGS. 3A and 3B;
[0286] FIG. 4A is a sample display of the overview level on a client device according to this invention;
[0287] FIG. 4B is a sample display of the detail level from the overview level of FIG. 4A, on a client device according to this invention;
[0288] FIG. 5 is a flowchart of client processing of events according to this invention;
[0289] FIG. 6 is a flowchart of end gesture processing according to this invention;
[0290] FIG. 7 is a partial event list for this invention;
[0291] FIG. 8 is a flowchart of gesture processing according to this invention;
[0292] FIG. 9 is a chart of two location mode gestures according to this invention;
[0293] FIG. 10 is a flowchart of location mode gesture processing according to this invention;
[0294] FIG. 11 is a chart of selection mode gestures according to this invention;
[0295] FIG. 12 is a flowchart of selection mode gesture processing according to this invention;
[0296] FIG. 13 is a flowchart of special input mode gesture processing according to this invention;
[0297] FIG. 14 is a flowchart of tap processing according to this invention;
[0298] FIG. 15 is a schematic diagram of pixel transform functions according to this invention;
[0299] FIG. 16 is a schematic diagram of mapping client locations to input bit-map according to this invention;
[0300] FIG. 17 is a schematic diagram of multi-modal set of representations according to this invention;
[0301] FIG. 18 shows an example of correspondence maps according to this invention; and
[0302] FIG. 19 shows an example of combining rasterizing and text-related transcoding according to this invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0303] Location Gestures
[0304] A location gesture is interpreted from a sequence of one or more location event(s). Location gestures support movement within a given client viewport. The location gestures can include:
[0305] a) "move": a traversal along a path of one (X,Y) location on a given client viewport to one or more other (X,Y) location(s) on the same client viewport, and
[0306] b) "hover": maintaining the pointing device at the same (X,Y) location on a given client viewport for an interval of time greater than a specified minimum "hover interval".
[0307] Some user interfaces cannot support location events, and therefore cannot provide location gestures. This is true for pointing devices that can only provide a pixel location in conjunction with a selection event. For example, a pen device using a pressure-sensitive surface is typically unable to report the pen's location when it is not touching the surface. However, if the pointing device can differentiate between two types of location-aware states, it can use one state for location events and the other for selection events. For example, some pressure-sensitive surfaces can distinguish between levels of pressure. In this case, a lighter pressure can be associated with a location event, and a heavier pressure associated with a selection event.
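As an illustration of the pressure-based distinction described above, the following sketch classifies a single pen sample. The normalized pressure scale (0.0 to 1.0), the threshold value and the function name classify_sample are assumptions made for this example; an actual device would define its own pressure range and cutoff.

```python
from typing import Optional

# Hypothetical threshold on an assumed 0.0..1.0 normalized pressure scale.
SELECTION_PRESSURE = 0.6


def classify_sample(pressure: float,
                    threshold: float = SELECTION_PRESSURE) -> Optional[str]:
    """Classify one pen sample by pressure.

    Returns None when the pen is not touching the surface (no location
    can be reported), 'location' for a lighter touch, and 'selection'
    for a heavier touch.
    """
    if pressure <= 0.0:
        return None
    return "selection" if pressure >= threshold else "location"
```

For instance, under these assumptions a sample with pressure 0.2 would be treated as a location event and a sample with pressure 0.9 as a selection event.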
[0308] Move
[0309] A move gesture typically requires a minimum of two location events, one for the first (X,Y) location and one for the second. However, some clients will report both a start and end location for a location event, which allows the client to determine if a move gesture was made.
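Both cases described above can be sketched as follows: a move inferred from two successive location events, and a move inferred from a single event that carries its own start and end locations. The LocationEvent structure and the is_move function are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[int, int]


@dataclass
class LocationEvent:
    location: Point                # (X,Y) reported by the event (end location)
    start: Optional[Point] = None  # set only by clients that report both
                                   # a start and an end in one event


def is_move(events: Sequence[LocationEvent]) -> bool:
    """Return True if the event sequence implies a move gesture."""
    # Case 1: a single event that reports distinct start and end locations.
    for event in events:
        if event.start is not None and event.start != event.location:
            return True
    # Case 2: two successive events at different (X,Y) locations.
    return any(a.location != b.location
               for a, b in zip(events, events[1:]))
```

A single event without a start location never qualifies as a move in this sketch, which matches the stated minimum of two location events for clients that report only one location per event.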
[0310] In response to a move gesture, the client can optionally provide appropriate feedback to echo the current position of the pointing device on the bit-map display device. For example, this position feedback can be supplied by painting an appropriate cursor image, highlighting a related portion of the client viewport, or supplying a display of an associated (X,Y) coordinate pair.
[0311] Client feedback is not required for move gestures. For example, a pen move over a pressure-sensitive display screen does not necessarily require any visual feedback, since the user is already aware of the pen's position on the display. Hardware limitations and/or stylistic preferences can also limit the client device's echoing of move gestures.
[0312] Hover
[0313] A hover gesture requires only a single location event. The client can then determine when the recommended "hover start" interval expires, relative to the time associated with the location event. A client typically uses one or more timer event(s) to time the "hover start" interval. The hover gesture is recognized if the pointing device remains within the same location (or within a small radius of this location) and the "hover start" interval expires. In an illustrative embodiment, the recommended "hover start" time-out interval is 1 to 2 seconds.
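The hover recognition described above can be sketched as a small state holder driven by location and timer events. The class name HoverDetector, the 3-pixel radius and the choice of 1.5 seconds (within the 1 to 2 second range given for the illustrative embodiment) are assumptions for this example; timestamps are assumed to be in seconds.

```python
import math


class HoverDetector:
    """Recognize a hover from one location event plus timer events."""

    def __init__(self, interval: float = 1.5, radius: float = 3.0):
        self.interval = interval  # "hover start" interval, in seconds
        self.radius = radius      # hypothetical tolerance, in pixels
        self.anchor = None        # (X,Y) where the pointer came to rest
        self.anchor_time = None   # time of the event that set the anchor

    def on_location(self, x: float, y: float, t: float) -> None:
        # Restart timing only if the pointer left the small radius
        # around the anchor (or there is no anchor yet).
        if self.anchor is None or math.dist(self.anchor, (x, y)) > self.radius:
            self.anchor = (x, y)
            self.anchor_time = t

    def on_timer(self, t: float) -> bool:
        # The hover is recognized once the interval has expired while
        # the pointer stayed within the radius of the anchor.
        return (self.anchor is not None
                and t - self.anchor_time >= self.interval)
```

In this sketch small jitter within the radius does not reset the timer, while a larger movement starts a new "hover start" interval at the new location.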
[0314] The client can provide visual and/or audio feedback for a hover gesture. For example, a blinking cursor or other type of location-based feedback can alert the user to the hovering location.
[0315] In an illustrative embodiment, the client can interpret a hover gesture as a request for context-dependent information. Context-dependent information can inc