UBC Theses and Dissertations


User models for intent-based authoring Csinger, Andrew 1995

Full Text


USER MODELS FOR INTENT-BASED AUTHORING

By Andrew Csinger

B.Eng., McGill University, 1985
M.Sc. (Computer Science), University of British Columbia, 1991

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF COMPUTER SCIENCE

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
November 1995
© Andrew Csinger, 1995

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
Vancouver, Canada

Abstract

Authoring is the collection, selection, preparation and presentation of information to one or more readers by an author. The thesis takes a new, critical look at traditional approaches to authoring, by asking what knowledge is required and at which stages of the process. From this perspective, traditional authoring is seen to entrench an early commitment to both form and content. Although the late binding of form is now commonplace in structured document preparation systems, a similar delay in the binding of content is necessary to achieve user-tailored interaction. The authoring paradigm we have developed to service this goal is called intent-based authoring, because the author supplies at compile-time a communicative goal, or intent.
Just as SGML editors and HTML browsers defer rendering decisions until run-time by referring to a local stylesheet, intent-based authoring systems defer content-selection decisions until run-time, when they refer to models of both author and reader(s). This thesis shows that techniques from artificial intelligence can be developed and used to acquire, represent and exploit such models. Probabilistic abduction is used to recognize user models, and cost-based abduction to design tailored presentations. These techniques are combined in a single framework for best-first recognition and design. These reasoning techniques are further allied with an interaction paradigm we call scrutability, whereby users critique the model in pursuit of better presentations; users see a critical subset of the model determined by sensitivity analysis and can change values through a graphical user interface. The interactivity is modelled to ensure that representations of the user model to the user are made in the most perceptually salient manner. A prototype for intent-based video authoring is described. Video is used as a test medium because it is a "worst case" temporally linear medium; a viable solution to video authoring problems should apply easily to more tractable traditional media. The primary contribution of this dissertation is to the field of applied artificial intelligence, specifically to the emerging field of user modelling. The central contribution is the intent-based authoring framework for separating intent from content.
Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

I Introduction

1 Systems have it Easy
1.1 Introduction
1.2 Traditional Approaches
1.3 Intent-based Authoring
1.4 Valhalla and the Departmental Hyperbrochure
1.4.1 Sample Session
1.5 Overview

II Literature Survey and Formal Background

2 User Modelling
2.1 Overview
2.2 Issues: Acquisition and Exploitation
2.2.1 Acquisition
2.2.2 Exploitation
2.2.3 Correction: Situations requiring correction
2.2.4 Scope: What is Represented?
2.2.5 Extent and Adaptivity
2.3 Multiple Agents, Multiple Models
2.4 User Modelling Shells
2.5 User Modelling: A Definition by Consensus

3 Authoring
3.1 Dimensions of Authoring
3.2 Graph Generation
3.3 Data Analysis and Visualization
3.4 Psychophysical Research
3.5 Multimedia
3.5.1 Video Authoring
3.6 Computer-Supported Cooperative Work
3.7 Structured Documents: Content from Form
3.8 Hypertext and Hypermedia
3.9 Putting it all Together: Cyberspace?

4 Formal Background
4.1 Symbolic Logic and Default Reasoning
4.1.1 Default-Programming with Theorist
4.1.2 Summary and Conclusions
4.2 Decision Making Under Uncertainty
4.2.1 Bayesian Decision Theory
4.2.2 Example
4.2.3 Decision Analysis
4.3 Speech Acts

III Contribution

5 User Models for Intent-based Authoring
5.1 Overview
5.2 Motivation
5.3 Components of the Theory
5.3.1 Summary: Inputs, Outputs and Roles
5.4 An Abductive Framework for Recognition and Design
5.4.1 Recognizing User Models
5.4.2 Designing Presentations
5.5 Interactivity and Scrutable Models
5.6 Example
5.7 Alternative Approaches
5.7.1 Decision Theory for Multimedia Presentation Design
5.7.2 Costs and Utilities
5.7.3 Other similar approaches
5.8 Conclusions

6 Implementation
6.1 Architecture
6.1.1 The Reasoner
6.2 Functionality

IV Conclusions

7 Conclusions
7.1 Implications
7.2 Future Work
7.2.1 Learning: Updating Prior Probabilities
7.2.2 Privacy
7.2.3 Future Development

Bibliography

List of Tables
2.1 Errors in user models
3.1 Mackinlay and related work
3.2 Data Analysis and Visualization
3.3 Intermedia
3.4 CSCW time and space diagram
3.5 KMS
4.1 Domain-Formulation
5.1 Priors and normalized posteriors
5.2 Myopic Tradeoff Table
5.3 Results of Sensitivity Analysis before user action
5.4 Results of Sensitivity Analysis after user action

List of Figures
1.1 The traditional approach to authoring
1.2 The intent-based approach to authoring
1.3 The Valhalla User Model Window
1.4 The Valhalla Control Window
2.1 User and system models
3.1 EXVIS five-dimensional stick figures
3.2 Iconograph
5.1 A partially elaborated presentation
5.2 Roles, Inputs and Outputs
5.3 Representative recognition assumables
5.4 Example facts and rules
5.5 Rules for perceptual salience
5.6 The Valhalla User Model Window
5.7 Assumables for perceptual salience calculations
5.8 Show: an example intent
6.1 Valhalla's Distributed Architecture
6.2 The hierarchy of interfaces to the video server
6.3 The Valhalla Control Window
6.4 A frame from the Departmental Hyperbrochure
6.5 The Valhalla User Model Window
6.6 Valhalla implementation
6.7 The Valhalla Agents
6.8 Authorial Intent as a Prolog query
6.9 Knowledge-based Video Annotation and Presentation

Acknowledgements

And further, by these, my son, be admonished: of the making of many books there is no end; and much study is a weariness of the flesh.
—Ecclesiastes 12:12

My wife, Sue Rathie, who kept me on the path where I would have wandered, oh, so many times. For reading my thesis more often than anyone else. For helping me with all those details. For listening. For being there, all the time...

My parents, Helen and Joseph, who got me to where I could do the rest on my own. There were times over the last few years when I think the only reason I kept going was because I promised my mother I would "get a PhD." I was six years old when I made that promise...

Kelly Booth has been the source of much good advice, and a partner in many a stimulating discussion. When I went to Germany for a summer of study, he left me with the sage words: "take lots of toilet paper and don't drink the water." It later turned out that the water supply at the German Center for Artificial Intelligence was tainted, and I remain forever grateful...

Dave Poole loves to argue. He won't admit it, but he does. And he is infuriatingly good at it, as all of his graduates agree. The strange thing is, that after years of bickering, and endless iterations of frustrating non-felicitous utterances and failed mutual beliefs with unintended perlocutionary effects, his students still like him.
Valerie McRae has looked in on these often boisterous meetings from time to time, wondering if it wouldn't be prudent to invite the RCMP. Very strange, indeed...

Steve Gribble, for implementing the NeXT interface. It was a pleasure to work with him. He's the best undergraduate I've ever met, with technical and interpersonal skills far beyond his years. Mike Horsch, who taught me everything I know. Steve Mason, for taking just as long to finish his PhD. Scott Flinn, who knows everything about NeXT. Imouttahere.

Thanks

Thanks to my co-supervisors, Kellogg S. Booth and David Poole, and to the other members of the thesis advisory committee:
• Tom Calvert (Computer Science, Simon Fraser University),
• Bob Goldstein (MIS, Faculty of Commerce, UBC),
• Richard Rosenberg (Computer Science, UBC).
The examiners were:
• Jim Little (Departmental Examiner, Computer Science, UBC),
• Ricki Goldman-Segal (University Examiner, Education, UBC),
• David Chin (External Examiner, Computer Science, University of Hawaii).
The student reader was Michael Horsch. I am grateful for all their contributions to this work.

Part I
Introduction

Chapter 1
Systems have it Easy

Each of the shots must be physically spliced with cement or tape to the shots that precede and follow it...
—James Monaco, How to Read a Film, page 117

Do not be afraid to seize whatever you have written and cut it to ribbons; it can always be restored to its original condition in the morning...
—The Elements of Style, page 72

The work reported in this dissertation is rooted in the belief that the best way to make progress towards cooperative computational systems is to take stock of human capabilities and limitations, and then to pursue human-computer relationships which exploit these capabilities and overcome these limitations. A human-computer symbiosis is warranted, where each participant in the relationship does what it does best.
Advances in the foreseeable future will likely revolve about the design of this symbiosis, rather than the embodiment of intelligence in some computational artifact.

Footnote 1: "Instead of the passive-aggressive error messages that are currently given in response to incorrect or incomplete specifications, intelligent agents should collaborate with the user to build an acceptable request." [1]

There are at least two complementary approaches to achieving this goal. One way is to build superior interfaces with better affordances that clearly advertise their function to the human user, and that cater to known human psychophysics [151]. Research in Human-Computer Interaction (HCI) pursues this approach in many directions—some of it is surveyed in Chapter 3 of this dissertation. Central to the HCI approach is the desire to make the system easier for the user to understand [122, 123]; the user should be able to acquire and exploit a model of the system. The other, complementary way stems from the realization that until now, it has been the user that does all of the explicit modelling, and that perhaps we have reached a stage where the computer can be made to bear at least part of the burden of representation. Research in User Modelling [115, 116] takes this approach; the system should be able to acquire and exploit a model of the user. This dissertation focusses on the second approach.

User-guided theorem-proving systems and diagnostic expert systems are obvious examples of tasks in which humans and computers collaborate to achieve a goal. In environments like these, where the goals of the user are known by the system a priori, the job of acquiring a restricted model of the user is reasonably well-defined. Even in relatively open-ended application environments, the fact that the human user has chosen, say, a word-processing program rather than a drawing program, provides some grounds for model building.
In contrast, vague or implicit user goals in broader areas like decision-support systems (see, e.g., Goldstein [84]) or desktop publishing and production environments make acquiring and updating models of users very difficult.

This dissertation develops a particular approach to acquiring and exploiting models of users. The approach is applied to authoring, and a prototype application for video authoring and browsing is described. The introduction first defines authoring, and then offers a new perspective on traditional approaches. This perspective is one of the contributions of this thesis, and yields insight into the heretofore unregarded limitations of the traditional approaches, as well as insight into a new strategy called intent-based authoring which overcomes these limitations and is developed in this thesis. The goal of this thesis is to provide the intellectual and logical foundations upon which intent-based authoring systems can be developed.

1.1 Introduction

Figure 1.1: The traditional approach to authoring

Authoring is the honorable tradition of collecting, structuring and presenting information in the form of a "document" rendered in some medium or media. Until recently, the document has been static, in the sense that once rendered, it is fixed for all time and for all readers. Promising new technologies have recently come into existence that could alleviate some of the limitations of this difficult, knowledge-intensive undertaking.

1.2 Traditional Approaches

In the traditional model of authoring, the task of an author is to collect a coherent body of information, structure it in a meaningful and interesting way, and present it in an appropriate fashion to a set of readers (or viewers) of the eventual work.
This traditional notion of authoring commits the author to the form as well as to the contents of the work, well in advance of the actual time at which it is presented. Figure 1.1 emphasizes that there is no clear separation of information from presentation, and authors are committed to both the form and content of their message. Structured-document approaches separate form and content, but user-tailored presentation is still not possible; reader "demand" only indirectly affects the authoring process. The familiar book format conveys the force of the general problem; once printed, there is no way—short of second editions and published errata—to change the presentation for the particular needs and desires of individual readers, or groups of readers. The author must both select and order in advance the information to be presented. Presentations tailored to the needs of particular audiences are not possible in the traditional approach to authoring, with its "compile-time" commitment to form as well as to content.

Footnote 2: The on-line copy of Webster's 7th Dictionary offers the following definition: 1: the writer of a literary work (as a book) 2a: one that originates or gives existence: SOURCE <trying to track down the author of the rumor> <the author of a theory>.

Footnote 3: That some books are published in multiple editions—the Windows versus the Macintosh edition of a manual, or the Prolog versus the Lisp edition of a programming text, for instance—does not address the general problem. Both these groups were anticipated by an author at compile-time. Not all individual readers can be anticipated in this way.

The traditional approach to authoring, when applied to non-traditional media like film, results in the same limitations. As an offshoot of his semiological analyses of the cinema, Metz [134, p.45] wrote that "the spectatorial demand cannot mould the particular content of each film..." Metz is pointing out that when viewers sit in their theatre seats munching popcorn, it is too late in the traditional model for their goals and desires to influence the content of the celluloid images being projected before them. Such statements—though accurate in 1974—are representative now of what should be considered out-dated, traditional approaches that take a technologically imposed "supply-side" view of the authoring process, in which authors and publishers join to decide both the form and the content of a document before readers ever make their wishes known. The principal limitation of these traditional approaches is the resulting "one-size-fits-all" static document, exemplified by the venerable book format that we have been using since well before Gutenberg, when scribes laboriously and meticulously copied manuscripts; identical replication was the sine qua non of these technologies. Most approaches to authoring are even today just bigger and faster versions of the printing press, and do nothing to overcome this early binding problem.

Today we can do better. We now have fast graphics, powerful reasoning engines and other technology, and rather than just add horsepower to traditional techniques, we can harness these new technologies to change the way authoring activities are conducted. Before continuing the exposition of this new, non-traditional authoring paradigm, we argue that at least two "new" strategies fall within the traditional model and still suffer from its limitations.

Hypermedia: (See Section 3.8) is media which can be accessed non-linearly, or non-sequentially, and it is nothing new. The terminology became common when non-linear documents became computerized, but hypermedia has been with us for a long time in the form of indexed documents (e.g., encyclopaedia), footnotes that reference other parts of a document or other documents, and so on. Although tables of contents and elaborate indexes are intended as remedies to the static document format, the burden of this approach to overcoming the "one-size-fits-all"
Although tables of contents and elaborate indexes are intended as remedies to the static document format, the burden of this approach to overcoming the "one size fits a l l " "Binding" is used here in the computer scientist's sense of associating values with variables. The pun was originally unintended. 4  7  problem falls heavily upon the reader. For instance, an encyclopaedia is a hyperdocument that can be browsed using the indices and cross-references as navigational links. The browsing activity completes the selection and ordering functions normally performed by the author and brings with it an inherent overhead that must be assumed by every reader. The viewer completes the job of the author by selecting and ordering the information to be viewed through the process of navigating the links established by the author. This not only pushes aspects of the problem from one person (the author) to another person (the viewer), it also dramatically increases the demands on the author who must provide explicit navigation cues in addition to the traditional authoring tasks. Reducing the amount of human effort required from the author and viewer is still a significant problem with current approaches to (hyper-)authoring. These effects can be mitigated by the knowledge-based approach advocated in this dissertation. See Section 3.8 for more exploration.  F o r m versus Content:  A n author chooses not only the information to be pre-  sented (the content) but also the order and style in which it w i l l be presented (the form). Both contribute to the effectiveness of a presentation, yet few people are highly skilled in all aspects of these processes. This problem is at least partially addressed by the structured document paradigm, which attempts to separate the specification of the content of a document from the specification of its form. Markup languages like S G M L (Standard Generalized Markup Language) and Hytime [146] are characteristic of this effort. 
They permit a delayed binding for what we might call the "surface structure" of a document (the format in which it is finally presented), but they still require the author to provide the "deep structure" (a hierarchical decomposition of the content as a structured document). See Section 3.7 for more details.

Figure 1.2: The intent-based approach to authoring

Content versus Intent: In order to tailor presentations to the needs and desires of individual readers, we need consumable models of these readers. For the "demand-side" of the equation to have a direct effect on the form and content of the document, decisions about the final presentation must be delayed until "run-time," when the model of the reader can be brought to bear on the final stages of the design process. One difficulty is that user modelling is a new and complex problem. As part of this thesis, techniques for user modelling have been developed and applied to the authoring problem. Thinking of authoring in terms of the knowledge required to support the activity has resulted in a new approach developed in this thesis called "intent-based authoring," which may ultimately resolve the principal problems with the traditional approach.

1.3 Intent-based Authoring

A more complete de-coupling of specification and presentation processes is required before the goal of truly personalized presentations is attainable. In addition to the content of the document, the author must also supply an intent.
The author's intent is an arbitrarily complex communicative goal analogous to the notion of illocutionary force in the literature of speech acts (see Section 4.3 for more on Speech Act Theory and Section 5.3 for more on authorial intent as used in this dissertation), but can be safely interpreted in the context of this dissertation in its typical dictionary definition, which offers as synonyms: intention, intent, purpose, design, aim, end, object, objective.

This authorial intent is usually implicit in the work; a newspaper article is (sometimes) written to inform, an editorial to convince, a dissertation such as this to argue for the acceptance of a new authoring paradigm, and so on. The author's intent is a (possibly abstract, very high-level) communicative goal. Making explicit this intention at the time the document is specified opens the door to truly user-specific document presentation. Information and presentation spaces can be clearly separated, bridged by various knowledge sources. In particular, a model of the viewer permits user-tailored determination of content at run-time; supply meets demand. Illustrated in Figure 1.2, we call this approach to authoring intent-based authoring, and describe here an application of the approach to the authoring of video documents. Video is used as proof-of-concept because it has characteristics which make it a popular recording medium, and because it is in many ways more difficult to deal with than other media (see Section 3.5.1); the intent-based approach to authoring advocated in this thesis is expected to apply equally to other media. Mackinlay [130], Karp and Feiner [106] and others have argued similarly in the domains of graphical presentation and animation. Feiner explicitly uses the term "intent-based presentation."
Previous work in automatic presentation has dealt with some aspects of the issues addressed herein, though it has been restricted for the most part to choosing "the right way" to display items based on their syntactic form [130, 175]. Semantic qualities of the data to be displayed are seldom considered. Unlike Karp and Feiner [106], who describe a system for intent-based animation, we do not start with a perfect model of the objects to be viewed and then decide on the sequence of video frames to be generated. Instead, we start with a typically large collection of pre-existing video frames (usually sequences of frames) and select and order these to communicate the intended information. Our task is one of (automatic) "assembly," rather than (automatic) "synthesis," a different problem entirely. A presentation for our purposes is an edit list which specifies the order in which a selection of video clips is to be played. In different terms: if the presentation is about a cube in space, our model is not of the cube, but of the video tape whose subject matter is that cube; we do not model the cube, with its size and position, but the tape, with its frame numbers and contents.

Footnote 5: Source: Webster's 7th Dictionary, on-line copy.

Recently, other researchers have considered related problems. Hardman et al. [94] undertake to free multimedia authors from having to specify all the timing relations for presentation events; some of these are derived by their system at run-time. Goodman [85] also builds presentations on-the-fly from canned video clips and other information. The work reported in this dissertation focusses on user modelling, rather than the media and domain concerns that motivate most other work. As shown in Figure 1.2, the user model is a crucial bridging element between the authoring activity that takes place at compile-time in the absence of the eventual viewer, and the viewing activity that takes place at run-time in the absence of the author.
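The edit-list characterization can be made concrete with a small sketch. The clip names, frame numbers, topic annotations and scoring rule below are invented for illustration; the thesis's own system expresses authorial intents as Prolog queries over annotated video and designs presentations by cost-based abduction (Chapter 5). The sketch shows only the run-time shape of the "assembly" task: select from pre-existing clips, order them, and emit an edit list.

```python
# Hypothetical clip library for a Valhalla-like system:
# (clip_id, start_frame, end_frame, topic annotations).
clips = [
    ("intro",    100,  900, {"overview"}),
    ("graphics", 950, 1800, {"graphics", "research"}),
    ("ai_lab",  1850, 2700, {"ai", "research"}),
    ("city",    2750, 3300, {"vancouver", "leisure"}),
]

def design_presentation(intent_topics, user_interests):
    """Assemble an edit list: select clips relevant to the authorial
    intent, then order them so that clips matching the user model's
    interests play first. Returns (clip_id, start, end) triples."""
    relevant = [c for c in clips if c[3] & intent_topics]
    # Stable sort: among equally relevant clips, authored order holds.
    relevant.sort(key=lambda c: len(c[3] & user_interests), reverse=True)
    return [(cid, start, end) for cid, start, end, _ in relevant]

# A viewer interested in AI sees the AI lab before the graphics group.
edit_list = design_presentation({"overview", "research"}, {"ai"})
```

Ordering by simple overlap with the user model is only a stand-in for the probabilistic and cost-based reasoning developed later; the invariant it illustrates is that content selection happens at run-time, after the reader is known.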
Footnote 6: Such a characterization deliberately excludes from consideration details of how clips are to be visually related (i.e., special editing effects like cut, fade or dissolve), attributes of the playback (e.g., screen contrast, color balance, etc.) and other aspects of video authoring that could easily fall within the purview of a framework of this sort.

1.4 Valhalla and the Departmental Hyperbrochure

The prototypical application called Valhalla, described in Chapter 6, is an intent-based authoring and presentation system. Valhalla is an intent-based implementation of the University of British Columbia Department of Computer Science Hyperbrochure. Originally conceived as a static one-hour video presentation, the hyperbrochure has been pressed into service as a prospecting tool for students, staff, faculty, granting agencies, industrial partners and other internal and external interests; this usage is asking much of a single, linear presentation. The needs of one group of viewers are quite different from others, not to mention the differences in the particular interests of individuals within these groups. The need for a more versatile way to show different people what the Department of Computer Science has to offer was identified, and Valhalla emerged partially in response to this need, and because it represented an opportunity to deploy the results of this research. The Departmental Hyperbrochure now consists of two thirty-minute video disks that include an introduction to UBC's Computer Science Department by its head, interviews with most of the faculty and staff, as well as walk-throughs of the laboratories.

The remainder of this section is a walk-through of an actual sample session with Valhalla. The reader might keep this example of the usage of the system in mind while reading other parts of this thesis. The same example is treated in more technical detail in Section 5.4.1.
1.4.1 Sample Session

Tom, a faculty member at the Department of Computer Science, arrives in one of the department labs with Joan, a visiting student from the University of Toronto. Joan is considering transferring to UBC, and wants to find out more about the department. She doesn't have her own computing account there yet, so Tom logs in and starts up Valhalla. The system derives an initial model of Tom based on widely available information indexed by his login id; Valhalla knows that Tom is faculty, and that he is local to the department, and assumes that his gender is male because there is currently a higher percentage of males than females in the department. The system also knows that Tom belongs to the graphics research group and assumes that his principal research interests lie in that area. Based on this initial model, Valhalla prepares a presentation consisting of a set of video clips from the Hyperbrochure. Tom leaves to do other work, and Joan takes his place.

Figure 1.3: The Valhalla User Model Window. The window explains: "Each item below represents an important assumption that the system has made about you. Correcting the assumptions will change the behaviour of the system and the nature of your presentations."

She watches some of the basic introductory material with which the presentation begins, but begins to wonder why the material goes on to talk about graphics, until her gaze falls on the user model window, in which the system has displayed the assumptions that have had the most effect in preparing the current presentation (see Figure 1.3). Joan sees that the system thinks she is male, local, and interested in graphics research. She clicks on the interface to correct the obviously false assumptions and instructs the system to design a new presentation based on this revised model, by manipulating the virtual VCR interface shown in Figure 1.4.
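The session can be caricatured in a few lines of code. The attributes and defaults below mirror the narrative (faculty role, local, majority-class gender default, research-group interest), but the dictionary representation is invented for illustration; the thesis recognizes user models by probabilistic abduction rather than by table lookup (Section 5.4.1). The pattern is what matters: assume defaults from available evidence, display them, and let the user correct them.

```python
# Hypothetical stereotype defaults, keyed on what a login id reveals.
def initial_model(login_info):
    """Assume an initial user model from widely available evidence."""
    return {
        "role": login_info.get("role", "student"),
        "local": True,              # assumed: a departmental login
        "gender": "male",           # assumed: majority-class default
        "interest": login_info.get("group", "general"),
    }

def correct(model, **fixes):
    """Scrutability in miniature: the user overrides bad assumptions,
    and a new presentation would be designed from the revised model."""
    revised = dict(model)
    revised.update(fixes)
    return revised

tom = initial_model({"role": "faculty", "group": "graphics"})
joan = correct(tom, role="student", local=False, gender="female",
               interest="ai")
```

In the prototype, the displayed subset of the model is chosen by sensitivity analysis rather than shown wholesale; this sketch omits that step.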
Figure 1.4: The Valhalla Control Window

The new presentation includes clips about the AI research group and its laboratories, as well as a sequence of scenic views of the Vancouver area and opportunities for entertainment. A brief history of the University concludes the presentation.

1.5 Overview

This dissertation is divided into three parts. Part I is an introduction. Part II is a literature review, consisting of Chapter 2, a survey of existing approaches to the modelling of agents, and Chapter 3, a survey of the broad spectrum of authoring systems with respect to a number of characterizing dimensions described in Section 3.1. Chapter 4 is a review of relevant theoretical material upon which the contribution is built, including hypothetical reasoning, probability and decision theory. Part III describes the theoretical and practical contributions of this dissertation: Chapter 5 goes into some detail about the reasoning framework adopted to support the intent-based authoring paradigm, and Chapter 6 describes a prototype implementation built to demonstrate these ideas. Part IV concludes with Chapter 7, advancing some generalizations and implications of the intent-based authoring and presentation paradigm, as well as some proposals for further work. Although the wide space of authoring sampled in Part II provides many useful directions for practitioners interested in intent-based presentation, the field is largely an unwritten book, waiting for integrative contributions from researchers in user modelling, psychological perception, and artificial intelligence.

A number of "scenarios" are scattered throughout this thesis.
These are intended to give the reader a sense for the philosophy and goals of the intent-based authoring paradigm, and although some of them are unabashedly science fiction, all behavior described in the scenarios can be implemented by addressing technical and non-theoretical issues. A central component in all example scenarios is the underlying model of the users involved, necessary to achieve the functionality described. Full code listings are available over the Internet from the author.

Footnote 7: The author can be reached at csinger@cs.ubc.ca, and information is available at http://www.cs.ubc.ca/spider/csinger/home.html

Part II
Literature Survey and Formal Background

Scenario: Information Retrieval

The reader is warned that the following scenario is science fiction; the intent is merely to motivate the reader with the long-range goals of the intent-based authoring paradigm, which is to move away from telling computers how to do things, to telling them what to do, and finally to just telling them about ourselves.

Dan and Mike are both users of FAST, the powerful information retrieval system of the twenty-twenties. FAST is connected to a bewildering variety of widely dispersed databases on all aspects of human endeavor, and operates, as its acronym suggests, at great speed over high bandwidth networks. When Dan asks the system for the names and descriptions of deadly viruses, FAST begins its response, after accessing various cross-indexed medical and historical databases, with the story of the eradication of HIV, the last virus to be tamed by medical science. Dan is a doctor. When Mike presents a similarly formulated query, the system begins with a description of Michaelangelo, the computer virus that threatened to destroy PC disk-drive information on the artist's birthday in nineteen ninety-two, in the dark, depressing, early days of information retrieval. Mike is a computer programmer.
Chapter 2

User Modelling

"You have a time machine and you use it for... watching television?"
"Well, I wouldn't use it at all if I could get the hang of the video recorder"...
—Douglas Adams, Dirk Gently's Holistic Detective Agency

The loosely defined area of user modelling has grown over the past decade out of its origins in the field of natural-language dialog systems [196], into a wide range of disciplines concerned with developing cooperative systems for heterogeneous user populations [115]. Some of the more prominent disciplines include Human-Computer Interaction, Intelligent Interfaces, Adaptive Interfaces, Cognitive Engineering, Intelligent Information Retrieval, Intelligent Tutoring, Active and Passive Help Systems, Hypertext and Expert Systems.

Although the specific reasons for modelling agents are manifold, the general appeal of the undertaking is that it promises more effective use of the available communications channels between the agents being modelled. Most of the attention has focussed on user modelling, a special case of agent modelling, where the agent being modelled is the (human) user of an interactive system. The emphasis in this setting is to increase the cooperativeness of the system vis-à-vis the human; this thesis shares the view that user modelling is a likely vehicle to engender more cooperative behavior in human-machine interaction.

Kass and Finin [107] provide a useful framework for presenting the user modelling literature by categorizing it along six dimensions. They choose to analyze models as to their

• degree of specialization,
• modifiability,
• temporal extent,
• method of use,
• number of agents, and
• number of models.

The present discussion will be related to this framework.

2.1  Overview

A number of classes of research activity have been subsumed under the label of agent modelling.
The literature divides fuzzily among natural language understanding and generation, computer aided instruction or intelligent tutoring systems, and cognitive modelling. The thread of information retrieval runs through these somewhat arbitrary divisions [118] [141]. Each of these areas has its own reasons for modelling agents. Recent efforts at developing user modelling shells promise to take the field from the ad hoc to the systematic.[1]

[1] See [12] for an annotated bibliography of the field, divided into the areas of Computer Aided Instruction, Expert Systems, Knowledge Representation, Logic Programming, Natural Language, the Philosophy of Science, and User Modelling per se.

In natural language, there has been growing consensus that models of agents are necessary both to understand references in user utterances and to generate statements that will be comprehensible to the user; modelling is understood to be a necessary component of natural language dialogue [35] [36] [107] [91] [178]. These observations expand readily from text-only to multi-modal and multimedia environments: the presentations produced by a system for the consumption of a user should be tailored to the user's expectations, abilities, and goals.

Practitioners of computer aided instruction (CAI) [198] have made use of models of student users (student models) to decide what to teach, and how to teach it [50]. The interests of this research community address the question of how a user acquires an accurate model of the system (the system model), as opposed to how a system can acquire an accurate model of the user.

Cognitive modelling [101] has as its goal the development of psychologically valid models of human cognition. Although researchers in this area generally build computational systems to test their theories, these systems can sometimes form the bases for useful modelling tools. For example, Craddock [53] develops a model for database retrieval which explains aspects of common-sense reasoning in humans, and which admits of a relatively direct implementation path; his implementation could be used as an episodic knowledge base (EKB) that stores user models. Provan and Rensink [168] refer to psychophysical results to support their model of neural connectivity, and Marcus [131] also leans on psychological findings to defend his novel approach to parsing natural language; such efforts should not be ignored when building systems to reason their way to user models from observation of human responses to visual and textual stimuli.

Different strategies and technologies have been employed to address the difficult problems associated with user modelling, e.g., default reasoning [103] [50] [60] [13], truth maintenance [98], and Dempster-Shafer analyses [178]. An early approach to user modelling, which is still the subject of some current research and which is only now beginning to find its way into commercial products, is stereotyping [17] [45] [170] [169].

A stereotype is a collection of data which typifies a class of users. Rich [171] defines stereotypes as a means of making a large number of assumptions about a user based upon only a small number of observations, and she pioneered their use in the Grundy system for recommending books to users. The data are usually represented in some kind of logical calculus as statements of belief or knowledge or goals, and there will be as many stereotypes as there are identifiable classes of users. The approach involves first identifying a fixed set of classes of users a priori, then deciding the membership of an individual in one of these classes, and finally attributing the contents of an applicable stereotype to the individual. The accuracy of user models which rely upon stereotyping depends directly upon the number of pre-determined user categories.[2]
[2] An early, typical example is the beginner-intermediate-expert distinction exploited in some popular off-the-shelf word processing programs, primarily to vary the verbosity of their help messages. The assignment of the user to the appropriate category was determined by the user himself, thereby solving the thorny acquisition problem.

The more stereotypes, the better, although the difficulty of determining which stereotype to apply to an individual grows with the number of stereotypes. The activity of choosing which stereotype to apply is called triggering. Variations upon the theme of stereotyping have been developed in a number of directions; individuals can be permitted to inherit characteristics from multiple stereotypes (e.g., [16]), an approach which affords more accurate modelling at the expense of more elaborate means for arbitration between applicable, mutually inconsistent stereotypes.

Stereotypes may also be ordered into taxonomic hierarchies which provide savings in the space required for the representations: if the stereotype of a computer novice contains typical beliefs of novice computer users about the operation of a computer system, then the stereotype of an advanced user need only contain typical beliefs of advanced users where these conflict with, or do not appear in, the novice stereotype. The advanced stereotype inherits the contents of the novice where the latter does not conflict with the former. Taxonomies can simplify the attribution process; if it has been determined that a user is an expert, for instance, there is no need to verify stereotypes higher in the hierarchy.

An extension to stereotyping is what Ballim [16] has called "ascription by perturbation," wherein an agent assumes that another agent is similar to itself and therefore attributes its own beliefs to the other. This approach offers considerable power in environments where a large fraction of the agents' knowledge is common. The usefulness of the approach is illustrated by how well it works in our own lives; we never know what our fellow humans are thinking or believing, but we rely on our own introspection and perform attributions with internal justifications like: "If I were him, I would be hungry by now..." A canonical perturbation ascription rule would be: "assume that another agent's view is the same as one's own except where there is explicit evidence to the contrary." [16, p.76]. Humans appear to be remarkably successful with this approach.

Chin [45] [44] introduces the notion of double stereotypes. In addition to categorizing users, his KNOME system categorizes information into levels of difficulty, so that inferences can be represented as relations between user types and difficulty levels. ("Experts know all simple and mundane but only most complex knowledge." "Beginners know most simple, a few mundane and no complex knowledge.")

A promising improvement to the technique of stereotyping is to dynamically derive membership categories from episodic databases of user activity. Doppelgänger [155] is a generalized user modelling system that uses learning techniques to interpret data about users acquired through a variety of sensors. Applications connect via standard protocols to a server that provides access to user data. Each user model is a point in a very high dimensional space whose dimensions are determined by the available sensors; this point moves through the space as information about the user is gathered. The categories in Doppelgänger are called community models, which are computed as weighted combinations of member models, "and thus change dynamically as the user models are augmented." Section 7.2 of this thesis considers the use of learning techniques in future work.
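The stereotype machinery described above, triggering plus taxonomic inheritance with overriding, can be sketched in a few lines. This is only an illustrative sketch: all stereotype names, beliefs, and trigger thresholds below are invented, and real systems such as Grundy use far richer representations.

```python
# Sketch of a stereotype taxonomy with triggering and inheritance.
# All names, beliefs and thresholds are hypothetical.

class Stereotype:
    def __init__(self, name, parent=None, beliefs=None, trigger=None):
        self.name = name
        self.parent = parent            # more general stereotype, or None
        self.beliefs = beliefs or {}    # only beliefs that differ from parent
        self.trigger = trigger or (lambda obs: False)

    def all_beliefs(self):
        """Inherit ancestor beliefs; a child's entries override its parent's."""
        merged = self.parent.all_beliefs() if self.parent else {}
        merged.update(self.beliefs)
        return merged

novice = Stereotype("novice",
                    beliefs={"knows_shell_pipes": False, "prefers_menus": True},
                    trigger=lambda obs: obs.get("commands_issued", 0) < 50)
expert = Stereotype("expert", parent=novice,
                    beliefs={"knows_shell_pipes": True},   # overrides novice
                    trigger=lambda obs: obs.get("commands_issued", 0) >= 500)

def trigger_stereotype(observations, stereotypes):
    """Return the most specific stereotype whose trigger fires."""
    for s in stereotypes:               # ordered most specific first
        if s.trigger(observations):
            return s
    return stereotypes[-1]              # fall back to the most general

user_model = trigger_stereotype({"commands_issued": 800}, [expert, novice])
# expert inherits prefers_menus from novice but overrides knows_shell_pipes
```

The space saving discussed above shows up in the `beliefs` dictionaries: the expert stereotype stores only the one belief on which it differs from the novice.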
A strategy related to stereotyping is the use of profiling, which involves giving users control over various aspects of system operation by allowing them to set the values of a prescribed set of parameters. Common examples are the tailorability of the Unix operating system with scripts and alias mechanisms, and the customizability of the X Window interaction environment. There is also a wide range of application programs which offer user-modifiable operation (e.g., [54]).

Another approach being studied in the research community is the use of plan recognition. Recognizing the plan of a user permits the derivation of his or her goals and intentions. This has been employed to help provide pro-active user feedback when faulty plans are recognized [189]. Problems with plan recognition have been the management of uncertainty and the prohibitive size of the plan library required for serious applications. Various default reasoning techniques have been applied to the former (e.g., weighted abduction [8]), but the latter difficulty has hardly been addressed (but for an exception, see the PHI system [20]).

Plan recognition is usually performed [109] under the strong assumptions that 1) the recognizer agent has complete knowledge of the domain, and that 2) the agent whose plan is being inferred has a correct plan. These assumptions are clearly not universally true: users may find ways of doing things that the designers of the systems had not considered, and users all too often have faulty plans, usually based upon faulty or incomplete models of the systems they are using. These assumptions are explicitly tackled by systems which try to model the faulty plans that users might have (i.e., Bauer [20] and Thies [189]), which in turn have to deal with a potentially infinite number of faulty plans.
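A minimal sketch makes both difficulties concrete: candidate plans from a (tiny, invented) plan library are matched against the observed prefix of user actions, and a user whose plan is not in the library simply matches nothing.

```python
# A minimal plan-recognition sketch. The plan library and action names are
# hypothetical; real libraries are vastly larger, which is exactly the
# scaling problem noted in the text.

PLAN_LIBRARY = {
    "print_file":  ["open_file", "preview", "print"],
    "edit_file":   ["open_file", "modify", "save"],
    "backup_file": ["open_file", "copy", "close"],
}

def recognize(observed_actions):
    """Return the plans whose steps begin with the observed action prefix."""
    return [name for name, steps in PLAN_LIBRARY.items()
            if steps[:len(observed_actions)] == observed_actions]

print(recognize(["open_file"]))            # all three plans still possible
print(recognize(["open_file", "modify"]))  # only 'edit_file' remains
print(recognize(["open_file", "delete"]))  # a faulty plan matches nothing
```

The empty result in the last case illustrates the second assumption above: a recognizer with a library of only correct plans has nothing to say about a user pursuing a faulty one.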
Plan-based approaches have grown out of basic artificial intelligence research, and applications are few, but some work is already being done to determine how useful plan recognition can become to a range of interaction environments [86].

An interesting twist on the user model is the Programmable User Model [204]. This tool embodies psychologically motivated constraints in a programming environment which interface designers use to build a model of a user. In principle, such tools will ensure that any task in which a user may engage will be computationally supported in cognitively sensible ways.

The remainder of this section examines in more detail some issues of agent modelling that are relevant to the current work.

2.2  Issues: Acquisition and Exploitation

The problem of agent modelling divides broadly into questions relating to the acquisition and to the exploitation of agent models. Agent models must be acquired. Even the use of user stereotypes requires determination of the user's membership class, although the issue can be circumvented by simply asking the user to decide for himself or herself.[3]

2.2.1  Acquisition

Acquisition is of two varieties: the system model must be acquired by the user, and the user model must be acquired by the system. The emphasis in this thesis is on the latter process.

Kass and Finin consider acquisition along a dimension whose extrema are 'implicit' and 'explicit' forms.[4] Explicit acquisition can be as simple as asking the user to fill out forms or enter descriptive keywords; this data can then be used, for instance, to determine the user stereotype. Systems employing explicit acquisition have been called 'adaptable' [76] (cf. the 'computer as tool' metaphor [108]). Implicit acquisition is generally more subtle, requiring that the model be inferred from observation of the user.
An approach that has met with some success is that of monitoring the communication between user and application with the aim of inferring all or part of the user model. For instance, Csinger and Poole [60] [55] employ a normative theory of inter-agent communication based on a Gricean analysis [89] to derive the beliefs of interlocutors in a natural language setting. Their system is implemented in a logical framework for common-sense reasoning [158]. Zukerman [207] presents a planning mechanism which she uses in conjunction with a model of user beliefs and inferential capability to predict the possible (perlocutionary) effects on hearers of utterance components. Modelling these effects in this way permits a traditional anticipation-feedback loop for utterance design. These approaches have the common aim of modelling potential perlocutionary utterance effects by recourse to a Gricean model of dialogue. Zukerman continues this line of investigation [163] with a system called RADAR that generates both responses and queries of two types: disambiguating queries and queries to elicit additional information. Decision-theoretic measures are used to determine the dialogue strategy.

[3] As users are typically not very good at deciding such things [169], other methods are desirable.

[4] Recent work makes this distinction in various forms. Laurel, for instance: "Increasingly, systems will need to employ either explicit conversations with people to determine task objectives or implicit user-modeling techniques to infer objectives from behavior..." [123, p.107]

Wu [200] uses decision-theoretic techniques to decide when explicit interaction with the user is to be preferred over implicit hypothesizing, by maximizing the expected utility of the intervention. The work described in this thesis also moves in this direction; see Section 5.5. Kass and Finin mention a number of other efforts taken along similar lines.
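The style of expected-utility comparison behind such a decision can be sketched as follows. The utilities, costs, and probabilities here are invented for illustration and are not taken from Wu's system; only the shape of the comparison is the point.

```python
# Hedged sketch of the ask-versus-infer decision: interrupt the user with an
# explicit question, or silently act on an implicit hypothesis? All numeric
# values are hypothetical.

def expected_utility_infer(p_correct, u_right, u_wrong):
    """EU of silently acting on the inferred hypothesis."""
    return p_correct * u_right + (1 - p_correct) * u_wrong

def expected_utility_ask(u_right, cost_of_interruption):
    """Asking yields a correct model, minus the cost of interrupting."""
    return u_right - cost_of_interruption

def should_ask(p_correct, u_right=10.0, u_wrong=-20.0, cost=3.0):
    return (expected_utility_ask(u_right, cost)
            > expected_utility_infer(p_correct, u_right, u_wrong))

# With a shaky hypothesis (60% confidence) interruption is worthwhile;
# with a near-certain one (95%) it is not.
print(should_ask(0.60))   # True
print(should_ask(0.95))   # False
```

The asymmetry between `u_right` and `u_wrong` is what drives the decision: when acting on a wrong hypothesis is much costlier than a brief interruption, the system should ask unless its confidence is high.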
The implicit approach to acquisition promises to extend into the multimedia environment, as soon as the nature of interaction with these new technologies can be captured in a set of normative rules.

Kass and Finin separate the issue of acquisition into the acquisition of goals, plans, and beliefs, suggesting that acquiring beliefs is the hardest of all.

2.2.2  Exploitation

The exploitation of models is highly task-dependent, but some broad distinctions can be drawn. Kass and Finin's framework identifies the 'method of use' dimension. They present what they imply is a continuum between 'descriptive' and 'prescriptive' models. The difference between descriptive and prescriptive models may be nothing more than the style in which they are employed. Once acquired in some fashion, a model may be consulted for a variety of reasons; if an explanation is sought for the behavior of an agent, then that model may be called a descriptive model. The same, or another, model consulted with the intention of tailoring system presentations to an agent's knowledge, for instance, is being used in a prescriptive sense.

Figure 2.1: User and system models

A descriptive model of an agent may or may not be accurate; this accuracy can be tested by comparing predictions of the agent's behavior with actual, observed behavior. A normative model is one which canonizes normal, expected behavior: it can be employed descriptively to explain the behavior of an agent, as well as prescriptively to anticipate the agent. When predictions on the basis of a normative model conflict with actual, observed behavior, the observed behavior of the agent can be interpreted as 'wrong' in some sense. The correctness of the normative model is not questioned. Some domains admit of such models, others do not.

One agent may refer to its descriptive model of another agent to decide its actions.
A typical example of this usage is sometimes referred to as the anticipation feedback loop, as found in natural language dialog systems which refer to a model of their human interlocutor to ensure that the utterance under consideration will be acceptable to the user.

2.2.3  Correction: Situations requiring correction

The acquisition task is never complete. Changes in the system or in the domain of interest may occur from time to time, and will require the user to update her system model. It is the task of the system to present information to the user in a manner that facilitates awareness and comprehension of these changes. This is just the kind of pedagogy CAI researchers have been exploring [198]. Likewise, the user may forget information over time, or the user model may have been incorrectly acquired in the first place; either situation requires remedial action by the system. A number of scenarios present themselves:

• Beliefs erroneously attributed to users which they do not have (when these beliefs are, in fact, true, a situation arises called false consensus).

• Beliefs not attributed to users which they do in fact have (pluralistic ignorance, special case). This is a serious acquisition problem.

• Erroneous beliefs correctly attributed (user misconceptions identified).

• Ignorance: simply having no beliefs in respect of the proposition concerned.

    Bel_user    Bel_system(Bel_user)    Bel_system
    a           a                       a             normative
    b           ¬b                      ¬b            false consensus
    c           ¬c                      c             pluralistic ignorance
    d           d                       ¬d            misconception found

Table 2.1: Errors in user models.

See Table 2.1 for a summary of these error situations. These categories of modelling error grow more interesting with the number of agents being modelled.
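For a single proposition, the four cases of Table 2.1 can be captured directly as a comparison of three truth values: what the user believes, what the system attributes to the user, and what the system itself believes. The encoding below is a sketch for illustration, not part of any of the systems surveyed.

```python
# Classifier for the modelling-error cases of Table 2.1. For one proposition
# p: 'user' is the user's belief, 'attributed' is what the system believes
# the user believes, 'system' is the system's own view. True means
# "believes p", False means "believes not-p".

def classify(user, attributed, system):
    if user == attributed == system:
        return "normative"
    if attributed == system != user:
        return "false consensus"        # system wrongly assumes agreement
    if user == system != attributed:
        return "pluralistic ignorance"  # shared belief goes unattributed
    if user == attributed != system:
        return "misconception found"    # user's erroneous belief is spotted
    return "unclassified"

# The four rows of Table 2.1, reading each row's truth values left to right:
assert classify(True, True, True) == "normative"
assert classify(True, False, False) == "false consensus"
assert classify(True, False, True) == "pluralistic ignorance"
assert classify(True, True, False) == "misconception found"
```

Note that the "ignorance" case in the bulleted list falls outside this encoding, since it requires representing the absence of a belief rather than a belief in the negation.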
For instance, in the case of a CSCW environment where multiple participants are engaged in a negotiation task in which mutual consensus is the desired outcome, a facilitator agent might more easily identify cases of false consensus than any individual participant; the facilitator will certainly be able to act more easily upon this information than any of the individual agents. In negotiation tasks where it is the underlying goal of all participants to maximize joint outcome, some or all of the participants may not believe that some or all of the other participants share this goal. A facilitator or mediator agent might be able to act on recognition of this case of pluralistic ignorance to the benefit of the group and its common goal.

Kass and Finin advance as one of the dimensions of their analysis the notion of 'modifiability,' intended to distinguish models along a range between those which are static and those which are dynamic. It is only dynamic models which will admit of correction, and the authors point out that 'user models that track the goals and plans of the user must be dynamic.'

2.2.4  Scope: What is Represented?

A model of a user divides naturally into two components: the normative or generic component, and the specific. The generic component models the abilities and limitations of normative humans. This includes such quantities as psychophysically derived limits to visual resolution (see Section 3.4), color preferences, and even certain typical pathologies such as colorblindness. Although this component may need to be modified for 'abnormal' users, it can in principle be acquired once and for all from psychophysical studies and putative cognitive models.

The specific component relates to the goals and beliefs (e.g.) of a specific, individual user. It is distinguished from the generic model in that it must be acquired and maintained for each individual.
While the generic component will be useful to ensure that systems present information in cognitively sensible ways, it is the specific component which will induce adaptable, user-sensitive cooperative operation, and is the target of the present investigation.

Kass and Finin distinguish between models which are 'individual' and those which are 'generic,' recognizing that there is a continuum between these extremes. The stereotyping approach outlined above lies somewhere along this continuum, particularly since various hybridizations are possible, such as creating a hierarchy of stereotypes to better accommodate variation in agents without dramatic increases in storage requirements.[5]

[5] Such an approach is immediately suggestive of an object-oriented agent model. The object-oriented methodology would yield the dual benefit of clean and separable information structures for the model which allow inheritance mechanisms, as well as well-defined accessibility via methods. See Chin [45] and Wilensky [199] for suggestive leads in this direction.

2.2.5  Extent and Adaptivity

Kass and Finin also discuss the 'temporal extent' of the model, arguing that the useful lifetime of information varies. The maintenance of the model should be subject to conditions of controlled 'forgetfulness,' where information about the agent that is judged to have outlived its usefulness according to some criteria is deleted, or forgotten. This notion is not pursued in this thesis; the issue of how elaborate a temporal representation is necessary is orthogonal to the investigations of this dissertation.

The information in the models should be faithful to the composition of the user population, which may change. Predefined user categories such as conventional stereotypes may be inaccurate, misconceived, or out-of-date. Various means of adapting to the user population have been considered, from as early as the Grundy system [171], to the approach described in this dissertation.

2.3  Multiple Agents, Multiple Models

In general, there will be more than one agent involved in a collaborative process, and the system will likely need to model some or all of them. Even in existing interactive systems, there is a need for some sort of multi-agent modelling. Kass and Finin mention medical diagnosis systems, in which both the user and the patient are to be modelled. Even though the user and the patient may be one and the same individual (doctors get the flu too, after all!), it is to the distinct roles in the task domain that the operation of the system is sensitive. This issue will emerge at various points in this thesis. In general, though, a separate model may be required for each agent-role.

Certainly in the case of systems designed explicitly to support the cooperative work of more than one individual—i.e., systems designed to support collaborative work—and particularly for the class of such systems called decision-support tools, explicit models of multiple agents will be required. The existence of multiple users is considered in Figure 2.1. Acquisition relationships are suggested by the directed arcs. Not shown are models that users might have of other users. Moreover, it may turn out to be necessary to model agents' models of other agents, including the reasoning capabilities of these agents [13].

Kass and Finin distinguish between multiple agents on the one hand, and multiple models on the other. They suggest that multiple sub-models might need to be maintained for each user, since their levels of expertise may vary between (sub-)domains. They appear to consider only the stereotyping approach in their discussion.

2.4  User Modelling Shells

Parallel to familiar developments in other areas, there is growing interest among the user modelling community in User Modelling Shells.
Just as user interface management systems (UIMS) alleviate part of the systems implementation burden and enable cost-effective generation of non-trivial interfaces, a user modelling shell (UMS) provides services for implementing non-trivial modelling capabilities. Just as expert system shells made advanced knowledge-based techniques accessible to systems developers, UMSs promise to promote transfer of advanced user modelling techniques and technologies into application development environments, which in turn promises to fuel a new round of research and development in the field of user modelling.

Current UMS research focusses on developing "integrated representation, reasoning and revision tools that form an 'empty' user modeling mechanism. When filled with application-dependent user modeling knowledge, these shell systems would fulfill essential functions of a user modeling component in an application system." [115]. GUMS [75], BGP-MS [113] [114], and UM [111] are all efforts at implementing these functions, and all of them rely on the stereotyping approach described earlier.

GUMS [75] permits only single inheritance in the stereotype hierarchy, and users can belong only to a single stereotype. If new observations about a user invalidate his or her membership in the current stereotype, GUMS moves upward through the stereotype taxonomy to a more general user stereotype. Revision of the user model in GUMS therefore results in a loss of information.

BGP-MS [113] [114] (Belief, Goal and Plan-Maintenance System) represents assumptions about the user in an extension of Prolog, and employs multiple inheritance in a partition hierarchy to extend models of individual users. The system provides various development and run-time services for developers of applications requiring user modelling. The developers of BGP-MS state their intention of adding an automated truth-maintenance system to provide incremental consistency of the user models, and they are investigating the use of modal logics for
The developers of B G P - M S state their intention of adding an automated truth-maintenance system to provide incremental consistency of the user models, and they are investigating the use of modal logics for  31  increased expressiveness of the user models. User Modelling Tool ( U M T ) [30] is a general purpose user modelling shell whose approach to user modelling falls into the class of assumption-based user modelling because it uses an assumption-based truth maintenance system ( A T M S ) to maintain the consistency of the user models. Stereotypes and production rules are the techniques used in U M T to generate and activate the models per se. A L I S P implementation for Symbolics environments is available [31]. Although these efforts are perhaps somewhat optimistic in view of the early state of user modelling technology, they point the way towards the first commercial implementations.  2.5  User Modelling: A Definition by Consensus  A frequently asked question at the 1992 User Modelling Workshop was whether User Modelling was somehow the same as Interface Design, or more generally, i f there was anything that was not User Modelling [4]. This question naively presupposes some consensual definition or understanding of what user modelling actually is, and demonstrates how easy it can be to mislead even self-professed practitioners in the field with the term user-modelling. B y 1994, the user-modelling community had outlived its detractors and outgrown the workshop format. Although practitioners were now willing to use the term without embarrassment, and as if they all agreed what it meant, there was still no "definition" by consensus at the 1994 User Modelling Conference, which enjoyed contributions from an even wider variety of disciplines. A number of things in particular are not intended with the use of the term in this thesis. 
For instance, no cognitive adequacy of any sort is intended for the models described here; our approach to modelling agents does not purport to accurately represent the human users of systems to some arbitrary degree.

In its broadest interpretation, user modelling is nothing more than applying the user-centred view to the design and implementation of systems. Whether the system in question is an advanced user interface, a CAD system, a database retrieval system, or some other more or less advanced system, when the designers of these systems adopt the user-centred view described herein, they are engaged in user modelling. In the rest of this document, the use of the terms user modelling or agent modelling is somewhat more specific, referring to the acquisition or exploitation of explicit, consultable models of either the human users of systems or the computational agents which constitute the systems.

Scenario: Intelligent Meeting Support

This is another motivating scenario, illustrating that models of individual users can be exploited at run-time to tailor the form and content of presentations.

Intelligent Meeting Support

The geographically dispersed executives of InterSpect Systems Consulting Corp. are participating in a meeting via the services of WOW, the intelligent meeting support system that has gained a great following in the early years of the twenty-first century. Participants benefit from a variety of media services including video, graphics, and text, all transmitted over high-bandwidth communication links.

Maria, the ebullient manager of central operations, is holding forth on the need to maintain quality by hiring only the best candidates for positions now vacant. Ralph, the chief accountant of InterSpect, is silently preparing a financial report indicating that it might be difficult to pay the best candidates what they are worth, given the combined effect of the current state of the national economy and the company's debt load.
Fred, a member of the board of directors, hates numbers. Accounting bores him, and pie-charts infuriate him, but he is fascinated by a new tool for 3D visualization of multivariate data.

As Maria finally winds down at the urging of the WOW facilitation manager, which has determined that she really has had quite enough bandwidth to make her point, Ralph's report is made available to the other participants. Maria, who has a great grasp of numerical data, and who actually savors their visual impact on a page, sees in a corner of one of her displays the rows and columns of an old-fashioned spread-sheet representing Ralph's fiscal objection to Maria's policy. Fred sees a multi-dimensional scatterplot constructed by WOW to make maximal use of the capabilities of his visual system, and other participants experience other varied events in these and other modalities.

When Ralph draws the attention of other participants to Payroll Item Number 2070, perhaps only by uttering the expression in English, not only is his utterance simultaneously translated into the (natural) languages of the other participants and optionally presented on their audio channels, but these other events also occur: 1) A set of relevant numbers is highlighted on Maria's display, 2) when he is attending to it, the dimensions of Fred's scatterplot are interchanged to perceptually emphasize the data referred to by Ralph, and 3) similar events are experienced by the other participants.

Chapter 3

Authoring

Every style is but one valid way of looking at the world, one view of the holy mountain, which offers a different image from every place but can be seen as the same everywhere.
—Rudolph Arnheim, Art and Visual Perception: A Psychology of the Creative Eye, final paragraph

Style takes its final shape more from attitudes of mind than from principles of composition, for, as one elderly practitioner once remarked, "writing is an act of faith, not a trick of grammar."
...style is the writer, and therefore what a man is, rather than what he knows, will at last determine his style.
—The Elements of Style, page 84.

This chapter takes a broad view of Authoring research and samples the range of work at a number of relevant focal points. To provide some order for the descriptions to follow, and to situate the work leading to this dissertation, the following dimensions of analysis are used.

3.1 Dimensions of Authoring

The works surveyed are grouped by the tasks they are designed to support. The work of Mackinlay, Casner, and Roth, as well as some of Feiner's, is addressed under the heading of Automatic Graph Generation. Pickett and Grinstein are representative of work in Exploratory Data Analysis, while Ware and Beatty, Cowan, and Cleveland are presented along with Psychophysical Research performed by psychologists. Feiner and McKeown are best considered in the context of Multi-media. Bly and others are discussed under the label of Computer Supported Cooperative Work, and the themes of Hypertext and Hypermedia appear in a number of guises under these headings, as well as in their own section. Structured documents, cyberspace, and finally intent-based documents—the core of this dissertation—are explored in separate sections.

A number of systems are presented within the framework described above and their interrelationships are shown within the space defined by the dimensions listed below.

Deliverable: The deliverable is that which is being authored. There are sometimes [role-dependent] distinctions between artifact and deliverable: the deliverable is not always a physical artifact, and different agents (reflecting different roles) may emerge from the activity with different deliverables: from the author's point of view, the deliverable is the book, while from the publisher's point of view it may be the month-end sales figures.
Task/Data Domain: The domain refers to the field to which the task being expedited is relevant. This categorization may also be role dependent: the author of the programming manual may regard his contribution as being to the field of structured programming, while the editor may feel that she is contributing to the field of document design and production.

Temporal Scope: Is the emphasis on presentation and object-level real-time support? Representation and meta-level maintainability? Short-term versus long-term storage? Version control? How, in other words, does the system conceptualize and deal with Time?

Target Media: The target media are those supported by the system or work under investigation. Varieties of text, graphic, audio, video and any combination thereof are the target media of different systems, each of which may take different views of the relationships between medium and mode.

Mode: Flesh-and-Blood Interface: What attention is being paid to human perceptual capabilities and limitations [57]? To visual, aural and gestural primitives? Different aspects of the visual, aural and kinesthetic modalities define the space of human-computer interaction today. Are there other modes (senses) to be considered?

Style: Is there an over-arching notion of style in the system? Is it adjustable? In what ways, by which roles, and to what degree? What degree of inter-presentation coordination is allowed? (See Beach [21] and Cargill [37]).

Models: Are there explicit or implicit models of users, systems, agents, roles? How are they employed and combined to achieve the functionality of the system?

Role relationships: Who is/are the Author(s), Reader(s)? Are they co-temporaneous? Are they co-present? Other task and domain specific roles include: editor, facilitator, chairperson, manager. The broad view allows for temporal shifting and stretching of the authoring process.
The locus of creation of presentation can be shifted along a continuum from conventional author, to situations where the presentation is decided by a combination of conventional author and conventional reader, to situations where the presentation is designed with no input from any single agent resembling a conventional author (e.g., a computer-generated event log of system activity). There are also questions about the number of actual authors and actual readers, and whether activities by these agents are simultaneous and distributed. Annotating is a special case of non-simultaneous authoring by one or more potentially distributed authors.

Heterogeneity: This issue ramifies into a number of questions at varying levels of analysis.

• Internal (Human) Interface: Is the interface to the system role-dependent? E.g., do readers and writers have the same interface, or is there a 'modal disparity' between them?

• External Interface (Interoperability): Is the system designed to cope with varied sources of information with differing protocols?

• Organizational Interface: Can different organizations using the same system make use of each other's information spaces?

• Seamlessness: Can users of new systems integrate their existing tools and work-spaces? In particular, do new CSCW tools support the use of existing tools for individual work? (Can seamless transitions be made between individual and collaborative work? [125].) Some transitional elements are bound to remain central for some time: users may want to continue using pencil and paper, and systems should not force them to change their ways.[1]
[1] There remain many advantages to hardcopy that will be difficult to displace, so for the foreseeable future, usable, practical electronic systems will need to interface with the paper and back-of-envelope worlds.

Information and Presentation Spaces: The presentation space is the arena in which the presentation takes place, composed of various media and directed at various modes; the information space refers to that which is presented. These definitions provide useful categorizations of authoring systems.

• Static, or changing, dynamic information space?
• Static, or changing, presentation space?
• Static, or changing, presentation?
• Shared information or presentation space. Particularly confounding problems arise when information is shared between synchronous collaborators. One of the important motivations for modelling users (see Section 2) is the desire to tailor presentations to individual needs, understanding and goals. This desire conflicts with synchronous collaboration when one participant wishes to direct the attention of another participant by direct reference to an element on his private display. This reference may be very difficult to resolve when the referent either does not appear at all on the private display of the second participant, or appears in some different form, or at a different location. (The example on page 33 presented a scenario intended to typify this kind of problem in the domain of intelligent meeting support.) The presentation of identical views alleviates these difficulties by ignoring the problem of tailoring presentation.[2]

The survey begins by situating a few traditional authoring systems within the space defined by these dimensions.

The familiar BOOK is itself a deliverable, across a wide variety of domains, not the least of which is recreation or entertainment.
Its scope varies from the short-term throw-away paper-back to the archival of information in weighty, hardbound tomes; the information in a book is not typically presented in real-time, except at poetry readings and for the delivery of bed-time stories to children. Readers can interrupt their progress at will, and the subject matter can refer to remote points in time. The target media is clearly paper of some sort or another, upon which the information is available to humans via their visual sense, unless they are in Braille, in which case they refer to the tactile sense; books also provide a pleasant kinesthetic feel. The pages of a book typically adhere to a discernible style, which lends familiarity to the on-going reading process. The only kind of modelling is of the implicit variety, undertaken at the time of writing of the book, and which reflects the author's desire to appeal in some fashion to the reader. Books typically admit only the roles of author and reader, though other roles are hidden in the production process: the touches of editors, publishers and translators sometimes remain visible in the final copy. There is usually -though not always- a single author and books are generally printed with the hope of attracting more than a single reader! Books fit seamlessly into the lives of most humans, and what problems exist are accepted out of long habit: books can be carried in hand-bags and attache cases and read on crowded buses during rush hour, as well as in bed with a flashlight. Both the information (content) and the presentation spaces (form) of a book are static.

[2] Known as WYSIWIS [125], or what you see is what I see presentation.

LETTERS are like books in most respects, differing notably on the dimension of role; there is typically a single reader to a letter, or a small number of secondary readers who may appear on a carbon-copy list in the document itself.
A CORRESPONDENCE is a sequence of letters between individuals, and therefore admits of a temporal aspect.

A MOVIE is a 'book' whose target media is film or video, and which is accessible to human viewers via the visual and aural modes or channels (notwithstanding the efforts of Odorama and Sensurround to expand the viewing experience to other modes!) The scope is usually real-time with respect to presentation, although a video tape can be stopped and rewound at the viewer's discretion.

A traditional theatrical PLAY exhibits aspects of the movie, with the added complication that the audience can affect the performances of the actors in various subtle ways; the presentation space is dynamic. The notion of INTERACTIVE THEATRE (e.g., theatre-sports) amplifies these audience-feedback effects [123]. It is not as easy to rewind a live performance as it is a video tape.

A VIDEO GAME of the arcade variety uses a combination of graphics, video, sound, and sometimes motion feedback to provide a multimedia simulation of an alternate reality. Some games permit multiple players to share the alternate reality (shared information space), either on the same video screen (shared presentation space) or on separate terminals. The video game is even further along the interactivity continuum, permitting its users to modify not only how information is presented, but its content as well; the information space is dynamic. Some home video games maintain simple models of their relatively small sets of players in order to recall preferences, and perhaps to restore the state of a suspended game.

These simple, familiar examples illustrate the use of the dimensions of authoring; the remainder of this chapter surveys the esoterica of authoring, using the same dimensions.

3.2 Graph Generation

So I sat down and wrote a program that'll take those numbers and do what you like with them.
If you just want a bar graph it'll do them as a bar graph, if you want them as a pie chart or scatter graph it'll do them as a pie chart or scatter graph. If you want dancing girls jumping out of the pie chart in order to distract attention from the figures the pie chart actually represents, then the program will do that as well.
—Douglas Adams, Dirk Gently's Holistic Detective Agency

The automatic presentation of information has occupied its share of the AI as well as graphics[3] literature. Early work, in particular, recognized the strong relationships between knowledge representation and (graphical) presentation which characterize the current work. For instance, Zdybel et al. [205] define an Information Presentation System (IPS) as a system that:

1. Automatically generates displays according to content-oriented specifications
2. Provides a systematic basis for interpretation of user graphic input
3. Functions reasonably well without demanding custom-tooling to a particular application
4. Is easily extensible to satisfy domain and user-specific display requirements.

They further state (1) that their "view of an IPS is that it is itself a knowledge-based system," (2) that a "high degree of sensitivity to the human end-user can be built in," and (3) that "an IPS is a place to embody a consistent set of decisions about the human factors of graphic display."

The first point is basic to this thesis, that presentations can in some sense be expressed in and even derived with some form of logical calculus. The second point takes aim at the issue, also central to this thesis, of user modelling, and the third refers to psychophysical issues addressed in this chapter under the heading 'flesh-and-blood,' and embodied in the implementation described in this dissertation as perceptual salience.

In their description of the View System, Friedell et al.
[79] discuss how "the graphical presentation of data is tailored to the user's identity, task, and database query." Pre-defined presentation plans are chosen by a best-first search mechanism. They recognize the potential of encoding knowledge about graphical presentation in the form of what they call 'synthesis operators' [80], and also discuss "reasoning about how to select and combine ...primitive elements of object descriptions."

[3] In this dissertation, "graphics" means virtually any technique used to produce visual representations of information; this will naturally include business graphics like bar graphs of corporate data, 2D and 3D drawings generated with and without the aid of computers, and visualization techniques for multivariate data.

Even the possibility of reasoning with multiple media appeared in the literature of the early 1980's. Neiman [143] writes: "...the use of the knowledge structures to generate multi-modal output demonstrates the generality of the knowledge representation techniques employed." His system was used to generate explanatory, data-driven animations for users of a CAD system, coordinated with natural language explanations.

The fields of graphics and AI begin, unfortunately, to diverge at about this time,[4] and the potentially very fruitful area of automatic presentation defined by the intersection of these fields has suffered for it. Nonetheless, a few researchers have continued in this tradition, and their work is reviewed in the remainder of this section.

The work of Mackinlay [129] [130] is the first to explicitly address expressiveness and effectiveness criteria for visual presentations.[5] He concerns himself with relational data (this is the domain) and restricts himself to graphical languages commonly associated with business graphics (the deliverable). The presentations are designed for traditional screen or plotter devices (target media).
Although there is no explicit modelling of users, the system embodies explicit knowledge of various media. There is considerable effort to render presentations in accord with human perceptual abilities and limitations: the system can thus be seen as modelling certain normative human characteristics, but this model is static and acquired from psychophysical studies in the literature (flesh-and-blood). There is no over-arching notion of style by which one presentation might be coordinated with another, except in special cases when a presentation is broken into, say, two line graphs with a common axis.

[4] This claim is based upon the non-incidence of graphics and presentation papers in important AI conference proceedings after 1983. This regrettable schism appears to be under repair by the mid-nineties.

[5] For basic background on graph representation see Bertin [24] [25] [26] and Cleveland [46] [47] [48], or [57] for a survey.

Deliverable: Business graphics
Task/Data Domain: Relational data
Scope: Not applicable
Target Media: Traditional screen and plotter devices
Flesh and Blood: Implicit psychophysical limitations
Style: Not really
Models: Explicit: media limitations; Implicit: Human perception
Role Relationships: Single author, multiple reader
Heterogeneity: Not applicable
Spaces: Static information and presentation

Table 3.1: Mackinlay and related work.

The prototype developed by Mackinlay is called APT, and has the following typical role-relationships: Single-Author and Multiple-Readers who are not necessarily mutually co-temporal or mutually co-present with the Author. The presentation is static, prepared for a particular data/medium combination; there is therefore no particular support for heterogeneous environments, although the same data may be presented differently in environments which differ in the available media. See Table 3.1 for a summary.
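The kind of data-characterization-driven design embodied in APT can be caricatured in a few lines: rank the visual channels by effectiveness for each data type, then greedily assign each data field the best unused channel. The rankings and channel names below are illustrative assumptions loosely in the spirit of Mackinlay's orderings, not the actual APT rule base.

```python
# Toy sketch of effectiveness-ranked channel assignment, loosely inspired by
# Mackinlay's APT. The rankings and names are assumptions for illustration.

EFFECTIVENESS = {  # visual channels, best first, per data type
    "quantitative": ["position", "length", "angle", "area", "color_saturation"],
    "ordinal":      ["position", "color_saturation", "area", "shape"],
    "nominal":      ["position", "color_hue", "shape", "texture"],
}

def assign_channels(fields):
    """Greedily give each (name, type) field the most effective unused channel."""
    used, design = set(), {}
    for name, dtype in fields:
        for channel in EFFECTIVENESS[dtype]:
            if channel not in used:
                design[name] = channel
                used.add(channel)
                break
    return design

design = assign_channels([("price", "quantitative"),
                          ("mileage", "quantitative"),
                          ("nation", "nominal")])
print(design)  # price -> position, mileage -> length, nation -> color_hue
```

A real presentation system would also check expressiveness (whether a channel can encode the data at all) before ranking effectiveness, which this sketch omits.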
Mackinlay's work is interesting both in its own right, and because it has been the starting point for a variety of efforts by other researchers.

Roth [175] attempts to characterize semantic dependencies in the data to be presented. He describes the static categories of data types, relational-structure, arity, relations among relations, and recognizes the need to represent the dynamic needs of users. Because he distinguishes time as a separate data type, he is able to make special arrangements for the display of temporal information.

Casner [38] undertakes to include task descriptions in the process of automating graphical presentations. His notion of expressiveness is at a much higher level than Mackinlay's, and derives from a logical formulation of the task in which a user is involved. Casner's system embodies generic models of human perception which are consulted to produce presentations in support of particular tasks, and which will minimize the perceptual and cognitive demands placed on users engaged in these tasks. The effectiveness of the corresponding presentations is measured by reaction-time regression studies of users engaged in five tasks in the airline reservation domain.

Marks [132] has addressed the issue of avoiding unwanted conversational implicatures in a coordinated text-graphics environment. One of the novelties he claims is the inclusion of relations that describe the perceptual organization of symbols. Canonical organizational principles he mentions explicitly are sequential layout, proximity grouping, alignment, symmetry, similarity and ordering. Rather than fabricate some convincing post hoc argument for the necessity of paying attention to perceptual issues, he properly and soberly points out that "...it is virtually impossible to design meaningful network diagrams for which no perceptual organization will occur."

Henry and Hudson [97] have taken what might be called a semi-automatic approach to the generation of graphs.
Their paper describes a system which supports a user in the exploration of large graphs, primarily by allowing iterative, interactive refinement of layout algorithms. Although the authors do not explicitly mention modelling of the user, this paper is interesting in the present context because the observations about layout may be relevant to navigation through abstract information spaces as well.

It may not always be the case that (hypertext) systems need to display to their users a graph of all or part of the underlying database, but it will always be the case in appropriately large systems that only some subset of the information can be presented.[6] This puts the emphasis upon user-centered means of deciding what should be presented, and how it should be presented. Approaches like the one described in this thesis might be used to select a critical subset of the links in a hypermedia document, which might then be explored by a user. This critical subset is continuously subject to re-evaluation, in the course of new input from the user, and from other sources of information as may be available to the system.

[6] The idea of an "intelligent zoom" [19] changes this characterization in that while the entire space is rendered, only parts of it are rendered legibly; such approaches have the advantage of providing some context for the viewer.

Noik [150] pursues another approach to automatic graph layout, taking the notion of the fisheye view to multiple focal points in hierarchically nested structures. This leads to the idea of using some technique to determine the focal points from a user model.

Although most reviewers would likely place Feiner's work in the context of visualization or multi-media (as has been done here with some of his work), some of his efforts can be seen to contribute to the field of automatic generation [71].

3.3 Data Analysis and Visualization

Or you can turn your figures into, for instance, a flock of seagulls, and the formation they fly in and the way in which the wings of each gull beat will be determined by the performance of each division of your company. Great for producing animated corporate logos that actually mean something. But the silliest feature of all was that if you wanted your company accounts represented as a piece of music, it could do that as well. Well, I thought it was silly. The corporate world went bananas over it.
—Douglas Adams, Dirk Gently's Holistic Detective Agency

Some of the visualization work being undertaken today is of interest here for several reasons. Researchers in this field have recognized the importance of the human in the HCI loop, and cooperative systems are being designed to take full advantage of human perceptual abilities. To automatically achieve the communicative goals of the author in his absence, systems must be able to present information appropriately, taking advantage of human perceptual abilities and avoiding their limitations.

The best known approach to data visualization is the scatterplot [90] [197]. The success of this technique is due to the ability of the early vision system to group points in space based upon proximity and similarity in color, size and shape. Ware and Beatty have shown that up to five dimensions can be effectively mapped to a full color scatterplot display, and suggest ways in which the visual effect can be maximized. Were it not for the need to detect patterns in data of arbitrarily high dimension, efforts might have stopped here.

An increasingly popular approach to enlarging the dimensionality of displays is what has been variously referred to as iconography, and geometric coding. This approach employs a generalization of the traditional graphic primitive, the pixel, into a parameterized icon whose features are mapped to distinct dimensions of the data stream.
A famous example which proved more useful as a characterization of the method than representative of its success is the Chernoff Face icon family [43].

Figure 3.1: EXVIS five-dimensional stick figures

The generalized icon (gicon) is a generalization of the pixel to higher dimensions. The strategy has been to allow the information in different channels of the input data to control corresponding pixels in each gicon. In general, the gicon is an n x m array of pixels, each mapped to a different input channel. The available display surface is then tiled with these icons.

Figure 3.2: Iconograph

The logic of this and related approaches is that the number of information channels which can be displayed is increased: "Geometric coding allows for further and far reaching extensions [over color] of dimensionality. Observers can utilize shape perceptions to sense the combinations of data at each location and texture perception to sense how those combinations are spatially distributed." This was the rationale behind the Chernoff Face icon family, as well as the stick figure family described by Pickett and Grinstein [157]. The latter is a stick figure consisting of several connected line segments, where the angle of inclination of each limb is controlled by a different dimension of the numerical data to be visualized. Figure 3.1 presents a representative member of the aforementioned stick figure family, and Figure 3.2 shows how large numbers of them interact to produce global perceptual effects from the underlying data set.[7]
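The limb-angle mapping just described can be sketched in a few lines. The five-channel record format, the normalization of channel values to [0, 1], and the angle range are assumptions for illustration, not the EXVIS implementation.

```python
import math

# Illustrative sketch of a stick-figure gicon in the style of Pickett and
# Grinstein: each of five data channels (assumed normalized to [0, 1])
# controls the inclination of one limb.

def stick_figure(record, max_angle=math.pi / 2):
    """Map a 5-channel record to five limb inclination angles (radians)."""
    assert len(record) == 5
    return [v * max_angle for v in record]

# A display would tile the surface with one such figure per data point;
# here we just compute the angles for two points.
for point in [(0.0, 0.5, 1.0, 0.2, 0.8), (0.1, 0.1, 0.1, 0.9, 0.9)]:
    print([round(a, 2) for a in stick_figure(point)])
```

The perceptual payoff comes not from any single figure but from the texture that thousands of tiled figures produce, as in Figure 3.2.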
[7] This figure depicts satellite imagery data from the western tip of Lake Ontario; the data was collected by a U.S. Air Force Geophysics Laboratory weather satellite [90].

Deliverable: Graphic
Task/Data Domain: Relational data
Scope: Not applicable
Target Media: Conventional CRT, also audio
Flesh and Blood: Its raison-d'etre
Style: Not applicable
Models: Not applicable
Role Relationships: Not applicable
Heterogeneity: Not applicable
Spaces: Static

Table 3.2: Data Analysis and Visualization.

The search for effective icons is also led by studies of pre-attention (see Section 3.4): "Shifts along certain dimensions of color, shape and motion of elements lead to preattentive discrimination, and it is variation in these dimensions that we must seek to bring under data control in our texture displays."

Some early implementations have been described in the literature. The Exploratory Visualization (Exvis) project [185], for instance:

is a multi-disciplinary effort to develop new paradigms for the exploration of data with very high dimensionality. The fundamental philosophy behind Exvis is that data representation tools should be driven by the perceptual powers of the human. In addition, the interpretation of data of very high dimensionality will be maximized only when we learn how to capitalize simultaneously on multiple domains of human perceptual capabilities.

This project is in the early stages of exploring the possibilities of iconographic data representation using sound attributes, along with the integration of auditory and visual displays into a single unified data exploration facility. (See also Grinstein et al. [90] and Pickett [156] and the summary in Table 3.2.)

3.4 Psychophysical Research

Basic work in psychology has resulted in improving models of human perception.
Much of this work (e.g., [144] [191] [190] [193] [192]) has been concerned with elaborating a putative dichotomy between processes which are pre-attentive and those which require attention. Pre-attentive processes are characterized by their speed: they are fast, typically accomplished within 100ms, suggesting that they are performed in parallel by the human perceptual system. Such processes are sometimes referred to as automatic, parallel, or early-vision processes. Although such a dichotomy is conceptually attractive, it has been increasingly unable to account for the data, and new models are appearing which refer to a continuous ranking of perceptual difficulty. Pre-attentive tasks are at the extreme 'easy' end of this continuum, while tasks requiring attention are at the other, 'hard' end of the scale. Other researchers (e.g., [137] [66]), while also interested in developing a basic perceptual language, are not so concerned with the underlying psychological model.

Color Considerations Color deserves a separate section in this document for several reasons. Our world is, for most of us, a very colorful place, and the value of color should not be ignored in the computational models we build, of people, and things. Color is, not surprisingly, one of the most effective psychophysical stimulus dimensions. Even in the absence of a complete neurophysiological underpinning, a tremendous amount of informal as well as empirical information is available on the use of color to accomplish various communicative tasks. Not only is the use of color in visualization powerful, but the kind of knowledge we have of color capabilities provokes questions about other human perceptual capacities.

Ware and Beatty show that it is possible for human observers to perceive five data dimensions simultaneously using color [197]. The data they used was
The data they used was  51  characterized by a hyperellipsoidal probability density distribution, but they conclude with respect to the generality of their results that: "colour is likely to be effective in assisting in the perception of correlations in multidimensional space." Although in most cases they found that adding color was expressively equivalent to adding three more spatial dimensions, color is not a completely heterogeneous perceptual space, and "resolution is worse in some directions than in others." In particular, when clusters are separated along dimensions which have been mapped to color, perception suffers. Clusters are perceived as distinct when they are separated by between three and five standard deviations along most of the possible vectors; "much greater cluster separation is necessary before two clusters can be resolved" when they are separated on [only] " a few" specific color vectors. They observe that users require no training to use their color-based fivedimensional visualization tool, but point out the importance of control over the background color, which tends to emphasize particular colors in the display, and consequently particular correlations in the data (this is a perceptual  implicature,  which systems can attempt to mitigate, or to anticipate and exploit to emphasize data of interest).  Murch  also gives an interesting summary of the use of color, from the point  of view of a graphics practitioner [140]. He distinguishes between the qualitative and quantitative uses of color, and provides a stimulating list of guidelines for the effective use of color derived from physiological, perceptual and cognitive studies. M u r c h also provides a list of the sixteen best and worst color combinations. This is the kind of knowledge that w i l l need to be consulted by mature automatic presentation systems.  
Benbasat undertakes a series of empirical investigations on the impact of color on presentation [23], and the impact of presentation on a variety of managerial decision-making tasks [22] [126]. These investigations are representative of a thread of the literature entirely separate from what has been thus far cited in this paper. The results of Benbasat's work are consistent with those of Mackinlay.

There is a wide range of sometimes contradictory information available on the use of color to represent information in visual displays. Making use of this information is difficult, however, and unless there is a pressing need to convey a large number of categorical variables, a designer is best off with gradations of a single hue to represent changes in the value of a quantitative variable.

Graphical Semantics Montalvo [136] [137] and Grosz [132] are both interested in the meanings of graphical displays. Grosz has also done much work in computational linguistics, in a purely textual environment; some ideas have exported well to other media. A more pragmatic approach is detailed by Kurlander and Feiner [120]. These and other similar guidelines are suggestive of a beginning for a database of default axioms for reasoning about presentations of information.

Results of these studies are of direct consequence for designers of human-computer interfaces.

3.5 Multimedia

Words are too solid they don't move fast enough to catch the blur in the brain that flies by and is gone...
—Suzanne Vega, Language

One day while walking along the road, the monk Gisho met his master, who was blindfolded. Gisho asked, "What have you seen?" and his master replied, "The wind whistling in my ears."
—Zeutuban III

By definition and by convention, a system which makes use of more than one medium is a multimedia system.[8][9]
In its most liberal interpretation, the rubric "multimedia" would cover any system which employs the sound of a bell to direct user attention. Typically, though, the term refers to systems with more or less esoteric applications (by today's, or yesterday's, standards) of graphics or video. Multimedia systems promise great improvements to interfaces, and afford scope for new types of interfaces. Fox [78] writes of the possibilities in computer aided instruction:

If computer systems can develop accurate models of user knowledge, including goals and plans as well as facts, multi-media information might then dramatically improve the bandwidth and effectiveness of instruction.

[8] Some confusion exists in the literature(s) as to whether it is medium or mode which is being multiplied, and the terms multimodal and multimedia are often conflated. This thesis is not the place to embark upon yet another religious tirade designed to settle the issue; the terms will be used loosely where such usage is harmless.

[9] mul-ti-me-dia (1962): using, involving, or encompassing several media <a multimedia approach to learning> (Webster's 7th Dictionary, on-line copy).

Similar statements can be made for other areas of investigation, and in particular for the general goal of user-tailored run-time presentation. The rest of this section reviews research in multimedia presentation.

Feiner Much of Feiner's work has been directed at identifying and resolving multimedia presentation issues [72] [69] [70], and he has given consideration to the use of models (albeit static ones) of tasks, objects and their interactions, and of models of user knowledge.
These models have been brought to bear on the presentation task to determine which objects are to be included in a rendering of a scene, the level of detail with which they should be rendered, and what if any special visual effects are to be employed. For instance, if the task in which a viewer is engaged involves the manipulation of a complex piece of machinery, its functional parts may all be rendered as three-dimensional solids; if the task is to disassemble the machine, an exploded view or the use of transparency may be called for to show the inter-relationships of the working parts. Feiner's work has been primarily research-oriented and has not resulted in any commercial products.

Other work by Feiner et al. also addresses automated generation of graphical presentations. His work with Peter Karp [105] addresses the automated generation of animated presentations with the use of models of knowledge about filmmaking, and points out the advantages: not only does design-time authoring support for animated presentations decrease their cost, but "automation could ultimately make it possible to generate presentations on the fly that are customized for a particular viewer and situation, adaptively presenting information whose content cannot be fully anticipated." Feiner does not pursue this goal, but we do in the current work. Explicit models of users do not play a part in the prototype associated with Feiner and Karp's work, but the notion of intent-based presentation is beginning here to take hold.

With Seligman [183], Feiner pays particular attention to the communicative intent behind the presentation; this is where the term 'intent-based' first appears. The Intent-Based Illustration System (IBIS) designs illustrations with a generate-and-test approach using a rule-based system of methods and evaluators.
The former are rules that specify how to accomplish visual effects, while the latter are rules that measure how well a visual effect is accomplished in an illustration. The evaluators can be thought of as psychophysically motivated constraints on the presentations. Still more work explores the coordination of different media (text and graphics in particular) in single presentations [73].

10. See "perceptual salience" in Section 5.4.1.

WIP  The WIP project at DFKI under the direction of Wolfgang Wahlster is similar to Feiner's IBIS system in that it too uses underlying static models of objects and tasks to determine the layout of a presentation. The WIP researchers say of their approach that it "should be understood as a starting point which is of practical use for the automatic synthesis of various kinds of pictures... in a multimodal system." [173] These statements are made in the light of the realization that there is not yet a mature theory of graphical communication upon which to build such systems.

WIP's most notable incremental contribution to automatic multimedia generation is the implemented co-dependence of its content-design and format-presentation engines: the execution of these units is temporally interleaved, permitting feedback from later in the presentation pipeline to affect and even retract earlier design decisions. For instance, if the system has decided to show a picture of an espresso machine as part of the instructions for its use, and has begun laying out textual labels of all its functional parts only to find out that one of the labels is too long to fit within the boundaries of the part, it may decide to lay out the other labels in a different way or to draw the entire espresso machine from a different perspective that would support the original labelling method. The WIP architecture also lends itself to parallelization [195].
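The generate-and-test scheme behind IBIS, methods proposing visual effects and evaluators scoring how well each effect is accomplished, can be sketched abstractly. Everything concrete below (the rule names, the effects, the scores) is invented for illustration and is not IBIS's actual rule base:

```python
# A minimal sketch of a generate-and-test illustration designer in the
# spirit of IBIS: "methods" generate candidate visual treatments and
# "evaluators" score how well each one realizes the communicative intent.
# All rule names, effects, and scores here are invented assumptions.

# Methods: each proposes one candidate presentation for a goal.
def method_highlight(goal):
    return {"effect": "highlight", "object": goal["object"]}

def method_cutaway(goal):
    return {"effect": "cutaway", "object": goal["object"]}

def method_label(goal):
    return {"effect": "label", "object": goal["object"]}

METHODS = [method_highlight, method_cutaway, method_label]

# Evaluators: each measures one aspect of a candidate and returns a
# score in [0, 1]; together they act as constraints on the presentation.
def eval_visibility(candidate):
    return 0.9 if candidate["effect"] in ("highlight", "cutaway") else 0.4

def eval_legibility(candidate):
    return 0.8 if candidate["effect"] == "label" else 0.6

EVALUATORS = [eval_visibility, eval_legibility]

def design_illustration(goal):
    """Generate candidates with every method, test each with every
    evaluator, and keep the candidate with the best combined score."""
    candidates = [m(goal) for m in METHODS]
    return max(candidates, key=lambda c: sum(e(c) for e in EVALUATORS))

best = design_illustration({"intent": "show-location", "object": "dial"})
print(best["effect"])  # → highlight
```

With these toy scores, highlighting wins for a show-location goal; a real system would instead retract and regenerate when no candidate satisfies the evaluators.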
ALFresco  Stock [187] [186] describes an on-going effort to integrate natural language and hypermedia interfaces. The domain is the exploration of 14th century Italian frescoes, and the system gives users the means to retrieve multimedia information via natural language queries. Stock argues that this approach takes advantage of multimedia to increase the bandwidth of the communication channel between human and system, and reduces the lost-in-hyperspace problem.

FRESS, EDS, Intermedia and InterNote  Work at Brown University has resulted in several generations of systems, each building upon the strengths of its precursors. Early experience with the Hypertext Editing System in the late 1960's and then FRESS [202] and EDS [202] led to the more modern Intermedia [202] and InterNote [39] systems. FRESS (File Retrieval and Editing System) is a multi-user hypertext system developed in the late 1960's, and EDS (Electronic Document System) is a hypermedia system developed in 1982 for VAX and Ramtek 9400 color display environments. One of the major differences between FRESS and EDS is the addition of maps to help users avoid getting lost. Both systems offered support for bi-directional linking and keyworded links and nodes.

11. Commercially reimplemented in the early 1970's by Phillips Corporation [51, p447].

Intermedia is an electronic document system which is not a separate application, but a framework for a collection of tools that allow authors to make links between standard types of documents created with heterogeneous applications. "The material an application creates is the document." Intermedia was designed with multi-user interactive educational applications in mind, and thus emphasizes interactive display and annotation facilities. Professors, for instance, "would be authorized to add or change 'canonical' hypertext structure, whereas the students
would be authorized only to add links and annotations." [147] Links in Intermedia are uni-directional connections between two arbitrary and application-specific objects [92]. See Table 3.3 for a summary.

    Deliverable:         Generalized electronic document
    Task/Data Domain:    General desktop support
    Scope:               Version control and maintenance
    Target Media:        Text, graphics, animation
    Flesh and Blood:     No explicit perceptual effort
    Style:               Not applicable
    Models:              Implicit
    Role Relationships:  Blurred
    Heterogeneity:       Between applications in the desktop environment
    Spaces:              Dynamic information space

    Table 3.3: Intermedia.

Finally, InterNote is an extension of Intermedia designed to better support small collaborative groups involved particularly with document review and revision. A much-cited aspect of the extension involves the ability to transfer data across links using a technique the authors call warm linking.

NoteCards  NoteCards [93] is implemented within the Xerox LISP environment, and was designed to support authors, researchers, designers, and "other intellectual laborers" in their daily toil.

12. Another advocate of non-standard linking practices is Schnase et alias [180], who urge that arbitrary computational methods be integrated into any link structure. This enlarges even further upon even unconventional definitions of authoring; observe that the now popular http protocol for TCP/IP-based communication of HTML-coded information permits just this kind of linking via the CGI interface. See http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

An electronic generalization of the familiar paper index card, the notecard can contain text, drawings or bitmap images, and, more recently added, video sequences. Cards also have titles, and there can be a variety of card types. Typed, bidirectional Links connect notecards into networks.
Types are labels chosen by the user to represent the relationship between source and destination card. Link sources are anchored to icons, but destinations are entire cards. Browsers are special notecards displaying a structural diagram of a network of notecards; they are created by the system and are thereafter manipulable as any other notecard. They also permit manipulation of the structure of the information space via direct manipulation of the objects which appear on the browser. Fileboxes are special cards used to organize or categorize collections of notecards. NoteCards also provides a limited search capability to locate all cards matching some user-supplied specification, currently restricted to text-string search on titles. The system is in use at numerous locations within Xerox, as well as at universities and government agencies for document authoring, legal argumentation, development of instructional materials, design of copier parts, and competitive market analysis.

Halasz [93] makes a number of useful distinctions in order to arrive at three dimensions for analysis. In particular, he explains how systems designed to support browsing differ from those designed to support authoring. As one extreme of a continuum, systems purely for browsing do not permit modification of the information space. Systems which are primarily intended for authoring are likely to be used by a small number of authors to prepare information for a large number of more-or-less casual readers; these systems will have well-developed creation and modification tools, and will support continuous modification of information structure as part of ongoing task activities.

3.5.1  Video Authoring

    The spotlight follows her for a moment, maybe picking up some stock footage. Videotape is cheap. You never know when something will be useful, so you might as well videotape it.
    —Neal Stephenson, Snow Crash, page 33.
    No matter how apolitical the producer of the work of art may seem, every work has political relevance of some sort.
    —James Monaco, How to Read a Film, page 11

The following extract from Davenport et al. [62] might well serve as a manifesto for this thesis, with only a few changes and additions to reflect the underlying communicative goal of the author of the presentation, a role unaccounted for by Davenport:

    In the case of on-line video servers as well as home-movie editing assistants, the machine must respond to the user by selecting the "best" shots, sounds, and text chunks, then orchestrating or sequencing them to emphasize a particular story. The story content should reflect the user's background and intent.

The incentive to provide presentations which have been particularized to the viewer's needs and interests is even stronger with video than with traditional media, because time is a precious human commodity, and time is what it takes to annotate, and to view, video. Traditional authoring paradigms do not support such run-time determination of form and content.

Video is finding increasing use as a transcription medium in many fields because it arguably provides the richest record (the "thickest description" [82]) of the events of interest. Video recording offers high bandwidth, greatly exceeding human note-taking skills and speed; researchers can later review and annotate video at leisure. And, increasingly, video is cheap. These attributes virtually ensure a growing abundance of video material for future on-line presentation systems.

On the other hand, the limitations of the traditional approach to authoring are most obvious when applied in non-traditional media, such as in the video medium. The raw material must first be acquired, which involves filming and possibly digitizing. From this raw source, video authors must assemble cuts into a cohesive presentation.
The raw footage can be very voluminous, and the relevant parts of it very sparsely distributed. Ten hours of video taken during a field study for a new graphical user interface (GUI), for instance, may include many instances of coffee drinking and doughnut eating by the users that may not be relevant to any conceivable presentation. Nevertheless, it takes someone at least ten hours to scrutinize the footage for something useful. The process of identifying these useful events and sequences has been called annotation, and a number of systems have been designed to expedite it. (See, for instance, Buxton and Moran [34], Goldman-Segall [83], Harrison and Baecker [95], MacKay and Tatar [127], MacKay and Davenport [128] and Suchman and Trigg [188].)

13. Empirical investigations to date suggest that human annotation of video material takes an order of magnitude more human time than the duration of the video being annotated; see, for instance, Harrison [95].

When the author has finally identified a set of cuts he or she deems to be relevant to an eventual presentation, the traditional notion of authoring requires assembling these into their final presentation order. Although quite adequate for creating rock music videos, this approach suffers from the aforementioned limitation, that such a presentation can not be tailored to the needs of individual viewers. It is here that video data diverges significantly from text, graphics and even animation. Video data is inherently uninterpreted information in the sense that there currently are no general computational mechanisms for content-searching video data with the syntactic precision of generalized textual search. Even graphics,
because there is usually an underlying model or database that can be queried, and animation, which also has a model or database with temporal information added, can be searched for information using existing computer tools. But frames and sequences of frames in video data cannot easily be queried for semantic content except in fairly specialized domains.

14. See, however, Cherfaoui and Bertin [42], who use digital image processing techniques to extract some types of information from video. Recent work by Goldberg and Madrane on automatic extraction of spatio-temporal indices from video at Eurecom Institut, Sophia Antipolis, France, is also noteworthy. Refer to Joly and Cherfaoui [102] for a survey of related approaches.

At present, the only practical way of accessing a video database is for a human to first annotate it so that the annotation can be used to guide the author and the viewer. Creating this annotation, at least with the current tools, is an inherently linear operation (in terms of the time required to do it) and is a major bottleneck in the authoring of video documents.

It is useful to distinguish between the transcription processes of logging at the lexical level, which lends itself to some degree of automation, and annotation, a semantic/pragmatic task which will require human intervention for the foreseeable future. A log of a meeting can be acquired automatically, for instance, by the Group Support System (GSS) software used by the participants. This log can be subsequently used to index a video record of the meeting to find instances of user actions at as low as the keystroke level and as high as the level(s) of abstraction embedded into the GSS (e.g., "brain-storming session," "open discussion," etc.) [164]. Annotation, on the other hand, is at a higher level of abstraction, defined by the eventual use to which the record of the meeting is to be put. Although the intent-based authoring paradigm can influence the annotation process (Csinger [59]), the remainder of this dissertation focusses on the post-annotation processes of presentation.

In the video medium, selecting the intervals of the record to be displayed, and the order in which they are to be displayed, are both serious problems. Previous work in automatic presentation has dealt with some aspects of both of these questions, and has been restricted for the most part to choosing 'the right way' to display items based on their syntactic form [130] [175] (see Section 3.2). Semantic qualities of the data to be displayed are seldom considered; Karp [106] is an exception, where he describes a system called ESPLANADE (Expert System for PLANning Animation, Design, and Editing), a knowledge-based animation presentation planner that uses as input a separately supplied script and a set of communicative goals. ESPLANADE creates a presentation plan at the individual frame level, specifying a hierarchy of sequences, scenes and shots.

Presentation and transcription are inextricably intertwined. A presentation system can not present what has not been transcribed; the executive can not retrieve all instances of the mention of a competitor company's name unless the minutes of the meeting contain these references, nor can he retrieve everything said by his subordinate, Doug, unless the minutes are appropriately structured. If the meeting involves a GSS, the facilitation function of the GSS can be expected to provide some of the knowledge required for both the presentation and transcription functions.

Davenport et al. [62] describe their approach to interactive digital movie making, a domain similar to ours in that they must log and annotate video footage for later retrieval by computer, in the absence of a human editor. Their domain differs in that it permits control over the acquisition of original raw footage.
They are also not as interested in modelling the user per se as they are in giving the user meaningful interaction affordances to select variants of the movie. As movie-makers, Davenport et al. go to some effort to maintain the stylistic consistency of their presentation, an important element with which we have not yet concerned ourselves.

Davenport et al. [63] describe "cinematic primitives for multimedia," a set of dimensions along which video shots are annotated for later reference. Their challenge, they claim, "is to develop robust frameworks for representing story elements to the machine such that they can be retrieved in multiple contexts." Our goal, though not focussed on interactive film and storytelling, is similar: presentations always have a 'story to tell,' even if they are designed automatically. Creators of interactive or multivariant video are interested in preserving "underlying narrative structures" [68, p12], a quality not far removed from what is called intent in this thesis.

In the absence of a solution to the general problem, the amount of costly human effort currently involved in annotating and browsing multimedia information will only be multiplied with the growing interest in the new technologies. Chapter 6 includes a description of how our approach to video authoring has been applied; this medium was chosen because authoring in the video medium is even harder than in conventional media; there is nothing in our approach that prevents it from working across media boundaries.

3.6  Computer-Supported Cooperative Work

Computer Supported Cooperative Work (CSCW) developed over the eighties into a separate field of research in its own right. Work in this area examines the potential use of computational support for individuals engaged in collaborative group work.
The field draws on research activity in diverse disciplines including computer science, artificial intelligence, psychology, sociology, organizational theory and anthropology [87, p5]. Not as relevant to the current work, perhaps, as some of the other areas surveyed here, it is included because automatic authoring and presentation systems can be multi-agented, supporting the collaborative activities of multiple authors and readers. The intent-based authoring paradigm as related in this dissertation involves agents in the roles of author and reader(s), communicating in general asynchronously via the author's intent and the user model of the reader. Although this thread will not be followed in this dissertation, future work will need to address it.

A distinction should be drawn between systems which are cooperative, and those which support collaborative work. Although the literature varies in its usage of these terms, they will be used consistently throughout this document in the following way: Cooperative systems are those which exhibit behavior which is sensitive to the needs of the individual user-agent. Collaborative systems are those which have been designed to support the joint activity of a number of user-agents.

                   Same Place         Distributed
    Synchronous    blackboard...      telephone...
    Asynchronous   bulletin board...  email...

    Table 3.4: CSCW time and space diagram.

Much of the CSCW research literature has been broken up along the dimensions of support for tasks in which the multiple collaborators operate synchronously or asynchronously, in the same place or remotely, as reflected in the familiar time-and-space partitioning of Table 3.4. A good deal of this literature has been concerned with face-to-face collaboration.
With the continual advance of computational hardware technology, and the increasing bandwidth available for information transfer, there will be less need for collaborators to travel for meetings, and a concomitant decrease in the emphasis on face-to-face meeting support technology. This discussion will therefore focus on geographically distributed systems. Halasz [93] writes:

    Hypermedia is a natural medium for supporting collaborative work. Creating annotations, maintaining multiple organizations of a single set of materials, and transferring messages between asynchronous users are the kinds of activities that form the basis of any collaborative effort. These are also activities for which hypermedia systems are ideally suited.

Work in the field of CSCW is so closely associated with hypertext and hypermedia that it is tempting to conflate these issues. Researchers intent upon creating a sense of co-presence, or being higher on "the social awareness scale" [125], or what Bly has called "connectedness" [27] and Laurel "engagement" [123], have jumped at multimedia technologies as part and parcel of the solution, without showing first that such measures are necessary. In fact, users of systems which incorporate video to facilitate face-to-face interaction pay far less attention to the video information than expected [27].

Sarin et al. [177] advance the notion of a [real-time] conference as an 'abstract object' and discuss some conference design issues. They identify the following dimensions: shared versus individual views, access control, concurrency control, getting data in and out (from other applications, or from paper, etc.), and constraints on real-time conference design. They describe the 'virtual terminal approach' as a way of giving single-user applications the means of serving multi-user conferences: a 'virtual terminal controller' is responsible for multiplexing user I/O.
Sarin's work does not get sidetracked with multimedia issues, and good advice is to be found throughout the article for practitioners involved with the design and implementation of CSCW systems.

Lee [124] presents a system and an approach designed to support group decision making, focussing upon representation of the task-derived components of decision-making processes. In particular, the alternatives are represented explicitly, as are the goals to be satisfied, and the arguments "evaluating the alternatives with respect to these goals."

15. For further emphasis that technology alone is not enough to solve the CSCW problems, see Engelbart's urgings [65].

16. This approach is subsumed by the 'symbiotic interface' proposed by Booth and Gentleman [29].

ZOG and KMS: Shared, distributed hypermedia systems designed for collaborative work.  KMS [3] [203] was developed over the 1970's at Carnegie Mellon University, and then commercialized by Knowledge Systems, Inc. A version found application on board the USS Carl Vinson for a variety of tasks. KMS was designed "to help organizations manage their knowledge," and features organization-wide support for collaboration in a broad range of areas, including electronic publishing, on-line documentation, project management, software engineering, computer-aided instruction, electronic mail and issue analysis. Screen-sized WYSIWYG workspaces called 'frames' contain text, graphics and image items, and can be linked to other frames or used to invoke programs. Links are unidirectional, and destinations are entire frames, which are viewed one at a time. There is no mode boundary between navigating and editing, which is to say that there is no system-imposed distinction between reader and writer. These role categories are implicitly conventionalized in the KMS user community, and much of KMS's functionality depends upon convention.
For instance, rather than provide a separate system-level representation for annotations, these are ordinary links distinguished only by a character prefix provided by the annotator and conventionally understood by the user community as identifying an annotation. Certain functionality within the system depends upon convention as well; electronic mail service, for instance, depends upon access to shared frames which function as user mailboxes. The designers of KMS have in this way relied upon convention to augment their rather minimal user-interface. See Table 3.5 for a summary.

Time-multiplexing of screen real-estate via quick response time is used in favor of multiple-window 'space multiplexing.' Adhering again to a minimalist principle, KMS does not provide a separate mechanism in support of user navigation; there are no 'overview maps' or other devices to help orient the user in unfamiliar parts of the information space. Instead, the designers claim to have been vindicated in their belief that fast response, which enables low-cost exploration and backtracking, is enough to avoid getting irreparably lost.

17. Discussions with Kelly Booth have led me to believe that although increases are still possible in both time and space multiplexing as described above, growth will be more bounded in the space
Since the basic unit of information is a frame and is thus limited to what can fit in a screenful, any large database w i l l be composed of a large number of frames; this number is generally much larger than the number of users of the database. The designers of K M S use this observation to point out that access conflicts w i l l be rare, and that conventions can once again be relied upon to evolve which w i l l serve to circumvent those collisions which might occur. Access privileges are established on a frame-by-frame basis by the creators of frames. Once again, it is convention, rather than embedded functionality, which governs access to information under K M S . A guiding force in the design of K M S was the desire to permit paper renditions of K M S documents. This is achieved by hybridizing markup notation into the W Y S I W Y G frame displays. This, along with the design intention to support individual as well as collaborative work, contributes to the seamlessness of the system.  68  Shared Workspaces  The main thrust of systems under this rubric is the creation  of collaborative working environments that promote the illusion of 'mutual situatedness.' Although collaborators may be i n physically separated environments, they work in a space which is shared, in the sense that they can refer to objects in that space, relying upon mutual awareness of the terms of the referral as well as of the referent {cf. deixis). Many strategies have been adopted to this end. Sara B l y and other Xerox researchers have explored a variety of shared workspaces. A n interesting experiment was conducted involving the interconnection of the Portland, Oregon and the Palo A l t o laboratories with a network of video, audio and computing technologies called Media  Space  [153]. Similar experiments were  conducted, connecting the Palo A l t o labs with Xerox labs in the United Kingdom. Other work at Xerox has focussed on shared drawing surfaces [28]. 
Further experiments [135] explored the three-way sharing of the drawing surface.

TeamWorkstation [99] [100] is a desktop, real-time system advancing a shared workspace in the form of a sharable computer screen for concurrent pointing, writing and drawing, as well as live video and audio links for face-to-face conversation. The creators of TeamWorkStation emphasize its 'seamless' operation: since the shared screen is video-generated, the individual participants can continue to use the tools with which they are already familiar, including the still ubiquitous pencil and paper. Although the system is currently implemented on Macintosh computers, this same factor will permit interoperability between heterogeneous computers. The technology is used here to facilitate remote interaction, and every attempt is made to avoid interfering with existing work habits.

Distributed Knowledge Worker (DKW) [96] was developed at the IBM Canada Lab in recognition of the importance of meetings to the operation of businesses today, and with the awareness that face-to-face settings are not always possible for these meetings. Addressing the issues surrounding remote, real-time meetings, DKW is a minimal support system, providing the means with which to harness available bandwidth rather than imposing a structure on the meeting itself: the 'meeting facilitator' component of the system manages rather than controls the meeting. DKW supports multiple meetings, in which a shared information space is also a shared presentation space (i.e., WYSIWIS) in the form of whiteboard, multi-user text editor, video, file transfer and text chat functions, not all of which are completely implemented. The set of on-going meetings is a conference. A 'minutes log' is automatically augmented with information about meeting starting and ending times, times of members joining and leaving a meeting, information saved on the whiteboard, and information on files that were transferred.
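The automatically augmented minutes log that DKW maintains suggests a simple time-stamped record that can later index other media, such as a video record of the meeting. The event names and fields below are assumptions for illustration, not DKW's actual format:

```python
import time

# Sketch of a DKW-style minutes log: the system appends time-stamped
# entries as meeting events occur (members joining or leaving,
# whiteboard saves, file transfers). Event names and fields are
# invented here for illustration.
class MinutesLog:
    def __init__(self, meeting_id):
        self.meeting_id = meeting_id
        self.entries = []

    def record(self, event, **details):
        """Append one time-stamped event to the log."""
        self.entries.append({"time": time.time(),
                             "event": event, **details})

    def events(self, kind):
        """Retrieve all entries of a given kind, e.g. 'join'."""
        return [e for e in self.entries if e["event"] == kind]

log = MinutesLog("design-review")
log.record("start")
log.record("join", member="doug")
log.record("whiteboard_save", item="sketch-1")
log.record("file_transfer", name="spec.txt")
log.record("leave", member="doug")
log.record("end")

print(len(log.events("join")))  # → 1
```

Because every entry carries a timestamp, such a log could serve as the kind of automatically acquired lexical-level index into a meeting's video record discussed in Section 3.5.1.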
3.7  Structured Documents: Content from Form

Some document preparation systems (e.g., Scribe [165] and LaTeX [121]) have made the important distinction between content and form, between what is presented and how it is presented. Such systems make possible the separation of the specification of the contents of a presentation from the specification of how it should look, and though little advantage has been taken thus far of the potential represented by this separation, the ramifications are beginning to be well understood. This methodological separation of content from form prefigures the further separation of intent advocated as part of this thesis. In the same way that the structured document paradigm makes possible new ways for browsers to interact with the presentation space, the intent-based paradigm enables new ways for readers to interact with the information space.

Efforts continue to build automatic interface and application generators [149] [148], where the form of the interface need not be decided in advance by the application designer. Relational data models likewise separate low-level disk storage issues from higher, user-level interpretations of meaning [49] [41]. And, of course, the evolution of high-level programming languages has led to increasing abstraction; programmers need know less and less of the underlying machine constructs and can focus instead upon concepts defined at the task level. The leading edge of this trend is to be found in the object-oriented programming paradigm, where system behavior emerges from the interaction of self-contained objects created by a programmer to be isomorphic to concepts in the task domain.

Even so, the structured document paradigm [7] strays only little from the conventional, traditional notion of authoring (see Section 1.1) in that it is usually the author himself who decides the content of the eventual presentation.
A brief review of the literature pertaining to structured documents appears in this section. Reid [166] offers what he calls "observations about systems employing structured documents" [p108]:

• Structured documents contain more information than just the text or graphics itself. That information is usually called "the structure."

• Most structured documents must be processed or compiled or formatted into some concrete form before they can be printed or displayed.

• The compilation process discards information: the same structured document can be processed into several different concrete documents.

Newcomb [146, p67] offers a definition:

Structured documents are so named because the hierarchical and sequential structure of the various kinds of information they contain is made explicit by identifying tags. Each tag associates a "generic identifier" (the name of the kind of thing being tagged, e.g., "subsection") with the data surrounded by a start tag and end tag of the same generic identifier.

The familiar (in academic circles) LaTeX [121] document formatter (built as a macro package for the TeX type-setting system [112]) is in this spirit, as was Scribe [165], its acknowledged predecessor. The "structure" and other information added to create a structured document is sometimes referred to as its "markup" and the dialects used to convey this information have been called "markup languages." Hypertext Markup Language (HTML) is probably the best known, currently in wide use for the authoring of documents on the World Wide Web (WWW) (see Section 3.9).¹⁸ Such documents generally leave rendering decisions to a run-time browser program such as the popular Netscape and Mosaic products, which are graphical browsers that try to provide a full color high-resolution layout for HTML files, or Lynx, a text-only browser.
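Reid's third observation (that the same structured document can be compiled into several different concrete documents) can be made concrete with a small sketch. The tag names and rendering functions below are invented for illustration and are not drawn from any of the systems surveyed:

```python
# A toy illustration of Reid's observation: one structured document,
# several concrete renderings. Tags and renderers are invented examples.
DOC = [("title", "User Models"),
       ("para", "Authoring is the preparation of information."),
       ("title", "Structured Documents"),
       ("para", "Markup separates content from form.")]

def render_plain(doc):
    """One compilation: flatten to running text, discarding the structure."""
    return "\n".join(text for _tag, text in doc)

def render_outline(doc):
    """Another compilation of the same source: keep only the titles."""
    return "\n".join(text.upper() for tag, text in doc if tag == "title")

print(render_plain(DOC))    # four lines of running text
print(render_outline(DOC))  # titles only, upper-cased
```

The structure (the tags) is what licenses both renderings; once either concrete form has been produced, the information needed to recover the other is gone, which is precisely the binding problem Reid describes.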
Reid [166, p107] nicely sets out the problems in store for intent-based authors when he points out (the emphasis is mine):

There is usually a separation in time or space between the creator of the document and the user of the document. The difficulty in communicating and storing structured documents comes from the need to make sure that the reader will interpret the structure in the way that the writer intended. ... to make sure that there is enough information included with the document that equivalent bindings can be made by the reader, to produce a printed or displayed unstructured document that properly resembles what the writer intended.

But then he goes on to conclude that these problems are unsolvable:

The problem of communicating structured documents is fundamentally unsolvable; it is almost unsolvable by definition. This is because the only way to be certain that the structure will not be misinterpreted, that the binding decisions will be made properly, is to make all of the binding decisions and remove the structure from the document. This reduces the problem to one that is quite solvable, namely the communication of an unstructured document, but also one that is not nearly as useful.

Reid appears not to have considered the application of automated intelligence to supply the missing information. The intent-based approach advocated in this thesis is actually twofold: the author supplies an intent as an extension to the structure referred to in the structured document paradigm, and a user model is consulted at run time by an intelligent browser that supplies what Reid thinks of as the missing rendering information.

¹⁸HTML is actually a document type defined in Standard Generalized Markup Language (SGML). SGML (ISO 8879-1986) has been adopted by many of the world's largest publishers and by many governments, including the US and the EC.
What Reid takes as a weakness of the structured document paradigm may well be one of its greatest strengths. The late binding referred to by Reid is what makes possible the user-tailored presentation of the author's intention. That the receiving station, the presentation system, may be ill-equipped to cope with the task at hand is something computational systems will have to deal with; human interlocutors have been misunderstanding each other since before they even developed language, or speech, yet enough effective communication has taken place to permit the development of human civilization.

3.8  Hypertext and Hypermedia

Hypermedia provides one solution to the problem with traditional authoring identified in this thesis, of being unable to tailor presentations to individual users at run-time. In hypermedia systems, the viewer completes the job of the author by selecting and ordering the information to be viewed through the process of navigating at run-time the links established by the author. But this only pushes the problem from one person (the author) to another person (the viewer), and it dramatically increases the demands on the author, who must provide the explicit navigation cues. Reducing the amount of human effort required from the author and viewer is still a significant problem with current approaches to video authoring. These effects can be mitigated by the intent-based authoring approach advocated in this dissertation.

A hypertext document or hypertext is a collection of distinct nodes of information connected via a network of links. When nodes contain information of different types, such as graphical, auditory or video sequences, the term hypermedia is often applied to the network. Innumerable variations exist on the kinds of links employed, and the way they are activated. The links both impose a structure on the document, and permit run-time determination of the order in which a reader accesses the information.
The most irksome of the major problems with the hypertext approach that will not diminish with mere technological advance is the 'lost in space' problem: in such a huge information space, it is easy to get lost, or sidetracked. A promising solution to this problem is the application of user modelling to hypertext systems.

"Hypertext" is a term coined by Nelson in 1965 [145] and covers a wide range of concepts, many of which had their inception in the vision of the memex by Vannevar Bush [33, p190]. Bush's vision went further, and were it not for technological limitations of his time, he would have tried to build hypertext systems very much along the lines we see today.¹⁹

The idea of modelling the user of such systems appears in Bush's writing, as well as in Alan Kay's vision of the Dynabook [110]. Negroponte's [142] musings are also easy to interpret in the context of user-modelling:

Imagine a machine that can follow your design methodology and at the same time discern and assimilate your conversational idiosyncrasies. This same machine, after observing your behavior, could build a predictive model of your conversational performance. Such a machine could then reinforce the dialogue by using the predictive model to respond to you in a manner that is in rhythm with your personal behavior and conversational idiosyncrasies.

These and other hyper-X concepts appear in many of the systems reviewed in this chapter; some of these were described under multimedia, others under CSCW topic headings. This choice was based in each case upon the original design intention.

¹⁹See "Memex Revisited," in Science is not Enough, 1967, as well as the original Memex article, "As We May Think," The Atlantic Monthly, vol. 176, July 1945, pages 101-108, reprinted in [87, p17-34] and available at publication time on the World Wide Web at URL http://www.csi.uottawa.ca/dduchier/misc/vbush/as-we-may-think.html
Thus, FRESS, EDS, Intermedia and InterNote (which were not explicitly designed to support collaboration) appear under multimedia, while ZOG and KMS are described under CSCW.²⁰

The rest of this section reviews several hypertext systems conspicuous in the history and the literature of the field.

Xanadu [147, p33] is the embodiment of Nelson's vision of a future in which a single hypertext system provides access to all of the world's literature.²¹ The Xanadu model permits access to any part of any document from any document. Nothing is ever deleted in Xanadu, so that links to specific versions of a document are guaranteed to persist.²² Copyright protection will be supported in commercial versions of the system [51].

NLS was developed by Engelbart in the nineteen-sixties, as the first (computational) hypertext-like system, and was highly successful within the research environment at SRI. It provided the impetus toward interactive computing that drove much of the research for the next two decades [147, p32].

The Symbolics Document Examiner is considered the first 'real-world' hypertext system [147, p38]. It serves as the on-line interface to the extensive on-line documentation for the Symbolics system. It is of note to this survey because it is representative of systems which enforce a modal boundary between authors and readers, and it does so in the most direct fashion: documents are authored with a separate application called Concordia. This strategy is at the extreme opposite end of the spectrum from systems like Intermedia, which do their best to blur these boundaries.

²⁰Interesting observations on how to make hypermedia systems more user friendly are to be found in [77], [133], and [206].

²¹Parts of the vision are implemented and are available commercially from the Xanadu Operating Company.
²²This feature is relevant to the notion of 'thickness' discussed elsewhere in this document, as well as to issues of literary criticism and deconstructionism.

The most notable difference between the features provided to users of Document Examiner, the reading tool, and Concordia, the writing tool, is the link mechanism. While readers are limited to uni-directional links, authors in the Symbolics environment are provided with links that are bi-directional. The designers of the system felt that authors needed to know the possible paths readers might take to arrive at a node in order to provide a useful "rhetoric of arrival" [147, p167]. A user of Document Examiner cannot make changes to the underlying hypertext document, though he can save sets of pointers to nodes called bookmarks.

The popular HyperCard system shipped free by the Apple Corporation with all Macintosh computers permits both authoring and browsing of hypertexts, but distinguishes these uses via a user-level mechanism. The system operates in various modes: browse-only, authoring, and programming [9].

A likely reason for the fertility of the area of hypertext is the syncretic, interdisciplinary backdrop against which the work has taken place: computer scientists, management information specialists, sociologists, anthropologists, artists, educators, mathematicians and other groups have all contributed to the development of concepts and implementations in hypertext.

3.9  Putting it all Together: Cyberspace?

There is much talk today about cyberspace, the semi-mythical electronic environment in which we and our computational surrogates will one day meet to work, play, and think. Networks spring up every day now, and everyone is anxious to be connected. Why all the sudden interest? Conklin [51, p454] cautions against the easy answer that recently empowering technology has made these visions of the near future more realistic.
He suggests that there has been a gradually growing awareness of the potential benefits of hypertext, and that where the computer industry showed no interest twenty years ago in demonstration systems running on state of the art, dedicated hardware, experts today fawn over the potential of a basic home computer connected to the network. Social changes are at the heart of the information revolution, and it will be further such changes at the individual and collective level that will fuel continued "electronification" of information and practice.

The rapid acceptance of the global network makes it the ideal carrier for the intent-based authoring and presentation paradigm. The World Wide Web combines unified access to the different kinds of information on the Internet with electronic publishing in Hypertext Markup Language (HTML), a format for which browser client programs exist on all major computer platforms, from laptop to mainframe. It is a new and continually evolving publishing medium that allows reading and writing and interlinking of documents irrespective of topic or geography, but with the overwhelming size of the information space come difficult search and retrieval problems.

Nelson's vision of a literary hypertext network spanning the globe is rapidly becoming reality, but what form will it take? How will familiar notions carry over from the paper and pencil document preparation tradition? The answers to these questions spell out the future for the intent-based authoring paradigm, where the role of the author is completely redefined. Who owns the copyright on a document rendered by a browser based upon a run-time model of the reader, where the only contribution of the "author" is an intent?²³
David Sewell of Rochester University's English Department writes [184]:

The traditional view of an "author" as a single autonomous agent, the sole intentional creator of a work, is a product of the age of the codex book, when writing was both material and unalterable. But the electronic medium... "denies the fixity of the text, and... questions the authority of the author..." Curiously, though, electronic communication has tended to hang on tenaciously to the single, identifiable author: on-line journals have conventional tables of contents and author attributions, nearly all e-mail and news-posting systems identify message senders, and on networks like Usenet the elaborate ".sig" or signature appended to one's postings has become a way of transcending the uniformity of the medium... Despite the network's potential to allow anonymous collaboration, it is rare for even experimental network art and participatory projects to be anonymous...

Those of us actually taking part in the on-going 'electronification' of the global information space surely recognize the veracity of Sewell's observations. How many of us would be willing under the existing social structures to devote our time and energy to producing anonymous documents that can be borrowed, modified and claimed by others? Lamenting the death of the author [18] was obviously premature. The message for systems developers of the near future is that support is still required for version control, access control, copyright control. Control is the operative word; if the author is dead, his ghost still wants the rights to his work.

²³Authors in the intent-based authoring paradigm can contribute content, but they are only required to contribute intent.
Scenario: Information Gathering

This scenario considers how the implementation described in Chapter 6 functions in the context of the UBC Department of Computer Science Hyperbrochure, an actual application under development in conjunction with this thesis.

Information Gathering

John, a prospective graduate student, starts up Valhalla after signing on as guest. The user model window pops up with the system's a priori hypotheses about John. Since usage of the guest account carries little information beyond the reasonable assumption that the user is not a current member of the department, some default hypotheses are based upon the knowledge that the terminal John is using is located in a faculty office, and that the departmental on-line calendar lists a faculty recruiting seminar that day. These coincidences conspire to produce the false assumption that John is a prospective faculty member.

If John notices by looking at the user model window that Valhalla thinks he is a prospective faculty member, he may correct this false assumption at this point by clicking on the button that represents that he is a prospective student. John can interact with the user model window immediately, or he may wait until after pressing the show button and perhaps wondering why the presentation is not meeting his needs as a prospective student. In either case, after correcting the system's misconception, he is presented with a brief introduction to the department by its head, and then with a number of clips designed to motivate and increase his interest in the department. Valhalla makes numerous assumptions here about the interests of students and instantiates these goals with footage about sports facilities on campus, regular social events in the department, and a brief overview of research activities.
John's hypothesized age, which is influenced by whether he is a student or faculty member, has an influence upon whether the system assumes he is single or married, which in turn influences content selection. John lets the presentation play to conclusion and logs out.

Mary, a prospective faculty member, signs on at the same terminal, also as guest, and consults Valhalla. This time, the a priori assumptions are more relevant. (Mary is, in fact, the visiting faculty scheduled for that day.) Mary sees the introduction, and then an overview of each of the laboratories in the department. She replays the clip of the Laboratory for Computational Intelligence (LCI) several times; this usage information is passed on by Valhalla's interface to the reasoner, which infers that Mary is more interested in AI research than other activities in the department (although there could be other explanations). Mary asks for another presentation (either before or after the current one runs to completion) and is then presented with more detailed footage about the LCI, as well as with interviews with key AI researchers in the department. This second presentation is shortened to accommodate Mary's optimal viewing time, as represented in the system's model of her.

Both John's and Mary's presentations include clips about the Vancouver area, because it is considered by many to be very attractive. This kind of information can even be acquired automatically, by noticing, for instance, that out-of-town users tend to linger over scenic shots in the video presentations much more than do locals (who can just look out the window).²⁴ Had they been assumed by Valhalla to be current, rather than prospective members of the department, John and Mary would not have been presented with this extra information.
On the other hand, a user accessing Valhalla over the Internet might receive a presentation with even more emphasis on the local geography, on the assumption that they had never before been to Vancouver.

²⁴I.e., the a priori probabilities of assumables can be upgraded according to well-known learning algorithms [201].

Chapter 4

Formal Background

This chapter provides background material upon which the contribution of this dissertation is based. The subject of symbolic logic and default reasoning is introduced first, and then a particular formalism for hypothetical reasoning is pursued in Section 4.1.1, which is the basis from which the formalism of this thesis as well as the prototype implementation are later built. Decision theory has been advanced as a normative tool for design under uncertainty. It is briefly reviewed in Section 4.2. Decision-theoretic approaches involve averaging over some or all models of the world to produce a "compromise" design that maximizes expected utility. Speech Act Theory is introduced in Section 4.3.

4.1  Symbolic Logic and Default Reasoning

Symbolic logic was intended originally as a language for unambiguously describing mathematical entities, but with recent technological developments has come a growing interest in the use of logical systems for reasoning as well as for representation. Given a set of axioms or formulae which are true, a logic is a set of syntactic rules for deriving new statements from existing statements. Statements that conform to the syntactic rules of the language are called well formed formulae or wffs. First-order logic is both a language for expressing knowledge and a means by which further statements can be derived. Assuming that the available knowledge is complete, consistent and monotonic, the derived statements can be regarded as true.
Completeness with respect to a particular domain is the property that all facts needed to solve the problem at hand are present in the system or derivable from those that are. Consistency is the property that all the axioms are true and cannot lead to contradictions. The property of monotonicity holds when the addition of new facts is guaranteed not to lead to contradictions; the size of the knowledge base in terms of the number of statements in it can only grow.

Non-monotonic reasoning systems are those which are designed to solve problems and manipulate representations in which one or more of these properties do not necessarily hold. The inferences made in such systems are said to be defeasible, because new information (observations about the world or environment, for example) may invalidate earlier conclusions, which may in turn have to be retracted. Research in the field of non-monotonic reasoning sometimes goes under the rubrics "default reasoning" or "hypothetical reasoning," and is broadly characterized by the common goal of achieving reasoning behavior in closer correspondence with intuition.

Numerous formal approaches have been developed in support of making inferences in the absence of complete and reliable information (see, for instance, Brewka [32], Konolige [117], Geffner [81], Reiter [167], Etherington [67] and Kautz [109]). The contribution in this thesis is built upon the Theorist formalism developed by Poole [159].

4.1.1  Default-Programming with Theorist

To make the following presentation more precise, the simple hypothetical reasoning framework of Theorist [162] is used. "Vanilla" Theorist is defined in terms of F, a set of closed formulae, called the "facts," and H, a set of (possibly open) formulae called "possible hypotheses," or assumables. The following definitions are relevant:

Definition 1 (Scenario)  A scenario is F ∪ D where D is a set of ground instances of elements of H such that F ∪ D is consistent.
Definition 2 (Explanation)  If g is a closed formula, an explanation of g is a scenario that implies g. Such a g is referred to here as an explanandum (the plural being, of course, explananda).

Definition 3 (Extension)  An extension is the set of logical consequences of a maximal (with respect to set inclusion) scenario.

There is more than one way to use a hypothetical reasoning formalism. It can be used at least for prediction and for abduction, often in a single domain or problem.¹ Theorist is particularly of interest because the same formal definition allows for both default and abductive reasoning [159]. It is also implemented; the examples provided in this thesis have been tested on a running version of the program.

These different uses of Theorist can be characterized along two dimensions:

• Status of Explananda, and

• Status of Assumptions

These two dimensions are the rows and columns, respectively, of Table 4.1, which is referred to henceforth as the Domain-Formulation Grid, reflecting its intended use as an aid in the formulation of problems and domains for Theorist and other formalisms for hypothetical reasoning.

¹The Encyclopaedia of Philosophy defines abduction as: "C. S. Peirce's name for the type of reasoning that yields from a given set of facts an explanatory hypothesis for them" [64, p5-57]. The term "abduction" is used throughout this thesis in the formal sense of Csinger and Poole [61], which is consistent with Peirce's treatment; refer to Section 5.4 for details.

Status of Explananda

The first dimension concerns whether the explanandum is known or not. This distinction corresponds to a choice between the following:

Abduction: The system regards the explanandum (the observation of the world or the design objective) as given, and needs to find an explanation for it. The idea is to find assumptions that imply the goal. We consider all explanations of the goal as possible descriptions of the world.
Prediction: The system does not know if the explanandum is true, and the idea is to determine what can be predicted from the facts (the general knowledge and the observation or design objective).

The issue is whether the explanandum is known to be true or whether it is something that has to be determined. For instance, if a reasoning agent knows (or has as defaults) that a → b and that a, the agent can predict b from its knowledge. Another agent who also knows that a → b, as well as b, might be able to assume in the absence of contradictory evidence that a. The first agent is using prediction, while the second agent is using abduction.

One interesting difference between abduction and prediction is in the relevance of counter-arguments. For instance, when predicting g, it is important to know if ¬g can also be explained. In abduction, however, an explanation of ¬g is irrelevant [159].

                      Explanandum
                 Known          Unknown
                 (Abduction)    (Prediction)    Who
    Design                                      User
    Recognition                                 Nature

          Table 4.1: Domain-Formulation Grid

Status of Assumptions

Along the other dimension we can distinguish between the two types of tasks:

Design tasks [74] are those in which the system can choose any hypotheses it wants. For example, a system can choose the components of the design in order to fulfill its design objective, or choose utterances to make in order to achieve a discourse goal. A consistency check is used to rule out impossible designs. All other sets of components that fulfill the goal are possible, and the system can choose the "best design" to suit its goal. Design can be done abductively to try to hypothesize components in order to imply a design goal [74]. Alternatively, design can be done predictively to derive a design from goals and any hypotheses we care to choose.

Recognition tasks are those in which the underlying reality is unknown, and all we can do is to guess at it based on the observations we make about it.
This definition includes diagnosis, scene recognition and plan recognition. Recognition can also be performed abductively or predictively [160]. In an abductive framework, each explanation is a possible description of the world, while the disjunction of all explanations is the description of the world. In the predictive framework, an appealing strategy is to predict something only if it is explained from the observations even when an adversary chooses the hypotheses [159], which corresponds to membership in all extensions.

This distinction turns on whether the system is free to choose any hypothesis that it wants or whether it must try to "guess" some hypothesis that "nature" or an adversary has already chosen. For example, agents who know only that there is to be a meeting on the hour, sometime between 09:00 and 12:30, but not at 10:00, are able to make only disjunctive statements about when it will take place; they recognize that the meeting is at 09:00 or 11:00 or 12:00. In contrast, the agent organizing (designing) the meeting is free to pick any time that is consistent with its own knowledge. The planning agent is free to choose that the meeting will take place at 11:00.

Note that these frameworks are different ways to use the same formal system for different purposes. In order to use the system, we have to choose one way to implement our domain. In general, there are not enough constraints in a domain to uniquely determine the approach that the reasoning system should take in formalizing its characteristics [160]. The 'causality' in the domain does not uniquely constrain its default-reasoning axiomatization. These choices are succinctly represented by the number of ways of situating the problem into the domain-formulation grid of Table 4.1.²

4.1.2  Summary and Conclusions

Formalisms for hypothetical reasoning can be used abductively or predictively. Theorist is one such formalism.
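These two uses can be illustrated with a toy propositional sketch. Everything below is invented for illustration: it is not the Prolog-based Theorist implementation, and the consistency check of Definition 1 is trivial here only because this toy fragment contains no negation:

```python
# Toy sketch of the two uses of a hypothetical reasoning formalism.
# Facts are Horn rules (body -> head); hypotheses are assumable atoms.
from itertools import chain, combinations

FACTS = [({"a"}, "b")]     # the single fact: a -> b
HYPOTHESES = ["a"]         # H: atoms the reasoner may assume

def closure(atoms):
    """Prediction: forward-chain the rules from assumed atoms to a fixed point."""
    atoms, changed = set(atoms), True
    while changed:
        changed = False
        for body, head in FACTS:
            if body <= atoms and head not in atoms:
                atoms.add(head)
                changed = True
    return atoms

def explanations(goal):
    """Abduction: minimal sets D of hypotheses such that F and D imply the goal."""
    found = []
    for d in chain.from_iterable(combinations(HYPOTHESES, r)
                                 for r in range(len(HYPOTHESES) + 1)):
        d = set(d)
        if goal in closure(d) and not any(e <= d for e in found):
            found.append(d)
    return found

print(closure({"a"}))     # prediction: assuming the default a, derive b
print(explanations("b"))  # abduction: observing b, hypothesize a
```

The first agent of the earlier example corresponds to the `closure` call (knowing a, predict b); the second corresponds to `explanations` (observing b, assume a).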
Finding enough constraints in a domain to uniquely define its default axiomatization is not usually possible. Default implementations can be classified along (at least) two dimensions: the assumption and explananda status dimensions, which we have represented as the rows and columns of the domain-formulation grid shown in Table 4.1. The domain formulation task can be superficially regarded as one of finding how the domain fits into the grid's representation framework.

²The grid merely summarizes some of the different possible uses of the hypothetical reasoning formalism; different problems/domains fall into different boxes, corresponding to different uses of the reasoning system.

4.2  Decision Making Under Uncertainty

Making decisions in the absence of a complete description of the world is a complicated task. Many arguments have been advanced to support the popular claim that decision theory yields optimal results (see, for instance, Savage [179]).

4.2.1  Bayesian Decision Theory

Various models of decision making under uncertainty have been proposed with the goal of attaining an optimal decision, all of which embody the notion of maximizing expected utility over a probability distribution of states of the world. Decision theory offers a kind of normative standard for decision making under uncertainty, and has been applied to design tasks under uncertainty. Some of this literature (see Cheeseman [40] for a discussion) argues that the best design is the one that results from averaging over all possible models; it will be argued in Section 5.7.1 that classical decision theory is not the right approach for the intent-based authoring paradigm being advanced in this thesis.

One model [179] consists of a set S of possible states of the world, a set O of possible observations, and a set Ω of decision alternatives.
A conditional probability distribution P(o|s) describes how likely it is to observe o when the state of the world is s, and a prior probability distribution P(s) describes how likely the world is to be in state s. The utility function μ(d, s) represents the reward to the decision maker for selecting decision d ∈ Ω when the world is in state s ∈ S. The general problem is to decide on a mapping from O to Ω which dictates the action to take for each observation; such a mapping is usually referred to as a policy. The expected utility E_δ induced by the policy δ : O → Ω is defined by

    E_δ = Σ_{s∈S, o∈O} μ(δ(o), s) P(o|s) P(s)

The principle of maximizing expected utility states that a rational decision maker chooses the policy δ' that satisfies E_δ' = max_δ E_δ, where the maximization is over all possible policies. The quantity max_δ E_δ is called the optimal expected value of the decision problem.

4.2.2  Example

A simple example follows. Let S = {rain, ¬rain}, O = {wet, dry}. The conditional probability distribution P(O|S) and the prior probability of S are empirically measurable, but assume here that they are as follows:

               wet    dry
    rain       0.8    0.2
    no rain    0.1    0.9

Because we are in Vancouver, P(rain) = 0.9. Utilities might be as follows:

                            rain    no rain
    take umbrella             0       -10
    don't take umbrella    -100         0

The problem is to decide whether or not to bring the umbrella given the observations.
There are four possible policies:

    δ₁(wet) = take,     δ₁(dry) = ¬take                                    (4.1)
    δ₂(wet) = ¬take,    δ₂(dry) = take
    δ₃(wet) = take,     δ₃(dry) = take
    δ₄(wet) = ¬take,    δ₄(dry) = ¬take

    E_δ₁ = μ(δ₁(wet), rain) · P(wet|rain) · P(rain) +                       (4.2)
           μ(δ₁(dry), rain) · P(dry|rain) · P(rain) +
           μ(δ₁(wet), ¬rain) · P(wet|¬rain) · P(¬rain) +
           μ(δ₁(dry), ¬rain) · P(dry|¬rain) · P(¬rain)
         = -18.1

    E_δ₂ = μ(δ₂(wet), rain) · P(wet|rain) · P(rain) +                       (4.3)
           μ(δ₂(dry), rain) · P(dry|rain) · P(rain) +
           μ(δ₂(wet), ¬rain) · P(wet|¬rain) · P(¬rain) +
           μ(δ₂(dry), ¬rain) · P(dry|¬rain) · P(¬rain)
         = -72.9

Similarly, E_δ₃ = -1.0 and E_δ₄ = -90. We would choose policy δ₁ over policy δ₂ because E_δ₁ > E_δ₂, which corresponds to our intuitions.

Interestingly, the decision maker prefers, under the given utilities and probability distributions, the policy δ₃ of always taking the umbrella regardless of the observation; no observation can improve his outcome. This is because these utilities reflect a strong aversion to getting wet, and only a small nuisance factor for being unnecessarily encumbered with an umbrella on a sunny day. A different utility function would, of course, yield different decisions. The lesson here is that it is often difficult to operationalize our intuitions with meaningful utility values.

4.2.3  Decision Analysis

The preceding formalization has considerable representational power, but at considerable computational cost for real-world problems. It can be used to select an optimal sequence of actions (a policy) from many possible sequences (policies). It takes the position that one cannot anticipate future observations, and must therefore decide what to do by averaging over possible future observations. Such power is not always required. If the observables or some subset thereof were available to the decision maker before a policy needed to be formed, then using the available information could reduce computational requirements.
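The policy evaluation of the umbrella example in Section 4.2.2 can be checked numerically. The sketch below simply enumerates all four deterministic policies and applies the expected-utility formula; the names mirror the example, and nothing here comes from the thesis implementation:

```python
# Numerical check of the umbrella example: enumerate every policy
# delta: O -> D and compute E_delta = sum mu(delta(o), s) P(o|s) P(s).
from itertools import product

S = ["rain", "no_rain"]                  # states of the world
O = ["wet", "dry"]                       # observations
D = ["take", "dont_take"]                # decision alternatives

P_s = {"rain": 0.9, "no_rain": 0.1}      # prior P(s)
P_o_given_s = {("wet", "rain"): 0.8, ("dry", "rain"): 0.2,
               ("wet", "no_rain"): 0.1, ("dry", "no_rain"): 0.9}
mu = {("take", "rain"): 0, ("take", "no_rain"): -10,
      ("dont_take", "rain"): -100, ("dont_take", "no_rain"): 0}

def expected_utility(policy):
    """policy maps each observation to a decision."""
    return sum(mu[policy[o], s] * P_o_given_s[o, s] * P_s[s]
               for s in S for o in O)

# All |D|^|O| = 4 policies.
policies = [dict(zip(O, ds)) for ds in product(D, repeat=len(O))]
for p in policies:
    print(p, round(expected_utility(p), 1))

best = max(policies, key=expected_utility)
print(best)   # the 'always take' policy
```

Running the sketch reproduces the four expected utilities (-18.1, -72.9, -1.0 and -90) and confirms that the always-take policy maximizes expected utility under these numbers.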
We can write an expected utility expression that refers not to a policy, but to an action δ(o) in the presence of observation o:

    E_δ(o) = μ(δ(o), rain) × P(rain|o) + μ(δ(o), ¬rain) × P(¬rain|o)

Footnote: Elements of decision theory are used in this dissertation for the sensitivity analysis described in Section 5.5.

4.3 Speech Acts

Speech act theory [182] [14] distinguishes different categories of communicative acts. Searle [181] divided speech acts into five general classes, from which different hierarchies have been developed (see, for instance, Bach and Harnish [15]). Searle identified:

1. Representatives: acts that make a statement about the world, and can be judged to have a truth value (e.g., inform, boast, deny)
2. Directives: acts that involve influencing another agent's intentions or behavior (e.g., request, beg, suggest, command)
3. Commissives: acts that commit the speaker to some intention or behavior (e.g., promise)
4. Expressives: acts that express the speaker's attitude toward something (e.g., apologize)
5. Declaratives: acts that explicitly involve language as part of their execution (e.g., quit, fire, marry)

The authorial intentions referred to in this thesis are generally of the first type, but it is not hard to see how a system like the one described in this thesis could be used to encode communicative goals and perform communicative acts from other categories in Searle's hierarchy.

Interesting work has been done to extend speech act theory to other modalities (e.g., Novitz [152] gives convincing arguments for the ability of pictures to accomplish the equivalent of illocutionary acts), and multimodal presentation systems have made use of this work [5] [10] [11].

The notion of communicative intent espoused in this thesis is similar to any one of many alternative formulations of the speech act. In particular, the illocutionary act corresponds quite well with authorial intent: the understanding that the author is trying to convey.
An utterance is said to be successful or felicitous when it results in the appropriate, intended perlocutionary act, or effect. When the hearer of an utterance is convinced where the speaker's intention was to convince, the utterance is felicitous.

Humans make many assumptions about their interlocutors while communicating with each other, and this assumption-making strategy has mapped well onto some formalisms for default reasoning [56]. When these assumptions are incorrect, they can lead to perlocutionary failure and an infelicitous utterance.

Much of the speech act literature refers to attempts at defining a meaningful taxonomy of these acts into requests, statements, indirect, direct, and so on. More recent work in automatic presentation [5] considers the generation of documents as a sequence of speech acts to achieve a more complex communicative goal. Some of these recent studies have also widened the scope of the inquiry to include nonlinguistic "utterances" like graphics. Operationalization of Rhetorical Structure Theory by Moore and Paris [138] is another effort to build taxonomies that might prove useful in automatic generation systems [139], and Andre [6] has extended and adapted RST theory to suit the planning of multimodal presentations.

The approach described in this dissertation is insensitive to the details of a particular theory of speech acts, but is informed by these on-going efforts to structure communicative acts from underlying speech act components; see Section 5.3.

Part III

Contribution

Scenario: Back to School

Steve and Peter are both considering going to grad school, and both of them are using a future, enhanced version of Valhalla, the application which is described in this thesis, to help them make their decision. As they are distinct individuals (and they are very different individuals), the system supports them in different ways.
The intent of this scenario is not to imply that users can be characterized to any degree by their .newsrc files, or more generally, by their news reading habits, but that even such data is potentially useful (and potentially abusable) information.

Steve goes back to school

"Now that my co-op program is finished, let's see what grad school has to offer!" Steve relishes the idea of pursuing some of the questions his recent experiences have raised, and wants a new intellectual challenge. He glances at his watch. "Just enough time before that lecture on database theory. Great!" He logs into a machine on the University of British Columbia's computer network, starts up his favorite web browser, and asks to receive a presentation from Valhalla, the automated on-line departmental hyperbrochure. Because Steve is signed on to a departmental machine with an account that identifies him as a visitor of Kellogg Booth, a professor in the department, Valhalla reasons that Steve is a visiting researcher in a field related to computer graphics or human-computer interaction, since these are the principal interests of his hosting professor. Steve watches the departmental introduction by its head, and pays attention to several clips on human-computer interaction before skipping over the next few clips on computer graphics research within the department. Valhalla re-designs the presentation to emphasize human-computer interaction research, but Steve grows restive and clicks on a button that reveals elements of the user model upon which Valhalla is predicating its designs. "Ah," breathes Steve, "it thinks I'm a visitor." Steve is driven to look at the display of the user model because he can't understand why Valhalla is showing him so much material on graphics and HCI, when in fact his interests lie in applied physics and numerical analysis.
He sees from the model display that Valhalla thinks he is a visiting researcher, and simply clicks on a radio button to inform the system that he is an undergraduate student. The system recalculates the best model and designs a new presentation that includes an overview of sporting and social events on campus.

Peter goes back to school

"I'm tired of my job," thought Peter; "I hate my boss, my life is going nowhere. Hmmm. I may as well go back to grad school." Since it's the mid-nineties, the most effective way to find up-to-date information on anything is to surf the net, and Peter fires up his favorite browser. Peter's browser incorporates the latest version of a user-modelling add-on developed at the University of British Columbia. It knows quite a bit about Peter and is able to tailor interaction to suit him. A variety of university names appear on the monitor, but of the thousands of educational institutions offering post-graduate studies, only those appear which offer courses in computer science, and which are also close to either skiing or surfing. "Let's see now... Grad school... Montreal: too cold in winter. MIT, nah. Stanford, too many earthquakes. UBC. Yeah, let's take a look at UBC..." He wishes to himself that he had a coffee right now, but can't quite muster the energy to go to the kitchen and make one. He has just enough time to stroke the stubble of his week's growth before the home page for UBC's Department of Computer Science fills the display. Peter clicks on a hot-link promising a departmental overview. This future version of Valhalla tries to negotiate an exchange of user-modelling information with Peter's net browser program, which refuses because Peter has instructed all of his software agents never to divulge anything. Valhalla then asks for a copy of Peter's .newsrc file. "No harm in that, I guess," mumbles Peter, and releases his .newsrc with a click.
Valhalla correlates the new information with a vast database of .newsrc files and derived user models in order to arrive at some reasonable hypotheses about Peter and his interests. Valhalla counts the number of active newsgroups and notices that Peter is up to date on at least fifty groups including alt.fan.monty-python, alt.rec.humor, alt.tv.simpsons, alt.jokes, and talk.politics. Valhalla concludes that Peter has time to spare (Peter's version of Valhalla has no sense of humor and does not know how to avail itself of this opportunity for cynicism), and revises its default assumptions about presentation time limits. Valhalla infers from the domain suffix of Peter's network address that he is not currently local to the Vancouver area, and also finds that interest in leisure activities correlates well with the active newsgroups in his .newsrc file. There is no perceivable delay before Peter is shown a breath-taking video sequence of professional skiers zooming through fantastic vistas, maps of the ski regions in the Vancouver area, and an introduction to some of its interesting hiking trails. Because Peter lingers over sequences about the bars on campus, Valhalla tries to test some alternative hypotheses that might explain his action at the interface. An overview of the department's beer brewing club prompts Peter to fast-forward over the material, but he rewinds and plays twice in its entirety a walkthrough of night-life possibilities. Valhalla infers that Peter wants to party. "All right! Let's party!" says Peter out loud, actually relishing the prospect of graduate school. The presentation ends with a minute or two about the research facilities of the department, during which Peter begins to doze off. "Yeah, UBC. That's the one for me. Gosh. I hope my marks are good enough...
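The .newsrc heuristics Valhalla applies in this scenario could be sketched as follows. The fifty-group threshold comes from the scenario; the file format assumed here is the classic one (subscribed groups written as "group.name: ranges", unsubscribed ones marked with "!"), and the leisure-group prefixes are assumptions for illustration.

```python
# Assumed prefixes that signal leisure interests (illustrative only).
LEISURE_HINTS = ("alt.fan.", "alt.rec.", "alt.tv.", "alt.jokes")

def analyze_newsrc(lines):
    """Count subscribed groups and flag leisure interests from .newsrc lines."""
    active = []
    for ln in lines:
        name, sep, _ranges = ln.partition(":")
        # A subscribed entry uses ":" as separator; "!" marks unsubscribed.
        if sep and "!" not in name:
            active.append(name.strip())
    leisure = [g for g in active if g.startswith(LEISURE_HINTS)]
    return {"active": len(active),
            "leisure": len(leisure),
            "time_to_spare": len(active) >= 50}

report = analyze_newsrc([
    "alt.tv.simpsons: 1-900",
    "alt.fan.monty-python: 1-50",
    "comp.lang.prolog! 1-10",
    "talk.politics: 1-2",
])
```

A real system would of course weigh such evidence probabilistically rather than with hard thresholds; this sketch only shows the kind of observation the scenario's inference is grounded in.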
"  Chapter 5 User Models for Intent-based Authoring 5.1  Overview  Following the brief motivation in Section 5.2, where the argument for Intent-based Authoring is recapped as the solution to problems with traditional authoring techniques, Section 5.3 introduces theoretical elements of the solution and their application. The approach, using probabilistic recognition and cost-based design, is advanced in Section 5.4; this approach is an extension of the formalism described in Section 4.1.1. Users need to understand the behavior of the system on their own terms in accordance with what is called in this thesis the scrutability desideratum, which the 1  system supports with a sensitivity metric. Scrutability and the sensitivity metric are both described in Section 5.5. Section 5.7 considers some alternatives to the approach advanced herein. The scru-ta-ble (adj) [ L L scrutabilis searchable, fr. L scrutari to search, investigate, examine more at S C R U T I N Y ] (1600) xapable o f being deciphered: C O M P R E H E N S I B L E (Webster's 7th Dictionary, on-line copy). 1  96  97  use of decision theoretic techniques is considered in Section 5.7.1 for the domain of multimedia presentation design, and it is concluded that these "compromise" designs are inappropriate for interactive presentation environments. Section 5.6 is a detailed example of the operation of Valhalla, and Section 5.8 summarizes this Chapter.  5.2  Motivation  Except for some of the work by Feiner [183], and the W I P system at D F K I [195] (see Section 3.5), the presence of the user/reader has little effect on the form and usually no effect on the content of the presentation produced by the system. This early commitment to form and content is a severe limitation; there is no reason in general to assume that a single document can serve the purposes of multiple readers, and certainly any single document w i l l be sub-optimal for some reader [104]. 
Authors have grown accustomed to having complete, dictatorial control over the form and content of their document product, and equally accustomed to the complete loss of control ensuing from the production process. Until now, we have not questioned this traditional model of authoring, in which authors as knowledge workers labor over their intellectual child until it is ready, finally to launch it into the world via a printing press or a CD-ROM burner; then, like hopeful parents, they wait and watch to see if their book or multimedia product has the desired effect.

This traditional model is so entrenched because, until now, there has been no alternative. Technological advances thus far have merely amplified the strengths and the weaknesses of existing approaches to authoring, and it is only recently that processors have become powerful enough for us to consider applying intelligence, rather than mere horsepower, via computation.

The intent-based authoring paradigm advanced in this thesis permits authors to send their work out into the world not as a rigid block of content, sealed forever inside the cover of a book or the shrink-wrap of a CD-ROM release, but as a body of knowledge framed by the author's point of view. Paul Saffo of the Institute of the Future calls this intellectual commodity context, and recognizes that its value surpasses that of mere content.2 Intent-based authoring makes explicit the point of view of the author, so that client-side computational mechanisms can be brought to bear at run-time to design a presentation of the material that conforms to the author's point of view, or intent, while meeting the needs of the individual viewer.

5.3 Components of the Theory

Before content can be separated from intent, and the author's compile-time specification task completely decoupled from the user's run-time viewing task, some critical knowledge bases must be developed.
Some of these knowledge bases will be extremely labor-intensive to create, and the first intent-based presentation has taken years to prepare. The second one will be able to build on the knowledge of the first, the third on the accumulated set of knowledge, and so on. At some point, preparing the n-th intent-based presentation will require less work from an author than it would have to prepare a traditional document. Similar to the considerations which motivate code re-use in software engineering, this economy of effort will be an important factor in the success of the paradigm.

The two key knowledge-based ingredients in the intent-based authoring theory are 1) representations of the author's compile-time communicative goals, or intent, and 2) the user's run-time information-seeking needs and goals. These are the components which mediate the new, abstract, extended interaction between author and reader. The first is to be supplied (or selected from an existing set) by the author; the second is to be determined automatically at run time by the system and used in turn to determine the content of the presentation.

2 From his keynote address at the W.R.I.T.E. conference in Vancouver, Canada, 16 June 1995.

As will be shown throughout this chapter, the author intent and the user model both serve as constraints on the application of presentation schemata, which are rules that encode notions of stylistic coherence, and reliable organizational principles for effective communication. A simple example of a presentation schema is that a presentation should have a beginning, a middle and an end, and that the beginning should introduce, and the ending summarize, the material to be found in the middle.

In addition to representations of authorial intent, user model, and the presentation schemata, other knowledge bases are also consulted by the system at run-time to prepare the presentation.
These knowledge bases may be contributed by the author, the viewer, an annotator, or a knowledge engineer; in the near future, intelligent agents may scour the Internet for additional knowledge that could be used in unanticipated ways. Each of these knowledge-based components is considered now in some more detail.

Authorial Intent: In declaring explicitly his intention, an author can license a presentation system to prepare a presentation that will meet unanticipated run-time contingencies. Even in the absence of the author, a presentation can be made automatically, in accordance with the author's intention, taken by the system to be a specification of the presentation. An intent is analogous to a speech act (see Section 4.3 for some background). It is a complex communicative goal, arbitrary up to the limits of expressiveness of the language in which it is articulated. The language used here is described in Section 4.1.1 and Section 5.4.

A simple example of an intent is "Show the user a presentation about the Department of Computer Science, but don't bore the user with irrelevant material and try to accommodate the presentation to the amount of time the user has available. Try to impress the user with the department." This intent is broad, and quite general. It might be expressed in the underlying representation language by writing a rule called show and telling the system to apply that rule at run-time; assuming that the knowledge base already contains one or more ways to satisfy the rule show, different presentation schemata would compete for dominance until the best presentation is derived for the most likely user model. These processes are described at length in this chapter.

User Models: Models [115] [196] of the readers or viewers of presentations are also needed to overcome traditional limitations and move into intent-based authoring.
A reader model can be considered to be a representation of the reader's attributes relating to his or her information-seeking needs and objectives in consulting the system. A reader model consists of a set of hypotheses about that reader, acquired at run-time by making observations of the user's interaction at the interface.3 The application of rules from the knowledge bases to derive the eventual presentation must be consistent in a logical sense with the elements of the model.

The representational range of the models may vary between domains; different domains will require the representation of different kinds of information. A knowledge engineer determines a priori this representational range, by specifying a number of dimensions of analysis along which user attributes can be assigned. The values that these attributes can take are predefined, and are called assumables, or potential hypotheses, and they collectively comprise an ontology for possible user models. As developed in later sections of this chapter, a user model in our theory is a set of assumptions drawn from the set of assumables. The assumables establish the representational range of the possible models, and so should be crafted with their eventual usage in mind. For instance, in the system that has been implemented, the domain of discourse is the Department of Computer Science, and potential users fall into the pre-determined classes of Faculty, Student, or Staff, along the dimension of User Type; they are also evaluated on whether they are visiting or local, and whether they are male or female. These and other sets of assumables affect the design of the eventual presentation as described in this chapter.

3 Once acquired, individual models may also be stored and retrieved where appropriate; although these issues are beyond the scope of this thesis, see Section 7.2 and Section 7.2.2 for some discussion about future work, and privacy issues, respectively.
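A minimal sketch of this ontology of assumables follows, using the userType and geo priors that appear later in Figure 5.3. A candidate user model assigns one assumable per dimension; treating the dimensions as independent random variables (as the framework does), models can be ranked by prior probability before any observations arrive.

```python
from itertools import product

# Assumable dimensions and their priors, taken from Figure 5.3.
DIMENSIONS = {
    "userType": {"faculty": 0.35, "student": 0.60, "staff": 0.05},
    "geo":      {"local": 0.65, "prospective": 0.35},
}

def ranked_models(dimensions):
    """Enumerate every joint user model and rank by prior probability
    (one assumable per dimension; dimensions assumed independent)."""
    dims = sorted(dimensions)
    models = []
    for values in product(*(dimensions[d] for d in dims)):
        prior = 1.0
        for d, v in zip(dims, values):
            prior *= dimensions[d][v]
        models.append((dict(zip(dims, values)), prior))
    return sorted(models, key=lambda m: -m[1])

models = ranked_models(DIMENSIONS)
best, best_prior = models[0]
```

Before any evidence, the most likely model is a local student (0.60 × 0.65 = 0.39); observations then shift these rankings through the abductive machinery described in Section 5.4.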
A user modelling system may attribute to a user assumptions at different levels of abstraction. While watching a video of a dinner being prepared by a chef, for instance, if the viewer is also ex hypothesi a chef, the system might assume that the user believes that chicken marinara is being prepared. If the viewer is a typical North American fast food junkie, the system may attribute the belief that the dish involves chicken and some sort of sauce. How these user models are acquired by the system is discussed in this section.

Acquisition: Models in our approach are acquired both explicitly and implicitly through an abductive reasoning framework described later in detail. The system makes observations of the user's interaction and tries to explain these observations by making hypothetical attributions of user status on one or more relevant dimensions of evaluation. Explicit acquisition takes place when observing the interaction of the user with a description of the user model. Implicit acquisition takes place when observing the interaction of the user with the presentation. The difference lies only in that during explicit acquisition, the user is made aware of the model and is actually called upon to refer to and to manipulate it, while during implicit acquisition, the user need not be aware of the model at all, nor even of the fact that modelling is taking place. Explicit acquisition has the advantage of reliability, but it is intrusive, and can distract the user from the task; implicit acquisition has the advantage of unobtrusiveness, but suffers from potential inaccuracy. A combination of these techniques is used in the system, with a view to having the best of both. See Chapter 2 for a discussion of the difference between implicit and explicit acquisition of user models.

Other Knowledge: A rule base of facts and assumables (as described in Section 4.1.1) is supplied by a knowledge engineer.
Part of the database is world knowledge that the system uses in domain-dependent ("Faculty members don't take courses, typically," etc.) and domain-independent ("There are 24 hours in a day," "Men are mortal," etc.) ways, and there can be knowledge about the characteristics of the media involved ("Don't play the audio track when video is played at less than half-normal speed," "Use still-frame techniques when showing a video clip of less than one second duration," etc.). Categorical, contingent, and hypothetical statements are expressible in the language.

Presentation plans are the bulk of the knowledge encoded. These are variously elaborated rules for delivering information; a convince plan might present examples as evidence in support of a conclusion; stylistic or cultural factors might govern whether the evidence precedes the conclusion (prefix) or comes after it (postfix). A schema to describe something might be implemented as a rule which says: "Describe a Thing by Describing its Parts." An intent-based author might invoke a schema that has been defined to not offend the viewer; such a schema might consult a database of cultural sensibilities, and tailor the presentation to suit the hypothesized cultural vagaries of the viewer. (In some cultures, for instance, the viewer might be offended by the tone of voice, the style of address, the dress or even the gender of a speaker; the system can choose the appropriate design element at run-time to suit these constraints. Such just-in-time choices were not possible under the traditional model of authoring.)

Presentation schemata are instantiated finally with actual content: text, video clips, audio, etc., selected by the application of rules during a proof process described later. In order to select these content elements, the system must be able to reason about the content. For this reason, another important part of the database is devoted to meta-descriptions of the content from which a presentation is to be made.
In the case of the system implemented for this thesis, the content is a collection of video clips, a selection of which is assembled by the system at run-time into a presentation tailored to meet the needs of the individual user as represented in the hypothesized user model. To be able to reason about the contents of the video, the system requires a description of its contents, which we call annotations. The annotations link keyword descriptions of the contents of specific clips with the associated time codes on the media (video tape, laser disk, digital video streams, etc.). For example, an annotation might carry the information: an interview with Maria Klawe containing overview material is to be found between time codes 1:12:14 and 1:15:01 on the stream called "Department".

Figure 5.1 is a tree diagram of a presentation that has been designed to convince someone to join the faculty in the Department of Computer Science at UBC. It consists of the application of both the Convince and the Describe schemas just mentioned. The presentation has a description, followed by a conclusion; the description of the department is instantiated with an intro by the department head on video, followed by descriptions of parts of the department (laboratories), which in turn are instantiated by video clips of interviews with representatives from the two labs that were judged by the system to be of most interest to the viewer, and scenes from the labs. The conclusion consists of descriptions judged by the system to be of general appeal; rules have been encoded to assert that it is generally believed that the Vancouver area is very scenic, and that this is a quality that can be used to convince people, so scenic shots of the area are presented. This example is taken from the working version of the system. Later examples in this thesis will show that these scenic overviews are provided only to viewers assumed by the system to be not from the Vancouver area.
Finally, individual pieces of the video record must be chosen to fill the slots in the now elaborated plan schema; the leaf nodes in the tree representing the plan schema are to be expanded. In Figure 5.1, the logical form Describe(lci) is instantiated with the video clips labelled interview(alan-mackworth) and tour-lci, which are to be found on the video tape between absolute time codes 00:43:41-00:44:23 and 00:12:15-00:15:54, respectively.

Convince(join-ubc-faculty)
  Describe(department)
    edit-list([(00:01:26, 00:02:27, head-klawe-intro)])
    Describe(laboratories)
      Describe(lci)
        edit-list([(00:43:41, 00:44:23, interview(alan-mackworth)),
                   (00:12:15, 00:15:54, tour-lci)])
      Describe(imager-lab)
        edit-list([(00:45:08, 00:46:16, interview(kelly-booth)),
                   (00:03:06, 00:03:13, rendered-dragon-speaks)])
  Conclude(general-appeal)
    Describe(vancouver)
      edit-list([(00:20:15, 00:20:20, cypress-mountain)])
    Describe(campus)
      edit-list([(00:25:07, 00:26:59, aerial-view-campus),
                 (00:27:43, 00:27:44, faculty-club)])

Figure 5.1: A partially elaborated presentation

Some part of the database might one day consist of a collection of intentions from which intent-based authors can select, if they don't want to specify their own intentions (or don't know how, because they aren't programmers...). Intent-based authors could compose the intentions from the database into higher-level intentions. For instance, if there already exists a representation of an intention to convince as well as a representation of an intention to amuse, an intent-based author might conjoin these to amuse and convince the reader. Only the hint of this future facility is currently available in the system, which still requires that intent-based authors have a strong ability to program their intentions in the underlying representation language.4

4 A visual editor could also be added that allows authors to compose intentions through the manipulation of a graphical interface.
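The annotation-to-edit-list step can be sketched as follows. The time codes and clip labels echo Figure 5.1, but the keyword-matching selection here is a simplified stand-in for the system's proof-based design process, and the keyword sets are assumptions for illustration.

```python
def tc(h, m, s):
    """Convert an hh:mm:ss time code to seconds."""
    return h * 3600 + m * 60 + s

# Annotations link clip labels to time-code intervals plus keyword descriptions.
ANNOTATIONS = [
    (tc(0, 43, 41), tc(0, 44, 23), "interview(alan-mackworth)", {"lci"}),
    (tc(0, 12, 15), tc(0, 15, 54), "tour-lci", {"lci"}),
    (tc(0, 20, 15), tc(0, 20, 20), "cypress-mountain", {"vancouver"}),
]

def edit_list(topic, annotations):
    """Select the (start, end, label) clips whose keywords cover the topic."""
    return [(start, end, label)
            for start, end, label, keys in annotations if topic in keys]

clips = edit_list("lci", ANNOTATIONS)
```

In the real system the equivalent of edit_list is built up during a proof of a describe schema rather than by simple keyword lookup, but the output artifact, an ordered list of time-coded clips, is the same.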
Figure 5.2: Roles, Inputs and Outputs. (The outputs shown are a presentation, in the form of a video edit decision list, and a display of the user model.)

5.3.1 Summary: Inputs, Outputs and Roles

The outputs of the system are (1) an edit decision list of video clips which is played under user control on a video display device, and (2) a presentation to the user of parts of the user model derived by the system. Refer to Figure 5.2: author(s) supply or select intentional descriptions of their communicative goals. Knowledge engineers provide general and specific knowledge, as well as the assumables for model recognition and presentation design. The system calculates the most likely user model from observations of the user's activity and uses that to design the "best" presentation. Both the presentation and components of the user model are displayed to the user.

Several agents are involved in the intent-based video authoring model. They are:

1) Annotator: the person or artifact who indexes relevant events and intervals in the raw footage
2) Author: the person or artifact who specifies the intent (and optionally elements of the content and form) of the eventual presentation
3) Viewer: the eventual consumer of the presentation
4) Knowledge Engineer: the person or artifact who prepares the knowledge bases, particularly the domain-dependent knowledge base

Note that more than one of these roles can be played by a single agent.

5.4 An Abductive Framework for Recognition and Design

The language of Theorist introduced in Section 4.1.1 is now extended to include probabilities and costs, and a notion of explanation is presented that reflects a new combination of design and recognition within a single formal framework.

The set H of assumables is partitioned into the set R of those available for recognition and the set D available for design. Each assumable r in R has associated with it a prior probability 0 < P(r) < 1. R is partitioned into disjoint and covering sets Ii, which correspond to independent random variables (as in Poole [161]). Every assumable d in D is assigned a positive cost μ(d).5

Representative recognition assumables found in the database include the ones shown in Figure 5.3, where Pi is the prior probability associated with assumptions of the corresponding classes, Σ Pi = 1.0, and the syntax in use is:

    disjoint([assumption1 : P1, assumption2 : P2, ...])

Some examples of facts and rules that appear in the database are shown in Figure 5.4.

5 The description here of the recognition process conforms to the discussion of probabilistic Horn abduction provided by Poole [161].

% For recognition of the user model:
% faculty/student/staff user
disjoint([userType(faculty):0.35, userType(student):0.6, userType(staff):0.05]).
% local/prospective user
disjoint([geo(local):0.65, geo(prospective):0.35]).

Figure 5.3: Representative recognition assumables

5.4.1 Recognizing User Models

Observations are made of the actions of the user at an interface. The interface agent communicates this information to the reasoner, which tries to explain6 the observations, making recognition assumptions along the way. User actions can be of two major types: the user can interact with a control device to manipulate the presentation elements directly (a virtual VCR control panel, for instance, by which the presentation can be replayed, paused, fast-forwarded, etc.), or the user can interact with a representation of the system's model of the user.
When the system makes assumptions to explain its observations of the user's behavior at the control panel, we call the recognition process "implicit" acquisition; when the system makes assumptions to explain its observations of the user's behavior at the user model window, we call the recognition process "explicit." Typically, the user model window7 does not contain the entire user model (as there may be very many assumptions in the user model), but only a salient subset of it, determined by sensitivity analysis (see Section 5.5). We call this subset the salient model.

6 Explain is used here in its technical sense, as described in Section 4.1.1.
7 The current version of the prototype implements only the explicit approach.

% Categorical world knowledge:
gender(george_phillips, male).    % George is a male
gender(maria_klawe, female).      % Maria is a female
partof(cpu, computer).            % a cpu is part of a computer

% Annotations:
% Maria speaks from 1:26 to 2:26 on the video record:
interval(00:01:26, 00:02:26, speaker(maria_klawe), []).
% George is interviewed between 30:57 and 31:19
interval(00:30:57, 00:31:19, interview(george_phillips), []).
% There is a video-only (no relevant audio track) aerial view of
% UBC between 25:07 and 26:59
interval(00:25:07, 00:26:59, video_only(ubc_aerial_view), []).

% schemata
% A video clip (or clips) can be a description of a Thing
describe(Thing, Description) <=
    editList([], Description, description(Thing), 0, _L).

% a BigThing can be described by describing its parts
describe(BigThing, Description) <=
    bagof(Thing, subsumption(Thing, BigThing), Things),
    desc(Things, [], Description, 0, _Length).

desc([], Description, Description, Length, Length) <= true.
desc([H|T], InD, D e s c r i p t i o n , InL, Length) <= e d i t L i s t ( I n D , OutD, d e s c r i p t i o n ( H ) , InL, OutLength), desc(T, OutD, D e s c r i p t i o n , OutLength, Length).  Figure 5.4: Example facts and rules  109  Example: Perceptual Salience Here is an example that involves explicit acquisition of parts of a user model. When the salient model is displayed to the user, the user's action at the interface may lead to observations which in turn lead the system to calculate a new and different model. For instance, i f the user informs the system that the user is a faculty member rather than a student, as had been assumed, the system may reliably retract its initial assumption and assume the corrected value offered by the user (if the user is now telling the truth). Perhaps more interestingly, even the user's inaction may result in changes. For instance, i f the system displays to the user the assumption that the user is a faculty member, and the user does not critique the assumption, it makes sense for the system to re-evaluate the likelihood of the model that includes the assumption in question, ostensibly to arrive at a higher ranking for it, under the additional assumptions that the display of the user model has been seen and understood by the user. These additional assumptions are reasonable if: the window in which the assumption is displayed is not obscured, the text in the window is clearly rendered and is large enough to be easily read, the user is not distracted by other events on the desktop, the user is not distracted by other events in the environment (babies crying, cars colliding, etc.), and so on. The case where a user critiques assumption A but does not critique assumption B is of particular interest. 
The likelihood of the model can be increased on the basis of the user's direct action as discussed in the previous paragraph, but should also be increased because of the user's inaction with respect to the display of assumption B, on the grounds that the user was actually attending to the appropriate window (because the user did critique assumption A).

Although the current prototype merely orders the displayed assumptions according to a calculated sensitivity value, various other display strategies can be used to draw the user's attention to one or more of the displayed assumptions, thereby licensing the additional assumption that the user has attended to the window in which the user model is being displayed. A particular assumption can be highlighted (with color, font, or animation, for instance) to increase the likelihood that the user is attending to the relevant portion of the desktop, and to the relevant part of the window. Such perceptually salient display techniques can increase the reliability of the system's assumptions about the user, but require that the system model such things as the media capabilities of the user's display and interaction devices. The system would also benefit from knowing whether the window in question is partially or totally obscured by other desktop objects, which would require some degree of integration between the system and the operating system or window manager. A deep analysis of such desiderata for future operating systems remains to be conducted, but they are mentioned here because they are compatible with the interaction paradigm that is described in this thesis, and can be represented in the reasoning framework. Empirical investigations are needed to decide which are the most effective display techniques.

Another way to look at the issue of perceptual salience is to note that the system is engaged in a dialog with the user, and that the presentation of the user model window is a communicative act by the system.
The perlocutionary effect [182] of these acts is knowledge, on the part of the user, of the system's user model. The intuition is that the likelihood of achieving this communicative goal is enhanced by using appropriate presentation techniques to highlight the most important parts of the message, and that the use of such techniques licenses an increased commitment by the system to the beliefs which follow from successful communication.

The user's action at the interface is just another observation available to the reasoning system, and the state of the display is known as a fact, since it is under control of the system. The system explains these observations by (hypothetically) attributing to the user membership in one of eight interaction classes.

The system calculates the model incrementally, yielding a final model with a cumulative probability influenced by the following rules and assumables; there is one rule for each of eight interaction classes (see Figure 5.5), of the form:

    action(Var, Action) <=
        display(Var, Display) ∧
        value(Var, Value) ∧
        class(Var)

    % Rules for perceptual salience calculations
    % action(R, V) is true when user takes action V
    % with respect to random variable R.
    % radio(R, V) is true when the displayed value of random
    % variable R (via a graphical radio button, e.g.) is V.
    % val(R, V) is true when the value of random variable R is V.

    % rules when displayed value is correct
    action(R, V) <= radio(R, V), val(R, V),      % rule for explicit validation
        ev(R).
    action(R, A) <= radio(R, V), val(R, V),      % rule for explicit lie
        el(R), A \== V.                          % this is a lie
    action(R, none) <= radio(R, V), val(R, V),   % rule for tacit validation
        tv(R).
    action(R, other) <= radio(R, V), val(R, V),  % rule for implicit validation
        iv(R).

    % rules when displayed value is not correct
    action(R, V) <= radio(R, D), val(R, V),      % rule for explicit correction
        ec(R), D \== V.
    action(R, A) <= radio(R, D), val(R, V),      % rule for weird lie
        wl(R), D \== V, V \== A.
    action(R, none) <= radio(R, D), val(R, V),   % rule for tacit lie
        tl(R), D \== V.
    action(R, other) <= radio(R, D), val(R, V),  % rule for implicit lie
        il(R), D \== V.

    Figure 5.5: Rules for perceptual salience

where Var is the name of the random variable under consideration, Action is the actual action taken by the user, and Display is the description of the display. class(Var) is a conjunct that can be satisfied only via assumption, forcing the cumulative probability of the current model to be multiplied by the prior probability of membership in the interaction class. Other conjuncts are included only to ensure that the eight rules are exclusive.

Users can explicitly validate the contents of the display, with respect to a particular random variable; a faculty member can do this, for instance, with respect to the "user-type" random variable, by clicking on the already highlighted "faculty" radio button (see Figure 5.6). Users can tacitly validate the contents of the display by performing some other action, such as requesting that the video presentation be made without further delay. Users can implicitly validate the display with respect to one random variable, by performing an action at the interface with respect to some other random variable; a faculty member might implicitly validate the assumption of the system that the user is a faculty member (represented to the user by the highlighted "faculty" radio button), for instance, by clicking on a button that expresses gender or age, or geography. Users can be explicitly lying, by clicking, for instance, on the "student" button when they are in fact faculty members.
[Figure 5.6: screenshot of the Valhalla user model window, showing radio buttons for userType(faculty), userType(student), userType(staff), gender(male), gender(female), geo(local), and geo(prospective), under text explaining that each item represents an important assumption the system has made about the user, and that correcting the assumptions affects the behaviour of the system and the nature of the presentations.]

Figure 5.6: The Valhalla User Model Window

A tacit lie is when the user takes no action to correct a false assumption by the system. (The user could obviously be missing the cues in the display by accident; the names attached to the eight interaction classes are purely syntactic, and do not necessarily indicate an intent on the part of the user to deceive the system. The names merely make it easier to remember the different categories.) An implicit lie is when the user takes some action with respect to some random variable, but ignores an incorrect assumption by the system.

Explicit correction is the normative action taken by a user when he or she discovers an incorrect assumption by the system; thus the faculty member may click on the "faculty" radio button when the system has incorrectly assumed that the user is a student. The normativity of this action is represented by the high prior probability associated with it.

Finally, the weird lie category captures the action with respect to a particular random variable where the system has made an incorrect assumption, and the user takes action to change this value, but changes it not to the correct value but to another incorrect value. The faculty member, for instance, upon perceiving that the system has made the incorrect assumption that the user is a student, might then click on the "staff" radio button to indicate to the system that he or she is a staff member rather than a faculty member or a student.

The code fragment in Figure 5.7 shows the assumables which represent the eight possible classes to which user interaction can belong.

    % assumables for perceptual salience calculations
    disjoint([ev(R):0.15,    % explicit validation
              tv(R):0.4,     % tacit validation
              iv(R):0.4,     % implicit validation
              el(R):0.05]).  % explicit lie
    disjoint([tl(R):0.05,    % tacit lie
              il(R):0.1,     % implicit lie
              ec(R):0.8,     % explicit correction
              wl(R):0.05]).  % weird lie

    Figure 5.7: Assumables for perceptual salience calculations

The syntax in use again here is disjoint([class_1(Var) : P_1, class_2(Var) : P_2, ...]), where P_i is the prior probability associated with actions of the corresponding classes. Some of the more cumbersome detail and irrelevant syntactic sugar has been omitted from the preceding, but the point should be clear. When the user changes a value displayed by the system, the assumption by the system that the user's action is an explicit correction, for instance, is preferred over the assumption that the user is explicitly lying to the system; the calculation of the final user model is influenced (as usual) by these incremental assumptions.

Perceptual salience is a potentially strong source of knowledge upon which to predicate further reasoning, and will be the subject of much future work. Any system that makes presentations to users can benefit from modelling these interactions at some level of abstraction, even at a very basic level.

The probabilities, of course, should be determined through some empirical approach; the values used here are plausible, and serve to order models in which the user has manipulated the user model window in some way. Actual values in a future system would be determined by observing the behavior of users during testing sessions designed for that purpose.
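The eight rules of Figure 5.5 and the priors of Figure 5.7 together behave like a small classification table. The following Python sketch is an illustration only (the `classify` helper and its string encoding are assumptions, not the thesis's Prolog): it maps a user action to an interaction class and its prior, given the displayed value and the hypothesized true value of a random variable.

```python
# Priors for the eight interaction classes (Figure 5.7).
CLASS_PRIOR = {"ev": 0.15, "tv": 0.4, "iv": 0.4, "el": 0.05,  # displayed value correct
               "ec": 0.8, "wl": 0.05, "tl": 0.05, "il": 0.1}  # displayed value incorrect

def classify(displayed, true_value, action):
    """Mirror of the rules in Figure 5.5: choose the interaction class
    that explains `action`, given the displayed and true values."""
    if displayed == true_value:                 # displayed value is correct
        if action == true_value:   return "ev"  # explicit validation
        if action == "none":       return "tv"  # tacit validation
        if action == "other":      return "iv"  # implicit validation
        return "el"                             # explicit lie
    else:                                       # displayed value is not correct
        if action == true_value:   return "ec"  # explicit correction
        if action == "none":       return "tl"  # tacit lie
        if action == "other":      return "il"  # implicit lie
        return "wl"                             # weird lie

# System displayed "student"; user is hypothesized to be faculty and
# clicked "faculty": an explicit correction, with prior 0.8.
c = classify("student", "faculty", "faculty")
print(c, CLASS_PRIOR[c])  # ec 0.8
```

As in the Prolog rules, the eight branches are mutually exclusive, so exactly one class prior multiplies into the cumulative model probability per observed action.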
As an example of perceptual salience in the system, let the facts F consist of the rules shown in courier font in this chapter, and let the set of potential hypotheses H (the assumables) consist of the disjoint sets shown throughout this thesis. The initial, "empty" set of observations stems from the intent that underlies the presentation; in this case, the "show" predicate (see Figure 5.8) encapsulates the intent, and requires that certain assumptions be made. This is equivalent to saying that the initial set of observations Obs_I is {∃UT userType(UT) ∧ ∃G gender(G) ∧ ∃W geo(W)}.

Assume now that 1) the system has arrived at the model consisting of the assumptions that the user is a student, male, and local to the department (i.e., that UT = student, G = male, W = local), 2) the system has represented this model to the user via the user model window, and 3) the user has explicitly corrected the model by clicking on the Faculty button. The new observations Obs_N to be explained are: {action(userType, faculty), action(geo, other), action(gender, other)}. Referring to the rules, the observations Obs now to be explained are: Obs = Obs_I ∪ Obs_N.

The probability of the model before the user takes action is P_1 = P_student · P_male · P_local. After the user takes action, the following possibilities exist with respect to the user type variable: the user is a student, and is explicitly lying about being a faculty member; the user is a faculty member, and has explicitly corrected the system's error; the user is a staff member and is lying weirdly about being a faculty member.

With respect to the locality variable, the following conditions exist: the user is local, and has implicitly validated the system's assumption; the user is remote, and has implicitly lied by ignoring the system's false assumption.
With respect to the gender variable, the following conditions exist: the user is male, and has implicitly validated the system's assumption; the user is female, and has implicitly lied by ignoring the system's false assumption.

In the absence of other information or observations, the most likely model is the one in which the user is a faculty member, and has explicitly corrected the system's misattribution (and implicitly validated the system's locality and gender assumptions). The probability of this model is P_faculty · P_local · P_male · 0.8 · 0.4 · 0.4.

Here are all the possible explanations (user models) of the observations Obs. (The width of the probability band was set to 0.000005; see Section 5.4.2.)

    Recognition assumptions with Probability 0.024752 are:
    iv(gender) gender(male)
    iv(geo) geo(local)
    ec(userType) userType(faculty)

    Recognition assumptions with Probability 0.003332 are:
    iv(gender) gender(male)
    il(geo) geo(prospective)
    ec(userType) userType(faculty)

    Recognition assumptions with Probability 0.001092 are:
    il(gender) gender(female)
    iv(geo) geo(local)
    ec(userType) userType(faculty)

    Recognition assumptions with Probability 0.000572 are:
    iv(gender) gender(male)
    iv(geo) geo(local)
    el(userType) userType(student)

    Recognition assumptions with Probability 0.000147 are:
    il(gender) gender(female)
    il(geo) geo(prospective)
    ec(userType) userType(faculty)

    Recognition assumptions with Probability 0.000117 are:
    il(gender) gender(female)
    iv(geo) geo(local)
    el(userType) userType(student)

    Recognition assumptions with Probability 0.000077 are:
    iv(gender) gender(male)
    il(geo) geo(prospective)
    el(userType) userType(student)

    Recognition assumptions with Probability 0.000022 are:
    iv(gender) gender(male)
    iv(geo) geo(local)
    wl(userType) userType(staff)

    Recognition assumptions with Probability 0.000016 are:
    il(gender) gender(female)
    il(geo) geo(prospective)
    el(userType) userType(student)

    Recognition assumptions with Probability 0.000016 are:
    il(gender) gender(female)
    iv(geo) geo(local)
    wl(userType) userType(staff)

    Recognition assumptions with Probability 0.000003 are:
    iv(gender) gender(male)
    il(geo) geo(prospective)
    wl(userType) userType(staff)

    Recognition assumptions with Probability 0.000002 are:
    il(gender) gender(female)
    il(geo) geo(prospective)
    wl(userType) userType(staff)

If other evidence suggests that the user is, for instance, a student (e.g., the user's id is known already to the system to belong to a student), the system may end up calculating a new, "best" model which includes a weird lie or tacit lie, etc.

Note that the system reports the probability of a model M as a non-normalized prior P(M); this is adequate for the system's purposes because the value is used only to order models. In order to obtain a normalized posterior P(M|Obs), the system would have to calculate all models which explain the observations, to arrive at the sum P(Obs) as seen in Equation 5.1.
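As a quick arithmetic check, the normalized posteriors can be computed directly from the non-normalized priors listed above. This Python sketch (illustrative only; it uses the four best explanations and the stated total P(Obs) = 0.030148) should agree with Table 5.1 up to rounding.

```python
import math

# Non-normalized priors P(M) of the four most probable explanations
# listed above; P(Obs) is the sum over all twelve explanations.
priors = [0.024752, 0.003332, 0.001092, 0.000572]
p_obs = 0.030148

for p in priors:
    posterior = p / p_obs  # P(M|Obs) = P(M) / P(Obs)
    print(round(posterior, 6), round(-math.log(posterior), 6))
```

Dividing by a shared constant preserves the ordering of the models, which is why the system can get away with reporting only the non-normalized priors.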
    P(M|Obs) = P(M ∧ Obs) / P(Obs) = P(M) / P(Obs) = P(M) / Σ_i P(M_i)        (5.1)

This would place an impossible burden on the reasoner, which is currently asked to calculate only the best model, and not all models. Nonetheless, for purposes of illustration, the normalized posteriors for the current example are provided in Table 5.1; the value of P(Obs) is 0.030148. The best explanation is seen from this table to account for over 80% of the probability mass.

    P(M)        P(M|Obs)    -log(P(M|Obs))
    0.024752    0.821016    0.197212
    0.003332    0.110521    2.202546
    0.001092    0.036221    3.318108
    0.000572    0.018973    3.964735

    Table 5.1: Priors and normalized posteriors.

User Model: formal definition

Formally, a model is defined as follows:

Definition 4 — Model: A model of the user is an explanation R consisting only of recognition assumptions which explain observations Obs about the user: R ⊆ 𝓡, F ∪ R ⊭ ⊥, F ∪ R ⊨ Obs. The probability of a user model is the product of the probabilities of its elements: P(R) = Π_{r∈R} P(r), where we have assumed independence of recognition partitions [161]. The 'best' model is the one with the highest probability.

5.4.2 Designing Presentations

A single abductive reasoning engine is employed for both recognition of the user model, and for design of the presentation. Design and recognition are interleaved, in the sense that the rule being applied by the reasoner could call at any point for the assumption of either a design or a recognition assumable; a partial model and a partial design are accumulated until either the proof is complete, or it fails.

Various design decisions are made by the system in the course of reasoning. Just as models are defined by their constituent recognition assumptions and formally explain observations about the user, "designs" are defined by their constituent design assumptions, drawn from the set of design assumables, and formally explain the authorial intent.
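The symmetry between the two kinds of assumptions can be made concrete with a small sketch (Python, with hypothetical design costs; in the thesis's reasoner both quantities accumulate inside a single abductive proof): the recognition part of an explanation is scored by a probability product, the design part by a cost sum, and the two scores are kept separate.

```python
def score(explanation):
    """Score an explanation: the probability of its recognition
    assumptions (a product) and the cost of its design assumptions
    (a sum), kept as a pair rather than merged into one number."""
    p = 1.0
    for _, prior in explanation["recognition"]:
        p *= prior
    c = sum(cost for _, cost in explanation["design"])
    return round(p, 6), c

exp = {
    # recognition assumptions with their priors (cf. Figure 5.3)
    "recognition": [("userType(faculty)", 0.35), ("geo(local)", 0.65)],
    # design assumptions with their costs (hypothetical values)
    "design": [("different_topic_cost", 0), ("boredByView", 200)],
}
print(score(exp))  # (0.2275, 200)
```

Keeping the pair separate is what later allows the lexicographic preference over explanations: probability first, cost second.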
An example is the use of design assumptions to induce a preference by the system for multiple topics rather than a single presentation topic. This preference can be induced by forcing the system to make an assumption whose cost depends upon the relationship between elements in the presentation. Specifically, the cost of the assumption is greater when the two elements are selected in support of a single topic than when they support multiple topics; the system currently values diversity over emphasis, because of the domain in which it is being used: Valhalla tries to provide interesting overviews of the department rather than in-depth detail. In conjunction with the show intent described earlier and shown in its entirety in Figure 5.8, the following design assumables have the desired effect.

    disjoint([different_topic_cost:0, same_topic_cost:100]).

Another example is how the system arbitrates between showing or not showing scenic shots of Vancouver to users who are local or remote. Here is how a preference for showing scenic views to prospective (remote) department members can be encoded. The following rule licenses the showing of a scenic clip for the "right" reasons (i.e., the user is thought by the system to be prospective):

    scenic_clip(prospective, Pin, Pout, LenIn, LenOut) <=
        editList(Pin, Pout, mountains, LenIn, LenOut),
        Pin \== Pout.  % make sure mountain clips are non-nil

Pin is the presentation thus far designed; Pout contains the new mountain clip. The editList predicate makes relevance assumptions while trying to instantiate a clip (or sequence of clips) about the requested subject, in this case mountains. LenIn and LenOut are, respectively, the length of the presentation before and after the addition of the scenic clip.
The following rule licenses the omission of the scenic clip for the "right" reasons (i.e., the user is thought by the system not to be prospective):

    scenic_clip(Geo, P, P, L, L) <=
        Geo \== prospective.  % no mountains here

These two rules capture what are in some sense the "right" actions for the system to take; it is also possible for the system to take other design actions, but the "right" ones should be preferred. This is accomplished by forcing the system to make (relatively costly) design assumptions in order to take these alternative branches of the search tree.

The following rule models the cost of showing a scenic clip for the "wrong" reasons. It forces a big-cost assumption, at a cost of 200; this is the "nuisance factor" or cost that the system attributes to local users who are shown scenic views they could get by just looking out their windows:

    scenic_clip(Geo, Pin, Pout, LenIn, LenOut) <=
        % extra cost of mountain clip to non-prospective types
        boredByView(200),
        Geo \== prospective,
        editList(Pin, Pout, mountains, LenIn, LenOut),
        Pin \== Pout.  % make sure mountain clips are non-nil

Similarly, the next code fragment models the cost of omitting the clip for users who are not local, and who might have benefitted from seeing some nice scenery.

    scenic_clip(prospective, P, P, L, L) <=
        % extra cost of no mountains for prospective types
        boredByNoView(150).

The mechanics for inducing the appropriate design costs are included in the last code fragment, only for completeness:

    disjoint([boredByView(X):X, boredByNoView(X):X]).

It would not be difficult to generate such rules automatically from a table of preferences, such as the one shown in Table 5.2.

                        Scenic    ¬Scenic
    geo(local)          200       0
    geo(prospective)    0         150

    Table 5.2: Myopic Tradeoff Table

We call Table 5.2 the Myopic Tradeoff Table.
It captures the formulation of preference alluded to in this section; the term myopic is used to emphasize that the system does not have a complete table of utilities, but uses this approach as a "greedy" discriminator between competing design elements. Such myopic strategies often work well in practice [176, p490]. The table can be constructed for disjoint sets of any cardinality, and for any number of design components; here we present only the myopic tradeoff table for the disjoint set {(geo(local), p), (geo(prospective), 1 − p)} and the presentation element Scenic.

The cost of showing a scenic clip to a local user (who can look out the window at the mountains any time) is 200; the cost of failing to show such a view to a remote user is 150; there is no cost associated with showing the scenic view to a remote user, or with omitting the scenic view for local users. These costs influence the final proof in the usual sense that lower cost designs are preferred to higher. The overall weight of an assumption, as always, is proportional to its magnitude; if a dimension is very important in the current domain, the cost magnitudes of its assumptions should be chosen to be greater than the costs of assumptions on other dimensions. This suggests that the costs of design assumptions should not be chosen in isolation, but according to some ranking of relative importance. Such a ranking is not likely to be known a priori but will more likely emerge from iterative refinement of the knowledge base, as has been the case with the implementation under review here.

Search Strategy

The prototype referred to in this thesis employs a Prolog meta-interpreter which implements an iterative deepening search strategy wherein first the probability bound on models and then the cost bound on designs is adjusted to yield solutions in desired probability and cost ranges, or bands.
The width of these bands can be adjusted to yield desired precision, trading execution time for precision. In classical iterative deepening search [172], the tree is searched depth-first one level deeper on each iteration. The approach combines the space utilization advantages of depth-first search with the characteristic of breadth-first search that, if there is a solution, it will be found; in addition, because shorter paths are searched before longer ones, if there are multiple solutions they will be found in ascending order.

The nodes in the search tree of the application are connected by arcs which can be labelled with the probability factor or cost increment incurred in taking that branch. The cumulative cost of a partial proof is measured in terms of the probability of the partial user model M_p and the cost of the partial design D_p. The pair (P(M_p), C(D_p)) characterizes the partial proof currently being evaluated by the system: P(M_p) is given by the product of all probability terms on the path through the search tree to the current node (i.e., the product of the probabilities of all recognition assumptions made thus far), and C(D_p) is given by the sum of all cost terms on the path through the search tree to the current node (i.e., the sum of the costs of all design assumptions made thus far).

On the first iteration, the system searches for low cost designs and models with high probability; this is accomplished by setting the probability band to be between 1.0 and (1.0 − p), where p is the width of the probability band, and setting the cost band to be between 0 and c, where c is the width of the cost band. Any partial proofs being considered whose cumulative probabilities drop below the probability band are discarded (they fail in the Prolog sense). Similarly, any partial proofs being considered whose cumulative costs exceed the cost band are discarded.
Any successful proofs (i.e., those where 1.0 ≥ P(M) ≥ 1 − p and 0 ≤ C(D) ≤ c) are reported in the order in which they are found. When all possible proofs within both bands have been considered, the cost band is advanced and the search mechanism re-engaged so that only proofs whose cost meets the condition c < C(D) ≤ 2c are reported. This process is continued until all proofs have been considered, whereupon the probability band is advanced and the cost band is re-initialized so that only proofs whose probability and cost meet the condition (1 − p > P(M) ≥ 1 − 2p) ∧ (0 ≤ C(D) ≤ c) are reported. The control structure is essentially a nested loop, where the probability band is decremented and the cost band is incremented as the outer and the inner loops, respectively, as in the following algorithm:

    repeat
        pmax = 1 ; pmin = pmax - deltaP ;
        repeat
            cmin = 0 ; cmax = deltaC ;
            repeat
                iterative-deepening-search(pmin, pmax, cmin, cmax,
                    pfail, cfail, fail, Model, Design,
                    intent(Presentation)) ;
                report(Model, Design, Presentation)
            until cfail
            cmin = cmax ; cmax = cmax + deltaC ;
        until pfail
        pmax = pmin ; pmin = pmin - deltaP ;
        cmin = 0 ; cmax = deltaC ;
    until fail

This control structure determines the order in which the solutions are found, and can be tuned to effect a tradeoff between precision and time; the narrower the band(s), the more reliable the ordering of the solutions. For instance, if there are five proofs with design costs {6, 7, 8, 8, 9}, and the width of the cost band c is 5, all of these solutions will be found on the second iteration of the cost loop, but not necessarily in best-first order. On the other hand, if c = 1, the solutions are guaranteed by the algorithm to be found in increasing order of cost, but at considerable added computation (many more iterations, or passes through the search space, are required).
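The nested-band control structure can be sketched over a precomputed list of candidate solutions (a Python illustration; the solution list, the `band_order` name, and the band parameters are invented for the example) to show the enumeration order the bands induce, including the point above that solutions inside one cost band are not necessarily reported best-first.

```python
def band_order(solutions, delta_p, delta_c, max_cost):
    """Enumerate (probability, cost) solutions band by band: the outer
    loop lowers the probability band, the inner loop raises the cost
    band, mirroring the nested-loop algorithm above."""
    reported = []
    pmax = 1.0
    while pmax > 0:
        pmin = pmax - delta_p
        cmin = 0.0
        while cmin < max_cost:
            cmax = cmin + delta_c
            for p, c in solutions:
                if pmin < p <= pmax and cmin <= c < cmax:
                    reported.append((p, c))
            cmin = cmax
        pmax = pmin
    return reported

# Five designs with costs {6, 7, 8, 8, 9} at two model probabilities:
sols = [(0.9, 8), (0.9, 6), (0.5, 7), (0.9, 8), (0.5, 9)]
# With a cost-band width of 5, the 0.9-probability band is exhausted
# (costs 8, 6, 8 -- not best-first!) before the 0.5 band (7, 9).
print(band_order(sols, delta_p=0.25, delta_c=5, max_cost=20))
```

Narrowing `delta_c` to 1 would force each band to contain at most one cost value, recovering a strict best-first order at the price of more passes.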
The iterative deepening meta-interpreter approach ensures that the first solution to be found is the lowest cost design for the most probable model. Note that setting the initial probability bound to zero results in getting the explanations in an order that depends only upon the design costs. Setting the initial cost bound to infinity results in an order that depends only upon the recognition probabilities.

The underlying representation is compatible with an existing implementation of Poole's Probabilistic Horn Abduction Framework, which maintains an ordered queue of partial proofs [161]; in that implementation, partial proofs are not discarded, but suspended and queued when some other partial proof becomes preferred. Some small changes are required to the existing implementation of Poole's queuing mechanism before it can accommodate the separation of recognition and design assumptions advanced in this thesis; future prototypes may include these modifications.

Intent: formal definition

Intent-based authors are free to deploy the full power of the underlying representation language to specify their intents. Formally, an intent is defined as follows:

Definition 5 — Intent: An intent I(Pr) is a predicate over presentations which is true when the presentation Pr satisfies the author's intent.

For instance, the intent to convince the user might be captured in a convince predicate that encodes an argument structure consisting of an introduction, a body, and a summary (each of these with its own rules). As an example of the specification of an intent, see the more detailed description of the show intent provided in Figure 5.8.
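Definition 5's view of an intent as a predicate over presentations can be illustrated in procedural terms as well. This Python sketch is an assumption-laden illustration (the presentation encoding and the `show_intent` name are invented, loosely following the show intent of Figure 5.8): it accepts a presentation only if it opens with an intro, covers two distinct topics, and fits the available time.

```python
def show_intent(presentation, available_time):
    """True when the presentation satisfies a show-style intent:
    an introductory clip, at least two distinct topics, and a total
    length within the viewer's available time (cf. Figure 5.8)."""
    clips = presentation["clips"]  # ordered list of (topic, length) pairs
    topics = [t for t, _ in clips]
    total = sum(length for _, length in clips)
    return (bool(clips)
            and topics[0] == "intro"
            and len(set(topics) - {"intro"}) >= 2  # two distinct topics
            and total <= available_time)

pres = {"clips": [("intro", 30), ("research", 60), ("sports", 45)]}
print(show_intent(pres, available_time=180))  # True
```

In the thesis the same role is played by the show predicate, whose body both constrains the presentation and triggers the recognition and design assumptions needed to satisfy it.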
Design: formal definition

Formally, a design is defined as follows:

    show(P) <=
        % UserType is instantiated in the following line:
        interested(UserType, Topic1),  % a first topic
        interested(UserType, Topic2),  % a second topic
        % both Topic1 and Topic2 interest users of type UserType
        % Topic1 and Topic2 are not the same:
        different_topic(Topic1, Topic2),
        % make a bunch of assumptions about the user and then find
        % model-specific topics to present...
        gender(Gender),
        geo(Geo),
        availableTime(UserType, Time),
        % accumulate an edit-list
        editList([], P1, intro, 0, L1),    % an introductory subsequence
        editList(P1, P2, Topic1, L1, L2),  % sequences for first and
        editList(P2, P3, Topic2, L2, L3),  % second topics
        % show an interview with someone of the same gender,
        % in the same area or orientation
        interview_clip(X, P3, P4, L3, L4), % interview
        gender(X, XGender),
        same_gender(XGender, Gender),
        % also, if the user is not local, show interesting Vancouver
        % mountains; otherwise, skip it
        scenic_clip(Geo, P4, P, L4, L),
        costLength(L, Time).

    Figure 5.8: Show: an example intent

Definition 6 — Design: A design (given a model R) is an explanation D of I(Pr) consisting of only design assumptions: D ⊆ 𝒟, F ∪ R ∪ D ⊭ ⊥, F ∪ R ∪ D ⊨ ∃Pr I(Pr). The cost of a design is the sum of the costs of the design assumptions of which it is composed: μ(D) = Σ_{d∈D} μ(d). The 'best' design for model R is the least cost design that is consistent with R.

A design "produces" a presentation of information (e.g., graphs, video, text). Note that different designs may produce the same presentation (there may be different reasons for presenting the same information). The best presentation in the context of model R is the presentation Pr supported by the least cost design that is consistent with R.

Definition 7 — produces(D, M, Pr): We define for notational convenience the relation produces(D, M, Pr), which is true when design D and model M lead to presentation Pr as described here, i.e., produces(D, M, Pr) ≡ F ∪ M ∪ D ⊭ ⊥ ∧ F ∪ M ∪ D ⊨ I(Pr).

The partitioning of H partitions each explanation of Obs ∧ ∃Pr I(Pr) into a model and a design component which we denote as (R, D). We define a preference relation ≻_p over explanations such that:

    (R1, D1) ≻_p (R2, D2) iff (P(R1) > P(R2) or (P(R1) = P(R2) and μ(D1) < μ(D2)))

This results in a lexicographic ordering of explanations. So, the best explanation consists of the most plausible model of the user and the lowest cost design.

Note that the design D which explains presentation Pr in I(Pr) for some model R is not necessarily generated for some other model R', i.e., there is no reason why produces(D, R, Pr) should imply produces(D, R', Pr). The logic is a means of "weeding out" incoherent designs, and hence presentations.

This separation of model recognition and presentation design assumables also results in interesting ramifications for the way presentations are chosen; in particular, using ≻_p means that we do not give up good models for which we can find only bad designs. For instance, consider the case where we have disjoint assumables (student, P_s) and (faculty, P_f), where P_s ≫ P_f, but the lowest cost design in the context of a model that assumes the user is a student is greater than the one in the context of a model that assumes the user is a faculty member (i.e., μ(D_best | ···student···) ≫ μ(D_best | ···faculty···)). We do not give up the assumption that the user is a student; the reasons for deciding in favor of student are not affected by the system's inability to find a good (low-cost) presentation.
This behavior is a direct consequence of the methodological separation of design from recognition assumables. Were there only a single space of assumables, the system would not be able to make these distinctions, and would simply select the best model/design combination according to the ordering metric.

5.5 Interactivity and Scrutable Models

One criticism of systems which attempt to model their users is their inscrutability to these users (see, for instance, Cook and Kay [52] and Orwant [154]). If the system acts in an infelicitous manner which does not meet the needs of the user, and where this action results from an error in the user model, a desirable property of the system is scrutability.

Users should also be given the means by which to express their dissatisfaction with a presentation (or with elements of a presentation), always with an eye to providing users with better presentations. This interaction paradigm supports both the scrutability and the dissention conditions described above.

In calculating a model that represents the user, the system makes observations upon which further reasoning is conditioned. The user is provided with direct access (via a graphical user interface) to salient elements of the user model, as well as with the ability to criticize these elements. It is this interaction which is used to evolve the user model from a basic model initially derived from such information as the location of the user's terminal, login id, Internet domain, and so on.

Sensitivity Analysis

A problem with displaying the model to the user in support of scrutability is that the model may be huge. Certainly, there may be too many recognition assumptions to be effectively displayed on a screen, and far too many for quick assimilation by users. This cognitive comprehension task for the user is characterized by focused, serially-directed visual attention.
Research in psychology has shown that the time taken for such tasks is proportional to the number of objects in the visual field [193]. A solution is to display a salient subset C of R, to which we refer as the salient partial model, consisting of those assumptions which have the greatest effect on the design (and therefore upon the presentation as well). The cardinality of C can be varied to maintain an appropriate pace of interaction; if the user is taking too long to evaluate the alternatives, the number of alternatives in the next iteration can be reduced. In addition, since the alternatives can be ranked, [graphical] display techniques can be used to render this ordering to the viewer.

In this section we describe the sensitivity analysis currently used by the system to select the critical recognition assumptions. In the following, let R = {r₁, r₂, ..., rₙ} represent the currently most plausible model, D the best design that is consistent with R, and Pr the presentation generated. We require a total ordering of the assumptions rᵢ ∈ R by which to rank these assumptions for display to the user. Some useful notation follows.

Definition 8 — Cₚ(Pr, a): Let Cₚ(Pr, a) be the (lowest) cost of the (best) design that produces presentation Pr in the context of some model M that contains assumption a, i.e.,

    Cₚ(Pr, a) = min_{D, M : a ∈ M ∧ produces(D, M, Pr)} C(D)

To sort the rᵢ ∈ R from most to least influential, we now need an expression, parameterized by the assumptions rᵢ ∈ R, which we can use as a measure of the influence of rᵢ on the cost of D; we want to know how much of a mistake we would be making if rᵢ does not correctly model the user, e.g., if we have assumed that the user is a faculty member whereas in fact the user is a graduate student.
An expression that serves this purpose is defined as follows:

Definition 9 — Sensitivity: If produces(D, R, Pr), then the sensitivity of Pr to an assumption r ∈ R is the expected cost of the presentation Pr (to users who are incorrectly modelled by r). Let J be the disjoint set {h₁, h₂, ...} that contains r:

    S(r, Pr) = Σ_{hᵢ ∈ J} Cₚ(Pr, hᵢ) P(hᵢ)

Definition 9 does not consider all possible alternatives to R; if more than one rᵢ ∈ R do not correctly model the user, this formalism will not be able to diagnose the multiple fault. This "myopic evaluation" [176, p490] is employed instead of joint sensitivity analysis, which would consider multiple simultaneous errors in the model, because: 1) single faults are more likely than multiple faults (i.e., if p < 1, then pⁿ < p), 2) accounting for multiple faults is exponential in the size of the model whereas the myopic approach is linear, and 3) the effects of multiple faults would be very difficult for users to track, resulting in a huge cognitive burden that compromises the scrutability requirement.

The assumptions in the model R are sorted on the sensitivity of the presentation to each of the assumptions (i.e., to each of the rᵢ ∈ R for all i). The salient partial model C = {c₁, c₂, ..., c_k}, referred to above, consists of the first k assumptions in this sorted list. These are the assumptions to which the design, given the current user model, is the most sensitive.

In the interaction paradigm related here, the user is shown the assumptions in C, in the context of the disjoint sets to which they belong; i.e., where cᵢ = (Nameᵢ, Pᵢ), the system displays Nameᵢ along with all the names of the assumables in the disjoint set to which cᵢ belongs. For example, if Nameᵢ = faculty and some J = {(faculty, 0.6), (student, 0.3), (staff, 0.1)}, the user sees the set of names faculty, staff, student with the actual assumption highlighted. The number of hypotheses displayed, k, is set to some small integer like 5 or 7.
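Definitions 8 and 9 and the top-k selection can be sketched as follows. The function cp stands in for Cₚ(Pr, h), i.e., the lowest cost of a design that still produces Pr when h replaces r in the model; all names, probabilities, and costs below are invented toy data, not the system's actual values.

```python
# Myopic sensitivity analysis (a sketch of Definitions 8 and 9).

def sensitivity(r, pr, disjoint_set, cp):
    """S(r, Pr) = sum over h in r's disjoint set of C_p(Pr, h) * P(h)."""
    return sum(cp(pr, h) * p for h, p in disjoint_set)

def salient_partial_model(model, pr, disjoint_sets, cp, k):
    """The salient partial model C: the k assumptions with highest sensitivity."""
    ranked = sorted(model,
                    key=lambda r: sensitivity(r, pr, disjoint_sets[r], cp),
                    reverse=True)
    return ranked[:k]

# Toy data: two disjoint sets; cp is a lookup into invented re-design costs.
disjoint_sets = {
    "userType(student)": [("userType(student)", 0.4), ("userType(faculty)", 0.6)],
    "geo(local)": [("geo(local)", 0.7), ("geo(prospective)", 0.3)],
}
costs = {("p1", "userType(student)"): 51, ("p1", "userType(faculty)"): 80,
         ("p1", "geo(local)"): 51, ("p1", "geo(prospective)"): 200}
cp = lambda pr, h: costs[(pr, h)]

print(salient_partial_model(["userType(student)", "geo(local)"], "p1",
                            disjoint_sets, cp, k=1))
```

With these numbers the geographic assumption dominates (sensitivity 95.7 versus 68.4), echoing the worked example later in this chapter, where the presentation is most sensitive to the locality assumption.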
In sample implementations so far, simple text is attached to radio buttons, so that users see not only the assumption that was made, but also the other members of the disjoint set to which the assumption belongs; a single click on a radio button instructs the system to recalculate the user model with the new assumption. See Section 5.6 for a description of the GUI by which C is communicated to the user.

5.6 Example

The following example is taken directly from the prototype implementation described in the next chapter; specifically, the reasoning and modelling strategies described in this thesis have been embedded in an application called Valhalla, an automatic video presentation tool that uses the reasoning strategies we have described to select, order and play video segments from the Departmental Hyperbrochure described in Section 1.4. This application uses the authorial intent called "show," the predicate presented in Figure 5.8.

In this example, the system was started from scratch after the user had logged in with the personal user id, csinger. The user immediately requested a presentation from the system, which responded as follows (comments have been added for clarity):

The initial model, with likelihood 0.029494, is composed of the following recognition assumptions:

    married(yes)               the user is married
    orientation(theory,y)      orientation: theoretical
    geo(local)                 is local to region
    area(ai,y)                 research area is ai
    c_gender(male,student)     is student, and consequently male
    userType(student)          is student

These are based upon prior probabilities and other contextual information such as the user's login id. At UBC, we consult a local personnel database to make a best guess about the user's membership in a variety of categories; this database informs us, for instance, that user csinger is a graduate student, and that user poole is a faculty member.
Using this source to justify the assumption that a user with a certain login id belongs either to the class faculty or student is reasonable, but not infallible. User csinger might have logged in a visiting faculty member, for instance, so that he or she could use Valhalla to learn more about the department while Csinger attended to an unavoidable interruption in the meeting they were having.

The best design based upon this model "costs" 51 and is composed of the following design assumptions:

    overLength(25)                        amount by which the presentation exceeds the optimal presentation time
    rel(topic(lci),topic(research),20)    cost of relevance assumption: lci is relevant to research
    rel(video(sports),topic(sports),5)    cost of relevance assumption: a video clip about sports is relevant to the topic of sports
    rel(topic(introduction),intro,1)      cost of relevance assumption: a clip called intro is relevant to the topic "introduction"
    different_topic_cost                  the topics included are not identical

Associated with each design assumption is a cost. The cost of the assumption different_topic_cost, for instance, is less than the cost of the assumption same_topic_cost, from the same disjoint set; this arrangement prefers presentations with multiple topics over those with single topics. Some costs are context dependent: the cost of the overLength assumption is proportional to the amount by which the length of the current presentation exceeds the optimal length for a user of this type.

The corresponding presentation P₁ is:

    (0:1:26,0:2:27,topic(introduction)),
    (0:24:44,0:25:3,video(sports)),
    (0:12:15,0:15:54,topic(lci)),
    (0:31:54,0:32:44,interview(jim_kennedy))

Specifically, P₁ consists of four clips from the video archive with the indicated absolute time-codes and topic identifiers.
The user at this point indicates dissatisfaction with the current presentation, perhaps after viewing part of it via the Virtual VCR interface, by clicking on the button provided for that purpose. The system interprets this action as a request for another presentation using the same user model. The next best design based upon this model "costs" 56 and is composed of the following design assumptions:

    overLength(35.0)
    relevant(topic(lci),topic(research),20)
    relevant(topic(introduction),intro,1)
    different_topic_cost

The corresponding presentation is:

    (0:1:26,0:2:27,topic(introduction)),
    (0:12:15,0:15:54,topic(lci)),
    (0:31:54,0:32:44,interview(jim_kennedy))

The costs of the two designs differ because there is a preference built into the system for presentations of an optimal duration, defined by the user's membership in certain categories. The user can now navigate through the presentations with a virtual VCR interface.

Which assumptions are displayed in the user model window? This is determined by the sensitivity analysis algorithm, the results of which are shown in Table 5.3; this table indicates sensitivity calculation results for the first model and design pair described in this example. The second column shows the calculated sensitivity for the assumption in the first column.

    rᵢ                        S(rᵢ, P₁)
    geo(local)                104
    c_gender(male,student)     96
    userType(student)          61
    area(ai,y)                 51
    orientation(theory,y)      51
    married(yes)               51

    Table 5.3: Results of Sensitivity Analysis before user action

The presentation is most sensitive to the assumption that the user is local, and completely insensitive to his research area or orientation, or to his marital status (for these variables, S(r, P₁) = C(D) = 51; in other words, the cost of the original best design is the same as the cost of designs induced by the sensitivity metric with respect to these assumptions in the context of presentation P₁).

Because the assumptions are only hypothetical, provision must be made for their revision, both in terms of the underlying reasoning engine, and in terms of a mechanism whereby such revised data can be acquired.
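The two design costs above, 51 and 56, are consistent with simple summation if one reads overLength(x) as contributing x, each relevance assumption as contributing its stated cost, and different_topic_cost as contributing 0 in this example; this reading is our inference from the numbers, not something the listings state explicitly.

```python
# Design cost as simple summation over design assumptions (inferred reading:
# overLength(x) contributes x, and different_topic_cost contributes 0 here).

def design_cost(assumptions):
    return sum(cost for _, cost in assumptions)

first_design = [("overLength", 25),
                ("rel(topic(lci),topic(research))", 20),
                ("rel(video(sports),topic(sports))", 5),
                ("rel(topic(introduction),intro)", 1),
                ("different_topic_cost", 0)]

second_design = [("overLength", 35.0),
                 ("relevant(topic(lci),topic(research))", 20),
                 ("relevant(topic(introduction),intro)", 1),
                 ("different_topic_cost", 0)]

print(design_cost(first_design), design_cost(second_design))
```

Under this reading the totals come out to 51 and 56, matching the costs reported for the two designs.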
In particular, in support of the scrutability desideratum, users can interact with the system to validate or correct the salient assumptions of the model. Valhalla gives the user the means by which to validate or correct assumptions that the system has made by displaying critical assumptions in what we call the user model window, an instance of which is shown in Figure 6.5.

Returning to our example, the user informs the system that he is not, in fact, local, but a prospective student, by clicking on the appropriate radio button. The system then re-calculates, with the result that the new model, with probability 0.045375, is:

    married(yes)
    orientation(theory,y)
    area(ai,y)
    c_gender(male,student)
    userType(student)
    geo(prospective)        # user is not from the region

The new design, with cost 137, is:

    relevant(topic(ubc_scenic),mountains,1)
    relevant(topic(introduction),intro,1)
    different_topic_cost

The corresponding presentation P₂ is:

    (0:1:26,0:2:27,topic(introduction)),
    (0:18:17,0:28:27,topic(ubc_scenic))

    rᵢ                        S(rᵢ, P₂)
    geo(prospective)          267
    c_gender(male,student)    242
    userType(student)         237
    area(ai,y)                237
    orientation(theory,y)     227
    married(yes)              227

    Table 5.4: Results of Sensitivity Analysis after user action

The contents of the user model window depend again upon the results of the sensitivity analysis, shown in Table 5.4, where the second column shows the calculated sensitivity for the assumption in the first column.

In presentation P₁, a preference for multiple topics is in evidence. This preference stems from forcing the system to make an assumption whose cost depends upon the relationship between elements in the presentation.
The system currently values diversity over emphasis, as described in Section 5.4.2.

In presentation P₂, the sensitivity to the assumption of locality is demonstrated. When the user was assumed to be local (in P₁), there was no clip in support of scenic views of UBC, but when the user is assumed (in this case because the user actually informed the system) to be prospective and therefore not local, a scenic clip is included. Note that this clip is included at the expense of the earlier research and sports clips; the system currently places a greater value on showing UBC's scenic merits to prospective department members than on any other design element.

5.7 Alternative Approaches

5.7.1 Decision Theory for Multimedia Presentation Design

Here we explore how decision theory could be used in the service of the stated goals of intent-based authoring, and argue that expected value (see Section 4.2 for background) is probably not the right approach. In this section we focus on the interaction paradigm in support of what has been referred to in this thesis as scrutability: a scrutable system is one that makes clear to users the relationship between the model of the user maintained by the system and the behavior of the system. The means by which scrutability is achieved in the system is an approach to interaction that permits users to critique the user model in pursuit of better system behavior. An important element of this approach is the approach to sensitivity analysis, described in Section 5.5.

Cheeseman [40] discusses the use of decision theory for design, and has suggested that the best design is the one that results from averaging over all possible models and maximizing expected utility or minimizing expected cost; in Equation 5.2 the expected value of the cost function C over designs is to be minimized, and the best design is the D for which E(C(D)) is minimal.
    E(C(D)) = μ(D|M₁)P(M₁) + μ(D|M₂)P(M₂) + ··· + μ(D|Mₙ)P(Mₙ)    (5.2)

Here μ(D|M) is the cost of design D in the context of model M, and P(M) is the posterior probability of model M (refer to Section 5.4 for formal definitions of the terms "model" and "design," which are used informally here).

There are reasons in the current application why this may be the wrong thing to do. For example, consider the following scenario. Assume, for simplicity, that users of the Departmental Hyperbrochure (see Section 1.4) can be only faculty members with prior probability 0.6 or students with prior probability 0.4, respectively, and that the following costs are known: μ(sports|faculty) = 10, μ(sports|student) = 5, μ(research|faculty) = 5, μ(research|student) = 15. Assume also that a current user of the system is a faculty member, that the system has assumed that the user is a faculty member, and that this assumption is clearly indicated to the user. Then E(C(sports)) = 8, and E(C(research)) = 9. There is no indication to the user why the former is chosen for presentation over the latter, and the user may wonder why he is watching an irrelevant basketball game. The faculty member using the system may not be aware of the high degree of aversion on the part of students to research, a value not shown to the user (because there may be a very large number of such values). The behavior of the system is inscrutable. It may be better to make a mistake and admit it, giving the user the means to correct the contextual error, rather than average over all possible mistakes.
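The expected-cost figures in this scenario follow directly from Equation 5.2 and can be checked with a few lines of arithmetic; the priors and costs are exactly those stated above.

```python
# E(C(D)) = sum over models M of mu(D|M) * P(M), for the two-model scenario.

priors = {"faculty": 0.6, "student": 0.4}
mu = {("sports", "faculty"): 10, ("sports", "student"): 5,
      ("research", "faculty"): 5, ("research", "student"): 15}

def expected_cost(design):
    return sum(mu[(design, m)] * p for m, p in priors.items())

print(expected_cost("sports"), expected_cost("research"))  # 8.0 9.0
```

Since 8 < 9, averaging over models picks the sports clip even for the (correctly identified) faculty member, which is exactly the inscrutable behavior criticized in the text.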
There are exceptions, and it is not difficult to conjure counterexamples under slightly different assumptions from the ones that guide our own effort. Consider a scenario where it is possible to present material that is known to be highly offensive to a particular category of user, but which is known to be greatly appreciated by another; pornography is one such delicate issue today, and policies are under review at academic departments around the globe. Another example: one would not want to present a viewer with sensitive corporate data on the assumption that he or she is a joint-venture partner, only to find out later that the viewer is from a competitor's firm.

The system calculates the most likely user model, and then uses it to design the best presentation. Refer to Section 4.1.1 for background material about the formalism.

5.7.2 Costs and Utilities

This section is a discussion of how the values associated with the design assumables should be interpreted. These quantities can be interpreted in a number of implementation-dependent ways: a value could be an estimate, for instance, of how hard it is for the system (from a computational point of view) to realize the design element, or of how much cognitive or perceptual effort (the system thinks) will be required from a human to comprehend some manifestation of the design element. The values are used to constrain choices in the design space (for instance, to prefer some alternatives over others, as in the example provided in Section 5.6); this interpretation sees the values as a measure, attributed by the system on behalf of the user, of how relevant a design element is to the user modeled by the current user model.

The investigation in this thesis has been formulated in terms of a system which minimizes positive design cost; this approach encourages designs which are "minimal" with respect to the number of design assumptions made.
All costs associated with design assumables are positive, and the function used to accumulate total cost is simple summation, which means that the total cost function is monotonic in the sense that it can only increase as additional assumptions are made over the course of the proof. It is also "unbiased" in the sense that it is possible only to express relative preference (see Expressiveness, below). Another possibility would be to implement the dual system, which maximizes positive utility; this approach would produce designs which are "maximal" in the number of design assumptions. These distinctions are important because they affect the expressiveness of the underlying knowledge-representation language, as well as the computational complexity of the implementation that is built with it.

Expressiveness: It is not the same thing to say that 'presentations that do not depict sports events are preferred over those which do' and to say 'do not show sports events.' The first can be expressed with a system that maximizes positive utility by setting the utility of sports events to be lower than any other utilities, or with a system that minimizes positive cost by setting the cost of sports events to be higher than the costs associated with presenting other kinds of events. The second requires an ability to express negative bias ('user hates sports'), which cannot be represented in a system that uses a monotonic utility function; making additional assumptions can only increase cumulative cost (or decrease utility) and decrease cumulative probability.
Complexity: The aforementioned expressive power is given up in favor of monotonicity (the term "monotonicity" is used here not in the logical sense of non-monotonic reasoning, but in the classical mathematical sense of a function whose derivative does not change sign), because non-monotonic utility functions would require the system to generate the entire search tree before reporting the best solution: branches would have to be followed to their leaves because arcs could be labelled with negative values, and all branches would have to be followed before it could be known which of them lead to the best design. (Partial proofs would do the system no good at all.) As the search space may be very large, this would be an unacceptable strategy in our presentation domain, which currently benefits from a best-first search strategy (see Section 5.4.2).

These issues cannot be simply dismissed by arguing that the system could shift the scale of costs or utilities associated with the design assumables so that the lowest cost or the lowest utility is set to zero. This compromise strategy, adopted to permit the expression of negative bias while retaining favorable complexity parameters, introduces complications of its own. Consider, for instance, a candidate design with a "research" component (with utility of 100) and ten "sports" components (each with utility of −10); such a design would have a net utility of zero, and any other design with a positive utility would be preferred to it. For instance, a candidate design with a single "entertainment" component with total utility 5 would be preferred. Assume also that there are no design assumables in the database with associated utilities that are "worse" (less than) −10.
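The reason monotone (non-negative, additive) costs permit best-first search can be made concrete with a toy proof tree: the accumulated cost of a partial proof is a lower bound on the cost of any of its completions, so the first complete design popped from a priority queue is already optimal. The tree, node names, and arc costs below are invented for illustration.

```python
import heapq

# Best-first search over a toy proof tree with non-negative arc costs.
# Leaves stand for complete designs; interior nodes are partial proofs.
children = {"root": [("a", 10), ("b", 3)],
            "a": [], "b": [("c", 5), ("d", 60)],
            "c": [], "d": []}
complete = {"a", "c", "d"}

def best_first(root):
    frontier = [(0, root)]  # (accumulated cost, node)
    while frontier:
        g, node = heapq.heappop(frontier)
        if node in complete:
            # Safe to stop here: no unexplored partial proof can ever get
            # cheaper than g, because costs only accumulate.
            return node, g
        for child, cost in children[node]:
            heapq.heappush(frontier, (g + cost, child))

print(best_first("root"))  # ('c', 8)
```

With a negative arc cost anywhere in the tree, the early-stop guarantee in the comment would no longer hold, which is precisely why the text trades away negative bias for monotonicity.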
If the assumables in the database were preprocessed by the system as discussed above, the assumption associated with the "research" component would now have a utility of 110, and the assumption associated with the "sports" component would have a utility of zero. The candidate design would now have a net utility of 110, and would now be preferred to the "entertainment" candidate, which is now worth 15. Such changes in design preference would be difficult for knowledge engineers and intent-based authors to foresee, and so this scaling remedy does not appear to be a good solution. Again, the ability to explicitly express bias is given up in favor of mathematical monotonicity, which supports the best-first search strategy.

5.7.3 Other similar approaches

Wu [200] uses passive recognition techniques to build a model of a dialog partner, but decision-theoretic measures are used to decide when the system should intervene to make direct inquiries of the user. The system is said in such an event to have adopted an active acquisition goal; these AAGs are isomorphic to the salient set of assumptions described above, in the sense that they are the ones judged by the system, using decision-theoretic measures, to be the (only) ones with which it might be worth distracting the user. The approach taken in this thesis differs significantly, however, in that the user can continue to ignore the user model window, and in that even this inaction can be used as a basis for drawing further conclusions.

5.8 Conclusions

Guided by a desire to build cooperative systems, a way was sought to acquire, represent, and exploit simple models of users. The models are acquired with an abductive reasoning strategy, and consist of assumptions about the user, drawn from a pool of possible hypotheses called recognition assumables.
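The preference reversal caused by the scaling remedy can be reproduced directly with the numbers from the example: shifting every utility up by 10 (so the worst utility, −10, becomes zero) is not preference-preserving, because designs with different numbers of components receive different total boosts.

```python
# Affine rescaling of component utilities is not preference-preserving
# when candidate designs differ in their number of components.

def total(utilities, shift=0):
    return sum(u + shift for u in utilities)

design_a = [100] + [-10] * 10   # one "research" component, ten "sports"
design_b = [5]                  # one "entertainment" component

print(total(design_a), total(design_b))          # 0 5    -> B preferred
print(total(design_a, 10), total(design_b, 10))  # 110 15 -> A preferred
```

Before the shift the entertainment design wins (5 > 0); after it the research-plus-sports design wins (110 > 15), which is the unforeseeable flip the text warns about.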
Associated with each of these assumables is a probability, and the probability of a model is the product of the probabilities of the assumptions composing the model. The model explains observations of the actions of the user at the interface to the system. Design assumptions must be consistent with model assumptions, and are drawn from a pool of possible hypotheses called design assumables. Associated with each of these assumables is a cost, and the cost of a design is the sum of the costs of the assumptions composing the design. The design and the model together explain a presentation that satisfies the communicative goal, or intention, of the author.

The requirement of scrutability was identified, which is the constraint that the behavior of the system should be seen by its users to be a consequence of the user model. It was found that this goal is not met by expected-value approaches to design. The minimal AI approach to user modelling described here services the scrutability condition by providing the means to perform a kind of sensitivity analysis on components of the user model, so that only the most critical elements are displayed to the user, who can then criticize them. The contribution of this thesis is both an interaction paradigm that permits the user to "debug" the user model, and a probabilistic Horn abduction approach to reasoning that implements the paradigm.

Chapter 6

Implementation

But men do not begin to act upon theories. It is always some real danger, some practical necessity, that produces action; and it is only after action has destroyed old relationships and produced a new and perplexing state of affairs that theory comes to its own. Then it is that theory is put to the test.

— H. G. Wells, Outline of History, page 693

A prototype of the system described in the rest of this thesis has been developed.
It embodies the intent-based authoring ideas advanced in this dissertation: Valhalla is a scrutable system that selects and orders video clips from a repository of material describing the Department of Computer Science at the University of British Columbia, and the examples in this chapter are drawn from the Departmental Hyperbrochure, described in Section 1.4.

Valhalla decouples the specification from the presentation tasks of authoring, abandoning the traditional model in favor of the intent-based paradigm. The author brings an intent, and information he thinks will be relevant to the eventual presentation. After annotation, a representation of this intent and a set of indices into the raw video reside in a "document." This is all done at compile-time, in the absence of the viewer. Later, at run-time, the reasoner uses the document, along with the user model and other knowledge, to produce an edit-list. The viewer, even in the absence of the author, sees only the most relevant portions of the video when the user model is accurate, and can take remedial action when it is not.

6.1 Architecture

A distributed architecture was chosen for the prototype implementation. Populating Valhalla's framework are a number of autonomous agents that fall into the following classes: client applications, network-based multimedia servers, and other service providers such as reasoning engines and annotated video databases. Figure 6.1 is taken from Gribble [88] and illustrates agents within the Valhalla application framework.

[Figure 6.1: Valhalla's Distributed Architecture. The diagram shows reasoning engines and other service providers, client applications, and multimedia servers, all connected over the network.]
It is this "plug and play" compatibility between members of a class that makes the Valhalla framework flexible and powerful. One member of each class of application has currently been implemented in the prototype framework. A client application, also named Valhalla, provides a user with intent-based authoring services. This client uses the services of a reasoning engine to generate descriptions of video presentations, and displays the video sequences within the presentations using the services of a distributed multimedia server.

The Multimedia Server

From the perspective of client applications, the server is a single entity providing virtual VCR-like control over multiple media sources. The server has the ability to operate simultaneously in two modes: in local mode, media sources are conventional analog devices, whose output may be routed to a number of available display devices using an RS-232 controlled switch. In remote mode, digital media is transmitted over the network to the client application from one or more remote sites, and is controlled using media playback applications present on the client's host.

The server is implemented as a series of increasingly abstract application programming interfaces (APIs). Each interface can be directly accessed by a client application, but typical operation of the server would only involve access from the highest (and most abstract) layer. A higher-layer interface uses the services of a lower layer in order to provide its own services. Figure 6.2 illustrates the relationship between these interfaces and the various components of the server.

The lowest layer, called the device layer, is accessible only through TCP socket communications and is composed of a series of device drivers. There is one device driver for each physical device, and one driver responsible for dispensing a class of digital data.

Figure 6.2: The hierarchy of interfaces to the video server
The next most abstract layer (the class level) is accessible to client applications via an API to a library that communicates directly with the server on behalf of the client. This intermediate layer provides completely separate interfaces for each class of device. Devices of the same class share characteristics particular to that class. Providing a separate interface for each class allows client applications to take advantage of these particular characteristics. Supported classes currently include digital video, digital audio, random-access analog video (optical video disc), and tape-based analog video. Future extensions to the server will support other device classes, such as text, MIDI or digital audio.

The top layer, called the virtual-VCR level, is again accessible via an API and provides an interface that client applications can use to control any device. Characteristics of different device classes have been abstracted away in order to provide a single VCR-like interface.

Using an API to communicate with the more abstract layers of the server is advantageous for a number of reasons. The inclusion of a library into the client executable facilitates the distributed architecture of the server; each library can be considered to be an agent of the server executing on the client's host machine. The presence of the server on the client's machine allows the server to directly control applications on the client's host. Such applications would be used for the playback and manipulation of digital media. Secondly, the API itself has been designed to provide a uniform method of interacting with multiple media sources and formats, allowing the differences between classes of media to be partially abstracted away.

The video delivery component of the system is designed to handle tape, video disk and digital video through a video server mechanism currently implemented on a Sun architecture [88].
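The three-layer structure described above (device layer, class level, virtual-VCR level) can be sketched as follows. All class and method names here are hypothetical illustrations of the layering idea, not the actual Valhalla or server API; the 30-frames-per-second conversion assumes NTSC-rate material.

```python
# A sketch of the layered server interfaces: each layer provides its service
# by delegating to the layer below it. Names are invented for illustration.

class LaserdiscDriver:
    """Device layer: one driver per physical device."""
    def seek_frame(self, frame): ...
    def play(self): ...

class AnalogVideoClass:
    """Class level: an interface shared by random-access analog video devices."""
    def __init__(self, driver):
        self.driver = driver
    def cue(self, timecode):
        self.driver.seek_frame(self._to_frame(timecode))
    def _to_frame(self, tc):
        h, m, s = tc                      # (hours, minutes, seconds)
        return (h * 3600 + m * 60 + s) * 30   # assumes ~30 fps material

class VirtualVCR:
    """Virtual-VCR level: a single device-agnostic control interface."""
    def __init__(self, device):
        self.device = device
    def play_clip(self, start, end):      # end time-code unused in this sketch
        self.device.cue(start)
        self.device.driver.play()

vcr = VirtualVCR(AnalogVideoClass(LaserdiscDriver()))
vcr.play_clip((0, 1, 26), (0, 2, 27))    # a clip from the example edit-list
```

The point of the sketch is the delegation chain: a client talks only to the top layer, while class-specific details (here, time-code-to-frame conversion) stay in the intermediate layer.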
The Valhalla Client Applications

The first Valhalla client prototype has been completed, and another is currently under development.

Figure 6.3: The Valhalla Control Window

Figure 6.3 depicts the Valhalla control window, the main human interface to the Valhalla client, which is implemented in Objective C and currently resides on a NeXT cube equipped with a NeXT Dimension board. The control window contains, in addition to the familiar virtual VCR control panel at the lower left, controls to advance to the next clip in the current edit-list, to return to the previous clip in the current edit-list, to replay the current clip, and to proceed with the presentation ("Go").

Depression of the "Show" button is interpreted as a request to calculate the next best presentation for the current model, and is passed on to the reasoner. The "Show" button is "wired" to a predicate in the reasoner's knowledge base that encodes a specific authorial intent; for the Departmental Hyperbrochure project described in Section 1.4, the underlying intent of all presentations is always to inform the viewer, to make maximally relevant presentations of the research and other departmental activities to individual viewers. Other applications using the same client interface program may have the "Show" button wired to other predicates encoding other intentions, and it would not be difficult to provide multiple buttons on the interface so that the user could select different (pre-defined) intentions at run-time. It is also possible in the current version of Valhalla to author new intentions at run-time through a special query window that is not shown or discussed further in this document; this avenue is not pursued because the system does not demand that the human in the role of viewer be able to program in Prolog and to understand first-order logic. (It does currently require the authors and knowledge engineers involved to have these skills.)

"No!"
" is merely a direct way for the user to express dissatisfaction with the current presentation, freeing the reasoner to recalculate both model and design as required. A n y activity at the control window is echoed to the reasoner, which can use plan recognition techniques to infer the motives of the user from these observations of user behavior. The label of the current interval as provided in the annotation database is displayed. Manual laser disk controls include absolute frame indexing. The services of the multimedia server are used to display clips contained in an editlist. Currently, all media associated with the Hyperbrochure is in the form of (synchronized) analog video and audio stored on C A V laserdiscs. Analog signals from two laserdisc players are routed to a video digitizer board within the N e X T computer, and the resulting digital video is displayed in a window on the N e X T ' s display. A n example frame from a campus sporting event is shown in Figure 6.4. The needs of individual users are met by referring to the user model, which is arrived at by the reasoning method outlined earlier in this thesis. A s discussed earlier, the user is given the opportunity to critique a selected subset of the model  150  Figure 6.4: A frame from the Departmental Hyperbrochure via Valhalla's user-model window, shown in Figure 6.5. The hypotheses actually displayed to the user are context-dependent, selected and ranked by a sensitivity analysis algorithm (described in Section 5.5) that reflects the degree of importance to the design, of each assumption in the model. In addition, the techniques employed to display these assumptions reflect their relative importance; quantities to which the design is most sensitive can be shown, for example, in bolder fonts, brighter colors, larger characters, and so on (cf. perceptual salience, Section 5.4.1). 
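The selection-and-ranking step just described can be sketched as follows. This is an illustrative reconstruction, not the thesis algorithm of Section 5.5: the sensitivity scores, hypothesis names and salience thresholds are all invented here.

```python
# Rank user-model assumptions by how sensitive the current design is
# to each one, then map the score onto a display technique so the
# most consequential assumptions are also the most salient.

def rank_by_sensitivity(assumptions):
    """assumptions: dict mapping hypothesis -> sensitivity in [0, 1]."""
    return sorted(assumptions, key=assumptions.get, reverse=True)

def salience(score):
    """Map a sensitivity score onto a (hypothetical) display style."""
    if score > 0.7:
        return "bold-large"
    if score > 0.3:
        return "normal"
    return "small-dim"

model = {"userType(faculty)": 0.9,
         "gender(male)": 0.4,
         "geo(local)": 0.1}

shown = [(h, salience(model[h])) for h in rank_by_sensitivity(model)]
```

Here the `userType(faculty)` hypothesis, to which the design is most sensitive, would be displayed first and most prominently in the user-model window.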
Every effort is made to sanction the further assumption by the system that the user has actually seen and attended to the display in the user model window. The Valhalla User Model window implements the interactivity paradigm advocated in Section 5.5. Clicking on any element of the User Model window first sends a message to the Reasoner, which causes it to succeed in its processing loop; this message is followed by an instruction that encodes the change to the user model that the user has just specified. The Reasoner makes the requested change to the user model and then calculates a new presentation. In Figure 6.5, the screen shows a number of sets of radio buttons, which is Valhalla's display technique for variables whose values are drawn from an exclusive (disjoint) set. Here, Valhalla believes the user is a faculty member; a student can correct the system's misconception with a single click.

[Figure 6.5 shows the User Model window. Its caption reads: "Each item below represents an important assumption that the system has made about you. Correcting the assumptions will change the behaviour of the system and the nature of your presentations." Below it are radio-button groups for hypotheses such as userType(faculty), userType(student) and userType(staff).]

Figure 6.5: The Valhalla User Model Window

The pertinent elements of the user model selected by the reasoning engine are transferred to the hyperbrochure application in an abstract form. Instead of specifying a particular GUI "widget", a particular hypothesis may be specified as an element of a (finite) discrete set, as possessing a value within a particular range, or as having a boolean value. It is left to the hyperbrochure to determine an applicable "widget" with which to display this to the user. This method of providing abstract, indirect control over the reasoning process adds to the flexibility of the framework.
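The abstract transfer described above can be sketched as a simple dispatch from value type to widget. This is a hypothetical illustration of the idea only; the element encoding and the widget names are invented, not taken from the Valhalla implementation.

```python
# The reasoner specifies only the abstract kind of each hypothesis;
# the client maps each kind onto a concrete GUI widget of its choosing.

def choose_widget(element):
    kind = element["kind"]
    if kind == "discrete":     # value drawn from an exclusive set
        return "radio-buttons"
    if kind == "range":        # value within numeric bounds
        return "slider"
    if kind == "boolean":
        return "checkbox"
    raise ValueError("unknown element kind: %s" % kind)

elem = {"name": "userType", "kind": "discrete",
        "values": ["faculty", "student", "staff"], "current": "faculty"}
```

A different client (say, a text-only one) could map the same abstract element onto a numbered menu instead of radio buttons, which is the flexibility the indirection buys.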
6.1.1  The Reasoner

The Reasoner is a best-first SICStus Prolog implementation of the assumption-based reasoning framework introduced in this thesis. The knowledge bases are all written in a Prolog-like Horn clause language extended with assumptions (as described in Section 5.4), and the annotation database consists of only definite clause assertions.

[Figure 6.6 shows the user interacting with the User Interface, which communicates with the Reasoner (holding the User Model) and with the Video Server; the Reasoner produces an Edit List, and the Video Server delivers media from video disc and video tape.]

Figure 6.6: Valhalla implementation

[Figure 6.7 shows the Reasoner and Interface agents communicating over the Internet.]

Figure 6.7: The Valhalla Agents

These aspects of the design can be seen in Figure 6.6. Connections between the video server, user interface and reasoning engine are all client/server (TCP/IP) links using standard Unix sockets, giving flexibility and platform independence. Figure 6.7 illustrates these relationships.

6.2  Functionality

    ?- area(Student, graphics),                 % Student studies graphics,
       supervises(FacultyMember, Student),      % is supervised by FacultyMember,
       relevant(FacultyMember, Topic),          % to whom Topic is relevant.
       editList([], Presentation, Topic, 0, L), % Get a video edit-list
       costLength(L, 300).                      % close to 300 seconds long.

Figure 6.8: Authorial Intent as a Prolog query

Presentation is decoupled from specification by having the system prepare an edit-list of relevant events and intervals subject to the constraints in the available knowledge bases. The generation of this edit-list is performed at run-time, rather than compile-time, so that the author need not be physically present to ensure that the presentation is suitable.
The intent of the author is currently encapsulated in a distinguished predicate attached to the Show button on the interface, whose intended interpretation is that viewers should be given a basic overview of the material available, followed by a body which is relevant to their immediate information retrieval goals, and then by a conclusion; other author intentions could be similarly encapsulated and connected to the Show button, or to other buttons on some custom interface. A number of constraints are applied to the design of the presentation, including for instance that its length not exceed a certain amount of time. The user's interaction with the system is restricted in this way to factor out variables that would make it difficult to test the impact of our user modelling approach. Note, however, that there is an additional window, not pursued in this dissertation, that can be used to make arbitrary queries of the reasoning engine in the underlying representation language described in this thesis; the user can take the role of intent-based author by specifying his or her own intent in this window and instructing the system to find an appropriate presentation. For example, an author might form the query in Figure 6.8 to ask for a presentation of footage (optimal length of five minutes), relevant in some way to a departmental supervisor of a graduate student associated with the Computer Graphics research laboratory, and so on. Obviously, the full power of intent-based authoring is not realizable in the prototype without some facility with Prolog, and with the underlying reasoning and representation methodology; this is why we expect the user testing of Valhalla to make use of the Show button, which abstracts away from these complexities.

The system has been tested with the Departmental Hyperbrochure, introduced in Section 1.4.
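The length-constrained selection step can be illustrated with a small sketch. The clip data, relevance scores and greedy strategy below are invented for illustration; the actual system derives the edit-list by best-first abduction over the knowledge bases, not by this greedy procedure.

```python
# Toy run-time edit-list generation: pick the most relevant annotated
# intervals whose total duration stays within a target length.

def make_edit_list(clips, target_seconds):
    """clips: list of (label, duration, relevance); greedy by relevance."""
    edit_list, total = [], 0
    for label, duration, relevance in sorted(
            clips, key=lambda c: c[2], reverse=True):
        if total + duration <= target_seconds:
            edit_list.append(label)
            total += duration
    return edit_list, total

clips = [("graphics-lab", 120, 0.9), ("campus-tour", 200, 0.7),
         ("sports", 150, 0.3), ("overview", 60, 0.8)]
selected, length = make_edit_list(clips, 300)
```

With these invented scores, the two most relevant clips that fit within the five-minute budget are chosen and the rest are dropped.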
Potential viewers of the material are prospective and current graduate and undergraduate students, faculty and staff, funding agencies and industrial collaborators. All these are potential users of Valhalla, and each brings idiosyncratic goals and interests that the system attempts to meet with tailored presentations.

[Figure 6.9 shows the author, at compile-time, annotating structured video to produce a document, which is later presented to the viewer.]

Figure 6.9: Knowledge-based Video Annotation and Presentation

The way the intent-based approach is mapped into the video authoring domain is shown schematically in Figure 6.9: the author brings an intent, and information he or she thinks will be relevant to the eventual presentation. After annotation, a representation of this intent and a set of indices into the raw video reside in a "document." This is all done at compile-time, in the absence of the viewer. Later, at run-time, the reasoner uses the document, along with the user model and other knowledge, to produce an edit-list. The happy viewer, even in the absence of the author, sees only relevant portions of the raw video.

Although the current implementation has the Reasoner acting as a server to the client Interface, the simple protocol underlying communications between these agents permits the Reasoner to view the Interface as just another Prolog predicate: an "ok" from the Interface (caused by successful completion of a presentation to the user) allows successful completion of the Reasoner's evaluation loop; any other message from the Interface (caused, for instance, by the user expressing dissatisfaction with the current presentation by clicking on the No! button in the Control window) forces the Reasoner to initiate a Prolog backtrack to find other solutions. Clicking on the Show button in the first place initiates calculation of the first presentation, using only prior probabilities for recognition assumptions.
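The "ok"-or-backtrack protocol just described can be sketched as follows, in Python rather than the system's Prolog; the candidate presentations and the user's verdict function are invented stand-ins for Prolog backtracking and the Interface agent.

```python
# The Reasoner treats the Interface as a predicate: an "ok" reply
# completes the evaluation loop; any other reply (the user's "No!")
# forces a retry with the next candidate presentation.

def candidate_presentations():
    """Stand-in for Prolog backtracking over alternative edit-lists."""
    yield ["overview", "graphics-lab"]
    yield ["overview", "campus-tour"]
    yield ["overview", "sports"]

def reasoner_loop(verdict):
    """verdict: callable standing in for the Interface's reply."""
    for presentation in candidate_presentations():
        if verdict(presentation) == "ok":
            return presentation        # evaluation loop succeeds
    return None                        # no acceptable presentation found

# A user who rejects anything featuring the graphics lab:
def user(presentation):
    return "ok" if "graphics-lab" not in presentation else "no!"

accepted = reasoner_loop(user)
```

The first candidate is rejected, so the loop "backtracks" to the second, which the user accepts.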
Scenario: Company Meeting

Company XYZ holds a meeting to decide whether or not to build a new production facility. The meeting is supported by advanced Group Support System (GSS) tools, and complete records (minutes, keystroke logging, video, etc.) are kept. Someone subsequently gets the job of creating a presentation of the meeting in order to document and justify the process by which the decision to build the new plant was reached. The intent of the ensuing presentations is to convince the viewer that the decision was well motivated; different viewers get different renditions of the argument. The marketing manager, for instance, gets a tailored account that focuses on sales forecasts for new plant operations. The accounting manager sees a spreadsheet with emphasis on cash-flow and detailed cost-benefit analyses. Potential investors see glossy images and video backup, along with statistics demonstrating improved delivery schedules—but nothing that would be useful to a competitor.

Any individual viewer of one of these presentations might be inspired to query the system for further information on any subject. He thereby becomes an author in his own right and specifies his own intent. A customer, for instance, might want to see how the decision to build the new plant will affect prices. He authors his own presentation of this information; whether or not the intent of the original author (to convince) plays any role in this last presentation is an ethical and practical issue of some interest, but unfortunately outside the scope of this thesis.

An example: InterSpect Consulting Corp.'s 1999 Yearly Report includes the following fiscal information (in millions of dollars):

• Land usage costs: 12
• Data storage costs: 75
• Doughnuts and coffee: 105
• Communications costs: 23

The presentation system has available to it knowledge about graphical presentation strategies and languages [129] [194], and reasons its way to an optimal presentation for the president of InterSpect based upon this as well as a continually updated model of the president's goals, beliefs, desires and preferences. If the author's intention in preparing the year-end report was to fairly and accurately portray the company's expenses, the president might see a simple bar chart in which the suspiciously high doughnut and coffee bill would be immediately apparent. If the author's intention was actually an elaborate argument for even more resources to be allotted to the doughnut and coffee account, such a presentation element may not be prudent. Instead, another graphic would be designed at presentation time in which the offending quantity is de-emphasized, taking full advantage of the equipment available at run-time.

Part IV

Conclusions

Chapter 7

Conclusions

The whole duty of a writer is to please and satisfy himself, and the true writer always plays to an audience of one. —The Elements of Style, page 85.

Most systems which are currently available to support authoring inherit the limitations of the traditional model of authoring, which were presented at length in Section 1.1 and recapped in Section 5.2. The foremost such limitation stems from the early commitment to content, which makes it impossible to provide user-tailored presentations at run-time. The problems are exacerbated in the video medium because it is temporally linear (and because humans have so little time), and because current techniques for automatic speech and visual recognition leave the contents uninterpreted.
A knowledge-based solution called intent-based authoring was described in Chapter 5 which mitigates the serious effects of these problems by separating intent from content, analogously to how the structured document paradigm now separates content from form. The separation of intent from content and form enforced by the intent-based authoring paradigm frees the intent-based author to work at compile-time, in the absence of the viewer, to specify the intent, or communicative goal, underlying the eventual presentation. A run-time presentation system then makes the presentation by first deriving a model of the viewer and then tailoring a presentation that meets the author's objectives as given in the specification of intent, and the interests of the user as represented in the model. The entire framework is represented in a Horn clause logic dialect, described in Section 5.4, which supports the recognition of user models, and the design of presentations, by abduction. Other components of the framework described include a scrutability desideratum, to offset the possibility that the model contains errors. This desideratum requires that users be shown relevant aspects of the user model, so that they can understand the (mis)behavior of the system. Because the model may be very large, only a salient subset can be shown in general; this subset is determined by a sensitivity analysis, described in Section 5.5. A prototype called Valhalla was implemented and tested, which manifests the intent-based authoring framework.

The central focus of this project continues to be the deployment of artificial intelligence techniques for user modelling. This work is performed within the limits of what has been called "minimal AI" to explore the simplest useful applications of probability, logic and decision theoretic reasoning strategies to the problems of modelling users of computer systems.
The approach is very simple, based as it is upon well-tested notions from decision theory and the AI literature. The intent-based authoring paradigm described in this thesis can be applied to different media, domains, and tasks. It has the potential to circumvent limitations of the traditional model of authoring. Intent-based authoring requires the application of computational intelligence, a prospect which is only today becoming realistic; the future of authoring looks promising from this vantage point.

7.1  Implications

Authoring is hard, which is why technology has been used in the past to aid authors with their toil, and why this research has been undertaken to investigate the issues surrounding the task. However, it seems on the surface that intent-based authoring as advocated in this thesis is hard too—harder than traditional authoring, which is at least well understood, if limited in the aforementioned ways. So why should modern-day authors engage in the intent-based authoring paradigm? In addition to the reasons already discussed (cf. individualized presentations for individuals), there is the reuse factor. Intent-based schemata, once authored, can be reused. Knowledge, once appropriately indexed, can be reused, often in unanticipated ways. The principle of code reuse in software engineering is motivated in the same way [119]. While it is true today that an author will need to invest a great deal more work in the specification of an intent-based presentation than in the design of a traditional book or conference paper, the intent-based author and reader benefit from a more effective deliverable: the dynamic, intent-based document.

7.2  Future Work

Intent-based authoring captures the notion that authoring is distinct from viewing/reading, that the design of the document is not the same as its presentation.
These processes have been separated, and the functions of distinct sources of knowledge and information elaborated in the authoring and reading cycles. The author provides some of the information to be presented, and describes his intention as regards this information, but we can go further. Intent-based authors of tomorrow will benefit from the presentation strategies and knowledge bases of all of their forebears, and, one day, the n-th intent-based presentation may take less effort from a human author than the corresponding traditional presentation would have.

The authoring system can support the author in this design or specification task by referring to knowledge sources that may prompt further elaboration on the part of the author. If the authoring system knows, for example, that the presentation facilities include video imaging capabilities, it may prompt the author to provide video information, if available. The authoring system may even have a partially instantiated model, or an aggregate model of the set of intended viewers of the presentation; this too may solicit additional contributions from the author.

Viewing, in turn, can be mediated by these same information sources. The model of the reader is consulted at run-time to design the presentation, and specific presentation elements are determined then by available resources: if the color printer is down, the presentation may have to be optimized for delivery on a black-and-white printer; presentations on an interactive display might differ significantly from those destined for hard-copy devices.²

The effectiveness of these and other techniques will be evaluated in future work. Empirical testing of the Valhalla interface is being undertaken to see if the user modelling techniques it encapsulates help users accomplish certain well-defined information retrieval tasks, as it is believed they will.
² The WIP system uses just this approach; combined with the intent-based paradigm, it is a promising direction. See also the work of Mackinlay [129].

Empirical studies will also yield insight into the notion of perceptual salience advanced in this dissertation.

7.2.1  Learning: Updating Prior Probabilities

It is possible within the representation described in this thesis (see Section 5.4) to encode dependencies between random variables [161]. We could render explicit the relationship between sex and category, for instance, and continue to add other dependencies we consider important. The problem is that there may be many such relationships, and it will be difficult to decide which are important enough to encode, and which can be safely ignored. Not only is this a knowledge engineering task which it is desirable to avoid, but the computational complexity of the resulting Bayesian network rises dramatically with the number of arcs representing dependence.

Instead, probability values can be obtained from an episodic knowledge base (EKB) [53] that tracks the user population over time. Values thus obtained will reflect actual relationships in the user population, with a computational complexity much lower than that of a complete Bayesian analysis, and the significant costs of error-prone knowledge engineering are completely avoided. At the end of a session, the EKB is supplied with a new data point, a new individual; the information supplied is in fact the current, most likely user model, consisting of the highest-probability assumptions. This data point in the EKB can move over time, with continued experience with the same user.

When the reasoning agent requires a prior value, a call could be made to the EKB (as a procedural attachment, perhaps). P(male) would be originally supplied by the EKB as simply the number of individuals in the EKB who are male, divided by the total number of individuals.
Later, after the user has indicated that he or she is a faculty member, the EKB would be queried for the value of P'(male | faculty). This value is the number of individuals in the EKB who are male faculty members, divided by the number of individuals who are faculty members.

The EKB could be used as a kind of "user model server," a repository of information about the user population that can be queried by all applications serving that population. Multiple local EKBs could be located within an institution to serve different subsets of the complete user population, and global EKBs could serve queries over the entire population by querying all of the relevant local EKBs. The EKB would function as a separate agent, communicating with other agents via TCP/IP. Craddock [53] describes an EKB that might be employed in the manner suggested here, and Orwant [154] [155] has recently implemented a system called Doppelgänger that uses a similar mechanism.

Since the information in the EKBs is intended to be persistent, it is a design issue of some social impact whether the individuals stored in the EKB should be identifiable. If they are, then persistent models of individuals would result in highly accurate probabilities at the start of a new session with a user who had already used the system, or who had used another system served by the same EKB. If not, then the system must begin to model the user anonymously, from scratch, albeit with the help of a growing EKB. Individual tracking will keep the number of individuals stored in the EKB down to the number of actual users accessing the systems served by the EKB, while anonymous storage will result in a new entry in the EKB for every session by every user at every system served by the EKB. There are clearly significant advantages to individual tracking, but these must be weighed against its (still unclear) social repercussions (but see Section 7.2.2).
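The frequency-based priors described above can be sketched as follows. This is a minimal illustration, not Craddock's EKB: the class, the attribute names and the sample individuals are all invented here, and the computation is just the relative-frequency counting the text describes.

```python
# Toy EKB: priors are relative frequencies over stored individuals.

class EKB:
    def __init__(self):
        self.individuals = []          # list of attribute dicts

    def add(self, model):
        """Record a session's most likely user model as a data point."""
        self.individuals.append(model)

    def prior(self, **attrs):
        """P(attrs) = matching individuals / all individuals."""
        match = [i for i in self.individuals
                 if all(i.get(k) == v for k, v in attrs.items())]
        return len(match) / len(self.individuals)

    def conditional(self, given, **attrs):
        """P(attrs | given), e.g. P'(male | faculty)."""
        pool = [i for i in self.individuals
                if all(i.get(k) == v for k, v in given.items())]
        match = [i for i in pool
                 if all(i.get(k) == v for k, v in attrs.items())]
        return len(match) / len(pool)

ekb = EKB()
ekb.add({"gender": "male", "category": "faculty"})
ekb.add({"gender": "female", "category": "faculty"})
ekb.add({"gender": "male", "category": "student"})

p_male = ekb.prior(gender="male")
p_male_given_faculty = ekb.conditional({"category": "faculty"}, gender="male")
```

With these three invented individuals, the marginal prior for `male` is 2/3, and once the user is known to be a faculty member the conditional prior drops to 1/2, without any hand-engineered dependency arcs.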
There is an unexplored relationship between the notion, well-entrenched in the user modelling literature, of user stereotypes [169] [45] [44] and the EKB approach to the determination of prior probabilities. This relationship is hinted at by Rich: "...stereotypes can be viewed as concepts, and then they can be learned with statistical concept-learning methods." [171] The current discussion can be seen in just this light. Once something is known about the user, this information can be used as in conventional stereotyping approaches to "trigger" the application of a stereotype. Say, for instance, that the user has indicated, or we have reasoned by plan recognition, that he or she is a faculty member. The traditional approach to the use of stereotypes would have us apply the 'faculty-member stereotype' and thereby attribute to the user the defaults therein. The approach described in this thesis computes, using best-first abduction, other consistent aspects of the user model as needed. The EKB approach would accomplish the same result by finding the closest point in the feature space.

As with the approach used in this thesis, there is no need in the EKB approach to foresee all the possible combinations of triggers; the EKB can provide, on demand, a 'stereotype' for any conceivable combination of 'triggers.' Thus, we may encounter, in the course of interaction with the user population, an individual who is both a faculty member and a smoker, as well as an avid baseball fan. Although multiple inheritance from separate stereotypes for faculty, smoker and baseball can accommodate this particular example, considerable attention must still be given to preference for attribute values when these disagree between stereotypes. The EKB is immune from these problems, and provides values for any conjunction of attributes on demand.
A simpler approach

In this subsection a simpler approach is described that does not require the mechanism of an EKB, but which does not capture all the dependencies that may exist in a user population. The knowledge engineer can "seed" the representation of prior probabilities. For instance, he or she can set the value of the random variable describing whether a user is a student or faculty member as follows: {(student, a/c), (faculty, b/c)}; instead of storing priors as single real numbers, he can store pairs of numerators and denominators whose quotients are the priors. The value of the denominator can be interpreted as the size of the sample space from which the prior is determined, and the numerator as the number of positive instances found in that sample. Over usage, additional instances are encountered, which can be seen as extending the sample space, and consequently having an effect on the values of the priors. A positive instance increments the numerator of the corresponding alternative, and the denominator of all alternatives.

There are obviously many pairs of numbers which, when divided, result in the same quotient. For instance, there are many a, b such that a/b = 0.5, but the magnitudes of the numbers chosen to represent the priors influence their sensitivity to new information. The larger the denominator applied to the representation of a random variable, the more confidence is implied, and the less sensitive it will be to observations. (See [201], cf. Dirichlet priors.)

7.2.2  Privacy

The text of this thesis has consistently evaded the serious issue of privacy, with all of its complex social and ethical ramifications. Though outside the scope of the current work, any serious elaboration or broader application of the technology of user modelling will bring attention to the matter. Obvious questions include: Who or what has access to the models? Are the models anonymous, or are they individual?
Are they persistent, or discarded after each session with the system? Is the user aware of the modelling activity, and has he or she consented to such activity? The design of any user modelling system must necessarily answer these and other questions. In this research, the model is derived during the user's session, is available only to the reasoning engine and the user, and is discarded when the user terminates the session. All users of the system are made aware of the nature and scope of the modelling activity taking place, and they are, because of the scrutability desideratum advanced in Section 5.5, able to investigate the model itself via the user model window of the application interface.

As mentioned in Section 7.2, there is considerable advantage to maintaining persistent, individual models of users. If users return for multiple sessions, the modelling activity can begin where it left off at the end of the last session, thereby making the user's time at the interface more efficient, and more effective. Even anonymous models can have this salutary effect, to a lesser degree.

User models are likely to become important elements of future applications, as well as marketable commodities in themselves, very much like conventional mailing lists are today. Today, it is polite and correct to ask people if they wish to be on a list before it is released and used to send them solicitations. Some people feel that their membership in a mailing list is to their advantage; others feel it is an intrusion into their private lives. This dichotomy will likely apply to the use of user models as well, but with greater potential, perhaps, for misuse.
Conceivably, models acquired over long periods could contain extremely detailed information about the interests, habits and beliefs of individuals which, although valuable in the service of a cooperative application interface, could result in adverse effects ranging from unsolicited (though hopefully well-targeted) junk mail through to discrimination based on any of the many attributes of the model. The aforementioned concerns are addressed by one of the objectives of the privacy protection policy enunciated by the privacy protection study commission [2], referred to as the principle of maximizing fairness. Government databases are subject to these and other guidelines, but no laws have yet been passed in North American courts to protect the rights of individual users of computer systems (see Rosenberg [174] for a broader view of privacy considerations in legal and social contexts).

Solutions to the problems created by the use of persistent, individual models could take many directions. Users can be given control over access to their models by having them specify which attributes or groups of attributes should be made available to which services, and at what degree of authentication [58]. Users could maintain physical control of their models by carrying them on smart cards, PCMCIA cards (as in Doppelgänger [155]) or similar technology. Advanced encryption techniques with multiple levels of security may become accepted for networked transactions involving user models. Privacy is an important issue that must be addressed not only from legal and ethical imperatives, but because users themselves will be reluctant to use systems that do not treat sensitive personal data with due attention. Emerging secure technologies will hopefully be up to this task.

7.2.3  Future Development

A number of extensions to the hyperbrochure client are being planned.
In addition to directly gaining knowledge via the user model window, details of the user's interaction with a presentation are provided to the reasoning engine. The reasoner does not currently use this information, but could bring resources to bear on it. For instance, if a user uses the navigation buttons to skip the viewing of the remainder of a particular clip, it may be deduced that the user has no interest in the contents of that clip, and the user model can be updated accordingly (cf. plan recognition).

As was previously mentioned, items in the user model window are chosen based on their degree of sensitivity. Sensitivity analysis is defined (see Definition 9) as the act of determining how much a presentation will be changed when the user modifies a given set of assumptions in the user model. Given a set of user model items, sensitivity analysis currently provides a quantitative measure of these items, which is used by the client application to present the user model items in a reasonable, sorted order. Other visual clues from the reasoning engine can also be considered. Linking the sensitivity metric with color (for instance) might help to persuade the user to notice and correct faulty assumptions. As an example, the client application might choose to present all assumptions with a high sensitivity in bright red, thereby providing the suggestion of uncertainty or danger. These perceptual salience metrics need to be determined by empirical study.

One of the goals of the intent-based authoring paradigm was to save time and effort by reducing the impact of the temporal nature of the video medium. However, viewers must still watch entire presentations in order to gauge their relevance and provide useful feedback to the reasoning engine. If the temporal portions of presentations could be summarized in a non-temporal format, the viewer may be able to form an opinion of the presentation in a more timely manner.
One possibility is to construct a graphical representation of the presentation and use a fisheye view (Noik [150]) of the representation in order to highlight the relevant features. If the user is in the process of browsing multiple presentations, the differences between presentations may be candidates for relevance. The current system does not take into account what the user has already seen; there is no notion of "viewing history," but something like it could be added. Users could then be given summaries of clips they had already seen, or shown relevant alternatives instead, to avoid any form of repetition and boredom.

The system could be pressed into service in at least two different ways. First, after the fashion of information kiosks, Valhalla could be used on-line to provide information at varying levels of detail, from overview to in-depth, in accordance with the dynamically evolving model of the information-seeking user. Second, after the fashion of a montage table, Valhalla can support a (human) video editor in the preparation of a video presentation that may be intended for a third party. These usage styles are distinguished for a number of reasons. Information seekers will be limited in the time available, so the system must be real-time. The quality of the presentation is not paramount, as it will be seen only once by a single viewer. Response must be very good, however, if the typical kiosk user is to be expected to wait for the presentation, let alone watch it. In this usage pattern, the system is modelling the user-viewer. Editors may be willing to wait for the system to search huge video indices and knowledge bases if the result is better content selection, or more consistent presentation style. The effort will pay off with multiple viewings by multiple viewers. In this usage pattern, the system is modelling the editor's model of the eventual group of users.
Only the first of these ways of using the system is illustrated in this thesis, with a number of realistic usage scenarios. A new Valhalla client is under development at the Department of Computer Science of the University of British Columbia. It is being implemented by Kurt Hoppe, a masters student there, as a CGI-compliant set of HTML files to be used in conjunction with industry-standard World Wide Web browser programs like Netscape and Mosaic. The video server agent has been extended to interact with industry-standard HTTP server programs; in particular, it has been tested with the NCSA 1.3 HTTP daemon, and can deliver digital video to client applications via the daemon. An alternate reasoner is also under development by other researchers in the department, also for use with the Hyperbrochure.

Postscript

Content is everywhere. There is a widespread media-induced misconception that a lack of content lies behind current disenchantment with the world wide web, and that the remedy is in the hands of Hollywood conglomerates who will spew out incredible volumes of this content. What is in fact missing is a recognition of the primacy of intent. There are not enough authors in existence to tailor the presentation of all that content for individual readers. This is why intent-based authoring is pursued in this dissertation, and why the world wide web, which connects authors and readers as never before in human literacy, will one day become a delivery mechanism not for content, but for intent.

Bibliography

[1] The Role of Intelligent Systems in the National Information Infrastructure. Artificial Intelligence Magazine, 16(3), 1995.
[2] Public Law 93-579. Section s.2(b) of the Privacy Act of 1974, 5 United States Code, section s.552a, 1974.
[3] Robert M. Akscyn, Donald L. McCracken, and Elise A. Yoder. KMS: A Distributed Hypermedia System for Managing Knowledge in Organizations.
Communications of the Association for Computing Machinery, 31(7):820-835, 1988.
[4] Elisabeth André, Robin Cohen, Winfried Graf, Bob Kass, Cécile Paris, and Wolfgang Wahlster. Proceedings of the Third International Workshop on User Modelling. Proceedings D-92-17, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany, August 1992.
[5] Elisabeth André and Thomas Rist. Towards a plan-based synthesis of illustrated documents. Research Report RR-90-11, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany, September 1990.
[6] Elisabeth André and Thomas Rist. The design of illustrated documents as a planning task. In M. Maybury, editor, Intelligent Multimedia Interfaces. AAAI Press, 1993.
[7] J. André, R. Furuta, and V. Quint, editors. The Cambridge Series on Electronic Publishing. Cambridge University Press, 1989.
[8] Douglas E. Appelt and Martha E. Pollack. Weighted abduction for plan ascription. User Modeling and User-Adapted Interaction, 2(1-2):1-25, 1992.
[9] Apple Computer, Inc., 20525 Mariani Ave., Cupertino, CA 95014. HyperCard User's Guide, 1987.
[10] Yigal Arens and Eduard Hovy. How to describe what? Towards a theory of modality utilization. In Cognitive Science Society Meeting, Cambridge, MA, August 1990.
[11] Yigal Arens, Eduard Hovy, and Susanne van Mulken. Structure and Rules in Automated Multimedia Presentation Planning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1253-1259, Chambéry, France, September 1993.
[12] Paul Van Arragon. User modeling bibliography. Technical Report CS-87-22, Department of Computer Science, University of Waterloo, March 1987.
[13] Paul Van Arragon. Nested Default Reasoning for User Modelling. PhD thesis, University of Waterloo, 1990.
[14] J. L. Austin. How to Do Things with Words. Oxford University Press, 1962.
[15] Kent Bach and Robert M. Harnish. Linguistic Communication and Speech Acts. MIT Press, Cambridge, MA, 1979.
[16] Afzal Ballim. ViewFinder: A Framework for Representing, Ascribing and Maintaining Nested Beliefs of Interacting Agents. PhD thesis, l'Université de Genève, 1992.
[17] Afzal Ballim and Yorick Wilks. Beliefs, stereotypes and dynamic agent modelling. User Modelling and User-Adapted Interaction, 1(1):33-65, 1991.
[18] Roland Barthes. The death of the author. In Image, Music, Text. Hill and Wang, New York, NY, 1977.
[19] L. Bartram, R. Ovans, I. Dill, M. Dyck, A. Ho, and W.S. Havens. Contextual assistance in user interfaces to complex, time-critical systems: The intelligent zoom. In GI '94: Graphics Interface 1994, Banff, AB, Canada, May 1994.
[20] M. Bauer, S. Biundo, D. Dengler, M. Hecking, J. Koehler, and G. Merziger. Integrated plan generation and recognition. Research Report RR-91-26, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany, August 1991.
[21] Richard J. Beach. Setting tables and illustrations with style. CS 45, University of Waterloo Computer Science Department, Waterloo, Canada, May 1985. PhD thesis.
[22] Izak Benbasat, Gerardine DeSanctis, and Barrie R. Nault. Empirical research in managerial support systems: A review and assessment, June 1991.
[23] Izak Benbasat, Albert S. Dexter, and Peter Todd. The influence of color and graphical information presentation in a managerial decision simulation. Human-Computer Interaction, 2:65-92, 1986.
[24] Jacques Bertin. La Graphique et le Traitement Graphique de l'Information. Flammarion, 1977.
[25] Jacques Bertin. Graphics and Information Processing. Walter de Gruyter, 1981.
[26] Jacques Bertin. Semiology of Graphics: diagrams, networks, maps. University of Wisconsin Press, 1983.
[27] Sara Bly.
Shared workspaces: A look at supporting distributed workgroups. CICSR invited lecture, September 1991.
[28] Sara A. Bly and Scott L. Minneman. Commune: A shared drawing surface. In Proceedings of the Conference on Office Information Systems, pages 184-192, Boston, MA, April 1990.
[29] Kellogg S. Booth and W. Morven Gentleman. A symbiotic approach to visualization and the user interface. Unpublished, 1990.
[30] G. Brajnik and C. Tasso. A flexible tool for developing user modeling applications with non-monotonic reasoning capabilities. In Proceedings of the Third International Workshop on User Modelling, pages 42-66, August 1992.
[31] Giorgio Brajnik, Carlo Tasso, and Antonio Vaccher. UMT User Manual, Version 1.0, September 1992. Università degli Studi di Udine, Laboratorio di Intelligenza Artificiale, Dipartimento di Matematica e Informatica, Via Zanon 6, 33100 Udine, ITALY, 1992.
[32] Gerhard Brewka. Preferred subtheories: An extended logical framework for logical reasoning. IJCAI, 1989.
[33] Vannevar Bush. Pieces of the Action. William Morrow and Company, 1970.
[34] W. Buxton and T. Moran. EuroPARC's Integrated Interactive Intermedia Facility (IIIF): Early Experiences. In S. Gibbs and A. A. Verrijn-Stuart, editors, Multi-user Interfaces and Applications, pages 11-34. 1990.
[35] Sandra Carberry. Modelling the user's plans and goals. Computational Linguistics, 14(3):23, September 1988.
[36] Sandra Carberry. Plan Recognition in Natural Language Dialogue. MIT Press (A Bradford Book), 1990.
[37] T.A. Cargill. A view of source text for diversely configurable software. Technical Report CS-79-28, University of Waterloo Computer Science Department, 1979.
[38] Stephen M. Casner. A task-analytic approach to the automated design of graphic presentations. Association for Computing Machinery Transactions on Graphics, 10(2), April 1991.
[39] Timothy Catlin, Paulette Bush, and Nicole Yankelovich. InterNote: Extending a hypermedia framework to support annotative collaboration. In Hypertext '89 Proceedings, pages 365-378, 1989.
[40] P. Cheeseman. On finding the most probable model. In J. Shrager and P. Langley, editors, Computational Models of Scientific Discovery and Theory Formation, chapter 3, pages 73-95. Morgan Kaufmann, San Mateo, 1990.
[41] P.P. Chen. The entity-relationship model: Toward a unified view of data. ACM Transactions on Database Systems, 1(1):9-36, 1976.
[42] Mourad Cherfaoui and Christian Bertin. Video Documents: Towards Automatic Summaries. In Workshop Proceedings of IEEE Visual Processing and Communications, pages 295-298, Melbourne, Australia, September 1993.
[43] H. Chernoff. The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association, 68:361-368, 1973.
[44] David N. Chin. KNOME: Modeling What the User Knows in UC. In A. Kobsa and W. Wahlster, editors, User Models in Dialog Systems, pages 74-107. Springer-Verlag, 1989.
[45] David Ngi Chin. Intelligent Agents as a Basis for Natural Language Interfaces. PhD thesis, Computer Science Division (EECS), University of California, Berkeley, CA 94720, January 1988.
[46] William S. Cleveland. The Elements of Graphing Data. Wadsworth and Brooks/Cole Advanced Books and Software, Pacific Grove, CA, 1985.
[47] William S. Cleveland. A model for graphical perception, June 1990. Personal correspondence.
[48] William S. Cleveland and Robert McGill. Graphical perception: theory, experimentation and application to the development of graphical methods. Journal of the American Statistical Association, 79:531-554, September 1984.
[49] E.F. Codd. A relational model of data for large shared data banks. Communications of the Association for Computing Machinery, 13(6):377-387, June 1970.
[50] Robin Cohen and Marlene Jones.
Incorporating User Models Into Educational Diagnosis. Technical Report CS-86-37, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario, September 1986. Also published as Chapter 11 of User Models in Dialog Systems, edited by Kobsa and Wahlster (Springer-Verlag, 1989).
[51] Jeff Conklin. Hypertext: An introduction and survey. In Irene Greif, editor, Computer Supported Cooperative Work: A Book of Readings, pages 423-475. Morgan Kaufmann, 1988.
[52] R. Cook and J. Kay. The Justified User Model. In Proceedings of the Fourth International Conference on User Modelling, pages 145-150, Hyannis, MA, August 1994.
[53] A. Julian Craddock. Induction and the Reference Class Problem. PhD thesis, University of British Columbia, 1993.
[54] A. Csinger, H. da Costa, and B. Forghani. A general-purpose programmable command decoder. In IEEE Proceedings, Conference Compint, pages 139-141, November 1987.
[55] Andrew Csinger. From utterance to belief via presupposition: Default reasoning in user-modelling. In S. Ramani, R. Chandrasekar, and K.S.R. Anjaneyulu, editors, Lecture Notes in Artificial Intelligence, number 444, pages 407-417. Springer-Verlag, 1989.
[56] Andrew Csinger. Implementing a theory of communications in a default reasoning framework. TR 91-30, Department of Computer Science, 1991.
[57] Andrew Csinger. The Psychology of Visualization. Technical Report 28, Department of Computer Science, November 1992.
[58] Andrew Csinger. OpenMed: Open Systems for Secure Health Care Information Transaction. A Joint CANARIE/TAD Proposal submitted by Cyberstore Systems Inc., CHARA Health Care Society, Lions Gate Hospital, Mission Memorial Hospital, InterSpect Systems Consulting Corp., Network Systems Group Inc., Health Informatics Research Group of UBC, September 1995.
[59] Andrew Csinger and Kellogg S. Booth. Reasoning about Video: Knowledge-based Transcription and Presentation. In Jay F. Nunamaker and Ralph H.
Sprague, editors, 27th Annual Hawaii International Conference on System Sciences, volume III: Information Systems: Decision Support and Knowledge-based Systems, pages 599-608, Maui, HI, January 1994.
[60] Andrew Csinger and David Poole. From utterance to belief via presupposition: Default reasoning in user-modelling. In Proceedings of the Conference for Knowledge Based Computing Systems, KBCS-89, pages 408-419, Bombay, India, December 1989. Also appeared in Springer-Verlag Lecture Notes in Artificial Intelligence, number 444, pages 407-417.
[61] Andrew Csinger and David Poole. Hypothetically Speaking: Default Reasoning and Discourse Structure. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1179-1184, Chambéry, France, September 1993.
[62] Glorianna Davenport, Ryan Evans, and Mark Halliday. Orchestrating Digital Micromovies. Leonardo, 26(4):283-288, 1993.
[63] Glorianna Davenport, Thomas Aguierre Smith, and Natalio Pincever. Cinematic Primitives for Multimedia. IEEE Computer Graphics and Applications, pages 67-74, July 1991.
[64] P. Edwards, editor. The Encyclopaedia of Philosophy. Macmillan and The Free Press, 1967.
[65] Douglas C. Engelbart. Knowledge-domain interoperability and an open hyperdocument system. In Proceedings of the Conference on Computer-Supported Cooperative Work, October 7-10, 1990, Los Angeles, CA, pages 143-156, 1990.
[66] James T. Enns. The promise of finding effective geometric codes. Conference presentation, SIGGRAPH '90, 1990.
[67] David W. Etherington. Formalizing non-monotonic reasoning systems. Technical Report 1, University of British Columbia, Vancouver, Canada, V6T 1W5, 1983.
[68] Ryan George Evans. LogBoy Meets FilterGirl: A Toolkit for Multivariant Movies. Master's thesis, MIT, February 1994.
[69] Steven Feiner. Apex: An experiment in the automated creation of pictorial explanations.
IEEE Computer Graphics and Applications, 5(11), November 1985.
[70] Steven Feiner. Research issues in generating graphical explanations. In Proceedings Graphics Interface '85, pages 117-122, 1985.
[71] Steven Feiner and Clifford Beshers. Worlds within worlds: Metaphors for exploring n-dimensional worlds. In Proceedings Symposium on User Interface Software and Technology, Snowbird, UT, October 1990.
[72] Steven Feiner, Sandor Nagy, and Andries van Dam. An experimental system for creating and presenting interactive graphical documents. Association for Computing Machinery Transactions on Graphics, 1(1):59-77, January 1982.
[73] Steven K. Feiner and Kathleen R. McKeown. Coordinating text and graphics in explanation generation. In Proceedings AAAI, pages 442-449, Boston, MA, July 1990.
[74] J. J. Finger and M. R. Genesereth. Residue: A Deductive Approach to Design Synthesis. Technical Report STAN-CS-85-1035, Department of Computer Science, Stanford University, Stanford, Cal., 1985.
[75] T.W. Finin. GUMS: A General User Modelling Shell. In A. Kobsa and W. Wahlster, editors, User Models in Dialog Systems, pages 411-430. Springer, 1989.
[76] Gerhard Fischer. Shared knowledge in cooperative problem-solving systems: Integrating adaptive and adaptable systems. In Proceedings of the Third International Workshop on User Modelling, pages 148-161, Dagstuhl, Germany, August 1992.
[77] Gerhard Fischer, Raymond McCall, and Anders Morch. Janus: Integrating hypertext with a knowledge-based design environment. In Hypertext '89 Proceedings, pages 105-117, Pittsburgh, PA, November 1989.
[78] Edward A. Fox. The coming revolution in interactive digital video. Communications of the Association for Computing Machinery, 32(7):794-801, July 1989.
[79] Mark Friedell. Context-sensitive, graphic presentation of information. ACM Computer Graphics, 16(3):181-188, July 1982.
[80] Mark Friedell.
Automatic synthesis of graphical object descriptions. ACM Computer Graphics, 18(3):53-62, July 1984.
[81] Hector Geffner. Default reasoning, minimality and coherence. In KR89, page 137, Toronto, Canada, May 1989.
[82] Ricki Goldman-Segall. Thick Descriptions: A Tool for Designing Ethnographic Interactive Videodiscs. SIGCHI Bulletin, 21(2), 1989.
[83] Ricki Goldman-Segall. A Multimedia Research Tool for Ethnographic Investigation. In I. Harel and S. Papert, editors, Constructionism. Ablex Publishing Corporation, Norwood, NJ, 1991.
[84] Robert C. Goldstein, V. Srinivasan Rao, and Andrew W. Trice. MeetingPlace: The Customizable Group Support Environment. Technical Report 93-MIS-006, Faculty of Commerce and Business Administration, University of British Columbia, 1993.
[85] Bradley A. Goodman. Multimedia Explanations for Intelligent Training Systems. In Mark T. Maybury, editor, Intelligent Multimedia Interfaces, chapter 7, pages 148-171. AAAI Press - MIT Press, 1993.
[86] Bradley A. Goodman and Diane J. Litman. On the interaction between plan recognition and intelligent interfaces. User Modeling and User-Adapted Interaction, 2(1-2):83-115, 1992.
[87] Irene Greif, editor. Computer Supported Cooperative Work. Morgan Kaufmann, 1988.
[88] Steven Gribble, Andrew Csinger, and Kellogg S. Booth. A Distributed Multimedia Architecture for Intent-based Video Authoring and Presentation. In Proceedings of MultiComm '94, Vancouver, Canada, November 1994.
[89] H.P. Grice. Logic and conversation. In P. Cole and J.L. Morgan, editors, Syntax and Semantics: Speech Acts, vol. 3, pages 47-58. Academic Press, New York, 1975.
[90] Georges Grinstein, Ronald Pickett, and Marian G. Williams. Exvis: An exploratory visualization environment. In Graphics Interface '89, London, Canada, 1989.
[91] B.J. Grosz and C.L. Sidner. Attention, Intentions, and the Structure of Discourse.
Computational Linguistics, 12(3):175-204, 1986.
[92] Bernard J. Haan, Paul Kahn, Victor A. Riley, James H. Coombs, and Norman K. Meyrowitz. Hypermedia services. Communications of the Association for Computing Machinery, 35(1), January 1992.
[93] Frank G. Halasz. Reflections on NoteCards: Seven issues for the next generation of hypermedia systems. Communications of the Association for Computing Machinery, 31(7):836-852, 1988.
[94] Lynda Hardman, Guido van Rossum, and Dick C. A. Bulterman. Structured Multimedia Authoring. In Proceedings ACM Multimedia 93, pages 283-289, August 1993.
[95] Beverly L. Harrison and Ronald M. Baecker. Designing Video Annotation and Analysis Systems. In Graphics Interface '92 Proceedings, pages 157-166, Vancouver, BC, May 1992.
[96] Rich Helms. Distributed Knowledge Worker (DKW): A Personal Conferencing System. In Proceedings of the 1991 CAS Conference, pages 115-125, Toronto, Canada, October 1991. IBM Canada, Ltd. Laboratories, Center for Advanced Studies.
[97] Tyson R. Henry and Scott E. Hudson. Interactive graph layout. In Proceedings User Interface Software and Technology '91. ACM Press, 1991.
[98] Xueming Huang, Gordon I. McCalla, and Jim E. Greer. Student model revision: Evolution and revolution. In Proceedings of the Eighth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 98-105, 1990.
[99] Hiroshi Ishii. TeamWorkStation: Towards a seamless shared workspace. In Proceedings of the Conference on Computer-Supported Cooperative Work, October 7-10, 1990, Los Angeles, CA, pages 13-26, 1990.
[100] Hiroshi Ishii and Naomi Miyake. Toward an open shared workspace: Computer and video fusion approach of TeamWorkStation. Communications of the Association for Computing Machinery, 34(12):37-50, December 1991.
[101] P. N. Johnson-Laird. Mental models of meaning. In Joshi, Webber, and Sag, editors, Elements of Discourse Structure.
Cambridge University Press, 1981.
[102] Phillippe Joly and Mourad Cherfaoui. Survey of automatical tools for the content analysis of video. IRIT 93-36-R, Bibliothèque de l'IRIT, UPS, 118 route de Narbonne, 31062 TOULOUSE CEDEX, 1993. Also available by anonymous FTP from ftp.irit.fr in PostScript, ASCII and MS Word formats as private/svideo.[ps,as,wd], or by email direct from the authors (cherfaoui@ccett.fr or joly@irit.fr).
[103] Marlene Jones and David Poole. An expert system for educational diagnosis based on default logic. In Proceedings Expert Systems and their Applications, pages 673-683, 1985.
[104] William P. Jones. How do we distinguish the hyper from the hype in non-linear text? In INTERACT 87, pages 1107-1113, 1987.
[105] Peter Karp and Steven Feiner. Issues in the automated generation of animated presentations. In Proceedings Graphics Interface, pages 39-48, Halifax, May 1990.
[106] Peter Karp and Steven Feiner. Automated Presentation Planning of Animation Using Task Decomposition with Heuristic Planning. In Graphics Interface '93, pages 118-127, Toronto, Canada, May 1993.
[107] Robert Kass and Tim Finin. Modelling the user in natural language systems. Computational Linguistics, 14(3):5, September 1988.
[108] Robert Kass and Irene Stadnyk. Using user models to improve organizational communication. In Proceedings of the Third International Workshop on User Modelling, pages 135-147, Dagstuhl, Germany, August 1992.
[109] Henry A. Kautz. A formal theory of plan recognition. Technical Report 215, Dept. of Comp. Sci., University of Rochester, Rochester, NY 14627, 1987.
[110] A. Kay and A. Goldberg. Personal dynamic media. IEEE Computer, 10(3):31-41, March 1977.
[111] J. Kay. um: A toolkit for user modelling. In Proc. of the Second International Workshop on User Modelling, pages 1-11, Honolulu, HI, 1990.
[112] D. Knuth. The TeXbook. Addison-Wesley, Reading, MA, 1984.
[113] A. Kobsa.
Modeling the user's conceptual knowledge in BGP-MS, a user modeling shell system. Computational Intelligence, 6:193-208, 1990.
[114] A. Kobsa. Towards inferences in BGP-MS: Combining modal logic and partition hierarchies for user modeling. In Proceedings of the Third International Workshop on User Modelling, Dagstuhl, Germany, 1992.
[115] Alfred Kobsa. User modelling: Recent work, prospects and hazards. In Proceedings of the Workshop on User Adapted Interaction, Bari, Italy, May 1992. Also available as a June 1992 Technical Report from Universität Konstanz Informationswissenschaft.
[116] Alfred Kobsa and Wolfgang Wahlster, editors. User Models in Dialog Systems. Springer-Verlag, 1989.
[117] Kurt Konolige. A computational theory of belief introspection. In IJCAI-85, pages 502-508, 1985.
[118] Robert R. Korfhage. Intelligent information retrieval: Issues in user modelling. Technical Report 85-CSE-9, Dept. of Computer Science and Engineering, Southern Methodist University, Dallas, Texas, May 1985.
[119] C.W. Krueger. Models of reuse in software engineering. Technical Report CMU-CS-89-188, Carnegie Mellon U., December 1989.
[120] David Kurlander and Steven Feiner. A visual language for browsing, undoing, and redoing graphical interface commands. In S. K. Chang, editor, Visual Languages and Visual Programming, pages 257-275. Plenum Press, New York, 1990.
[121] L. Lamport. LaTeX: A Document Preparation System. User's Guide and Reference Manual. Addison-Wesley, Reading, MA, 1986.
[122] Brenda Laurel. Issues in multimedia interface design: Media integration and interface agents. In CHI '90 Proceedings, pages 133-139, 1990.
[123] Brenda Laurel. Computers as Theatre. Addison-Wesley, 1991.
[124] Jintae Lee. Sibyl: A tool for managing group decision rationale. In Proceedings of the Conference on Computer-Supported Cooperative Work, October 7-10, 1990, Los Angeles, CA, pages 79-92, 1990.
[125] Francis J. Lim and Izak Benbasat. A communication-based framework for group interfaces in computer-supported collaboration. In 24th Hawaii Conference on System Sciences, Kauai, Hawaii, January 1991.
[126] Lai-Huat Lim and Izak Benbasat. The effects of group support system on group meeting process and outcomes: A meta-analysis. UBC Working Paper 91-MIS-020, 1991.
[127] W.E. Mackay and D.G. Tatar. Special issue on video as a research and design tool. ACM SIGCHI Bulletin, 21(2), October 1989.
[128] Wendy E. Mackay and Glorianna Davenport. Virtual video editing in interactive multimedia applications. Communications of the Association for Computing Machinery, 32(7):802-810, July 1989.
[129] Jock D. Mackinlay. Automatic Design of Graphical Presentations. Technical Report STAN-CS-86-1138, Stanford University Department of Computer Science, December 1986.
[130] Jock D. Mackinlay. Automating the Design of Graphical Presentations of Relational Information. Association for Computing Machinery Transactions on Graphics, 5(2):110-141, April 1986.
[131] Mitchell P. Marcus. A Theory of Syntactic Recognition for Natural Language. The MIT Press, 1980.
[132] Joseph Marks and Ehud Reiter. Avoiding unwanted conversational implicatures in text and graphics. In AAAI-90, pages 450-456, Boston, MA, July 1990.
[133] Catherine C. Marshall and Peggy M. Irish. Guided tours and on-line presentations: How authors make existing hypertext intelligible for readers. In Hypertext '89 Proceedings, pages 15-26, November 1989.
[134] Christian Metz. Film Language: A Semiotics of the Cinema. Oxford University Press, 1974. Translated by Michael Taylor.
[135] Scott Minneman and Sara A. Bly. Managing à trois: a study of a multi-user drawing tool in distributed design work. In CHI '91 Proceedings, New Orleans, LA, 1991.
[136] Fanya S. Montalvo. Knowledge visualizer: A graphic interface-building tool kit.
Technical report, MIT Media Laboratory, 20 Ames Street, Cambridge, MA 02139, September 1988.
[137] Fanya S. Montalvo. Diagram understanding: The symbolic descriptions behind the scenes. In Tadao Ichikawa, Erland Jungert, and Robert R. Korfhage, editors, Visual Languages and Applications. Plenum Publishing Corporation, 1990.
[138] J.D. Moore and C.L. Paris. Planning text for advisory dialogues. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, pages 203-211, 1989.
[139] Johanna D. Moore and Cécile Paris. Exploiting User Feedback to Compensate for the Unreliability of User Models. User Modeling and User-Adapted Interaction, 2(4):287-330, 1992.
[140] Gerald M. Murch. Color graphics - blessing or ballyhoo? In Ronald M. Baecker and William A. S. Buxton, editors, Readings in Human-Computer Interaction: A Multidisciplinary Approach, pages 333-341. Morgan Kaufmann, 1985.
[141] Sung Myaeng and Robert Korfhage. Towards an intelligent and personalized information retrieval system. Technical Report 86-CSE-10, Dept. of Computer Science and Engineering, Southern Methodist University, Dallas, Texas, March 1986.
[142] Nicholas Negroponte. Soft Architecture Machines. The MIT Press, Cambridge, Mass., 1975.
[143] Daniel Neiman. Graphical animation from knowledge. In AAAI '82, pages 373-376, 1982.
[144] U. Neisser. Decision time without reaction time: Experiments in visual scanning. American Journal of Psychology, 76:376-385, 1963.
[145] Ted Nelson. A file structure for the complex, the changing, and the indeterminate. In Proceedings of the ACM National Conference, pages 84-100, 1965.
[146] Steven R. Newcomb, Neill A. Kipp, and Victoria T. Newcomb. The "HyTime" Hypermedia/Time-based Document Structuring Language. Communications of the Association for Computing Machinery, 34(11):67-83, November 1991.
[147] Jakob Nielsen. Hypertext and Hypermedia. Academic Press, Inc., San
Diego, CA, 1989.
[148] E.G. Noik. Automating the generation of interactive applications. Technical Report 90-21, U. of British Columbia, June 1990. Master's thesis.
[149] E.G. Noik. Reducing the validation task by adding conformance at the implementation level. In 24th Annual Hawaiian Intl. Conf. on System Sciences, Kauai, HI, January 1991.
[150] Emanuel G. Noik. Layout-independent fisheye views of nested graphs. In Proceedings IEEE/CS Symposium on Visual Languages, Bergen, Norway, August 1993.
[151] Donald A. Norman. The Psychology of Everyday Things. New York: Basic Books, 1988.
[152] David Novitz. Pictures and their Use in Communication. Martinus Nijhoff, The Hague, 1977.
[153] Margrethe H. Olson and Sara A. Bly. The Portland experience: a report on a distributed research group. International Journal of Man-Machine Studies, 34:211-228, 1991.
[154] Jon Orwant. Apprising the User of User Models: Doppelganger's Interface. In Proceedings of the Fourth International Conference on User Modelling, pages 151-156, Hyannis, MA, August 1994.
[155] Jon Orwant. Heterogeneous Learning in the Doppelganger User Modeling System. User Modeling and User-Adapted Interaction, 4(2):107-130, 1995.
[156] Ronald Pickett. Integrated displays of multispectral imagery at Air Force Geophysics Laboratory. Draft, March 1991.
[157] Ronald Pickett and Georges Grinstein. Iconographic displays for visualizing multidimensional data. In Proceedings of the 1988 IEEE Conference on Systems, Man and Cybernetics, pages 514-519, Beijing and Shenyang, People's Republic of China, 1988.
[158] David Poole. A logical framework for default reasoning. Artificial Intelligence, 36(1):27-47, 1987.
[159] David Poole. Explanation and Prediction: an Architecture for Default and Abductive Reasoning. Computational Intelligence, 5(2):97-110, 1989.
[160] David Poole. Normality and faults in logic-based diagnosis.
In IJCAI, pages 1304-1310, Detroit, MI, August 1989.
[161] David Poole. Probabilistic Horn Abduction and Bayesian Networks. Artificial Intelligence, 64:81-129, 1993.
[162] David Poole, Randy Goebel, and Romas Aleliunas. A logical framework for default reasoning. In The Knowledge Frontier: Essays in the Representation of Knowledge, pages 331-352. Springer Verlag, New York, NY, 1987.
[163] Bhavani Raskutti and Ingrid Zukerman. Query and Response Generation during Information-Seeking Interactions. In Proceedings of the Fourth International Conference on User Modelling, pages 25-30, Hyannis, MA, August 1994.
[164] Susan E. Rathie. SAVANT: Video Annotation Support for Meeting Memory. Master's thesis, University of British Columbia, 1995.
[165] Brian K. Reid. Scribe: A Document Specification Language and its Compiler. PhD thesis, Carnegie-Mellon University, 1980. Also issued as Technical Report CMU-CS-81-100.
[166] Brian K. Reid. Electronic Mail of Structured Documents: Representation, transmission, and archiving. In J. André, R. Furuta, and V. Quint, editors, Structured Documents. Cambridge University Press, 1989.
[167] R. Reiter. A logic for default reasoning. Artificial Intelligence, 13(1-2):81-132, 1980.
[168] Ronald A. Rensink and Gregory Provan. The analysis of resource-limited vision systems. In Proceedings of the 13th Annual Conference of the Cognitive Science Society, pages 311-316, 1991.
[169] Elaine Rich. User modelling via stereotypes. Cognitive Science, 3:329-354, 1979.
[170] Elaine Rich. Users are individuals: Individualizing user models. International Journal of Man-Machine Studies, 18:199-214, 1983.
[171] Elaine Rich. Stereotypes and User Modeling. In A. Kobsa and W. Wahlster, editors, User Models in Dialog Systems, pages 35-51. Springer, 1989.
[172] Elaine Rich. Artificial Intelligence. McGraw-Hill, 1991.
[173] Thomas Rist and Elisabeth André.
Incorporating graphics design and realization into the multimodal presentation system WIP. In Advanced Visual Interfaces 92, Rome, 1992.
[174] Richard Rosenberg. The Social Impact of Computers. Academic Press, 1992.
[175] Steven F. Roth and Joe Mattis. Data Characterization for Intelligent Graphics Presentation. In CHI '90 Proceedings, pages 193-200, Seattle, WA, April 1990.
[176] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1995.
[177] Sunil Sarin and Irene Greif. Computer-based real-time conferencing systems. IEEE Computer, 18(10):33-45, October 1985.
[178] Margaret H. Sarner and Sandra Carberry. Tailoring definitions using a multifaceted user model. In Proceedings of the Eighth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 106-113, 1990.
[179] Leonard J. Savage. The Foundations of Statistics. Dover Publications, Inc., New York, 1972.
[180] John L. Schnase and John H. Leggett. Computational hypertext in biological modelling. In Hypertext '89 Proceedings, pages 181-197, Pittsburgh, PA, November 1989.
[181] John R. Searle. A Taxonomy of Illocutionary Acts. In Expression and Meaning, pages 1-29. Cambridge University Press, London, 1979.
[182] John R. Searle and Daniel Vanderveken. Foundations of Illocutionary Logic. Cambridge University Press, 1985.
[183] Dorée Duncan Seligmann and Steven Feiner. Automated generation of intent-based 3D illustrations. Computer Graphics, 25(4):123-132, July 1991. Proceedings of SIGGRAPH '91 (Las Vegas, Nevada, July 28-August 2, 1991).
[184] David Sewell. The Usenet oracle: Virtual authors and network community, December 1992. To subscribe, send SUB EJRNL your name to LISTSERV@ALBANY.
b i t n e t ; to get contents or abstracts of previous issues send: G E T E J R N L C O N T E N T S ; to get Volume 1, Number 1, send G E T E J R N L V 1 N 1 . EJournal is an all-electronic, Matrix distributed, peerreviewed, academic periodical. [185] Stuart Smith, R. Daniel Bergeron, and Georges G . Grinstein. Stereophonic and surface sound generation for exploratory data analysis. In CHF90 Proceedings, pages 125-132, A p r i l 1990. [186] Oliviero Stock. Natural Lnaguage and Exploration of an Information Space: the ALFresco Interactive System. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, 1991. [187] Oliviero Stock. A L F R E S C O : Enjoying the Combination of Natural L a n guage Processing and Hypermedia for Information Exploration. In M a r k T. Maybury, editor, Intelligent Multimedia Interfaces, pages 197-224. A A A I Press/MIT Press, 1993. [188] L. Suchman and R. Trigg. Understanding Practice: Video as a M e d i u m for Reflection and Design. In Greenbaum and K y n g , editors, Design at Work: Cooperative Design of Computer Systems, pages 210-213. 1991.  References [ 189]  190  Markus A . Thies and Frank Berger. Intelligent user support in graphical user interfaces: Plan-based graphical help in object-oriented user-interfaces. Technical report, Deutsches Forschungszentrum fur Kiinstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbriicken 11, Germany, March 1992.  [190] Anne Treisman. Properties, parts, and objects. In Boff, Kaufmann, and Thomas, editors, Handbook of Perception, volume II, pages 3 5 - 1 to 3 5 - 7 0 . 1986. [191] Anne Treisman. Features and objects in visual processing. I n l r v i n R o c k , editor, The Perceptual World: Readings from Scientific American. W. H. Freeman and Company, New York, 1990. Originally November, 1986 Issue of Scientific American. [192] Anne Treisman, Patrick Cavanaugh, Burkhart Fischer, V.S. Ramachandran, and Rudiger von der Heydt. 
Form perception and attention: Striate cortex and beyond. In Lothar Spillmann and John S. Werner, editors, Visual Perception: The Neurophysiological Foundations, pages 273-316. Academic Press, 1990. [193] Anne Treisman and Stephen Gormican. Feature analysis in early vision: E v idence from search asymmetries. Psychological Review, 9 5 : 1 5 ^ - 8 , 1988. [194] Wolfgang Wahlster, Elisabeth Andre, Som Bandyopadhyay, Winfried Graf, and Thomas Rist. W I P : The Coordinated Generation of Multimodal Presentations from a Common Representation. Research Report R R - 9 1 08, Deutsches Forschungszentrum fiir Kiinstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbriicken 11, Germany, February 1991. [195] Wolfgang Wahlster, Elisabeth Andre, Wolfgang Finkler, Hans-Jiirgen Profitlich, and Thomas Rist. Plan-Based Integration of Natural Language and Graphics Generation. Artificial Intelligence, 63(1-2):387-427, October 1993. Also available from D F K I as a technical report. . [196] Wolfgang Wahlster and Alfred Kobsa. Springer-Verlag, 1990.  User Models in Dialog Systems.  [197] C o l i n Ware and John C. Beatty. Using color as a tool in discrete data analysis. C S 21, University of Waterloo Computer Science Department, Waterloo, Canada, August 1985.  References  191  [198] Etienne Wenger. Artificial Intelligence and Tutoring Systems: Computational and Cognitive Approaches to the Communication of Knowledge. Morgan Kaufmann, Inc., 1987. [199] Robert Wilensky, David N . Chin, Marc Luria, James Martin, James M a y field, and Dekai W u . The berkeley unix consultant project. Computational Linguistics, 14(4):35-84, December 1988. [200] Dekai W u . Active Acquisition of User Models: Implications for DecisionTheoretic Dialog Planning and Plan Recognition. User Modeling and UserAdapted Interaction, 1:149-172, 1991. [201] Yang X i a n g , Michael P. Beddoes, and David Poole. Sequential Updating Conditional Probability in Bayesian Networks by Posterior Probability. 
In Proceedings of the Eighth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 2 1 - 2 7 , 1990. [202] Nicole Yankelovich, Norman Meyrowitz, and Andries van D a m . Reading and writing the electronic book. IEEE Computer, 18(10): 15-30, October 1985. [203] Elise Yoder, Robert Akscyn, and Donald McCracken. Collaboration in kms, a shared hypermedia system. In CHI'89 Proceedings, pages 3 7 - 4 2 , M a y 1989. [204] Richard M . Young, T.R.G. Green, and Tony Simon. Programmable user models for predictive evaluation of interface designs. In CHI'89 Proceedings, pages 15-19, M a y 1989. [205] Frank Zdybel, Norton R. Greenfeld, Martin D. Yonke, and Jeff Gibbons. A n information presentation system. In IJCAI '81, pages 978-984, 1981. [206] Polle T. Zellweger. Scripted documents: A hypermedia path mechanism. In Hypertext '89 Proceedings, pages 1-14, November 1989. [207] Ingrid Zukerman. Content planning based on a model of a user's beliefs and inferences. In Proceedings of the Third International Workshop on User Modelling, pages 162-173, Dagstuhl, Germany, August 1992.  

