UBC Theses and Dissertations


User models for intent-based authoring Csinger, Andrew 1995

Full Text

USER MODELS FOR INTENT-BASED AUTHORING

By Andrew Csinger

B.Eng. McGill University, 1985
M.Sc. (Computer Science) University of British Columbia, 1991

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF COMPUTER SCIENCE

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
November 1995
© Andrew Csinger, 1995

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
Vancouver, Canada
Date:

Abstract

Authoring is the collection, selection, preparation and presentation of information to one or more readers by an author. The thesis takes a new, critical look at traditional approaches to authoring, by asking what knowledge is required and at which stages of the process. From this perspective, traditional authoring is seen to entrench an early commitment to both form and content.

Although the late binding of form is now commonplace in structured document preparation systems, a similar delay in the binding of content is necessary to achieve user-tailored interaction. The authoring paradigm we have developed to service this goal is called intent-based authoring, because the author supplies at compile-time a communicative goal, or intent.
Just as SGML editors and HTML browsers defer rendering decisions until run-time by referring to a local style-sheet, intent-based authoring systems defer content-selection decisions until run-time when they refer to models of both author and reader(s).

This thesis shows that techniques from artificial intelligence can be developed and used to acquire, represent and exploit such models. Probabilistic abduction is used to recognize user models, and cost-based abduction to design tailored presentations. These techniques are combined in a single framework for best-first recognition and design.

These reasoning techniques are further allied with an interaction paradigm we call scrutability, whereby users critique the model in pursuit of better presentations; users see a critical subset of the model determined by sensitivity analysis and can change values through a graphical user interface. The interactivity is modelled to ensure that representations of the user model to the user are made in the most perceptually salient manner.

A prototype for intent-based video authoring is described. Video is used as a test medium because it is a "worst case" temporally linear medium; a viable solution to video authoring problems should apply easily to more tractable traditional media.

The primary contribution of this dissertation is to the field of applied artificial intelligence, specifically to the emerging field of user modelling. The central contribution is the intent-based authoring framework for separating intent from content.
Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

I Introduction

1 Systems have it Easy
  1.1 Introduction
  1.2 Traditional Approaches
  1.3 Intent-based Authoring
  1.4 Valhalla and the Departmental Hyperbrochure
    1.4.1 Sample Session
  1.5 Overview

II Literature Survey and Formal Background

2 User Modelling
  2.1 Overview
  2.2 Issues: Acquisition and Exploitation
    2.2.1 Acquisition
    2.2.2 Exploitation
    2.2.3 Correction: Situations requiring correction
    2.2.4 Scope: What is Represented?
    2.2.5 Extent and Adaptivity
  2.3 Multiple Agents, Multiple Models
  2.4 User Modelling Shells
  2.5 User Modelling: A Definition by Consensus

3 Authoring
  3.1 Dimensions of Authoring
  3.2 Graph Generation
  3.3 Data Analysis and Visualization
  3.4 Psychophysical Research
  3.5 Multimedia
    3.5.1 Video Authoring
  3.6 Computer-Supported Cooperative Work
  3.7 Structured Documents: Content from Form
  3.8 Hypertext and Hypermedia
  3.9 Putting it all Together: Cyberspace?

4 Formal Background
  4.1 Symbolic Logic and Default Reasoning
    4.1.1 Default-Programming with Theorist
    4.1.2 Summary and Conclusions
  4.2 Decision Making Under Uncertainty
    4.2.1 Bayesian Decision Theory
    4.2.2 Example
    4.2.3 Decision Analysis
  4.3 Speech Acts

III Contribution

5 User Models for Intent-based Authoring
  5.1 Overview
  5.2 Motivation
  5.3 Components of the Theory
    5.3.1 Summary: Inputs, Outputs and Roles
  5.4 An Abductive Framework for Recognition and Design
    5.4.1 Recognizing User Models
    5.4.2 Designing Presentations
  5.5 Interactivity and Scrutable Models
  5.6 Example
  5.7 Alternative Approaches
    5.7.1 Decision Theory for Multimedia Presentation Design
    5.7.2 Costs and Utilities
    5.7.3 Other similar approaches
  5.8 Conclusions

6 Implementation
  6.1 Architecture
    6.1.1 The Reasoner
  6.2 Functionality

IV Conclusions

7 Conclusions
  7.1 Implications
  7.2 Future Work
    7.2.1 Learning: Updating Prior Probabilities
    7.2.2 Privacy
    7.2.3 Future Development

Bibliography

List of Tables

2.1 Errors in user models
3.1 Mackinlay and related work
3.2 Data Analysis and Visualization
3.3 Intermedia
3.4 CSCW time and space diagram
3.5 KMS
4.1 Domain-Formulation
5.1 Priors and normalized posteriors
5.2 Myopic Tradeoff Table
5.3 Results of Sensitivity Analysis before user action
5.4 Results of Sensitivity Analysis after user action

List of Figures

1.1 The traditional approach to authoring
1.2 The intent-based approach to authoring
1.3 The Valhalla User Model Window
1.4 The Valhalla Control Window
2.1 User and system models
3.1 EXVIS five-dimensional stick figures
3.2 Iconograph
5.1 A partially elaborated presentation
5.2 Roles, Inputs and Outputs
5.3 Representative recognition assumables
5.4 Example facts and rules
5.5 Rules for perceptual salience
5.6 The Valhalla User Model Window
5.7 Assumables for perceptual salience calculations
5.8 Show: an example intent
6.1 Valhalla's Distributed Architecture
6.2 The hierarchy of interfaces to the video server
6.3 The Valhalla Control Window
6.4 A frame from the Departmental Hyperbrochure
6.5 The Valhalla User Model Window
6.6 Valhalla implementation
6.7 The Valhalla Agents
6.8 Authorial Intent as a Prolog query
6.9 Knowledge-based Video Annotation and Presentation

Acknowledgements

And further, by these, my son, be admonished: of the making of many books there is no end; and much study is a weariness of the flesh.
-- Ecclesiastes 12:12

My wife, Sue Rathie, who kept me on the path where I would have wandered, oh, so many times. For reading my thesis more often than anyone else. For helping me with all those details. For listening. For being there, all the time...

My parents, Helen and Joseph, who got me to where I could do the rest on my own. There were times over the last few years when I think the only reason I kept going was because I promised my mother I would "get a PhD." I was six years old when I made that promise...

Kelly Booth has been the source of much good advice, and a partner in many a stimulating discussion. When I went to Germany for a summer of study, he left me with the sage words: "take lots of toilet paper and don't drink the water." It later turned out that the water supply at the German Center for Artificial Intelligence was tainted, and I remain forever grateful...

Dave Poole loves to argue. He won't admit it, but he does. And he is infuriatingly good at it, as all of his graduates agree. The strange thing is, that after years of bickering, and endless iterations of frustrating non-felicitous utterances and failed mutual beliefs with unintended perlocutionary effects, his students still like him. Valerie McRae has looked in on these often boisterous meetings from time to time, wondering if it wouldn't be prudent to invite the RCMP. Very strange, indeed...

Steve Gribble, for implementing the NeXT interface. It was a pleasure to work with him. He's the best undergraduate I've ever met, with technical and interpersonal skills far beyond his years. Mike Horsch, who taught me everything I know. Steve Mason, for taking just as long to finish his PhD. Scott Flinn, who knows everything about NeXT.

Imouttahere.

Thanks

Thanks to my co-supervisors, Kellogg S.
Booth and David Poole, and to the other members of the thesis advisory committee:

• Tom Calvert (Computer Science, Simon Fraser University),
• Bob Goldstein (MIS, Faculty of Commerce, UBC),
• Richard Rosenberg (Computer Science, UBC).

The examiners were:

• Jim Little (Departmental Examiner, Computer Science, UBC),
• Ricki Goldman-Segal (University Examiner, Education, UBC),
• David Chin (External Examiner, Computer Science, University of Hawaii).

The student reader was Michael Horsch. I am grateful for all their contributions to this work.

Part I

Introduction

Chapter 1

Systems have it Easy

Each of the shots must be physically spliced with cement or tape to the shots that precede and follow it...
-- James Monaco, How to Read a Film, page 117

Do not be afraid to seize whatever you have written and cut it to ribbons; it can always be restored to its original condition in the morning...
-- The Elements of Style, page 72

The work reported in this dissertation is rooted in the belief that the best way to make progress towards cooperative computational systems^1 is to take stock of human capabilities and limitations, and then to pursue human-computer relationships which exploit these capabilities and overcome these limitations. A human-computer symbiosis is warranted, where each participant in the relationship does what it does best. Advances in the foreseeable future will likely revolve about the design of this symbiosis, rather than the embodiment of intelligence in some computational artifact.

[Footnote 1: "Instead of the passive-aggressive error messages that are currently given in response to incorrect or incomplete specifications, intelligent agents should collaborate with the user to build an acceptable request." [1]]

There are at least two complementary approaches to achieving this goal.
One way is to build superior interfaces with better affordances that clearly advertise their function to the human user, and that cater to known human psychophysics [151]. Research in Human Computer Interaction (HCI) pursues this approach in many directions; some of it is surveyed in Chapter 3 of this dissertation. Central to the HCI approach is the desire to make the system easier for the user to understand [122, 123]; the user should be able to acquire and exploit a model of the system.

The other, complementary way stems from the realization that until now, it has been the user that does all of the explicit modelling, and that perhaps we have reached a stage where the computer can be made to bear at least part of the burden of representation. Research in User Modelling [115, 116] takes this approach; the system should be able to acquire and exploit a model of the user. This dissertation focusses on the second approach.

User-guided theorem-proving systems and diagnostic expert systems are obvious examples of tasks in which humans and computers collaborate to achieve a goal. In environments like these, where the goals of the user are known by the system a priori, the job of acquiring a restricted model of the user is reasonably well-defined. Even in relatively open-ended application environments, the fact that the human user has chosen, say, a word-processing program rather than a drawing program, provides some grounds for model building. In contrast, vague or implicit user-goals in broader areas like decision-support systems (see, e.g., Goldstein [84]) or desktop publishing and production environments make acquiring and updating models of users very difficult.

This dissertation develops a particular approach to acquiring and exploiting models of users. The approach is applied to authoring and a prototype application for video authoring and browsing is described.
The introduction first defines authoring, and then offers a new perspective on traditional approaches. This perspective is one of the contributions of this thesis, and yields insight into the heretofore unregarded limitations of the traditional approaches, as well as insight into a new strategy called intent-based authoring which overcomes these limitations and is developed in this thesis. The goal of this thesis is to provide the intellectual and logical foundations upon which intent-based authoring systems can be developed.

1.1 Introduction

[Figure 1.1: The traditional approach to authoring. Diagram labels: Authoring, Information, Content, Form, Author, Reader, Supply, Presentation, Demand, time.]

Authoring is the honorable tradition of collecting, structuring and presenting information in the form of a "document" rendered in some medium or media. Until recently, the document has been static, in the sense that once rendered, it is fixed for all time and for all readers. Promising new technologies have recently come into existence that could alleviate some of the limitations of this difficult, knowledge-intensive undertaking.

1.2 Traditional Approaches

In the traditional model of authoring, the task of an author^2 is to collect a coherent body of information, structure it in a meaningful and interesting way, and present it in an appropriate fashion to a set of readers (or viewers) of the eventual work.

This traditional notion of authoring commits the author to the form as well as to the contents of the work, well in advance of the actual time at which it is presented. Figure 1.1 emphasizes that there is no clear separation of information from presentation, and authors are committed to both the form and content of their message.

Structured-document approaches separate form and content, but user-tailored presentation is still not possible; reader "demand" only indirectly affects the authoring process.
The familiar book format conveys the force of the general problem; once printed, there is no way, short of second-editions and published errata, to change the presentation for the particular needs and desires of individual readers, or groups of readers.^3 The author must both select and order in advance the information to be presented. Presentations tailored to the needs of particular audiences are not possible in the traditional approach to authoring, with its "compile-time" commitment to form as well as to content.

The traditional approach to authoring when applied to non-traditional media like film, results in the same limitations. As an offshoot of his semiological analyses of the cinema, Metz [134, p.45] wrote that "the spectatorial demand cannot mould the particular content of each film..." Metz is pointing out that when viewers sit in their theatre seats munching popcorn, it is too late in the traditional model for their goals and desires to influence the content of the celluloid images being projected before them.

[Footnote 2: The on-line copy of Webster's 7th Dictionary offers the following definition: 1: the writer of a literary work (as a book) 2a: one that originates or gives existence: SOURCE <trying to track down the author of the rumor> <the author of a theory>.]

[Footnote 3: That some books are published in multiple editions (the Windows versus the Macintosh edition of a manual, or the Prolog versus the Lisp edition of a programming text, for instance) does not address the general problem. Both these groups were anticipated by an author at compile-time. Not all individual readers can be anticipated in this way.]

Such statements, though accurate in 1974, are representative now of what should be considered out-dated, traditional approaches that take a technologically imposed "supply-side" view of the authoring process, in which authors and publishers join to decide both the form and the content of a document before readers ever make their wishes known.
The principal limitation of these traditional approaches is the resulting "one-size-fits-all" static document, exemplified by the venerable book format that we have been using since well before Gutenberg, when scribes laboriously and meticulously copied manuscripts; identical replication was the sine qua non of these technologies. Most approaches to authoring are even today just bigger and faster versions of the printing press, and do nothing to overcome this early binding problem.^4

[Footnote 4: "Binding" is used here in the computer scientist's sense of associating values with variables. The pun was originally unintended.]

Today we can do better. We now have fast graphics, powerful reasoning engines and other technology, and rather than just add horsepower to traditional techniques, we can harness these new technologies to change the way authoring activities are conducted. Before continuing the exposition of this new, non-traditional authoring paradigm, we argue that at least two "new" strategies fall within the traditional model and still suffer from its limitations.

Hypermedia (see Section 3.8) is media which can be accessed non-linearly, or non-sequentially, and it is nothing new. The terminology became common when non-linear documents became computerized, but hypermedia has been with us for a long time in the form of indexed documents (e.g., encyclopaedia), footnotes that reference other parts of a document or other documents, and so on. Although tables of contents and elaborate indexes are intended as remedies to the static document format, the burden of this approach to overcoming the "one size fits all" problem falls heavily upon the reader. For instance, an encyclopaedia is a hyperdocument that can be browsed using the indices and cross-references as navigational links. The browsing activity completes the selection and ordering functions normally performed by the author and brings with it an inherent overhead that must be assumed by every reader.
The viewer completes the job of the author by selecting and ordering the information to be viewed through the process of navigating the links established by the author. This not only pushes aspects of the problem from one person (the author) to another person (the viewer), it also dramatically increases the demands on the author who must provide explicit navigation cues in addition to the traditional authoring tasks. Reducing the amount of human effort required from the author and viewer is still a significant problem with current approaches to (hyper-)authoring. These effects can be mitigated by the knowledge-based approach advocated in this dissertation. See Section 3.8 for more exploration.

Form versus Content: An author chooses not only the information to be presented (the content) but also the order and style in which it will be presented (the form). Both contribute to the effectiveness of a presentation, yet few people are highly skilled in all aspects of these processes. This problem is at least partially addressed by the structured document paradigm, which attempts to separate the specification of the content of a document from the specification of its form. Markup languages like SGML (Standard Generalized Markup Language) and HyTime [146] are characteristic of this effort. They permit a delayed binding for what we might call the "surface structure" of a document (the format in which it is finally presented), but they still require the author to provide the "deep structure" (a hierarchical decomposition of the content as a structured document). See Section 3.7 for more details.

[Figure 1.2: The intent-based approach to authoring. Diagram labels: Author, Supply, Authoring, Viewing, User Model, Intent, Information Space, Form, Presentation Space, Knowledge, Presentation, Media, Domain, Reader, Demand, time.]

Content versus Intent: In order to tailor presentations to the needs and desires of individual readers, we need consumable models of these readers.
For the "demand-side" of the equation to have a direct effect on the form and content of the document, decisions about the final presentation must be delayed until "run-time," when the model of the reader can be brought to bear on the final stages of the design process. One difficulty is that user modelling is a new and complex problem. As part of this thesis, techniques for user modelling have been developed and applied to the authoring problem. Thinking of authoring in terms of the knowledge required to support the activity has resulted in a new approach developed in this thesis called "intent-based authoring," which may ultimately resolve the principal problems with the traditional approach.

1.3 Intent-based Authoring

A more complete de-coupling of specification and presentation processes is required before the goal of truly personalized presentations is attainable. In addition to the content of the document, the author must also supply an intent. The author's intent is an arbitrarily complex communicative goal analogous to the notion of illocutionary force in the literature of speech-acts (see Section 4.3 for more on Speech Act Theory and Section 5.3 for more on authorial intent as used in this dissertation), but can be safely interpreted in the context of this dissertation in its typical dictionary definition, which offers as synonyms: intention, intent, purpose, design, aim, end, object, objective.^5

This authorial intent is usually implicit in the work; a newspaper article is (sometimes) written to inform, an editorial to convince, a dissertation such as this to argue for the acceptance of a new authoring paradigm, and so on. The author's intent is a (possibly abstract, very high-level) communicative goal.

Making explicit this intention at the time the document is specified opens the door to truly user-specific document presentation. Information and presentation spaces can be clearly separated, bridged by various knowledge sources.
In particular, a model of the viewer permits user-tailored determination of content at run-time; supply meets demand. We call this approach to authoring, illustrated in Figure 1.2, intent-based authoring, and describe here an application of the approach to the authoring of video documents. Video is used as proof-of-concept because it has characteristics which make it a popular recording medium, and because it is in many ways more difficult to deal with than other media (see Section 3.5.1); the intent-based approach to authoring advocated in this thesis is expected to apply equally to other media.

Mackinlay [130], Karp and Feiner [106] and others have argued similarly in the domains of graphical presentation and animation. Feiner explicitly uses the term "intent-based presentation." Previous work in automatic presentation has dealt with some aspects of the issues addressed herein, though it has been restricted for the most part to choosing "the right way" to display items based on their syntactic form [130, 175]. Semantic qualities of the data to be displayed are seldom considered.

[Footnote 5: Source: Webster's 7th Dictionary, on-line copy.]

Unlike Karp and Feiner [106], who describe a system for intent-based animation, we do not start with a perfect model of the objects to be viewed and then decide on the sequence of video frames to be generated. Instead, we start with a typically large collection of pre-existing video frames (usually sequences of frames) and select and order these to communicate the intended information. Our task is one of (automatic) "assembly," rather than (automatic) "synthesis," a different problem entirely. A presentation for our purposes is an edit list which specifies the order in which a selection of video clips is to be played.
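The notion of a presentation as an edit list, assembled at run-time from pre-existing clips, can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration only: the `Clip` type, the `assemble_edit_list` function, the topic labels and the frame numbers are not taken from Valhalla. Matching clips to a set of viewer interests and playing them in tape order is a deliberate simplification; it stands in for, and does not implement, the abductive recognition and design techniques developed in Chapter 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A span of pre-existing video, identified by frame numbers on the tape.

    Hypothetical type for illustration; the model is of the tape, not its subject.
    """
    start_frame: int
    end_frame: int
    topics: frozenset  # annotations describing what the clip is about


def assemble_edit_list(clips, interests):
    """Select the clips sharing a topic with the viewer's interests.

    Assumption: clips are ordered by their position on the tape. This is a
    stand-in for a real design step, which would also reason about ordering.
    """
    chosen = [clip for clip in clips if clip.topics & interests]
    return sorted(chosen, key=lambda clip: clip.start_frame)


# A tiny annotated library (invented frame numbers and topics).
library = [
    Clip(5000, 5900, frozenset({"ai"})),
    Clip(0, 1200, frozenset({"introduction"})),
    Clip(1201, 3400, frozenset({"graphics"})),
]

# Content is bound at run-time, once something is known about the viewer.
edit_list = assemble_edit_list(library, frozenset({"introduction", "ai"}))
frames = [(clip.start_frame, clip.end_frame) for clip in edit_list]
print(frames)
```

The point of the sketch is only the division of labour: the library is annotated at compile-time without knowledge of the viewer, while the edit list is produced at run-time from a description of the viewer.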
In different terms: if the presentation is about a cube in space, our model is not of the cube, but of the video tape whose subject matter is that cube; we do not model the cube, with its size and position, but the tape, with its frame numbers and contents.

[Footnote 6, on the definition of a presentation as an edit list: Such a characterization deliberately excludes from consideration details of how clips are to be visually related (i.e., special editing effects like cut, fade or dissolve), attributes of the playback (e.g., screen contrast, color balance, etc.) and other aspects of video authoring that could easily fall within the purview of a framework of this sort.]

Recently, other researchers have considered related problems. Hardman et al. [94] undertake to free multimedia authors from having to specify all the timing relations for presentation events; some of these are derived by their system at run-time. Goodman [85] also builds presentations on-the-fly from canned video clips and other information. The work reported in this dissertation focusses on user modelling, rather than the media and domain concerns that motivate most other work. As shown in Figure 1.2, the user model is a crucial bridging element between the authoring activity that takes place at compile-time in the absence of the eventual viewer, and the viewing activity that takes place at run-time in the absence of the author.

1.4 Valhalla and the Departmental Hyperbrochure

The prototypical application called Valhalla, described in Chapter 6, is an intent-based authoring and presentation system. Valhalla is an intent-based implementation of the University of British Columbia Department of Computer Science Hyperbrochure. Originally conceived as a static one-hour video presentation, the hyperbrochure has been pressed into service as a prospecting tool for students, staff, faculty, granting agencies, industrial partners and other internal and external interests; this usage is asking much of a single, linear presentation.
The needs of one group of viewers are quite different from others, not to mention the differences in the particular interests of individuals within these groups. The need for a more versatile way to show different people what the Department of Computer Science has to offer was identified, and Valhalla emerged partially in response to this need, and because it represented an opportunity to deploy the results of this research.

The Departmental Hyperbrochure now consists of two thirty-minute video disks that include an introduction to UBC's Computer Science Department by its head, interviews with most of the faculty and staff, as well as walk-throughs of the laboratories.

The remainder of this section is a walk-through of an actual sample session with Valhalla. The reader might keep this example of the usage of the system in mind while reading other parts of this thesis. The same example is treated in more technical detail in Section 5.4.1.

1.4.1 Sample Session

Tom, a faculty member at the Department of Computer Science, arrives in one of the department labs with Joan, a visiting student from the University of Toronto. Joan is considering transferring to UBC, and wants to find out more about the department. She doesn't have her own computing account there yet, so Tom logs in and starts up Valhalla. The system derives an initial model of Tom based on widely available information indexed by his login id; Valhalla knows that Tom is faculty, and that he is local to the department, and assumes that his gender is male because there is currently a higher percentage of males than females in the department. The system also knows that Tom belongs to the graphics research group and assumes that his principal research interests lie in that area.

Based on this initial model, Valhalla prepares a presentation consisting of a set of video clips from the Hyperbrochure. Tom leaves to do other work, and Joan takes his place.
[Figure 1.3: The Valhalla User Model Window. Window text: "Each item below represents an important assumption that the system has made about you. Correcting the assumptions will change the behaviour of the system and the nature of your presentations."]

She watches some of the basic introductory material with which the presentation begins, but begins to wonder why the material goes on to talk about graphics, until her gaze falls on the user model window, in which the system has displayed the assumptions that have had the most effect in preparing the current presentation (see Figure 1.3). Joan sees that the system thinks she is male, local, and interested in graphics research. She clicks on the interface to correct the obviously false assumptions and instructs the system to design a new presentation based on this revised model, by manipulating the virtual VCR interface shown in Figure 1.4.

[Figure 1.4: The Valhalla Control Window. Legible labels: Current interval, Go, Previous, Next, Current Frame Number.]

The new presentation includes clips about the AI research group and its laboratories, as well as a sequence of scenic views of the Vancouver area and opportunities for entertainment. A brief history of the University concludes the presentation.

1.5 Overview

This dissertation is divided into four parts.

Part I is an introduction.

Part II is a literature review, consisting of Chapter 2, a survey of existing approaches to the modelling of agents, and Chapter 3, a survey of the broad spectrum of authoring systems with respect to a number of characterizing dimensions described in Section 3.1. Chapter 4 is a review of relevant theoretical material upon which the contribution is built, including hypothetical reasoning, probability and decision theory.
Part III describes the theoretical and practical contributions of this dissertation: Chapter 5 goes into some detail about the reasoning framework adopted to support the intent-based authoring paradigm, and Chapter 6 describes a prototype implementation built to demonstrate these ideas.

Part IV concludes with Chapter 7, advancing some generalizations and implications of the intent-based authoring and presentation paradigm, as well as some proposals for further work.

Although the wide space of authoring sampled in Part II provides many useful directions for practitioners interested in intent-based presentation, the field is largely an unwritten book, waiting for integrative contributions from researchers in user modelling, psychological perception, and artificial intelligence.

A number of "scenarios" are scattered throughout this thesis. These are intended to give the reader a sense for the philosophy and goals of the intent-based authoring paradigm, and although some of them are unabashedly science-fiction, all behavior described in the scenarios can be implemented by addressing technical and non-theoretical issues. A central component in all example scenarios is the underlying model of the users involved, necessary to achieve the functionality described.

Full code listings are available over the Internet from the author.^7

[Footnote 7: The author can be reached at csinger@cs.ubc.ca, and information is available at http://www.cs.ubc.ca/spider/csinger/home.html]

Part II

Literature Survey and Formal Background

Scenario: Information Retrieval

The reader is warned that the following scenario is science fiction; the intent is merely to motivate the reader with the long-range goals of the intent-based authoring paradigm, which is to move away from telling computers how to do things, to telling them what to do, and finally to just telling them about ourselves.
Information Retrieval

Dan and Mike are both users of FAST, the powerful information retrieval system of the twenty-twenties. FAST is connected to a bewildering variety of widely dispersed databases on all aspects of human endeavor, and operates, as its acronym suggests, at great speed over high bandwidth networks.

When Dan asks the system for the names and descriptions of deadly viruses, FAST begins its response, after accessing various cross-indexed medical and historical databases, with the story of the eradication of HIV, the last virus to be tamed by medical science. Dan is a doctor.

When Mike presents a similarly formulated query, the system begins with a description of Michelangelo, the computer virus that threatened to destroy PC disk-drive information on the artist's birthday in nineteen ninety-two, in the dark, depressing, early days of information retrieval. Mike is a computer programmer.

Chapter 2

User Modelling

"You have a time machine and you use it for... watching television?"
"Well, I wouldn't use it at all if I could get the hang of the video recorder"...
-- Douglas Adams, Dirk Gently's Holistic Detective Agency

The loosely defined area of user modelling has grown over the past decade out of its origins in the field of natural-language dialog systems [196], into a wide range of disciplines concerned with developing cooperative systems for heterogeneous user populations [115]. Some of the more prominent disciplines include Human-Computer Interaction, Intelligent Interfaces, Adaptive Interfaces, Cognitive Engineering, Intelligent Information Retrieval, Intelligent Tutoring, Active and Passive Help Systems, Hypertext and Expert Systems.

Although the specific reasons for modelling agents are manifold, the general appeal of the undertaking is that it promises more effective use of the available communications channels between the agents being modelled.
Most of the attention has focussed on user modelling, a special case of agent modelling, where the agent being modelled is the (human) user of an interactive system. The emphasis in this setting is to increase the cooperativeness of the system vis-à-vis the human; this thesis shares the view that user modelling is a likely vehicle to engender more cooperative behavior in human-machine interaction.

Kass and Finin [107] provide a useful framework for presenting the user modelling literature by categorizing it along six dimensions. They choose to analyze models as to their

• degree of specialization,
• modifiability,
• temporal extent,
• method of use,
• number of agents, and
• number of models.

The present discussion will be related to this framework.

2.1 Overview

A number of classes of research activity have been subsumed under the label of agent modelling. The literature divides fuzzily among natural language understanding and generation, computer aided instruction or intelligent tutoring systems, and cognitive modelling. The thread of information retrieval runs through these somewhat arbitrary divisions [118] [141]. Each of these areas has its own reasons for modelling agents.¹ Recent efforts at developing user modelling shells promise to take the field from the ad hoc to the systematic.

In natural language, there has been growing consensus that models of agents are necessary both to understand references in user utterances and to generate statements that will be comprehensible to the user; modelling is understood to be a necessary component of natural language dialogue [35] [36] [107] [91] [178]. These observations expand readily from text-only to multi-modal and multi-media environments: the presentations produced by a system for the consumption of a user should be tailored to the user's expectations, abilities, and goals.
¹ See [12] for an annotated bibliography of the field, divided into the areas of Computer Aided Instruction, Expert Systems, Knowledge Representation, Logic Programming, Natural Language, the Philosophy of Science, and User Modelling per se.

Practitioners of computer aided instruction (CAI) [198] have made use of models of student users (student models) to decide what to teach, and how to teach it [50]. The interests of this research community address the question of how a user acquires an accurate model of the system (the system model), as opposed to how a system can acquire an accurate model of the user.

Cognitive modelling [101] has as its goal the development of psychologically valid models of human cognition. Although researchers in this area generally build computational systems to test their theories, these systems can sometimes form the bases for useful modelling tools. For example, Craddock [53] develops a model for database retrieval which explains aspects of common-sense reasoning in humans, and which admits of a relatively direct implementation path; his implementation could be used as an episodic knowledge base (EKB) that stores user models. Provan and Rensink [168] refer to psychophysical results to support their model of neural connectivity, and Marcus [131] also leans on psychological findings to defend his novel approach to parsing natural language; such efforts should not be ignored when building systems to reason their way to user models from observation of human responses to visual and textual stimuli.

Different strategies and technologies have been employed to address the difficult problems associated with user modelling, e.g., default reasoning [103] [50] [60] [13], truth maintenance [98], and Dempster-Shafer analyses [178]. An early approach to user modelling, which is still the subject of some current research and which is only now beginning to find its way into commercial products, is stereotyping [17] [45] [170] [169].
A stereotype is a collection of data which typifies a class of users. Rich [171] defines stereotypes as a means of making a large number of assumptions about a user based upon only a small number of observations, and she pioneered their use in the Grundy system for recommending books to users. The data are usually represented in some kind of logical calculus as statements of belief or knowledge or goals, and there will be as many stereotypes as there are identifiable classes of users.

The approach involves first identifying a fixed set of classes of users a priori, then deciding the membership of an individual in one of these classes, and finally attributing the contents of an applicable stereotype to the individual. The accuracy of user models which rely upon stereotyping depends directly upon the number of pre-determined user categories.² The more stereotypes, the better, although the difficulty of determining which stereotype to apply to an individual grows with the number of stereotypes. The activity of choosing which stereotype to apply is called triggering.

Variations upon the theme of stereotyping have been developed in a number of directions; individuals can be permitted to inherit characteristics from multiple stereotypes (e.g., [16]), an approach which affords more accurate modelling at the expense of more elaborate means for arbitration between applicable, mutually inconsistent stereotypes. Stereotypes may also be ordered into taxonomic hierarchies which provide savings in the space required for the representations: if the stereotype of a computer novice contains typical beliefs of novice computer users about the operation of a computer system, then the stereotype of an advanced user need only contain typical beliefs of advanced users where these conflict with, or do not appear in, the novice stereotype. The advanced stereotype inherits the contents of the novice where the latter does not conflict with the former.
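
The space saving offered by a taxonomic hierarchy can be illustrated with a minimal sketch. It assumes, purely for illustration, that a stereotype's contents are reduced to a mapping from proposition names to truth values; the class name `Stereotype`, its fields, and the two propositions are hypothetical and are not taken from any of the systems cited above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Stereotype:
    """A class of users plus the default assumptions attached to it."""
    name: str
    # Only the beliefs that conflict with, or are missing from, the parent.
    local_beliefs: Dict[str, bool] = field(default_factory=dict)
    parent: Optional["Stereotype"] = None

    def beliefs(self) -> Dict[str, bool]:
        """Inherited beliefs, overridden by local ones where they conflict."""
        inherited = self.parent.beliefs() if self.parent else {}
        return {**inherited, **self.local_beliefs}


# Hypothetical propositions, chosen only for illustration.
novice = Stereotype("novice", {
    "can_open_files": True,
    "can_write_macros": False,
})
# The advanced stereotype stores only what differs from the novice.
advanced = Stereotype("advanced", {"can_write_macros": True}, parent=novice)

advanced_view = advanced.beliefs()
```

Here the advanced stereotype stores a single entry yet yields a complete set of assumptions, because the unchanged entry is looked up in the novice stereotype.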
Taxonomies can simplify the attribution process; if it has been determined that a user is an expert, for instance, there is no need to verify stereotypes higher in the hierarchy.

² An early, typical example is the beginner-intermediate-expert distinction exploited in some popular off-the-shelf word processing programs primarily to vary the verbosity of their help-messages. The assignment of the user to the appropriate category was determined by the user himself, thereby solving the thorny acquisition problem.

An extension to stereotyping is what Ballim [16] has called "ascription by perturbation," wherein an agent assumes that another agent is similar to itself and therefore attributes its own beliefs to the other. This approach offers considerable power in environments where a large fraction of the agents' knowledge is common. The usefulness of the approach is illustrated by how well it works in our own lives; we never know what our fellow humans are thinking or believing, but we rely on our own introspection and perform attributions with internal justifications like: "If I were him, I would be hungry by now..." A canonical perturbation ascription rule would be: "assume that another agent's view is the same as one's own except where there is explicit evidence to the contrary" [16, p. 76]. Humans appear to be remarkably successful with this approach.

Chin [45] [44] introduces the notion of double stereotypes. In addition to categorizing users, his KNOME system categorizes information into levels of difficulty, so that inferences can be represented as relations between user types and difficulty levels. ("Experts know all simple and mundane but only most complex knowledge." "Beginners know most simple, a few mundane and no complex knowledge.")

A promising improvement to the technique of stereotyping is to dynamically derive membership categories from episodic databases of user activity.
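
The canonical perturbation rule quoted above from Ballim lends itself to a very small sketch. It assumes that beliefs can be flattened into a mapping from proposition names to truth values, which is a simplification introduced here; the function name and the example propositions are hypothetical.

```python
from typing import Dict

Beliefs = Dict[str, bool]


def ascribe_by_perturbation(own: Beliefs, evidence: Beliefs) -> Beliefs:
    """Assume the other agent shares one's own view, except where
    explicit evidence about that agent says otherwise."""
    ascribed = dict(own)       # start from a copy of one's own beliefs
    ascribed.update(evidence)  # explicit evidence overrides the default
    return ascribed


# Hypothetical propositions, chosen only for illustration.
own_view = {"it_is_lunchtime": True, "cafeteria_is_open": True}
evidence_about_other = {"cafeteria_is_open": False}

ascribed_view = ascribe_by_perturbation(own_view, evidence_about_other)
```

The ascribing agent's own beliefs are left untouched; only the model of the other agent is perturbed.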
Doppelgänger [155] is a generalized user modelling system that uses learning techniques to interpret data about users acquired through a variety of sensors. Applications connect via standard protocols to a server that provides access to user data. Each user model is a point in a very high dimensional space whose dimensions are determined by the available sensors; this point moves through the space as information about the user is gathered. The categories in Doppelgänger are called community models, which are computed as weighted combinations of member models, "and thus change dynamically as the user models are augmented." Section 7.2 of this thesis considers the use of learning techniques in future work.

A strategy related to stereotyping is the use of profiling, which involves giving users control over various aspects of system operation by allowing them to set the values of a prescribed set of parameters. Common examples are the tailorability of the Unix operating system with scripts and alias mechanisms, and the customizability of the X Window interaction environment. There is also a wide range of application programs which offer user-modifiable operation (e.g., [54]).

Another approach being studied in the research community is the use of plan recognition. Recognizing the plan of a user permits the derivation of his or her goals and intentions. This has been employed to help provide pro-active user feedback when faulty plans are recognized [189].

Problems with plan recognition have been the management of uncertainty and the prohibitive size of the plan library required for serious applications. Various default reasoning techniques have been applied to the former (e.g., weighted abduction [8]), but the latter difficulty has hardly been addressed (but for an exception, see the PHI system [20]).
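
Returning briefly to the community models described earlier in this section for Doppelgänger, the idea of a category computed from its members can be sketched as follows. The source says only that community models are "weighted combinations of member models"; the normalised weighted average used here, the function name, and the two-dimensional example vectors are assumptions made for illustration and are not Doppelgänger's actual computation.

```python
from typing import List, Sequence


def community_model(members: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted average of member models, one value per dimension."""
    if not members or len(members) != len(weights):
        raise ValueError("need one weight per member model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dims = len(members[0])
    if any(len(m) != dims for m in members):
        raise ValueError("member models must share the same dimensions")
    return [
        sum(w * m[i] for m, w in zip(members, weights)) / total
        for i in range(dims)
    ]


# Two hypothetical user models as points in a two-dimensional space.
alice = [1.0, 0.0]
bob = [0.0, 1.0]
before = community_model([alice, bob], [3.0, 1.0])

# As a member model moves, the community model moves with it.
alice = [0.0, 0.0]
after = community_model([alice, bob], [3.0, 1.0])
```

Recomputing the combination whenever a member model is updated is one simple way to obtain categories that change as the user models are augmented.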
Plan recognition is usually performed [109] under the strong assumptions that 1) the recognizer agent has complete knowledge of the domain, and that 2) the agent whose plan is being inferred has a correct plan. These assumptions are clearly not universally true: users may find ways of doing things that the designers of the systems had not considered, and users all too often have faulty plans, usually based upon faulty or incomplete models of the systems they are using.

These assumptions are explicitly tackled by systems which try to model the faulty plans that users might have (i.e., Bauer [20] and Thies [189]), which in turn have to deal with a potentially infinite number of faulty plans. Plan-based approaches have grown out of basic artificial intelligence research, and applications are few, but some work is already being done to determine how useful plan recognition can become to a range of interaction environments [86].

An interesting twist on the user model is the Programmable User Model [204]. This tool embodies psychologically motivated constraints in a programming environment which interface designers use to build a model of a user. In principle, such tools will ensure that any task in which a user may engage will be computationally supported in cognitively sensible ways.

The remainder of this section examines in more detail some issues of agent modelling that are relevant to the current work.

2.2 Issues: Acquisition and Exploitation

The problem of agent modelling divides broadly into questions relating to the acquisition and to the exploitation of agent models. Agent models must be acquired. Even the use of user stereotypes requires determination of the user's membership class, although the issue can be circumvented by simply asking the user to decide for himself or herself.³

2.2.1 Acquisition

Acquisition is of two varieties: the system model must be acquired by the user, and the user model must be acquired by the system.
The emphasis in this thesis is on the latter process.

³ As users are typically not very good at deciding such things [169], other methods are desirable.

Kass and Finin consider acquisition along a dimension whose extrema are 'implicit' and 'explicit' forms.⁴ Explicit acquisition can be as simple as asking the user to fill out forms or enter descriptive keywords; this data can then be used, for instance, to determine the user stereotype. Systems employing explicit acquisition have been called 'adaptable' [76] (cf. the 'computer as tool' metaphor [108]).

⁴ Recent work makes this distinction in various forms. Laurel, for instance: "Increasingly, systems will need to employ either explicit conversations with people to determine task objectives or implicit user-modeling techniques to infer objectives from behavior..." [123, p. 107]

Implicit acquisition is generally more subtle, requiring that the model be inferred from observation of the user. An approach that has met with some success is that of monitoring the communication between user and application with the aim of inferring all or part of the user model. For instance, Csinger and Poole [60] [55] employ a normative theory of inter-agent communication based on a Gricean analysis [89] to derive the beliefs of interlocutors in a natural language setting. Their system is implemented in a logical framework for common-sense reasoning [158]. Zukerman [207] presents a planning mechanism which she uses in conjunction with a model of user beliefs and inferential capability to predict the possible (perlocutionary) effects on hearers of utterance components. Modelling these effects in this way permits a traditional anticipation-feedback loop for utterance design. These approaches have the common aim of modelling potential perlocutionary utterance effects by recourse to a Gricean model of dialogue.
Zukerman continues this line of investigation [163] with a system called RADAR that generates both responses and queries of two types: disambiguating queries and queries to elicit additional information. Decision-theoretic measures are used to determine the dialogue strategy. Wu [200] uses decision-theoretic techniques to decide when explicit interaction with the user is to be preferred over implicit hypothesizing, by maximizing the expected utility of the intervention. The work described in this thesis also moves in this direction; see Section 5.5.

Kass and Finin mention a number of other efforts taken along similar lines. The implicit approach to acquisition promises to extend into the multimedia environment, as soon as the nature of interaction with these new technologies can be captured in a set of normative rules.

Kass and Finin separate the issue of acquisition into the acquisition of goals, plans, and beliefs, suggesting that acquiring beliefs is the hardest of all.

2.2.2 Exploitation

The exploitation of models is highly task-dependent, but some broad distinctions can be drawn. Kass and Finin's framework identifies the 'method of use' dimension. They present what they imply is a continuum between 'descriptive' and 'prescriptive' models.

The difference between descriptive and prescriptive models may be nothing more than the style in which they are employed. Once acquired in some fashion, a model may be consulted for a variety of reasons; if an explanation is sought for the behavior of an agent, then that model may be called a descriptive model. The same (or another) model consulted with the intention of tailoring system presentations to an agent's knowledge, for instance, is being used in a prescriptive sense.

[Figure 2.1: User and system models]

A descriptive model of an agent may or may not be accurate; this accuracy can be tested by comparing predictions of the agent's behavior with actual, observed behavior.
A normative model is one which canonizes normal, expected behavior: it can be employed descriptively to explain the behavior of an agent, as well as prescriptively to anticipate the agent. When predictions on the basis of a normative model conflict with actual, observed behavior, the observed behavior of the agent can be interpreted as 'wrong' in some sense. The correctness of the normative model is not questioned. Some domains admit of such models, others do not.

One agent may refer to its descriptive model of another agent to decide its actions. A typical example of this usage is sometimes referred to as the anticipation feedback loop, as found in natural language dialog systems which refer to a model of their human interlocutor to ensure that the utterance under consideration will be acceptable to the user.

2.2.3 Correction: Situations requiring correction

The acquisition task is never complete. Changes in the system or in the domain of interest may occur from time to time, and will require the user to update her system model. It is the task of the system to present information to the user in a manner that facilitates awareness and comprehension of these changes. This is just the kind of pedagogy CAI researchers have been exploring [198].

Likewise, the user may forget information over time, or the user model may have been incorrectly acquired in the first place; either situation requires remedial action by the system. A number of scenarios present themselves:

• Beliefs erroneously attributed to users which they do not have (when these beliefs are, in fact, true, a situation arises called false consensus).
• Beliefs not attributed to users which they do in fact have (pluralistic ignorance, special case). This is a serious acquisition problem.
• Erroneous beliefs correctly attributed (user misconceptions identified).
• Ignorance: simply having no beliefs in respect of the proposition concerned.
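
The situations listed above can be told apart mechanically once three things are known about a proposition: what the user believes, what the system believes the user believes, and what the system itself believes. The sketch below assumes each of these is reduced to `True` (holds the proposition), `False` (holds its negation) or `None` (no belief); this encoding, the function name `classify`, and the "unclassified" fallback are assumptions introduced here for illustration.

```python
from typing import Optional

Belief = Optional[bool]  # True: holds p, False: holds not-p, None: no belief


def classify(user: Belief, attributed: Belief, system: Belief) -> str:
    """Label one proposition, given the user's belief, the belief the
    system attributes to the user, and the system's own belief."""
    if user is None:
        return "ignorance"
    if attributed == user:
        # The attribution is correct; the question is whether the
        # system itself agrees with the user.
        if system == user:
            return "normative"
        return "misconception found" if system is not None else "unclassified"
    if system == user:
        # System and user agree, but the system does not know it.
        return "pluralistic ignorance"
    if system is not None and system == attributed:
        # The system wrongly assumes the user shares its own belief.
        return "false consensus"
    return "unclassified"


labels = [
    classify(True, True, True),
    classify(True, False, False),
    classify(True, False, True),
    classify(True, True, False),
    classify(None, False, True),
]
```

Only the third situation in the list is discovered by the system on its own; the first two are errors in the model that some outside observation must expose before they can be corrected.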
Bel_user | Bel_system(Bel_user) | Bel_system | Situation
a | a | a | normative
b | ¬b | ¬b | false consensus
c | ¬c | c | pluralistic ignorance
d | d | ¬d | misconception found

Table 2.1: Errors in user models.

See Table 2.1 for a summary of these error situations.

These categories of modelling error grow more interesting with the number of agents being modelled. For instance, in the case of a CSCW environment where multiple participants are engaged in a negotiation task in which mutual consensus is the desired outcome, a facilitator agent might more easily identify cases of false consensus than any individual participant; the facilitator will certainly be able to act more easily upon this information than any of the individual agents. In negotiation tasks where it is the underlying goal of all participants to maximize joint outcome, some or all of the participants may not believe that some or all of the other participants share this goal. A facilitator or mediator agent might be able to act on recognition of this case of pluralistic ignorance to the benefit of the group and its common goal.

Kass and Finin advance as one of the dimensions of their analysis the notion of 'modifiability,' intended to distinguish models along a range between those which are static, and those which are dynamic. It is only dynamic models which will admit of correction, and the authors point out that 'user models that track the goals and plans of the user must be dynamic.'

2.2.4 Scope: What is Represented?

A model of a user divides naturally into two components: the normative or generic component, and the specific. The generic component models the abilities and limitations of normative humans. This includes such quantities as psychophysically derived limits to visual resolution (see Section 3.4), color preferences and even certain typical pathologies such as colorblindness.
Although this component may need to be modified for 'abnormal' users, it can in principle be acquired once and for all from psychophysical studies and putative cognitive models.

The specific component relates to the goals and beliefs (e.g.) of a specific, individual user. It is distinguished from the generic model in that it must be acquired and maintained for each individual. While the generic component will be useful to ensure that systems present information in cognitively sensible ways, it is the specific component which will induce adaptable, user-sensitive cooperative operation, and is the target of the present investigation.

Kass and Finin distinguish between models which are 'individual' and those which are 'generic,' recognizing that there is a continuum between these extremes. The stereotyping approach outlined above lies somewhere along this continuum, particularly since various hybridizations are possible, such as creating a hierarchy of stereotypes to better accommodate variation in agents without dramatic increases in storage requirements.⁵

2.2.5 Extent and Adaptivity

Kass and Finin also discuss the 'temporal extent' of the model, arguing that the useful lifetime of information varies. The maintenance of the model should be subject to conditions of controlled 'forgetfulness,' where information about the agent that is judged to have outlived its usefulness according to some criteria is deleted, or forgotten. This notion is not pursued in this thesis; the issue of how elaborate a temporal representation is necessary is orthogonal to the investigations of this dissertation.

The information in the models should be faithful to the composition of the user population, which may change. Predefined user categories such as conventional stereotypes may be inaccurate, misconceived, or out-of-date.
Various means of adapting to the user population have been considered, from as early as the Grundy system [171], to the approach described in this dissertation.

⁵ Such an approach is immediately suggestive of an object-oriented agent model. The object-oriented methodology would yield the dual benefit of clean and separable information structures for the model which allow inheritance mechanisms, as well as well-defined accessibility via methods. See Chin [45] and Wilensky [199] for suggestive leads in this direction.

2.3 Multiple Agents, Multiple Models

In general, there will be more than one agent involved in a collaborative process, and the system will likely need to model some or all of them. Even in existing interactive systems, there is a need for some sort of multi-agent modelling. Kass and Finin mention medical diagnosis systems, in which both the user and the patient are to be modelled. Even though the user and the patient may be one and the same individual (doctors get the flu too, after all!), it is to the distinct roles in the task domain that the operation of the system is sensitive. This issue will emerge at various points in this thesis. In general, though, a separate model may be required for each agent-role.

Certainly in the case of systems designed explicitly to support the cooperative work of more than one individual (i.e., systems designed to support collaborative work), and particularly for the class of such systems called decision-support tools, explicit models of multiple agents will be required. The existence of multiple users is considered in Figure 2.1. Acquisition relationships are suggested by the directed arcs. Not shown are models that users might have of other users. Moreover, it may turn out to be necessary to model agents' models of other agents, including the reasoning capabilities of these agents [13].

Kass and Finin distinguish between multiple agents on the one hand, and multiple models on the other.
They suggest that multiple sub-models might need to be maintained for each user, since their levels of expertise may vary between (sub-)domains. They appear to consider only the stereotyping approach in their discussion.

2.4 User Modelling Shells

Parallel to familiar developments in other areas, there is growing interest among the user modelling community in User Modelling Shells. Just as user interface management systems (UIMS) alleviate part of the systems implementation burden and enable cost-effective generation of non-trivial interfaces, a user modelling shell (UMS) provides services for implementing non-trivial modelling capabilities. Just as expert system shells made advanced knowledge-based techniques accessible to systems developers, UMSs promise to promote transfer of advanced user modelling techniques and technologies into application development environments, which in turn promises to fuel a new round of research and development in the field of user modelling.

Current UMS research focusses on developing "integrated representation, reasoning and revision tools that form an 'empty' user modeling mechanism. When filled with application-dependent user modeling knowledge, these shell systems would fulfill essential functions of a user modeling component in an application system." [115].

GUMS [75], BGP-MS [113] [114], and UM [111] are all efforts at implementing these functions, and all of them rely on the stereotyping approach described earlier.

GUMS [75] permits only single inheritance in the stereotype hierarchy, and users can belong only to a single stereotype. If new observations about a user invalidate his or her membership in the current stereotype, GUMS moves upward through the stereotype taxonomy to a more general user stereotype. Revision of the user model in GUMS therefore results in a loss of information.
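
The loss of information under this style of revision can be seen in a minimal sketch. It is not GUMS's actual implementation: the class, the `revise` function, the three-level taxonomy and its propositions are all hypothetical, and assumptions are reduced to truth values for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Stereotype:
    name: str
    defaults: Dict[str, bool] = field(default_factory=dict)
    parent: Optional["Stereotype"] = None  # single inheritance only

    def assumptions(self) -> Dict[str, bool]:
        """Local defaults plus everything inherited from ancestors."""
        inherited = self.parent.assumptions() if self.parent else {}
        return {**inherited, **self.defaults}


def revise(current: Stereotype, observed: Dict[str, bool]) -> Stereotype:
    """Climb towards the root until no assumption contradicts an observation."""
    node = current
    while node.parent is not None and any(
        observed.get(key, value) != value
        for key, value in node.assumptions().items()
    ):
        node = node.parent
    return node


# Hypothetical taxonomy, most general stereotype first.
any_user = Stereotype("any_user")
programmer = Stereotype("programmer", {"knows_recursion": True}, any_user)
unix_hacker = Stereotype(
    "unix_hacker",
    {"knows_pipes": True, "knows_shell_scripts": True},
    programmer,
)

# One contradicted assumption forces a move to the more general stereotype.
revised = revise(unix_hacker, {"knows_pipes": False})
```

A single contradicted assumption about pipes moves the user up to the programmer stereotype, and the uncontradicted assumption about shell scripts is discarded along with it.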
BGP-MS [113] [114] (Belief, Goal and Plan-Maintenance System) represents assumptions about the user in an extension of Prolog, and employs multiple inheritance in a partition hierarchy to extend models of individual users. The system provides various development and run-time services for developers of applications requiring user modelling. The developers of BGP-MS state their intention of adding an automated truth-maintenance system to provide incremental consistency of the user models, and they are investigating the use of modal logics for increased expressiveness of the user models.

User Modelling Tool (UMT) [30] is a general purpose user modelling shell whose approach to user modelling falls into the class of assumption-based user modelling because it uses an assumption-based truth maintenance system (ATMS) to maintain the consistency of the user models. Stereotypes and production rules are the techniques used in UMT to generate and activate the models per se. A LISP implementation for Symbolics environments is available [31].

Although these efforts are perhaps somewhat optimistic in view of the early state of user modelling technology, they point the way towards the first commercial implementations.

2.5 User Modelling: A Definition by Consensus

A frequently asked question at the 1992 User Modelling Workshop was whether User Modelling was somehow the same as Interface Design, or more generally, if there was anything that was not User Modelling [4]. This question naively presupposes some consensual definition or understanding of what user modelling actually is, and demonstrates how easy it can be to mislead even self-professed practitioners in the field with the term user-modelling.

By 1994, the user-modelling community had outlived its detractors and outgrown the workshop format.
Although practitioners were now willing to use the term without embarrassment, and as if they all agreed what it meant, there was still no "definition" by consensus at the 1994 User Modelling Conference, which enjoyed contributions from an even wider variety of disciplines.

A number of things in particular are not intended with the use of the term in this thesis. For instance, no cognitive adequacy of any sort is intended for the models described here; our approach to modelling agents does not purport to accurately represent the human users of systems to some arbitrary degree.

In its broadest interpretation, user modelling is nothing more than applying the user-centred view to the design and implementation of systems. Whether the system in question is an advanced user interface, a CAD system, a database retrieval system, or some other more or less advanced system, when the designers of these systems adopt the user-centred view described herein, they are engaged in user modelling.

In the rest of this document, the use of the terms user modelling or agent modelling is somewhat more specific, referring to the acquisition or exploitation of explicit, consultable models of either the human users of systems or the computational agents which constitute the systems.

Scenario: Intelligent Meeting Support

This is another motivating scenario, illustrating that models of individual users can be exploited at run-time to tailor the form and content of presentations.

Intelligent Meeting Support

The geographically dispersed executives of InterSpect Systems Consulting Corp. are participating in a meeting via the services of WOW, the intelligent meeting support system that has gained a great following in the early years of the twenty-first century. Participants benefit from a variety of media services including video, graphics, and text, all transmitted over high-bandwidth communication links.
Maria, the ebullient manager of central operations, is holding forth on the need to maintain quality by hiring only the best candidates for positions now vacant.

Ralph, the chief accountant of InterSpect, is silently preparing a financial report indicating that it might be difficult to pay the best candidates what they are worth, given the combined effect of the current state of the national economy and the company's debt load.

Fred, a member of the board of directors, hates numbers. Accounting bores him, and pie-charts infuriate him, but he is fascinated by a new tool for 3D visualization of multivariate data.

As Maria finally winds down at the urging of the WOW facilitation manager, which has determined that she really has had quite enough bandwidth to make her point, Ralph's report is made available to the other participants. Maria, who has a great grasp of numerical data, and who actually savors their visual impact on a page, sees in a corner of one of her displays the rows and columns of an old-fashioned spreadsheet representing Ralph's fiscal objection to Maria's policy. Fred sees a multi-dimensional scatterplot constructed by WOW to make maximal use of the capabilities of his visual system, and other participants experience other varied events in these and other modalities.

When Ralph draws the attention of other participants to Payroll Item Number 2070, perhaps only by uttering the expression in English, not only is his utterance simultaneously translated into the (natural) languages of the other participants and optionally presented on their audio channels, but these other events also occur: 1) a set of relevant numbers is highlighted on Maria's display, 2) when he is attending to it, the dimensions of Fred's scatterplot are interchanged to perceptually emphasize the data referred to by Ralph, and 3) similar events are experienced by the other participants.
Chapter 3

Authoring

Every style is but one valid way of looking at the world, one view of the holy mountain, which offers a different image from every place but can be seen as the same everywhere.
-- Rudolph Arnheim, Art and Visual Perception: A Psychology of the Creative Eye, final paragraph

Style takes its final shape more from attitudes of mind than from principles of composition, for, as one elderly practitioner once remarked, "writing is an act of faith, not a trick of grammar." ...style is the writer, and therefore what a man is, rather than what he knows, will at last determine his style.
-- The Elements of Style, page 84.

This chapter takes a broad view of Authoring research and samples the range of work at a number of relevant focal points. To provide some order for the descriptions to follow, and to situate the work leading to this dissertation, the following dimensions of analysis are used.

3.1 Dimensions of Authoring

The work is grouped by the tasks it is designed to support. The work of Mackinlay, Casner, and Roth, as well as some of Feiner's, is addressed under the heading of Automatic Graph Generation. Pickett and Grinstein are representative of work in Exploratory Data Analysis, while Ware and Beatty, Cowan, and Cleveland are presented along with Psychophysical Research performed by psychologists. Feiner and McKeown are best considered in the context of Multi-media. Bly and others are discussed under the label of Computer Supported Cooperative Work, and the themes of Hypertext and Hypermedia appear in a number of guises under these headings, as well as in their own section. Structured documents, cyberspace, and finally intent-based documents (the core of this dissertation) are explored in separate sections.

A number of systems are presented within the framework described above and their interrelationships are shown within the space defined by the dimensions listed below.

Deliverable: The deliverable is that which is being authored.
There are sometimes [role-dependent] distinctions between artifact and deliverable: the deliverable is not always a physical artifact, and different agents (reflecting different roles) may emerge from the activity with different deliverables. From the author's point of view, the deliverable is the book, while from the publisher's point of view it may be the month-end sales figures.

Task/Data Domain: The domain refers to the field to which the task being expedited is relevant. This categorization may also be role dependent: the author of the programming manual may regard his contribution as being to the field of structured programming, while the editor may feel that she is contributing to the field of document design and production.

Temporal Scope: Is the emphasis on presentation and object-level real-time support? On representation and meta-level maintainability? On short-term versus long-term storage? On version control? How, in other words, does the system conceptualize and deal with time?

Target Media: The target media are those supported by the system or work under investigation. Varieties of text, graphic, audio, video and any combination thereof are the target media of different systems, each of which may take different views of the relationships between medium and mode.

Mode: Flesh and Blood Interface: What attention is being paid to human perceptual capabilities and limitations [57]? To visual, aural and gestural primitives? Different aspects of the visual, aural and kinesthetic modalities define the space of human-computer interaction today. Are there other modes (senses) to be considered?

Style: Is there an over-arching notion of style in the system? Is it adjustable? In what ways, by which roles, and to what degree? What degree of inter-presentation coordination is allowed? (See Beach [21] and Cargill [37].)

Models: Are there explicit or implicit models of users, systems, agents, roles?
How are they employed and combined to achieve the functionality of the system?

Role relationships: Who is/are the Author(s), Reader(s)? Are they co-temporaneous? Are they co-present? Other task and domain specific roles include: editor, facilitator, chairperson, manager. The broad view allows for temporal shifting and stretching of the authoring process. The locus of creation of presentation can be shifted along a continuum from conventional author, to situations where the presentation is decided by a combination of conventional author and conventional reader, to situations where the presentation is designed with no input from any single agent resembling a conventional author (e.g., a computer-generated event log of system activity).

There are also questions about the number of actual authors and actual readers, and whether activities by these agents are simultaneous and distributed. Annotating is a special case of non-simultaneous authoring by one or more potentially distributed authors.

Heterogeneity: This issue ramifies into a number of questions at varying levels of analysis.

• Internal (Human) Interface: Is the interface to the system role-dependent? E.g., do readers and writers have the same interface, or is there a 'modal disparity' between them?
• External Interface (Interoperability): Is the system designed to cope with varied sources of information with differing protocols?
• Organizational Interface: Can different organizations using the same system make use of each other's information spaces?
• Seamlessness: Can users of new systems integrate their existing tools and work-spaces? In particular, do new CSCW tools support the use of existing tools for individual work? (Can seamless transitions be made between individual and collaborative work? [125].) Some transitional elements are bound to remain central for some time: users may want to continue using pencil and paper, and systems should not force them to change their ways.
Footnote 1: There remain many advantages to hardcopy that will be difficult to displace, so for the foreseeable future, usable, practical electronic systems will need to interface with the paper and back-of-envelope worlds.

Information and Presentation Spaces: The presentation space is the arena in which the presentation takes place, composed of various media and directed at various modes; the information space refers to that which is presented. These definitions provide useful categorizations of authoring systems.

• Static, or changing, dynamic information space?
• Static, or changing, presentation space?
• Static, or changing, presentation?
• Shared information or presentation space? Particularly confounding problems arise when information is shared between synchronous collaborators. One of the important motivations for modelling users (see Section 2) is the desire to tailor presentations to individual needs, understanding and goals. This desire conflicts with synchronous collaboration when one participant wishes to direct the attention of another participant by direct reference to an element on his private display. This reference may be very difficult to resolve when the referent either does not appear at all on the private display of the second participant, or appears in some different form, or at a different location. (The example on page 33 presented a scenario intended to typify this kind of problem in the domain of intelligent meeting support.) The presentation of identical views (footnote 2) alleviates these difficulties by ignoring the problem of tailoring presentation.

The survey begins by situating a few traditional authoring systems within the space defined by these dimensions.

The familiar BOOK is itself a deliverable, across a wide variety of domains, not the least of which is recreation or entertainment.
Its scope varies from the short-term throw-away paperback to the archival of information in weighty, hard-bound tomes; the information in a book is not typically presented in real-time, except at poetry readings and for the delivery of bed-time stories to children. Readers can interrupt their progress at will, and the subject matter can refer to remote points in time. The target medium is clearly paper of some sort or another, upon which the information is available to humans via their visual sense, unless the book is in Braille, in which case it addresses the tactile sense; books also provide a pleasant kinesthetic feel. The pages of a book typically adhere to a discernible style, which lends familiarity to the on-going reading process. The only kind of modelling is of the implicit variety, undertaken at the time of writing of the book, and which reflects the author's desire to appeal in some fashion to the reader. Books typically admit only the roles of author and reader, though other roles are hidden in the production process: the touches of editors, publishers and translators sometimes remain visible in the final copy. There is usually (though not always) a single author, and books are generally printed with the hope of attracting more than a single reader! Books fit seamlessly into the lives of most humans, and what problems exist are accepted out of long habit: books can be carried in hand-bags and attache cases and read on crowded buses during rush hour, as well as in bed with a flashlight. Both the information (content) and the presentation (form) spaces of a book are static.

Footnote 2: Known as WYSIWIS [125], or "what you see is what I see" presentation.

LETTERS are like books in most respects, differing notably on the dimension of role; there is typically a single reader to a letter, or a small number of secondary readers who may appear on a carbon-copy list in the document itself.
A CORRESPONDENCE is a sequence of letters between individuals, and therefore admits of a temporal aspect.

A MOVIE is a 'book' whose target medium is film or video, and which is accessible to human viewers via the visual and aural modes or channels (notwithstanding the efforts of Odorama and Sensurround to expand the viewing experience to other modes!). The scope is usually real-time with respect to presentation, although a video tape can be stopped and rewound at the viewer's discretion.

A traditional theatrical PLAY exhibits aspects of the movie, with the added complication that the audience can affect the performances of the actors in various subtle ways; the presentation space is dynamic. The notion of INTERACTIVE THEATRE (e.g., theatre-sports) amplifies these audience-feedback effects [123]. It is not as easy to rewind a live performance as it is a video tape.

A VIDEO GAME of the arcade variety uses a combination of graphics, video, sound, and sometimes motion feedback to provide a multimedia simulation of an alternate reality. Some games permit multiple players to share the alternate reality (shared information space), either on the same video screen (shared presentation space) or on separate terminals. The video game is even further along the interactivity continuum, permitting its users to modify not only how information is presented, but its content as well; the information space is dynamic.

Some home video games maintain simple models of their relatively small sets of players in order to recall preferences, and perhaps to restore the state of a suspended game.

These simple, familiar examples illustrate the use of the dimensions of authoring; the remainder of this chapter surveys the esoterica of authoring, using the same dimensions.

3.2 Graph Generation

So I sat down and wrote a program that'll take those numbers and do what you like with them.
If you just want a bar graph it'll do them as a bar graph, if you want them as a pie chart or scatter graph it'll do them as a pie chart or scatter graph. If you want dancing girls jumping out of the pie chart in order to distract attention from the figures the pie chart actually represents, then the program will do that as well.
-- Douglas Adams, Dirk Gently's Holistic Detective Agency

The automatic presentation of information has occupied its share of the AI as well as graphics (footnote 3) literature. Early work, in particular, recognized the strong relationships between knowledge representation and (graphical) presentation which characterize the current work. For instance, Zdybel et al. [205] define an Information Presentation System (IPS) as a system that:

1. Automatically generates displays according to content-oriented specifications
2. Provides a systematic basis for interpretation of user graphic input
3. Functions reasonably well without demanding custom-tooling to a particular application
4. Is easily extensible to satisfy domain and user-specific display requirements.

They further state (1) that their "view of an IPS is that it is itself a knowledge-based system," (2) that a "high degree of sensitivity to the human end-user can be built in," and (3) that "an IPS is a place to embody a consistent set of decisions about the human factors of graphic display." The first point is basic to this thesis: presentations can in some sense be expressed in, and even derived with, some form of logical calculus. The second point takes aim at the issue, also central to this thesis, of user modelling, and the third refers to psychophysical issues addressed in this chapter under the heading 'flesh-and-blood,' and embodied in the implementation described in this dissertation as perceptual salience.

In their description of the View System, Friedell et al. [79] discuss how "the graphical presentation of data is tailored to the user's identity, task, and database query."
Pre-defined presentation plans are chosen by a best-first search mechanism. They recognize the potential of encoding knowledge about graphical presentation in the form of what they call 'synthesis operators' [80], and also discuss "reasoning about how to select and combine ... primitive elements of object descriptions."

Footnote 3: In this dissertation, "graphics" means virtually any technique used to produce visual representations of information; this will naturally include business graphics like bar graphs of corporate data, 2D and 3D drawings generated with and without the aid of computers, and visualization techniques for multivariate data.

Even the possibility of reasoning with multiple media appeared in the literature of the early 1980's. Neiman [143] writes: "... the use of the knowledge structures to generate multi-modal output demonstrates the generality of the knowledge representation techniques employed." His system was used to generate explanatory, data-driven animations for users of a CAD system, coordinated with natural language explanations.

The fields of graphics and AI begin, unfortunately, to diverge at about this time (footnote 4), and the potentially very fruitful area of automatic presentation defined by the intersection of these fields has suffered for it. Nonetheless, a few researchers have continued in this tradition, and their work is reviewed in the remainder of this section.

The work of Mackinlay [129] [130] is the first to explicitly address expressiveness and effectiveness criteria for visual presentations (footnote 5). He concerns himself with relational data (this is the domain) and restricts himself to graphical languages commonly associated with business graphics (the deliverable). The presentations are designed for traditional screen or plotter devices (target media). Although there is no explicit modelling of users, the system embodies explicit knowledge of various media.
There is considerable effort to render presentations in accord with human perceptual abilities and limitations: the system can thus be seen as modelling certain normative human characteristics, but this model is static and acquired from psychophysical studies in the literature (flesh-and-blood). There is no over-arching notion of style by which one presentation might be coordinated with another, except in special cases when a presentation is broken into, say, two line graphs with a common axis. The prototype developed by Mackinlay is called APT, and has the following typical role-relationships: Single-Author and Multiple-Readers who are not necessarily mutually co-temporal or mutually co-present with the Author. The presentation is static, prepared for a particular data/medium combination; there is therefore no particular support for heterogeneous environments, although the same data may be presented differently in environments which differ in the available media. See Table 3.1 for a summary.

Footnote 4: This claim is based upon the non-incidence of graphics and presentation papers in important AI conference proceedings after 1983. This regrettable schism appears to be under repair by the mid-nineties.

Footnote 5: For basic background on graph representation see Bertin [24] [25] [26] and Cleveland [48] [46] [47], or [57] for a survey.

Table 3.1: Mackinlay and related work.

Deliverable          Business graphics
Task/Data Domain     Relational data
Scope                Not applicable
Target Media         Traditional screen and plotter devices
Flesh and Blood      Implicit psychophysical limitations
Style                Not really
Models               Explicit: media limitations; Implicit: Human perception
Role Relationships   Single author, multiple reader
Heterogeneity        Not applicable
Spaces               Static information and presentation

Mackinlay's work is interesting both in its own right, and because it has been the starting point for a variety of efforts by other researchers.
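The dimension profiles used throughout this chapter lend themselves to a simple record structure. The following is a minimal sketch, not part of any system surveyed here: the class name AuthoringProfile, the field names, and the helper differing_dimensions are all hypothetical, chosen only to mirror the dimensions of Section 3.1. The first profile copies the entries of Table 3.1; the second copies the entries given for data analysis and visualization work in Table 3.2 later in the chapter.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AuthoringProfile:
    """One surveyed system, described along the dimensions of Section 3.1.

    Field names are illustrative; the values are free text, as in the tables.
    """
    deliverable: str
    task_data_domain: str
    scope: str
    target_media: str
    flesh_and_blood: str
    style: str
    models: str
    role_relationships: str
    heterogeneity: str
    spaces: str


# Entries copied from Table 3.1 (Mackinlay and related work).
MACKINLAY = AuthoringProfile(
    deliverable="Business graphics",
    task_data_domain="Relational data",
    scope="Not applicable",
    target_media="Traditional screen and plotter devices",
    flesh_and_blood="Implicit psychophysical limitations",
    style="Not really",
    models="Explicit: media limitations; Implicit: Human perception",
    role_relationships="Single author, multiple reader",
    heterogeneity="Not applicable",
    spaces="Static information and presentation",
)

# Entries copied from Table 3.2 (Data Analysis and Visualization).
DATA_VISUALIZATION = AuthoringProfile(
    deliverable="Graphic",
    task_data_domain="Relational data",
    scope="Not applicable",
    target_media="Conventional CRT, also audio",
    flesh_and_blood="Its raison-d'etre",
    style="Not applicable",
    models="Not applicable",
    role_relationships="Not applicable",
    heterogeneity="Not applicable",
    spaces="Static",
)


def differing_dimensions(a: AuthoringProfile, b: AuthoringProfile) -> list:
    """Return the names of the dimensions on which two profiles differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


if __name__ == "__main__":
    print(differing_dimensions(MACKINLAY, DATA_VISUALIZATION))
```

Comparing profiles field by field is only a textual comparison of the table entries; it says nothing about how similar two systems are in any deeper sense.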
Roth [175] attempts to characterize semantic dependencies in the data to be presented. He describes the static categories of data types, relational-structure, arity, relations among relations, and recognizes the need to represent the dynamic needs of users. Because he distinguishes time as a separate data type, he is able to make special arrangements for the display of temporal information.

Casner [38] undertakes to include task descriptions in the process of automating graphical presentations. His notion of expressiveness is at a much higher level than Mackinlay's, and derives from a logical formulation of the task in which a user is involved. Casner's system embodies generic models of human perception which are consulted to produce presentations in support of particular tasks, and which will minimize the perceptual and cognitive demands placed on users engaged in these tasks. The effectiveness of the corresponding presentations is measured by reaction-time regression studies of users engaged in five tasks in the airline reservation domain.

Marks [132] has addressed the issue of avoiding unwanted conversational implicatures in a coordinated text-graphics environment. One of the novelties he claims is the inclusion of relations that describe the perceptual organization of symbols. Canonical organizational principles he mentions explicitly are sequential layout, proximity grouping, alignment, symmetry, similarity and ordering. Rather than fabricate some convincing post hoc argument for the necessity of paying attention to perceptual issues, he properly and soberly points out that "... it is virtually impossible to design meaningful network diagrams for which no perceptual organization will occur."

Henry and Hudson [97] have taken what might be called a semi-automatic approach to the generation of graphs.
Their paper describes a system which supports a user in the exploration of large graphs, primarily by allowing iterative, interactive refinement of layout algorithms. Although the authors do not explicitly mention modelling of the user, this paper is interesting in the present context because the observations about layout may be relevant to navigation through abstract information spaces as well.

It may not always be the case that (hypertext) systems need to display to their users a graph of all or part of the underlying database, but it will always be the case in appropriately large systems that only some subset of the information can be presented (footnote 6). This puts the emphasis upon user-centered means of deciding what should be presented, and how it should be presented. Approaches like the one described in this thesis might be used to select a critical subset of the links in a hypermedia document, which might then be explored by a user. This critical subset is continuously subject to re-evaluation, in the course of new input from the user, and from other sources of information as may be available to the system.

Footnote 6: The idea of an "intelligent zoom" [19] changes this characterization in that while the entire ...

Noik [150] pursues another approach to automatic graph layout, taking the notion of the fisheye view to multiple focal points in hierarchically nested structures. This leads to the idea of using some technique to determine the focal points from a user model.

Although most reviewers would likely place Feiner's work in the context of visualization or multi-media (as has been done here with some of his work), some of his efforts can be seen to contribute to the field of automatic generation [71].

3.3 Data Analysis and Visualization

Or you can turn your figures into, for instance, a flock of seagulls, and the formation they fly in and the way in which the wings of each gull beat will be determined by the performance of each division of your company.
Great for producing animated corporate logos that actually mean something. But the silliest feature of all was that if you wanted your company accounts represented as a piece of music, it could do that as well. Well, I thought it was silly. The corporate world went bananas over it.
-- Douglas Adams, Dirk Gently's Holistic Detective Agency

Some of the visualization work being undertaken today is of interest here for several reasons. Researchers in this field have recognized the importance of the human in the HCI loop, and cooperative systems are being designed to take full advantage of human perceptual abilities. To automatically achieve the communicative goals of the author in his absence, systems must be able to present information appropriately, taking advantage of human perceptual abilities and avoiding their limitations.

Footnote 6 (continued): ... space is rendered, only parts of it are rendered legibly; such approaches have the advantage of providing some context for the viewer.

The best known approach to data visualization is the scatterplot [90] [197]. The success of this technique is due to the ability of the early vision system to group points in space based upon proximity and similarity in color, size and shape. Ware and Beatty have shown that up to five dimensions can be effectively mapped to a full color scatterplot display, and suggest ways in which the visual effect can be maximized. Were it not for the need to detect patterns in data of arbitrarily high dimension, efforts might have stopped here.

An increasingly popular approach to enlarging the dimensionality of displays is what has been variously referred to as iconography and geometric coding. This approach employs a generalization of the traditional graphic primitive, the pixel, into a parameterized icon whose features are mapped to distinct dimensions of the data stream.
A famous example, which proved more useful as a characterization of the method than representative of its success, is the Chernoff Face icon family [43].

Figure 3.1: Exvis five-dimensional stick figures, panels (a), (b) and (c).

Figure 3.2: Iconograph

The generalized icon (gicon) is a generalization of the pixel to higher dimensions. The strategy has been to allow the information in different channels of the input data to control corresponding pixels in each gicon. In general, the gicon is an n x m array of pixels, each mapped to a different input channel. The available display surface is then tiled with these icons.

The logic of this and related approaches is that the number of information channels which can be displayed is increased: "Geometric coding allows for further and far reaching extensions [over color] of dimensionality. Observers can utilize shape perceptions to sense the combinations of data at each location and texture perception to sense how those combinations are spatially distributed." This was the rationale behind the Chernoff Face icon family, as well as the stick figure family described by Pickett and Grinstein [157]. The latter is a stick figure consisting of several connected line segments, where the angle of inclination of each limb is controlled by a different dimension of the numerical data to be visualized. Figure 3.1 presents a representative member of the aforementioned stick figure family, and Figure 3.2 shows how large numbers of them interact to produce global perceptual effects from the underlying data set (footnote 7).

Footnote 7: This figure depicts satellite imagery data from the western tip of Lake Ontario; the data was ...

Table 3.2: Data Analysis and Visualization.

Deliverable          Graphic
Task/Data Domain     Relational data
Scope                Not applicable
Target Media         Conventional CRT, also audio
Flesh and Blood      Its raison-d'etre
Style                Not applicable
Models               Not applicable
Role Relationships   Not applicable
Heterogeneity        Not applicable
Spaces               Static
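The gicon tiling described in this section (an n x m block of pixels per data point, one pixel per input channel, with the blocks tiled across the display) can be sketched in a few lines. This is an illustration under stated assumptions rather than a reconstruction of any cited system: the function name tile_gicons, the nested-list data layout, and the row-major assignment of channels to pixels within each gicon are all choices made here for the example.

```python
def tile_gicons(data, n, m):
    """Tile a display with one n x m gicon per data point.

    data[r][c] is a sequence of n*m channel values for the data point at grid
    cell (r, c). Channel k drives the pixel at offset divmod(k, m) inside that
    cell's gicon (row-major order, an assumption of this sketch).
    Returns a 2D list of pixel values with len(data)*n rows and
    len(data[0])*m columns.
    """
    rows = len(data)
    cols = len(data[0]) if rows else 0
    image = [[0] * (cols * m) for _ in range(rows * n)]
    for r in range(rows):
        for c in range(cols):
            channels = data[r][c]
            if len(channels) != n * m:
                raise ValueError("each data point needs exactly n*m channel values")
            for k, value in enumerate(channels):
                dr, dc = divmod(k, m)  # position of channel k inside the gicon
                image[r * n + dr][c * m + dc] = value
    return image


if __name__ == "__main__":
    # Two data points side by side, four channels each, shown as 2 x 2 gicons.
    points = [[[1, 2, 3, 4], [5, 6, 7, 8]]]
    for row in tile_gicons(points, 2, 2):
        print(row)
```

The stick-figure family differs only in how each icon is drawn: a channel sets a limb angle rather than a pixel value, but the icons are tiled over the display in the same way.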
The search for effective icons is also led by studies of pre-attention (see Section 3.4): "Shifts along certain dimensions of color, shape and motion of elements lead to preattentive discrimination, and it is variation in these dimensions that we must seek to bring under data control in our texture displays."

Some early implementations have been described in the literature. The Exploratory Visualization (Exvis) project [185], for instance:

"is a multi-disciplinary effort to develop new paradigms for the exploration of data with very high dimensionality. The fundamental philosophy behind Exvis is that data representation tools should be driven by the perceptual powers of the human. In addition, the interpretation of data of very high dimensionality will be maximized only when we learn how to capitalize simultaneously on multiple domains of human perceptual capabilities."

This project is in the early stages of exploring the possibilities of iconographic data representation using sound attributes, along with the integration of auditory and visual displays into a single unified data exploration facility. (See also Grinstein et al. [90] and Pickett [156] and the summary in Table 3.2.)

Footnote 7 (continued): ... collected by a U.S. Air Force Geophysics Laboratory weather satellite [90].

3.4 Psychophysical Research

Basic work in psychology has resulted in improving models of human perception. Much of this work (e.g., [144] [191] [190] [193] [192]) has been concerned with elaborating a putative dichotomy between processes which are pre-attentive and those which require attention. Pre-attentive processes are characterized by their speed: they are fast, typically accomplished within 100ms, suggesting that they are performed in parallel by the human perceptual system. Such processes are sometimes referred to as automatic, parallel, or early-vision processes.
Although such a dichotomy is conceptually attractive, it has been increasingly unable to account for the data, and new models are appearing which refer to a continuous ranking of perceptual difficulty. Pre-attentive tasks are at the extreme 'easy' end of this continuum, while tasks requiring attention are at the other, 'hard' end of the scale.

Other researchers (e.g., [137] [66]), while also interested in developing a basic perceptual language, are not so concerned with the underlying psychological model.

Color Considerations

Color deserves a separate section in this document for several reasons. Our world is, for most of us, a very colorful place, and the value of color should not be ignored in the computational models we build of people and things. Color is, not surprisingly, one of the most effective psychophysical stimulus dimensions. Even in the absence of a complete neurophysiological underpinning, a tremendous amount of informal as well as empirical information is available on the use of color to accomplish various communicative tasks. Not only is the use of color in visualization powerful, but the kind of knowledge we have of color capabilities provokes questions about other human perceptual capacities.

Ware and Beatty show that it is possible for human observers to perceive five data dimensions simultaneously using color [197]. The data they used was characterized by a hyperellipsoidal probability density distribution, but they conclude with respect to the generality of their results that: "colour is likely to be effective in assisting in the perception of correlations in multidimensional space." Although in most cases they found that adding color was expressively equivalent to adding three more spatial dimensions, color is not a completely homogeneous perceptual space, and "resolution is worse in some directions than in others." In particular, when clusters are separated along dimensions which have been mapped to color, perception suffers.
Clusters are perceived as distinct when they are separated by between three and five standard deviations along most of the possible vectors; "much greater cluster separation is necessary before two clusters can be resolved" when they are separated on [only] "a few" specific color vectors.

They observe that users require no training to use their color-based five-dimensional visualization tool, but point out the importance of control over the background color, which tends to emphasize particular colors in the display, and consequently particular correlations in the data (this is a perceptual implicature, which systems can attempt to mitigate, or to anticipate and exploit to emphasize data of interest).

Murch also gives an interesting summary of the use of color, from the point of view of a graphics practitioner [140]. He distinguishes between the qualitative and quantitative uses of color, and provides a stimulating list of guidelines for the effective use of color derived from physiological, perceptual and cognitive studies. Murch also provides a list of the sixteen best and worst color combinations. This is the kind of knowledge that will need to be consulted by mature automatic presentation systems.

Benbasat undertakes a series of empirical investigations on the impact of color on presentation [23], and the impact of presentation on a variety of managerial decision-making tasks [22] [126]. These investigations are representative of a thread of the literature entirely separate from what has been thus far cited in this paper. The results of Benbasat's work are consistent with those of Mackinlay.

There is a wide range of sometimes contradictory information available on the use of color to represent information in visual displays.
Making use of this information is difficult, however, and unless there is a pressing need to convey a large number of categorical variables, a designer is best off with gradations of a single hue to represent changes in the value of a quantitative variable.

Graphical Semantics

Montalvo [136] [137] and Grosz [132] are both interested in the meanings of graphical displays. Grosz has also done much work in computational linguistics, in a purely textual environment; some ideas have exported well to other media. A more pragmatic approach is detailed by Kurlander and Feiner [120]. These and other similar guidelines are suggestive of a beginning for a database of default axioms for reasoning about presentations of information. Results of these studies are of direct consequence for designers of human-computer interfaces.

3.5 Multimedia

Words are too solid they don't move fast enough to catch the blur in the brain that flies by and is gone...
-- Suzanne Vega, Language

One day while walking along the road, the monk Gisho met his master, who was blindfolded. Gisho asked, "What have you seen?" and his master replied, "The wind whistling in my ears."
-- Zeutuban III

By definition and by convention, a system which makes use of more than one medium (footnote 8) is a multimedia system (footnote 9).

In its most liberal interpretation, the rubric "multimedia" would cover any system which employs the sound of a bell to direct user attention. Typically, though, the term refers to systems with more or less esoteric applications (by today's, or yesterday's, standards) of graphics or video.

Multimedia systems promise great improvements to interfaces, and afford scope for new types of interfaces. Fox [78] writes of the possibilities in computer aided instruction:

"If computer systems can develop accurate models of user knowledge, including goals and plans as well as facts, multi-media information might then dramatically improve the bandwidth and effectiveness of instruction."
Footnote 8: Some confusion exists in the literature(s) as to whether it is medium or mode which is being multiplied, and the terms multimodal and multimedia are often conflated. This thesis is not the place to embark upon yet another religious tirade designed to settle the issue; the terms will be used loosely where such usage is harmless.

Footnote 9: mul-ti-me-dia 'me-d-e--e (1962): using, involving, or encompassing several media <a multimedia approach to learning> (Webster's 7th Dictionary, on-line copy).

Similar statements can be made for other areas of investigation, and in particular for the general goal of user-tailored run-time presentation. The rest of this section reviews research in multimedia presentation.

Feiner
Much of Feiner's work has been directed at identifying and resolving multimedia presentation issues [72] [69] [70], and he has given consideration to the use of models (albeit static ones) of tasks, objects and their interactions, and of models of user knowledge. These models have been brought to bear on the presentation task to determine which objects are to be included in a rendering of a scene, the level of detail with which they should be rendered, and what if any special visual effects are to be employed. For instance, if the task in which a viewer is engaged involves the manipulation of a complex piece of machinery, its functional parts may all be rendered as three dimensional solids; if the task is to disassemble the machine, an exploded view or the use of transparency may be called for to show the inter-relationships of the working parts. Feiner's work has been primarily research-oriented and has not resulted in any commercial products.

Other work by Feiner et al. also addresses automated generation of graphical presentations.
His work with Peter Karp [105] addresses the automated generation of animated presentations with the use of models of knowledge about filmmaking, and points out the advantages; not only does design-time authoring support for animated presentations decrease their cost, he points out that "automation could ultimately make it possible to generate presentations on the fly that are customized for a particular viewer and situation, adaptively presenting information whose content cannot be fully anticipated." Feiner does not pursue this goal, but we do in the current work. Explicit models of users do not play a part in the prototype associated with Feiner and Karp's work, but the notion of intent-based presentation is beginning here to take hold.

With Seligman [183], Feiner pays particular attention to the communicative intent behind the presentation; this is where the term 'intent-based' first appears. The Intent-Based Illustration System (IBIS) designs illustrations with a generate-and-test approach using a rule-based system of methods and evaluators. The former are rules that specify how to accomplish visual effects, while the latter are rules that measure how well a visual effect is accomplished in an illustration. The evaluators can be thought of as psychophysically motivated constraints on the presentations (footnote 10). Still more work explores the coordination of different media (text and graphics in particular) in single presentations [73].

WIP
The WIP project at DFKI under the direction of Wolfgang Wahlster is similar to Feiner's IBIS system in that it too uses underlying static models of objects and tasks to determine the layout of a presentation. The WIP researchers say of their approach that it "should be understood as a starting point which is of practical use for the automatic synthesis of various kinds of pictures... in a multimodal system."
[173] These statements are made in the light of the realization that there is not yet a mature theory of graphical communication upon which to build such systems.

WIP's most notable incremental contribution to automatic multimedia generation is the implemented co-dependence of its content-design and format-presentation engines: the execution of these units is temporally interleaved, permitting feedback from later in the presentation pipeline to affect and even retract earlier design decisions. For instance, if the system has decided to show a picture of an espresso machine as part of the instructions for its use, and has begun laying out textual labels of all its functional parts only to find out that one of the labels is too long to fit within the boundaries of the part, it may decide to lay out the other labels in a different way or to draw the entire espresso machine from a different perspective that would support the original labelling method.

Footnote 10: See "perceptual salience" in Section 5.4.1.

The WIP architecture also lends itself to parallelization [195].

ALFresco
Stock [187] [186] describes an on-going effort to integrate natural language and hypermedia interfaces. The domain is the exploration of 14th century Italian frescoes, and the system gives users the means to retrieve multimedia information via natural language queries. Stock argues that this approach takes advantage of multimedia to increase the bandwidth of the communication channel between human and system, and reduces the lost-in-hyperspace problem.

FRESS, EDS, Intermedia and InterNote
Work at Brown University has resulted in several generations of systems, each building upon the strengths of its precursors. Early experience with the Hypertext Editing System in the late 1960's and then FRESS [202] (footnote 11) and EDS [202] led to the more modern Intermedia [202] and InterNote [39] systems.
FRESS (File Retrieval and Editing System) is a multi-user hypertext system developed in the late 1960's, and EDS (Electronic Document System) is a hypermedia system developed in 1982 for VAX and Ramtek 9400 color display environments. One of the major differences between FRESS and EDS is the addition of maps to help users avoid getting lost. Both systems offered support for bi-directional linking and keyworded links and nodes.

Intermedia is an electronic document system which is not a separate application, but a framework for a collection of tools that allow authors to make links between standard types of documents created with heterogeneous applications. "The material an application creates is the document." Intermedia was designed with multi-user interactive educational applications in mind, and thus emphasizes interactive display and annotation facilities. Professors, for instance, "would be authorized to add or change 'canonical' hypertext structure, whereas the students would be authorized only to add links and annotations." [147] Links in Intermedia are uni-directional connections between two arbitrary and application-specific objects [92]. See Table 3.3 for a summary.

Footnote 11: Commercially reimplemented in the early 1970's by Phillips Corporation [51, p447].

Deliverable          Generalized electronic document
Task/Data Domain     General desktop support
Scope                Version control and maintenance
Target Media         Text, graphics, animation
Flesh and Blood      No explicit perceptual effort
Style                Not applicable
Models               Implicit
Role Relationships   Blurred
Heterogeneity        Between applications in the desktop environment
Spaces               Dynamic information space

Table 3.3: Intermedia.

Finally, InterNote is an extension of Intermedia designed to better support small collaborative groups involved particularly with document review and revision.
A much-cited aspect of the extension involves the ability to transfer data across links using a technique the authors call warm linking (footnote 12).

NoteCards [93] is implemented within the Xerox LISP environment, and was designed to support authors, researchers, designers, and "other intellectual laborers" in their daily toil.

Footnote 12: Other advocates of non-standard linking practices are Schnase et al. [180], who urge that arbitrary computational methods be integrated into any link structure. This enlarges even further upon even unconventional definitions of authoring; observe that the now popular http protocol for TCP/IP based communication of HTML-coded information permits just this kind of linking via the CGI interface. See http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

An electronic generalization of the familiar paper index card, the notecard can contain text, drawings or bitmap images, and (more recently added) video sequences. Cards also have titles, and there can be a variety of card types. Typed, bidirectional links connect notecards into networks. Types are labels chosen by the user to represent the relationship between source and destination card. Link sources are anchored to icons, but destinations are entire cards. Browsers are special notecards displaying a structural diagram of a network of notecards; they are created by the system and are thereafter manipulable as any other notecard. They also permit manipulation of the structure of the information space via direct manipulation of the objects which appear on the browser. Fileboxes are special cards used to organize or categorize collections of notecards. NoteCards also provides a limited search capability to locate all cards matching some user-supplied specification, currently restricted to text-string search on titles.
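The NoteCards data model described above (titled cards of several types, typed bidirectional links, and search restricted to titles) can be illustrated with a small sketch. This is not the NoteCards implementation, which ran in the Xerox LISP environment; the class and method names (Card, Link, Network, neighbours, search_titles) are hypothetical, and the case-insensitive matching is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """A notecard: a title, a card type, and some content."""
    title: str
    card_type: str = "text"
    content: str = ""


@dataclass
class Link:
    """A typed link; the type is a user-chosen label for the relationship."""
    source: Card
    destination: Card
    link_type: str


@dataclass
class Network:
    """A collection of cards connected by typed links (hypothetical names)."""
    cards: list = field(default_factory=list)
    links: list = field(default_factory=list)

    def add_card(self, card):
        self.cards.append(card)
        return card

    def connect(self, source, destination, link_type):
        link = Link(source, destination, link_type)
        self.links.append(link)
        return link

    def neighbours(self, card):
        """Links are bidirectional, so follow them from either end."""
        found = []
        for link in self.links:
            if link.source is card:
                found.append((link.link_type, link.destination))
            elif link.destination is card:
                found.append((link.link_type, link.source))
        return found

    def search_titles(self, text):
        """Text-string search on titles only (case-insensitive by assumption)."""
        needle = text.lower()
        return [c for c in self.cards if needle in c.title.lower()]


net = Network()
claim = net.add_card(Card("Claim: late binding of content"))
evidence = net.add_card(Card("Evidence: video field study", card_type="video"))
net.connect(evidence, claim, "supports")
```

Because a link is stored once and traversed from either end, the "supports" relationship is visible from both the claim card and the evidence card, which is the sense of "bidirectional" assumed in this sketch.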
The system is in use at numerous locations within Xerox, as well as at universities and government agencies for document authoring, legal argumentation, development of instructional materials, design of copier parts, and competitive market analysis.

Halasz [93] makes a number of useful distinctions in order to arrive at three dimensions for analysis. In particular, he explains how systems designed to support browsing differ from those designed to support authoring. As one extreme of a continuum, systems purely for browsing do not permit modification of the information space. Systems which are primarily intended for authoring are likely to be used by a small number of authors to prepare information for a large number of more-or-less casual readers; these systems will have well-developed creation and modification tools, and will support continuous modification of information structure as part of ongoing task activities.

3.5.1 Video Authoring

The spotlight follows her for a moment, maybe picking up some stock footage. Videotape is cheap. You never know when something will be useful, so you might as well videotape it.
Neal Stephenson, Snow Crash, page 33.

No matter how apolitical the producer of the work of art may seem, every work has political relevance of some sort.
James Monaco, How to Read a Film, page 11

The following extract from Davenport et al. [62] might well serve as a manifesto for this thesis, with only a few changes and additions to reflect the underlying communicative goal of the author of the presentation, a role unaccounted for by Davenport:

In the case of on-line video servers as well as home-movie editing assistants, the machine must respond to the user by selecting the "best" shots, sounds, and text chunks, then orchestrating or sequencing them to emphasize a particular story. The story content should reflect the user's background and intent.
The incentive to provide presentations which have been particularized to the viewer's needs and interests is even stronger with video than with traditional media because time is a precious human commodity, and time is what it takes to annotate, and to view video. Traditional authoring paradigms do not support such run-time determination of form and content.

Video is finding increasing use as a transcription medium in many fields because it arguably provides the richest record (the "thickest description" [82]) of the events of interest. Video recording offers high bandwidth, greatly exceeding human note-taking skills and speed; researchers can later review and annotate video at leisure. And, increasingly, video is cheap. These attributes virtually ensure a growing abundance of video material for future on-line presentation systems.

On the other hand, the limitations of the traditional approach to authoring are most obvious when applied in non-traditional media, such as in the video medium. The raw material must first be acquired, which involves filming and possibly digitizing. From this raw source, video authors must assemble cuts into a cohesive presentation. The raw footage can be very voluminous, and the relevant parts of it very sparsely distributed. Ten hours of video taken during a field study for a new graphical user interface (GUI), for instance, may include many instances of coffee drinking and doughnut eating by the users that may not be relevant to any conceivable presentation. Nevertheless, it takes someone at least ten hours (footnote 13) to scrutinize the footage for something useful. The process of identifying these useful events and sequences has been called annotation, and a number of systems have been designed to expedite it. (See, for instance, Buxton and Moran [34], Goldman-Segall [83], Harrison and Baecker [95], MacKay and Tatar [127], MacKay and Davenport [128] and Suchman and Trigg [188].)
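To make the role of annotation concrete, the following is a minimal sketch of how human-supplied annotations over time intervals might later be searched in place of the raw footage. It does not reproduce any of the annotation systems cited above; the Annotation record, the function names and the sample labels are all hypothetical, and plain keyword matching stands in for whatever retrieval a real system would offer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: float  # seconds from the start of the footage
    end: float    # seconds from the start of the footage
    label: str    # free-text description written by a human annotator


def find_intervals(annotations, keyword):
    """Return (start, end) pairs whose label mentions the keyword, in time order."""
    needle = keyword.lower()
    hits = [(a.start, a.end) for a in annotations if needle in a.label.lower()]
    return sorted(hits)


def total_duration(intervals):
    """Total viewing time, in seconds, of the selected intervals."""
    return sum(end - start for start, end in intervals)


# Hypothetical annotations of a GUI field study recording.
annotations = [
    Annotation(0.0, 95.0, "user drinks coffee"),
    Annotation(95.0, 140.0, "user opens File menu, hesitates"),
    Annotation(140.0, 600.0, "doughnut break"),
    Annotation(600.0, 660.0, "user finds Print in File menu"),
]

relevant = find_intervals(annotations, "file menu")
```

In this toy example only 105 of the 660 annotated seconds match the query, which illustrates the sparseness of relevant material; the cost of writing the labels in the first place is not modelled.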
When the author has finally identified a set of cuts he or she deems to be relevant to an eventual presentation, the traditional notion of authoring requires assembling these into their final presentation order. Although quite adequate for creating rock music videos, this approach suffers from the aforementioned limitation, that such a presentation cannot be tailored to the needs of individual viewers.

It is here that video data diverges significantly from text, graphics and even animation. Video data is inherently uninterpreted information in the sense that there currently are no general computational mechanisms for content-searching video data with the syntactic precision of generalized textual search (footnote 14). Even graphics, because there is usually an underlying model or database that can be queried, and animation, which also has a model or database with temporal information added, can be searched for information using existing computer tools. But frames and sequences of frames in video data cannot easily be queried for semantic content except in fairly specialized domains.

Footnote 13: Empirical investigations to date suggest that human annotation of video material takes an order of magnitude more human time than the duration of the video being annotated; see, for instance, Harrison [95].

Footnote 14: See, however, Cherfaoui and Bertin [42], who use digital image processing techniques to extract some types of information from video. Recent work by Goldberg and Madrane on automatic extraction of spatio-temporal indices from video at Eurecom Institut, Sophia Antipolis, France, is also noteworthy. Refer to Joly and Cherfaoui [102] for a survey of related approaches.

At present, the only practical way of accessing a video database is for a human to first annotate it so that the annotation can be used to guide the author and the viewer. Creating this annotation, at least with the current tools, is an inherently linear operation (in terms of the time required to do it) and is a major bottleneck in the authoring of video documents.

It is useful to distinguish between the transcription processes of logging at the lexical level, which lends itself to some degree of automation, and annotation, a semantic/pragmatic task which will require human intervention for the foreseeable future. A log of a meeting can be acquired automatically, for instance, by the Group Support System (GSS) software used by the participants. This log can be subsequently used to index a video record of the meeting to find instances of user actions at as low as the keystroke level and as high as the level(s) of abstraction embedded into the GSS (e.g., "brain-storming session," "open discussion," etc.) [164]. Annotation, on the other hand, is at a higher level of abstraction, defined by the eventual use to which the record of the meeting is to be put.

Although the intent-based authoring paradigm can influence the annotation process (Csinger [59]), the remainder of this dissertation focusses on the post-annotation processes of presentation.

In the video medium, selecting the intervals of the record to be displayed, as well as the order in which they are to be displayed, are both serious problems. Previous work in automatic presentation has dealt with some aspects of both of these questions, and has been restricted for the most part to choosing 'the right way' to display items based on their syntactic form [130] [175] (see Section 3.2).
Semantic qualities of the data to be displayed are seldom considered; Karp [106] is an exception, where he describes a system called ESPLANADE (Expert System for PLANning Animation, Design, and Editing), a knowledge-based animation presentation planner that uses as input a separately supplied script and a set of communicative goals. ESPLANADE creates a presentation plan at the individual frame level, specifying a hierarchy of sequences, scenes and shots.

Presentation and transcription are inextricably intertwined. A presentation system cannot present what has not been transcribed; the executive cannot retrieve all instances of the mention of a competitor company's name unless the minutes of the meeting contain these references, nor can he retrieve everything said by his subordinate, Doug, unless the minutes are appropriately structured. If the meeting involves a GSS, the facilitation function of the GSS can be expected to provide some of the knowledge required for both the presentation and transcription functions.

Davenport et al. [62] describe their approach to interactive digital movie making, a domain similar to ours in that they must log and annotate video footage for later retrieval by computer, in the absence of a human editor. Their domain differs in that it permits control over the acquisition of original raw footage. They are also not as interested in modelling the user per se as they are in giving the user meaningful interaction affordances to select variants of the movie. As movie-makers, Davenport et al. go to some effort to maintain the stylistic consistency of their presentation, an important element with which we have not yet concerned ourselves.

Davenport et al. [63] describe "cinematic primitives for multimedia," a set of dimensions along which video shots are annotated for later reference.
Their challenge, they claim, "is to develop robust frameworks for representing story elements to the machine such that they can be retrieved in multiple contexts." Our goal, though not focussed on interactive film and storytelling, is similar: presentations always have a 'story to tell,' even if they are designed automatically. Creators of interactive or multivariant video are interested in preserving "underlying narrative structures" [68, p12], a quality not far removed from what is called intent in this thesis.

In the absence of a solution to the general problem, the amount of costly human effort currently involved in annotating and browsing multimedia information will only be multiplied with the growing interest in the new technologies. Chapter 6 includes a description of how our approach to video authoring has been applied; this medium was chosen because authoring in the video medium is even harder than in conventional media; there is nothing in our approach that prevents it from working across media boundaries.

3.6 Computer-Supported Cooperative Work

Computer Supported Cooperative Work (CSCW) developed over the eighties into a separate field of research in its own right. Work in this area examines the potential use of computational support for individuals engaged in collaborative group work. The field draws on research activity in diverse disciplines including computer science, artificial intelligence, psychology, sociology, organizational theory and anthropology [87, p5].

Although CSCW is perhaps not as relevant to the current work as some of the other areas surveyed here, it is included because automatic authoring and presentation systems can be multi-agented, supporting the collaborative activities of multiple authors and readers. The intent-based authoring paradigm as related in this dissertation involves agents in the roles of author and reader(s), communicating in general asynchronously via the author's intent and the user model of the reader.
Although this thread will not be followed in this dissertation, future work will need to address it.

A distinction should be drawn between systems which are cooperative, and those which support collaborative work. Although the literature varies in its usage of these terms, they will be used consistently throughout this document in the following way: Cooperative systems are those which exhibit behavior which is sensitive to the needs of the individual user-agent. Collaborative systems are those which have been designed to support the joint activity of a number of user-agents.

                 Same Place          Distributed
Synchronous      blackboard...       telephone...
Asynchronous     bulletin board...   email...

Table 3.4: CSCW time and space diagram.

Much of the CSCW research literature has been broken up along the dimensions of support for tasks in which the multiple collaborators operate synchronously or asynchronously, in the same place or remotely, as reflected in the familiar time-and-space partitioning of Table 3.4. A good deal of this literature has been concerned with face-to-face collaboration. With the continual advance of computational hardware technology, and the increasing bandwidth available for information transfer, there will be less need for collaborators to travel for meetings, and a concomitant decrease in the emphasis on face-to-face meeting support technology. This discussion will therefore focus on geographically distributed systems.

Halasz [93] writes:

Hypermedia is a natural medium for supporting collaborative work. Creating annotations, maintaining multiple organizations of a single set of materials, and transferring messages between asynchronous users are the kinds of activities that form the basis of any collaborative effort. These are also activities for which hypermedia systems are ideally suited.

Work in the field of CSCW is so closely associated with hypertext and hypermedia that it is tempting to conflate these issues.
Researchers intent upon creating a sense of co-presence, or being higher on "the social awareness scale" [125], or what Bly has called "connectedness" [27] and Laurel "engagement" [123], have jumped at multimedia technologies as part and parcel of the solution, without showing first that such measures are necessary. In fact, users of systems which incorporate video to facilitate face-to-face interaction pay far less attention to the video information than expected [27] (footnote 15).

Sarin et al. [177] advance the notion of a [real-time] conference as an 'abstract object' and discuss some conference design issues. They identify the following dimensions: shared versus individual views, access control, concurrency control, getting data in and out (from other applications, or from paper, etc.), constraints on real-time conference design. They describe the 'virtual terminal approach' as a way of giving single-user applications the means of serving multi-user conferences: a 'virtual terminal controller' is responsible for multiplexing user I/O (footnote 16). Sarin's work does not get sidetracked with multimedia issues and good advice is to be found throughout the article for practitioners involved with the design and implementation of CSCW systems.

Lee [124] presents a system and an approach designed to support group decision making, focussing upon representation of the task-derived components of decision-making processes. In particular, the alternatives are represented explicitly, as are the goals to be satisfied, and the arguments "evaluating the alternatives with respect to these goals."

ZOG and KMS: Shared, distributed hypermedia systems designed for collaborative work. KMS [3] [203] was developed over the 1970's at Carnegie Mellon University, and then commercialized by Knowledge Systems, Inc. A version found application on board the USS Carl Vinson for a variety of tasks. KMS was designed "to help organizations manage their knowledge," and features organization wide support for collaboration in a broad range of areas, including electronic publishing, on-line documentation, project management, software engineering, computer aided instruction, electronic mail and issue analysis. Screen-sized WYSIWYG workspaces called 'frames' contain text, graphics and image items, and can be linked to other frames or used to invoke programs. Links are unidirectional, and destinations are entire frames, which are viewed one at a time.

Footnote 15: For further emphases that technology alone is not enough to solve the CSCW problems, see Engelbart's urgings [65].

Footnote 16: This approach is subsumed by the 'symbiotic interface' proposed by Booth and Gentleman [29].

There is no mode boundary between navigating and editing, which is to say that there is no system-imposed distinction between reader and writer. These role categories are implicitly conventionalized in the KMS user community, and much of KMS's functionality depends upon convention. For instance, rather than provide a separate system-level representation for annotations, these are ordinary links distinguished only by a character prefix provided by the annotator and conventionally understood by the user community as identifying an annotation. Certain functionality within the system depends upon convention as well; electronic mail service, for instance, depends upon access to shared frames which function as user mailboxes. The designers of KMS have in this way relied upon convention to augment their rather minimal user-interface. See Table 3.5 for a summary.

Time-multiplexing of screen real-estate via quick response time is used in favor of multiple-window 'space multiplexing' (footnote 17). Adhering again to a minimalist principle, KMS does not provide a separate mechanism in support of user navigation; there are no 'overview maps' or other devices to help orient the user in unfamiliar parts of the information space.
Instead, the designers claim to have been vindicated in their belief that fast response, which enables low cost exploration and backtracking, is enough to avoid getting irreparably lost.

Footnote 17: Discussions with Kelly Booth have led me to believe that although increases are still possible in both time and space multiplexing as described above, growth will be more bounded in the space than in the time domains.

Deliverable          Document organization
Task/Data Domain     General info. manipulation
Scope                Versioning and maintenance
Target Media         Conventional graphic display
Flesh and Blood      Not explicit
Style                Local convention
Models               Not explicit
Role Relationships   Implicit, conventionalized
Heterogeneity        Seamless
Spaces               Dynamic shared information

Table 3.5: KMS.

KMS is also notable for its approach to access control. The system supports shared access to a single logical but physically distributed database. Since the basic unit of information is a frame and is thus limited to what can fit in a screenful, any large database will be composed of a large number of frames; this number is generally much larger than the number of users of the database. The designers of KMS use this observation to point out that access conflicts will be rare, and that conventions can once again be relied upon to evolve which will serve to circumvent those collisions which might occur. Access privileges are established on a frame-by-frame basis by the creators of frames. Once again, it is convention, rather than embedded functionality, which governs access to information under KMS.

A guiding force in the design of KMS was the desire to permit paper renditions of KMS documents. This is achieved by hybridizing markup notation into the WYSIWYG frame displays. This, along with the design intention to support individual as well as collaborative work, contributes to the seamlessness of the system.
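The frame-centred design summarized above (unidirectional links whose destinations are whole frames, annotations that are ordinary links recognized only by a conventional prefix, and access privileges set frame by frame by the frame's creator) can be sketched as follows. This is an illustration under stated assumptions and not the KMS implementation: the Frame class, the owner/writers model of privileges, and the "@" prefix are all hypothetical, since the source says only that annotations carry "a character prefix".

```python
from dataclasses import dataclass, field

# Hypothetical convention: the source specifies only "a character prefix".
ANNOTATION_PREFIX = "@"


@dataclass
class Frame:
    """One screen-sized unit of information (hypothetical model)."""
    name: str
    owner: str                                  # the creator of the frame
    items: list = field(default_factory=list)   # text, graphics, image items
    links: dict = field(default_factory=dict)   # link label -> destination frame name
    writers: set = field(default_factory=set)   # users the owner allows to edit

    def can_edit(self, user):
        """Privileges are decided per frame, by its creator."""
        return user == self.owner or user in self.writers

    def add_link(self, user, label, destination):
        """Add a unidirectional link whose destination is an entire frame."""
        if not self.can_edit(user):
            raise PermissionError(f"{user} may not edit frame {self.name}")
        self.links[label] = destination

    def annotations(self):
        """Annotations are ordinary links, told apart only by their prefix."""
        return {label: dest for label, dest in self.links.items()
                if label.startswith(ANNOTATION_PREFIX)}


home = Frame("ProjectHome", owner="alice", writers={"bob"})
home.add_link("alice", "Schedule", "ProjectSchedule")
home.add_link("bob", "@bob: dates look wrong", "BobNote1")
```

Note that nothing in the data structure distinguishes an annotation from any other link; the distinction lives entirely in the labelling convention, which mirrors the reliance on convention that the designers describe.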
Shared Workspaces
The main thrust of systems under this rubric is the creation of collaborative working environments that promote the illusion of 'mutual situatedness.' Although collaborators may be in physically separated environments, they work in a space which is shared, in the sense that they can refer to objects in that space, relying upon mutual awareness of the terms of the referral as well as of the referent (cf. deixis). Many strategies have been adopted to this end.

Sara Bly and other Xerox researchers have explored a variety of shared workspaces. An interesting experiment was conducted involving the interconnection of the Portland, Oregon and the Palo Alto laboratories with a network of video, audio and computing technologies called Media Space [153]. Similar experiments were conducted, connecting the Palo Alto labs with Xerox labs in the United Kingdom. Other work at Xerox has focussed on shared drawing surfaces [28]. Further experiments [135] explored the three-way sharing of the drawing surface.

TeamWorkStation [99] [100] is a desktop, real-time system advancing a shared workspace in the form of a sharable computer screen for concurrent pointing, writing and drawing, as well as live video and audio links for face-to-face conversation. The creators of TeamWorkStation emphasize its 'seamless' operation: since the shared screen is video-generated, the individual participants can continue to use the tools with which they are already familiar, including the still ubiquitous pencil and paper. Although the system is currently implemented on Macintosh computers, this same factor will permit interoperability between heterogeneous computers. The technology is used here to facilitate remote interaction, and every attempt is made to avoid interfering with existing work habits.
Distributed Knowledge Worker (DKW) [96] was developed at the IBM Canada Lab in recognition of the importance of meetings to the operation of businesses today, and with the awareness that face-to-face settings are not always possible for these meetings. Addressing the issues surrounding remote, real-time meetings, DKW is a minimal support system, providing the means with which to harness available bandwidth rather than imposing a structure on the meeting itself: the 'meeting facilitator' component of the system manages rather than controls the meeting.

DKW supports multiple meetings, in which a shared information space is also a shared presentation space (i.e., WYSIWIS) in the form of whiteboard, multi-user text editor, video, file transfer and text chat functions, not all of which are completely implemented. The set of on-going meetings is a conference. A 'minutes log' is automatically augmented with information about meeting starting and ending times, times of members joining and leaving a meeting, information saved on the whiteboard, and information on files that were transferred.

3.7 Structured Documents: Content from Form

Some document preparation systems (e.g., Scribe [165] and LaTeX [121]) have made the important distinction between content and form, between what is presented and how it is presented. Such systems make possible the separation of the specification of the contents of a presentation from the specification of how it should look, and though little advantage has been taken thus far of the potential represented by this separation, the ramifications are beginning to be well understood.

This methodological separation of content from form prefigures the further separation of intent advocated as part of this thesis.
In the same way that the structured document paradigm makes possible new ways for browsers to interact with the presentation space, the intent-based paradigm enables new ways for readers to interact with the information space.

Efforts continue to build automatic interface and application generators [149] [148], where the form of the interface need not be decided in advance by the application designer. Relational data models likewise separate low-level disk storage issues from higher, user-level interpretations of meaning [49] [41]. And, of course, the evolution of high-level programming languages has led to increasing abstraction; programmers need know less and less of the underlying machine constructs and can focus instead upon concepts defined at the task level. The leading edge of this trend is to be found in the object-oriented programming paradigm, where system behavior emerges from the interaction of self-contained objects created by a programmer to be isomorphic to concepts in the task domain.

Even so, the structured document paradigm [7] strays only little from the conventional, traditional notion of authoring (see Section 1.1) in that it is usually the author himself who decides the content of the eventual presentation. A brief review of the literature pertaining to structured documents appears in this section.

Reid [166] offers what he calls "observations about systems employing structured documents" [p108]:

• Structured documents contain more information than just the text or graphics itself. That information is usually called "the structure."
• Most structured documents must be processed or compiled or formatted into some concrete form before they can be printed or displayed.
• The compilation process discards information: the same structured document can be processed into several different concrete documents.
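Reid's third observation, that one structured document can be compiled into several concrete documents, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names, the sample text, and the two rendering functions are hypothetical and are not taken from Scribe, LaTeX, or any system cited here.

```python
from html import escape

# A toy structured document: (generic identifier, text) pairs.
# Tag names and sample text are illustrative assumptions.
DOCUMENT = [
    ("title", "User Models for Intent-Based Authoring"),
    ("section", "Structured Documents"),
    ("para", "Content & form are specified separately."),
]


def render_plain(document):
    """Bind the structure to one concrete form: plain text."""
    lines = []
    for tag, text in document:
        if tag == "title":
            lines.append(text.upper())
        elif tag == "section":
            # Underline section headings with dashes.
            lines.extend([text, "-" * len(text)])
        else:
            lines.append(text)
    return "\n".join(lines)


def render_html(document):
    """Bind the same structure to a different concrete form: HTML."""
    element = {"title": "h1", "section": "h2", "para": "p"}
    return "\n".join(
        f"<{element[tag]}>{escape(text)}</{element[tag]}>"
        for tag, text in document
    )
```

Calling `render_plain(DOCUMENT)` and `render_html(DOCUMENT)` yields two different concrete documents from the same source. Neither output retains enough information to recover the tags unambiguously, which is the sense in which compilation discards information.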
Newcomb [146, p67] offers a definition:

    Structured documents are so named because the hierarchical and sequential structure of the various kinds of information they contain is made explicit by identifying tags. Each tag associates a "generic identifier" -- the name of the kind of thing being tagged (e.g., "subsection") -- with the data surrounded by a start tag and end tag of the same generic identifier.

The familiar (in academic circles) LaTeX [121] document formatter (built as a macro package for the TeX typesetting system [112]) is in this spirit, as was Scribe [165], its acknowledged predecessor. The "structure" and other information added to create a structured document is sometimes referred to as its "markup" and the dialects used to convey this information have been called "markup languages." Hypertext Markup Language (HTML) is probably the best known, currently in wide use for the authoring of documents on the World Wide Web (WWW) (see Section 3.9) (footnote 18). Such documents generally leave rendering decisions to a run-time browser program such as the popular Netscape and Mosaic products, which are graphical browsers that try to provide a full color high-resolution layout for HTML files, or Lynx, a text-only browser.

Reid [166, p107] nicely sets out the problems in store for intent-based authors when he points out (the emphasis is mine):

    There is usually a separation in time or space between the creator of the document and the user of the document. The difficulty in communicating and storing structured documents comes from the need to make sure that the reader will interpret the structure in the way that the writer intended. ... to make sure that there is enough information included with the document that equivalent bindings can be made by the reader, to produce a printed or displayed unstructured document that properly resembles what the writer intended.
But then he goes on to conclude that these problems are unsolvable:

    The problem of communicating structured documents is fundamentally unsolvable; it is almost unsolvable by definition. This is because the only way to be certain that the structure will not be misinterpreted, that the binding decisions will be made properly, is to make all of the binding decisions and remove the structure from the document. This reduces the problem to one that is quite solvable, namely the communication of an unstructured document, but also that is not nearly as useful.

Reid appears not to have considered the application of automated intelligence to supply the missing information. The intent-based approach advocated in this thesis is actually twofold: the author supplies an intent as an extension to the structure referred to in the structured document paradigm, and a user model is consulted at run time by an intelligent browser that supplies what Reid thinks of as the missing rendering information.

What Reid takes as a weakness of the structured document paradigm may well be one of its greatest strengths. The late-binding referred to by Reid is what makes possible the user-tailored presentation of the author's intention. That the receiving station, the presentation system, may be ill-equipped to cope with the task at hand is something computational systems will have to deal with; human interlocutors have been misunderstanding each other since before they even developed language, or speech, yet enough effective communication has taken place to permit the development of human civilization.

Footnote 18: HTML is actually a document type defined in Standard Generalized Markup Language (SGML). SGML (ISO 8879-1986) has been adopted by many of the world's largest publishers and by many governments, including the US and the EC.
3.8 Hypertext and Hypermedia

Hypermedia provides one solution to the problem with traditional authoring identified in this thesis, of being unable to tailor presentations to individual users at run-time. In hypermedia systems, the viewer completes the job of the author by selecting and ordering the information to be viewed through the process of navigating at run-time the links established by the author. But this only pushes the problem from one person (the author) to another person (the viewer) and it dramatically increases the demands on the author who must provide the explicit navigation cues. Reducing the amount of human effort required from the author and viewer is still a significant problem with current approaches to video authoring. These effects can be mitigated by the intent-based authoring approach advocated in this dissertation.

A hypertext document or hypertext is a collection of distinct nodes of information connected via a network of links. When nodes contain information of different types, such as graphical, auditory or video sequences, the term hypermedia is often applied to the network. Innumerable variations exist on the kinds of links employed, and the way they are activated. The links both impose a structure on the document, and permit run-time determination of the order in which a reader accesses the information.

The most irksome of the major problems with the hypertext approach, and one that will not diminish with mere technological advance, is the 'lost in space' problem: in such a huge information space, it is easy to get lost, or sidetracked. A promising solution to this problem is the application of user modelling to hypertext systems.

"Hypertext" is a term coined by Nelson in 1965 [145] and covers a wide range of concepts, many of which had their inception in the vision of the memex by Vannevar Bush [33, p190].
Bush's vision went further, and were it not for technological limitations of his time, he would have tried to build hypertext systems very much along the lines we see today (footnote 19). The idea of modelling the user of such systems appears in Bush's writing, as well as in Alan Kay's vision of the Dynabook [110]. Negroponte's [142] musings are also easy to interpret in the context of user-modelling:

    Imagine a machine that can follow your design methodology and at the same time discern and assimilate your conversational idiosyncrasies. This same machine, after observing your behavior, could build a predictive model of your conversational performance. Such a machine could then reinforce the dialogue by using the predictive model to respond to you in a manner that is in rhythm with your personal behavior and conversational idiosyncrasies.

These and other hyper-X concepts appear in many of the systems reviewed in this chapter; some of these were described under multimedia, others under CSCW topic headings. This choice was based in each case upon the original design intention. Thus, FRESS, EDS, Intermedia and InterNote, which were not explicitly designed to support collaboration, appear under multimedia, while ZOG and KMS are described under CSCW (footnote 20).

The rest of this section reviews several hypertext systems conspicuous in the history and the literature of the field.

Footnote 19: See "Memex Revisited," in Science is not Enough, 1967, as well as the original Memex article, "As We May Think," The Atlantic Monthly, vol. 176, July 1945, pages 101-108, reprinted in [87, p17-34] and available at publication time on the World Wide Web at URL http://www.csi.uottawa.ca/dduchier/misc/vbush/as-we-may-think.html
Xanadu [147, p33] is the embodiment of Nelson's vision of a future in which a single hypertext system provides access to all of the world's literature (footnote 21). The Xanadu model permits access to any part of any document from any document. Nothing is ever deleted in Xanadu, so that links to specific versions of a document are guaranteed to persist (footnote 22). Copyright protection will be supported in commercial versions of the system [51].

NLS was developed by Engelbart in the nineteen-sixties, as the first (computational) hypertext-like system, and was highly successful within the research environment at SRI. It provided the impetus toward interactive computing that drove much of the research for the next two decades [147, p32].

The Symbolics Document Examiner is considered the first 'real-world' hypertext system [147, p38]. It serves as the on-line interface to the extensive on-line documentation for the Symbolics system. It is of note to this survey because it is representative of systems which enforce a modal boundary between authors and readers, and it does so in the most direct fashion: Documents are authored with a separate application called Concordia. This strategy is at the extreme opposite end of the spectrum from systems like Intermedia, which do their best to blur these boundaries.

Footnote 20: Interesting observations on how to make hypermedia systems more user friendly are to be found in [77], [133], and [206].
Footnote 21: Parts of the vision are implemented and are available commercially from the Xanadu Operating Company.
Footnote 22: This feature is relevant to the notion of 'thickness' discussed elsewhere in this document, as well as to issues of literary criticism and deconstructionism.

The most notable difference between the features provided to users of Document Examiner, the reading tool and Concordia, the writing tool, is the link mechanism.
While readers are limited to uni-directional links, authors in the Symbolics environment are provided with links that are bi-directional. The designers of the system felt that authors needed to know the possible paths readers might take to arrive at a node in order to provide a useful "rhetoric of arrival" [147, p167]. A user of Document Examiner cannot make changes to the underlying hypertext document, though he can save sets of pointers to nodes called bookmarks.

The popular HyperCard system shipped free by the Apple Corporation with all Macintosh computers permits both authoring and browsing of hypertexts, but distinguishes these uses via a user-level mechanism. The system operates in various modes: browse-only, authoring, and programming [9].

A likely reason for the fertility of the area of hypertext is the syncretic, interdisciplinary backdrop against which the work has taken place: computer scientists, management information specialists, sociologists, anthropologists, artists, educators, mathematicians and other groups have all contributed to the development of concepts and implementations in hypertext.

3.9 Putting it all Together: Cyberspace?

There is much talk today about cyberspace, the semi-mythical electronic environment in which we and our computational surrogates will one day meet to work, play, and think. Networks spring up every day now, and everyone is anxious to be connected. Why all the sudden interest?

Conklin [51, p454] cautions against the easy answer that recently empowering technology has made these visions of the near future more realistic. He suggests that there has been a gradually growing awareness of the potential benefits of hypertext, and that where the computer industry showed no interest twenty years ago in demonstration systems running on state of the art, dedicated hardware, experts today fawn over the potential of a basic home computer connected to the network.
Social changes are at the heart of the information revolution, and it will be further such changes at the individual and collective level that will fuel continued "electronification" of information and practice.

The rapid acceptance of the global network makes it the ideal carrier for the intent-based authoring and presentation paradigm. The World Wide Web combines unified access to the different kinds of information on the Internet, with electronic publishing in Hypertext Markup Language (HTML), a format for which browser client programs exist on all major computer platforms, from laptop to mainframe. It is a new and continually evolving publishing medium that allows reading and writing and interlinking of documents irrespective of topic or geography, but with the overwhelming size of the information space come difficult search and retrieval problems.

Nelson's vision of a literary hypertext network spanning the globe is rapidly becoming reality, but what form will it take? How will familiar notions carry over from the paper and pencil documentation preparation tradition? The answers to these questions spell out the future for the intent-based authoring paradigm, where the role of the author is completely redefined. Who owns the copyright on a document rendered by a browser based upon a run-time model of the reader, where the only contribution of the "author" is an intent? (footnote 23)

David Sewell of Rochester University's English Department writes [184]:

    The traditional view of an "author" as a single autonomous agent, the sole intentional creator of a work, is a product of the age of the codex book, when writing was both material and unalterable. But the electronic medium... "denies the fixity of the text, and... questions the authority of the author..."
    Curiously, though, electronic communication has tended to hang on tenaciously to the single, identifiable author: on-line journals have conventional tables of contents and author attributions, nearly all e-mail and news-posting systems identify message senders, and on networks like Usenet the elaborate ".sig" or signature appended to one's postings has become a way of transcending the uniformity of the medium...

    Despite the network's potential to allow anonymous collaboration, it is rare for even experimental network art and participatory projects to be anonymous...

Those of us actually taking part in the on-going 'electronification' of the global information space surely recognize the veracity of Sewell's observations. How many of us would be willing under the existing social structures to devote our time and energy to producing anonymous documents that can be borrowed, modified and claimed by others? Lamenting the death of the author [18] was obviously premature. The message for systems developers of the near future is that support is still required for version control, access control, copyright control. Control is the operative word; if the author is dead, his ghost still wants the rights to his work.

Footnote 23: Authors in the intent-based authoring paradigm can contribute content, but they are required only to contribute intent.

Scenario: Information Gathering

This scenario considers how the implementation described in Chapter 6 functions in the context of the UBC Department of Computer Science Hyperbrochure, an actual application under development in conjunction with this thesis.

Information Gathering

John, a prospective graduate student, starts up Valhalla after signing on as guest. The user model window pops up with the system's a priori hypotheses about John.
Since usage of the guest account carries little information beyond the reasonable assumption that the user is not a current member of the department, some default hypotheses are based upon the knowledge that the terminal John is using is located in a faculty office, and that the departmental on-line calendar lists a faculty recruiting seminar that day. These coincidences conspire to produce the false assumption that John is a prospective faculty member.

If John notices by looking at the user model window that Valhalla thinks he is a prospective faculty member, he may correct this false assumption at this point by clicking on the button that represents that he is a prospective student. John can interact with the user model window immediately, or he may wait until after pressing the show button and perhaps wondering why the presentation is not meeting his needs as a prospective student. In either case, after correcting the system's misconception, he is presented with a brief introduction to the department by its head, and then with a number of clips designed to motivate and increase his interest in the department. Valhalla makes numerous assumptions here about the interests of students and instantiates these goals with footage about sports facilities on campus, regular social events in the department, and a brief overview of research activities. John's hypothesized age, which is influenced by whether he is a student or faculty member, has an influence upon whether the system assumes he is single or married, which in turn influences content selection. John lets the presentation play to conclusion and logs out.

Mary, a prospective faculty member, signs on at the same terminal, also as guest, and consults Valhalla. This time, the a priori assumptions are more relevant. (Mary is, in fact, the visiting faculty scheduled for that day.) Mary sees the introduction, and then an overview of each of the laboratories in the department.
She replays the clip of the Laboratory for Computational Intelligence (LCI) several times; this usage information is passed on by Valhalla's interface to the reasoner, which infers that Mary is more interested in AI research than other activities in the department (although there could be other explanations). Mary asks for another presentation (either before or after the current one runs to completion) and is then presented with more detailed footage about the LCI, as well as with interviews with key AI researchers in the department. This second presentation is shortened to accommodate Mary's optimal viewing time, as represented in the system's model of her.

Both John's and Mary's presentations include clips about the Vancouver area, because it is considered by many to be very attractive. This kind of information can even be acquired automatically, by noticing, for instance, that out-of-town users tend to linger over scenic shots in the video presentations much more than do locals (who can just look out the window) (footnote 24). Had they been assumed by Valhalla to be current, rather than prospective members of the department, John and Mary would not have been presented with this extra information. On the other hand, a user accessing Valhalla over the Internet might receive a presentation with even more emphasis on the local geography, on the assumption that they had never before been to Vancouver.

Footnote 24: I.e., the a priori probabilities of assumables can be upgraded according to well-known learning algorithms [201].

Chapter 4

Formal Background

This chapter provides background material upon which the contribution of this dissertation is based. The subject of symbolic logic and default reasoning is introduced first, and then a particular formalism for hypothetical reasoning is pursued in Section 4.1.1, which is the basis from which the formalism of this thesis as well as the prototype implementation are later built.
Decision theory has been advanced as a normative tool for design under uncertainty. It is briefly reviewed in Section 4.2. Decision theoretic approaches involve averaging over some or all models of the world to produce a "compromise" design that maximizes expected utility.

Speech Act Theory is introduced in Section 4.3.

4.1 Symbolic Logic and Default Reasoning

Symbolic logic was intended originally as a language for unambiguously describing mathematical entities, but with recent technological developments has come a growing interest in the use of logical systems for reasoning as well as for representation. Given a set of axioms or formulae which are true, a logic is a set of syntactic rules for deriving new statements from existing statements. Statements that conform to the syntactic rules of the language are called well formed formulae or wffs.

First-order logic is both a language for expressing knowledge and a means by which further statements can be derived. Assuming that the available knowledge is complete, consistent and monotonic, the derived statements can be regarded as true. Completeness with respect to a particular domain is the property that all facts needed to solve the problem at hand are present in the system or derivable from those that are. Consistency is the property that all the axioms are true and cannot lead to contradictions. The property of monotonicity holds when the addition of new facts is guaranteed not to lead to contradictions; the size of the knowledge base in terms of the number of statements in it can only grow.

Non-monotonic reasoning systems are those which are designed to solve problems and manipulate representations in which one or more of these properties do not necessarily hold. The inferences made in such systems are said to be defeasible, because new information (observations about the world or environment, for example) may invalidate earlier conclusions, which may in turn have to be retracted.
Research in the field of non-monotonic reasoning sometimes goes under the rubrics "default reasoning" or "hypothetical reasoning," and is broadly characterized by the common goal of achieving reasoning behavior in closer correspondence with intuition.

Numerous formal approaches have been developed in support of making inferences in the absence of complete and reliable information (see, for instance, Brewka [32], Konolige [117], Geffner [81], Reiter [167], Etherington [67] and Kautz [109]). The contribution in this thesis is built upon the Theorist formalism developed by Poole [159].

4.1.1 Default-Programming with Theorist

To make the following presentation more precise, the simple hypothetical reasoning framework of Theorist [162] is used. "Vanilla" Theorist is defined in terms of $F$, a set of closed formulae, called the "facts", and $H$, a set of (possibly open) formulae called "possible hypotheses," or assumables. The following definitions are relevant:

Definition 1 (Scenario) A scenario is $F \cup D$ where $D$ is a set of ground instances of elements of $H$ such that $F \cup D$ is consistent.

Definition 2 (Explanation) If $g$ is a closed formula, an explanation of $g$ is a scenario that implies $g$. Such a $g$ is referred to here as an explanandum (the plural being, of course, explananda).

Definition 3 (Extension) An extension is the set of logical consequences of a maximal (with respect to set inclusion) scenario.

There's more than one way to use a hypothetical reasoning formalism. It can be used at least for prediction and for abduction (footnote 1), often in a single domain or problem. Theorist is particularly of interest because the same formal definition allows for both default and abductive reasoning [159]. It is also implemented; the examples provided in this thesis have been tested on a running version of the program.
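Definitions 1 and 2 can be made concrete with a small sketch. It is not the Theorist implementation: it assumes a propositional Horn-clause fragment, where forward chaining is enough to test both consistency and implication, and it returns only explanations that are minimal under set inclusion, a simplification the definitions do not require. The atom names are hypothetical, loosely modelled on the Valhalla scenario.

```python
from itertools import combinations

FALSE = "false"  # reserved atom: deriving it signals inconsistency

# Facts F as definite clauses (head, body); an empty body is a plain fact.
# All atom names below are illustrative assumptions.
RULES = [
    ("guest", ()),
    ("wants_lab_overview", ("guest", "prospective_faculty")),
    ("wants_social_clips", ("guest", "prospective_student")),
    (FALSE, ("prospective_faculty", "prospective_student")),
]
# Possible hypotheses H (already ground in this propositional setting).
HYPOTHESES = {"prospective_faculty", "prospective_student"}


def closure(rules, assumed):
    """Forward-chain the rules from the assumed atoms to a fixed point."""
    known = set(assumed)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


def is_scenario(rules, assumed):
    """Definition 1: F together with D is a scenario if it is consistent."""
    return FALSE not in closure(rules, assumed)


def explanations(rules, hypotheses, goal):
    """Definition 2, restricted to subset-minimal D: scenarios implying goal."""
    found = []
    for size in range(len(hypotheses) + 1):
        for chosen in combinations(sorted(hypotheses), size):
            candidate = frozenset(chosen)
            if any(previous <= candidate for previous in found):
                continue  # a smaller explanation already covers this one
            derived = closure(rules, candidate)
            if FALSE not in derived and goal in derived:
                found.append(candidate)
    return found
```

With these rules, `explanations(RULES, HYPOTHESES, "wants_lab_overview")` finds the single assumption `prospective_faculty`, while assuming both hypotheses at once is rejected by `is_scenario` because the integrity constraint derives `false`.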
These different uses of Theorist can be characterized along two dimensions:

• Status of Explananda, and
• Status of Assumptions

These two dimensions are the rows and columns, respectively, of Table 4.1, which is referred to henceforth as the Domain-Formulation Grid, reflecting its intended use as an aid in the formulation of problems and domains for Theorist and other formalisms for hypothetical reasoning.

Footnote 1: The Encyclopaedia of Philosophy defines abduction as: "C. S. Peirce's name for the type of reasoning that yields from a given set of facts an explanatory hypothesis for them" [64, p5-57]. The term "abduction" is used throughout this thesis in the formal sense of Csinger and Poole [61], which is consistent with Peirce's treatment; refer to Section 5.4 for details.

Status of Explananda

The first dimension concerns whether the explanandum is known or not. This distinction corresponds to a choice between the following:

Abduction: The system regards the explanandum (the observation of the world or the design objective) as given, and needs to find an explanation for it. The idea is to find assumptions that imply the goal. We consider all explanations of the goal as possible descriptions of the world.

Prediction: The system does not know if the explanandum is true, and the idea is to determine what can be predicted from the facts (the general knowledge and the observation or design objective).

The issue is whether the explanandum is known to be true or whether it is something that has to be determined. For instance, if a reasoning agent knows (or has as defaults) that $a \rightarrow b$ and that $a$, the agent can predict $b$ from its knowledge. Another agent who also knows that $a \rightarrow b$, as well as $b$, might be able to assume in the absence of contradictory evidence that $a$. The first agent is using prediction, while the second agent is using abduction.

One interesting difference between abduction and prediction is in the relevance of counter-arguments.
For instance, when predicting $g$, it is important to know if $\neg g$ can also be explained. In abduction, however, an explanation of $\neg g$ is irrelevant [159].

Status of Assumptions

Along the other dimension we can distinguish between the two types of tasks:

Design tasks [74] are those in which the system can choose any hypotheses it wants. For example, a system can choose the components of the design in order to fulfill its design objective, or choose utterances to make in order to achieve a discourse goal. A consistency check is used to rule out impossible designs. All other sets of components that fulfill the goal are possible, and the system can choose the "best design" to suit its goal. Design can be done abductively to try to hypothesize components in order to imply a design goal [74]. Alternatively, design can be done predictively to derive a design from goals and any hypotheses we care to choose.

Recognition tasks are those in which the underlying reality is unknown, and all we can do is to guess at it based on the observations we make about it. This definition includes diagnosis, scene recognition and plan recognition. Recognition can also be performed abductively or predictively [160]. In an abductive framework, each explanation is a possible description of the world, while the disjunction of all explanations is the description of the world. In the predictive framework, an appealing strategy is to predict something only if it is explained from the observations even when an adversary chooses the hypotheses [159], which corresponds to membership in all extensions.

Table 4.1: Domain-Formulation Grid

                            Who chooses the assumptions
  Explanandum               User (Design)         Nature (Recognition)
  Known (Abduction)         abductive design      abductive recognition
  Unknown (Prediction)      predictive design     predictive recognition

This distinction turns on whether the system is free to choose any hypothesis that it wants or whether it must try to "guess" some hypothesis that "nature" or an adversary has already chosen.
For example, agents who know only that there is to be a meeting on the hour, sometime between 09:00 and 12:30, but not at 10:00, are able to make only disjunctive statements about when it will take place; they recognize that the meeting is at 09:00 or 11:00 or 12:00. In contrast, the agent organizing (designing) the meeting is free to pick any time that is consistent with its own knowledge. The planning agent is free to choose that the meeting will take place at 11:00.

Note that these frameworks are different ways to use the same formal system for different purposes. In order to use the system, we have to choose one way to implement our domain.

In general, there are not enough constraints in a domain to uniquely determine the approach that the reasoning system should take in formalizing its characteristics [160]. The 'causality' in the domain does not uniquely constrain its default-reasoning axiomatization. These choices are succinctly represented by the number of ways of situating the problem into the domain-formulation grid of Table 4.1 (footnote 2).

Footnote 2: The grid merely summarizes some of the different possible uses of the hypothetical reasoning formalism; different problems/domains fall into different boxes, corresponding to different uses of the reasoning system.

4.1.2 Summary and Conclusions

Formalisms for hypothetical reasoning can be used abductively or predictively. Theorist is one such formalism.

Finding enough constraints in a domain to uniquely define its default axiomatization is not usually possible. Default implementations can be classified along (at least) two dimensions: the assumption and explananda status dimensions, which we have represented as the rows and columns of the domain-formulation grid shown in Table 4.1. The domain formulation task can be superficially regarded as one of finding how the domain fits into the grid's representation framework.
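The meeting example of Section 4.1.1 gives a compact way to see the design/recognition distinction in code. The sketch below is only an illustration under stated assumptions: the function names and the `preferred` parameter are hypothetical, and the constraint is encoded directly rather than derived by a reasoner.

```python
def possible_hours():
    """Hours consistent with the shared knowledge: the meeting is on the
    hour, between 09:00 and 12:30, and not at 10:00."""
    earliest = 9 * 60        # 09:00 in minutes
    latest = 12 * 60 + 30    # 12:30 in minutes
    return [h for h in range(24) if earliest <= h * 60 <= latest and h != 10]


def recognise(hours):
    """Recognition: nature has chosen, so the agent can only report the
    disjunction of every time consistent with what it knows."""
    return " or ".join(f"{h:02d}:00" for h in hours)


def design(hours, preferred):
    """Design: the organizing agent may commit to any consistent time;
    here it picks the one closest to a (hypothetical) preferred hour."""
    return min(hours, key=lambda h: abs(h - preferred))
```

Here `recognise(possible_hours())` produces the disjunction "09:00 or 11:00 or 12:00", while `design(possible_hours(), preferred=11)` commits to 11. Both functions work over the same set of consistent candidates; they differ only in who is taken to make the choice.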
4.2 Decision Making Under Uncertainty

Making decisions in the absence of a complete description of the world is a complicated task. Many arguments have been advanced to support the popular claim that decision theory yields optimal results (see, for instance, Savage [179]).

4.2.1 Bayesian Decision Theory

Various models of decision making under uncertainty have been proposed with the goal of attaining an optimal decision, all of which embody the notion of maximizing expected utility over a probability distribution of states of the world. Decision theory offers a kind of normative standard for decision making under uncertainty, and has been applied to design tasks under uncertainty. Some of this literature (see Cheeseman [40] for a discussion) argues that the best design is the one that results from averaging over all possible models; it will be argued in Section 5.7.1 that classical decision theory is not the right approach for the intent-based authoring paradigm being advanced in this thesis.

One model [179] consists of a set $S$ of possible states of the world, a set $O$ of possible observations, and a set $\Omega_d$ of decision alternatives. A conditional probability distribution $P(o \mid s)$ describes how likely it is to observe $o$ when the state of the world is $s$, and a prior probability distribution $P(s)$ describes how likely the world is to be in state $s$. The utility function $\mu(d, s)$ represents the reward to the decision maker for selecting decision $d \in \Omega_d$ when the world is in state $s \in S$.

The general problem is to decide on a mapping from $O$ to $\Omega_d$ which dictates the action to take for each observation; such a mapping is usually referred to as a policy. The expected utility $E_\delta$ induced by the policy $\delta : O \rightarrow \Omega_d$ is defined by

$$E_\delta = \sum_{s \in S,\ o \in O} \mu(\delta(o), s)\, P(o \mid s)\, P(s)$$

The principle of maximizing expected utility states that a rational decision maker chooses the policy $\delta'$ that satisfies

$$E_{\delta'} = \max_{\delta} E_\delta$$

where the maximization is over all possible policies.
The quantity $\max_\delta E_\delta$ is called the optimal expected value of the decision problem.

4.2.2 Example

A simple example follows. Let $S = \{rain, \neg rain\}$, $O = \{wet, \neg wet\}$. The conditional probability distribution $P(O \mid S)$ and the prior probability of $S$ are empirically measurable, but assume here that they are as follows:

                 wet    dry
    rain         0.8    0.2
    no rain      0.1    0.9

Because we are in Vancouver, $P(rain) = 0.9$. Utilities might be as follows:

                            rain    no rain
    take umbrella              0        -10
    don't take umbrella     -100          0

The problem is to decide whether or not to bring the umbrella given the observations. There are four possible policies:

$$
\begin{aligned}
\delta_1(wet) &= take, & \delta_1(dry) &= \neg take \\
\delta_2(wet) &= \neg take, & \delta_2(dry) &= take \\
\delta_3(wet) &= take, & \delta_3(dry) &= take \\
\delta_4(wet) &= \neg take, & \delta_4(dry) &= \neg take
\end{aligned} \tag{4.1}
$$

$$
\begin{aligned}
E_{\delta_1} ={} & \mu(\delta_1(wet), rain)\, P(wet \mid rain)\, P(rain) \; + \\
& \mu(\delta_1(dry), rain)\, P(dry \mid rain)\, P(rain) \; + \\
& \mu(\delta_1(wet), \neg rain)\, P(wet \mid \neg rain)\, P(\neg rain) \; + \\
& \mu(\delta_1(dry), \neg rain)\, P(dry \mid \neg rain)\, P(\neg rain) \\
={} & -18.1
\end{aligned} \tag{4.2}
$$

$$
\begin{aligned}
E_{\delta_2} ={} & \mu(\delta_2(wet), rain)\, P(wet \mid rain)\, P(rain) \; + \\
& \mu(\delta_2(dry), rain)\, P(dry \mid rain)\, P(rain) \; + \\
& \mu(\delta_2(wet), \neg rain)\, P(wet \mid \neg rain)\, P(\neg rain) \; + \\
& \mu(\delta_2(dry), \neg rain)\, P(dry \mid \neg rain)\, P(\neg rain) \\
={} & -72.9
\end{aligned} \tag{4.3}
$$

Similarly, $E_{\delta_3} = -1.0$ and $E_{\delta_4} = -90$. We would choose policy $\delta_1$ over policy $\delta_2$ because $E_{\delta_1} > E_{\delta_2}$, which corresponds to our intuitions.

Interestingly, the decision maker prefers, under the given utilities and probability distributions, the policy $\delta_3$ of always taking the umbrella regardless of the observation; no observation can improve his outcome. This is because these utilities reflect a strong aversion to getting wet, and only a small nuisance factor for being unnecessarily encumbered with an umbrella on a sunny day. A different utility function would, of course, yield different decisions. The lesson here is that it is often difficult to operationalize our intuitions with meaningful utility values.
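The umbrella example is small enough to check by enumeration. The following is a minimal sketch, assuming only the probabilities and utilities given in the example; the dictionary layout and the function names are illustrative choices and are not taken from the thesis.

```python
from itertools import product

STATES = ("rain", "no_rain")
OBSERVATIONS = ("wet", "dry")
ACTIONS = ("take", "leave")

# Prior P(s) and conditional P(o | s), as given in the example.
P_STATE = {"rain": 0.9, "no_rain": 0.1}
P_OBS_GIVEN_STATE = {
    ("wet", "rain"): 0.8, ("dry", "rain"): 0.2,
    ("wet", "no_rain"): 0.1, ("dry", "no_rain"): 0.9,
}
# Utility mu(d, s), as given in the example.
UTILITY = {
    ("take", "rain"): 0, ("take", "no_rain"): -10,
    ("leave", "rain"): -100, ("leave", "no_rain"): 0,
}


def expected_utility(policy):
    """Sum of mu(policy(o), s) * P(o | s) * P(s) over all states and observations."""
    return sum(
        UTILITY[(policy[o], s)] * P_OBS_GIVEN_STATE[(o, s)] * P_STATE[s]
        for s in STATES
        for o in OBSERVATIONS
    )


def all_policies():
    """Every mapping from observations to actions (four policies here)."""
    for choice in product(ACTIONS, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, choice))


def best_policy():
    """The policy that maximizes expected utility."""
    return max(all_policies(), key=expected_utility)


if __name__ == "__main__":
    for policy in all_policies():
        print(policy, round(expected_utility(policy), 2))
    print("best:", best_policy())
```

Enumerating policies is exponential in the number of observations, which is one way to see the computational cost of policy selection in larger problems.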
4.2.3 Decision Analysis

The preceding formalization has considerable representational power, but at considerable computational cost for real-world problems. It can be used to select an optimal sequence of actions (a policy) from many possible sequences (policies). It takes the position that one cannot anticipate future observations, and must therefore decide what to do by averaging over possible future observations.

Such power is not always required. If the observables or some subset thereof were available to the decision maker before a policy needed to be formed, then using the available information could reduce computational requirements. We can write an expected utility expression that refers not to a policy, but to an action $\delta(o)$ in the presence of observation $o$:

$$E_{\delta(o)} = \sum_{s \in S} \mu(\delta(o), s)\, P(o \mid s)\, P(s)$$

Elements of decision theory are used in this dissertation for the sensitivity analysis described in Section 5.5.

4.3 Speech Acts

Speech act theory [182] [14] distinguishes different categories of communicative acts. Searle [181] divided speech acts into five general classes, from which different hierarchies have been developed (see, for instance, Bach and Harnish [15]). Searle identified:

1. Representatives: acts that make a statement about the world, and can be judged to have a truth value (e.g., inform, boast, deny)
2. Directives: acts that involve influencing another agent's intentions or behavior (e.g., request, beg, suggest, command)
3. Commissives: acts that commit the speaker to some intention or behavior (e.g., promise)
4. Expressives: acts that express the speaker's attitude toward something (e.g., apologize)
5. Declaratives: acts that explicitly involve language as part of their execution (e.g., quit, fire, marry)

The authorial intentions referred to in this thesis are generally of the first type, but it is not hard to see how a system like the one described in this thesis could be used to encode communicative goals and perform communicative acts from other categories in Searle's hierarchy.

Interesting work has been done to extend speech act theory to other modalities (e.g., Novitz [152] gives convincing arguments for the ability of pictures to accomplish the equivalent of illocutionary acts) and multimodal presentation systems have made use of this work [5] [10] [11].

The notion of communicative intent espoused in this thesis is similar to any one of many alternative formulations of the speech act. In particular, the illocutionary act corresponds quite well with authorial intent: the understanding that the author is trying to convey. An utterance is said to be successful or felicitous when it results in the appropriate, intended perlocutionary act, or effect. When the hearer of an utterance is convinced where the speaker's intention was to convince, the utterance is felicitous.

Humans make many assumptions about their interlocutors while communicating with each other, and this assumption-making strategy has mapped well onto some formalisms for default reasoning [56]. When these assumptions are incorrect, they can lead to perlocutionary failure and an infelicitous utterance.

Much of the speech act literature refers to attempts at defining a meaningful taxonomy of these acts into requests, statements, indirect, direct, and so on. More recent work in automatic presentation [5] considers the generation of documents as a sequence of speech acts to achieve a more complex communicative goal. Some of these recent studies have also widened the scope of the inquiry to include non-linguistic "utterances" like graphics.
Operationalization of Rhetorical Structure Theory by Moore and Paris [138] is another effort to build taxonomies that might prove useful in automatic generation systems [139], and Andre [6] has extended and adapted RST to suit the planning of multimodal presentations.

The approach described in this dissertation is insensitive to the details of a particular theory of speech acts, but is informed by these on-going efforts to structure communicative acts from underlying speech act components; see Section 5.3.

Part III

Contribution

Scenario: Back to School

Steve and Peter are both considering going to grad school, and both of them are using a future, enhanced version of Valhalla, the application which is described in this thesis, to help them make their decision. As they are distinct individuals (and they are very different individuals), the system supports them in different ways. The intent of this scenario is not to imply that users can be characterized to any degree by their .newsrc files, or more generally, by their news reading habits, but that even such data is potentially useful, and potentially abusable, information.

Steve goes back to school

"Now that my co-op program is finished, let's see what grad school has to offer!" Steve relishes the idea of pursuing some of the questions his recent experiences have raised, and wants a new intellectual challenge. He glances at his watch. "Just enough time before that lecture on database theory. Great!" He logs into a machine on the University of British Columbia's computer network, starts up his favorite web browser, and asks to receive a presentation from Valhalla, the automated on-line departmental hyperbrochure.
Because Steve is signed on to a departmental machine with an account that identifies him as a visitor of Kellogg Booth, a professor in the department, Valhalla reasons that Steve is a visiting researcher in a field related to computer graphics or human-computer interaction, since these are the principal interests of his hosting professor. Steve watches the departmental introduction by its head, and pays attention to several clips on human-computer interaction before skipping over the next few clips on computer graphics research within the department. Valhalla re-designs the presentation to emphasize human-computer interaction research, but Steve grows restive and clicks on a button that reveals elements of the user model upon which Valhalla is predicating its designs. "Ah," breathes Steve, "It thinks I'm a visitor."

Steve is driven to look at the display of the user model because he can't understand why Valhalla is showing him so much material on graphics and HCI, when in fact his interests lie in applied physics and numerical analysis. He sees from the model display that Valhalla thinks he is a visiting researcher, and simply clicks on a radio button to inform the system that he is an undergraduate student. The system recalculates the best model and designs a new presentation that includes an overview of sporting and social events on campus.

Peter goes back to school

"I'm tired of my job", thought Peter; "I hate my boss, my life is going nowhere. Hmmm. I may as well go back to grad school."

Since it's the mid-nineties, the most effective way to find up-to-date information on anything is to surf the net, and Peter fires up his favorite browser. Peter's browser incorporates the latest version of a user-modelling add-on developed at the University of British Columbia. It knows quite a bit about Peter and is able to tailor interaction to suit him.
A variety of university names appear on the monitor, but of the thousands of educational institutions offering post-graduate studies, only those appear which offer courses in computer science, and which are also close to either skiing or surfing.

"Let's see now... Grad school... Montreal: too cold in winter. MIT, nah. Stanford, too many earthquakes. UBC. Yeah, let's take a look at UBC..." He wishes to himself that he had a coffee right now, but can't quite muster the energy to go to the kitchen and make one. He has just enough time to stroke the stubble of his week's growth, before the home page for UBC's Department of Computer Science fills the display. Peter clicks on a hot-link promising a departmental overview.

This future version of Valhalla tries to negotiate an exchange of user-modelling information with Peter's net browser program, which refuses because Peter has instructed all of his software agents never to divulge anything. Valhalla then asks for a copy of Peter's .newsrc file. "No harm in that, I guess," mumbles Peter, and releases his .newsrc with a click.

Valhalla correlates the new information with a vast database of .newsrc files and derived user models in order to arrive at some reasonable hypotheses about Peter and his interests. Valhalla counts the number of active newsgroups and notices that Peter is up to date on at least fifty groups including alt.fan.monty-python, alt.rec.humor, alt.tv.simpsons, alt.jokes, and talk.politics. Valhalla concludes that Peter has time to spare (Peter's version of Valhalla has no sense of humor and does not know how to avail itself of this opportunity for cynicism), and revises its default assumptions about presentation time limits. Valhalla infers from the domain suffix of Peter's network address that he is not currently local to the Vancouver area, and also finds that interest in leisure activities correlates well with the active newsgroups in his .newsrc file.
There is no perceivable delay before Peter is shown a breath-taking video sequence of professional skiers zooming through fantastic vistas, maps of the ski regions in the Vancouver area, and an introduction to some of its interesting hiking trails. Because Peter lingers over sequences about the bars on campus, Valhalla tries to test some alternative hypotheses that might explain his action at the interface. An overview of the department's beer brewing club prompts Peter to fast-forward over the material, but he rewinds and plays twice in its entirety a walk-through of night-life possibilities. Valhalla infers that Peter wants to party.

"All right! Let's party!" says Peter out loud, actually relishing the prospect of graduate school.

The presentation ends with a minute or two about the research facilities of the department, during which Peter begins to doze off. "Yeah, UBC. That's the one for me. Gosh. I hope my marks are good enough..."

Chapter 5

User Models for Intent-based Authoring

5.1 Overview

Following the brief motivation in Section 5.2, where the argument for Intent-based Authoring is recapped as the solution to problems with traditional authoring techniques, Section 5.3 introduces theoretical elements of the solution and their application.

The approach, using probabilistic recognition and cost-based design, is advanced in Section 5.4; this approach is an extension of the formalism described in Section 4.1.1.

Users need to understand the behavior of the system on their own terms in accordance with what is called in this thesis the scrutability¹ desideratum, which the system supports with a sensitivity metric. Scrutability and the sensitivity metric are both described in Section 5.5.

Section 5.7 considers some alternatives to the approach advanced herein. The use of decision theoretic techniques is considered in Section 5.7.1 for the domain of multimedia presentation design, and it is concluded that these "compromise" designs are inappropriate for interactive presentation environments.

Section 5.6 is a detailed example of the operation of Valhalla, and Section 5.8 summarizes this Chapter.

¹ scru-ta-ble (adj) [LL scrutabilis searchable, fr. L scrutari to search, investigate, examine; more at SCRUTINY] (1600): capable of being deciphered: COMPREHENSIBLE (Webster's 7th Dictionary, on-line copy).

5.2 Motivation

Except for some of the work by Feiner [183], and the WIP system at DFKI [195] (see Section 3.5), the presence of the user/reader has little effect on the form and usually no effect on the content of the presentation produced by the system. This early commitment to form and content is a severe limitation; there is no reason in general to assume that a single document can serve the purposes of multiple readers, and certainly any single document will be sub-optimal for some reader [104].

Authors have grown accustomed to having complete, dictatorial control over the form and content of their document product, and equally accustomed to the complete loss of control ensuing from the production process. Until now, we have not questioned this traditional model of authoring, in which authors as knowledge workers labor over their intellectual child until it is ready, finally to launch it into the world via a printing press or a CDROM burner; then, like hopeful parents, they wait and watch to see if their book or multimedia product has the desired effect.

This traditional model is so entrenched because, until now, there has been no alternative. Technological advances thus far have merely amplified the strengths and the weaknesses of existing approaches to authoring, and it is only recently that processors have become powerful enough for us to consider applying intelligence, rather than mere horsepower, via computation.
The Intent-based authoring paradigm advanced in this thesis permits authors to send their work out into the world not as a rigid block of content, sealed forever inside the cover of a book or the shrink-wrap of a CDROM release, but as a body of knowledge framed by the author's point of view. Paul Saffo of the Institute of the Future calls this intellectual commodity context, and recognizes that its value surpasses that of mere content.²

Intent-based authoring makes explicit the point of view of the author, so that client-side computational mechanisms can be brought to bear at run-time to design a presentation of the material that conforms to the author's point of view, or intent, while meeting the needs of the individual viewer.

5.3 Components of the Theory

Before content can be separated from intent, and the author's compile-time specification task completely decoupled from the user's run-time viewing task, some critical knowledge bases must be developed. Some of these knowledge bases will be extremely labor intensive to create, and the first intent-based presentation has taken years to prepare. The second one will be able to build on the knowledge of the first, the third on the accumulated set of knowledge, and so on. At some point, preparing the n-th intent-based presentation will require less work from an author than it would to have prepared a traditional document. Similar to the considerations which motivate code re-use in software engineering, this economy of effort will be an important factor in the success of the paradigm.

The two key knowledge-based ingredients in the intent-based authoring theory are 1) representations of the author's compile-time communicative goals, or intent, and 2) the user's run-time information-seeking needs and goals. These are the components which mediate the new, abstract, extended interaction between author and reader.
The first is to be supplied (or selected from an existing set) by the author, the second is to be determined automatically at run time by the system and used in turn to determine the content of the presentation. As will be shown throughout this chapter, the author intent and the user model both serve as constraints on the application of presentation schemata, which are rules that encode notions of stylistic coherence, and reliable organizational principles for effective communication. A simple example of a presentation schema is that a presentation should have a beginning, a middle and an end, and that the beginning should introduce, and the ending summarize, the material to be found in the middle.

² From his keynote address at the W.R.I.T.E. conference in Vancouver, Canada, 16 June 1995.

In addition to representations of authorial intent, user model, and the presentation schemata, other knowledge bases are also consulted by the system at run-time to prepare the presentation. These knowledge bases may be contributed by the author, the viewer, an annotator, or a knowledge engineer; in the near future, intelligent agents may scour the Internet for additional knowledge that could be used in unanticipated ways.

Each of these knowledge-based components is considered now in some more detail.

Authorial Intent

In declaring explicitly his intention, an author can license a presentation system to prepare a presentation that will meet unanticipated run-time contingencies. Even in the absence of the author, a presentation can be made automatically, in accordance with the author's intention, taken by the system to be a specification of the presentation.

An intent is analogous to a speech act (see Section 4.3 for some background). It is a complex communicative goal, arbitrary up to the limits of expressiveness of the language in which it is articulated. The language used here is described in Section 4.1.1 and Section 5.4.
A simple example of an intent is "Show the user a presentation about the Department of Computer Science, but don't bore the user with irrelevant material and try to accommodate the presentation to the amount of time the user has available. Try to impress the user with the department." This intent is broad, and quite general. It might be expressed in the underlying representation language by writing a rule called show and telling the system to apply that rule at run-time; assuming that the knowledge base already contains one or more ways to satisfy the rule show, different presentation schemata would compete for dominance until the best presentation is derived for the most likely user model. These processes are described at length in this chapter.

User Models

Models [115] [196] of the readers or viewers of presentations are also needed to overcome traditional limitations and move into intent-based authoring. A reader model can be considered to be a representation of the reader's attributes relating to his or her information-seeking needs and objectives in consulting the system. A reader model consists of a set of hypotheses about that reader, acquired at run-time by making observations of the user's interaction at the interface.³ The application of rules from the knowledge bases to derive the eventual presentation must be consistent in a logical sense with the elements of the model.

The representational range of the models may vary between domains; different domains will require the representation of different kinds of information. A knowledge engineer determines a priori this representation range, by specifying a number of dimensions of analysis along which user attributes can be assigned. The values that these attributes can take are predefined, and are called assumables, or potential hypotheses, and they collectively comprise an ontology for possible user models.
As developed in later sections of this chapter, a user model in our theory is a set of assumptions drawn from the set of assumables. The assumables establish the representational range of the possible models, and so should be crafted with their eventual usage in mind. For instance, in the system that has been implemented, the domain of discourse is the Department of Computer Science, and potential users fall into the pre-determined classes of Faculty, Student, or Staff, along the dimension of User Type; they are also evaluated on whether they are visiting or local, and whether they are male or female. These and other sets of assumables affect the design of the eventual presentation as described in this chapter.

³ Once acquired, individual models may also be stored and retrieved where appropriate; although these issues are beyond the scope of this thesis, see Section 7.2 and Section 7.2.2 for some discussion about future work, and privacy issues, respectively.

A user modelling system may attribute to a user assumptions at different levels of abstraction; while watching a video of a dinner being prepared by a chef, for instance, if the viewer is also ex hypothesi a chef, the system might assume that the user believes that chicken marinara is being prepared. If the viewer is a typical North American fast food junkie, the system may attribute the belief that the dish involves chicken and some sort of sauce.

How these user models are acquired by the system is discussed in this section.

Acquisition: Models in our approach are acquired both explicitly and implicitly through an abductive reasoning framework described later in detail. The system makes observations of the user's interaction and tries to explain these observations by making hypothetical attributions of user status on one or more relevant dimensions of evaluation. Explicit acquisition takes place when observing the interaction of the user with a description of the user model.
Implicit acquisition takes place when observing the interaction of the user with the presentation. The difference lies only in that during explicit acquisition, the user is made aware of the model and is actually called upon to refer to and to manipulate it, while during implicit acquisition, the user need not be aware of the model at all, nor even of the fact that modelling is taking place. Explicit acquisition has the advantage of reliability, but it is intrusive, and can distract the user from the task; implicit acquisition has the advantage of unobtrusiveness, but suffers from potential inaccuracy. A combination of these techniques is used in the system, with a view to having the best of both. See Chapter 2 for a discussion of the difference between implicit and explicit acquisition of user models.

Other Knowledge

A rule base of facts and assumables (as described in Section 4.1.1) is supplied by a knowledge engineer.

Part of the database is world knowledge that the system uses in domain-dependent ("Faculty members don't take courses, typically," etc.) and independent ("There are 24 hours in a day," "Men are mortal," etc.) ways, and there can be knowledge about the characteristics of the media involved ("Don't play the audio track when video is played at less than half-normal speed," "Use still-frame techniques when showing a video clip of less than one second duration," etc.). Categorical, contingent, and hypothetical statements are expressible in the language.

Presentation plans are the bulk of the knowledge encoded. These are variously elaborated rules for delivering information; a convince plan might present examples as evidence in support of a conclusion; stylistic or cultural factors might govern whether the evidence precedes the conclusion (prefix) or comes after it (postfix). A schema to describe something might be implemented as a rule which says: "Describe a Thing by Describing its Parts."
An intent-based author might invoke a schema that has been defined to not offend the viewer; such a schema might consult a database of cultural sensibilities, and tailor the presentation to suit the hypothesized cultural vagaries of the viewer. (In some cultures, for instance, the viewer might be offended by the tone of voice, the style of address, the dress or even the gender of a speaker; the system can choose the appropriate design element at run-time to suit these constraints. Such just-in-time choices were not possible under the traditional model of authoring.)

Presentation schemata are instantiated finally with actual content: text, video clips, audio, etc., selected by the application of rules during a proof process described later. In order to select these content elements, the system must be able to reason about the content. For this reason, another important part of the database is devoted to meta-descriptions of the content from which a presentation is to be made. In the case of the system implemented for this thesis, the content is a collection of video clips, a selection of which is assembled by the system at run-time into a presentation tailored to meet the needs of the individual user as represented in the hypothesized user model. To be able to reason about the contents of the video, the system requires a description of its contents, which we call annotations. The annotations link keyword descriptions of the contents of specific clips with the associated time codes on the media (video tape, laser disk, digital video streams, etc.). For example, an annotation might carry the information: An interview with Maria Klawe containing overview material is to be found between time codes 1:12:14 and 1:15:01 on the stream called "Department".

Figure 5.1 is a tree diagram of a presentation that has been designed to convince someone to join the faculty in the Department of Computer Science at UBC.
It consists of the application of both the Convince and the Describe schemas just mentioned. The presentation has a description, followed by a conclusion; the description of the department is instantiated with an intro by the department head on video, followed by descriptions of parts of the department (laboratories), which in turn are instantiated by video clips of interviews with representatives from the two labs that were judged by the system to be of most interest to the viewer, and scenes from the labs. The conclusion consists of descriptions judged by the system to be of general appeal; rules have been encoded to assert that it is generally believed that the Vancouver area is very scenic, and that this is a quality that can be used to convince people, so scenic shots of the area are presented. This example is taken from the working version of the system. Later examples in this thesis will show that these scenic overviews are provided only to viewers assumed by the system to be not from the Vancouver area.

Finally, individual pieces of the video record must be chosen to fill the slots in the now elaborated plan schema; the leaf nodes in the tree representing the plan schema are to be expanded.
In Figure 5.1, the logical form Describe(lci) is instantiated with the video clips labelled interview(alan-mackworth) and tour-lci, which are to be found on the video tape between absolute time codes 00:43:41-00:44:23 and 00:12:15-00:15:54, respectively.

    Convince(join-ubc-faculty)
        Describe(department)
            edit-list([(00:01:26, 00:02:27, head-klawe-intro)])
            Describe(laboratories)
                Describe(lci)
                    edit-list([(00:43:41, 00:44:23, interview(alan-mackworth)),
                               (00:12:15, 00:15:54, tour-lci)])
                Describe(imager-lab)
                    edit-list([(00:45:08, 00:46:16, interview(kelly-booth)),
                               (00:03:06, 00:03:13, rendered-dragon-speaks)])
        Conclude(general-appeal)
            Describe(vancouver)
                edit-list([(00:20:15, 00:20:20, cypress-mountain)])
            Describe(campus)
                edit-list([(00:25:07, 00:26:59, aerial-view-campus),
                           (00:27:43, 00:27:44, faculty-club)])

Figure 5.1: A partially elaborated presentation

Some part of the database might one day consist of a collection of intentions from which intent-based authors can select, if they don't want to specify their own intentions (or don't know how, because they aren't programmers...). Intent-based authors could compose the intentions from the database into higher level intentions. For instance, if there already exists a representation of an intention to convince as well as a representation of an intention to amuse, an intent-based author might conjoin these to amuse and convince the reader. Only the hint of this future facility is currently available in the system, which still requires that intent-based authors have a strong ability to program their intentions in the underlying representation language.⁴

⁴ A visual editor could also be added, that allows authors to compose intentions through the manipulation of a graphical interface.
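The way the Describe schema bottoms out in an edit list can be illustrated with a short sketch. This is not the Theorist rule base used by Valhalla: it is a Python approximation in which the `subject` field on each annotation and the `PARTS` table are hypothetical stand-ins for the annotation and subsumption knowledge, and only the time codes and clip labels come from Figure 5.1.

```python
# Annotations: (start, end, label, subject). The subject field is an assumption
# made for this sketch; it says which Thing a clip can serve as a description of.
ANNOTATIONS = [
    ("00:43:41", "00:44:23", "interview(alan-mackworth)", "lci"),
    ("00:12:15", "00:15:54", "tour-lci", "lci"),
    ("00:45:08", "00:46:16", "interview(kelly-booth)", "imager-lab"),
    ("00:03:06", "00:03:13", "rendered-dragon-speaks", "imager-lab"),
]

# Part-of knowledge: a BigThing and the Things it subsumes (hypothetical table).
PARTS = {"laboratories": ["lci", "imager-lab"]}


def describe(thing):
    """Describe a Thing by its own clips, or else by describing its parts."""
    clips = [(start, end, label)
             for (start, end, label, subject) in ANNOTATIONS
             if subject == thing]
    if clips:
        return clips
    edit_list = []
    for part in PARTS.get(thing, []):
        edit_list.extend(describe(part))
    return edit_list


def to_seconds(time_code):
    """Convert an HH:MM:SS time code to seconds."""
    hours, minutes, seconds = (int(field) for field in time_code.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def duration(edit_list):
    """Total running time of an edit list, in seconds."""
    return sum(to_seconds(end) - to_seconds(start) for (start, end, _) in edit_list)


if __name__ == "__main__":
    labs = describe("laboratories")
    for clip in labs:
        print(clip)
    print("running time (s):", duration(labs))
```

Tracking the running time alongside the edit list matters because an intent may constrain the presentation to the time the viewer has available; the sketch computes it after the fact rather than threading it through the recursion.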
[Figure 5.2: Roles, Inputs and Outputs. The diagram relates roles and their inputs, through the process, to two outputs: the presentation (video edit decision list) and the display of the user model.]

5.3.1 Summary: Inputs, Outputs and Roles

The outputs of the system are (1) an edit decision list of video clips which is played under user control on a video display device, and (2) a presentation to the user of parts of the user model derived by the system. Refer to Figure 5.2: Author(s) supply or select intentional descriptions of their communicative goals. Knowledge engineers provide general and specific knowledge, as well as the assumables for model recognition and presentation design. The system calculates the most likely user model from observations of the user's activity and uses that to design the "best" presentation. Both the presentation and components of the user model are displayed to the user.

Several agents are involved in the Intent-based video authoring model:

1) Annotator: the person or artifact who indexes relevant events and intervals in the raw footage
2) Author: the person or artifact who specifies the intent (and optionally elements of the content and form) of the eventual presentation
3) Viewer: the eventual consumer of the presentation
4) Knowledge Engineer: the person or artifact who prepares the knowledge bases, particularly the domain-dependent knowledge base

Note that more than one of these roles can be played by a single agent.

5.4 An Abductive Framework for Recognition and Design

The language of Theorist introduced in Section 4.1.1 is now extended to include probabilities and costs, and a notion of explanation is presented that reflects a new combination of design and recognition within a single formal framework.

The set $\mathcal{H}$ of assumables is partitioned into the set $\mathcal{R}$ of those available for recognition and the set $\mathcal{D}$ available for design. Each assumable $r$ in $\mathcal{R}$ has associated with it a prior probability $0 < P(r) < 1$.
R is partitioned into dis-joint and covering sets J , which correspond to independent random variables (as in Poole [161]).5 Every assumable d in V is assigned a positive cost p,(d). Representative recognition assumables found in the database include the ones shown in Figure 5.3, where Pn is the prior probability associated with actions of the corresponding classes, £ Pi — 1.0, and the syntax in use is: dis]o'mt([assumption\ : Pi, assumption : Pi, • • •]) Some examples of facts and rules that appear in the database are shown in F ig -ure 5.4. 5 The description here of the recognition process conforms to the discussion of probabilistic Horn abduction provided by Poole [161]. 107 % For recognition of the user model: % faculty/student/staff user disjoint([userType(faculty) :0 . 35, user Type (student) -.0.6, userType(staff) :0 . 05]) . % local/prospective user disjoint([geo(local) :0 . 65, geo(prospective) :0.35]) . Figure 5.3: Representative recognition assumables 5.4.1 Recognizing User Models Observations are made of the actions of the user at an interface. The interface agent communicates this information to the reasoner, which tries to explain 6 the obser-vations, making recognition assumptions along the way. User actions can be of two major types: the user can interact with a control device to manipulate the pre-sentation elements directly (a virtual V C R control panel, for instance, by which the presentation can be replayed, paused, fast-forwarded, etc.), or the user can interact with a representation of the system's model of the user. When the system makes assumptions to explain its observations of the user's behavior at the control panel, we call the recognition process " implicit" acquisition; when the system makes as-sumptions to explain its observations of the user's behavior at the user model win -dow, we call the recognition process "explicit ." 
Typically, the user model window does not contain the entire user model (as there may be very many assumptions in the user model), but only a salient subset of it, determined by sensitivity analysis (see Section 5.5). We call this subset the salient model.

Footnote 6: Explain is used here in its technical sense, as described in Section 4.1.1.
Footnote 7 (on "explicit" acquisition, defined in the preceding sentence): The current version of the prototype implements only the explicit approach.

    % Categorical world knowledge:
    gender(george_phillips, male).    % George is a male
    gender(maria_klawe, female).      % Maria is a female
    partof(cpu, computer).            % a cpu is part of a computer

    % Annotations:
    % Maria speaks from 1:26 to 2:26 on the video record:
    interval(00:01:26, 00:02:26, speaker(maria_klawe), []).
    % George is interviewed between 30:57 and 31:19
    interval(00:30:57, 00:31:19, interview(george_phillips), []).
    % There is a video-only (no-relevant audio track) aerial view
    % UBC between 25:07 and 26:59
    interval(00:25:07, 00:26:59, video_only(ubc_aerial_view), []).

    % schemata
    % A video clip (or clips) can be a description of a Thing
    describe(Thing, Description) <=
        editList([], Description, description(Thing), 0, _L).
    % a BigThing can be described by describing its parts
    describe(BigThing, Description) <=
        bagof(Thing, subsumption(Thing, BigThing), Things),
        desc(Things, [], Description, 0, _Length).
    desc([], Description, Description, Length, Length) <= true.
    desc([H|T], InD, Description, InL, Length) <=
        editList(InD, OutD, description(H), InL, OutLength),
        desc(T, OutD, Description, OutLength, Length).

Figure 5.4: Example facts and rules

Example: Perceptual Salience

Here is an example that involves explicit acquisition of parts of a user model. When the salient model is displayed to the user, the user's action at the interface may lead to observations which in turn lead the system to calculate a new and different model.
For instance, if the user informs the system that the user is a faculty member rather than a student, as had been assumed, the system may reliably retract its initial assumption and assume the corrected value offered by the user (if the user is now telling the truth).

Perhaps more interestingly, even the user's inaction may result in changes. For instance, if the system displays to the user the assumption that the user is a faculty member, and the user does not critique the assumption, it makes sense for the system to re-evaluate the likelihood of the model that includes the assumption in question, ostensibly to arrive at a higher ranking for it, under the additional assumptions that the display of the user model has been seen and understood by the user. These additional assumptions are reasonable if the window in which the assumption is displayed is not obscured, the text in the window is clearly rendered and is large enough to be easily read, the user is not distracted by other events on the desktop, the user is not distracted by other events in the environment (babies crying, cars colliding, etc.), and so on.

The case where a user critiques assumption A but does not critique assumption B is of particular interest. The likelihood of the model can be increased on the basis of the user's direct action as discussed in the previous paragraph, but should also be increased because of the user's inaction with respect to the display of assumption B, on the grounds that the user was actually attending to the appropriate window (because the user did critique assumption A).

Although the current prototype merely orders the displayed assumptions according to a calculated sensitivity value, various other display strategies can be used to draw the user's attention to one or more of the displayed assumptions, thereby licensing the additional assumption that the user has attended to the window in which the user model is being displayed.
A particular assumption can be highlighted (with color, font, or animation, for instance) to increase the likelihood that the user is attending to the relevant portion of the desktop, and to the relevant part of the window. Such perceptually salient display techniques can increase the reliability of the system's assumptions about the user, but require that the system model such things as the media capabilities of the user's display and interaction devices. The system would also benefit from knowing whether the window in question is partially or totally obscured by other desktop objects, which would require some degree of integration between the system and the operating system or window manager. A deep analysis of such desiderata for future operating systems remains to be conducted; they are mentioned here because they are compatible with the interaction paradigm that is described in this thesis, and can be represented in the reasoning framework. Empirical investigations are needed to decide which display techniques are the most effective.

Another way to look at the issue of perceptual salience is to note that the system is engaged in a dialog with the user, and that the presentation of the user model window is a communicative act by the system. The perlocutionary effect [182] of these acts is knowledge, on the part of the user, of the system's user model. The intuition is that the likelihood of achieving this communicative goal is enhanced by using appropriate presentation techniques to highlight the most important parts of the message, and that the use of such techniques licenses an increased commitment by the system to the beliefs which follow from successful communication.

The user's action at the interface is just another observation available to the reasoning system, and the state of the display is known as a fact, since it is under control of the system.
The system explains these observations by (hypothetically) attributing to the user membership in one of eight interaction classes.

    % Rules for perceptual salience calculations
    % action(R, V) is true when user takes action V
    % with respect to random variable R.
    % radio(R, V) is true when the displayed value of random
    % variable R (via a graphical radio button, e.g.,) is V.
    % val(R, V) is true when the value of random variable R is V.

    % rules when displayed value is correct
    action(R, V) <=        % rule for explicit validation
        radio(R, V), val(R, V), ev(R).
    action(R, A) <=        % rule for explicit lie
        radio(R, V), val(R, V), el(R),
        A \== V.           % this is a lie
    action(R, none) <=     % rule for tacit validation
        radio(R, V), val(R, V), tv(R).
    action(R, other) <=    % rule for implicit validation
        radio(R, V), val(R, V), iv(R).

    % rules when displayed value is not correct
    action(R, V) <=        % rule for explicit correction
        radio(R, D), val(R, V), ec(R),
        D \== V.
    action(R, A) <=        % rule for weird lie
        radio(R, D), val(R, V), wl(R),
        D \== V, V \== A.
    action(R, none) <=     % rule for tacit lie
        radio(R, D), val(R, V), tl(R),
        D \== V.
    action(R, other) <=    % rule for implicit lie
        radio(R, D), val(R, V), il(R),
        D \== V.

Figure 5.5: Rules for perceptual salience

The system calculates the model incrementally, yielding a final model with a cumulative probability influenced by the following rules and assumables; there is one rule for each of eight interaction classes (see Figure 5.5), of the form:

    action(Var, Action) <= display(Var, Display) ∧ value(Var, Value) ∧ class(Var)

where Var is the name of the random variable under consideration, Action is the actual action taken by the user, and Display is the description of the display.
class(Var) is a conjunct that can be satisfied only via assumption, forcing the cumulative probability of the current model to be multiplied by the prior probability of membership in the interaction class. Other conjuncts are included only to ensure that the eight rules are exclusive.

Users can explicitly validate the contents of the display, with respect to a particular random variable; a faculty member can do this, for instance, with respect to the "user-type" random variable, by clicking on the already highlighted "faculty" radio button (see Figure 5.6).

Users can tacitly validate the contents of the display by performing some other action, such as requesting that the video presentation be made without further delay.

Users can implicitly validate the display with respect to one random variable, by performing an action at the interface with respect to some other random variable; a faculty member might implicitly validate the assumption of the system that the user is a faculty member (represented to the user by the highlighted "faculty" radio button), for instance, by clicking on a button that expresses gender or age, or geography.

Users can be explicitly lying, by clicking, for instance, on the "student" button when they are in fact faculty members.

[Figure 5.6: The Valhalla User Model Window. The window tells the user that each item represents an important assumption that the system has made about them, and that correcting the assumptions affects the behaviour of the system and the nature of the presentations. It offers radio buttons for userType(faculty), userType(student), userType(staff), gender(male), gender(female), geo(local) and geo(prospective).]

A tacit lie is when the user takes no action to correct a false assumption by the system. (The user could obviously be missing the cues in the display by accident; the names attached to the eight interaction classes are purely syntactic, and do not necessarily indicate an intent on the part of the user to deceive the system.
The names merely make it easier to remember the different categories.)

An implicit lie is when the user takes some action with respect to some random variable, but ignores an incorrect assumption by the system.

Explicit correction is the normative action taken by a user when he or she discovers an incorrect assumption by the system; thus the faculty member may click on the "faculty" radio button when the system has incorrectly assumed that the user is a student. The normativity of this action is represented by the high prior probability associated with it.

Finally, the weird lie category captures the action with respect to a particular random variable where the system has made an incorrect assumption and the user takes action to change this value, but changes it not to the correct value but to another incorrect value. The faculty member, for instance, upon perceiving that the system has made the incorrect assumption that the user is a student, might then click on the "staff" radio button to indicate to the system that he or she is a staff member rather than a faculty member or a student.

    % assumables for perceptual salience calculations
    disjoint([ev(R):0.15,    % explicit validation
              tv(R):0.4,     % tacit validation
              iv(R):0.4,     % implicit validation
              el(R):0.05]).  % explicit lie
    disjoint([tl(R):0.05,    % tacit lie
              il(R):0.1,     % implicit lie
              ec(R):0.8,     % explicit correction
              wl(R):0.05]).  % weird lie

Figure 5.7: Assumables for perceptual salience calculations

The code fragment in Figure 5.7 shows the assumables which represent the eight possible classes to which user interaction can belong. The syntax in use again here is disjoint([class_1(Var) : P_1, class_2(Var) : P_2, ...]), where $P_n$ is the prior probability associated with actions of the corresponding classes.
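The eight interaction classes can be illustrated with a small Python sketch. This is not the prototype's code: the function names and the "none"/"other" tokens for inaction and for action on another variable are assumptions made for illustration, and only the prior values are copied from Figure 5.7.

```python
# Priors copied from Figure 5.7; all names below are illustrative assumptions.
VALIDATION_PRIORS = {"ev": 0.15, "tv": 0.4, "iv": 0.4, "el": 0.05}  # display is correct
CORRECTION_PRIORS = {"tl": 0.05, "il": 0.1, "ec": 0.8, "wl": 0.05}  # display is wrong

NONE = "none"    # the user took no action at all
OTHER = "other"  # the user acted on some other random variable


def interaction_class(displayed, true_value, action):
    """Return the class that explains `action` for a single random variable."""
    if displayed == true_value:
        if action == NONE:
            return "tv"  # tacit validation
        if action == OTHER:
            return "iv"  # implicit validation
        return "ev" if action == true_value else "el"  # validation or explicit lie
    if action == NONE:
        return "tl"      # tacit lie
    if action == OTHER:
        return "il"      # implicit lie
    return "ec" if action == true_value else "wl"      # correction or weird lie


def class_prior(name):
    """Look up the prior of an interaction class in whichever disjoint set holds it."""
    if name in VALIDATION_PRIORS:
        return VALIDATION_PRIORS[name]
    return CORRECTION_PRIORS[name]


# The display shows "student" and the user clicks "faculty"; each hypothesis
# about the true user type is explained by a different interaction class.
for true_value in ("faculty", "student", "staff"):
    cls = interaction_class("student", true_value, "faculty")
    print(true_value, cls, class_prior(cls))
```

Checking "none" and "other" before comparing values is a design choice of this sketch; it makes the eight cases mutually exclusive by construction.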
Some of the more cumbersome detail and irrelevant syntactic sugar has been omitted from the preceding, but the point should be clear. When the user changes a value displayed by the system, the assumption by the system that the user's action is an explicit correction, for instance, is preferred over the assumption that the user is explicitly lying to the system; the calculation of the final user model is influenced (as usual) by these incremental assumptions.

Perceptual salience is a potentially strong source of knowledge upon which to predicate further reasoning, and will be the subject of much future work. Any system that makes presentations to users can benefit from modelling these interactions at some level of abstraction, even at a very basic level.

The probabilities, of course, should be determined through some empirical approach; the values used here are plausible, and serve to order models in which the user has manipulated the user model window in some way. Actual values in a future system would be determined by observing the behavior of users during testing sessions designed for that purpose.

As an example of perceptual salience in the system, let the facts $F$ consist of the rules shown in courier font in this chapter, and let the set of potential hypotheses $\mathcal{H}$ (the assumables) consist of the disjoint sets shown throughout this thesis.

The initial, "empty" set of observations stems from the intent that underlies the presentation; in this case, the "show" predicate (see Figure 5.8) encapsulates the intent, and requires that certain assumptions be made. This is equivalent to saying that the initial set of observations $Obs_I$ is $\{\exists UT\; userType(UT) \wedge \exists G\; gender(G) \wedge \exists W\; geo(W)\}$.
Assume now that (1) the system has arrived at the model consisting of the assumptions that the user is a student, male, and local to the department (i.e., that UT = student, G = male, W = local); (2) the system has represented this model to the user via the user model window; and (3) the user has explicitly corrected the model by clicking on the Faculty button. The new observations $Obs_N$ to be explained are:

    {action(userType, faculty), action(geo, other), action(gender, other)}

Referring to the rules, the observations $Obs$ now to be explained are $Obs = Obs_I \cup Obs_N$.

The probability of the model before the user takes action is $P_1 = P_{student} \times P_{male} \times P_{local}$. After the user takes action, the following possibilities exist with respect to the user type variable: the user is a student, and is explicitly lying about being a faculty member; the user is a faculty member, and has explicitly corrected the system's error; the user is a staff member and is lying weirdly about being a faculty member.

With respect to the locality variable, the following conditions exist: the user is local, and has implicitly validated the system's assumption; the user is remote, and has implicitly lied by ignoring the system's false assumption.

With respect to the gender variable, the following conditions exist: the user is male, and has implicitly validated the system's assumption; the user is female, and has implicitly lied by ignoring the system's false assumption.

In the absence of other information or observations, the most likely model is the one in which the user is a faculty member, and has explicitly corrected the system's misattribution (and implicitly validated the system's locality and gender assumptions). The probability of this model is $P_{faculty} \times P_{local} \times P_{male} \times 0.8 \times 0.4 \times 0.4$.

Here are all the possible explanations (user models) of the observations $Obs$:^8

    Recognition assumptions with Probability 0.024752 are:
        iv(gender) gender(male) iv(geo) geo(local) ec(userType) userType(faculty)
    Recognition assumptions with Probability 0.003332 are:
        iv(gender) gender(male) il(geo) geo(prospective) ec(userType) userType(faculty)
    Recognition assumptions with Probability 0.001092 are:
        il(gender) gender(female) iv(geo) geo(local) ec(userType) userType(faculty)
    Recognition assumptions with Probability 0.000572 are:
        iv(gender) gender(male) iv(geo) geo(local) el(userType) userType(student)
    Recognition assumptions with Probability 0.000147 are:
        il(gender) gender(female) il(geo) geo(prospective) ec(userType) userType(faculty)
    Recognition assumptions with Probability 0.000117 are:
        il(gender) gender(female) iv(geo) geo(local) el(userType) userType(student)
    Recognition assumptions with Probability 0.000077 are:
        iv(gender) gender(male) il(geo) geo(prospective) el(userType) userType(student)
    Recognition assumptions with Probability 0.000022 are:
        iv(gender) gender(male) iv(geo) geo(local) wl(userType) userType(staff)
    Recognition assumptions with Probability 0.000016 are:
        il(gender) gender(female) il(geo) geo(prospective) el(userType) userType(student)
    Recognition assumptions with Probability 0.000016 are:
        il(gender) gender(female) iv(geo) geo(local) wl(userType) userType(staff)
    Recognition assumptions with Probability 0.000003 are:
        iv(gender) gender(male) il(geo) geo(prospective) wl(userType) userType(staff)
    Recognition assumptions with Probability 0.000002 are:
        il(gender) gender(female) il(geo) geo(prospective) wl(userType) userType(staff)

Footnote 8: The width of the probability band was set to 0.000005 (see Section 5.4.2).

If other evidence suggests that the user is, for instance, a student (e.g., the user's id is known already to the system to belong to a student), the system may end up calculating a new, "best" model which includes a weird lie or tacit lie, etc.
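As a check on the arithmetic, the following Python sketch recomputes the four explanations that assume userType(faculty) as products of the priors in Figures 5.3 and 5.7, and then sums the twelve reported probabilities. The gender prior is not listed in the figures shown; the value 0.85 for male is an assumption, chosen because it reproduces the reported 0.024752. The sketch makes no attempt to reproduce the student and staff explanations, and all names in it are hypothetical.

```python
from itertools import product
from math import prod

USER_TYPE = {"faculty": 0.35, "student": 0.6, "staff": 0.05}  # Figure 5.3
GEO = {"local": 0.65, "prospective": 0.35}                     # Figure 5.3
GENDER = {"male": 0.85, "female": 0.15}   # ASSUMED: not given in the figures
IV, IL, EC = 0.4, 0.1, 0.8                # class priors from Figure 5.7


def faculty_model_probability(gender, geo):
    """Product of all recognition assumptions in one faculty explanation.

    The display showed male and local, so matching values are implicit
    validations (iv) and mismatching values are implicit lies (il).
    """
    gender_class = IV if gender == "male" else IL
    geo_class = IV if geo == "local" else IL
    return prod([gender_class, GENDER[gender],
                 geo_class, GEO[geo],
                 EC, USER_TYPE["faculty"]])


faculty_models = {
    (gender, geo): round(faculty_model_probability(gender, geo), 6)
    for gender, geo in product(GENDER, GEO)
}

# The twelve probabilities exactly as reported in the listing above.
REPORTED = [0.024752, 0.003332, 0.001092, 0.000572, 0.000147, 0.000117,
            0.000077, 0.000022, 0.000016, 0.000016, 0.000003, 0.000002]
p_obs = sum(REPORTED)              # total probability of all explanations
best_share = REPORTED[0] / p_obs   # share of that total held by the best model

for key, value in faculty_models.items():
    print(key, value)
print(round(p_obs, 6), round(best_share, 6))
```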
Note that the system reports the probability of a model $M$ as a non-normalized prior $P(M)$; this is adequate for the system's purposes because the value is used only to order models. In order to obtain a normalized posterior $P(M \mid Obs)$, the system would have to calculate all models which explain the observations, to arrive at the sum $P(Obs)$ as seen in Equation 5.1.

    $P(M \mid Obs) = \frac{P(M \wedge Obs)}{P(Obs)} = \frac{P(M)}{P(Obs)} = \frac{P(M)}{\sum_i P(M_i)}$    (5.1)

This would place an impossible burden on the reasoner, which is currently asked to calculate only the best model, and not all models. Nonetheless, for purposes of illustration, the normalized posteriors for the current example are provided in Table 5.1; the value of $P(Obs)$ is 0.030148. The best explanation is seen from this table to account for over 80% of the probability mass.

    P(M)        P(M|Obs)    -log(P(M|Obs))
    0.024752    0.821016    0.197212
    0.003332    0.110521    2.202546
    0.001092    0.036221    3.318108
    0.000572    0.018973    3.964735

Table 5.1: Priors and normalized posteriors.

User Model: formal definition

Formally, a model is defined as follows:

Definition 4 (Model): A model of the user is an explanation $R$ consisting only of recognition assumptions which explain observations $Obs$ about the user: $R \subseteq \mathcal{R}$, $F \cup R \not\models \bot$, $F \cup R \models Obs$. The probability of a user model is the product of the probabilities of its elements: $P(R) = \prod_{r \in R} P(r)$, where we have assumed independence of recognition partitions [161]. The 'best' model is the one with the highest probability.

5.4.2 Designing Presentations

A single abductive reasoning engine is employed for both recognition of the user model, and for design of the presentation. Design and recognition are interleaved, in the sense that the rule being applied by the reasoner could call at any point for the assumption of either a design or a recognition assumable; a partial model and a partial design are accumulated until either the proof is complete, or it fails.
Various design decisions are made by the system in the course of reasoning. Just as models are defined by their constituent recognition assumptions and formally explain observations about the user, "designs" are defined by their constituent design assumptions, drawn from the set of design assumables, and formally explain the authorial intent.

An example is the use of design assumptions to induce a preference by the system for multiple topics rather than a single presentation topic. This preference can be induced by forcing the system to make an assumption whose cost depends upon the relationship between elements in the presentation. Specifically, the cost of the assumption is greater when the two elements are selected in support of a single topic than when they support multiple topics; the system currently values diversity over emphasis, because of the domain in which it is being used: Valhalla tries to provide interesting overviews of the department rather than in-depth detail. In conjunction with the show intent described earlier and shown in its entirety in Figure 5.8, the following design assumables have the desired effect.

    disjoint([different_topic_cost:0, same_topic_cost:100]).

Another example is how the system arbitrates between showing or not showing scenic shots of Vancouver to users who are local or remote. Here is how a preference for showing scenic views to prospective (remote) department members can be encoded. The following rule licenses the showing of a scenic clip for the "right" reasons (i.e., the user is thought by the system to be prospective):

    scenic_clip(prospective, Pin, Pout, LenIn, LenOut) <=
        editList(Pin, Pout, mountains, LenIn, LenOut),
        Pin \== Pout.   % make sure mountain clips are non-nil

Pin is the presentation designed thus far; Pout contains the new mountain clip.
The editList predicate makes relevance assumptions while trying to instantiate a clip (or sequence of clips) about the requested subject, in this case mountains. LenIn and LenOut are, respectively, the length of the presentation before and after the addition of the scenic clip.

The following rule licenses the omission of the scenic clip for the "right" reasons (i.e., the user is thought by the system not to be prospective):

    scenic_clip(Geo, P, P, L, L) <=   % no mountains here
        Geo \== prospective.

These two rules capture what are in some sense the "right" actions for the system to take; it is also possible for the system to take other design actions, but the "right" ones should be preferred. This is accomplished by forcing the system to make (relatively costly) design assumptions in order to take these alternative branches of the search tree.

The following rule models the cost of showing a scenic clip for the "wrong" reasons. It forces a big-cost assumption, boredbyView, at a cost of 200; this is the "nuisance factor" or cost that the system attributes to local users who are shown scenic views they could get by just looking out their windows:

    scenic_clip(Geo, Pin, Pout, LenIn, LenOut) <=
        % extra cost of mountain clip to non-prospective types
        boredbyView(200),
        Geo \== prospective,
        editList(Pin, Pout, mountains, LenIn, LenOut),
        Pin \== Pout.   % make sure mountain clips are non-nil

Similarly, the next code fragment models the cost of omitting the clip for users who are not local, and who might have benefitted from seeing some nice scenery.

    scenic_clip(prospective, P, P, L, L) <=
        % extra cost of no mountains for prospective types
        boredbyNoView(150).

The mechanics for inducing the appropriate design costs are included in the last code fragment, only for completeness:

    disjoint([boredbyView(X):X, boredbyNoView(X):X]).
It would not be difficult to generate such rules automatically from a table of preferences, such as the one shown in Table 5.2.

                        Scenic    ¬Scenic
    geo(local)          200       0
    geo(prospective)    0         150

Table 5.2: Myopic Tradeoff Table

We call Table 5.2 the Myopic Tradeoff Table. It captures the formulation of preference alluded to in this section; the term myopic is used to emphasize that the system does not have a complete table of utilities, but uses this approach as a "greedy" discriminator between competing design elements. Such myopic strategies often work well in practice [176, p490]. The table can be constructed for disjoint sets of any cardinality, and for any number of design components; here we present only the myopic tradeoff table for the disjoint set $\{(geo(local), p), (geo(prospective), 1 - p)\}$ and the presentation element Scenic. The cost of showing a scenic clip to a local user (who can look out the window at the mountains any time) is 200; the cost of failing to show such a view to a remote user is 150; there is no cost associated with showing the scenic view to a remote user, or with omitting the scenic view for local users.

These costs influence the final proof in the usual sense that lower cost designs are preferred to higher. The overall weight of an assumption, as always, is proportional to its magnitude; if a dimension is very important in the current domain, the cost magnitudes of its assumptions should be chosen to be greater than the costs of assumptions on other dimensions. This suggests that the costs of design assumptions should not be chosen in isolation, but according to some ranking of relative importance. Such a ranking is not likely to be known a priori but will more likely emerge from iterative refinement of the knowledge base, as has been the case with the implementation under review here.
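The myopic tradeoff table lends itself to a simple lookup structure. The Python sketch below is an illustration only: the dictionary layout and function names are assumptions, and only the four cost values come from Table 5.2. It shows the greedy choice for each value of geo and lists the entries that would need a costly design assumption if rules were generated from the table.

```python
# Costs copied from Table 5.2. Keys are (geo value, whether a scenic clip is shown).
# The dictionary layout and all function names are illustrative assumptions.
TRADEOFF = {
    ("local", True): 200,
    ("local", False): 0,
    ("prospective", True): 0,
    ("prospective", False): 150,
}


def scenic_cost(geo, show_scenic):
    """Design cost of showing (or omitting) the scenic clip for a given geo value."""
    return TRADEOFF[(geo, show_scenic)]


def cheapest_choice(geo):
    """Greedy ("myopic") decision: take whichever option costs less for this user."""
    return min((True, False), key=lambda show: scenic_cost(geo, show))


def costly_entries(table):
    """Entries with non-zero cost, i.e. those that would need a cost-bearing
    design assumption if rules were generated from the table."""
    return sorted((geo, show, cost) for (geo, show), cost in table.items() if cost > 0)


for geo in ("local", "prospective"):
    print(geo, "show scenic clip:", cheapest_choice(geo))
print(costly_entries(TRADEOFF))
```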
Search Strategy

The prototype referred to in this thesis employs a Prolog meta-interpreter which implements an iterative deepening search strategy wherein first the probability bound on models and then the cost bound on designs is adjusted to yield solutions in desired probability and cost ranges, or bands. The width of these bands can be adjusted to yield desired precision, trading execution time for precision.

In classical iterative deepening search [172], the tree is searched depth-first one level deeper on each iteration. The approach combines the space utilization advantages of depth-first search with the characteristic of breadth-first search that, if there is a solution, it will be found; in addition, because shorter paths are searched before longer ones, if there are multiple solutions they will be found in ascending order of path length.

The nodes in the search tree of the application are connected by arcs which can be labelled with the probability factor or cost increment incurred in taking that branch. The cumulative cost of a partial proof is measured in terms of the probability of the partial user model $M_p$ and the cost of the partial design $D_p$. The pair $(P(M_p), C(D_p))$ characterizes the partial proof currently being evaluated by the system: $P(M_p)$ is given by the product of all probability terms on the path through the search tree to the current node (i.e., the product of the probabilities of all recognition assumptions made thus far), and $C(D_p)$ is given by the sum of all cost terms on the path through the search tree to the current node (i.e., the sum of the costs of all design assumptions made thus far).

On the first iteration, the system searches for low cost designs and models with high probability; this is accomplished by setting the probability band to be between 1.0 and $(1.0 - p)$, where $p$ is the width of the probability band, and setting the cost band to be between 0 and $c$, where $c$ is the width of the cost band.
Any partial proofs being considered whose cumulative probabilities drop below the probability band are discarded (they fail in the Prolog sense). Similarly, any partial proofs being considered whose cumulative costs exceed the cost band are discarded. Any successful proofs (i.e., those where $1.0 \ge P(M) \ge 1 - p$ and $0 \le C(D) \le c$) are reported in the order in which they are found. When all possible proofs within both bands have been considered, the cost band is advanced and the search mechanism re-engaged so that only proofs whose cost meets the condition $c < C(D) \le 2c$ are reported. This process is continued until all proofs have been considered, whereupon the probability band is advanced and the cost band is re-initialized so that only proofs whose probability and cost meet the condition $(1 - p > P(M) \ge 1 - 2p) \wedge (0 \le C(D) \le c)$ are reported. The control structure is essentially a nested loop, where the probability band is decremented and the cost band is incremented as the outer and the inner loops, respectively, as in the following algorithm:

    repeat
        pmax = 1 ; pmin = pmax - deltaP ;
        repeat
            cmin = 0 ; cmax = deltaC ;
            repeat
                iterative-deepening-search(pmin, pmax, cmin, cmax,
                                           pfail, cfail, fail,
                                           Model, Design, intent(Presentation)) ;
                report(Model, Design, Presentation)
            until cfail
            cmin = cmax ; cmax = cmax + deltaC ;
        until pfail
        pmax = pmin ; pmin = pmin - deltaP ;
        cmin = 0 ; cmax = deltaC ;
    until fail

This control structure determines the order in which the solutions are found, and can be tuned to effect a tradeoff between precision and time; the narrower the band(s), the more reliable the ordering of the solutions. For instance, if there are five proofs with design costs {6, 7, 8, 8, 9}, and the width of the cost band $c$ is 5, all of these solutions will be found on the second iteration of the cost loop, but not necessarily in best-first order.
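The effect of band width on reporting order can be sketched in Python. This is a simplification under stated assumptions, not the meta-interpreter itself: completed proofs are given up front as (probability, cost, label) tuples in discovery order, the function name is hypothetical, and bands are treated as half-open intervals so that every proof falls in exactly one band.

```python
def banded_order(proofs, delta_p, delta_c):
    """Return proofs in the order a nested probability/cost band loop reports them.

    `proofs` is a list of (probability, cost, label) tuples in discovery order.
    Within one band, proofs keep their discovery order.
    """
    if delta_p <= 0 or delta_c <= 0:
        raise ValueError("band widths must be positive")
    for prob, cost, _ in proofs:
        if not 0.0 <= prob <= 1.0 or cost < 0:
            raise ValueError("need 0 <= probability <= 1 and cost >= 0")

    reported = []
    pending = list(proofs)
    p_band = 0
    while pending:                       # outer loop: step down through probability bands
        p_hi = 1.0 - p_band * delta_p
        p_lo = 1.0 - (p_band + 1) * delta_p
        in_band = [pr for pr in pending
                   if p_lo <= pr[0] < p_hi or (p_band == 0 and pr[0] == 1.0)]
        pending = [pr for pr in pending if pr not in in_band]
        c_band = 0
        while in_band:                   # inner loop: step up through cost bands
            c_lo = c_band * delta_c
            c_hi = (c_band + 1) * delta_c
            hits = [pr for pr in in_band
                    if c_lo < pr[1] <= c_hi or (c_band == 0 and pr[1] == 0)]
            reported.extend(hits)
            in_band = [pr for pr in in_band if pr not in hits]
            c_band += 1
        p_band += 1
    return reported


# Five equally probable proofs, listed in the order the search happens to find them.
found = [(0.9, 8, "a"), (0.9, 6, "b"), (0.9, 9, "c"), (0.9, 7, "d"), (0.9, 8, "e")]
wide = [cost for _, cost, _ in banded_order(found, 0.5, 5)]
narrow = [cost for _, cost, _ in banded_order(found, 0.5, 1)]
print(wide)    # all five share one cost band, so discovery order is kept
print(narrow)  # one band per unit of cost, so costs come out ascending
```

A band width of 5 leaves the five proofs in discovery order, whereas a width of 1 costs more passes but sorts them by cost.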
On the other hand, if $c = 1$, the solutions are guaranteed by the algorithm to be found in increasing order of cost, but at considerable added computation (many more iterations, or passes through the search space, are required).

The iterative deepening meta-interpreter approach ensures that the first solution to be found is the lowest cost design for the most probable model. Note that setting the initial probability bound to zero results in getting the explanations in an order that depends only upon the design costs. Setting the initial cost bound to infinity results in an order that depends only upon the recognition probabilities.

The underlying representation is compatible with an existing implementation of Poole's Probabilistic Horn Abduction Framework, which maintains an ordered queue of partial proofs [161]; in that implementation, partial proofs are not discarded, but suspended and queued when some other partial proof becomes preferred. Some small changes are required to the existing implementation of Poole's queuing mechanism before it can accommodate the separation of recognition and design assumptions advanced in this thesis; future prototypes may include these modifications.

Intent: formal definition

Intent-based authors are free to deploy the full power of the underlying representation language to specify their intents. Formally, an intent is defined as follows:

Definition 5 (Intent): An intent $I(Pr)$ is a predicate over presentations which is true when the presentation $Pr$ satisfies the author's intent.

For instance, the intent to convince the user might be captured in a convince predicate that encodes an argument structure consisting of an introduction, a body, and a summary (each of these with its own rules).

As an example of the specification of an intent, see the more detailed description of the show intent provided in Figure 5.8.
    show(P) <=
        % UserType is instantiated in the following line:
        interested(UserType, Topic1),   % a first topic
        interested(UserType, Topic2),   % a second topic
        % both Topic1 and Topic2 interest users of type UserType
        % Topic1 and Topic2 are not the same:
        different_topic(Topic1, Topic2),
        % make a bunch of assumptions about the user and then find
        % model-specific topics to present...
        gender(Gender), geo(Geo),
        availableTime(UserType, Time),
        % accumulate an edit-list
        editList([], P1, intro, 0, L1),       % an introductory subsequence
        editList(P1, P2, Topic1, L1, L2),     % sequences for first and
        editList(P2, P3, Topic2, L2, L3),     % second topics
        % show an interview with someone of the same gender,
        % in the same area or orientation
        interview_clip(X, P3, P4, L3, L4),    % interview
        gender(X, XGender),
        same_gender(XGender, Gender),
        % also, if the user is not local, show interesting Vancouver
        % mountains otherwise, skip it
        scenic_clip(Geo, P4, P, L4, L),
        costLength(L, Time).

Figure 5.8: Show: an example intent

Design: formal definition

Formally, a design is defined as follows:

Definition 6 (Design): A design (given a model $R$) is an explanation $D$ of $I(Pr)$ consisting of only design assumptions: $D \subseteq \mathcal{D}$, $F \cup R \cup D \not\models \bot$, $F \cup R \cup D \models \exists Pr\; I(Pr)$. The cost of a design is the sum of the costs of the design assumptions of which it is composed: $\mu(D) = \sum_{d \in D} \mu(d)$. The 'best' design for model $R$ is the least cost design that is consistent with $R$.

A design "produces" a presentation of information (e.g., graphs, video, text). Note that different designs may produce the same presentation (there may be different reasons for presenting the same information). The best presentation in the context of model $R$ is the presentation $Pr$ supported by the least cost design that is consistent with $R$.
Definition 7 (produces(D, M, Pr)): We define for notational convenience the relation $produces(D, M, Pr)$, which is true when design $D$ and model $M$ lead to presentation $Pr$ as described here, i.e., $produces(D, M, Pr) \equiv \mathcal{F} \cup M \cup D \not\models \bot,\ \mathcal{F} \cup M \cup D \models \exists Pr\, I(Pr)$.

The partitioning of $\mathcal{H}$ partitions each explanation of $Obs \wedge \exists Pr\, I(Pr)$ into a model and a design component, which we denote as $(R, D)$. We define a preference relation $\succ_p$ over explanations such that:

$(R_1, D_1) \succ_p (R_2, D_2)$ iff $P(R_1) > P(R_2)$ or ($P(R_1) = P(R_2)$ and $\mu(D_1) < \mu(D_2)$)

This results in a lexicographic ordering of explanations. So, the best explanation consists of the most plausible model of the user and the lowest cost design.

Note that the design $D$ which explains presentation $Pr$ in $I(Pr)$ for some model $R$ is not necessarily generated for some other model $R'$, i.e., there is no reason why $produces(D, R, Pr)$ should imply $produces(D, R', Pr)$. The logic is a means of "weeding out" incoherent designs, and hence presentations.

This separation of model recognition and presentation design assumables also has interesting ramifications for the way presentations are chosen; in particular, using $\succ_p$ means that we do not give up good models for which we can find only bad designs. For instance, consider the case where we have disjoint assumables $(student, P_s)$ and $(faculty, P_f)$, where $P_s \gg P_f$, but the cost of the lowest cost design in the context of a model that assumes the user is a student is much greater than that of the lowest cost design in the context of a model that assumes the user is a faculty member (i.e., $\mu(D_{best} \mid \cdots student \cdots) \gg \mu(D_{best} \mid \cdots faculty \cdots)$). We do not give up the assumption that the user is a student; the reasons for deciding in favor of student are not affected by the system's inability to find a good (low-cost) presentation. This behavior is a direct consequence of the methodological separation of design from recognition assumables.
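As an illustration only, the lexicographic ordering defined above can be sketched in a few lines of Python. The class name, function names and all numbers below are hypothetical (they do not come from the prototype); the sketch merely shows that a highly probable model with an expensive design is still preferred over an improbable model with a cheap design, whereas one possible combined metric (cost divided by probability, chosen here only for contrast) would reverse that choice.

```python
from typing import NamedTuple


class Explanation(NamedTuple):
    model: str          # label standing for the recognition assumptions R
    probability: float  # P(R)
    design: str         # label standing for the design assumptions D
    cost: float         # mu(D)


def preference_key(e: Explanation):
    """Sort key for the lexicographic order: probability first, then cost."""
    # Negating the probability makes min() prefer the most probable model;
    # ties in probability are broken by the lower design cost.
    return (-e.probability, e.cost)


def best(explanations):
    """Return the most preferred (model, design) explanation."""
    return min(explanations, key=preference_key)


# Hypothetical figures: the student model is far more probable, but its
# cheapest design costs much more than the faculty model's design.
candidates = [
    Explanation("student", 0.9, "d1", 120.0),
    Explanation("student", 0.9, "d2", 150.0),
    Explanation("faculty", 0.1, "d3", 10.0),
]

chosen = best(candidates)

# For contrast only: a single combined metric (an assumption of this sketch)
# trades probability against cost and therefore abandons the student model.
combined = min(candidates, key=lambda e: e.cost / e.probability)
```

Under these assumed figures, `chosen` is the student model with design `d1`, while the combined metric selects the faculty model.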
Were there only a single space of assumables, the system would not be able to make these distinctions, and would simply select the best model/design combination according to the ordering metric.

5.5 Interactivity and Scrutable Models

One criticism of systems which attempt to model their users is their inscrutability to these users (see, for instance, Cook and Kay [52] and Orwant [154]). If the system acts in an infelicitous manner which does not meet the needs of the user, and where this action results from an error in the user model, a desirable property of the system is scrutability.

Users should also be given the means by which to express their dissatisfaction with a presentation (or with elements of a presentation), always with an eye to providing users with better presentations.

This interaction paradigm supports both the scrutability and the dissension conditions described above. In calculating a model that represents the user, the system makes observations upon which further reasoning is conditioned. The user is provided with direct access (via a graphical user interface) to salient elements of the user model, as well as with the ability to criticize these elements. It is this interaction which is used to evolve the user model from a basic model initially derived from such information as the location of the user's terminal, login id, Internet domain, and so on.

Sensitivity Analysis

A problem with displaying the model to the user in support of scrutability is that the model may be huge. Certainly, there may be too many recognition assumptions to be effectively displayed on a screen, and far too many for quick assimilation by users. This cognitive comprehension task for the user is characterized by focused, serially-directed visual attention. Research in psychology has shown that the time taken for such tasks is proportional to the number of objects in the visual field [193].
A solution is to display a salient subset $C$ of $R$, to which we refer as the salient partial model, consisting of those assumptions which have the greatest effect on the design (and therefore upon the presentation as well). The cardinality of $C$ can be varied to maintain an appropriate pace of interaction; if the user is taking too long to evaluate the alternatives, the number of alternatives in the next iteration can be reduced. In addition, since the alternatives can be ranked, [graphical] display techniques can be used to render this ordering to the viewer.

In this section we describe the sensitivity analysis currently used by the system to select the critical recognition assumptions.

In the following, let $R = \{r_1, r_2, \ldots, r_n\}$ represent the currently most plausible model, $D$ the best design that is consistent with $R$, and $Pr$ the presentation generated. We require a total ordering of the assumptions $r_i \in R$ by which to rank these assumptions for display to the user. Some useful notation follows.

Definition 8 ($C_p(Pr, a)$): Let $C_p(Pr, a)$ be the (lowest) cost of the (best) design that produces presentation $Pr$ in the context of some model $M$ that contains assumption $a$, i.e.,

$C_p(Pr, a) = \min_{D, M \,:\, a \in M \wedge produces(D, M, Pr)} C(D)$

To sort the $r_i \in R$ from most to least influential, we now need an expression parameterized by the assumptions $r_i \in R$, which we can use as a measure of the influence of $r_i$ on the cost of $D$; we want to know how much of a mistake we would be making if $r_i$ does not correctly model the user, e.g., if we have assumed that the user is a faculty member whereas in fact the user is a graduate student. An expression that serves this purpose is defined as follows:

Definition 9 (Sensitivity): If $produces(D, R, Pr)$, then the sensitivity of $Pr$ to an assumption $r \in R$ is the expected cost of the presentation $Pr$ (to users who are incorrectly modelled by $r$).
Let $J$ be the disjoint set $\{h_1, h_2, \ldots\}$ that contains $r$:

$S(r, Pr) = \sum_{h_i \in J} C_p(Pr, h_i)\, P(h_i)$

Definition 9 does not consider all possible alternatives to $R$; if more than one $r_i \in R$ does not correctly model the user, this formalism will not be able to diagnose the multiple fault. This "myopic evaluation" [176, p490] is employed instead of joint sensitivity analysis, which would consider multiple simultaneous errors in the model, because: 1) single faults are more likely than multiple faults (footnote 9), 2) accounting for multiple faults is exponential in the size of the model whereas the myopic approach is linear, and 3) the effects of multiple faults would be very difficult for users to track, resulting in a huge cognitive burden that compromises the scrutability requirement.

The assumptions in the model $R$ are sorted on the sensitivity of the presentation to each of the assumptions (i.e., to each of the $r_i \in R$ for all $i$). The salient partial model $C = \{c_1, c_2, \ldots, c_k\}$, referred to above, consists of the first $k$ assumptions in this sorted list. These are the assumptions to which the design, given the current user model, is the most sensitive.

In the interaction paradigm related here, the user is shown the assumptions in $C$, in the context of the disjoint sets to which they belong; i.e., where $c_i = (Name_i, P_i)$, the system displays $Name_i$ along with all the names of the assumables in the disjoint set to which $c_i$ belongs. For example, if $Name_i = faculty$ and some $J = \{(faculty, 0.6), (student, 0.3), (staff, 0.1)\}$, the user sees a set of names faculty, staff, student with the actual assumption highlighted. The number of hypotheses displayed, $k$, is set to some small integer like 5 or 7.
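The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's code: the variable names, the dictionary layout, the priors for `geo` and `married`, and every $C_p$ value are hypothetical; only the faculty/student/staff priors are taken from the example above. The sketch computes $S(r, Pr)$ for each assumption as the probability-weighted sum of $C_p$ over the members of its disjoint set, and keeps the $k$ most sensitive assumptions as the salient partial model.

```python
# Priors of the members of each disjoint set J, keyed by a variable name.
# Only the userType priors appear in the text; the others are invented.
disjoint_sets = {
    "userType": {"faculty": 0.6, "student": 0.3, "staff": 0.1},
    "geo": {"local": 0.7, "prospective": 0.3},
    "married": {"yes": 0.5, "no": 0.5},
}

# Hypothetical values of Cp(Pr, h): the cost of the best design that still
# produces the current presentation Pr when hypothesis h is assumed.
cp = {
    "userType": {"faculty": 50, "student": 80, "staff": 60},
    "geo": {"local": 51, "prospective": 137},
    "married": {"yes": 51, "no": 51},
}


def sensitivity(variable):
    """S(r, Pr): expected cost of Pr over the disjoint set containing r."""
    return sum(cp[variable][h] * p for h, p in disjoint_sets[variable].items())


def salient_partial_model(variables, k):
    """Return the k variables to which the presentation is most sensitive."""
    ranked = sorted(variables, key=sensitivity, reverse=True)
    return ranked[:k]


scores = {v: sensitivity(v) for v in disjoint_sets}
salient = salient_partial_model(list(disjoint_sets), k=2)
```

With these assumed figures the sensitivities are roughly 76.8 for `geo`, 60 for `userType` and 51 for `married`, so `salient` holds `geo` and `userType`. Each assumption is scored independently, which keeps the computation linear in the size of the model, as in the myopic evaluation described above.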
In sample implementations so far, simple text is attached to radio buttons, so that users see not only the assumption that was made, but the other members of the disjoint sets to which the assumption belongs; a simple click on a radio button instructs the system to recalculate the user model with the new assumption. See Section 5.6 for a description of the GUI by which $C$ is communicated to the user.

Footnote 9: I.e., if $p < 1$, then $p^n$ …

5.6 Example

The following example is taken directly from the prototype implementation described in the next chapter; specifically, the reasoning and modelling strategies described in this thesis have been embedded in an application called Valhalla, an automatic video presentation tool that uses the reasoning strategies we have described to select, order and play video segments from the Departmental Hyperbrochure described in Section 1.4. This application uses the authorial intent called "show," the predicate presented in Figure 5.8.

In this example, the system was started from scratch after the user had logged in with the personal user id, csinger. The user immediately requested a presentation from the system, which responded as follows (comments have been added for clarity):

The initial model, with likelihood 0.029494, is composed of the following recognition assumptions:

    married(yes)                 the user is married
    orientation(theory, y)       orientation: theoretical
    geo(local)                   is local to region
    area(ai, y)                  research area is ai
    c_gender(male, student)      is student, and consequently male
    userType(student)            is student

These are based upon prior probabilities and other contextual information such as the user's login id.
At UBC, we consult a local personnel database to make a best guess about the user's membership in a variety of categories; this database informs us, for instance, that user csinger is a graduate student, and that user poole is a faculty member. Using this source to justify the assumption that a user with a certain login id belongs either to the class faculty or student is reasonable, but not infallible. User csinger might have logged in a visiting faculty member, for instance, so that he or she could use Valhalla to learn more about the department while Csinger attended to an unavoidable interruption in the meeting they were having.

The best design based upon this model "costs" 51 and is composed of the following design assumptions:

    overLength(25)
        amount by which exceeds optimal presentation time
    rel(topic(lei), topic(research), 20)
        cost of relevance assumption: lei is relevant to research
    rel(video(sports), topic(sports), 5)
        cost of relevance assumption: a video clip about sports is relevant to the topic of sports
    rel(topic(introduction), intro, 1)
        cost of relevance assumption: a clip called intro is relevant to the topic "introduction"
    different_topic_cost
        the topics included are not identical

Associated with each design assumption is a cost. The cost of the assumption different_topic_cost, for instance, is less than the cost of the assumption same_topic_cost, from the same disjoint set; this arrangement prefers presentations with multiple topics over those with single topics. Some costs are context dependent: the cost of the overLength assumption is proportional to the amount by which the length of the current presentation exceeds the optimal length for a user of this type.
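The stated total can be checked by simple summation, as in the following sketch. The dictionary layout and the function name are hypothetical. The listing above gives no numeric cost for different_topic_cost; because the four stated figures already add up to 51, it is entered here as 0, which is purely an assumption made so that the sketch reproduces the stated total.

```python
# Costs read off the design listing above; keys are the assumption names.
first_design = {
    "overLength(25)": 25,
    "rel(topic(lei),topic(research),20)": 20,
    "rel(video(sports),topic(sports),5)": 5,
    "rel(topic(introduction),intro,1)": 1,
    # No figure is given in the text for this assumption; 0 is an
    # assumption of this sketch, chosen to match the stated total of 51.
    "different_topic_cost": 0,
}


def design_cost(design):
    """Cost of a design: the sum of the costs of its design assumptions."""
    return sum(design.values())


total = design_cost(first_design)
```

Here `total` evaluates to 51, the cost reported for the best design.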
The corresponding presentation $P_1$ is:

    (0:1:26, 0:2:27, topic(introduction)),
    (0:24:44, 0:25:3, video(sports)),
    (0:12:15, 0:15:54, topic(lei)),
    (0:31:54, 0:32:44, interview(jim_kennedy))

Specifically, $P_1$ consists of four clips from the video archive with the indicated absolute time-codes and topic identifiers.

The user at this point indicates dissatisfaction with the current presentation, perhaps after viewing part of it via the Virtual VCR interface, by clicking on the button provided for that purpose. The system interprets this action as a request for another presentation using the same user model. The next best design based upon this model "costs" 56 and is composed of the following design assumptions:

    overLength(35.0)
    relevant(topic(lei), topic(research), 20)
    relevant(topic(introduction), intro, 1)
    different_topic_cost

The corresponding presentation is:

    (0:1:26, 0:2:27, topic(introduction)),
    (0:12:15, 0:15:54, topic(lei)),
    (0:31:54, 0:32:44, interview(jim_kennedy))

The costs of the two designs differ because there is a preference built into the system for presentations of an optimal duration, defined by the user's membership in certain categories.

The user can now navigate through the presentations with a virtual VCR interface.

Which assumptions are displayed in the user model window? This is determined by the sensitivity analysis algorithm, the results of which are shown in Table 5.3; this table indicates sensitivity calculation results for the first model and design pair described in this example. The second column shows the calculated sensitivity for the assumption in the first column.
The presentation is most sensitive to the assumption that the user is local, and completely insensitive to his research area or orientation, or to his marital status (footnote 10).

    Assumption r_i               S(r_i, P_1)
    geo(local)                   104
    c_gender(male, student)      96
    userType(student)            61
    area(ai, y)                  51
    orientation(theory, y)       51
    married(yes)                 51

Table 5.3: Results of Sensitivity Analysis before user action

Footnote 10: I.e., $S(r, P_1) = C(D) = 51$, when $r$ is an area, orientation, or marital status variable. In other words, the cost of the original best design is the same as the cost of designs induced by the sensitivity metric with respect to these assumptions in the context of presentation $P_1$.

Because the assumptions are only hypothetical, provision must be made for their revision, both in terms of the underlying reasoning engine, and in terms of a mechanism whereby such revised data can be acquired. In particular, in support of the scrutability desideratum, users can interact with the system to validate or correct the salient assumptions of the model.

Valhalla gives the user the means by which to validate or correct assumptions that the system has made by displaying critical assumptions in what we call the user model window, an instance of which is shown in Figure 6.5.

Returning to our example, the user informs the system that he is not, in fact, local, but a prospective student by clicking on the appropriate radio button.

The system then re-calculates, with the result that the new model, with probability 0.045375, is:

    married(yes)
    orientation(theory, y)
    area(ai, y)
    c_gender(male, student)
    userType(student)
    geo(prospective)             # user is not from the region

The new design, with cost 137, is:

    relevant(topic(ubc_scenic), mountains, 1)
    relevant(topic(introduction), intro, 1)
    different_topic_cost

The corresponding presentation $P_2$ is:

    (0:1:26, 0:2:27, topic(introduction)),
    (0:18:17, 0:28:27, topic(ubc_scenic))

    Assumption r_i               S(r_i, P_2)
    geo(prospective)             267
    c_gender(male, student)      242
    userType(student)            237
    area(ai, y)                  237
    orientation(theory, y)       227
    married(yes)                 227

Table 5.4: Results of Sensitivity Analysis after user action

The contents of the user model window depend again upon the results of the sensitivity analysis, shown in Table 5.4, where the second column shows the calculated sensitivity for the assumption in the first column.

In presentation $P_1$, a preference for multiple topics is in evidence. This preference stems from forcing the system to make an assumption whose cost depends upon the relationship between elements in the presentation. The system currently values diversity over emphasis, as described in Section 5.4.2.

In presentation $P_2$, the sensitivity to the assumption of locality is demonstrated. When the user was assumed to be local (in $P_1$), there was no clip in support of scenic views of UBC, but when the user is assumed (in this case because the user actually informed the system) to be prospective and therefore not local, a scenic clip is included. Note that this clip is included at the expense of the earlier research and sports clips; the system currently places a greater value on showing UBC's scenic merits to prospective department members than on any other design element.
5.7 Alternative Approaches

5.7.1 Decision Theory for Multimedia Presentation Design

Here we explore how decision theory could be used in the service of the stated goals of intent-based authoring, and argue that expected value (see Section 4.2 for background) is probably not the right approach. In this section we focus on the interaction paradigm in support of what has been referred to in this thesis as scrutability: a scrutable system is one that makes clear to users the relationship between the model of the user maintained by the system, and the behavior of the system.

The means by which scrutability is achieved in the system is an approach to interaction that permits users to critique the user model in pursuit of better system behavior. An important element of this approach is the sensitivity analysis described in Section 5.5.

Cheeseman [40] discusses the use of decision theory for design, and has suggested that the best design is the one that results from averaging over all possible models and maximizing expected utility or minimizing expected cost; in Equation 5.2 the expected value of the cost function $C$ over designs is to be minimized, and the best design is the $D$ for which $E(C(D))$ is minimal.

$E(C(D)) = \mu(D \mid M_1)P(M_1) + \mu(D \mid M_2)P(M_2) + \cdots + \mu(D \mid M_n)P(M_n)$    (5.2)

$\mu(D \mid M)$ is the cost of design $D$ in the context of model $M$, and $P(M)$ is the posterior probability of model $M$ (refer to Section 5.4 for formal definitions of the terms "model" and "design," which are used informally here).

There are reasons in the current application why this may be the wrong thing to do. For example, consider the following scenario. Assume, for simplicity, that users of the Departmental Hyperbrochure (see Section 1.4) can be only faculty members, with prior probability 0.6, or students, with prior probability 0.4, and that the following costs are known: $\mu(sports \mid faculty) = 10$, $\mu(sports \mid student) = 5$, $\mu(research \mid faculty) = 5$, $\mu(research \mid student) = 15$.
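The figures of this scenario can be worked through with a short sketch. The function and variable names are hypothetical; the priors and costs are those just stated. The sketch evaluates Equation 5.2 for each candidate and, for comparison, picks the cheapest candidate under the single most likely model, which is the strategy adopted in this thesis.

```python
# Priors and costs as stated in the scenario.
priors = {"faculty": 0.6, "student": 0.4}
cost = {
    "sports": {"faculty": 10, "student": 5},
    "research": {"faculty": 5, "student": 15},
}


def expected_cost(design):
    """Equation 5.2: cost of the design averaged over all models."""
    return sum(cost[design][model] * p for model, p in priors.items())


def best_by_expected_cost():
    """Decision-theoretic choice: minimize the expected cost."""
    return min(cost, key=expected_cost)


def best_for_most_likely_model():
    """Commit to the most probable model, then minimize cost under it."""
    model = max(priors, key=priors.get)
    return min(cost, key=lambda design: cost[design][model])


e_sports = expected_cost("sports")      # 10 * 0.6 + 5 * 0.4
e_research = expected_cost("research")  # 5 * 0.6 + 15 * 0.4
```

The expected costs come to 8 for sports and 9 for research, so averaging selects the sports presentation, whereas committing to the most likely model (faculty) selects research, whose cost for faculty is 5 against 10 for sports.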
Assume also that a current user of the system is a faculty member, that the system has assumed that the user is a faculty member, and that this assumption is clearly indicated to the user. Then, $E(C(sports)) = 8$, and $E(C(research)) = 9$. There is no indication to the user why the former is chosen for presentation over the latter, and the user may wonder why he is watching an irrelevant basketball game. The faculty member using the system may not be aware of the high degree of aversion on the part of students to research, a value not shown to the user (because there may be a very large number of such values). The behavior of the system is inscrutable.

It may be better to make a mistake and admit it, giving the user the means to correct the contextual error, rather than average over all possible mistakes. There are exceptions, and it is not difficult to conjure counterexamples under slightly different assumptions from the ones that guide our own effort; consider a scenario where it is possible to present material that is known to be highly offensive to a particular category of user, but which is known to be greatly appreciated by another; pornography is one such delicate issue today, and policies are under review at academic departments around the globe. Another example: one would not want to present a viewer with sensitive corporate data on the assumption that he or she is a joint-venture partner, only to find out later that the viewer is from a competitor's firm.

The system calculates the most likely user model, and then uses it to design the best presentation. Refer to Section 4.1.1 for background material about the formalism.

5.7.2 Costs and Utilities

This section is a discussion of how the values associated with the design assumables should be interpreted.
These quantities can be interpreted in a number of implementation-dependent ways: a value could be an estimate, for instance, of how hard it is for the system (from a computational point of view) to realize the design element, or of how much cognitive or perceptual effort (the system thinks) will be required from a human to comprehend some manifestation of the design element. The values are used to constrain choices in the design space (for instance, to prefer some alternatives over others as in the example provided in Section 5.6); this interpretation sees the values as a measure, attributed by the system on behalf of the user, of how relevant a design element is to the user modeled by the current user model.

The investigation in this thesis has been formulated in terms of a system which minimizes positive design cost; this approach encourages designs which are "minimal" with respect to the number of design assumptions made. All costs associated with design assumables are positive, and the function used to accumulate total cost is simple summation, which means that the total cost function is monotonic in the sense that it can only increase as additional assumptions are made over the course of the proof. It is also "unbiased" in the sense that it is possible only to express relative preference (see Expressiveness, below).

Another possibility would be to implement the dual system which maximizes positive utility; this approach would produce designs which are "maximal" in the number of design assumptions. These distinctions are important because they affect the expressiveness of the underlying knowledge-representation language, as well as the computational complexity of the implementation that is built with it.

Expressiveness: It is not the same thing to say that 'presentations that do not depict sports events are preferred over those which do' and to say that 'do not show sports events.'
The first can be expressed with a system that maximizes positive utility by setting the utility of sports events to be lower than any other utilities, or with a system that minimizes positive cost by setting the cost of sports events to be higher than the costs associated with presenting other kinds of events. The second requires an ability to express negative bias ('user hates sports') which cannot be represented in a system that uses a monotonic utility function; making additional assumptions can only increase cumulative cost (or decrease utility) and decrease cumulative probability.

Complexity: The aforementioned expressive power is given up in favor of monotonicity (footnote 11) because non-monotonic utility functions would require the system to generate the entire search tree before reporting the best solution: branches would have to be followed to their leaves because arcs could be labelled with negative values, and all branches would have to be followed before it could be known which of them lead to the best design. (Partial proofs would do the system no good at all.) As the search space may be very large, this would be an unacceptable strategy in our presentation domain, which currently benefits from a best-first search strategy (see Section 5.4.2).

Footnote 11: The term "monotonicity" is used here not in the logical sense of non-monotonic reasoning, but in the classical mathematical sense of a function whose derivative does not change sign.

These issues cannot be simply dismissed by arguing that the system could shift the scale of costs or utilities associated with the design assumables so that the lowest cost or the lowest utility is set to zero. This compromise strategy, taken to permit the expression of negative bias while retaining favorable complexity parameters, introduces complications of its own.
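The complication can be made concrete with a small sketch, using the figures of the example discussed in this section: a candidate design with one "research" component (utility 100) and ten "sports" components (utility -10 each), set against a design with a single "entertainment" component (utility 5). The function names and the dictionary representation of a design are hypothetical.

```python
def net_utility(design, utilities):
    """Sum of component utilities, weighted by how often each occurs."""
    return sum(utilities[name] * count for name, count in design.items())


def shift_to_zero(utilities):
    """Shift every utility so that the lowest one becomes zero."""
    lowest = min(utilities.values())
    return {name: value - lowest for name, value in utilities.items()}


utilities = {"research": 100, "sports": -10, "entertainment": 5}

# Component counts for the two candidate designs.
mixed = {"research": 1, "sports": 10}
light = {"entertainment": 1}

before = (net_utility(mixed, utilities), net_utility(light, utilities))

shifted = shift_to_zero(utilities)
after = (net_utility(mixed, shifted), net_utility(light, shifted))
```

Before the shift the two designs are worth 0 and 5, so the "entertainment" design is preferred; after the shift they are worth 110 and 15, and the preference is reversed. The reversal arises because the shift is applied once per component, so designs with more components gain more from it.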
Consider, for instance, a candidate design with a "research" component (with utility of 100) and ten "sports" components (each with utility of -10); such a design would have a net utility of zero, and any other design with a positive utility would be preferred to it. For instance, a candidate design with a single "entertainment" component with total utility 5 would be preferred. Assume also that there are no design assumables in the database with associated utilities that are "worse" (less than) -10.

If the assumables in the database were preprocessed by the system as discussed above, the assumption associated with the "research" component would now have a utility of 110, and the assumption associated with the "sports" component would have a utility of zero. The candidate design would now have a net utility of 110, and would now be preferred to the "entertainment" candidate, which is now worth 15. Such changes in design preference would be difficult for knowledge engineers and intent-based authors to foresee, and so this scaling remedy does not appear to be a good solution. Again, the ability to explicitly express bias is given up in favor of mathematical monotonicity, which supports the best-first search strategy.

5.7.3 Other similar approaches

Wu [200] uses passive recognition techniques to build a model of a dialog partner, but decision theoretic measures are used to decide when the system should intervene to make direct inquiries of the user. The system is said in such an event to have adopted an active acquisition goal; these AAGs are isomorphic to the salient set of assumptions described above, in the sense that they are the ones judged by the system using decision theoretic measures to be the (only) ones with which it might be worth distracting the user.
The approach taken in this thesis differs significantly, however, in that the user can continue to ignore the user model window, and in that even this inaction can be used as a basis for drawing further conclusions.

5.8 Conclusions

Guided by a desire to build cooperative systems, a way was sought to acquire, represent, and exploit simple models of users. The models are acquired with an abductive reasoning strategy, and consist of assumptions about the user, drawn from a pool of possible hypotheses called recognition assumables. Associated with each of these assumables is a probability, and the probability of a model is the product of the probabilities of the assumptions composing the model. The model explains observations of the actions of the user at the interface to the system.

Design assumptions must be consistent with model assumptions, and are drawn from a pool of possible hypotheses called design assumables. Associated with each of these assumables is a cost, and the cost of a design is the sum of the costs of the assumptions composing the design. The design and the model together explain a presentation that satisfies the communicative goal, or intention, of the author.

The requirement of scrutability was identified, which is the constraint that the behavior of the system should be seen by its users to be a consequence of the user model. It was found that this goal is not met by expected value approaches to design.

The minimal AI approach to user modelling described here services the scrutability condition by providing the means to perform a kind of sensitivity analysis on components of the user model, so that only the most critical elements are displayed to the user, who can then criticize them. The contribution of this thesis is both an interaction paradigm that permits the user to "debug" the user model, and a probabilistic Horn abduction approach to reasoning that implements the paradigm.
Chapter 6

Implementation

    But men do not begin to act upon theories. It is always some real danger, some practical necessity, that produces action; and it is only after action has destroyed old relationships and produced a new and perplexing state of affairs that theory comes to its own. Then it is that theory is put to the test.
    (H. G. Wells, Outline of History, page 693)

A prototype of the system described in the rest of this thesis has been developed. It embodies the intent-based authoring ideas advanced in this dissertation: Valhalla is a scrutable system that selects and orders video clips from a repository of material describing the Department of Computer Science at the University of British Columbia, and the examples in this chapter are drawn from the Departmental Hyperbrochure, described in Section 1.4.

Valhalla decouples the specification from the presentation tasks of authoring, abandoning the traditional model in favor of the intent-based paradigm. The author brings an intent, and information he thinks will be relevant to the eventual presentation. After annotation, a representation of this intent, and a set of indices into the raw video reside in a "document." This is all done at compile-time, in the absence of the viewer. Later, at run-time, the reasoner uses the document, along with the user model and other knowledge, to produce an edit-list. The viewer, even in the absence of the author, sees only the most relevant portions of the video when the user model is accurate, and can take remedial action when it is not.

6.1 Architecture

A distributed architecture was chosen for the prototype implementation. Populating Valhalla's framework are a number of autonomous agents that fall into the following classes: client applications, network-based multimedia servers, and other service providers such as reasoning engines and annotated video databases.
Figure 6.1 is taken from Gribble [88] and illustrates agents within the Valhalla application framework.

[Figure 6.1: Valhalla's Distributed Architecture. Client applications, reasoning engines and other service providers, and multimedia servers are connected by the network.]

Each class of agent has associated with it a single communications protocol, which allows client applications to transparently communicate with all instantiations of that class. It is this "plug and play" compatibility between members of a class that makes the Valhalla framework flexible and powerful.

One member of each class of application has been currently implemented in the prototype framework. A client application, also named Valhalla, provides a user with intent-based authoring services. This client uses the services of a reasoning engine to generate descriptions of video presentations, and displays the video sequences within the presentations using the services of a distributed multimedia server.

The Multimedia Server

From the perspective of client applications, the server is a single entity providing virtual VCR-like control over multiple media sources. The server has the ability to simultaneously operate in two modes: in local mode, media sources are conventional analog devices, whose output may be routed to a number of available display devices using an RS-232 controlled switch. In remote mode, digital media is transmitted over the network to the client application from one or more remote sites, and is controlled using media playback applications present on the client's host.

The server is implemented as a series of increasingly abstract application programming interfaces (APIs). Each interface can be directly accessed by a client application, but typical operation of the server would only involve access from the highest (and most abstract) layer. A higher layer interface uses the services of a lower layer in order to provide its own services.
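The layering idea, in which each interface provides its services by calling the layer beneath it, might be sketched as follows. This is not the server's actual API: every class name, method name and command string below is hypothetical, and a logging stand-in replaces any real device. The three levels mirror the device, class and virtual-VCR layers of the server.

```python
from abc import ABC, abstractmethod


class DeviceDriver(ABC):
    """Device layer: one driver per physical device (hypothetical API)."""

    @abstractmethod
    def send(self, command: str) -> str:
        """Deliver one low-level command to the device."""


class LoggingDriver(DeviceDriver):
    """Stand-in driver that records commands instead of touching hardware."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def send(self, command):
        self.log.append(command)
        return f"{self.name}: {command}"


class LaserDiscInterface:
    """Class layer: exposes features particular to one class of device."""

    def __init__(self, driver):
        self.driver = driver

    def seek_frame(self, frame):
        # Frame-accurate seeking is specific to random-access video.
        return self.driver.send(f"seek {frame}")

    def play(self):
        return self.driver.send("play")


class VirtualVCR:
    """Top layer: a single VCR-like interface over any device class."""

    def __init__(self, device):
        self.device = device

    def play_from(self, position):
        # The top layer is built only from the services of the class layer.
        self.device.seek_frame(position)
        return self.device.play()


driver = LoggingDriver("disc0")
vcr = VirtualVCR(LaserDiscInterface(driver))
result = vcr.play_from(1200)
```

A client that only needs VCR-like control would talk to `VirtualVCR` alone, while a client that wants class-specific features could still address `LaserDiscInterface` directly.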
Figure 6.2 illustrates the relationship between these interfaces and the various components of the server.

[Figure 6.2: The hierarchy of interfaces to the video server]

The lowest layer, called the device layer, is accessible only through TCP socket communications and is composed of a series of device drivers. There is one device driver for each physical device, and one driver responsible for dispensing a class of digital data.

The next most abstract layer (the class level) is accessible to client applications via an API to a library that directly communicates with the server on behalf of the client. This intermediate layer provides completely separate interfaces for each class of device. Devices of the same class share characteristics particular to that class. Providing a separate interface for each class allows client applications to take advantage of these particular characteristics. Supported classes currently include digital video, digital audio, random-access analog video (optical video disc), and tape-based analog video. Future extensions to the server will support other device classes, such as text, MIDI or digital audio.

The top layer, called the virtual-VCR level, is again accessible via an API and provides an interface that client applications can use to control any device. Characteristics of different device classes have been abstracted away in order to provide a single VCR-like interface.

Using an API to communicate with the more abstract layers of the server is advantageous for a number of reasons. First, the inclusion of a library into the client executable facilitates the distributed architecture of the server; each library can be considered to be an agent of the server executing on the client's host machine. The presence of the server on the client's machine allows the server to directly control applications on the client's host. Such applications would be used for the playback and manipulation of digital media.
Secondly, the API itself has been designed to provide a uniform method of interacting with multiple media sources and formats, allowing the differences between classes of media to be partially abstracted away.

The video delivery component of the system is designed to handle tape, video disk and digital video through a video server mechanism currently implemented on a Sun architecture [88].

The Valhalla Client Applications

The first Valhalla client prototype has been completed, and another is currently under development.

[Figure 6.3: The Valhalla Control Window]

Figure 6.3 depicts the Valhalla control window, the main human interface to the Valhalla client, which is implemented in Objective C and currently resides on a NeXT cube equipped with a NeXTdimension board. The control window contains, in addition to the familiar virtual VCR control panel at the lower left, controls to advance to the next clip in the current edit-list, to return to the previous clip in the current edit-list, to replay the current clip, and to proceed with the presentation ("Go").

Pressing the "Show" button is interpreted as a request to calculate the next best presentation for the current model, and is passed on to the reasoner. The "Show" button is "wired" to a predicate in the reasoner's knowledge base that encodes a specific authorial intent; for the Departmental Hyperbrochure project described in Section 1.4, the underlying intent of all presentations is always to inform the viewer, to make maximally relevant presentations of the research and other departmental activities to individual viewers. Other applications using the same client interface program may have the "Show" button wired to other predicates encoding other intentions, and it would not be difficult to provide multiple buttons on the interface so that the user could select different (pre-defined) intentions at run-time.
It is also possible in the current version of Valhalla to author new intentions at run-time through a special query window that is not shown or discussed further in this document; this avenue is not pursued because the system does not demand that the human in the role of viewer be able to program in Prolog, and to understand first order logic. (It does currently require the authors and knowledge engineers involved to have these skills.)

The "No!" button is merely a direct way for the user to express dissatisfaction with the current presentation, freeing the reasoner to recalculate both model and design as required.

Any activity at the control window is echoed to the reasoner, which can use plan recognition techniques to infer the motives of the user from these observations of user behavior. The label of the current interval as provided in the annotation database is displayed. Manual laser disk controls include absolute frame indexing.

The services of the multimedia server are used to display clips contained in an edit-list. Currently, all media associated with the Hyperbrochure is in the form of (synchronized) analog video and audio stored on CAV laserdiscs. Analog signals from two laserdisc players are routed to a video digitizer board within the NeXT computer, and the resulting digital video is displayed in a window on the NeXT's display. An example frame from a campus sporting event is shown in Figure 6.4.

[Figure 6.4: A frame from the Departmental Hyperbrochure]

The needs of individual users are met by referring to the user model, which is arrived at by the reasoning method outlined earlier in this thesis. As discussed earlier, the user is given the opportunity to critique a selected subset of the model via Valhalla's user-model window, shown in Figure 6.5.
The hypotheses actually displayed to the user are context-dependent, selected and ranked by a sensitivity analysis algorithm (described in Section 5.5) that reflects the degree of importance to the design of each assumption in the model. In addition, the techniques employed to display these assumptions reflect their relative importance; quantities to which the design is most sensitive can be shown, for example, in bolder fonts, brighter colors, larger characters, and so on (cf. perceptual salience, Section 5.4.1). Every effort is made to sanction the further assumption by the system that the user has actually seen and attended to the display in the user model window.

The Valhalla User Model window implements the interactivity paradigm advocated in Section 5.5. Clicking on any element of the User Model window first sends a message to the Reasoner which causes it to succeed in its processing loop; this message is followed by an instruction that encodes the change to the user model that the user has just specified. The Reasoner makes the requested change to the user model and then calculates a new presentation.

[Figure 6.5: The Valhalla User Model Window. The window is titled "User Model" and reads: "Each item below represents an important assumption that the system has made about you. Correcting the assumptions will change the behaviour of the system and the nature of your presentations." Radio buttons follow for userType(faculty), userType(student), userType(staff), gender(male), gender(female), and two further items whose labels are only partly legible in the source (ending in "local" and "prospective").]

In Figure 6.5, the screen shows a number of sets of radio buttons, which is Valhalla's display technique for variables whose values are drawn from an exclusive (disjoint) set. Here, Valhalla believes the user is a faculty member; a student can correct the system's misconception with a single click.

The pertinent elements of the user model selected by the reasoning engine are transferred to the hyperbrochure application in an abstract form.
Instead of specifying a particular GUI "widget", a particular hypothesis may be specified as an element of a (finite) discrete set, as possessing a value within a particular range, or as having a boolean value. It is left to the hyperbrochure to determine an applicable "widget" with which to display this to the user. This method of providing abstract, indirect control over the reasoning process adds to the flexibility of the framework.

6.1.1 The Reasoner

The Reasoner is a best-first SICStus Prolog implementation of the assumption-based reasoning framework introduced in this thesis. The knowledge bases are all written in a Prolog-like Horn clause language extended with assumptions (as described in Section 5.4), and the annotation database consists of only definite clause assertions.

[Figure 6.6: Valhalla implementation. Labelled components: User, User Interface, User Model, Reasoner, Video Server, Edit List, Video Disk, Video Tape, Media.]

[Figure 6.7: The Valhalla Agents. Labelled components: Reasoner, Interface, Internet.]

These aspects of the design can be seen in Figure 6.6. Connections between the video server, user interface and reasoning engine are all client/server (TCP/IP) links using standard Unix sockets, giving flexibility and platform independence. Figure 6.7 illustrates these relationships.

6.2 Functionality

    ? area(Student, graphics),                   % Student studies graphics,
      supervises(FacultyMember, Student),        % is supervised by FacultyMember
      relevant(FacultyMember, Topic),            % to whom Topic is relevant
      editList([], Presentation, Topic, 0, L),   % Get a video edit-list
      costLength(L, 300).                        % close to 300 seconds

Figure 6.8: Authorial Intent as a Prolog query

Presentation is decoupled from specification by having the system prepare an edit list of relevant events and intervals subject to the constraints in the available knowledge bases.
The generation of this edit-list is performed at run-time, rather than compile-time, so that the author need not be physically present to ensure that the presentation is suitable. The intent of the author is currently encapsulated in a distinguished predicate that is attached to a Show button on the interface, whose intended interpretation is that viewers should be given a basic overview of the material available, followed by a body which is relevant to their immediate information retrieval goals, and then by a conclusion; other author intentions could be similarly encapsulated and connected to the Show button, or to other buttons on some custom interface.

A number of constraints are applied to the design of the presentation, including for instance that its length not exceed a certain amount of time. The user's interaction with the system is restricted in this way to factor out variables that would make it difficult to test the impact of our user modelling approach. Note, however, that there is an additional window, not pursued in this dissertation, that can be used to make arbitrary queries of the reasoning engine, in the underlying representation language described in this thesis; the user can take the role of intent-based author by specifying his or her own intent in this window and instructing the system to find an appropriate presentation. For example, in Figure 6.8, an author might form this query to ask for a presentation of footage (optimal length of five minutes), relevant in some way to a departmental supervisor of a graduate student associated with the Computer Graphics research laboratory, and so on. Obviously, the full power of intent-based authoring is not realizable in the prototype without some facility with Prolog, and with the underlying reasoning and representation methodology; this is why we expect the user testing of Valhalla to make use of the Show button, which abstracts away from these complexities.
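To make the length constraint in the query of Figure 6.8 concrete, the following Python sketch chooses, from clips relevant to a topic, the combination whose total running time is closest to a target. It is a sketch under stated assumptions, not the Reasoner's method: the clip records, the function name `closest_edit_list` and the exhaustive search are all hypothetical, and the actual semantics of the `costLength` predicate are not given in this section.

```python
from itertools import combinations

# Hypothetical annotation records: name, running time in seconds, topics.
CLIPS = [
    {"name": "a", "seconds": 120, "topics": {"graphics"}},
    {"name": "b", "seconds": 200, "topics": {"graphics"}},
    {"name": "c", "seconds": 90, "topics": {"graphics", "vision"}},
    {"name": "d", "seconds": 300, "topics": {"robotics"}},
]


def closest_edit_list(clips, topic, target_seconds):
    """Return the relevant clips whose total length is closest to the target.

    Exhaustive search over subsets: fine for a handful of clips, and chosen
    here only because it is easy to check by hand.
    """
    relevant = [c for c in clips if topic in c["topics"]]
    best, best_gap = [], target_seconds  # the empty list misses by the full target
    for size in range(1, len(relevant) + 1):
        for combo in combinations(relevant, size):
            gap = abs(sum(c["seconds"] for c in combo) - target_seconds)
            if gap < best_gap:
                best, best_gap = list(combo), gap
    return best


chosen = closest_edit_list(CLIPS, "graphics", 300)
print([c["name"] for c in chosen], sum(c["seconds"] for c in chosen))
```

With these made-up records, clips b and c are selected (290 seconds), since no other combination of graphics clips comes closer to 300 seconds.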
The system has been tested with the Departmental Hyperbrochure, introduced in Section 1.4. Potential viewers of the material are prospective and current graduate and undergraduate students, faculty and staff, funding agencies and industrial collaborators. All these are potential users of Valhalla, and each brings idiosyncratic goals and interests that the system attempts to meet with tailored presentations.

[Figure 6.9: Knowledge-based Video Annotation and Presentation. Labelled components: Author, Structured Video Document, Annotation, Viewer.]

The way the intent-based approach is mapped into the video authoring domain is shown schematically in Figure 6.9: the author brings an intent, and information he or she thinks will be relevant to the eventual presentation. After annotation, a representation of this intent, and a set of indices into the raw video reside in a "document." This is all done at compile-time, in the absence of the viewer. Later, at run-time, the reasoner uses the document, along with the user model and other knowledge, to produce an edit-list. The happy viewer, even in the absence of the author, sees only relevant portions of the raw video.

Although the current implementation has the Reasoner acting as a server to the client Interface, the simple protocol underlying communications between these agents permits the Reasoner to view the Interface as just another Prolog predicate: an "ok" from the Interface (caused by successful completion of a presentation to the user) allows successful completion of the Reasoner's evaluation loop; any other message from the Interface (caused, for instance, by the user expressing dissatisfaction with the current presentation by clicking on the No! button in the Control window) forces the Reasoner to initiate a Prolog backtrack to find other solutions. Clicking on the Show button in the first place initiates calculation of the first presentation, using only prior probabilities for recognition assumptions.
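The "ok or backtrack" protocol described above can be mimicked outside Prolog with a lazy generator. The Python sketch below is an analogy rather than the Reasoner's code: the candidate edit-lists, the function names and the scripted interface replies are all hypothetical. Each candidate is produced only when the previous one has been rejected, which is the role backtracking plays in the prototype.

```python
def candidate_presentations():
    """Yield hypothetical edit-lists, best first, one at a time (lazily)."""
    yield ["overview", "graphics-lab", "closing"]
    yield ["overview", "robotics-lab", "closing"]
    yield ["overview", "theory-group", "closing"]


def present_until_accepted(candidates, show):
    """Offer candidates in order; stop at the first one the interface accepts.

    `show` stands in for the Interface: it returns "ok" when the viewer lets
    the presentation finish, and anything else when the viewer objects.
    """
    for presentation in candidates:
        if show(presentation) == "ok":
            return presentation
        # Any other reply plays the role of failure: try the next solution.
    return None  # no candidate was accepted


shown = []
replies = iter(["no", "ok"])  # scripted viewer: rejects once, then accepts


def scripted_interface(presentation):
    shown.append(presentation)
    return next(replies)


accepted = present_until_accepted(candidate_presentations(), scripted_interface)
print(accepted, len(shown))
```

In this run the third candidate is never computed, because the second is accepted.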
Scenario: Company Meeting

Company XYZ holds a meeting to decide whether or not to build a new production facility. The meeting is supported by advanced Group Support System (GSS) tools, and complete records (minutes, keystroke logging, video, etc.) are kept. Someone subsequently gets the job of creating a presentation of the meeting in order to document and justify the process by which the decision to build the new plant was reached.

The intent of the ensuing presentations is to convince the viewer that the decision was well motivated; different viewers get different renditions of the argument. The marketing manager, for instance, gets a tailored account that focusses on sales forecasts for new plant operations. The accounting manager sees a spreadsheet with emphasis on cash-flow and detailed cost-benefit analyses. Potential investors see glossy images and video backup, along with statistics demonstrating improved delivery schedules, but nothing that would be useful to a competitor.

Any individual viewer of one of these presentations might be inspired to query the system for further information on any subject. He thereby becomes an author in his own right and specifies his own intent. A customer, for instance, might want to see how the decision to build the new plant will affect prices. He authors his own presentation of this information; whether or not the intent of the original author (to convince) plays any role in this last presentation is an ethical and practical issue of some interest, but unfortunately outside the scope of this thesis.

An example: InterSpect Consulting Corp.'s 1999 Yearly Report includes the following fiscal information (in millions of dollars):

• Land usage costs: 12
• Data storage costs: 75
• Doughnuts and coffee: 105
• Communications costs: 23

The presentation system has available to it knowledge about graphical presentation strategies and languages [129] [194], and reasons its way to an optimal presentation for the president of InterSpect based upon this as well as a continually updated model of the president's goals, beliefs, desires and preferences. If the author's intention in preparing the year-end report was to fairly and accurately portray the company's expenses, the president might see a simple bar chart in which the suspiciously high doughnut and coffee bill would be immediately apparent. If the author's intention was actually an elaborate argument for even more resources to be allotted to the doughnut and coffee account, such a presentation element may not be prudent. Instead, another graphic would be designed at presentation time in which the offending quantity is de-emphasized, taking full advantage of the equipment available at run-time.

Part IV

Conclusions

Chapter 7

Conclusions

The whole duty of a writer is to please and satisfy himself, and the true writer always plays to an audience of one.
(The Elements of Style, page 85.)

Most systems which are currently available to support authoring inherit the limitations of the traditional model of authoring, which were presented at length in Section 1.1 and recapped in Section 5.2. The foremost such limitation stems from the early commitment to content, which makes it impossible to provide user-tailored presentations at run-time. The problems are exacerbated in the video medium because it is temporally linear (and because humans have so little time), and because current techniques for automatic speech and visual recognition leave the contents uninterpreted.
A knowledge-based solution called intent-based authoring was described in Chapter 5 which mitigates the serious effects of these problems, by separating intent from content, analogously to how the structured document paradigm now separates content from form. The separation of intent from content and form enforced by the intent-based authoring paradigm frees the intent-based author to work at compile-time in the absence of the viewer to specify the intent, or communicative goal underlying the eventual presentation. A run-time presentation system then makes the presentation by first deriving a model of the viewer and then tailoring a presentation that meets the author's objectives as given in the specification of intent, and the interests of the user as represented in the model. The entire framework is represented in a Horn clause logic dialect, described in Section 5.4, which supports the recognition of user models, and the design of presentations, by abduction.

Other components of the framework described include a scrutability desideratum, to offset the possibility that the model contains errors. This desideratum requires that users be shown relevant aspects of the user model, so that they can understand the (mis)behavior of the system. Because the model may be very large, only a salient subset can be shown in general; this subset is determined by a sensitivity analysis, described in Section 5.5.

A prototype called Valhalla was implemented and tested, which manifests the intent-based authoring framework.

The central focus of this project continues to be the deployment of artificial intelligence techniques for user modelling. This work is performed within the limits of what has been called "minimal AI" to explore the simplest useful applications of probability, logic and decision theoretic reasoning strategies to the problems of modelling users of computer systems.
The approach is very simple, based as it is upon well-tested notions from decision theory and the AI literature.

The intent-based authoring paradigm described in this thesis can be applied to different media, domains, and tasks. It has potential to circumvent limitations of the traditional model of authoring. Intent-based authoring requires the application of computational intelligence, a prospect which is only today becoming realistic; the future of authoring looks promising from this vantage point.

7.1 Implications

Authoring is hard, which is why technology has been used in the past to aid authors with their toil, and why this research has been undertaken to investigate the issues surrounding the task. However, it seems on the surface that intent-based authoring as advocated in this thesis is hard too, and harder than traditional authoring, which is at least well understood, if limited in the aforementioned ways. So why should modern-day authors engage in the intent-based authoring paradigm? In addition to the reasons already discussed (cf. individualized presentations for individuals), there is the reuse factor.

Intent-based schemata, once authored, can be reused. Knowledge, once appropriately indexed, can be reused, often in unanticipated ways. The principle of code re-use in software engineering is motivated in the same way [119]. While it is true today that an author will need to invest a great deal more work into the specification of an intent-based presentation than in the design of a traditional book or conference paper, the intent-based author and reader benefit from a more effective deliverable: the dynamic, intent-based document.¹

7.2 Future Work

Intent-based authoring captures the notion that authoring is distinct from viewing/reading, that the design of the document is not the same as its presentation.
These processes have been separated and the functions of distinct sources of knowledge and information elaborated in the authoring and reading cycles. The author provides some of the information to be presented, and describes his intention as regards this information, but we can go further.

¹ Intent-based authors of tomorrow will benefit from the presentation strategies and knowledge bases of all of their forebears, and, one day, the n-th intent-based presentation may take less effort from a human author than the corresponding traditional presentation would have.

The authoring system can support the author in this design or specification task by referring to knowledge sources that may prompt further elaboration on the part of the author. If the authoring system knows, for example, that the presentation facilities include video imaging capabilities, it may prompt the author to provide video information, if available. The authoring system may even have a partially instantiated model, or an aggregate model of the set of intended viewers of the presentation; this too may solicit additional contributions from the author.

Viewing, in turn, can be mediated by these same information sources. The model of the reader is consulted at run-time to design the presentation, and specific presentation elements are determined then by available resources: if the color printer is down, the presentation may have to be optimized for delivery on a black-and-white printer; presentations on an interactive display might differ significantly from those destined for hard-copy devices.²

The effectiveness of these and other techniques will be evaluated in future work. Empirical testing of the Valhalla interface is being undertaken to see if the user modelling techniques it encapsulates help users accomplish certain well-defined information retrieval tasks, as it is believed they will.
Empirical studies will also yield insight into the notion of perceptual salience advanced in this dissertation.

² The WIP system uses just this approach; combined with the intent-based paradigm, it is a promising direction. See also the work of Mackinlay [129].

7.2.1 Learning: Updating Prior Probabilities

It is possible within the representation described in this thesis (see Section 5.4) to encode dependencies between random variables [161]. We could render explicit the relationship between sex and category, for instance, and continue to add other dependencies we consider important. The problem is that there may be many such relationships, and it will be difficult to decide which are important enough to encode, and which can be safely ignored. Not only is this a knowledge engineering task which it is desirable to avoid, but the computational complexity of the resulting Bayesian network rises dramatically with the number of arcs representing dependence.

Instead, probability values can be obtained from an episodic knowledge base (EKB) [53] that tracks the user population over time. Values thus obtained will reflect actual relationships in the user population, with a computational complexity much lower than that for a complete Bayesian analysis, and the significant costs of error-prone knowledge engineering are completely avoided.

At the end of a session, the EKB is supplied with a new data point, a new individual; the information supplied is in fact the current, most likely user model, consisting of the highest-probability assumptions. This data point in the EKB can move over time, with continued experience with the same user.

When the reasoning agent requires a prior value, a call could be made to the EKB (as a procedural attachment, perhaps). P(male) would be originally supplied by the EKB as simply the number of individuals in the EKB who are male, divided by the total number of individuals.
Later, after the user has indicated that he or she is a faculty member, the EKB would be queried for the value of P(male | faculty). This value is the number of individuals in the EKB who are male faculty members, divided by the number of individuals who are faculty members.

The EKB could be used as a kind of "user model server," a repository of information about the user population that can be queried by all applications serving that population. Multiple local EKBs could be located within an institution to serve different subsets of the complete user population, and global EKBs could serve queries over the entire population by querying all of the relevant local EKBs. The EKB would function as a separate agent, communicating with other agents via TCP/IP. Craddock [53] describes an EKB that might be employed in the manner suggested here, and Orwant [154] [155] has recently implemented a system called Doppelgänger that uses a similar mechanism.

Since the information in the EKBs is intended to be persistent, it is a design issue of some social impact whether the individuals stored in the EKB should be identifiable. If they are, then persistent models of individuals would result in highly accurate probabilities at the start of a new session with a user that had already used the system, or who had used another system served by the same EKB. If not, then the system must begin to model the user anonymously, from scratch, albeit with the help of a growing EKB. Individual tracking will keep the number of individuals stored in the EKB down to the number of actual users accessing the systems served by the EKB, while anonymous storage will result in a new entry in the EKB for every session by every user at every system served by the EKB. There are clearly significant advantages to individual tracking, but these must be weighed against its (still unclear) social repercussions (but see Section 7.2.2).
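The counting scheme for P(male) and P(male | faculty) described above is simple enough to state in a few lines. The Python sketch below is illustrative only: the four stored individuals are invented, the attribute names follow the labels visible in the user model window, and the function `probability` is a hypothetical stand-in for whatever query interface an actual EKB would offer.

```python
# A toy EKB: one record per stored individual (made-up data).
EKB = [
    {"gender": "male", "userType": "faculty"},
    {"gender": "female", "userType": "faculty"},
    {"gender": "male", "userType": "student"},
    {"gender": "male", "userType": "student"},
]


def matches(person, conditions):
    """True if the record agrees with every attribute/value pair given."""
    return all(person.get(key) == value for key, value in conditions.items())


def probability(ekb, query, given=None):
    """Estimate P(query | given) as a ratio of counts over the EKB.

    Returns None when nobody in the EKB satisfies `given`, since the ratio
    is then undefined.
    """
    pool = [p for p in ekb if matches(p, given or {})]
    if not pool:
        return None
    hits = [p for p in pool if matches(p, query)]
    return len(hits) / len(pool)


p_male = probability(EKB, {"gender": "male"})
p_male_given_faculty = probability(EKB, {"gender": "male"}, {"userType": "faculty"})
print(p_male, p_male_given_faculty)
```

Because `given` may hold any number of attribute/value pairs, the same function answers queries conditioned on arbitrary conjunctions of observations, at the cost of a scan over the stored individuals.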
There is an unexplored relationship between the notion, well-entrenched in the user modelling literature, of user stereotypes [169] [45] [44] and the EKB approach to determination of prior probabilities. This relationship is hinted at by Rich: "... stereotypes can be viewed as concepts, and then they can be learned with statistical concept-learning methods." [171] The current discussion can be seen in just this light.

Once something is known about the user, this information can be used as in conventional stereotyping approaches to "trigger" the application of a stereotype. Say, for instance, that the user has indicated, or we have reasoned by plan recognition, that he or she is a faculty member. The traditional approach to the use of stereotypes would have us apply the 'faculty-member stereotype' and attribute thereby to the user the defaults therein. The approach described in this thesis computes other consistent aspects of the user model as needed, using best-first abduction. The EKB approach would accomplish the same result, by finding the closest point in the feature space.

As with the approach used in this thesis, there is no need in the EKB approach to foresee all the possible combinations of triggers; the EKB can provide, on demand, a 'stereotype' for any conceivable combination of 'triggers.' Thus, we may encounter, in the course of interaction with the user population, an individual who is both a faculty member and a smoker, as well as an avid baseball fan. Although multiple inheritance from separate stereotypes for faculty, smoker and baseball can accommodate this particular example, considerable attention must still be given to preference for attribute values when these disagree between stereotypes. The EKB is immune from these problems, and provides values for any conjunction of attributes on demand.
A simpler approach

In this subsection a simpler approach is described that does not require the mechanism of an EKB, but which does not capture all the dependencies that may exist in a user population.

The knowledge engineer can "seed" the representation of prior probabilities. For instance, he or she can set the value of the random variable describing whether a user is a student or faculty member as follows: {(student, a/c), (faculty, b/c)}; instead of storing priors as single real numbers, he or she can store pairs of numerators and denominators whose quotients are the priors. The value of the denominator can be interpreted as the size of the sample space from which the prior is determined, and the numerator as the number of positive instances found in that sample.

Over usage, additional instances are encountered, which can be seen as extending the sample space, and consequently having an effect on the values of the priors. A positive instance increments the numerator of the corresponding alternative, and the denominator of all alternatives.

There are obviously many pairs of numbers which, when divided, result in the same quotient. For instance, there are many a, b such that a/b = 0.5, but the magnitudes of the numbers chosen to represent the priors influence their sensitivity to new information. The larger the denominator applied to the representation of a random variable, the more confidence implied, and the less sensitive it will be to observations. For example, a prior of 0.5 stored as 1/2 becomes 2/3 after a single positive instance, whereas the same prior stored as 50/100 becomes only 51/101. (See [201], cf. Dirichlet Priors.)

7.2.2 Privacy

The text of this thesis has consistently evaded the serious issue of privacy, with all of its complex social and ethical ramifications. Though outside the scope of the current work, any serious elaboration or broader application of the technology of user modelling will bring attention to the matter.

Obvious questions include: Who or what has access to the models? Are the models anonymous, or are they individual?
Are they persistent, or discarded after each session with the system? Is the user aware of the modelling activity, and has he or she consented to such activity? The design of any user modelling system must necessarily answer these and other questions.

In this research, the model is derived during the user's session, is available only to the reasoning engine and the user, and is discarded when the user terminates the session. All users of the system are made aware of the nature and scope of the modelling activity taking place, and they are, because of the scrutability desideratum advanced in Section 5.5, able to investigate the model itself via the user model window of the application interface.

As mentioned in Section 7.2, there is considerable advantage to maintaining persistent, individual models of users. If users return for multiple sessions, the modelling activity can begin where it left off at the end of the last session, thereby making the user's time at the interface more efficient, and more effective. Even anonymous models can have this salutary effect, to a lesser degree.

User models are likely to become important elements of future applications, as well as marketable commodities in themselves, very much like conventional mailing lists are today. Today, it is polite and correct to ask people if they wish to be on a list before it is released and used to send them solicitations. Some people feel that their membership in a mailing list is to their advantage, others feel it is an intrusion into their private lives; this dichotomy will likely apply to the use of user models as well, but with greater potential, perhaps, for misuse.
Conceivably, models acquired over long periods could contain extremely detailed information about interests, habits and beliefs of individuals which, although valuable in the service of a cooperative application interface, could result in adverse effects ranging from unsolicited (though hopefully well-targeted) junk mail through to discrimination based on any of the many attributes of the model.

The aforementioned concerns are addressed by one of the objectives of the privacy protection policy enunciated by the privacy protection study commission [2], referred to as the principle of maximizing fairness. Government databases are subject to these and other guidelines, but no laws have yet been passed in North American courts to protect the rights of individual users of computer systems (see Rosenberg [174] for a broader view of privacy considerations in legal and social contexts).

Solutions to the problems created by the use of persistent, individual models could take many directions. Users can be given control over access to their models by having them specify which attributes or groups of attributes should be made available to which services and at what degree of authentication [58]. Users could maintain physical control of their models by carrying them on smart cards, PCMCIA cards (as in Doppelgänger [155]) or similar technology. Advanced encryption techniques with multiple levels of security may become accepted for networked transactions involving user models.

Privacy is an important issue that must be addressed not only from legal and ethical imperative, but because users themselves will be reluctant to use systems that do not treat sensitive personal data with due attention. Emerging secure technologies will hopefully be up to this task.

7.2.3 Future Development

A number of extensions to the hyperbrochure client are being planned.
In addition to the knowledge gained directly via the user model window, details of the user's interaction with a presentation are provided to the reasoning engine. The reasoner does not currently use this information, but could bring resources to bear on it. For instance, if a user uses the navigation buttons to skip the remainder of a particular clip, it may be deduced that the user has no interest in the contents of that clip, and the user model can be updated accordingly (cf. plan recognition).

As was previously mentioned, items in the user model window are chosen based on their degree of sensitivity. Sensitivity analysis is defined (see Definition 9) as the act of determining how much a presentation will be changed when the user modifies a given set of assumptions in the user model. Given a set of user model items, sensitivity analysis currently provides a quantitative measure of these items, which is used by the client application to present the user model items in a reasonable, sorted order.

Other visual cues from the reasoning engine can also be considered. Linking the sensitivity metric with color (for instance) might help to persuade the user to notice and correct faulty assumptions. As an example, the client application might choose to present all assumptions with a high sensitivity in bright red, thereby providing the suggestion of uncertainty or danger. These perceptual salience metrics need to be determined by empirical study.

One of the goals of the intent-based authoring paradigm was to save time and effort by reducing the impact of the temporal nature of the video medium. However, viewers must still watch entire presentations in order to gauge their relevance and provide useful feedback to the reasoning engine. If the temporal portions of presentations could be summarized in a non-temporal format, the viewer may be able to form an opinion of the presentation in a more timely manner.
One possibility is to construct a graphical representation of the presentation and use a fisheye view (Noik [150]) of the representation in order to highlight the relevant features. If the user is in the process of browsing multiple presentations, the differences between presentations may be candidates for relevance.

The current system does not take into account what the user has already seen; there is no notion of "viewing history," but something like it could be added. Users could then be given summaries of clips they had already seen, or shown relevant alternatives instead to avoid any form of repetition and boredom.

The system could be pressed into service in at least two different ways. First, after the fashion of information kiosks, Valhalla could be used on-line to provide information at varying levels of detail from overview to in-depth, in accordance with the dynamically evolving model of the information-seeking user. Second, after the fashion of a montage table, Valhalla can support a (human) video editor in the preparation of a video presentation that may be intended for a third party.

These usage styles are distinguished for a number of reasons. Information seekers will be limited in the time available, so the system must be real-time. The quality of the presentation is not paramount, as it will be seen only once by a single viewer. Response must be very good, however, if the typical kiosk user is to be expected to wait for the presentation, let alone watch it. In this usage pattern, the system is modelling the user-viewer.

Editors may be willing to wait for the system to search huge video indices and knowledge bases if the result is better content selection, or more consistent presentation style. The effort will pay off with multiple viewings by multiple viewers. In this usage pattern, the system is modelling the editor's model of the eventual group of users.
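The sensitivity-ordered user model window discussed earlier in this section can be illustrated with a small sketch. It is not the Valhalla implementation: the `planner` callable, the boolean assumption dictionary, the set-difference measure and the `red_threshold` value are all assumptions introduced here for illustration, and the sketch flips one assumption at a time rather than a set of assumptions as in Definition 9.

```python
from typing import Callable, Dict, List, Tuple

Assumptions = Dict[str, bool]
# Hypothetical stand-in for the reasoning engine: maps assumptions to a clip list.
Planner = Callable[[Assumptions], List[str]]


def sensitivity(planner: Planner, assumptions: Assumptions, item: str) -> float:
    """Fraction of clips that differ when a single assumption is flipped.

    Assumed measure: size of the symmetric difference of the two clip sets,
    divided by the size of their union (0.0 means the presentation is unchanged).
    """
    baseline = set(planner(assumptions))
    flipped = dict(assumptions)
    flipped[item] = not flipped[item]
    alternative = set(planner(flipped))
    union = baseline | alternative
    if not union:
        return 0.0
    return len(baseline ^ alternative) / len(union)


def rank_items(
    planner: Planner, assumptions: Assumptions, red_threshold: float = 0.5
) -> List[Tuple[str, float, str]]:
    """Sort user model items by decreasing sensitivity and attach a colour cue."""
    scored = [(item, sensitivity(planner, assumptions, item)) for item in assumptions]
    # Highest sensitivity first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [
        (item, score, "red" if score >= red_threshold else "black")
        for item, score in scored
    ]


def toy_planner(a: Assumptions) -> List[str]:
    """Toy planner used only to exercise the sketch."""
    clips = ["intro"]
    if a["likes_sports"]:
        clips += ["gym", "pool"]
    if a["has_children"]:
        clips.append("daycare")
    return clips


assumptions = {"likes_sports": True, "has_children": False, "speaks_french": False}
ranked = rank_items(toy_planner, assumptions)
for item, score, colour in ranked:
    print(f"{item:15s} {score:.2f} {colour}")
```

With the toy planner, flipping `likes_sports` removes two of three clips, so that item is listed first and drawn in red, while `speaks_french` has no effect on the presentation and is listed last. Whether red is in fact the right perceptual cue is, as noted above, a matter for empirical study.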
Only the first of these ways of using the system is illustrated in this thesis with a number of realistic usage scenarios.

A new Valhalla client is under development at the Department of Computer Science of the University of British Columbia. It is being implemented by Kurt Hoppe, a master's student there, as a CGI-compliant set of HTML files to be used in conjunction with industry standard World Wide Web browser programs like Netscape and Mosaic. The video server agent has been extended to interact with industry-standard HTTP server programs; in particular, it has been tested with the NCSA 1.3 HTTP daemon, and can deliver digital video to client applications via the daemon. An alternate reasoner is also under development by other researchers in the department, also for use with the Hyperbrochure.

Postscript

Content is everywhere. There is a widespread media-induced misconception that a lack of content lies behind current disenchantment with the world wide web, and that the remedy is in the hands of Hollywood conglomerates who will spew out incredible volumes of this content. What is in fact missing is a recognition of the primacy of intent. There are not enough authors in existence to tailor the presentation of all that content for individual readers. This is why intent-based authoring is pursued in this dissertation, and why the world wide web, which connects authors and readers as never before in human literacy, will one day become a delivery mechanism not for content, but for intent.

Bibliography

[1] The Role of Intelligent Systems in the National Information Infrastructure. Artificial Intelligence Magazine, 16(3), 1995.
[2] Public Law 93-579. Section s.2(b) of the Privacy Act of 1974, 5 United States Code, section s.552a, 1974.
[3] Robert M. Akscyn, Donald L. McCracken, and Elise A. Yoder. KMS: A Distributed Hypermedia System for Managing Knowledge in Organizations.
Communications of the Association for Computing Machinery, 31(7):820-835, 1988.
[4] Elisabeth Andre, Robin Cohen, Winfried Graf, Bob Kass, Cecile Paris, and Wolfgang Wahlster. Proceedings of the third international workshop on user modelling. Proceedings D-92-17, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany, August 1992.
[5] Elisabeth Andre and Thomas Rist. Towards a plan-based synthesis of illustrated documents. Research Report RR-90-11, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany, September 1990.
[6] Elisabeth Andre and Thomas Rist. The design of illustrated documents as a planning task. In M. Maybury, editor, Intelligent Multimedia Interfaces. AAAI Press, 1993.
[7] J. Andre, R. Furuta, and V. Quint, editors. The Cambridge Series on Electronic Publishing. Cambridge University Press, 1989.
[8] Douglas E. Appelt and Martha E. Pollack. Weighted abduction for plan ascription. User Modeling and User-Adapted Interaction, 2(1-2):1-25, 1992.
[9] Apple Computer, Inc., 20525 Mariani Ave., Cupertino, CA 95014. HyperCard User's Guide, 1987.
[10] Yigal Arens and Eduard Hovy. How to describe what? towards a theory of modality utilization. In Cognitive Science Society Meeting, Cambridge, MA, August 1990.
[11] Yigal Arens, Eduard Hovy, and Susanne van Mulken. Structure and Rules in Automated Multimedia Presentation Planning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1253-1259, Chambery, France, September 1993.
[12] Paul Van Arragon. User modeling bibliography. Technical Report CS-87-22, Department of Computer Science, University of Waterloo, March 1987.
[13] Paul Van Arragon. Nested Default Reasoning for User Modelling. PhD thesis, University of Waterloo, 1990.
[14] J. L. Austin. How to do Things with Words. Oxford University Press, 1962.
[15] Kent Bach and Robert M. Harnish. Linguistic Communication and Speech Acts. MIT Press, Cambridge, MA, 1979.
[16] Afzal Ballim. ViewFinder: A Framework for Representing, Ascribing and Maintaining Nested Beliefs of Interacting Agents. PhD thesis, l'Universite de Geneve, 1992.
[17] Afzal Ballim and Yorick Wilks. Beliefs, stereotypes and dynamic agent modelling. User Modelling and User-Adapted Interaction, 1(1):33-65, 1991.
[18] Roland Barthes. The death of the author. In Image, Music, Text. Hill and Wang, New York, NY, 1977.
[19] L. Bartram, R. Ovans, J. Dill, M. Dyck, A. Ho, and W.S. Havens. Contextual assistance in user interfaces to complex, time-critical systems: The intelligent zoom. In GI '94: Graphics Interface 1994, Banff, AL, Canada, May 1994.
[20] M. Bauer, S. Biundo, D. Dengler, M. Hecking, J. Koehler, and G. Merziger. Integrated plan generation and recognition. Research Report RR-91-26, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany, August 1991.
[21] Richard J. Beach. Setting tables and illustrations with style. CS 45, University of Waterloo Computer Science Department, Waterloo, Canada, May 1985. PhD thesis.
[22] Izak Benbasat, Gerardine DeSanctis, and Barrie R. Nault. Empirical research in managerial support systems: A review and assessment, June 1991.
[23] Izak Benbasat, Albert S. Dexter, and Peter Todd. The influence of color and graphical information presentation in a managerial decision simulation. Human-Computer Interaction, 2:65-92, 1986.
[24] Jacques Bertin. La Graphique et le Traitement Graphique de l'Information. Flammarion, 1977.
[25] Jacques Bertin. Graphics and Information Processing. Walter de Gruyter, 1981.
[26] Jacques Bertin. Semiology of Graphics: diagrams, networks, maps. University of Wisconsin Press, 1983.
[27] Sara Bly. Shared workspaces: A look at supporting distributed workgroups.
CICSR invited lecture, September 1991.
[28] Sara A. Bly and Scott L. Minneman. Commune: A shared drawing surface. In Proceedings of the Conference on Office Information Systems, pages 184-192, Boston, MA, April 1990.
[29] Kellogg S. Booth and W. Morven Gentleman. A symbiotic approach to visualization and the user interface. Unpublished, 1990.
[30] G. Brajnik and C. Tasso. A flexible tool for developing user modeling applications with non-monotonic reasoning capabilities. In Proceedings of the Third International Workshop on User Modelling, pages 42-66, August 1992.
[31] Giorgio Brajnik, Carlo Tasso, and Antonio Vaccher. UMT User Manual - Version 1.0, September 1992. Universita degli Studi di Udine, Laboratorio di Intelligenza Artificiale, Dipartimento di Matematica e Informatica, Via Zanon, 6, 33100 Udine, ITALY, 1992.
[32] Gerhard Brewka. Preferred subtheories: An extended logical framework for logical reasoning. IJCAI, 1989.
[33] Vannevar Bush. Pieces of the Action. William Morrow and Company, 1970.
[34] W. Buxton and T. Moran. EuroPARC's Integrated Interactive Intermedia Facility (IIIF): Early Experiences. In S. Gibbs and A. A. Verrijn-Stuart, editors, Multi-user Interfaces and Applications, pages 11-34. 1990.
[35] Sandra Carberry. Modelling the user's plans and goals. Computational Linguistics, 14(3):23, September 1988.
[36] Sandra Carberry. Plan Recognition in Natural Language Dialogue. MIT Press (A Bradford Book), 1990.
[37] T.A. Cargill. A view of source text for diversely configurable software. Technical Report CS-79-28, University of Waterloo Computer Science Department, 1979.
[38] Stephen M. Casner. A task-analytic approach to the automated design of graphic presentations. Association for Computing Machinery Transactions on Graphics, 10(2), April 1991.
[39] Timothy Catlin, Paulette Bush, and Nicole Yankelovich. Internote: Extending a hypermedia framework to support annotative collaboration.
In Hypertext '89 Proceedings, pages 365-378, 1989.
[40] P. Cheeseman. On finding the most probable model. In J. Shranger and P. Langley, editors, Computational Models of Scientific Discovery and Theory Formation, chapter 3, pages 73-95. Morgan Kaufmann, San Mateo, 1990.
[41] P.P. Chen. The entity-relationship model: Toward a unified view of data. ACM Transactions on Database Systems, 1(1):9-36, 1976.
[42] Mourad Cherfaoui and Christian Bertin. Video Documents: Towards Automatic Summaries. In Workshop Proceedings of IEEE Visual Processing and Communications, pages 295-298, Melbourne, Australia, September 1993.
[43] H. Chernoff. The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association, 68:361-368, 1973.
[44] David N. Chin. KNOME: Modeling What the User Knows in UC. In A. Kobsa and W. Wahlster, editors, User Models in Dialog Systems, pages 74-107. Springer-Verlag, 1989.
[45] David Ngi Chin. Intelligent Agents as a Basis for Natural Language Interfaces. PhD thesis, Computer Science Division (EECS), University of California, Berkeley, CA 94720, January 1988.
[46] William S. Cleveland. The Elements of Graphing Data. Wadsworth and Brooks/Cole Advanced Books and Software, Pacific Grove, CA, 1985.
[47] William S. Cleveland. A model for graphical perception, June 1990. Personal correspondence.
[48] William S. Cleveland and Robert McGill. Graphical perception: theory, experimentation and application to the development of graphical methods. American Statistical Association, 79:531-554, September 1984.
[49] E.F. Codd. A relational model of data for large shared data banks. Communications of the Association for Computing Machinery, 13(6):377-387, June 1970.
[50] Robin Cohen and Marlene Jones. Incorporating User Models Into Educational Diagnosis. Technical Report CS-86-37, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario, September 1986. Also published as Chapter 11 of User Models in Dialog Systems, edited by Kobsa and Wahlster (Springer-Verlag, 1989).
[51] Jeff Conklin. Hypertext: An introduction and survey. In Irene Greif, editor, Computer Supported Cooperative Work: A Book of Readings, pages 423-475. Morgan Kaufman, 1988.
[52] R. Cook and J. Kay. The Justified User Model. In Proceedings of the Fourth International Conference on User Modelling, pages 145-150, Hyannis, MA, August 1994.
[53] A. Julian Craddock. Induction and the Reference Class Problem. PhD thesis, University of British Columbia, 1993.
[54] A. Csinger, H. da Costa, and B. Forghani. A general-purpose programmable command decoder. In IEEE Proceedings, Conference Compint, pages 139-41, November 1987.
[55] Andrew Csinger. From utterance to belief via presupposition: Default reasoning in user-modelling. In S. Ramani, R. Chandrasekar, and K.S.R Anjaneyulu, editors, Lecture Notes in Artificial Intelligence, number 444, pages 407-417. Springer-Verlag, 1989.
[56] Andrew Csinger. Implementing a theory of communications in a default reasoning framework. TR 91-30, Department of Computer Science, 1991.
[57] Andrew Csinger. The Psychology of Visualization. Technical Report 28, Department of Computer Science, November 1992.
[58] Andrew Csinger. OpenMed: Open Systems for Secure Health Care Information Transaction. A Joint CANARIE/TAD Proposal submitted by Cyberstore Systems Inc., CHARA Health Care Society, Lions Gate Hospital, Mission Memorial Hospital, InterSpect Systems Consulting Corp., Network Systems Group Inc., Health Informatics Research Group of UBC, September 1995.
[59] Andrew Csinger and Kellogg S. Booth. Reasoning about Video: Knowledge-based Transcription and Presentation. In Jay F. Nunamaker and Ralph H.
Sprague, editors, 27th Annual Hawaii International Conference on System Sciences, volume III: Information Systems: Decision Support and Knowledge-based Systems, pages 599-608, Maui, HI, January 1994.
[60] Andrew Csinger and David Poole. From utterance to belief via presupposition: Default reasoning in user-modelling. In Proceedings of the Conference for Knowledge Based Computing Systems, KBCS-89, pages 408-419, Bombay, India, December 1989. Also appeared in Springer-Verlag Lecture Notes in Artificial Intelligence, number 444, pages 407-417.
[61] Andrew Csinger and David Poole. Hypothetically Speaking: Default Reasoning and Discourse Structure. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1179-1184, Chambery, France, September 1993.
[62] Glorianna Davenport, Ryan Evans, and Mark Halliday. Orchestrating Digital Micromovies. Leonardo, 26(4):283-288, 1993.
[63] Glorianna Davenport, Thomas Aguierre Smith, and Natalio Pincever. Cinematic Primitives for Multimedia. IEEE Computer Graphics and Applications, pages 67-74, July 1991.
[64] P. Edwards, editor. The Encyclopaedia of Philosophy. Macmillan and The Free Press, 1967.
[65] Douglas C. Engelbart. Knowledge-domain interoperability and an open hyperdocument system. In Proceedings of the Conference on Computer-Supported Cooperative Work, October 7-10, 1990 Los Angeles, CA, pages 143-156, 1990.
[66] James T. Enns. The promise of finding effective geometric codes. Conference Presentation, SIGGRAPH'90, 1990.
[67] David W. Etherington. Formalizing non-monotonic reasoning systems. Technical Report 1, University of British Columbia, Vancouver, Canada, V6T 1W5, 1983.
[68] Ryan George Evans. LogBoy Meets FilterGirl: A Toolkit for Multivariant Movies. Master's thesis, MIT, February 1994.
[69] Steven Feiner. Apex: An experiment in the automated creation of pictorial explanations. IEEE Computer Graphics and Applications, 5(11), November 1985.
[70] Steven Feiner. Research issues in generating graphical explanations. In Proceedings Graphics Interface '85, pages 117-122, 1985.
[71] Steven Feiner and Clifford Beshers. Worlds within worlds: Metaphors for exploring n-dimensional worlds. In Proceedings Symposium on User Interface Software and Technology, Snowbird, UT, October 1990.
[72] Steven Feiner, Sandor Nagy, and Andries van Dam. An experimental system for creating and presenting interactive graphical documents. Association for Computing Machinery Transactions on Graphics, 1(1):59-77, January 1982.
[73] Steven K. Feiner and Kathleen R. McKeown. Coordinating text and graphics in explanation generation. In Proceedings AAAI, pages 442-449, Boston, MA, July 1990.
[74] J. J. Finger and M. R. Genesereth. Residue: A Deductive Approach to Design Synthesis. Technical Report STAN-CS-85-1035, Department of Computer Science, Stanford University, Stanford, Cal., 1985.
[75] T.W. Finin. GUMS: A General User Modelling Shell. In A. Kobsa and W. Wahlster, editors, User Models in Dialog Systems, pages 411-430. Springer, 1989.
[76] Gerhard Fischer. Shared knowledge in cooperative problem-solving systems: Integrating adaptive and adaptable systems. In Proceedings of the Third International Workshop on User Modelling, pages 148-161, Dagstuhl, Germany, August 1992.
[77] Gerhard Fischer, Raymond McCall, and Anders Morch. Janus: Integrating hypertext with a knowledge-based design environment. In Hypertext '89 Proceedings, pages 105-117, Pittsburgh, PA, November 1989.
[78] Edward A. Fox. The coming revolution in interactive digital video. Communications of the Association for Computing Machinery, 32(7):794-801, July 1989.
[79] Mark Friedell. Context-sensitive, graphic presentation of information. ACM Computer Graphics, 16(3):181-188, July 1982.
[80] Mark Friedell. Automatic synthesis of graphical object descriptions. ACM Computer Graphics, 18(3):53-62, July 1984.
[81] Hector Geffner.
Default reasoning, minimality and coherence. In KR89, page 137, Toronto, Canada, May 1989.
[82] Ricki Goldman-Segall. Thick Descriptions: A Tool for Designing Ethnographic Interactive Videodiscs. SIGCHI Bulletin, 21(2), 1989.
[83] Ricki Goldman-Segall. A Multimedia Research Tool for Ethnographic Investigation. In I. Harel and S. Papert, editors, Constructionism. Ablex Publishing Corporation, Norwood, NJ, 1991.
[84] Robert C. Goldstein, V. Srinivasan Rao, and Andrew W. Trice. MeetingPlace: The Customizable Group Support Environment. Technical Report 93-MIS-006, Faculty of Commerce and Business Administration, University of British Columbia, 1993.
[85] Bradley A. Goodman. Multimedia Explanations for Intelligent Training Systems. In Mark T. Maybury, editor, Intelligent Multimedia Interfaces, chapter 7, pages 148-171. AAAI Press - MIT Press, 1993.
[86] Bradley A. Goodman and Diane J. Litman. On the interaction between plan recognition and intelligent interfaces. User Modeling and User-Adapted Interaction, 2(1-2):83-115, 1992.
[87] Irene Greif, editor. Computer Supported Cooperative Work. Morgan-Kaufmann, 1988.
[88] Steven Gribble, Andrew Csinger, and Kellogg S. Booth. A Distributed Multimedia Architecture for Intent-based Video Authoring and Presentation. In Proceedings of MultiComm'94, Vancouver, Canada, November 1994.
[89] H.P. Grice. Logic and conversation. In P. Cole and J.L. Morgan, editors, Syntax and Semantics: Speech Acts, vol 3, pages 47-58. Academic Press, New York, 1975.
[90] Georges Grinstein, Ronald Pickett, and Marian G. Williams. Exvis: An exploratory visualization environment. In Graphics Interface '89, London, Canada, 1989.
[91] B.J. Grosz and C.L. Sidner. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3):175-204, 1986.
[92] Bernard J. Haan, Paul Kahn, Victor A. Riley, James H. Coombs, and Norman K. Meyrowitz. Hypermedia services.
Communications of the Association for Computing Machinery, 35(1), January 1992.
[93] Frank G. Halasz. Reflections on notecards: Seven issues for the next generation of hypermedia systems. Communications of the Association for Computing Machinery, 31(7):836-852, 1988.
[94] Lynda Hardman, Guido van Rossum, and Dick C. A. Bulterman. Structured Multimedia Authoring. In Proceedings ACM Multimedia 93, pages 283-289, August 1993.
[95] Beverly L. Harrison and Ronald M. Baecker. Designing Video Annotation and Analysis Systems. In Graphics Interface '92 Proceedings, pages 157-166, Vancouver, BC, May 1992.
[96] Rich Helms. Distributed Knowledge Worker (DKW): A Personal Conferencing System. In Proceedings of the 1991 CAS Conference, pages 115-125, Toronto, Canada, October 1991. IBM Canada, Ltd. Laboratories, Center for Advanced Studies.
[97] Tyson R. Henry and Scott E. Hudson. Interactive graph layout. In Proceedings User Interface Software and Technology '91. ACM Press, 1991.
[98] Xueming Huang, Gordon I. McCalla, and Jim E. Greer. Student model revision: Evolution and revolution. In Proceedings of the Eighth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 98-105, 1990.
[99] Hiroshi Ishii. Teamworkstation: Towards a seamless shared workspace. In Proceedings of the Conference on Computer-Supported Cooperative Work, October 7-10, 1990 Los Angeles, CA, pages 13-26, 1990.
[100] Hiroshi Ishii and Naomi Miyake. Toward an open shared workspace: Computer and video fusion approach of teamworkstation. Communications of the Association for Computing Machinery, 34(12):37-50, December 1991.
[101] P. N. Johnson-Laird. Mental models of meaning. In Joshi, Webber, and Sag, editors, Elements of Discourse Structure. Cambridge University Press, 1981.
[102] Phillippe Joly and Mourad Cherfaoui. Survey of automatical tools for the content analysis of video.
IRIT 93-36-R, Bibliotheque de l'IRIT, UPS, 118 route de Narbonne, 31062 TOULOUSE CEDEX, 1993. Also available by anonymous FTP from ftp.irit.fr in PostScript, ascii and MS Word formats as private/svideo.[ps,as,wd], or by email direct from the authors (cherfaoui@ccett.fr or joly@irit.fr).
[103] Marlene Jones and David Poole. An expert system for educational diagnosis based on default logic. In Proceedings Expert Systems and their Applications, pages 673-683, 1985.
[104] William P. Jones. How do we distinguish the hyper from the hype in non-linear text? In INTERACT87, pages 1107-1113, 1987.
[105] Peter Karp and Steven Feiner. Issues in the automated generation of animated presentations. In Proceedings Graphics Interface, pages 39-48, Halifax, May 1990.
[106] Peter Karp and Steven Feiner. Automated Presentation Planning of Animation Using Task Decomposition with Heuristic Planning. In Graphics Interface '93, pages 118-127, Toronto, Canada, May 1993.
[107] Robert Kass and Tim Finin. Modelling the user in natural language systems. Computational Linguistics, 14(3):5, September 1988.
[108] Robert Kass and Irene Stadnyk. Using user models to improve organizational communication. In Proceedings of the Third International Workshop on User Modelling, pages 135-147, Dagstuhl, Germany, August 1992.
[109] Henry A. Kautz. A formal theory of plan recognition. Technical Report 215, Dept. of Comp. Sci., University of Rochester, Rochester, NY 14627, 1987.
[110] A. Kay and A. Goldberg. Personal dynamic media. IEEE Computer, 10(3):31-41, March 1977.
[111] J. Kay. um: A toolkit for user modelling. In Proc. of the Second International Workshop on User Modelling, pages 1-11, Honolulu HI, 1990.
[112] D. Knuth. The TeX Book. Addison-Wesley, Reading, MA, 1984.
[113] A. Kobsa. Modeling the user's conceptual knowledge in bgp-ms, a user modeling shell system. Computational Intelligence, 6:193-208, 1990.
[114] A.
Kobsa. Towards inferences in bgp-ms: Combining modal logic and partition hierarchies for user modeling. In Proceedings of the Third International Workshop on User Modelling, Dagstuhl, Germany, 1992.
[115] Alfred Kobsa. User modelling: Recent work, prospects and hazards. In Proceedings of the Workshop on User Adapted Interaction, Bari, Italy, May 1992. Also available as a June 1992 Technical Report from Universitat Konstanz Informationswissenschaft.
[116] Alfred Kobsa and Wolfgang Wahlster, editors. User Models in Dialog Systems. Springer-Verlag, 1989.
[117] Kurt Konolige. A computational theory of belief introspection. In IJCAI85, pages 502-508, 1985.
[118] Robert R. Korfhage. Intelligent information retrieval: Issues in user modelling. Technical Report 85-CSE-9, Dept. of Computer Science and Engineering, Southern Methodist University, Dallas, Texas, May 1985.
[119] C.W. Krueger. Models of reuse in software engineering. Technical Report CMU-CS-89-188, Carnegie Mellon U., December 1989.
[120] David Kurlander and Steven Feiner. A visual language for browsing, undoing, and redoing graphical interface commands. In S. K. Chang, editor, Visual Languages and Visual Programming, pages 257-275. Plenum Press, New York, 1990.
[121] L. Lamport. A Document Preparation System. User's Guide And Reference Manual. Addison-Wesley, Reading, MA, 1986.
[122] Brenda Laurel. Issues in multimedia interface design: Media integration and interface agents. In CHI'90 Proceedings, pages 133-139, 1990.
[123] Brenda Laurel. Computers as Theatre. Addison-Wesley, 1991.
[124] Jintae Lee. Sibyl: A tool for managing group decision rationale. In Proceedings of the Conference on Computer-Supported Cooperative Work, October 7-10, 1990 Los Angeles, CA, pages 79-92, 1990.
[125] Francis J. Lim and Izak Benbasat. A communication-based framework for group interfaces in computer-supported collaboration.
In 24th Hawaii Conference on System Sciences, Kauai, Hawaii, January 1991.
[126] Lai-Huat Lim and Izak Benbasat. The effects of group support system on group meeting process and outcomes: A meta-analysis. UBC Working Paper 91-MIS-020, 1991.
[127] W.E. Mackay and D.G. Tatar. Special issue on video as a research and design tool. ACM SIGCHI Bulletin, 21(2), October 1989.
[128] Wendy E. Mackay and Glorianna Davenport. Virtual video editing in interactive multimedia applications. Communications of the Association for Computing Machinery, 32(7):802-810, July 1989.
[129] Jock D. Mackinlay. Automatic Design of Graphical Presentations. Technical Report STAN-CS-86-1138, Stanford University Department of Computer Science, December 1986.
[130] Jock D. Mackinlay. Automating the Design of Graphical Presentations of Relational Information. Association for Computing Machinery Transactions on Graphics, 5(2):110-141, April 1986.
[131] Mitchell P. Marcus. A Theory of Syntactic Recognition for Natural Language. The MIT Press, 1980.
[132] Joseph Marks and Ehud Reiter. Avoiding unwanted conversational implicatures in text and graphics. In AAAI-90, pages 450-456, Boston, MA, July 1990.
[133] Catherine C. Marshall and Peggy M. Irish. Guided tours and on-line presentations: How authors make existing hypertext intelligible for readers. In Hypertext '89 Proceedings, pages 15-26, November 1989.
[134] Christian Metz. Film Language: A Semiotics of the Cinema. Oxford University Press, 1974. Translated by Michael Taylor.
[135] Scott Minneman and Sara A. Bly. Managing a trois: a study of a multi-user drawing tool in distributed design work. In CHI'91 Proceedings, New Orleans, LA, 1991.
[136] Fanya S. Montalvo. Knowledge visualizer: A graphic interface-building tool kit. Technical report, MIT Media Laboratory, 20 Ames Street, Cambridge MA 02139, September 1988.
[137] Fanya S. Montalvo.
Diagram understanding: The symbolic descriptions behind the scenes. In Tadao Ichikawa, Erland Jungert, and Robert R. Korfhage, editors, Visual Languages and Applications. Plenum Publishing Corporation, 1990.
[138] J.D. Moore and C.L. Paris. Planning text for advisory dialogues. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, pages 203-211, 1989.
[139] Johanna D. Moore and Cecile Paris. Exploiting User Feedback to Compensate for the Unreliability of User Models. User Modeling and User-Adapted Interaction, 2(4):287-330, 1992.
[140] Gerald M. Murch. Color graphics - blessing or ballyhoo? In Ronald M. Baecker and William A. S. Buxton, editors, Readings in Human-computer Interaction: A Multidisciplinary Approach, pages 333-341. Morgan Kaufmann, 1985.
[141] Sung Myaeng and Robert Korfhage. Towards an intelligent and personalized information retrieval system. Technical Report 86-CSE-10, Dept. of Computer Science and Engineering, Southern Methodist University, Dallas, Texas, March 1986.
[142] Nicholas Negroponte. Soft Architecture Machines. The MIT Press, Cambridge, Mass., 1975.
[143] Daniel Neiman. Graphical animation from knowledge. In AAAI '82, pages 373-376, 1982.
[144] U. Neisser. Decision time without reaction time: Experiments in visual scanning. American Journal of Psychology, 76:376-385, 1963.
[145] Ted Nelson. A file structure for the complex, the changing, and the indeterminate. In Proceedings of the ACM National Conference, pages 84-100, 1965.
[146] Steven R. Newcomb, Neill A. Kipp, and Victoria T. Newcomb. The "HyTime" Hypermedia/Time-based Document Structuring Language. Communications of the Association for Computing Machinery, 34(11):67-83, November 1991.
[147] Jakob Nielsen. Hypertext and Hypermedia. Academic Press, Inc, San Diego, CA, 1989.
[148] E.G. Noik. Automating the generation of interactive applications. Technical Report 90-21, U.
of British Columbia, June 1990. Master's Thesis.
[149] E.G. Noik. Reducing the validation task by adding conformance at the implementation level. In 24th Annual Hawaiian Intl. Conf. on System Sciences, Kauai, HI, January 1991.
[150] Emanuel G. Noik. Layout-independent fisheye views of nested graphs. In Proceedings IEEE/CS Symposium on Visual Languages, Bergen, Norway, August 1993.
[151] Donald A. Norman. The Psychology of Everyday Things. New York: Basic Books, 1988.
[152] David Novitz. Pictures and their Use in Communication. Martinus Nijhoff, The Hague, 1977.
[153] Margrethe H. Olson and Sara A. Bly. The portland experience: a report on a distributed research group. International Journal of Man-Machine Studies, 34:211-228, 1991.
[154] Jon Orwant. Apprising the User of User Models: Doppelganger's Interface. In Proceedings of the Fourth International Conference on User Modelling, pages 151-156, Hyannis, MA, August 1994.
[155] Jon Orwant. Heterogeneous Learning in the Doppelganger User Modeling System. User Modeling and User-Adapted Interaction, 4(2):107-130, 1995.
[156] Ronald Pickett. Integrated displays of multispectral imagery at air force geophysics laboratory. Draft, March 1991.
[157] Ronald Pickett and Georges Grinstein. Iconographic displays for visualizing multidimensional data. In Proceedings of the 1988 IEEE Conference on Systems, Man and Cybernetics, pages 514-519, Beijing and Shenyang, People's Republic of China, 1988.
[158] David Poole. A logical framework for default reasoning. Artificial Intelligence, 36(1):27-47, 1987.
[159] David Poole. Explanation and Prediction: an Architecture for Default and Abductive Reasoning. Computational Intelligence, 5(2):97-110, 1989.
[160] David Poole. Normality and faults in logic-based diagnosis. In IJCAI, pages 1304-1310, Detroit, MI, August 1989.
[161] David Poole. Probabilistic Horn Abduction and Bayesian Networks. Artificial Intelligence, 64:81-129, 1993.
[162] David Poole, Randy Goebel, and Romas Aleliunas. A logical framework for default reasoning. In The Knowledge Frontier: Essays in the Representation of Knowledge, pages 331-352. Springer Verlag, New York, NY, 1987.
[163] Bhavani Raskutti and Ingrid Zukerman. Query and Response Generation during Information-Seeking Interactions. In Proceedings of the Fourth International Conference on User Modelling, pages 25-30, Hyannis, MA, August 1994.
[164] Susan E. Rathie. SAVANT: Video Annotation Support for Meeting Memory. Master's thesis, University of British Columbia, 1995.
[165] Brian K. Reid. Scribe: A Document Specification Language and its Compiler. PhD thesis, Carnegie-Mellon University, 1980. Also issued as Technical Report CMU-CS-81-100.
[166] Brian K. Reid. Electronic Mail of Structured Documents: Representation, transmission, and archiving. In J. Andre, R. Furuta, and V. Quint, editors, Structured Documents. Cambridge University Press, 1989.
[167] R. Reiter. A logic for default reasoning. Artificial Intelligence, 13(1,2):81-132, 1980.
[168] Ronald A. Rensink and Gregory Provan. The analysis of resource-limited vision systems. In Proceedings of the 13th Annual Conference of the Cognitive Science Society, pages 311-316, 1991.
[169] Elaine Rich. User modelling via stereotypes. Cognitive Science, 3:329-354, 1979.
[170] Elaine Rich. Users are individuals: Individualizing user models. International Journal of Man-Machine Studies, 18:199-214, 1983.
[171] Elaine Rich. Stereotypes and User Modeling. In A. Kobsa and W. Wahlster, editors, User Models in Dialog Systems, pages 35-51. Springer, 1989.
[172] Elaine Rich. Artificial Intelligence. McGraw Hill, 1991.
[173] Thomas Rist and Elisabeth Andre. Incorporating graphics design and realization into the multimodal presentation system wip. In Advanced Visual Interfaces 92, Rome, 1992.
[174] Richard Rosenberg. The Social Impact of Computers. Academic Press, 1992.
[175] Steven F. Roth and Joe Mattis. Data Characterization for Intelligent Graphics Presentation. In CHI'90 Proceedings, pages 193-200, Seattle, WA, April 1990.
[176] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1995.
[177] Sunil Sarin and Irene Greif. Computer-based real-time conferencing systems. IEEE Computer, 18(10):33-45, October 1985.
[178] Margaret H. Sarner and Sandra Carberry. Tailoring definitions using a multifaceted user model. In Proceedings of the Eighth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 106-113, 1990.
[179] Leonard J. Savage. The Foundations of Statistics. Dover Publications, Inc., New York, 1972.
[180] John L. Schnase and John H. Leggett. Computational hypertext in biological modelling. In Hypertext '89 Proceedings, pages 181-197, Pittsburgh, PA, November 1989.
[181] John R. Searle. A Taxonomy of Illocutionary Acts. In Expression and Meaning, pages 1-29. Cambridge University Press, London, 1979.
[182] John R. Searle and Daniel Vanderveken. Foundations of Illocutionary Logic. Cambridge University Press, 1985.
[183] Doree Duncan Seligmann and Steven Feiner. Automated generation of intent-based 3d illustrations. Computer Graphics, 25(4):123-132, July 1991. Proceedings of SIGGRAPH '91 (Las Vegas, Nevada, July 28-August 2, 1991).
[184] David Sewell. The Usenet oracle: Virtual authors and network community, December 1992. To subscribe, send SUB EJRNL your name to LISTSERV@ALBANY.bitnet; to get contents or abstracts of previous issues send: GET EJRNL CONTENTS; to get Volume 1, Number 1, send GET EJRNL V1N1. EJournal is an all-electronic, Matrix distributed, peer-reviewed, academic periodical.
[185] Stuart Smith, R. Daniel Bergeron, and Georges G. Grinstein. Stereophonic and surface sound generation for exploratory data analysis.
In CHI'90 Proceedings, pages 125-132, April 1990.
[186] Oliviero Stock. Natural Language and Exploration of an Information Space: the ALFresco Interactive System. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, 1991.
[187] Oliviero Stock. ALFRESCO: Enjoying the Combination of Natural Language Processing and Hypermedia for Information Exploration. In Mark T. Maybury, editor, Intelligent Multimedia Interfaces, pages 197-224. AAAI Press/MIT Press, 1993.
[188] L. Suchman and R. Trigg. Understanding Practice: Video as a Medium for Reflection and Design. In Greenbaum and Kyng, editors, Design at Work: Cooperative Design of Computer Systems, pages 210-213. 1991.
[189] Markus A. Thies and Frank Berger. Intelligent user support in graphical user interfaces: Plan-based graphical help in object-oriented user-interfaces. Technical report, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany, March 1992.
[190] Anne Treisman. Properties, parts, and objects. In Boff, Kaufmann, and Thomas, editors, Handbook of Perception, volume II, pages 35-1 to 35-70. 1986.
[191] Anne Treisman. Features and objects in visual processing. In Irvin Rock, editor, The Perceptual World: Readings from Scientific American. W. H. Freeman and Company, New York, 1990. Originally November, 1986 Issue of Scientific American.
[192] Anne Treisman, Patrick Cavanaugh, Burkhart Fischer, V.S. Ramachandran, and Rudiger von der Heydt. Form perception and attention: Striate cortex and beyond. In Lothar Spillmann and John S. Werner, editors, Visual Perception: The Neurophysiological Foundations, pages 273-316. Academic Press, 1990.
[193] Anne Treisman and Stephen Gormican. Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95:15-48, 1988.
[194] Wolfgang Wahlster, Elisabeth Andre, Som Bandyopadhyay, Winfried Graf, and Thomas Rist.
WIP: The Coordinated Generation of Multimodal Presentations from a Common Representation. Research Report RR-91-08, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany, February 1991.
[195] Wolfgang Wahlster, Elisabeth Andre, Wolfgang Finkler, Hans-Jürgen Profitlich, and Thomas Rist. Plan-Based Integration of Natural Language and Graphics Generation. Artificial Intelligence, 63(1-2):387-427, October 1993. Also available from DFKI as a technical report.
[196] Wolfgang Wahlster and Alfred Kobsa. User Models in Dialog Systems. Springer-Verlag, 1990.
[197] Colin Ware and John C. Beatty. Using color as a tool in discrete data analysis. CS 21, University of Waterloo Computer Science Department, Waterloo, Canada, August 1985.
[198] Etienne Wenger. Artificial Intelligence and Tutoring Systems: Computational and Cognitive Approaches to the Communication of Knowledge. Morgan Kaufmann, Inc., 1987.
[199] Robert Wilensky, David N. Chin, Marc Luria, James Martin, James Mayfield, and Dekai Wu. The berkeley unix consultant project. Computational Linguistics, 14(4):35-84, December 1988.
[200] Dekai Wu. Active Acquisition of User Models: Implications for Decision-Theoretic Dialog Planning and Plan Recognition. User Modeling and User-Adapted Interaction, 1:149-172, 1991.
[201] Yang Xiang, Michael P. Beddoes, and David Poole. Sequential Updating Conditional Probability in Bayesian Networks by Posterior Probability. In Proceedings of the Eighth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 21-27, 1990.
[202] Nicole Yankelovich, Norman Meyrowitz, and Andries van Dam. Reading and writing the electronic book. IEEE Computer, 18(10):15-30, October 1985.
[203] Elise Yoder, Robert Akscyn, and Donald McCracken. Collaboration in kms, a shared hypermedia system. In CHI'89 Proceedings, pages 37-42, May 1989.
[204] Richard M. Young, T.R.G.
Green, and Tony Simon. Programmable user models for predictive evaluation of interface designs. In CHI'89 Proceedings, pages 15-19, May 1989.
[205] Frank Zdybel, Norton R. Greenfeld, Martin D. Yonke, and Jeff Gibbons. An information presentation system. In IJCAI '81, pages 978-984, 1981.
[206] Polle T. Zellweger. Scripted documents: A hypermedia path mechanism. In Hypertext '89 Proceedings, pages 1-14, November 1989.
[207] Ingrid Zukerman. Content planning based on a model of a user's beliefs and inferences. In Proceedings of the Third International Workshop on User Modelling, pages 162-173, Dagstuhl, Germany, August 1992.

