UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

PyRemote : object mobility in the Python programming language Häggholm, Petter 2007

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-ubc_2007-0422.pdf [ 2.19MB ]
JSON: 831-1.0052061.json
JSON-LD: 831-1.0052061-ld.json
RDF/XML (Pretty): 831-1.0052061-rdf.xml
RDF/JSON: 831-1.0052061-rdf.json
Turtle: 831-1.0052061-turtle.txt
N-Triples: 831-1.0052061-rdf-ntriples.txt
Original Record: 831-1.0052061-source.json
Full Text

Full Text

PyRemote: Object mobility in the Python programming language by Petter Haggholm B.Sc, Bishop's University, 2005  A THESIS SUBMITTED  IN P A R T I A L F U L F I L L M E N T O F  THE REQUIREMENTS  FOR T H E DEGREE OF  Master of Science in T H E F A C U L T Y O F G R A D U A T E STUDIES (Computer Science)  T H E U N I V E R S I T Y OF BRITISH C O L U M B I A August 2007 © Petter Haggholm, 2007  Abstract The current trend in computation is one of concurrency and multiprocessors. Large supercomputers have long been eclipsed in popularity by cheaper clusters of small computers.  In recent years, desktop processors w i t h multiple cores have become  commonplace.  A plethora of languages, tools, and techniques for dealing w i t h  concurrency already exist. Where concurrency and multiprocessors meet modern, highly dynamic languages, however, is uncharted territory. Traditional distributed systems, however complex, tend to be simplified by assumptions of type consistency. Even in systems where types and not merely i n stances and primitive objects can be serialised and distributed, it is usually the case that such types are assumed to be static. The P y t h o n programming language, as an example of a modern language w i t h highly dynamic types, presents novel challenges, in that classes may be altered at runtime, both through the addition, removal, or modification of attributes such as member variables and methods, and through modifications to the type's inheritance hierarchy. T h i s thesis presents a system called P y R e m o t e which aims to explore some of the issues surrounding type semantics in this environment.  ii  Contents  Abstract  ii  Contents  iii  L i s t of Tables  vi  Listings  v u  Acknowledgments  viii  Dedication  Chapter 1 1.1  2.1  2.2  ix  Introduction  1  S e c o n d a r y objective"  Chapter 2  *  2  Related work  Distributed  4  programming languages '  4  2.1.1  Emerald  4  2.1.2  Distributed Smalltalk  5  2.1.3  Erlang  6  2.1.4  RTSync  6  D i s t r i b u t e d t o o l k i t s for P y t h o n  7  iii  2.2.1  PYRO  7  2.2.2  RPyC  9  2.2.3  PyRCall  10  2.2.4  Thoughts on distributed toolkits  10  Chapter 3  3.1  3.2  Python objects  4.2  12  12  3.1.1  PyObject  12  3.1.2  PyTypeObject  14  Name resolution  16  3.2.1  Method resolution order  16  Object mobility design  18  Chapter 4  4.1  The C P y t h o n implementation  PyRemote in action  19  4.1.1  Example 1: Remote calls  19  4.1.2  Example 2: Object mobility  21  4.1.3  Example 3: Object consistency  21  Copying objects  22  4.2.1  23  Immutable objects  4.3  Moving objects  24  4.4  Proxies  25  4.4.1  Client-side proxies  26  4.4.2  Server-side object stubs  28  4.4.3  Method lookup and proxies  30  4.5  Augmentation of Python objects  30  4.6  Type and object consistency  31  iv  4.6.1  Chapter 5 5.1  5.2  5.3  6.2  32  Results a n d evaluation  34  D e s i g n c h o i c e s a n d tradeoffs  35  5.1.1  I m m u t a b l e types  35  5.1.2  P r o x y consistency policies  35  5.1.3  M o v i n g objects  36  5.1.4  GUIDs  37  Local, performance  38  5.2.1  PyBench  39  5.2.2  Pyrex  40  Distributed performance  Chapter 6 6.1  R e a l object, c o n s i s t e n c y  41  Conclusion and discussion  44  T h e c h a l l e n g e of C P y t h o n  44  6.1.1  T y p e equivalence  45  6.1.2  Pickling limitations  46  6.1.3  O l d a n d n e w classes  46  6.1.4  U n s u p p o r t e d operations  47  Future work  47  6.2.1  G a r b a g e collection  47  6.2.2  O b j e c t m o b i l i t y policies  48  6.2.3  Transactional consistency semantics  49  6.2.4  Nameserver  49  6.3  F i n a l thoughts  50  6.4  O b t a i n i n g the c o d e  50  v  List of Tables 5.1  P y B e n c h results  39  5.2  Cost of P y G u i d functions  40  5.3  Pyrex timings  41  5.4  Distributed performance  43  vi  Listings 3.1  PyObject  13  3.2  PyTypeObject  15  4.1  PyRemote example 1: Remote calls  20  4.2  PyRemote example 2: Object mobility  21  4.3  PyRemote example 3: Object mobility  21  4.4  PyObjectProxy_Type  28  4.5  PyObjectProxy  28  4.6  Server side dispatch  29  4.7  New PyObject_HEAD macro  31  4.8  Type reassignment  33  5.1  Pyrex testing code  40  vii  Acknowledgments T h i s space would not be complete without thanks to one's supervisor, and due credit is definitely due to N o r m Hutchinson, who was not only able to draw on his own experiences i n the field to guide me through some of the shoals and reefs as I was trying to navigate the conceptual waters of object mobility, but also understanding during some of the rockier spots of my graduate studies. M a n y thanks also go to Brett A . Cannon, because having a friend and housemate who is a member of the P y t h o n development team was a tremendous help i n getting started and getting over the warts and bumps of the C P y t h o n source. Traditionally, this space is also used to thank one's parents (Thank you, Mum!), but while I don't want to seem filially ungrateful, I'd like to break away from the trend and give some particular credit to those who so rarely get any i n these hallowed thesis spaces: M y grandparents, without whose love and assistance I might never have made it to C a n a d a at all, and my friends (also a distributed system, a n d more fault tolerant than any ever deployed on computers), without whose support I could not have stayed (more or less) sane during the difficult times since, let alone whilst working on m y thesis (if sanity this can be called).  viii  To my dad, who first got me interested in computers. I wish you were here to read this.  Chapter 1  Introduction T h e p r o b l e m of object  d i s t r i b u t i o n a n d d i s t r i b u t e d c o m p u t a t i o n s is i n n o w a y  a  n o v e l o n e . H o w e v e r , few a p p l i c a t i o n s a r e v e r y w i d e l y k n o w n a n d u s e d . T h e r e a s o n s for t h i s are m a n i f o l d . T h e m o s t o b v i o u s a n d c r i t i c a l o n e is t h a t it is a n a r e a r i d d l e d with very  difficult p r o b l e m s  that have p e r h a p s never been solved satisfactorily.  This  thesis d o e s n o t a s p i r e to a d d r e s s t h e s e f u n d a m e n t a l issues o f o b j e c t d i s t r i b u t i o n . H o w e v e r , other reasons w h y d i s t r i b u t e d objects have never 'caught o n ' m a y i n c l u d e t h e fact t h a t m o s t a t t e m p t s at t a c k l i n g t h e p r o b l e m h a v e b e e n e i t h e r v e r y a c a d e m i c i n n a t u r e , o r v e r y f i n e l y t u n e d t o h i g h l y specific a p p l i c a t i o n s . W h i l e o b j e c t d i s t r i b u t i o n h a s b e e n i m p l e m e n t e d i n p r o g r a m m i n g l a n g u a g e s , s u c h as D i s t r i b u t e d S m a l l t a l k o r E m e r a l d , i n t h e p a s t , t h e s e are n o t w i d e l y u s e d l a n g u a g e s a n d so h a v e n o t b e e n a b l e t o h a r n e s s a l a r g e d e v e l o p e r b a s e , n o r t h e benefits o f p r e - e x i s t i n g , p o w e r f u l t o o l s a n d l i b r a r i e s . T h e r e a r e also provide distributed object capabilities.  libraries for  e x i s t i n g l a n g u a g e s t h a t a s p i r e to  H o w e v e r , t h e s e are i n e v i t a b l y r e s t r i c t e d b y  t h e f u n d a m e n t a l l i m i t a t i o n s of t h e l a n g u a g e s , a n d e v e n l a n g u a g e  implementations,  a l o n g w h i c h t h e y are u s e d . T h e P y R e m o t e p r o j e c t a i m s to t a c k l e a n o l d p r o b l e m f r o m a different d i -  1  rection: Neither the creation of a novel language, which—though it may be better designed from the ground up to be compatible with distribution requirements— suffers the inertia of any new programming language, most of which never meet with wide acceptance; nor the addition of a library, which will generally have no syntactic support and be limited by the underlying implementation; but by the modification of an existing, dynamic, widely used programming language, Python, to support the necessary features and capabilities.  1.1  Secondary objective  A secondary objective of the PyRemote project is to address certain limitations within the Python language, particularly in its canonical interpretation; the Python interpreter, written in C (and so usually referred to as CPythori) available from h t t p : / / p y t h o n . o r g . Although the Python standard library contains mechanisms for dealing with multiple threads of control, the ability of a Python program to scale with multiple processors or C P U cores is limited by the global interpreter lock, or GIL, which effectively prevents the underlying operating system from running more than one interpreter thread at once even where multiple execution units are available. Past attempts to eliminate the controversial G I L have been either prohibitively difficult or so inefficient that the effort has been counterproductive. There are also arguments that multi threading is not the optimal form of concurrency, and that a system based on multiple concurrent processes offers many benefits—in terms of stability, in terms of scaling on multi-core or multi-CPU systems where processors may not have symmetrical access to memory, et cetera. Although there is nothing to prevent Python programs from executing in a multi-process context, an obvious and basic requirement to use this mechanism for 2  distributed computation is a method of interprocess communication. Barring the socket interface and a few X M L marshalling libraries, no such method exists in the standard P y t h o n distribution; in particular, there is nothing to provide transparent inter-process communication (which is to say, convenient, consistent w i t h ordinary P y t h o n syntax and semantics, and intuitive). B y providing a location-agnostic method of I P C , the PyRemote system attempts to fill this role, and thereby make it easy—or at least easier—to write P y t h o n programs that scale to take advantage of multiple processors.  3  Chapter 2  R e l a t e d work 2.1 2.1.1  Distributed programming languages Emerald  The Emerald language [1, 2] was developed in the early 1980's with the original goal of addressing shortcomings in the Eden system, which inspired it. Its primary goals were to offer object-based distributed computation with uniform semantics for local and remote objects, and to do so efficiently, in stark contrast to existing systems, where distributed objects tended to be extremely heavy-weight. Emerald achieved its stated goals, and produced several incidental novelties in terms of its type structure, which can be broadly characterised as prototype-based rather than class-based, and its use of type specifications to which objects conformed without concern to internal implementation details (that is, what would today be called interfaces)—it is even argued that true polymorphism is only achieved when a type specification is polymorphic with respect to implementation [2]. The PyRemote project empathically does not aspire in any academic or theoretical way to transcend Emerald, which presented a very clean solution to problems 4  in distribution. Rather, it is one of few known attempts to provide a reasonable subset of its features—uniform local and remote semantics, object mobility, et cetera— in a more widely used programming language; languages born in academia, such as Emerald, may boast pure and elegant solutions to theoretical problems, but are often stillborn in terms of widespread adoption, as any new programming language necessarily lacks comprehensive libraries, platform bindings, and so forth. Emerald was one of the primary inspirations for the PyRemote project.  2.1.2  Distributed Smalltalk  John Bennett's work on Distributed Smalltalk ('DS') [3] represents one of the projects most significant to the PyRemote project, and raised many of the issues that still face designers of distributed systems today. It provides transparent remote invocation, automatic marshalling, object mobility (by means of move and copy primitives), and distributed garbage collection, among many other features. The one important feature that DS appears to lack, as far as [3] is concerned (and in our view), is class sharing: Objects and their classes must be co-resident; an object's class must be present at a node where it is moved; and no support for class sharing is provided, beyond a weak guarantee that identically named classes, if already present on both nodes in a transaction, are compatible. DS was particularly interesting, given the goals of the PyRemote project, in that it too was an endeavour to add distributed computation facilities to an existing language. (It was arguably a simpler task: Smalltalk semantics are explicitly centered around message passing, and the interpreter already had an object reference indirection table.)  5  2.1.3  Erlang  Erlang is a distributed programming language developed by the telecommunications company, Ericsson, to support large-scale, soft realtime applications [4]. Its syntax and semantics, in its 'local' subset, are declarative rather than imperative; it is historically based on Prolog [5], but is in appearance similar to functional programming languages. Concurrency is strictly explicit. An interesting point to note is that declarative languages by 'nature' and by design tend to have very few statements with side effects—which is to say that they tend to modify very few data structures save perhaps by assignment from return values. This also makes them intrinsically easier to modify to be distributed languages: Most of the problems that make distributed programming so hard stem from such side effects and the difficulties of keeping data structures synchronised across peers in a distributed system; non-existent problems in a purely functional language except perhaps in maintaining a namespace hierarchy. Functional subsets of languages are therefore interesting sections to inspect for possible exploitation in parallel and distributed languages.  2.1.4  RTSync  The RTSync project [6] represented an attempt to create a programming language offering soft real-time constraints in a distributed system. It was implemented by Petter Haggholm and Scott Stoddard in the summer of 2004 to support theoretical work by Dr. Stefan D. Bruda. It offered a multi-threaded, actor-based programming environment with object messages passed through synchronizer entities. Programmers can also express end-to-end timing constraints; synchronizer entities schedule message passing in order to satisfy these constraints, and programmers may specify 6  the consequences of timing failures. The RTSync compiler infers, wherever possible, local 'sub-constraints'. Although the language itself (if not the runtime system!) was fairly simple, it is mentioned here as representing the author's earliest experience with implementing a distributed programming language.  2.2  Distributed toolkits for Python  There exist numerous tools for providing distributed computation in Python (in fact, several new ones have emerged since the PyRemote project was begun). Typically, however, they are limited in scope, in particular as pertains to dealing with  types,  and they tend to be written purely in Python. From an implementation point of view this is a considerably simpler task, but it may come at a cost since Python code will tend to run more slowly. (Although it is often said that Python code will run more slowly than C or C+-1- code by more than an order of magnitude, properly written Python can be quite efficient—sometimes even faster than Java and almost as fast as C++, if startup costs are ignored or amortised [7]. However, the issue here is that implementing proxies in Python would involve more lookups and indirections than code that can be injected directly into the type structures of the interpreter.)  2.2.1  PYRO  P Y R O (Python Remote Objects) is a distributed object system written in pure Python [8]. It appears to present a very functional distributed object system (by far the most substantial project we found when we surveyed the field to see what efforts had been made in this area for the Python language). It supports sharing of arbitrary objects, so long as they can be "pickled" by the standard p i c k l e module 7  (for object serialisation), and handles semantics in an intuitive fashion (there are no special restrictions on objects as function parameters, et cetera). Exceptions are handled elegantly: Not only are they propagated to the caller, but the stack trace (attached to Python exceptions) is transmitted as well, to avoid confusing a programmer attempting to debug software with a stack trace in which the 'bottom' frame is a proxy call, with no hint as to the stack on the node where the actual invocation happened. P Y R O provides an event server system (publisher/subscriber) to ease eventdriven programming. There is also support for asynchronous function calls. To provide some security, P Y R O offers connection validators to restrict clients allowed to connect to services, and code validators to control whether mobile code is allowed from any given client. It should be noted, however, that Python itself is not a secure programming language.  Limitations P Y R O has two different kinds of proxies: 'Simple' proxies, which are the default type, and 'dynamic' proxies, which support attribute access. It follows, therefore, that unless one explicitly requests such a proxy, the operations performed on proxies are constrained. The distinction exists because dynamic proxies are much slower; however, this may confuse programmers and makes P Y R O code much less transparent. Proxies also cannot be shared between threads. It is not clear from the documentation why this is so. Furthermore, while in a sense arbitrary objects can be shared, objects must be either derived from a P Y R O base class or decorated with a P Y R O delegate object. There are some exceptions: Not all objects can be 'pickled'. For example, objects such as file and socket objects cannot be shared at  8  all. Although PYRO has limited support for mobility, it is not without problems and subtleties. For example, dynamic proxy objects can cause unexpected exceptions to be raised if types the involved are not present on the caller side of a remote invocation: There is no support for automatic transmission of classes not present on one peer in a distributed system. (Classes that are present but not loaded are imported automatically; this is a regular feature of Python's pickle module.) Classes can be shared across different nodes, but only if they are available in separate modules—classes in the __builtin__ namespace cannot be shared at all. More unexpected problems arise when accessing nested attributes (an_ob.an_attr.sub_attr). Very importantly, there is no guarantee of consistency. Once a class is copied from one host to another, PYRO will do nothing to ensure that the two stay consistent if one side is altered. 2.2.2  RPyC  RPyC (Remote Python Call) is a Python library providing remote calls [9]. It aims to provide (largely) transparent access to a remote 'slave' interpreter, though it does not provide any form of code mobility. Remote namespaces are brought into variables in the local namespace, allowing invocations to be made to slave interpreters. Remote objects are represented by proxies, or (in the case of immutable objects) copied by value. Exceptions thrown by remote calls propagate to the caller side in an intuitive manner. RPyC also supports asynchronous calling, where the results of a remote call may be polled to determine completion.  9  Limitations R P y C d o e s n o t ( a n d d o e s n o t a s p i r e to) offer a n y k i n d of o b j e c t o r c o d e m o b i l i t y ; it is a r e m o t e c a l l s y s t e m a n d n o m o r e . F u r t h e r m o r e , it i n t e r a c t s p o o r l y w i t h t h r e a d e d programs.  2.2.3  PyRCall  P y R C a l l ( P y t h o n R e m o t e C a l l M o d u l e ) is a n o t h e r r e m o t e c a l l l i b r a r y for P y t h o n [10]. ( U n f o r t u n a t e l y , t h e d o c u m e n t a t i o n we w e r e a b l e t o f i n d w a s s o m e w h a t s p a r s e . ) ilar to R P y C , P y R C a l l  offers r e m o t e access t o o b j e c t s .  r e m o t e calls are m u c h m o r e e x p l i c i t i n t h a t  C o m p a r e d to R P y C ,  access t o r e m o t e n a m e s  p l i c i t P y R C a l l f u n c t i o n calls r a t h e r t h a n i m p o r t i n g n a m e s p a c e s . through S S L connections  Sim-  involves  the ex-  S e c u r i t y is offered  a n d t h r o u g h a u t h o r i s a t i o n k e y s w h e r e b y access t o s e r v e r  objects m a y be limited.  Limitations P y R C a l l a p p e a r s t o offer n o s u p p o r t for c o d e or o b j e c t m o b i l i t y .  2.2.4  Thoughts on distributed toolkits  A l t h o u g h t h e t o o l k i t s d i s c u s s e d i n p r e v i o u s s e c t i o n s are q u i t e f u n c t i o n a l , it is o u r o p i n i o n that none of t h e m are sufficiently t r a n s p a r e n t to be entirely convenient programmers.  to  T w o o f t h e m are i n n o s i g n i f i c a n t w a y c o m p a r a b l e t o t h e a i m s o f  t h e P y R e m o t e p r o j e c t , as t h e y p r o v i d e o n l y r e m o t e f u n c t i o n c a l l s , r a t h e r t h a n f u l l featured mobile objects.  ( T h e r e a r e , i n fact, s o m e s m a l l e r or less m a t u r e  toolkits  t h a t d o m u c h t h e s a m e , n o t s u r v e y e d h e r e . N o n e o f t h e m , as far as we c o u l d f i n d , a r e more ambitious.)  P Y R O goes m u c h further.  10  However, none of the toolkits  found  or investigated offered all of the following features, which we consider critical to transparency: • P r o x y representations of objects • Object mobility • Marshalling and automatic transmission of types • A single set of uniform semantics on proxies and objects T h e PyRemote project provides all of these features.  11  Chapter 3  The C P y t h o n implementation P y R e m o t e builds u p o n the c a n o n i c a l P y t h o n i m p l e m e n t a t i o n ,  'CPython',  which  is w r i t t e n i n C . ( T h e v e r s i o n w r i t t e n for t h i s t h e s i s is b a s e d o n t h e v e r s i o n source code.)  A l t h o u g h C P y t h o n is a f a i r l y l a r g e p i e c e o f software,  s o m e 366,000 lines o f c o d e ,  m u c h o f t h i s resides i n m o d u l e s  not  2.4.3  consisting  relevant  to  of the  P y R e m o t e i m p l e m e n t a t i o n . N o changes were m a d e to the parser, bytecode c o m p i l e r , stack machine, components  a n d so f o r t h , a n d a l t h o u g h s o m e u n d e r s t a n d i n g o f s o m e o f t h e s e  was  necessary  i n order to design  PyRemote  a n d decide where  best  to i m p l e m e n t changes, t h e y have no p a r t i n the i m p l e m e n t a t i o n . T h i s c h a p t e r thus d i s c u s s e s s o m e k e y facets o f t h e C P y t h o n i m p l e m e n t a t i o n t h a t t h a t a r e f u n d a m e n t a l to u n d e r s t a n d the low-level design of the P y R e m o t e p r o j e c t .  3.1  Python objects  3.1.1  PyObject  B e c a u s e t h e i m p l e m e n t a t i o n l a n g u a g e is C , a n d b e c a u s e P y t h o n is a n o b j e c t - o r i e n t e d language  w i t h p o l y m o r p h i s m a n d o t h e r features  12  associated  with O O P , C P y t h o n  L i s t i n g 3.1: T h e b a s i c P y t h o n o b j e c t  A  * Tim is a simplified version of the PyObject-HEAD macro * See Include/object.h  */ #defme PyObject_HEAD  \  i n t ob-refcnt;  \  s t r u c t _typeobject *ob_type; t y p e d e f s t r u c t .object { PyObject-HEAD } PyObject;  i m p l e m e n t s i n h e r i t a n c e "by h a n d " . T h e p r e p r o c e s s o r m a c r o PyObject_HEAD defines t h e m i n i m a l i n f o r m a t i o n t h a t a l l o b j e c t s n e e d , s u c h as reference c o u n t , o b j e c t t y p e , a n d ( d e p e n d i n g o n c o m p i l a t i o n o p t i o n s ) c e r t a i n o p t i o n a l fields t o d o w i t h g a r b a g e c o l l e c t i o n a n d reference  tracing.  contains only this i n f o r m a t i o n .  T h e basic object  type  in C P y t h o n ,  PyObject,  L i s t i n g 3.1 p r o v i d e s a n a p p r o x i m a t e i d e a o f w h a t  t h e P y O b j e c t s t r u c t u r e l o o k s like (the r e a l c o d e is c o m p l i c a t e d b y m o r e , a n d m o r e sophisticated,  p r e p r o c e s s o r m a c r o s for d e b u g g i n g a n d v a r i a b l e - s i z e o b j e c t s ) .  m a k i n g every object  (including type objects)  c o n t a i n PyObject_HEAD as its  i t e m , it is t h e n p o s s i b l e t o p e r f o r m g e n e r i c o p e r a t i o n s u n i f o r m l y o n a l l o b j e c t s c a s t i n g t h e m as P y O b j e c t p o i n t e r s .  F u r t h e r m o r e , since the object  By first by  t y p e is o n e o f  t h e b a s i c d a t a w h o s e offset is k n o w n i n a l l t y p e s t r u c t u r e s , t y p e l o o k u p c a n also b e performed. O b j e c t s m a y b e e i t h e r p u r e P y t h o n o b j e c t s , or ' C o b j e c t s ' .  T h e core b u i l t - i n  t y p e s s u c h as type, t h e o b j e c t b a s e class, i n t , s t r , a n d so f o r t h , are a l l i m p l e m e n t e d i n C . T h e C interface allows p r o g r a m m e r s to w r i t e P y t h o n extensions i n C , w h i c h are l a r g e l y t r a n s p a r e n t t o t h e P y t h o n p r o g r a m m e r . 1  1  T y p i c a l l y , C types are somewhat 'less mutable' than pure P y t h o n types; so for instance  the methods of built-in types cannot be altered.  13  3.1.2 The  PyTypeObject type o f  a P y t h o n o b j e c t is d e f i n e d e i t h e r i n a s t r u c t u r e g e n e r a t e d  at r u n t i m e  ( i n t h e case o f a l l u s e r - d e f i n e d , a n d m a n y l i b r a r y classes), or i n a ( g e n e r a l l y s t a t i c ) structure directly i n the C code.  T h e P y T y p e O b j e c t is a n s t r u c t u r e c o n t a i n i n g a l l  o f t h e i n f o r m a t i o n p e r t a i n i n g t o a t y p e , s u c h as t h e n a m e , a l l o c a t i o n i n f o r m a t i o n , a n d p o i n t e r s to a l l o f t h e o p e r a t i o n s o n o b j e c t s o f t h i s t y p e t h a t are i m p l e m e n t e d i n C . W e s h a l l use t h e c o n v e n t i o n of r e f e r r i n g t o t h e s e as to regular m e t h o d s ,  built-in  w h i c h are i m p l e m e n t e d i n P y t h o n .  m e t h o d s , as o p p o s e d  Built-in methods  include  a l l o c a t i o n a n d d e a l l o c a t i o n , as w e l l as c o m m o n , l o w - l e v e l o p e r a t i o n s s u c h as c a l l i n g ( o n c a l l a b l e o b j e c t s s u c h as f u n c t i o n s a n d f u n c t i o n o b j e c t s ) , c o m p a r i s o n s , h a s h i n g , and p r o d u c i n g a string representation. c o m m o n t o all o b j e c t s , sufficiently c o m m o n l y  ( T h e y are n o t ' c o m m o n ' i n t h e sense o f b e i n g  b u t r a t h e r i n t h e sense o f b e i n g s u f f i c i e n t l y c o m m o n , a n d  invoked,  t o m e r i t s p e c i a l , efficient  lookup.)  O p t i o n a l l y , t h e s t r u c t u r e m a y also c o n t a i n s u b - s t r u c t u r e s o f c o m m o n o p e r a t i o n s to a r i t h m e t i c t y p e s ,  sequence  t y p e s ( s u c h as t u p l e s a n d l i s t s ) ,  mapping  types  ( s u c h as d i c t i o n a r i e s ) , a n d buffer t y p e s ( p r o v i d i n g r a w m e m o r y access). See L i s t i n g 3.2 for a t r u n c a t e d v i e w o f t h e P y T y p e O b j e c t s t r u c t u r e .  T h e real P y T y p e O b j e c t  s t r u c t u r e contains m a n y m o r e types of f u n c t i o n pointers, m o r e m e t h o d suites, a n d many more built-in methods,  b u t t h e e x a m p l e s h o u l d suffice t o m a k e t h e  general  idea clear. M e t h o d l o o k u p for a b u i l t - i n m e t h o d o f a n o b j e c t o follows t h e c o d e outlined (in r o u g h terms)  •  path  below:  T h e o b j e c t is p a s s e d t o a n i n t e r p r e t e r f u n c t i o n t h a t a c t s o n P y O b j e c t p o i n t e r s , s u c h as P y O b j e c t J l e p r O , w h i c h r e t u r n s a ( P y t h o n ) s t r i n g r e p r e s e n t a t i o n o f  14  Listing 3.2: T h e object type object /* These are only excerpts: refer to In elude/object .h */ t y p e d e f PyObject * (*unaryfunc)(PyObject *); t y p e d e f PyObject * (*binaryfunc)(PyObject *, PyObject *); t y p e d e f PyObject *(*getattrfunc)(PyObject *, char *); typedef int (*setattrfunc)(PyObject *, char *, PyObject *); typedef struct { binaryfunc nb_add; binaryfunc nb_subtract; unaryfunc nb.positive; } PyNumberMethods; t y p e d e f struct _typeobject { PyObject.VAR-HEAD char *tp_name; destructor tp-dealloc; getattrfunc tp-getattr; setattrfunc tp_setattr ; PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; } PyTypeObject;  its object. • T h e function looks up the object's type object: PyTypeObject * t p = o->ob_type • Generic operations are invoked through the intrinsic method pointers, whose offsets are known (they are fields i n PyTypeObject). • Invoke the function pointer w i t h o as its first parameter: PyObject * r e s u l t =  tp->tp_str(o);  Although this is a very simplified account (partially for historical reasons, attribute lookup i n C P y t h o n can internally be rather complex), it is sufficient to explain how P y R e m o t e exploits these structures.  It should be noted that there  is an entirely different, more general code path for P y t h o n methods that are not  15  necessarily built-in operations. Below is an example for lookup up some method m of an object o: • The interpreter looks up the object's dictionary, o.__dict__ • The method m is looked up through an ordinary dictionary lookup: met = o. __dict__[m] • met's __call__() method is invoked with the proper parameters. Although more general, this codepath is of little interest to PyRemote, since it is handled automatically, so long as the low-level method lookups providing services such as dictionary lookup and the __call__() method are forwarded as appropriate. These lower-level lookups correspond to the first code path outline.  3.2  Name resolution  Name resolution in Python is generally performed at runtime, using dictionaries. This is an extremely general approach and is reflected frequently throughout the system; so for instance a local variable within a function is looked up in the function's dictionary, global variables in a global dictionary, an object's attributes in the object's dictionary, et cetera. Although not very fast, this mechanism enables Python to be a highly dynamic language, and manipulating key facets is often as simple as altering values in a dictionary.  3.2.1  Method resolution order  When a method on a Python object is invoked—or indeed any attribute lookup is performed—the interpreter potentially performs a series of lookups in object dictio16  naries. Initially, the dictionary of the object itself is queried; if the desired attribute is not found, its type is queried; if again it is not found, then the superclasses of this type are queried. The precise lookup order is non-trivial. "Old-style" P y t h o n classes (see Section 6.1.3) used a simple algorithm; superclasses were queried depth-first in the inheritance graph, left to right. However, this approach is problematic i n that it is not always monotonic. "New-style" P y t h o n classes, since version 2.3, use a more sophisticated ordering which guarantees monotonicity [11]. T h e general principle, however, is much the same; only the order of the dictionary lookups is changed.  17  Chapter 4  Object m o b i l i t y design W h e n we s p e a k o f ' o b j e c t m o b i l i t y ' a n d t h e c o n c e p t o f ' m o v i n g ' a n o b j e c t f r o m o n e c o m p u t e r t o a n o t h e r , t h e r e a r e s e v e r a l s e m a n t i c i n t e r p r e t a t i o n s o f w h a t it m e a n s for h o s t A (the c l i e n t i n t h i s p a r t i c u l a r t r a n s a c t i o n ) t o r e q u e s t a n o b j e c t o f r o m h o s t  B  (the s e r v e r ) . P o s s i b i l i t i e s i n c l u d e  •  Sending a  copy,  •  Sending a  proxy,  where  A  receives a c o p y OA c o m p l e t e l y i n d e p e n d e n t o f o;  where the object remains o n  o n it are p e r f o r m e d r e m o t e l y , o n  •  Actually  moving t h e  with a proxy p , a  • Replicating  B  and any operations  A  performs  B;  object, where  A  n o w a c t s as a s e r v e r for  o and B  is left  r e f e r r i n g t o o — t h i s is e q u i v a l e n t t o a c o p y a n d a d e l e t i o n ;  the object, where  A  and  B  e a c h h o l d a ' r e a l ' o b j e c t (as  opposed  t o a p r o x y ) b u t , u n l i k e i n t h e c o p y o p e r a t i o n , t h e y c o n c e p t u a l l y refer t o t h e s a m e o b j e c t , so t h a t a n y c h a n g e s t o OA a r e r e f l e c t e d i n 03-  T h e d e f a u l t m e c h a n i s m for o b j e c t m o b i l i t y i n P y R e m o t e is p r o x y s e m a n t i c s , b u t t h e r e is also s u p p o r t for c o p y a n d m o v e s e m a n t i c s — a l t h o u g h d u e t o l i m i t a t i o n s  18  of the implementation and the constraints of working w i t h the existing C P y t h o n code base, not all objects can be be moved, and some (though fewer) objects cannot be copied. Immutable objects comprise an exception to this general rule as they are always copied; see Section 4.2.1. Meanwhile, there are some objects, primarily resource handles, for which move operations do not make much sense (although remote access through proxies is still relevant and functional). Types are also handled differently, as it is not feasible to have strictly remote type objects, as they are enmeshed rather deeply in the type system, and as they are typically read vastly more often than they are modified; it is more sensible to replicate rather than move them.  4.1  PyRemote in action  For readers unfamiliar w i t h our use of terminology, inspired primarily by the Emerald language, this section provides a number of examples derived from the P y R e m o t e test code to demonstrate how its facilities may be used. In order to show b o t h code and results, the P y t h o n interpreter's interactive mode was used. Lines i n the examples indented w i t h » > or . . . are code entered into the interpreter; lines not so prefixed are results printed to the standard output. We do not show server code; the server side of these transactions consists solely of launching a server thread and publishing an entry object.  4.1.1  Example 1: Remote calls  A client connects to a server and (in the current, ad-hoc system) receives an advertised object. We may then perform operations on it as though it were a regular P y t h o n object, including type lookup, member listings, and method calls; see sec19  tion 4.4. A l t h o u g h the object appears i n every respect identical to local object, the i s p r o x y O function is provided to identify proxies. It is even possible to modify the type of the remote object. Listing 4.1: P y R e m o t e example 1: Remote calls >>>  i m p o r t pyremote  » >  # Obtain a remote object proxy:  >>>  remote.ob = pyremote.connect()  >>>  # Inspect the proxy:  ... dir(remote.ob) ['__add__', ' ..class— ' , ' —delattr__ ', ' __dict__ '_Jiash__', ' __init__ ' , '__module__',  '__doc—', ' __getattribute__ ',  '__new__', ' __reduce__ ' , ' __reduce_ex__ ',  ' __repr._ ', ' __setattr„ ' , ' __str__ ' , ' __weakref__', 'anotherJbo', 'class_name', ' dict_str ' , ' dir_str ' , ' foo-class ' , 'frobble', 'named-foo', 'someJist', ' ugly-class ', 'x'] >>>  type(remote_ob)  <class 'testclass .Foo'> >>>  isproxy(remote.ob)  True >>>  # Remote method call:  ... remote_ob.frobble() -14 >>>  # We may modify the remote type:  ... type(remote-ob).frobble = l a m b d a self: self.x >>>  remote-ob.x = 22  >>>  remote_ob.frobble()  22  20  4.1.2  Example 2: Object mobility  Given a remote object, we may request the real object by means of the new function r e a l ( ) , wherein the object is moved to the local peer; see sections 4.2 and 4.3. This is a 'soft request'; if the object is immovable, it will return a proxy object (the argument). Note that at present, this only works for objects in certain namespaces (see section 5.1.3). L i s t i n g 4.2: P y R e m o t e example 2: Object mobility >>>  i m p o r t pyremote  >>>  remote-ob = pyremote.connectQ  >>>  local.ob = real(remote_ob.another_foo())  >>>  isproxy(locaLob)  False >>>  type(locaLob)  <class '.Foo'> » >  # Note (above) thai although we have copied the class, it is now a member of no  ... # module, since the module was remote and is not copied.  > »  local_ob.frobble()  -14  4.1.3  Example 3: Object consistency  If a type is distributed over multiple machines, as it must be when real objects are copied, it should be kept synchronised. P y R e m o t e does this, although it is a broadcast system that does not guarantee absolute consistency (there are no strict transactional semantics).  21  L i s t i n g 4.3: P y R e m o t e example 3: Object mobility >>> i m p o r t pyremote >>> remote.ob = pyremote.connect() >>> # We. obtain a purely local copy with the same type ... locaLob = real(remote_ob.another_foo()) >>> type(local_ob).fruble = l a m b d a self: 2 >>> remote_ob.fruble() 2 >>> type(local_ob).fruble = l a m b d a self: 1 >>> remote.ub.frubleQ 1  4.2  Copying objects  Copying objects is simple and, i n general, achieved by using P y t h o n ' s built-in functionality through such tools as the p i c k l e and m a r s h a l modules, p i c k l e [12] is the primary tool used by P y t h o n programmers for marshalling and serialisation; it uses an extremely simple stack-based language to encode objects i n a portable format suitable for transmission, and can handle quite arbitrary (though not all) objects as well as recursive structures. Copying is therefore—as far as implementation is concerned—a fairly t r i v i a l issue. The main exception to this rule is the copying of types, which cannot normally be either ' p i c k l e d '  1  or 'marshalled'; however, it is not generally difficult to  serialise and transfer their components (class name, methods, and so forth). A l x  W h e n the P y t h o n  qualified name.  pickle  module encounters a type, it inserts a reference by fully  T o 'unpickle' it, therefore, it imports it, requiring the module where the  type is defined to be already present. T h i s is a space efficient way of serialising objects with known types, but provides no facility for type transmission.  22  though intrinsic facilities do not exist to do this, the m a r s h a l module is capable of marshalling P y t h o n bytecode, including the code objects representing the bytecode of functions. B u i l t - i n types, of course, do not need to be copied (and i n fact never are copied), as they exist on each peer i n a P y R e m o t e system, and share the same static identifier. Extension modules (not part of the standard library) implemented in C (or another compiled language, such as C + + ) present a special problem. If they are present on both nodes involved i n a copy operation, they can be looked up and imported by n a m e — t h i s is done automatically. However, if only one of the nodes has a copy of the extension module, the copy operation w i l l fail. It is not possible, i n the general case, to copy binary modules from one host to another. (We do not assume that peers i n a system are i n any way homogenous, save i n running compatible P y t h o n / P y R e m o t e interpreter versions.) We shall discuss further limitations of this system i n Section 6.1.2.  4.2.1  Immutable objects  Immutable objects present a special case: A l t h o u g h P y t h o n is highly dynamic, there are some built-in types whose objects cannot i n any way be modified. These include arithmetic and Boolean types, tuples (although tuple  items can be modified), and  strings. Since they cannot be modified, it is semantically equivalent to hold a proxy to a remote i n t object and to hold a local copy w i t h the same 'unique' identifier. Another set of objects considered immutable are built-in types, which  must  be present at both peers. (Strictly speaking, it is necessary that the t y p e type exist and be functionally equivalent; proxy types cannot be constructed without using t y p e as a base type. To simplify and streamline the bootstrap process, we assume  23  that peers i n a PyRemote system w i l l have compatible versions of all the built-in types.)  4.3  Moving objects  To move an object, it must be copied to the destination and deleted from the source. Copying is described i n Section 4.2; move operations use the same mechanism. T h e deletion poses new challenges i n terms of the implementation, because  CPython  was not originally designed w i t h this i n mind. In particular, C P y t h o n object references are simple pointers and may be held by references i n dictionaries representing variable mappings as well as internal structures and functions, there is no t r i v i a l mechanism to trace them. This is unlike languages such as Distributed Smalltalk, which has an object indirection table (although not originally for the purpose of mobility), allowing the mobility system to simply modify this indirection entry to point to a proxy instead of a real object [3]. This problem should not be under-emphasised, nor its impact on the implementation underestimated. In languages designed for object distribution, such as Emerald, object references may be made globally unique, and probably should be made so [2]. C P y t h o n , on the other hand, is designed to work w i t h i n a single address space. It was therefore necessary to add to every object a globally unique identifier. However, this identifier cannot be used for local object lookups: B o t h i n terms of performance and implementation effort, the consequences would be disastrous. T h e version 2.4.3 source code of the C P y t h o n interpreter contains 20,035 mentions of P y O b j e c t pointers (quite apart from pointers to specific types of objects!). However, the vast majority of objects on a given node w i l l never be addressed by another node. Only those objects that are actually used in a 'distributed m a n n e r ' — t h a t is, 24  those which are sent as proxies, sent as copies, or distributed to other m a c h i n e s — truly need more sophisticated references. (Then, of course, the pointer value is no longer sufficient as an identifier as two different objects i n different address spaces may have the same memory address.) The approach taken to the object-move problem is fairly straightforward: PyRemote attempts to replace any 'external' references to an o b j e c t — t h a t is, references not held by PyRemote internal structures—by references to a proxy. If all external references can be resolved, the object is considered 'mobile' and is moved; if some references remain unresolved, then the object is considered 'immovable' and the request to move it is refused. In searching for references to the object, there are various locations to consider (see Section 3.2 on name resolution i n Python). Each module currently loaded has a dictionary mapping strings (identifiers) to P y t h o n objects. The dictionaries of all loaded modules may thus be searched (linearly) for keys corresponding to the target object, and all such references can be replaced. Similarly, we may scan the dictionaries of objects (and type objects) containing their attributes, and the dictionaries of closure objects belonging to active functions and methods. See Section 5.1.3 for a discussion on the possible policies.  4.4  Proxies  W h e n a host A operates on some object o that is not l o c a l — t h a t is, it resides wholly on some remote node B and is not copied to A—it  is represented on A by a proxy  object p , whose task is to forward any requests made of it to its 'target' object o. 0  This structure is similar to classical remote procedure call implementations—indeed, the basic idea of the structure dates back to at least the 1980's [13]. The primary 25  concern i n the design of the proxy objects of the PyRemote project was to achieve transparency—as much as possible, it should not be (and is not) visible to a user whether an object is a local, 'real' object or a proxy to a remote object. N o t only are all requests (attribute lookups, method calls, et cetera) forwarded to the target; an object o of type T is logically considered to have the exact same type as the proxy Po referring to o. A new built-in function, i s p r o x y O , is made available to distinguish proxies from 'real' objects.  4.4.1  Client-side proxies  T h e basic mechanism of implementing proxies i n C P y t h o n is relatively straightforward i n principle, if not i n detail.  Every object contains a reference to its type  structure, as described i n Section 3.1.2. T h e type object tracks built-in operations by function pointers, and functions that operate on generic P y O b j e c t pointers do so by performing function pointer lookups. PyRemote works by effectively intercepting these lookups.  In p r i n c i p l e —  an idealised implementation, such as might have been used i n a language designed from s c r a t c h — a proxy object would simply refer to a generic P r o x y type. T h e generic operation 'slots'  2  i n this ideal class contain pointers to proxy functions,  which marshal the arguments and send them to the appropriate peer, which would unmarshal them and, v i a a generic object stub function, dispatch them to the real object. T h e results may then be marshalled and returned to the proxy. If a remote invocation causes an exception to be thrown, this too should be intercepted, marshalled, and re-raised on the caller side. (That is, if p on host A 0  We here use the word 'slots' in a rather generic way; this is not to be confused with the Python concept of slots for a fixed set of user-defined variables, saving space by using a fixedsize __slots_- object member rather than a variable-sized dictionary member, — d i e t (See 2  http://docs.python.org/ref/slots.html.)  26  invokes an operation on o on host B, causing an exception to be raised, the caller on host A should be the one to deal w i t h i t — n o t host B.)  The current P y R e m o t e  implementation does intercept, marshal, and transmit exceptions, although some traceback information is lost since the current implementation of p i c k l e does not support pickling of t r a c e b a c k objects (see Section 6.1.2). In reality, because C P y t h o n was not designed w i t h remote invocation considerations in mind, the implementation is more complex. T y p e structures do not contain only function pointers, but also data, which are specific to types, such as the type name. It is therefore necessary to create a proxy type corresponding to each real type; because it would be prohibitively (and unnecessarily) expensive to do so on interpreter startup (or class creation time), this is done on demand.  In  general, however, these proxy types work much as the ideal P r o x y type outlined above; indeed the creation of most of the proxy dispatch functions was automated. The difference is strictly the caching of type-specific data. P y R e m o t e does, i n fact, have an ideal proxy type (PyObjectProxy_Type), although no objects of this type are ever created. Instead, it is used as a template for specific proxy types. A partial listing of this class is given in L i s t i n g 4.4. W h e n creating a real proxy type, a new PyTypeObject is allocated, and its fields are copied from either PyObjectProxy_Type, for all forwarding methods, or the concrete type (the 'target' of the proxy), for data fields such as the name. PyRemote is thus able to create, at runtime, a proxy representation of any type, whether it is implemented in a C extension or in P y t h o n code. A c t u a l proxy objects (as opposed to proxy type objects) are extremely simple; they contain effectively no data (since their task is to forward requests to remote locations where the data actually reside). The full layout of a P y O b j e c t P r o x y object  27  Listing 4.4: T h e object proxy type PyTypeObject PyObjectProxy_.Type = { PyObject-HEAD JNIT(&PyType.Type)  0,  / + obsize */  NULL, /* tp.name; provided by target */ sizeof(PyObjectProxy), /* tp.basicsi-ze */  0,  / * tpJtemsize */  proxy_tp_dealloc, NULL, proxy_tp_getattr, proxy_tp_setattr ,  /* /* /* /*  tp-.deall.oc */ lpjpri.nl (deprecated); unsupported */ tp^getattr .+•/ tpsetattr */  };  Listing 4.5: T h e proxy object t y p e d e f struct { PyObjectJfEAD; PyWeakReference *real_object; } PyObject Proxy;  is given i n Listing 4.5. For efficiency reasons, and because there are circumstances under which proxies may exist to objects residing on the contain  weak references  3  local  machine, proxy objects  to targets residing i n the same address space; these allow  much more efficient lookup of such objects, while adding only a negligible cost (compared to the I P C mechanism) to remote lookups.  4.4.2  Server-side object stubs  The server side of remote invocations is considerably simpler.  Conceptually, we  might envision an object stub for each and every type for which a proxy type exists; much as we have a generic PyObjectProxy_Type, we might have a generic Weak references are objects that look like ordinary references, but do not add to an object's reference count. If the target disappears, an attempt to use the weak reference will raise an exception. Weak reference objects can also be queried efficiently; PyRemote will not raise and catch exceptions every time a remote proxy is referenced. 3  28  Listing 4.6: Server side dispatch PyObject *server_dispatch(PyObject *remote.tuple) { switch (opcode) { case M P . L E N G T H : if(PyRemote_ExtractArgs(remote_tuple, " O " , &u) != 0) goto error; if (u—>is_proxy) goto not Jiere; x = PyMapping-Length(u); res ='PyRemote.BuildArgs(opcode | R E M O T E - R E S P O N S E , " i " , x); break; case ... ... break; error: default: Py-XDECREF(res); res = N U L L ;  if (!PyErr_Occurred()) PyErr_Format(PyExc-RuntimeError, "Object stub error"); break;  i f (PyErr.Occurred ()) res = bundle_exception(); r e t u r n res;  }  PyObjectStub.Type, w i t h an instance for each remotely referenced object. It turns out, however, that this is not necessary, since a l l lookups on the conceptual server stubs are function lookups rather than data lookups (as data lookups cannot be forwarded and must therefore be handled by mirroring data i n each proxy). A single s e r v e r _ d i s p a t c h ( ) function can therefore unmarshal requests and dispatch them through the appropriate function. Listing 4.6 presents a snippet of the dispatch code.  29  4.4.3  Method lookup and proxies  Due to the nature of Python's method resolution order (see Section 3.2.1) and method lookup process, it is frequently the case that method lookups require access not only to an object's dictionary, but also to its type object's dictionary, and potentially a number of supertype dictionaries. This could easily lead to very large numbers of lookups, which might get prohibitively expensive if each lookup involved a pair of remote messages. However, each proxy contains a shallow copy of its target's dictionary. (Dictionary keys are always copied; this is never problematic since P y t h o n requires dictionary keys to be immutable objects, and P y R e m o t e already copies immutable objects. T h e dictionary values are copied by proxy.)  A s such,  although invoking a method w i l l cause remote messages to be sent, querying an object for the method proxy is a purely local operation.  4.5  Augmentation of Python objects  A l t h o u g h PyRemote mainly added to C P y t h o n rather than modifying it, some modifications were necessary.  One of the more significant is that the P y O b j e c t -HEAD  macro has been modified to contain information necessary for remote objects (see Listing 4.7): A flag ( i s _ p r o x y ) was added, as well as a field for the globally unique identifier ( G U I D ) necessary since an address unique w i t h i n a single address spaces may clash w i t h an address on another node. T h i s increases the size of the basic P y t h o n object by s i z e o f ( i n t ) + N for an AT-byte G U I D . T h e current implementation uses a 128-bit (16-byte) G U I D . Another (more minor) modification was a conditional dispatch entered into the attribute accessor and mutator functions of PyTypeObject, the type of types:  30  Listing 4.7: New PyObjectJffiAD macro #defhie PyObject_HEAD int ob-i'efcnt; struct .typeobject *ob.type; PyGuid obJd; int is.proxy;  \ \ \ \  In the general case, the type of a proxy is itself a special proxy type, but this regression must terminate, and the type of the special proxy type itself is simply t y p e , represented internally by PyTypeObject; so although there exists a proxy for this type, its functions must be invoked through regular type operations.  4.6  Type and object consistency  A crucial problem i n distributed systems i n general, and one of the issues which this project aimed to explore, is the notion of consistency when objects are either copy-replicated or distributed by proxies: W h a t changes made on one node need be reflected o n other nodes, and to what degree is this achievable? T h e problem is particularly poignant i n P y t h o n , due to the extremely dynamic semantics of the language.  Unlike many other programming  languages,  P y t h o n allows the creation of types at runtime; furthermore (and more problematically), types are first-class, mutable objects; finally, object type assignments are m u t a b l e — t o say that object o has type t means only that o's type reference points to t\ it may be reassigned to some other type t!. Thus, at runtime, an object may have its type changed. For types implemented i n P y t h o n , the type of an object does not affect its layout (all variables are stored by name i n the object's __dict__ member). (See Listing 4.8 for an example.) Since, in general, an object's attributes are simply stored i n its dictionary (this applies to type objects as well as t o other objects),  31  it is similarly the case that classes may acquire methods (and class attributes) at runtime, and even type names may change. For proxies, the solution is quite straightforward: The generic mechanism for modifying an object is a special method called _ _ s e t a t t r i b u t e _ _ ( ) , which the proxy mechanism already intercepts. (Additionally, data members such as __dict__, the member dictionary, may be modified; however, these can already be represented as dictionary proxies. For regular objects, the simple process outlined here is sufficient. T h e difficulty arises, once again, w i t h type objects; once again, because only method calls, not data accesses, are forwarded. W h e n the server modifies the real type object, the information cached by proxies becomes stale. A type object that is updated should send an update message to all of its proxies (or broadcast an update message, if it does not know where they are). The node requesting the change receives update information i n the reply. See Section 5.1.2 for a further discussion of this subject.  4.6.1  R e a l object consistency  Besides proxies, we also need to ensure that types are kept consistent across the system, lest we encounter unexpected behaviour when moving or copying an object from one host to another where the type exists on both nodes, but is not equivalent. It would perhaps be desirable to also be able to keep objects consistent—in other words, to be able to distribute an object, i n order to make it faster to access and more resilient to f a i l u r e — b u t this would be very difficult to efficiently implement; it is also our opinion that this is much less important than maintaining type consistency. It is particularly important that types be kept consistent because they may be copied implicitly by P y R e m o t e support routines (for example, if a node A  32  Listing 4.8: Type reassignment c l a s s Foo(object): def  init  ( self):  self . x = 5 d e f foo( s e l f ) : r e t u r n self.x * self .x c l a s s Bar(object): def  __init  ( self):  self.x =  'hello'  d e f foo( s e l f ) : r e t u r n self.x f = FooQ  # x = 5, foo()  f.fooQ f. _class__ = B a r f.fooQ  # returns 25 # fooQ — > x # returns 5  —> x * x  b = Bar()  # x = 'hello ', fooQ - > x  b.foo()  # returns 'hello '  b. __class._ = F o o  # foof) — > x * x  b.fooQ  # TypeError: can't multiply sequence by non—int  requests a real object o from another node B to replace a proxy p , its type is copied a  from B to A, invisibly to the user). Obviously, it would be highly undesirable for implicit operations that the programmer may not be aware of to risk introducing inconsistencies. PyRemote's method to cope with this is to mark all type objects that are distributed across two or more nodes as being synchronised. Whenever a synchronised object's attributes are modified (through the __getattribute__ method), the update is pushed onto the network in a message that is broadcast to all the peers that the current node are aware of. (Since the current PyRemote implementation uses TCP/IP v4, it is not a true broadcast message, but rather an asynchronous message sent iteratively over each peer connection.)  33  Chapter 5  Results and evaluation The P y R e m o t e system offers most of the features it was envisioned to have. P r i marily, it provides object sharing and remote method invocation through the use of proxy objects, in a manner entirely transparent to the programmer. Beyond initialising the system (with a single call) and connecting to a remote peer to obtain a reference to a remote object, a programmer does not need to use any non-standard syntax, semantics, or function calls (although functions to provide additional, optional functionality, such as requesting references to real o b j e c t s — ' p u l l ' requests— are provided).  W i t h very few exceptions, primarily ones regarding raw memory  access (see Section 6.1.4), arbitrary operations are permitted on arbitrary types. Some objects naturally cannot be moved—such as handles to open resources, e.g., files and sockets—but they can nevertheless be accessed remotely, by proxy. W h e n proxies to objects of unfamiliar types are requested, the necessary type information is conveyed automatically and transparently.  34  5.1  D e s i g n choices a n d tradeoffs  D u r i n g the implementation of the PyRemote system, some questions arose to which the proper answers at present are unclear and merit further research. In this section we shall discuss a few important policies that have been implemented, but are not necessarily the o n l y — a n d not obviously the correct—choices.  5.1.1  Immutable types  A s noted i n Section 4.2.1, P y R e m o t e never creates proxies for certain immutable types, as they can be freely copied. One potential drawback of t h i s — o r at least of applying it as 'indiscriminately' as it is presently a p p l i e d — i s that if such objects are merely passed back and forth, large amounts of data may be copied. It might be prudent to perform a check and still present proxy objects for objects such as very large strings, which might represent entire text files. This is not currently done in PyRemote: It would add considerable extra complexity i n that there are many operations on string objects that  require them to be locally present, and it is not  clear how the implementation would know when it is necessary to fetch large strings, and when it would be appropriate to leave them where they are.  5.1.2  P r o x y consistency policies  It is not exactly clear how the proxy type data cache update should take place, and here is an important tradeoff between efficiency and strict semantic coherence. In order to guarantee total coherence—'Once the type object is considered updated, it is updated everywhere'—we would need a transactional system where, at the very least, a l l the stale caches are marked stale before the operation takes place and returns an A C K . P y R e m o t e does not implement such a transactional system,  35  although it would be interesting to investigate how much effort it would take to create it, and to see exactly what the tradeoffs would be. T h i s sounds prohibitively expensive, and indeed we do not expect that a distributed system where every remote message obeys transactional semantics would be efficient, it should be kept in mind that this problem i n PyRemote exists only for type objects, and although they present great flexibility as well as the greater challenge i n terms of coherence, we may well expect that messages that modify type objects represent a very small subset of the messages sent during the r u n of a typical P y t h o n program. A transactional system, although expensive, might not turn out to be prohibitively expensive.  5.1.3  M o v i n g objects  A s discussed i n Section 4.3, moving an object is implemented w i t h the usual copy operation and detaching the object from plain object references by replacing a l l such references w i t h references to a proxy object i n lookup dictionaries where it is present. W e may envision a number of policies for this process. For example, we might search only the module corresponding to the script w i t h which the P y t h o n interpreter is invoked (__main__); we might search all 'user' modules (as opposed to modules known to correspond to the standard library); we might search all loaded modules; or we might choose some intermediate option (e.g.  excluding built-in  modules ). Other places where references may be held include object dictionaries 1  (of attributes: 'member variables') and closures of function (and method) objects. The current implementation replaces references only i n __main__ , and does 2  1  ' B u i l t - i n ' modules are modules that are implemented in C ; they are generally fairly  'immutable' a n d we do not expect to be able to perform the same kind of substitutions in these as in pure P y t h o n modules. 2  A c t u a l l y , the current implementation, for testing purposes only, attempts to replace  references also in a module n a m e d t e s t c l a s s , if present a n d loaded.  36  n o t l o o k at a n y classes o r o b j e c t s ,  but this should be considered a n  intermediate  s t a t e o f d e v e l o p m e n t r a t h e r t h a n a f i n a l s o l u t i o n . W e e x p e c t t h a t i t is p r o b a b l y n o t t h e o p t i m a l s o l u t i o n — r a t h e r , we e x p e c t t h a t r e p l a c i n g references  i n the  __main__  m o d u l e w o u l d b e wise, a t t h e v e r y least, a n d q u i t e p o s s i b l y a l l 'user' m o d u l e s . T o p e r f o r m further searches a n d replacements w o u l d be s t r a i g h t f o r w a r d a n d s i m p l e , a l t h o u g h it is n o t d o n e at t h e p r e s e n t s t a t e o f d e v e l o p m e n t . not  whether o r how this could b e  done, but i / a n d  to what degree  T h e q u e s t i o n is it  should b e  done.  t r a d e - o f f m u s t b e f o u n d b e t w e e n efficiency ( b o t h i n t e r m s o f w o r k o n t h e i n t e r p r e t e r , a n d i n r u n t i m e p e r f o r m a n c e , as a l l o f t h e s t r u c t u r e s m e n t i o n e d m u s t b e  searched  l i n e a r l y ) a n d c o m p l e t e n e s s . O n o n e e x t r e m e , n o o b j e c t s m i g h t ever b e m o v e d at a l l ; o n t h e o t h e r , e v e r y s t r u c t u r e c a p a b l e o f h o l d i n g a reference m u s t b e e x a m i n e d ; o p t i m a l a p p r o a c h lies o b v i o u s l y s o m e w h e r e  the  i n b e t w e e n , b u t it is n o t c l e a r w h e r e .  T o f i n d this g e n e r a l o p t i m u m requires p r o f i l i n g of realistic a p p l i c a t i o n s m a k i n g use of the P y R e m o t e f r a m e w o r k — a n d of course, n o such a p p l i c a t i o n s presently exist.  5.1.4  GUIDs  T h e p r e s e n t i m p l e m e n t a t i o n o f G U I D s is q u i t e p r i m i t i v e a n d s u i t a b l e o n l y for s m a l l scale testing;  a 1 2 8 - b y t e f i n g e r p r i n t is g e n e r a t e d  s i m p l y b y repeatedly calling the  s y s t e m r a n d ( ) f u n c t i o n . A l t h o u g h t h i s suffices for t h e c u r r e n t i m p l e m e n t a t i o n , dep l o y m e n t o n a larger s y s t e m w o u l d need a m o r e intelligent fingerprint that p r o v i d e s (at t h e least) a r e a s o n a b l y s t r o n g p r o b a b i l i s t i c g u a r a n t e e t h a t G U I D s w i l l t r u l y b e unique. M o r e p r o b l e m a t i c a l l y , a G U I D is g e n e r a t e d  every time  a n o b j e c t is c r e a t e d .  T h i s is a v e r y w a s t e f u l p o l i c y : T h e i d e n t i f i e r is o n l y n e c e s s a r y  when an object  is  involved i n mobility. Deferred G U I D generation has been i m p l e m e n t e d , b u t e n a b l i n g  37  A  it causes an extremely obscure bug, possibly caused by a primitive copy w i t h i n the interpreter resulting in multiple identical objects w i t h the same ' n u l l ' identifier. A s the system now stands, it would be expensive for a system that allocates large numbers of small objects.  5.2  Local performance  A series of tests were run to examine the performance of the modified interpreter on code that does not make use of any distributed features. We consider it essential that the modifications thus made not incur prohibitive costs to normal P y t h o n programs. Ideally, no additional cost should be incurred; however, we may expect that a project of this nature (and w i t h only one programmer) should incur some costs on a first attempt. Please note that no major efforts have yet been made to optimise the system. The system used to run the tests was an A t h l o n 64 3000+ (2.00 G H z , 512 k B cache) w i t h 1 G B of R A M , running Gentoo L i n u x (kernel version 2.6.21).  The  baseline P y t h o n interpreter used for comparison is the version that the P y R e m o t e code was based on, version 2.4.3. A l l interpreters were compiled w i t h gcc  version  4.1.2. Unless otherwise specified, compilation options were left at the defaults. T h e original intent was to run, at the very least, tests using two well-known P y t h o n benchmarking suites, P y B e n c h and ParrotBench, as well as P y r e x as an example of a real-world application. Unfortunately, ParrotBench, specifically designed to be extremely demanding in terms of language compliance and used as much as a compliance test as a performance measure, failed to run using our modified i n terpreter due to an unexpected hash value. It is currently unclear whether this represents an invalid value, or an unexpected value that ParrotBench simply does 38  PyRemote C P y t h o n 2.4.3  Run 1  Run 2  Run 3  Run 4  Run 5  Average  3604 3285  3625 3275  3621 3300  3626 3239  3605 3277  3616 3275  Table 5.1: P y B e n c h results (times in milliseconds) not accommodate.  5.2.1  PyBench  P y B e n c h is a benchmark suite for P y t h o n created by eGenix [14]. It performs a wide variety of tests. Each invocation of P y B e n c h actually runs ten iterations of the entire test suite, producing detailed information as well as an over-all average. Scores here represent averages, and our tests represent entire sets of iterations. Table 5.1 summarises the tests. O n average, PyRemote is 1 0 % slower than the stock C P y t h o n interpreter. It is not entirely clear exactly why the penalty is so great, since the changes to the interpreter were very s m a l l — n o t e that no distributed computation was performed i n this benchmark suite. In order to trace the slowdown, the PyRemote interpreter running pybench was analysed using the g p r o f profiling tool included w i t h the gcc compiler suite [15]. In section 5.1.4, we expressed a concern that the lack of deferred G U I D initialisation would slow down execution. We therefore examined the profile, extracting the data for initialisation and copying of G U I D s . A s seen i n Table 5.2, functions responsible for initialising and copying G U I D s accounted for about 1.26% of the program's total running time. T h i s is not a very large number, but the cost of PyGuid_Make () is incurred at object creation time, and a test or application that spends more time creating small objects will therefore incur a proportionally higher cost. It is a problem that should be e l i m i n a t e d — b u t it does not account for the entire slowdown.  39  % running time  Seconds  # calls  PyGuid_Make  Function  0.97  0.30  83,111,266  PyGuid_Copy  0.29  0.09  PyGuid_Hash  0.00  0.00  T a b l e 5.2:  C o s t of P y G u i d  No  data  2231  functions  L i s t i n g 5.1: P y r e x t e s t i n g  code  #!/bin/sh for i in 1 2 3 4 5 do # Ran 10 loops, repeat the whole exercise 5 times ./python Lib/timeit.py —n 10 —r 5 \ —s 'from Pyrex.Compiler.Main import compile' \ 'for x in xrange(lOO):' \ ' compile("../pyrex—demos/spam.pyx")' \ ' compile(" ../pyrex—demos/numeric.demo.pyx")' \ ' compile(" ../pyrex—demos/primes.pyx")' done  B r a n c h i n g was a d d e d t o s o m e b u i l t - i n t y p e o p e r a t i o n s (see S e c t i o n 4.6), b u t p r o f i l i n g i n f o r m a t i o n does not indicate t h a t a n y significant t i m e was spent i n these functions.  5.2.2  Pyrex  P y r e x is a p r o g r a m m i n g l a n g u a g e c r e a t e d t o a i d t h e d e v e l o p m e n t  of P y t h o n m o d -  ules, a l l o w i n g a m i x t u r e o f P y t h o n classes w i t h C d a t a t y p e s a n d s t r u c t u r e s , f i n a l l y c o m p i l e d into C extension m o d u l e s w i t h no b u r d e n o n the p r o g r a m m e r to e x p l i c i t l y create glue code.  I n o u r tests, P y r e x was  used to compile the b u n d l e d  programs: numeric_demo.pyx, spam.pyx, and p r i m e s . p y x .  example  A s t h e s e p r o g r a m s are  a l l v e r y s m a l l , e a c h r u n o f t h e test c o m p i l e s e a c h p r o g r a m 100  times.  T i m i n g was p e r f o r m e d using the s t a n d a r d P y t h o n m o d u l e , t i m e i t ,  which  a t t e m p t s t o use t h e b e s t t i m i n g facilities a v a i l a b l e o n a n y g i v e n t a r g e t p l a t f o r m [16].  40  PyRemote C P y t h o n 2.4.3  Run 1  Run 2  Run 3  Run 4  Run 5  Average  4.49 7.63  4.48 7.58  4.52 7.61  4.50 7.63  4.51 7.59  4.50 7.60  Table 5.3: P y r e x timings (seconds per t i m e i t loop) The script presented i n Listing 5.1 was used to invoke the tests. Results are given in Table 5.3. Curiously, P y R e m o t e actually appears to run significantly faster than the stock P y t h o n i n t e r p r e t e r — b y as much as 69%! Since no effort has been made to optimise the interpreter, and since there is code i n PyRemote that adds more code to be executed (PyGuid function calls, etc.; see Section 5.2.1), it seems unlikely that this reflects a true performance increase. If it represents an error i n the tests, it is unclear wherein this error consists: Identical output is generated by the P y r e x compiler regardless of which interpreter is used to run it, and the timings obtained by the t i m e i t module are consistent w i t h those obtained using the standard U N I X timing utility, t i m e . A l t h o u g h we cannot explain these numerical results, they are presented for completeness and as a curiosity. We recommend that the P y B e n c h results be viewed as representative of the true cost of P y R e m o t e — a n d a sign that optimisation work should be done.  5.3  Distributed performance  To test the overhead of P y R e m o t e alone, we ran a distributed system of two nodes on the same c o m p u t e r — t h e results thus include the overhead of the T C P / I P stack, but not network transmission. The test program called a function of a remote object 100 times; the function did nothing but return an integer (which was returned by  41  copy; see Section 4.2.1). The cost of evaluating the function call itself was negligible (see Table 5.4). The distributed test system was analysed using the p r o f i l e module of the P y t h o n standard library [16]. This particular profiler measures only time spent in P y t h o n code, not the lower-level C libraries; total C P U time in this test run (slowed down by the profiler) was measured as 75.58 seconds. The bottleneck is the client; the server spent the vast majority of its time waiting for user input (the test server is terminated by a key press). O n the client side, although a few percent of the time was spent in networking code, output formatting, etc., the overwhelmingly largest culprits were the pickling code («19.6 seconds; 26%) and the a c q u i r e ( ) method of lock objects («12.9 seconds; 17%). The data are very approximate, and we do not wish to cite detailed data and give the impression of more confidence in these numbers, but should be taken as an indication of problem areas. It should be noted that P y t h o n actually contains two pickling modules: p i c k l e , which is implemented purely in P y t h o n , and c P i c k l e , which is implemented in C and, although somewhat less customiseable, is a very great deal more efficient—it is up to 1,000 times faster than the P y t h o n implementation [16]. Originally, the P y R e m o t e implementation used the P y t h o n p i c k l e module for ease of implementation, as some (very minor) changes had to be made; we expect that the system is more efficient using the c P i c k l e module. It turns out, however, that the variance of the timings is extremely high, rendering this comparison useless; probably due to non-deterministic interaction of locks and threads (see Table 5.4). We do not show average values for the remote-call runs; the variance is too great for mean values to be meaningful. We may say w i t h some confidence that the cost of a remote invocation—apart from network traffic, which is unpredictable—is on the  42  Remote ( p i c k l e ) Remote ( c P i c k l e ) Regular call  Run 1  Run 2  Run 3  138 533 0.00845  546 485 0.00900  506 532 0.00829  Run 4  Run 5  155 128 0.00823  174 405 0.00826  Average N/A N/A 0.00845  Table 5.4: Distributed performance (seconds per 10,000 calls) order of 35 milliseconds ( ± 2 5 milliseconds, or more). R u n n i n g g p r o f to find the C profile of the same code reveals that almost 1 0 % of the program's time was spent in a dictionary lookup method l o o k d i c t _ s t r i n g ( ) , mostly when called from G e n e r i c G e t A t t r O . This is not necessarily indicative of any problems; rather we should expect attribute lookup to consume a great deal of time, since the test involves the repeated lookup and call of a function which itself involves very little processing. (The single most 'expensive' function was the interpreter function PyEval_EvalFrame ( ) , which evaluates a stack frame object; it accounted for some 3 4 % of the runtime. It may be worth noting that it occupied over 5 4 % of the time i n the P y B e n c h test.) We conclude that the P y R e m o t e source code would benefit from retooling i n terms of the locking scheme. Also, the data marshalling, even apart from switching to a version of the more efficient c P i c k l e module, is unoptimised and can probably be made much more efficient.  T h e locking scheme, however, is likely to be the  greater bottleneck i n more complex scenarios where more threads compete for locks on resources related to greater numbers of remote peers.  43  Chapter 6  C o n c l u s i o n and discussion 6.1  The challenge of CPython  The P y R e m o t e project aspired to add capabilities to the P y t h o n programming language. T h i s has obvious benefits—it harnesses the power of an existing and welltested programming language w i t h a powerful standard library, many third-party libraries, existing applications, and developers. However, it also creates many challenges in that the pre-existing system was in no way designed to support the type of modifications that were necessary, and in some cases was not easily amenable to these changes. A l t h o u g h P y t h o n , as a programming language, is simple, consistent, and employs an explicit policy that "There should be one, and preferably only one, obvious way of doing s o m e t h i n g " , C P y t h o n is often rather confusing 'under the 1  hood'; so for instance there are several ways in which a method call may be resolved, and several places where something so basic as an object attribute may be looked  'From The Zen of Python by Tim Peters; see http://www.python.org/doc/Humor. html#zen.  44  up.  I n t h e s i m p l e s t case, m a n y a t t r i b u t e s m a y b e h e l d e i t h e r i n 'slots' i n t h e t y p e  o b j e c t or i n t h e o b j e c t ' s a t t r i b u t e d i c t i o n a r y ; t h e m e t h o d c a l l r e s o l u t i o n is s t i l l m o r e complex.  6.1.1  Type equivalence  S o m e o f t h e m o r e i n s i d i o u s b u g s t h a t were u n c o v e r e d i n t h e c o u r s e o f i m p l e m e n t ing  t h e P y R e m o t e p r o j e c t were r e l a t e d t o t h e n o t i o n o f  type equivalence.  Inter-  n a l l y , b u i l t - i n t y p e s i n C P y t h o n t e n d t o h a v e t w o u t i l i t y m a c r o s for efficient checking, an has  object  e.g.,  for t h e d i e t  type  ( P y D i c t O b j e c t ) , P y D i c t _ C h e c k ( o ) t o c h e c k if  o is a d i c t i o n a r y , a n d P y D i c t _ C h e c k E x a c t (o)  exactly t h e  A l t h o u g h we  type diet.  type  T h e former accepts  consider a P y R e m o t e p r o x y type  to check if a n object  a subclass;  logically  the latter does  e q u i v a l e n t t o its  o  not.  target  t y p e , P y D i c t _ C h e c k E x a c t ( ) l o o k s for a reference t o t h e p r e c i s e C s t r u c t u r e r e p resenting the  type  object.  P y D i c t _ C h e c k E x a c t ()  T h e consequence  o f t h i s is t h a t  a type  that  passes  m a y be safely a s s u m e d to have the exact m e m o r y layout a n d  d a t a m e m b e r s s p e c i f i e d b y P y D i c t O b j e c t . A t y p e t h a t o n l y passes P y D i c t _ C h e c k ( ) , therefore, s h o u l d have the m e t h o d s specified b y the t y p e object, b u t m a y not have the exact same m e m o r y layout. Unfortunately,  some of the code i n the d i c t o b j e c t  module assumed  pre-  cisely this, a n d therefore failed (in u n e x p e c t e d ways) w h e n faced w i t h d i c t i o n a r y p r o x i e s ( w h i c h o b v i o u s l y d o not  h a v e t h e s a m e m e m o r y l a y o u t as a c t u a l d i c t i o -  n a r y o b j e c t s ! ) — t h a t is, P y D i c t _ C h e c k ( ) w a s u s e d w h e r e P y D i c t _ C h e c k E x a c t () was n e e d e d . S e v e r a l s u c h b u g s were f o u n d , a n d we s u s p e c t t h a t t h e r e m a y b e m o r e t h a t P y R e m o t e tests h a v e n o t u n c o v e r e d ( a n d t h a t P y R e m o t e m a y b e e n t i r e l y u n a f f e c t e d by).  Since the code chiefly deals w i t h internal structures, w h i c h i n the n o r m a l course  45  of execution w i l l never be anything but precisely d i e t objects, it is not likely that they w i l l be encountered except by modification of the sort we have done.  2  Errors of this type, whether caused—as i n this e x a m p l e — b y bugs i n the C P y t h o n interpreter, o r — a s i n many, many cases—by programming error in writing P y R e m o t e due to an incomplete or faulty understanding of the C P y t h o n codebase caused the implementation to consume much more time than initially estimated.  6.1.2  Pickling limitations  P y R e m o t e uses Python's existing object marshalling facilities, the p i c k l e and m a r s h a l modules. Although minor additions were made (e.g., the ability to marshal types as described i n Section 4.2), there are certain objects that are presently 'unpicklable'. T h i s is here considered to be more i n the nature of a limitation of the library than of PyRemote: The solution is deferred to improving the p i c k l e library rather than working around it.  6.1.3  Old and new classes  For historical reasons, the P y t h o n language actually supports two types of classes and instances.  W h i l e largely transparent to users, the underlying implementa-  tions i n C P y t h o n are completely different. T h e P y R e m o t e implementation does not presently support the use of "old-style" classes. A l t h o u g h it could be done i n much the same way as the proxy system for "new-style" classes, it was deemed an irrelevant duplication of effort for the purposes of this project; furthermore, the use in P y t h o n of "old-style" classes is discouraged. In future revisions of C P y t h o n , they 2  We  plan to examine the PyDictObject code for instances of this problem a n d submit  patches when more time is available. It would probably not be relevant to do so based on the existing P y R e m o t e code, however, as it is based on a C P y t h o n codebase several minor versions b e h i n d the current one.  46  w i l l be removed.  6.1.4  Unsupported operations  Not quite a l l 'standard' operations are supported by P y R e m o t e proxies, for the simple reason that the arguments cannot be transmitted. Certain entries i n the type structure's table are functions that expect, as parameters, pointers to raw memory, which cannot intelligently be marshalled. Operations not supported by P y R e m o t e proxies include the intrinsic t p _ p r i n t ( ) method (which is deprecated and should not pose a problem), coercion i n numeric types (this also is not so great a problem as it may seem, as intrinsic number types are small, immutable objects which are always copied and never sent by proxy) and operations on the b u f f e r type (whose purpose is precisely to expose object data as raw bytes).  6.2  Future work  PyRemote, i n its present state, more or less represents a proof of concept and not a production ready system. Certain unsupported operations (see Section 6.1.4) should be implemented or at the very least handled more gracefully (for example, buffers might be copied automatically rather than having operations thereupon refused). Garbage collection is not presently implemented for remote objects. Finally, some features were left out due to simple lack of time and underestimations of the complexity of the implementation task.  6.2.1  Garbage collection  The present PyRemote implementation does not perform distributed garbage collection. To a certain extent, the problem is 'easy': T h e P y t h o n interpreter performs 47  reference counting (the default implementation uses a reference counting garbage collector w i t h cycle detection), and the only thing that prevents an object, once proxies to it exist, from being properly garbage collected, is the existance of a reference to the object held by the PyRemote machinery, used to resolve remote requests. A garbage collection mechanism, therefore, would 'simply' need to track the number of remote peers that own proxies to the object; once this number reaches 0, unless there exist local proxies, the object may be removed from the P y R e m o t e cache, and garbage collected if appropriate. However, this requires a proper distributed garbage collector to deal intelligently w i t h 'ordinary' problems of the area. This is beyond the scope of this thesis. Given a distributed garbage collection algorithm, it would probably be fairly straightforward to implement distributed garbage collection in PyRemote. T h e system already satisfies the local tracking criteria mentioned by Bennett in his design of the Distributed Smalltalk garbage collector [3]: Purely local garbage is collected, and references to remote objects are stored in a centralised location where a distributed collector can easily look them up.  6.2.2  Object mobility policies  For reasons laid out in Section 6.1, the PyRemote implementation does not support all the features that were originally planned. One of the more interesting features that were left out, inspired by the object mobility paper on Emerald [1], is the no-' tion of need or usage pattern based object mobility policies. In effect, the system may detect whether an object, invoked remotely, is likely to be used enough (more accurately, whether the remote usage is likely to exceed the local usage by a sufficient margin) that it would be profitable to move it rather than to create a proxy. i  48  This was, then, to be accompanied by a form of 'stickiness' property whereby a programmer might request that the object stay put. W h i l e this may seem micromanagement, it may be relevant for objects where availability is a critical point and certain nodes i n a network may be known or expected to be more reliable than others. Unfortunately, there was not enough time to implement this feature, which could perhaps substantially improve performance of applications making heavy use of remote calls. It would be necessary to implement some limited form of dynamic profiling, and a policy to move objects based on these measured parameters. Perhaps more useful still is the notion of using profiling (whether wholly dynamic or cached) to determine which portions of an application execute i n isolation and so make good candidates for 'shipping off' to other nodes for processing. This feature is not i n the current version of PyRemote, b u t the groundwork exists: In P y t h o n , due to its powerful introspection, and the profiling module present in the standard library; and i n PyRemote, which can move code objects to remote nodes.  6.2.3  Transactional consistency semantics  A s mentioned in Section 4.6, it could be interesting to investigate the possibility of implementing real transactional semantics for type consistency, rather than the more efficient but somewhat ad-hoc approach used i n the current implementation.  6.2.4  Nameserver  The current P y R e m o t e implementation does not have an elegant way of connecting to a remote peer and obtaining a directory of services provided. Indeed, the present service is rather ad-hoc in nature, where a server offers an arbitrary (user-specified)  49  object, and clients must connect to the server by IP address and port (although a default server port of 10101 is used). A useful addition to the system would be a more intelligent interface, and a more flexible encoding of hosts.  6.3  F i n a l thoughts  It is perhaps in the nature of a thesis of this type that it is even more a learning experience of existing systems than it is a production of novel knowledge. W h e n the PyRemote project was begun, the C P y t h o n interpreter was an opaque unknown, and the early stages of modification were very much i n the nature of t r i a l and error, where core features of the interpreter were overlooked or entirely unknown. Significant amounts of time (it is tempting to say 'inordinate') were spent (or lost) on not only fixing bugs, but refactoring or redesigning the core PyRemote modules as interactions between different interpreter components became more clear.  6.4  O b t a i n i n g the code  The code shown in examples and used to produce the cited test results was locally denoted as revision 215. It may be obtained as a patch set against P y t h o n version 2.4.3. It is a fairly large diff set which includes all modifications, test code, and various generation scripts. (The size of the diff, 37 M i B uncompressed, is rather disproportionate to the relatively smaller amount of changes to the code base.)  If  you want to hack on the latest version of PyRemote, a further diff is available of revision 243, which is current as of the finalisation of this thesis. T h i s revision contains some incomplete work and may not be entirely functional, but in particular contains more and more robust consistency code.  50  P y t h o n 2.4.3 source code (several download options, checksums listed): http://python.org/download/releases/2.4.3/ P y R e m o t e diff, revision 215 against P y t h o n 2.4.3 (6.0 M i B ) : http://petterhaggholm.net/pyremote/pyremote-r215-diff.tar.bz2 md5sum: bc47b0378859e3468cd0516a939e3acb P y R e m o t e diff, revision 243 against revision 215 (3.3 M i B ) : http://petterhaggholm.net/pyremote/pyremote-r215_243-diff.tar.bz2 md5sum: 514651a78705a979b6f7e6062ecfd06c  51  Bibliography [1] E. J u l , H. Levy, N. Hutchinson, and A . Black, "Fine-grained mobility i n the E m e r a l d system," ACM Transactions on Computer Systems, vol. 6, pp. 1 0 9 133, February 1988. [2] A . P. Black, N . C. Hutchinson, E. J u l , and H. M . Levy, " T h e development of the E m e r a l d programming language," i n HOPL ACM  SIGPLAN  III: Proceedings of the third  conference on History of programming languages, (New York,  N Y , U S A ) , pp. 11-1-11-51, A C M Press, 2007. [3] J . K. Bennett, " T h e design and implementation of Distributed Smalltalk," i n Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA)  (N. Meyrowitz, ed.), vol. 22, (New York,  N Y ) , pp. 318-330, A C M Press, 1987. [4] J . Armstrong, R. V i r d i n g , C. W i k s t r o m , and M . Williams, Concurrent Programming in Erlang. Prentice-Hall, second ed., 1996. [5] J . L. Armstrong, " T h e development of Erlang," i n International Conference on Functional Programming, pp. 196-203, 1997. [6] S. D. B r u d a , P. Haggholm, and S. Stoddard, "Distributed, real-time programming on commodity P O S I X systems: A preliminary report," i n Proceedings of 52  the 5th International Symposium on Parallel and Distributed Computing, I E E E Computer Press, 2006. http://turing.ubishops.ca/home/bruda/rtsync/. [7] L. Prechelt, " A n empirical comparison of C, C + + , Java, Perl, P y t h o n , Rexx, and T e l for a search/string-processing program," i n Technical Report 2000-5, Universitat Karlsruhe, Fakultat fur Informatik, 2000. [8] I. de Jong, " P y t h o n Remote Objects." http://pyro.sourceforge.net/. [9] T . F i l i b a , "Remote P y t h o n C a l l . " http://rpyc.wikispaces.com/. [10] G . Payer, " P y t h o n Remote C a l l Module." http://www.gernot-payer.de/pyrcall/. [11] M . Simionato, " T h e P y t h o n 2.3 method resolution order." http://www.python.org/download/releases/2.3/mro/. [12] A . Vassalotti, "Pickle: A n interesting stack language." http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/. [13] A . D. B i r r e l l and B. J . Nelson, "Implementing remote procedure calls," ACM Transactions on Computer Systems, vol. 2, pp. 39-59, February 1984. [14] M.-A. Lemburg, "Pybench." http://www.egenix.com/files/python/pybench-1.0.zip. [15] J . Fenlason and R. M . Stallman, " T h e G N U profiler."  53  http://www.gnu.org/soitware/binutils/maxtual/gproi-2.9.1/html mono/gprof.html. [16] G. van Rossum, " P y t h o n 2.3.4 library reference." http://www.python.org/doc/2.3.4/lib/lib.html.  54  


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items