UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A parallel algorithm for ASN.1 encoding/decoding Joseph, Carlton Charles Agnel 1991

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-ubc_1992_fall_joseph_carlton_charles_agnel.pdf [ 3.69MB ]
JSON: 831-1.0065289.json
JSON-LD: 831-1.0065289-ld.json
RDF/XML (Pretty): 831-1.0065289-rdf.xml
RDF/JSON: 831-1.0065289-rdf.json
Turtle: 831-1.0065289-turtle.txt
N-Triples: 831-1.0065289-rdf-ntriples.txt
Original Record: 831-1.0065289-source.json
Full Text

Full Text

A PARALLEL ALGORITHM FORASN.1 ENCODING/DECODINGbyCarlton Charles Agnel JosephB.Eng. (EE), McGill, 1990.A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF APPLIED SCIENCEinTHE FACULTY OF GRADUATE STUDIESDEPARTMENT OF ELECTRICAL ENGINEERINGWe accept this thesis as conformingto the required standard THE UNIVERSITY OF BRITISH COLUMBIASeptember 1992© Carlton C.A. Joseph, 1991In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. It is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.Department of Electrical EngineeringThe University of British ColumbiaVancouver, CanadaDate^DE-6 (2/88)AbstractComputers differ in many ways: the characters may be coded in ASCII, EBCDIC orUNICODE; integers can vary in size; processors can operate in Big-Endian or Little-Endianformat; and programming languages can use different memory representations for data. TheInternational Standards Organization (ISO) provides an Open System Interconnection (OSI)seven-layer reference model[20] to facilitate heterogeneous inter-computer communication.Multimedia technology places new demands on inter-computer communication speedsbecause of the higher bandwidth requirements of voice and image data. Various implementa-tions of high speed networks meet this demand at the lower layers. The presentation layer ofthe ISO reference model is the dominant component of the total protocol processing time[5].Hence, the presentation layer represents a bottleneck in communication systems that use highspeeds networks. The presentation layer is traditionally implemented in software but now acombined parallel hardware/software approach is taken.Prior to this thesis, VASN.1[3][24][25] was the only parallel hardware/software com-bination that achieved high throughput in the presentation layer. The parallel algorithm ofVASN.1 limits the scalablity of hardware and hence the achievable parallelism. VASN.1users are expected to use the parallel data structure that it uses. This limitation reduces theflexibility for VASN.1 users and incurs an additional overhead for handling the data structure.The parallel system presented in this thesis avoids the serious problems found in VASN.1.A new parallel algorithm has been developed which provides users with flexible serial datastructures 2. The encoding algorithm converts flexible serial data structures into parallelinformation which allow simultaneous processing. The conversion from a serial to a parallelinformation has not been previously explored in the literature. The decoding algorithmalso provides maximum parallelism while providing a flexible serial data structure forapplications to use. The algorithm has been designed in such a way that any changes inA parallel data structure is a data structure that contains extra information required for a parallel algorithm.2^A serial data structure is a data structure that holds no extra information required for a parallel algorithm.iithe ASN.1 specification can be easily incorporated into the system. The final advantage isthat the hardware scales gracefully and allows for more parallelism than VASN.1. The initialsimulations are promising and show that the throughput achievable by the parallel algorithmis five times greater than that achievable by a serial implementation.iiiContents Abstract ^  iiList of Tables  viiList of Figures ^  viiiAcknowledgments 1 Introduction ^  11.1 Communication in Heterogenous Networks ^  11.1.1 ISO Presentation Layer Conversion Standard ^ 21.2 Presentation Layer Cost ^  41.3 Initial Parallel Analysis  61.4 Previous Work with ASN.1 ^  101.4.1 Single Processor ASN.1 Implementations ^  101.4.2 Multi Processor ASN.1 Implementations  111.5 Thesis Objectives and Outline ^  142 Parallel Architecture ^  152.1 Hardware Architecture  152.1.1 Contiguous Memory Management ^  172.2 Parallel Algorithm Outline ^  182.3 Memory Mapping Scheme  193 Encoding^  203.1 Encoding Interface Data Structures ^  203.2 Encoder Generation from ASN.1 Specification ^  233.3 Encoders Functions and Data Structures  243.3.1 EncodeStructBegin () ^  253.3.2 EncodeStructEnd ()  283.3.3 QEncode () ^  293.4 Encoding Flow Trace  303.4.1 Host Responsibilities ^  313.4.2 Queue-Processor  333.4.3 Processors ^  34iv4 Decoding^  354.1 Decoding Interface Data Structures ^  354.2 Decoder Generation from ASN.1 Specification ^  354.3 Decoders Functions and Data Structures  364.3.1 Node for Primitive Types^  374.3.2 Node for SEQUENCE and SET Types ^  374.3.3 Node for SEQUENCE OF and SET OF Types ^ 384.4 Decoding Flow Trace ^  394.4.1 Host Responsibilities  414.4.2 Queue-Processor ^  414.4.3 Processors  444.4.3.1 tsptr = tsend^  454.4.3.2 UPDATE bit set  464.4.3.3 find correct node ^  464.4.3.4 CONSTRUCTOR bit set ^  474.4.3.5 OF bit set ^  484.4.3.6 primitive  485 Hardware Models ^  505.1 Bus Arbitration Method ^ 505.2 Processor^  555.2.1 Integer Unit  565.2.2 Data Unit ^  575.2.3 Instruction Unit  575.2.4 Register File ^  575.2.4.1 Reg(0) = Zero Register ^  575.2.4.2 Reg(26) = Frame Pointer  575.2.4.3 Reg(27) = Stack Pointer ^  575.2.4.4 Reg(28) = Return Address  585.2.4.5 Reg(29) = Queue Address ^  585.2.4.6 Reg(30) = Processor Address  585.2.4.7 Reg(31) = Wait Result ^  585.3 Queue-Processor ^  596 Simulations ^  636.1 Simulation with System V IPC ^  636.2 Memory Mapper ^ 646.3 Assembler  656.4 VHDL Model  666.5 Utilities ^  666.6 Description of a Complete Simulation Cycle ^  677 Results  697.1 Need for Standard Test Vectors ^  697.2 Simulation Statistics ^  698 Future Work^ 738.1 Simulations  738.2 Processing Element ^  738.3 Parallel use of PEs  748.4 Fabrication Constraints  759 Conclusion ^  76Bibliography 78Appendix A Acronym Definitions ^  81Appendix B Cost per Instruction Mapping ^  82Appendix C Pros and Cons of Static Analysis  85Appendix D Processors Memory Read and Write Cycle^  86Appendix E ASN.1 PersonnelRecord Type ^ 87Appendix F ASN.1 Types Instruction Counts  89viList of Tables Table 1^CASN.1 Primitive encoding/decoding Routines ^ 6Table 2^Primitive Routines Instruction Count.^  7Table 3^Parallelism Speed Gain ^  7Table 4^Minimal Instructions for Encoding/Decoding ^  8Table 5^Queue-Processor Commands.  17Table 6^Processing Elements Signals ^  58Table 7^Instructions Executable On a Processing Element^ 59Table 8^Simulation Results in Mbit/s  70Table 9^Worst Case Simulation Results for PersonnelRecord. ^ 76Table 10^Simulations Instruction per Byte Values. ^  89VIIList of FiguresFigure 1^Abbreviated OSI Seven Layer Model ^  2Figure 2^Example of Local Representation to Transfer Syntax ^ 3Figure 3^ASN.1 Description of Personnel Information  4Figure 4^ASN.1 Processing Cost. ^  5Figure 5^Serial vs Parallel Encode Time  10Figure 6^VASN.1 Overview^  11Figure 7^Parser/Assembler and Encoder/Decoder ^  13Figure 8^Parallel Hardware Setup ^  16Figure 9^Contiguous Memory: Initial  17Figure 10^Contiguous Memory: Allocated  18Figure 11^Memory Mapping Scheme ^  19Figure 12^Partial List of Mapping Rules  21Figure 13^ASN.1 Example ^  22Figure 14^C Type Equivalent of Figure 13 ^  22Figure 15^Transfer Syntax of Figure 13 in IDX Structures ^ 23Figure 16^Generated Encoding Routines for Figure 13  24Figure 17^Encode Queue Element. ^  25Figure 18^Encoding Cumulative Length Data Structures ^ 26Figure 19^Data Dependency Graph of Figure 13 ^  27Figure 20^Postponed ASN.1 Type ^  28Figure 21^Queue with the Encode Example Calls  31Figure 22^Partial Flow Diagram for Small Encoding Example ^ 32Figure 23^Decoding Queue-Element    36Figure 24^Node Data Structure of Type Tree ^  36Figure 25 Two Decode Type Trees Comparing SET and SEQUENCE ^ 39Figure 26^Transfer Syntax of Figure 13  40Figure 27^Decode Type Tree of Figure 13 ^  40Figure 29^Partial Flow Diagram For Decoding Example ^ 42Figure 28^Algorithm of Queue-Processor  43Figure 30^Decoding Algorithm for Processors^  45Figure 31^Daisy Chain and Rotating Daisy Chain  50Figure 32^Partial Diagram of RDC Arbitration  53Figure 33^RDC Timing Diagram ^  54Figure 34^Processor Block Diagram  56Figure 35^queue-processor  60Figure 36^Queue Information ^  61Figure 37^Processor State Information ^  62Figure 38^Software used in Simulation  63Figure 39^Development of Algorithm using Unix and System V IPCs ^ 64Figure 40^Example Assembler Code ^  65Figure 41^Local Cache Setup ^  74Figure 42^Multi-Bus Global Cache  75Figure 43^Memory Read/Write Timing Diagrams ^  86Figure 44^Ten Integers Type ^  89ixAcknowledgmentsI would like to express my gratitude to Dr. Mabo Ito for his suggestions and forsupporting me throughout my graduate studies. I also would like to thank Dr. GeraldNeufeld for his insight on ASN.1 and his very valuable comments. A special thanks toMike Sample for providing me with statistics on ASN.1 processing and Donna Fuller for hersupport during the preparation period of this thesis. Lastly, I would like to thank my parentsfor their continuous support, encouragement and understanding.x1 Introduction Section 1.1 of this chapter covers the need for communication in heterogeneous networks.Section 1.2 presents information showing that the computation cost of the presentation layeraccounts for the majority of the total OSI protocol cost[5]. Section 1.3 suggest a parallelapproach to reduce this cost. Section 1.4 discusses the approaches of other researchers andtheir relative advantages and disadvantages. Section 1.5, the final section of the introduction,states the objectives and the outline of this thesis.1.1 Communication in Heterogenous NetworksIn industry, at home and in educational institutions, there are a large variety of computerhardware configurations available from various manufacturers. There exists a need forcomputers based on different hardware structures to communicate with one another. Severalvendors address this issue by implementing protocols so that networks with heterogenouscomputers can communicate. Two of these vendor-specific protocols are Sun Microsystems'XDR[6] and Apollo Computer's NDR[8].To aid in the goal of truly inter-operable heterogenous systems, ISO has constructed theOSI seven-layer protocol reference model[17][20]. Figure 1 shows the top three layers ofthe OSI seven-layer model. Any layer n of Host A communicates with its peer on layer nof Host B using a preset protocol. For instance, the presentation layer's peers in Host Aand B communicate using a presentation protocol. The protocol exchanges information byexchanging data units known as protocol data units (PDU) 3 . The presentation layer in HostA and Host B do not directly exchange PDUs. The PDU is propagated down through to thelower layers where layer one is responsible for exchanging information between Host A andHost B. Within the OSI model the presentation layer is responsible for the transformationof information from a machine-dependent format to a machine-independent format and vice-versa. Section 1.1.1 presents an outline of how this is achieved in a standard way.3^A list of all the acronyms used in this thesis is presented in Appendix A1Application ProtocolPresentation ProtocolSession Protocol-4 ^..3$ ^,4:1 ^- a--4.-Application ApplicationInterface InterfacePresentation PresentationInterfaceVh.InterfaceSession SessionLowerLayers ••• Lower• Layers••Host A Host BFigure 1 Abbreviated OSI Seven Layer Model1.1.1 ISO Presentation Layer Conversion StandardThis section presents a summary of the roles that Abstract Syntax Notation One(ASN.1[26]) and the Basic Encoding Rules (BER[27]) play in the presentation layer. Ref-erences [11][4] present additional detail on BER and ASN.1.ASN.1 can specify the data structures of the presentation protocol data units (PPDU) whentwo machines communicate (figure 1). The ASN.1 specification can then be compiled intoroutines that encode/decode to/from machine-specificlmachine-independent formats. BERare used when transforming data from a machine-specific format to a machine-independentformat and vice-versa. Figure 2 shows a possible communication scenario, where Host Atransmits information to Host B.2Tag Len. Encoded Bool Tag Len. Encoded Integer Tag Len.^Encoded StringIntegerHost A Internal RepresentationBooleanHost B Internal RepresentationBoolean•^ 32 •String40String40Application Layer (Machine Dependent)Encode Using BER Decode Using BERPresentation Layer (Conversion)Integer•^ 32 •Transmitted Transfer SyntaxSession Layer (Machine Independent)Figure 2 Example of Local Representation to Transfer SyntaxIn figure 2 one can see that the internal representation of the data on the two machines isdifferent and hence BER are used to convert the data into a generic transfer syntax (TS) andthen back to an internal representation. This diagram shows that the TS consists of a tag, alength and the encoded value (TLV4). In the triples, the tag represents the information that isstored in the encoded part, the length specifies the encoded part's length and the encoded partholds the actual encoded information. BER specify how information is encoded/decoded.Figure 3 shows an ASN.1 representation of a data structure that holds personnel information.Notice that the data representation is similar to those used in most high level programminglanguages.4^TLV = Tag Length Value.3Personnellnformation ::= SET {Name IA5String,Age INTEGER,Sex BOOLEAN }Figure 3 ASN.1 Description of Personnel InformationThe ASN.1 description given in figure 3 is machine-independent and the internal repre-sentations shown in figure 2 are the machine-specific information for one person. Host Aand Host B in figure 2 are exchanging personnel information for one person. When boththe machines receive an ASN.1 description of PersonnelInformation they are responsible formaintaining value instances of this data in their machine-specific format and converting itto/from a machine generic format when transmitting/receiving.1.2 Presentation Layer CostWhen using the OSI protocol stack, transferring from a machine-dependent to a machine-independent format can consume 97%[5] of the total processing time. Unpublished simula-tions carried out at the University of British Columbia (UBC) show that the cost is closer to70% when the more efficient CASN.1[12] 5 implementation is used in the presentation layer.The data gathered by Takeuchi[ 19] can also provide the processing cost associated with thepresentation layer. Takeuchi assumes the processing costs per packet of the OSI lower lay-ers are constant6 . In the presentation layer the cost per byte is calculated for the ASN.1PersonnelRecord shown in Appendix E. For the ASN.1 PersonnelRecord specification, theencoding/decoding instruction count for the presentation layer is Encode = 350 + 27 * bytesand Decode = 400 + 60 * bytes? . The spread sheet extract in figure 4 shows the cost ofthe presentation layer with reference to the total protocol processing cost. Data Set Oneshows the cost for sending/receiving PDUs at the lower layers. Data Set Two shows that the5^Details of CASN.1 will be given later.6^The assumption is correct if the CRC checking in the transport layer is omitted or computed by a hardware assist at no cost.7^Page 55 Sc Appendix C page 106-108 of Ref.D9].4presentation layer can consume a significant portion of the processing power as the packetsize increases.Data Set OneLayer Send Receive2 40 453 160 1604 400 3755 400 450Total 1000 1030Bytes/Pkt Layers 2-5Encode DecodeData Set TwoLayer 6Encode^Decode% LayerEncode6Decode100 1000 1030 3050 6400 75.31 86.14200 1000 1030 5750 12400 85.19 92.33400 1000 1030 11150 24400 91.77 95.95800 1000 1030 21950 48400 95.64 97.921600 1000 1030 43550 96400 97.76 98.943200 1000 1030 86750 192400 98.86 99.476400 1000 1030 173150 384400 99.43 99.7312800 1000 1030 345950 768400 99.71 99.87Figure 4 ASN.1 Processing Cost.The ASN.1 PersonnelRecord type in Appendix E consists of IA5String string types andone integer type. This leads to two important conclusions:Since the cost for encoding/decoding a strings is low, the costs attributed to the presenta-tion layer in figure 4 are mostly due to the encoding/decoding of the tag and the length.5Figure 4 already shows that the presentation layer occupies a significant portion of theprotocol stack even with a simple encoding/decoding to perform. The percentage of theprocessing time that the presentation layer uses is significantly higher for complex typessuch as integers and reals.1.3 Initial Parallel AnalysisThe initial parallel analysis is carried out by gathering statistics for the CASN.1[12] tool.CASN.1 takes an ASN.1 specification as input and generates C code that is capable of encod-ing/decoding PPDUs. In CASN.1, procedure calls are made to primitive encoding/decodingroutines that are shown in table 1. These calls take primitive types and transfer them backand forth from machine-specific format to machine-independent transfer syntax TLV triples.ASN.1 Type Encode Routine Decode RoutineBoolean  Encode_bool() Decode_bool()Integer Encode_intQ Decode_int()Bit String Encode_bits() Decode_bits()Octet String Encode_octs() Decode_octs()Null Encode_null() Decode_null()Object Identifier Encode_oid() Decode_oid()Universal Time Encode_UTCTime() Decode_UTCTime()Generalized Time Encode_GNLTime() Dec ode_GNLTime()External Encode_ext() Decode_ext()Table 1 CASN.1 Primitive encoding/decoding RoutinesBy using primitive encoding/decoding routines with constructor types such as sets andsequences, complex data types can be described. A set of primitive data types is shown infigure 3. The only difference between sets and sequences is that in a sequence the orderof the primitive data types is maintained and in sets it is not. The use of set and sequencedata types can be nested.6The primitive encoding/decoding calls constitute most of the time in CASN.1. Anestimate of the number of instructions used by these primitive calls was constructed bycompiling CASN.1's implementation of the primitives to assembler 8 . Table 2 presents thetotal number of assembler instructions counted 9. The static analysis in table 2 shows thatsignificant gains are achieved if the primitives are executed in parallel. Significant speedupis gained even if the cost incurred by the parallel algorithm is high. Table 3 shows how thecost of a parallel algorithm is amortized over the number of processors. The table ignoresthe requests placed on scarce resources such as a shared bus. It also assumes that the parallelalgorithm does not incur any communication overhead. These effects are apparent in thefinal evaluation of the system and thus the linear speedup shown in table 3 is not achieved.Primitives Encoding Instruction Count Decoding Instruction CountInteger 583 479Boolean 529 457Bits 668 523OCTET String 672 477Null 515 432Table 2 Primitive Routines Instruction Count.Processors—* 1 1 2 3 mPrimitives1 Serial Parallel Parallel Parallel Parallel1 100 200 200/2 200/3 200/m2 200 400 400/2 400/3 400/mn 100n 200n 200n/2 200n/3 200n/mTable 3 Parallelism Speed GainThe final analysis of the primitive encoding/decoding routines proves that even whenencoding/decoding is optimized for the simplest cases parallelism is applicable due to8^The cc —S option is used on a SPARC 2.Appendix C provides the pro and cons of using a static analysis.9^The count includes the cost of encoding/decoding the tag and the length.7the number of instructions that need to be performed. Table 4 presents the number ofinstructions 10 executed for encoding/decoding some of the primitive types assuming thefollowing simplifications:• All the tags and lengths are in short form.• The instruction counts for NULL, BOOLEAN, INTEGER and OCTET String, includethe instructions required to encode/decode the tag and length.• The INTEGER instruction counts were generated for a one byte positive integer.• The OCTET String instruction counts were generated for octets that start and end on aword boundary. The total length of the octet must be less than 127 bytes.Routine TS size in bytes Encode InstructionCountDecode InstructionCountNULL 2 12 13BOOLEAN 3 18 19INTEGER 3 20 16OCTET String 2+n 27+1.5n 21+1.5nTAG & LEN. 2 17 14Table 4 Minimal Instructions for Encoding/DecodingThe instruction count presented in table 4 does not consider:• The cost incurred by procedure calls.• The cost incurred when allocating memory.• The variable cost of handling non word-aligned OCTET Strings.• The cost of handling complex primitive encodings/decodings.• The cost of maintaining data dependencies within complex structure types.to^The Motorola 88100110] RISC processor instruction set was used.8The above analysis shows that if the primitive encoding/decoding calls are executed fastthen a significant speedup can be achieved. In the best case, the calls return as soon as theyare made. This thesis takes the approach that when a primitive call occurs it is pushed ontoa FIFO queue for servicing, therefore the calls take little time. The problem of servicingthese calls quickly is solved using the architecture proposed in chapter 2.Multiple processors, reading from the FIFO, are implemented to service the calls quickly.This essentially increases the number of instructions executed in a cycle. If there are nprocessors, then in one cycle, n instructions are executed while in a single processor systemonly one instruction is executed. Figure 5 shows how time is saved by encoding in parallelwith n processors. The encoded values could be an individual's personnel information asdiscussed in section 1.1.1. Note that there exists a time skew in the parallel encoding shownin figure 5. A similar skew also exists in the decoding algorithm. The following is thedescription of the skew.When encoding, the skew times are associated with the following actions.• T1 is the time delay between assigning each primitive call to the FIFO. The instructionsexecuted between the assignments account for the delay and are explained in chapter 3.• T2 is the time delay from Ti to the time the primitive call receives service.When decoding, the skew times are associated with the following actions.• T1 is the time required to read the length of the present encode value so that the nextvalue's starting address is known.• T2 is the delay between the end of T1 and when computation starts on the next value. Thedelay is due to the communication required by the parallel algorithm. The communicationrequired by the decoding parallel algorithm is explained in chapter 4.9Encoded BooleanLenTagLen Encoded IntegerTagIP, • ►^ Tag Encoded StringT2T11 Processor Encodingr Tag Len Encoded Boolean Tag Len Encoded Integer Tag Len^Encoded Stringn Processors Parallel EncodingTimeFigure 5 Serial vs Parallel Encode Time1.4 Previous Work with ASN.1Some of the previous work carried out in relation to the presentation layer in the ISOmodel includes the ISODE[23] project, the CASN.1[28] project, the a2c project[15] andthe VASN.1[3][24][25] project. Section 1.4.1 briefly discusses the ISODE, CASN.1 anda2c single-processor software used for ASN.1 encoding/decoding. Section 1.4.2 describesVASN.1, a multiprocessor ASN.1 encoder/decoder.1.4.1 Single Processor ASN.1 ImplementationsThe ISODE, CASN.1 and a2c projects are single processor programs 11 that take an ASN.1specification and create encoding/decoding routines to transfer to/from machine -independenttransfer syntax/machine -dependent format. The problem with using single-processor softwareto perform this function is that the achievable transfer rates are unacceptably low for highspeed networks. A 8 Mbits/s transfer rate was achieved by running a2c on a MIPS R3260(approximately 18 specmarks) with 48 megabytes of RAM.11^The programs are Unix-based C programs.1 0ParserAssemblerAInter-mediateMemoryEncoderDecoder HostMainMemorySystem Bus Interface1.4.2 Multi Processor ASN.1 ImplementationsFigure 6 shows the parallel hardware that VASN.1 uses to achieve a higher throughputcompared to a single processor implementation.Figure 6 VASN.1 OverviewThe following factors limit the hardware used in the design and simulations:• The design is based on a 16—bit microprocessor; hence, a complicated memory mappingscheme is required to map to today's typical 32—bit architecture. This mapping couldbe expensive.• The hardware setup is non-expandable because a ROM holds the encoding/decodingroutines.The above problems could be solved by adapting the algorithm to a 32—bit machine andconverting the ROM that holds the encoding/decoding routine to a downloadable RAM.However, VASN.1 fails to eliminate some of the other problems as easily. Figure 7 is theblock diagram of the hardware that is used for the Parser/Assembler and the Encoder/Decoderof figure 6. The following factors limit the hardware shown in figures 6 and 7:• The Intermediate Memory block of figure 6 is an associative memory scheme thatdepends on the algorithm VASN.1 uses. The scaling of the hardware is difficult due11to the associative memory and the Intermediate Memory block will not operate whenASN.1 types have long tags.• Figure 7 shows that two queues are used for servicing the interrupts generated byslave processors. This method for handling interrupts produces unfair arbitration if thesampling rate of the two queues is low[2].• In figure 7, information exchanged between the slave processors is transmitted on Bus Avia the Processor Master. Information that the slave processor needs from the systemor local bus is retrieved on Bus B via the Interface Processor. The setup is restrictivebecause it partitions the utilization of the two busses and leads to a high utilization ofone bus and a low utilization of the other. The total throughput of the multiple bussystem is not exploited.In figure 7 the arbiter services the requests for the system bus and for the local bus madeby the four slave processors, hence the arbiter can become a bottleneck in the system.This thesis proposes an implementation that avoids the problems of VASN.1 by choosing amore flexible hardware/software configuration.12Local Bus110- Queue ProcessorMaster411— QueueBus A VBus BV^• ProcessorSlave 241111—► Arbiter InterfaceProcessor• v ProcessorSlave 1V^• ProcessorSlave 4• V ProcessorSlave 3System BusInterrupt LinesFigure 7 Parser/Assembler and Encoder/DecoderThe encoding/decoding algorithms represent the last hidden cost for VASN.1. The localrepresentation that VASN.1 uses for its encoding input and decoding output is a parallel datastructure that holds all the hierarchical information of the equivalent ASN.1 description. Thedata structure has the following implications:• An application would either use the parallel data structure if they want to encode/decodedata via VASN.1.• Or an application can convert the parallel data structure into one that is more convenient.Either case incurs an expensive programming cost because of the lack of flexibility andextraneous information in the data structure. This processing cost is the cost of parallelizinga serial data structure. VASN.1 attributes a zero cost for the processing. The algorithmpresented in this thesis provides the application with flexible data structure and does theserial to parallel conversion at a low cost.131.5 Thesis Objectives and OutlineThis section presents a list of thesis objectives that were achieved and the organization ofthe thesis. Prior to this thesis, VASN.1 was the only parallel system for encoding/decodingASN.1 types. Therefore many of the objectives listed below are solutions to problems thatVASN.1 exhibits.1. Devise a parallel ASN.1 encoding/decoding algorithm that is capable of higher throughputthan present serial implementations.2. Devise an algorithm that can take serial data and transform it into parallel data foroptimal encoding/decoding.3. Devise an ASN.1 encoding/decoding algorithm that can be incorporated with applicationsat a minimal cost.4. Design a scalable parallel architecture that is capable of running the algorithm at highspeeds.The following is the outline of this thesis. Chapter 2 presents the parallel architecture.Chapters 3 and 4 describe the encoding and decoding algorithms in detail. Chapter 5 providesthe functional specification of the hardware. Chapter 6 presents the details of how thesimulations of the parallel architecture were carried out and chapter 7 presents the simulationresults. Chapters 8 and 9 provide the conclusions and suggest directions that future workmay follow.142 Parallel Architecture Figure 5 shows that in a single processor implementation the time required to encode thedata is the sum of the time required to encode each primitive type. In the parallel version theprimitive types encode nearly simultaneously thus saving time. The same principle applies tothe decoding routines. Section 2.1 provides a high level description of the parallel hardwareneeded for the efficient execution of the parallel algorithm. Section 2.2 presents an outlineof the parallel encoding/decoding algorithm. Section 2.3 explains how the host system andthe parallel ASN.1 system share the same address space.2.1 Hardware ArchitectureFigure 8 depicts the parallel setup that allows primitive encoding/decoding routines to runsimultaneously. This section presents a high level description of the hardware and chapter 5presents details of the bus arbitration scheme, the processors and the queue-processor. Thehost or a processor can write primitive calls into the FIFO queue. The information in theFIFO queue is stored in the Queue Memory.15QueueMemory4111-1110.QueueMemoryManager• •buffptrProcessor Processor Processor Processor1 2 3 n• •ToExternalBusQueue ProcessorFigure 8 Parallel Hardware SetupThe Queue Memory Manager is responsible for accesses made to the Queue Memoryand for distributing primitive encoding/decoding calls to the processors for servicing. Theencoding/decoding algorithms in chapters 3 and 4 need multiple words of information to bepassed to the processors. If a processor requires n words of information, then n words canbe written into the Queue Memory and the Queue Memory Manager can distribute thesewords to the processor. Another option is to write the address of the n words into the QueueMemory and then the Queue Memory Manager can redistribute the address. A Queue-Processor can be designed with either method but in our implementation the address methodis chosen. The Queue Memory Manager is also responsible for recognizing certain valuesas commands. Constants that represent these commands can not be used as memory locationpointers. Table 5 lists these commands with a brief explanation of each. The completeexplanation of these commands occurs in chapters 3 and 4.16Command Explanationwait for free Wait for all the processors to be free before assigning out the queueelement.add control Increments the control counter.end signal Decrements the control counter. If the control counter is zero execute await for free then interrupt the host.Table 5 Queue-Processor Commands.The processing elements in our system each have a local instruction memory that storesthe primitive encoding/decoding routines that are to be executed. When a primitive typeis encoded, the resulting TS must be stored in memory, hence the processor needs somemechanism to allocate memory. Memory is also needed when decoding a TS to a machinerepresentation. The buffptr in figure 8 is a register that holds a pointer to the starting memorylocation of a free memory pool. The host is responsible for setting up the buffptr pointerto point to the available memory. When the free memory pool is set up, it could consistof contiguous or noncontiguous memory blocks. For the initial simulations, the contiguousmemory approach explained in section 2.1.1 is taken.2.1.1 Contiguous Memory ManagementFigure 9 shows what the memory space looks like at initialization. The buffptr holdsthe starting address of the free memory space.buffptrFree MemoryFree MemoryFree MemoryFigure 9 Contiguous Memory: InitialWhen a process needs x bytes of memory it preforms the following operations withoutreleasing the shared bus:17Reads buffptr.Increments the read value by x.Writes the new value to buffptr.The memory, after a memory allocation, looks like figure 10.buffptr^xAllocated Memory^Free MemoryFree MemoryFree MemoryFigure 10 Contiguous Memory: Allocated2.2 Parallel Algorithm OutlineThe following section provides a brief outline of the encoding/decoding algorithms.Chapters 3 and 4 present the full details of the algorithms.When encoding, the host processor parses its data structure and writes the primitives intothe FIFO queue. The queue-processor then distributes the primitives to the free processors.Since the host processor knows the exact data that it wants to encode, it is responsible forassigning all the primitive calls to the queue-processor. Simulations verify that the primitivecalls take most of the processing time therefore the time spent by the host on parsing thedata structure is minimal.When decoding, the host processor writes the starting address of the TS to the FIFOqueue. The queue-processor then assigns the address to a free processor P1. P1 then readsthe tag and length of the first element in the TS and calculates the starting location of thesecond element in the TS. P1 then writes the starting address of the second element in theTS to the FIFO queue. The queue-processor then assigns the address to the free processorP2. P2 performs the same process that P1 has performed. The process is repeated by otherprocessors until the complete TS is decoded.18•••Queue ProcessorProcessor 1Processor 2•••ProCessor n••••••I/OAddressSpaceTotalAddressSpaceMemoryAddressSpace2.3 Memory Mapping SchemeThe memory scheme that the host system and the ASN.1 parallel hardware use is depictedin figure 11. The host system segments the memory space into I/O address space and memoryaddress space, which allows I/O registers to be mapped to I/O address spaces.Figure 11 Memory Mapping SchemeFigure 11 shows that the queue-processor and the processors of the parallel hardware aremapped to some of the I/O addresses. Thus, when the host system wants to assign work tothe queue-processor it writes to the I/O mapped location of the queue-processor.193 EncodingIn the encoding process, information is taken from the application layer, encoded andpassed to the session layer. The data structures received from the application layer andprovided to the session layer are presented in section 3.1. Once these data structures aredefined then the explanation of how an ASN.1 protocol specification is used to generateencoding routines for an application is discussed in section 3.2. Section 3.3 then explainsthe functionality of the encoding procedures that the applications use. Finally, section 3.4provides the control flow of the parallel architecture and also concludes the running exampleof the chapter.3.1 Encoding Interface Data StructuresThe application layer provides a C data structure of the value instance that it needs toencode. The data structure is generated according to the ASN.1 specification and the mappingrules provided in figure 12. A complete list of these mapping rules is found in Ref. [28].When a value instance is encoded it is presented to the session layer in IDX structures. AnIDX structure is defined in figure 12. Figure 13 shows an ASN.1 specification and figure 14shows an application's equivalent C type. When the value instance shown in figure 13 isencoded, the resulting output is presented in figure 15. A hardware component can convertthe IDX structure shown in figure 15 into a contiguous TS. The hardware component can bepipelined with the parallel encoder and will not effect the total throughput of the system.20ASN.1 Type C TypeINTEGER intCharacter String OCTSSEQUENCE {} struct {SET 0 struct 0SEQUENCE OF { } LISTSET OF { } LISTtypedef struct OCTS {struct OCTS^*next; /* pointer to next OCTS nodeint^len;^/* # of bytes in OCTS string */char *data;^/* pointer to the OCTS string) OCTS;typedef struct LIST_ITEM {struct LIST_ITEM^*next;^/* next list item */char^*item;^/* list item */} LIST_ITEM;typedef struct LIST {LIST_ITEM^*top; /* first item in the list */LIST_ITEM^*last;^/* last item in the list */}^LIST; ./* encoded output goes into IDX structures */typedef struct IDX {struct IDX^*next;^/* next idx */int^*len;^/* # of bytes in IDX data */char *data;^/* information of the IDX */)^IDX;*/*/Figure 12 Partial List of Mapping Rules21recordType^[APPLICATION 1] IMPLICIT SEQUENCE {scores SEQUENCE OF REAL,name^[1] IA5String,age^[2] INTEGER}value instance = {{70.12,80.34,90.56},"John",50}Figure 13 ASN.1 Example^#define APPTAG^0x61#define NAMETAG Oxal#define AGETAG Oxa2typedef struct recordType {LIST^scores;OCTS^name;int age;}recordType;Figure 14 C Type Equivalent of Figure 1322len^tag lendata next Altag len►seq oftag lentag lenretag lenail lendatanextlendatanextlendatanextlendatanextlendatanextlen datanextlen datanext80.34tag lenI Tohntag lentag lenI I^90.56realtag len• ;-,E1===Ilen data ^nextNULLFigure 15 Transfer Syntax of Figure 13 in IDX Structures3.2 Encoder Generation from ASN.1 SpecificationTo generate encoded values, like those shown in figure 15, all the values that need to beencoded are written to the parallel ASN.1 encoder/decoder hardware. A program that readsin the ASN.1 specification and generates the encoding routines that the application usesis necessary. CASN.1 can be modified to generate these encoding routines. The encodingroutines for the parallel algorithm can be identical to the code generated by the present versionof CASN.1. An example of the encoding routines required for the specification of figure 13is shown in figure 16. The actual function of the QEncode () is to write the encode routine23to the queue-processor. The EncodeStructBegin 0 and EncodeStructEnd () routines functionare optimized for parallelism and the detailed explanation of them is provided in section 3.3.EncodeData (pcode)pcodeType *pcode;{EncodeStructBegin (APPLICATION1TAG);EncodeSeq0f1 (&pcode->scores, SEQOFTAG)EncodeStructBegin (NAMETAG);QEncode (OCT, &pcode->name, tag);EncodeStructEnd ();EncodeStructBegin (AGETAG);QEncode (INT, &pcode->age, tag);EncodeStructEnd ();EncodeStructEnd ();}EncodeSeq0f1 (list, tag)LIST *list; tagType tag;{LIST_ITEM *item;EncodeStructBegin (tag);for (item = list->top; item; item^item->next)QEncode (REAL, item->item, REALTAG);EncodeStructEnd ();Figure 16 Generated Encoding Routines for Figure 133.3 Encoders Functions and Data StructuresThe EncodeStructBegin (), EncodeStructEnd () and the QEncode () procedures areexplained in sections 3.3.1-3.3.3. The cost for each of the calls is estimated and compared24to the equivalent call in CASN.1. The determination of the cost of the calls is shown inAppendix B.The information shown in figure 17 is assigned to the processing element when a primitivecall is made. Within the data structure in figure 17 the following are defined:• rtag holds the tag of the value to be encoded and a value that specifies the encodingroutine to be used.• data holds a pointer to the data to be encoded.• idxptr holds a pointer to the idx structure that is updated with the encoded value.• updateptr holds a pointer to an integer that is incremented by the total number of TSbytes generated by the primitive encoding routine. The integer is used for calculatingthe cumulative length in structures.typedef struct egelemType {int^rtag; /* tag & encode routine */char^*data; /* data to be encoded */idxType *idxptr;./* encoded data location */int^*updateptr; /* cumulative length */} egelemType;Figure 17 Encode Queue Element.3.3.1 EncodeStructBegin 0When an EncodeStructBegin call is made the length of the structure cannot be encodedbecause the lengths of the elements in the structure are unknown. There exists a datadependency between the elements of the structure and the length of the structure; hencethe length of the structure can only be calculated after all the elements in the structureare encoded. This data dependency must be maintained somehow and must be generallyexpandable for SEQUENCES, SET, SET OF and SEQUENCE OF. A tree that maintains thedata dependency is constructed by using the data structures presented in figure 18.25typedef struct nodeType {/* called structure node */struct nodeType *parent;struct nodeType *next;int^lent;idxType^*idx; /* idxType same as IDX */tagType *tag;} nodeType;typedef struct levelType {/* called level node */struct levelType *plus;struct levelType *minus;nodeType^*head;nodeType *tail;}Figure 18 Encoding Cumulative Length Data StructuresIn the remainder of this thesis a node of type nodeType is called a structure node andnode of levelType is called a level node. Figure 19 shows the data dependency structureconstructed for the value instance given in figure 13. This data dependency structure isgenerated according to the following rules.Initially the tree consists of:a. one level node known as the root levelb. one structure node at the root levelc. a present level pointer pointing to the root leveld. a deepest level pointer pointing to the root levelWhen an EncodeStructBegin call is made and the next level exists:a. a new structure node is added to the tail of the next levelb. the parent field of the new structure node points to the present levels last nodec. the present level pointer now points to the next leveld. the tag of the EncodeStructBegin call is saved in the new structure node26level+^level+^level+level level- level-head ^head headtail tail^tail tag idxlenparentnexttagidxlenparentnexttag idxlenparentnextV tag idxlenparentnext V tag idx lenparentnextWhen an EncodeStructBegin call is made and the next level is null:a. a level node along with an accompanying structure node is added after the present levelb. the parent field of the new structure node points to the present levels last nodec. the present level pointer now points to the next leveld. the deepest level pointer now points to the next levele. the tag of the EncodeStructBegin call is saved in the new structure nodeThe maximum cost for calling EncodeStructBegin is 35 while the equivalent routinein CASN.1 costs a minimum of 22. The CASN.1 cost is lenient because it assigns theEncodeTag routine within the EncodeStructBegin routine a zero cost.Figure 19 Data Dependency Graph of Figure 13273.3.2 EncodeStructEnd 0When an EncodeStructEnd call occurs the elements within a structure have been assignedto the queue-processor. The length of the structure becomes known when all the elementswithin the structure are encoded. The tag and length of a structure can be encoded at theend of the EncodeStructEnd call or postponed until a later time. The postponed option isused here. The reason for this choice is explained by using the ASN.1 specification shownin figure 20.postponedType ::= SEQUENCE {SEQUENCE {reall^REAL,real2^REAL} ,SEQUENCE {rea13^REAL,real4 REAL}}Figure 20 Postponed ASN.1 TypeIf there are four processors available to encode a value instance of the ASN.1 specificationin figure 20 and the structure lengths are calculated immediately after the EnodeStructEndcall, then only two of the four processors are utilized. Once the values of reall and real2 areassigned to the processors the EncodeStructEnd routine is executed. For the correct lengthof the sequence to be known one must wait until reall and real2 are encoded and the lengthof the structure is updated by them. The same thing happens for real3 and real4. Thusat any time only two of the four processors are utilized. If the length calculations of thestructures are postponed until the end of all the primitive encodings, then reall—real4 canbe encoded simultaneously on four processors. Once all the reals are encoded then the two28sequences that make up the postponedType can encode their tag and length. Finally the tagand length of the postponed-Type is encoded.EncodeStructEnd moves the level pointer to the previous level. If the previous levelis the root level then all the structures, represented by the structure nodes, in the datadependency graph can be assigned to the queue-processor. The structures below the deepestlevel pointer are assigned to the queue-processor first, next the structures of the previous levelare assigned to the queue-processor and this continues until the root level is reached. Afterall the structures of a level are assigned to the queue-processor, the wait for free commandis queued. The command tells the queue-processor to wait for all the processors to becomefree before assigning the next element in the queue. This is necessary because the data inthe previous level is dependent on the lengths calculated by the present level.The cost for calling EncodeStructEnd is 6 for the non-root level and is variable for the rootlevel according to cost = 25* nodes +7 * levels. In the worst case every EncodeStructBegincall creates a structure node and a level node. The cost for assigning this data dependencygraph to the queue-processor is cost = 32* EncodeStructEnd. Hence the maximum cost forthe EncodeStructEnd is proportional to the number of times the EncodeStructEnd is calledand is cost = 38 * EncodeStructEnd. The simplest case of CASN. 1 's equivalent call costsis cost = 40 * EncodeStructEnd.3.3.3 QEncode ()The QEncode call is equivalent in function to the primitive encoding routines in CASN.1.This is where the parallel algorithm does significantly better than CASN.1. The total costof writing this information into the queue is 16 whereas in CASN.1 the cost for a primitiveencoding depends on the primitive.There are three parameters passed to the QEncode (), namely, a value specifying theencoding routine that should be used, the tag for the transfer syntax and the pointer to thedata structure to be encoded. This information is written to the queue—processor along withan allocated idx pointer and a cumulative length pointer. The cumulative length is extracted29from the data dependency structure and is the length field of the node that is at the tail ofthe present level node.3.4 Encoding Flow TraceThe queue information for the value instance of the ASN.1 specification given in figure13 is shown in figure 21. A partial parallel trace of the program is shown in figure 22. Infigure 21 the first element of the queue is at the bottom of the figure. The first five elementsin the queue are for encoding the 3 REALS, the IA5String and the INTEGER. All the non-bold boxed values in the diagram point to queue element structures, one of which is shownto the right in figure 21. The sixth element, wait for free, is a signal to the queue-processornot to assign another element until all the processors are free (i.e. until all the previouslyassigned elements have been encoded). The next elements in the queue are the sequence ofelement, the nametag element, the agetag element and then the wait for free signal. The lastencodable element in the queue is the APPLICATION 1 element which is followed by theend signal that instructs the queue-processor to interrupt the host when all the processors arefree. Sections 3.4.1-3.4.3 explain the actions that the short statements in figure 22 perform.30end signal^Ipointer to APP 1wait for free ^Ipointer to agetagpointer to nametagpointer to seq ofwait for treepointer to INTpointer to IA5pointer to real 3pointer to real 2pointer to real 1queue element contentscumulative lengthtag and routinedata pointeridx pointerFigure 21 Queue with the Encode Example Calls3.4.1 Host ResponsibilitiesWhen the host wants to encode a TS it has the option of either creating the completequeue information and then assigning the work to the queue-processor or assigning the workto the queue-processor as it parses its data structure. In either case, the host puts all theprimitive encoding routines into the FIFO. The host then assigns, to the queue-processor, allthe structures from the deepest level of the data dependency graph followed by the wait forfree signal. It then assigns all the structures from the previous level followed by the waitfor free signal. This process continues until the root level is reached. At the root level thehost writes an end signal into the FIFO to signify that the encoding list is complete. When31O■-■al)C"00CC-00C44O.• • • •► • • •.o 1:7:1 9a.)-o- 8 -=4.) t4.-1,- a c a o,C=^0,)Fti•^.8 •3 ¢ 3 ¢ 3r.nCa)CYN9 9 9 -Q^0^0 "71-4.) II...'0 0 b M "0 M8^8 - 8 1==•• 8(-1^C t".4Q•^a.) O U G. t.>•••••• f4i P.' ,^• • •fa.• C 4 C N Ctv) .^ 44^01)^0.0 7.1•R '7, - 8 - 7,v•-•^s"S 2a) a) 4.>^0 C'0 '0 '0 0 "0 ,■-■• 8 8 8 8 8 .•A' 8=000O 0 a.) a) o• 4.) (1) 4.)-c • -c -c3 3 3 3 3 3 3•• •°^_sea.) A L ■-■2 o ."S °0•• •72)2rA04cn 2"0 a.)r.4^•-• • • •c▪ .32the queue-processor encounters this end signal and all the processors are free it sends aninterrupt to the host to signal the completion of the encoding.The following describes the actions performed in the flow diagram of figure 22.• Write (buffptr) write the starting address of the memory pool that is available for theencoding routines into buffptr. This is the first action that the host performs and is notshown on the diagram.• Write (encode call i) write into the queue-processor any of the possible primitive encodecalls.• Write (wait for free) write to the queue-processor a synchronization symbol that signalsthat all the processors must have a free status before the other elements in the queueare assigned to the processors.• Write (End_Signal) write to the queue-processor a symbol that signals the end of aencoding session.• Accept Finished accept that the encoding is finished. One possible action is the releaseof the unused buffer space that is pointed to by buffptr.3.4.2 Queue-ProcessorDuring encoding, the queue-processor reads elements inserted into the queue by the hostand assigns the work to one of the n processors. Whenever the queue-processor encountersthe wait for free signal it waits for all the processors to become free. When the queue-processor encounters an end signal it stops assigning work. After the processors havefinished their encoding the queue-processor signals to the host that the encoding is complete.The following describes the algorithm that the queue-processor performs.1. Read an element from the queue. If the queue is empty, then block.2. If the element is a wait for free then:Waitlist (pl,p2,...,pn) block until all the encoding processors have finished beforeassigning the next work.33else if the element is a End_Signal then.Waitlist (pl,p2,...,pn) block until all the encoding processors have finished beforeinterrupting the host.Interrupt Host signal the host of the encoding completion and wait for the ac-knowledge.else the element is a encode call iAssign (px, encode call i) assign a processor x to encode a specific call. If all theprocessors are busy, block and wait for one to become free.3 Start over at step ProcessorsA processor blocks until it is assigned a primitive encode call by the queue-subsystem.The processor then encodes the primitive call and upon completion informs the queue-subsystem that it is free. The following explains the actions that the processor executes:1. Start Encoding receives work from the queue processor and starts encoding the primitive.2. End Encoding finishes the encoding and signals the queue-processor of its free avail-ability status.3. WaitAssign (encode call i) waits for the queue-processor to assign a primitive call.WaitAssign is not drawn on the diagram for clarity but would occur in the space beforethe Start Encoding and after the End Encoding.344 DecodingThe decoding chapter follows the same format as the encoding chapter. Section 4.1discusses the data structure requirements of the application layer and session layer. Section4.2 explains how the decoding is achieved for ASN.1 specifications. Section 4.3 presentsthe requirements of the software that is needed to generate the decoder. Finally section 4.4presents a flow trace of the parallel algorithm.4.1 Decoding Interface Data StructuresPrior to decoding, the session layer provides a continuous memory block which holdsthe TS to be decoded. When the decoding is complete the result is made available to theapplication layer in a data structure created by using the mapping rules given in table 12.4.2 Decoder Generation from ASN.1 SpecificationFor an optimal parallel decoding strategy a type tree needs to be generated for the ASN.1specification. Generation of such a type tree is outlined in [13][1611231 The type tree isgenerated off-line and loaded when an instance of a transfer syntax needs to be decoded.To decode a received TS the application needs to write to the parallel ASN.1 decoder thestarting address of the TS, the finishing address of the TS, the address of the type tree to beused for decoding this TS instance and the memory location to store the decoded result ora pointer to the decoded result. The data structure used for communicating this informationfrom the host to the parallel decoder is shown in figure 23. The contents and usage of thetype tree are discussed in section 4.3.35typedef struct qdType {nodeType *tt; /* pointer to a node of the type tree */char^*tsptr; /* pointer to the start of the TS */char^*update; /* mem. location to be updated or used * /char^*tsend; /* pointer to the end of the TS */Figure 23 Decoding Queue-Element.4.3 Decoders Functions and Data StructuresFor a functional decoder to operate it is necessary to construct a type tree and thetraversing routine for this type tree. The rules for generating the type tree are presentedin this section and the traversing routines are explained in section 4.4. The type tree isconstructed of nodes that have the data structure as shown in figure 24.typedef struct nodeType {int^tag;int meminst;struct nodeType *option;struct nodeType *listptr;struct nodeType *next;} nodeType;Figure 24 Node Data Structure of Type TreeThe tag field holds the tag that is to be found in the transfer syntax and a value whichindicates the decoding routine to be used. The meminst field is discussed in the nextparagraph. The option field is used for elements in a SET. The listptr field is used forSEQUENCE/SET and OF types. The next field is used for pointing to the next elementexpected.The memisnt field is made up of three sub-fields; the allocation field, offset field and abit marker field. The allocation field specifies the amount of memory to allocate at this node.The offset field specifies at which offset within a memory location the result of the decodeis stored. The bit marker field contains an UPDATE bit, an OF bit and a CONSTRUCTOR36bit. The UPDATE bit specifies whether the present node needs to allocate memory for thedecoded output or update a memory pointer passed to it. The OF bit is set whenever thepresent node corresponds to a SEQUENCE OF or SET OF. The CONSTRUCTOR bit shouldbe set whenever the present node corresponds to a SEQUENCE or SET tag.The rules stated in sections 4.3.1 to 4.3.3 are used when generating nodes of the typetree. Two example type trees are shown in figure 25. Notice that the set type tree in thisfigure occupies more memory than the sequence type tree. The memory requirements ofthe set type tree can be reduced by adding a level of indirection; but by doing so, the totalnumber of memory references increase which slows down the overall speed of the system.This essentially becomes a speed/memory trade off.4.3.1 Node for Primitive TypesThe tag field holds the. tag of the primitive and a value which indicates the decodingroutine to be used. The option pointer is set only if the primitive is an element of a SET.The listptr is always NULL and the next node pointer points to the next node expected.Within the meminst field the allocate sub-field is zero. The offset sub-field holds theoffset location used to store the result. The UPDATE, CONSTRUCTOR and OF bit fieldsare not set.4.3.2 Node for SEQUENCE and SET TypesThe tag field holds the tag of the SEQUENCE/SET and a value which indicates thedecoding routine to be used. The option pointer is non NULL if the SEQUENCE/SET isan element within a SET. The listptr pointer points to the first element expected withinthe SEQUENCE/SET. The next pointer points to the first element expected after the SE-QUENCE/SET.Within the meminst field the allocate sub-field is zero. The offset sub-field is zero. TheCONSTRUCTOR bit is set. The OF bit is not set. The UPDATE bit is set only on theroot node of the tree.374.3.3 Node for SEQUENCE OF and SET OF TypesThe tag field holds the tag of the SEQUENCE OF/SET OF and a value which indicatesthe decoding routine to be used. The option pointer is non NULL when the SEQUENCEOF/SET OF is an element of a SET. The list pointer, points to the first element expectedwithin the SEQUENCE OF/SET OF. The next pointer points to the first element expectedafter the SEQUENCE OF/SET OF.Within the meminst field the allocate sub-field is set to the size of the SEQUENCEOF/SET OF structure. The offset sub-field holds the offset location of a LIST structure. TheCONSTRUCTOR bit is not set. The OF bit is set. The UPDATE bit is set only on theroot node of the tree.38codel ::= SEQUENCE {score REAL,name IA5String,age INTEGERcode I Type TreeSeq Tagmeminstoptionlistptr --sw Real Tagmeminstnextcode2 ::= SET {score REAL,name IA5String,age INTEGERcode2 Type TreeSeq TagmeminstoptionlistptrnextoptionlistptrnextIA5 TagmeminstoptionlistptrnextTNT Tagmeminstoptionlistptrnextmeminstoptionlistptrmeminstoptionlistptrmeminstoptionlistptrmeminstoptionlistptroptionlistptrmeminstoptionmeminstoptionlistptrmeminst meminstmeminstmeminstoption optionoptionlistptroptionlistptr listptroptionlistptrFigure 25 Two Decode Type Trees Comparing SET and SEQUENCE4.4 Decoding Flow TraceTo describe flow control of the decoding algorithm we use the transfer syntax in figure26. This transfer syntax is the value instance of the ASN.1 specification given in figure39Of TagmeminstoptionlistptrnextSeq Tagmeminstoption listptrnextlistptrnext2listptrnextINT TagmeminstoptionlistptrnextReal TagmeminstoptionlistptrnextIA5 Tagmeminstoption listptr next••13 of chapter 3. The decoding type tree generated for this ASN.1 specification is shownin figure 27.tag len tag len tag len tag len tag len tag len tag len tag len tag len70.12 I-^I 80.34 90.56 John f 150Al^seq of real^real real^IA5^2^INTFigure 26 Transfer Syntax of Figure 13Figure 27 Decode Type Tree of Figure 13For the decoding to start a decoding queue-element must be written into the queue by thehost. The queue—processor then assigns this element to a free processor and continues todo so whenever an element is available in the queue. The queue—processor stops assigning40values to decode when it sees a special end marker, at which time it signals the host thatdecoding has finished. Figure 29 shows the graphical representation of the control flow thatoccurs when the TS of figure 26 is decoded. Sections 4.4.1 to 4.4.3 explain the actions of theshort statements in figure 29. For clarity purposes the control flow figure does not includeall the signals that are transmitted in the system.4.4.1 Host ResponsibilitiesWhen a host wants to decode a TS it sets up the free buffer pointer and then writes a de-coding queue-element into the queue-processor. The host then waits for the queue—processorto generate an interrupt upon completion of the decoding.The following describes the actions that are performed on the flow diagram of figure 29.• Setup (buffptr) set up a pointer to the available free memory that the processor can use.• Write (decode call 1) write the first decode queue element into the queue-processor.• Acknowledge Interrupt acknowledge the interrupt of the queue-processor.4.4.2 Queue-ProcessorThe algorithm that the queue-processor performs is shown in figure 28. The queue-processor reads elements inserted into the queue and assigns the non-command element toone of the free processors. The functions of the command elements such as the wait forfree, add control and end signal are discussed next. When the queue-processor sees await for free signal it waits for all the processors to become free before it assigns anyother work. When the queue-processor sees an add control signal it increments the controlcounter that is initially zero: When the queue-processor sees an end signal it decrements thecontrol counter. If the control counter is zero when the queue-processor sees an end signalit interrupts the host and waits for an acknowledgment. Note that the functions that thequeue-processor performs for encoding are a subset of the functions described here; hence,the same queue-processor is used for both encoding and decoding.41est•I.)042Figure 28 Algorithm of Queue-Processor434.4.3 ProcessorsThe decoding algorithm used by the processors is depicted in figure 30. A processorsignals the queue-processor that it is free and then it waits for an address of a decode queueelement to be written to its I/O mapped location. Henceforth whenever a processor assignswork this means it constructs a queue element and then writes a pointer to this informationinto the queue. Also the queue element pointer that the processor receives is referred to asrec and the pointer that the processor assigns to the queue processor is referred to as q.The rest of this section explains all the actions shown in figure 30.44-calc next addr-queue add control-queue next-caic sub-struct addr-queue sub-struct-caic sub-structs addrs-queue all sub-structswith add controls-find correct node-calc next addr-queue next addrFigure 30 Decoding Algorithm for Processorstsptr = tsend If tsptr = tsend then the processor is at the end of the decoding segment andhas to signal the queue-processor that one control trace of the algorithm has ended.45if (rec->tsptr .= rec->tsend)write end signal to queue-processor;UPDATE bit set The UPDATE bit is set only for the first node in the type tree. This signalsthat the amount of memory specified in the allocation sub-field of meminst (figure 24) shouldbe allocated and that the queue element update pointer should be updated with the address ofthe allocated memory. The following C like pseudo code segment summarizes these actions:q->tt = rec->tt->next;q->tsptr = rec->tsptr +length in length field of (rec->tsptr);q->update = allocate memory;q->tsend = rec->tsend;*q->update = allocate memory;find correct node When the processor is not at the end of a control trace it checks to see itis at the correct type tree node. If the nodes option pointer is not NULL, it then decodes theTS tag and determines if the tag at the present node matches this TS tag. If the TS tag andtag at the node are not the same it follows the option pointer to the optional node and thencompares the TS tag to the tag at that node. The processor continues following the optionpointer until it finds a matching tag. If the processor does not find a matching tag a signalis generated signifying that there is an error in the incoming transfer syntax.Note that the elements within a SEQUENCE always have a NULL option pointer butfor elements within a SET the option pointer can be non NULL. This means that the tag ofan element in a SEQUENCE need not be decoded before the next element is assigned tothe queue-processor; but for elements within a SET, the tag has to be decoded and matchedbefore the next element is assigned to the queue-processor. This will make SEQUENCEdecoding faster than SET decoding. The following C like pseudo code segment summarizesthese actions:46tt = rec->tt;while (tt != NULL) {if (tt->tag == tag of (rec->tsptr) break;if (tt->option == NULL) error; else tt = tt - >option;if (tt->option == NULL) error;CONSTRUCTOR bit set If the CONSTRUCTOR bit at the present node of type treeis set then the find correct node and UPDATE bit set operations are performed. TheCONSTRUCTOR bit set means that we are at the start of a SEQUENCE/SET. At this pointthe processor calculates the end of the SEQUENCE/SET which is the starting location of thenext element expected. It then informs the queue that one more control thread is being addedand assigns, to the queue-processor, the element expected after this sequence. The processorthen calculates the starting location of the first element of this sequence and assigns thiselement to the queue-processor. Finally if the option pointer is NULL the processor checksto see if the TS tag matches the present nodes tag. The following C like pseudo code segmentsummarizes these actions. Note that tt value used below is defined in find correct node.ql->tt^tt->next;ql->tsptr^rec->tsptr +size of tag and len of (rec->tsptr) +length in length field of (rec->tsptr);ql->update^rec->update;ql->tsend^rec->tsend;write ql to queue-processor;write add control to queue-processor;q2->tt^tt->listptr;q2->tsptr^rec->tsptr +size of tag and len of (rec->tsptr);q2->update = rec->update;q2->tsend^rec->tsptr +size of tag and len of (rec->tsptr) +length in length field of (rec - >tsptr);write q2 to queue-processor;match tt->tag and tag of rec->tsptr;47OF bit set If the OF bit of the present node of the type tree is set then the find correctnode and UPDATE bit set operations are performed. The OF bit set means that we areat the start of a SEQUENCE OF/SET OF. At this point the processor calculates the endof the SEQUENCE OF/SET OF which is the starting location of the next element. I thenassigns this element to the queue-processor. Next the processor scans through the elementsof the SEQUENCE OF/SET OF performing only length calculations, to arrive at the nextSEQUENCE OF/SET OF element. At each SEQUENCE OF/SET OF element the processorassigns the element to the queue-processor and informs the queue-processor to add one tread.Finally, if the option pointer for this OF structure is NULL the processor checks to see ifthe TS tag matches the tag of the present node. The following C like pseudo code segmentsummarizes these actions. Note that tt value used below is defined in find correct node.q->tt^tt->next;q->tsptr = rec->tsptr +size of tag and len of (rec->tsptr) +length in length field of (rec->tsptr);q->update = rec->update;q->tsend = rec->tsptr +size of tag and len of (rec->tsptr);write q to queue-processor;endptr = rec->tsptr +size of tag and len of (rec->tsptr) +length in length field of (rec->tsptr);q->tt = tt->listptr;q->tsptr = rec->tsptr +size of tag and len of (rec->tsptr);while (endptr != q->tsptr) {q->tsend = q->tsptr + length in length field of (q->tsptr);q->update = allocate memory give in sub-file of tt->meminst;write q to queue-processor;q->tsptr = q->tsend;match tt->tag and tag of rec->tsptr;primitive If neither the OF bit nor the CONSTRUCTOR bit is set then this is a primitivedecode element and the first action is to assign out the next element to be decoded. The48primitive element is then decoded and stored at the update pointer location adjusted withthe offset quantity in meminst. The following C like pseudo code segment summarizes theseactions. Note that tt value used below gets defined in find correct node.q->tt = tt->next;q->tsptr = rec->tsptr +size of tag and len of (rec->tsptr) +length in length field of (rec->tsptr);q->update = rec->update;q->tsend = rec->tsend;write q to queue-processor;decode the primitive;49ArbiterADaisy Chain^ g3ing2in g4inglin PE2 PE3PE1 • • •Vgnin PEngnin PEn• • •glin g2in g3inPE 1 PE2 g4inPE35 Hardware Models The algorithms suggested in the previous sections were simulated on the parallel hardwarepresented in figure 8. The elements in this hardware are specified at the behavior level usingVHDL[1][9]. Sections 5.1-5.3 describe the operations of each of the components.5.1 Bus Arbitration MethodA rotating daisy chain (RDC) is chosen for the bus arbitration method. A daisy chainand a rotating daisy chain bus arbitration setup are shown in figure 31. A brief outline ofboth the arbitration schemes is provided in the next paragraph. A RDC is chosen for thebus arbitration method because this is a distributed method that is scales easily. The RDCmethod also offers the best bus arbitration scheme for the amount of hardware needed forimplementation[2]. Ref.[21] presents various implementation methods for the RDC alongwith their relative advantages.bus request lineRotating Daisy Chainbus request lineFigure 31 Daisy Chain and Rotating Daisy ChainIn the daisy chain bus .arbitration scheme shown in figure 31 any of the n processingelements (PE) can request the bus by raising the bus request line. The arbiter will then50raise the glin grant signal. The glin signal allows PE1 access to the bus; but if PE1 doesnot require the bus, it will propagate the glin signal to g2in. The grant signal can propagatethrough to any of the n PEs that request the bus. This bus arbitration scheme is unfair asPEI has the highest priority because whenever it requests the bus it will always get the glingrant signal. In this system, PEn has the lowest priority because when PEn requests thebus it will get the grant if none of the previous (n-1) PEs require the bus. At a processorPEi the processors to the right of PEi have lower priorities and processors to the left of PEihave higher priorities.The RDC initially starts with one PE responsible for generating a bus grant signal. ThisPE has the lowest priority while the first PE to the right has highest priority. The PE thatgenerates the grant signal is the arbiter. When a PE requests the bus and receives the busgrant signal this PE then assumes lowest priority and becomes responsible for generating thebus grant signal for the next arbitration. This moving of the bus granting capabilities fromone PE to another maintains an even priority system[2]. Another advantage of using theRDC method is the relative ease with which other systems can connect into the RDC. Whenanother system wants to share the bus with the elements on the bus, it simply includes itselfin the RDC. The overhead associated with including a arbiter in every PE is low. Figure 32shows that the hardware requirements are minimal.Figure 32 shows a three PE RDC implementation. Each processing element has a onebit bus grant register and a one bit bus request register. The operation of this three PE RDCis explained by using the timing diagram of figure 33 on page 54. The signals /B_RQ<1-3>are bus request signals generated by the PEs. The /req<1-3> are the clocked /B_RQ<1-3>signals and /t<1-3> are the grant signals that the PEs use. When the system initializes,only one bus grant register is set and this is signal in the timing diagram. Now, itl isresponsible for generating the bus grant upon initialization. When one PE receives a busgrant from another, this sets its bus grant register and now it is responsible for generatingthe next bus grant signal. The next action in the timing diagram occurs when PE 3 requests51for the bus by raising the /B_RQ3 line which in turn gets clocked and raises /req3. Finally/t3 rises signaling that PE 3 now owns the bus. PE 3 is now responsible for generating thenext bus grant and so PE 1 is released of this duty. The timing diagram of figure 33 containssome interesting scenario such as two PEs requesting the bus at the same time. The readeris free to explore the timing diagram and no further explanation is provided.52gdig -&ggggg.;a1.1aIM---> MEMil1ratLtai.....Igi ,iNI^NIgkflEg.nhd .,'^mm e.-->NMI_t WM.^.Ili„g t. §1t1'41•F'.._ —ggkE,gi._._> El111,.._,..-1k•,^.5 _._._c>-^a 1^- III iLlbIFigure 32 Partial Diagram of RDC Arbitration53Figure 33 RDC Timing Diagram545.2 ProcessorThe proposed RISC processor shown in figure 34 is used for the processing elements (PE)in the parallel architecture. The architecture is similar to the Motorola MC88100[10] andthe Sun SPARC[18] architecture but the instruction set is optimized for encoding/decoding.The processing element is a dual bus 32—bit processor with a 32—bit fixed length instructionset. A 32—bit processor is chosen because memory management and mapping is compatiblewith contemporary 32—bit address space systems. A 32—bit fixed length instruction format ischosen to make the instruction decoder simpler and to allow an instruction to be fetched permemory read. This processor can exploit super-scalar parallelism because the integer unit,the data unit and the instruction unit are designed to operate simultaneously.The core processor in figure 34 has a smaller instruction set than the Motorola MC88100or the Sun SPARC and it has some special instructions that are not available on either. Thesespecial instructions are used to implement atomic memory read/write cycles and to accesslocal memory. These special instructions make this processor more suited for running theparallel algorithm described above than either the Motorola MC88100 or the Sun SPARC.55PROCESSING ELEMENTCORE PROCESSORInteger RegisterUnit Filet tInternal Busest tData InstructionUnit UnitA •• lo• 1! •LocalMemoryInstructionMemoryRDCUnit7^iShared BusFigure 34 Processor Block DiagramSections 5.2.1 to 5.2.4 describe the main units in the block diagram and discuss theinstruction set of the PE which is presented in table 7. Table 6 shows all input/output signalsof the PE and provides a brief explanation of each.5.2.1 Integer UnitThe integer unit executes the first 7 instructions in table 7. The unit executes all thearithmetic and logical operations in one clock cycle. The integer unit works only in a registerto register mode. There are no complex instructions for modifying memory locations.565.2.2 Data UnitLoading from and storing to memory is performed by this unit. Depending on theload/store operation that is being executed, the data unit sets an internal line that activatesthe local memory or the RDC element of section 5.1. A typical memory read and writetiming diagram is given in Appendix D.5.2.3 Instruction UnitThe instruction unit does memory reads which follow the read timing diagram depictedin Appendix D. The instruction unit implements a two stage pipeline with delayed branching.This allows an instruction to be executed in the slot immediately after a branch operationso that the cost associated with branching can be reduced by compiler writers and assemblylanguage programmers.5.2.4 Register FileThe register file consists of a 32 32—bit registers. Not all of these registers are availablefor general programming use. The following text explains the functions of the specialregisters.Reg(0) = Zero Register Use of register 0 as a zero register is a hardware convention.Register zero is wired to zero by the hardware. Programs are allowed to write into registerzero, but this has no effect on its contents.Reg(26) = Frame Pointer Use of register 26 for the frame pointer is a software convention.The frame pointer points to the base address of the stack frame of each procedure.Reg(27) = Stack Pointer Use of register 27 for the stack pointer is a software convention.The stack pointer points to the present location in the stack.57Reg(28) = Return Address Use of register 28 for the return address is a hardware con-vention. Whenever a branch to a subroutine call is made the return address is stored intoregister 28.Reg(29) = Queue Address Use of register 29 for the queue address is a software conven-tion. The queue address is the I/O mapped address of the queue-processor.Reg(30) = Processor Address Use of register 30 for the processor address is a softwareconvention. The I/O mapped address of the processor is stored in this location. When aprocessor writes to its own I/O mapped location this means that it is free for work assignment.Reg(31) = Wait Result Use of register 31 for the wait instruction's result is a hardwareconvention. When the processor executes the wait instruction, the information that is writtento the processors I/O mapped location is stored in register 31.Signal Functional DescriptionA<31..0> Address lines.D<31..0> Data lines.READ Signals memory read.WRITE Signals memory write.READY Signals a completion of a memory access.LOCAL Signals a write to the local memory.ATOMIC Signals that a atomic read, increment, write is taking place.BUSRQ Bus request line.BUSGTIN Bus grant in line.BUSGTOUT Bus grant out line.RESET The asynchronous reset line.CLK1 Phase one of the clocking signal.CLK2 Phase two of the clocking signal.Table 6 Processing Elements Signals58Instruction Description Of Arithmethic And Logic Instructionsshift left Shifts a register left.shift right Shifts a register right.add Adds two registers or a register and an immediate.subtract Subtracts between two registers or a register and an immediate.XOR  Bitwise XOR of two registers.AND Bitwise AND of two registers.OR Bitwise OR of two registers.Instruction Descriptions Of Control Flow Instructionsbrig Branch to an address specified by a register and an immediate.brsr Same as brig but the return address is saved.wait Blocks until a memory write is executed on the memory mappedlocation of the processor.Instruction Description Data Load / Store Instructionslocal load Loads to a register from the local memory.local store Stores to the local memory the contents of a register.global load Loads to a register from a global memory.global store Stores to the global memory the contents of a register.atomicincrementLoads a register from memory, increments it and stores the result backinto memory without releasing the shared bus.load high Loads to a register's high bits the immediate constant.Table 7 Instructions Executable On a Processing Element.5.3 Queue-ProcessorThe queue-processor maintains the availability status of the processors in the parallelsystem and distributes the work load accordingly. Figure 35 shows the block diagram ofthe queue-processor. The Queue Memory Manager is the controller of the queue-processorand implements the algorithm presented in figure 28. The block structure of the Queue59Information block is shown in figure 36 and the block structure of the Processor StateInformation block is shown in figure 37.Queue Processor QueueInformationAQueueMemoryManagerAProcessorStateInformation.41r--II■ .0111--•RDCUnita•Shared BusFigure 35 queue-processorWhenever a word of information is written to the I/O mapped address of the queue-processor this information is actually written into the queue memory location pointed to bythe first queue element pointer (figure 36). The first queue element pointer is then incrementedso that it is ready for the next write to the I/O mapped location of the queue-processor. Whena word of information is assigned to the queue, the information is read from the locationthat the last queue element pointer is pointing to. Again the last queue element pointer getsincremented after this operation is performed. The first and last queue element pointers areimplemented using a counter that gets incremented whenever information is written to theI/O mapped location of the queue.602AnA First^Element..-^I^QueueLast^ElementQueue.4 ►Words otn bitsQueue AddressFigure 36 Queue InformationWhen the queue-processor sees that there is an element in the queue and that any oneof the processing elements is free, it initiates a write to the I/O mapped location of the freeprocessor. This increments the last element pointer of the queue (figure 36) which signifiesthat the job is assigned to a processor. After the processor finishes the job it signals thequeue-processor of its availability status. The processor signals that it is free by writingany value into its I/O mapped address. Figure 37 shows a simplified diagram of how thequeue-processor and four PEs can maintain correct processor availability states. In figure37, comp is a comparator and the other components are explained below. Correct processoravailability states are maintained by the queue-processor. It maintains a toggle bit (p<1-4>state's in figure 37) for every processor that it assigns work to. This bit toggles on each writeto the I/O mapped location of a processor (p<l-4> addr in figure 37). Initially all bits areset to 1 signifying that all processors are busy (p<1-4> = 1) and upon start-up each processorwrites to its I/O mapped location which signals that it is free. When there is work for aprocessor to do, the queue-processor writes the information to any of the free processors.Suppose p4 gets the work written to it (ie w = 1 and AddrBus = p4 addr). This changesp4's state in the queue-processor to 1, which signifies that processor 4 is busy. Once theprocessor finishes its work, it writes any information to its I/O mapped location (ie w = 161^p I state<^p2 state<Al7N- D^AN\D^w 1^i wcomp comp3 stated^p4 statedA AANDQueueProcessorw —1 I compp4^ addrilcomppl addr p2 addr p3^ addrhAddrBuspl addrcomp1ANDp I state<Procesor 1p2 addrp2 statProcessor 2p3 addrp3 statProcessor 3p4 addrI compw_ ___..1ANDp4 statProcessor 4comp1ANDcomp___IANDand AddrBus = p4 addr) which resets p4's state in the queue-processor and signals thatthis processor is free. The queue-subsystem can then assign this processor new work.Figure 37 Processor State Information626 Simulations This section describes how the parallel encoding/decoding algorithms were developedusing Unix System V IPC calls and then simulated on the parallel VHDL model. Figure 38shows the setup of all the software used to achieve the results presented in chapter 7. Sections6.1-6.5 provides details on each of the modules in figure 38 and section 6.6 explains thecomplete simulation cycle. Section 6.1 discusses the tools used to generate the algorithm.Section 6.2 discusses the tool used to create the memory images that are used in the VHDLsimulation. Section 6.3 specifies the features of the assembler used to create the code for theVHDL simulation. Section 6.4 explains how the VHDL model uses the information that isgenerated for it. Section 6.5 explains the utilities used throughout the simulation process.Figure 38 Software used in Simulation6.1 Simulation with System V IPCSystem V IPC calls are used because they allow shared memory and message queuesto be set up in a similar manner to the hardware configuration. During this project, fourdifferent algorithms were simulated on a Unix based machine using System V IPC calls.The final algorithm is the best of the four and it was presented in the previous sections ofthis thesis. The setup of the IPC communication software had to resemble the final hardwarearchitecture so that porting of code to the VHDL model would not require redefining thealgorithm. With this restriction in mind, the software was setup as shown in figure 39.63••Processor 3System V IPCMessage QueuesControllerSemaphores 4 e'd*QueueShared Memory Processor 1Figure 39 Development of Algorithm using Unix and System V IPCsThe functionality of the controller, the queue and processor is mapped to the hardwareequivalents of the host, the queue-processor and the processors. The System V IPC codeis designed so that the modules containing information to be reused in the memory mapperare in separate files. Data Base 1 consists of the System V IPC code. For encoding,the information that generates the encoding queue elements is kept in a separate file. Theencoding information is the same information shown in figure 16 of chapter 3. For decoding,the information that generates the type tree is also kept in a separate file.6.2 Memory MapperThe memory mapper takes its input from Data Base 1 and generates the initial memoryimages of the shared memory and the queue-processor. The encoding memory mappergenerates the shared memory image that holds the following: the data structure to be encoded,the queue elements for this data structure and the pointer to the available free memory. Italso sets up the queue-processor with the appropriate pointers to the shared memory queue64elements. The decoding memory mapper generates the shared memory image that holds thefollowing: the transfer syntax, the decode type tree, the first queue element and the pointerto the available free memory. It also sets up the queue-processor with the appropriate pointerto the first queue element. The shared memory images generated are written to Data Base 2.6.3 AssemblerThe assembler reads in the text file to be assembled and generates an output file that isa text file with binary numbers. This pseudo binary file is the input to the processors in theVHDL model. Data Base 3 holds all the assembler code and the output of the assembledcode goes into Data Base 4. The assembler provides these three basic functions:1. Converts text description to opcodes.2. Allows labels and calculate the destination address of branch instructions.3. Constructs multiple simple instructions from a single assembler instruction.The assembler does not flag potential errors that occur from the delayed branch operation.This is a programmer's responsibility. Figure 40 shows examples of an assembler text file.{ Contents:^This file is a sample assembler code. }beginaddq^rl , r0 , 5 ;^{ load 5 into register 1 }loop :^subq^rl , rl, 1;^{ decrement register 1 }briqz^stop;^{ branche on zero to stop label }add^r0 , r0 , r0 ;^{ no op }brig^loop;^{ branches to loop label }add^rO,r0,r0;^{ no op }stop:^halt; { stop processing }end.Figure 40 Example Assembler Code65Comments are allowed anywhere within the assembler code provided they are surroundedby 44 )1 . The above code sets register one equal to five, decrements it until it is zero, andthen halts the processor. Briqz is a branch on zero to the label. The add instruction afterthe branch instructions is present for a delayed branch error not occur. The rest of this codeshould be self explanatory.6.4 VHDL ModelUpon starting up, the VHDL model loads the processors with their binary program fromData Base 4. The VHDL model also loads the shared memory image and the queue memoryimage from Data Base 3. The model then runs the simulation until completion, terminates,and produces its output in Data Base 5.6.5 UtilitiesSome utilities were constructed to aid in the debugging process and to analyze the outputof the VHDL simulation. The following is a list of them:• mm2hex The VHDL simulator deals with text files of binary numbers. This utilityconverts these pseudo binary files to hexadecimal so that it is easier to read.• mm2idx This program takes the memory dump of an encoding run and produces thecontents of the idx structure. The task of verifying the encoded result is easier withthis tool.• bin2hex When the VHDL model is put into debug mode it produces information oneach instruction executed such as the contents of the instruction register, the PC andvarious other registers. This program takes this pseudo binary file and converts it to aneater hexadecimal format.combine Interleaves binary and hex files so that either the binary or hex value can beconsulted depending on which makes interpretation easier.666.6 Description of a Complete Simulation CycleThe steps required to complete a simulation cycle and achieve the results presented inchapter 7 are explained in this section.The steps required to generate a simulation result for an encoding are:1. Choose an ASN.1 specification and a value instance.2. Construct the C data types, place the value instance in the C structures and generate theCASN.1 like structured encoding routines.3. Compile the data from list item 2 with the System V IPC code. Since the data typesand the structure encoding routines are hand generated they may have errors in them.Finding these errors by running the System V simulation is easier than looking throughthe pseudo binary files and hex files generated by the VHDL simulations.4. Use the memory mapper software to create the memory images for the encode.5. Construct the assembler code needed by the VHDL processors for encoding.6. Run the simulation of the VHDL model using the data generated by 4 and 5.7. Verify encoding results using the utilities discussed in 6.5 and gather simulation data.The steps required to generate a simulation result for a decoding are:1. Choose an ASN.1 specification and a value instance.2. Construct a decode type tree in C.3. Run the encoding simulation and use the mm2idx utility of section 6.5 to extract thetransfer syntax of the encoding routine.4. Compile the data from list item 2 with the System V IPC code. Since the decode typetree is hand generated there may be errors present in the type tree. Finding these errorsby running the System V simulation is easier than looking through the pseudo binaryfiles and hex files generated by the VHDL simulations.5. Use the memory mapper software to create the memory images for the decode.6. Construct the code needed by the VHDL processor for decoding.677. Run the simulation of the VHDL model using the data generated by 5 and 6.8. Verify decoding result using the utilities discussed in 6.5 and gather simulation data.687 Results The initial choice of the tests to simulate the parallel algorithm raised many questions,some of which are discussed in the following sections. Section 7.1 discusses the need forstandard test vectors for ASN.1 encoders/decoders and section 7.2 presents the statistics ofthe simulation.7.1 Need for Standard Test VectorsAt present no standard tests are used when encoding/decoding speeds are quoted.Comparing results is difficult because test vectors can be chosen to bring out the optimalperformance of specific encoders/decoders. Two possible pathological cases are:• In VASN.1 strings are not copied from the incoming TS to the local data structure. Insteada pointer in the local representation is set to the data. An ASN.1 complier like CASN.1copies the strings. If we are dealing with 1 Kbyte per string then the performance ofVASN.1 is significantly faster.• Suppose we assume an ASN.1 type is defined as one string and that the string octetshad to be copied from the incoming TS to the local data structure. If we comparedCASN.1 to the parallel algorithm presented in this thesis, the speeds of the two wouldbe approximately equal. When strings of information must be copied, with little or noprocessing, the access to memory becomes the bottleneck.7.2 Simulation StatisticsIt is impossible to compare the parallel algorithm presented in this thesis and the algorithmin VASN.1[3]. The result in [3] presents the total number of instructions executed for anunknown abstract syntax and value instance. The reconstruction of the results of VASN.1would require acquisition of the hardware models and re-simulation with a known abstractsyntax and value instance. Since the reconstruction is not possible, the parallel algorithm iscompared to the code generated by UBC's ASN.1 compiler.69The processors in the parallel algorithm were simulated at a speed of 25 MHz and usedthe instructions per byte count shown in Appendix F. The results are summarized below.ASN.1- Personnel Rec. Ten Reals Ten IntegersUnits-4 Mbits/s Mbits/s Mbits/s# Proc. Oper.1 Encode Decode Encode Decode Encode Decode1 noop 34.3 '^8.6 99.5 24.2 26.1 6.32 noop 59.6 15.5 167.5 29.1 43.9 7.610 noop 101.5 22.6 208.2 28.8 54.6 7.61 copy 13.7 7.2 40.8 21.0 11.9 5.82 copy 23.9 13.2 65.4 28.4 19.2 7.610 copy 40.0 20.3 80.2 28.1 24.0 7.51 50% 6.7 2.5 13.9 3.4 4.1 2.02 50% 12.1 4.94 25.5  6.6 7.4 3.710 50%  28.1 15.1 61.6  16.9 16.5 6.051 100% 4.5 1.6 8.3 1.8 2.5 1.22 100% 8.4 3.2 15.6 3.6 4.7 2.310 100% 21.4 10.9 45.9 12.0 12.5 5.01^_ CASN .253 .242 - - - -Table 8 Simulation Results in Mbit/s.The first row of table 8 shows the three different ASN.1 types simulated with the parallelalgorithm. Column one shows the number of processors the algorithm is simulated with.Column two shows the actions that the processors execute. These actions and the analysisof their results are as follows:• noop When a primitive call is made no operations are executed. The simulation providesthe upper limit on the throughput of the algorithm and the overhead associated with the70algorithm.• copy When a primitive call is made, the contents in the data structure of the applicationare copied to the TS for encoding and vice-versa for decoding. The throughputs forthe noop simulations are higher than the throughputs for the copy simulations due tothe shared bus contention caused by the copying of the IA5String types. The IA5Stringtypes require almost no processing and must be copied over the shared bus.• 50% When a primitive call is made, the copy operation is executed and as well as 50%of the instruction/byte count generated by the serial program. This simulation providesa middle ground result in relation to the copy and the 100% option specified below.• 100% When a primitive call is made, the copy operation is executed and as well as 100%of the instruction/byte count generated by the serial program. This simulation providesa lower limit on the achievable throughput.• CASN This is the encoding/decoding rate for CASN.1[12]. This result was achievedby running CASN.1 on a SUN 3/260 under UNIX 4.2 BSD. The SUN 3/260 machinesuse a 25MHz 68020 Motorola processor. The single processor simulation of the parallelalgorithm produces higher throughput than the single processor implementation CASN.1because CASN.1 was designed only for functionality and not optimized for speed. Thesingle processor simulation of the parallel algorithm uses the instruction/byte countgenerated by an optimized version of CASN.1, named a2c, that is under developmentat UBC.The higher speeds achieved when encoding are attributed to the following:• When encoding, the compiled encoding routine approach is followed while in decodingthe type tree approach is followed. Ref.[16] notes that the compiler approach is usuallyfaster than the type tree approach due to the extra processing required for traversingthe type tree.• When decoding, the extra processing requirements of a type tree increases the amount ofcommunication overhead in the parallel system which slows down the decoding speeds.71• When encoding, pipelining and parallelism are achieved by having the host and theparallel hardware operate simultaneously. When decoding, only parallelism is exploitedbecause the parallel hardware operates alone.• When decoding, the input TS is compared to the expected value. When encoding, nocomparison operation is performed.From the data presented in table 8 it is difficult to determine the optimal number ofprocessors to use in this parallel setup. Deciding the margin of diminishing returns isa difficult problem because the maximum number of processors used depends on howhierarchical the ASN.1 structure is and the data instance being encoded/decoded. Forexample, if there is an ASN.1 definition of two octets then the maximum processor utilizationis two and the memory bandwidth is the bottleneck. In another case suppose that there is asASN.1 type defined as a SEQUENCE of 20 object identifiers. It is possible that 20 processorscould be kept busy decoding these object identifiers without running into a memory bandwidthproblem. Thus the parallel version of the software performs at a speed that is comparableto the serial version in the worst case. The worst case occurs when there is one octet inthe ASN.1 type and the memory bandwidth is the bottleneck. In all other cases the parallelversion significantly outperforms the serial version.728 Future WorkThis section describes enhancements to the present algorithm, which were not tested forthis thesis project, but are suggested for future work.8.1 SimulationsThe following is a list of some of the improvements that can be made to the simulationsystem:• See the impact of programming primitive encoding/decoding in assembly language.Make an ASN.1 compiler that reads in text files of ASN.1 and generates the equivalentC type, the structures encoding routines and the decoding type tree.Modify the back-end of a public domain C compiler so that it produces output that theVHDL model can directly use.Determine the effect of changing the single bus connection network. A good startingpoint for this is a double bus with independent RDC arbitration.8.2 Processing ElementThe following improvements that can be made to the processing elements are listed inorder of importance. The constraining factor for the suggestions below is the total availabledie space on an IC.• Install a two window register file in each processor. The PE calls subroutines that causethe environment of the PE to be saved to memory. On the return of the subroutine theenvironment is restored from memory. This causes memory reads and writes therebythat slows down the execution speed of the processor.• On chip memory shown in Figure 41 is used as a data cache. The advantage of an onchip cache is that the load on the shared bus decreases because the processing elementscan use burst reading/writing of memory reducing the contention for the shared bus.73This allows the processor to acquire control of the bus faster and reduce the total waitingtime for the bus.PE PE PE PEcache cache cache cacheOff Chip I On ChipFigure 41 Local Cache Setup8.3 Parallel use of PEsThe following provides the improvements that can be made to the parallel hardware.Again the constraining factor is the total available die area on an IC.• The single shared bus can be replaced by a multiple shared bus with data caching asshown in figure 42. This decreases the time a processor spends waiting for a bus andincreases the work performed. The parallel hardware shown in 41 and 42 can be specifiedand simulated to determine which is better.• Multiple ED library calling queues in the parallel processor. This would allow morethan one encoding/decoding to take place and can also prioritize the system. This wouldraise the system performance and PE utilization.74PE^PE^PE^PECacheOff Chip On ChipFigure 42 Multi-Bus Global Cache8.4 Fabrication Constraints• When more elements, such as large register files and data caches, are put onto a die thetotal area occupied on the die increases and hence there is a need for fault tolerance andtestability. Hardware designs must keep this in mind.• An initial hand layout instead of an automatically generated layout could save on chiparea and would improve the speed of the processor. Ref. [7][14][22] contain examplesof the hand layouts found in literature.759 Conclusion This thesis has shown that a parallel ASN.1 encoder/decoder allows higher throughputthan a serial implementation. Table 9 presents the worst case simulation results for theASN.1 PersonnelRecord type specified in Appendix E. The results in table 9 are normalizedwith respect to the serial algorithm of CASN.1 and the parallel algorithm executed on oneprocessor. The results in table 9 normalized with respect to the parallel algorithm running onone processor provide a better indication of the speedup achievable by the parallel algorithm.Comparing CASN.1 to the parallel algorithm is unfair because the code generated by CASN.1is not optimized for speed and the values used in the parallel algorithm simulations are derivedfrom an optimized version of CASN.1. The optimized version of CASN.1 is named a2c andis in the final stages of development at UBC.CASN.1 Normalized Par. 1 proc. NormalizedAlgorithm Processors Encode Decode Encode DecodeCASN.1 1 1 1 0.056 0.15Parallel 1 17.8 6.6 1 1Parallel 10 84.6 45.0 4.8 6.8Table 9 Worst Case Simulation Results for PersonnelRecord.The simulation results derived in this thesis can not be compared to simulation resultsof VASN.1 because the published throughputs of VASN.1 do not specify the ASN.1 type orvalue instances that were used. The major features of the parallel system simulated in thisthesis and a structural comparison of it to VASN.1 are presented below.• An optimal ASN.1 processing element can be designed based on the instruction usagestatistics gathered during the simulations.• A distinct interface to the parallel system has been provided for users. This will allowfor easy integration with other systems.76• The rotating daisy chain bus arbitration scheme that is used in the parallel system of thisthesis provides the best bus arbitration scheme for the required amount of hardware[2].In the VASN.1 system the bus arbitration is preformed by four processing elements whichis an inefficient use of the processing power.• The hardware of VASN.1 uses an associative memory while the hardware presented inthis thesis does not. Due to the complexity of implementing an associative memory, thesystem implementation time and the system area requirements will be high in VASN.1.• VASN.1 has a limited number of processors due to its hardware configuration. Thehardware designed in this thesis is structured and scales gracefully. This allows moreprocessing elements to be included in the parallel encoding/decoding which increasesthe achievable parallelism.• VASN.1 does not do a parallel to serial conversion of the data structures it provides forthe users but assumes all users accept parallel data structures. The users of VASN.1 mustpay the high cost of programming their applications to handle parallel data structures.The algorithm of this thesis provides a serial to parallel data structure conversion at alow cost hence user's programs are not required to work with parallel data structures.An algorithm for converting a serial data structure to a parallel data structure in thepresentation layer has never been proposed in the literature.Based on the above data one can conclude that the basis for a significantly better parallelalgorithm than VASN.1 has been provided and the initial simulation results are promising.77Bibliography[1] Peter J. Ashenden. The VHDL Cookbook. This is postscript document available by FTPfrom chook.adelaide.edu.au , July 1990. University of Adelaide.[2] W. L. Bain, J.R. Ahuja, and S.R. Ahuja. PERFORMANCE ANALYSIS OF HIGH-SPEED DIGITAL BUSES FOR MULTIPROCESSING SYSTEMS. In Proceedings of the8th Annual Symposium on Computer Architecture, volume 8, pages 107-133, Minneapolis,Minnesota, May 12-14, 1981.[3] M. Bilgic and B. Sarikaya. AN ASN.1 ENCODER/DECODER AND ITS PERFOR-MANCE. In IFIP PSTV X, pages 133-150, Ottawa, Canada, June 12-15, 1990.[4] D. Chapman. A Tutorial on Abstract Syntax Notation One (ASN.1). Transmission #25 byOpen System Data Transfer, December 1986. Omnicom information service, OmnicomInc.• ISSN 0741286X.[5] David D. Clark and David L. Tennenhouse. Architectural considerations for a newgeneration of protocols. SIGCOMM '90, 24(4):200-208, September 1990.[6] Sun Microsystems Inc. XDR: External Data Representation Standard, (RFC 1014) inInternet Working Group Requested for Comments. Network Information Center, SRIInternational, Menlo Park, Calf., June 1987. No. 1014.[7] Robert W. Sherburne Jr., Manolis G.H. Katevenis, David A. Patterson, and Carlo H. Se-quin. DATAPATH DESIGN FOR RISC. In CONFERENCE ON ADVANCED RESEARCHIN VLSI, MIT, pages 53-62, Cambridge, Massachusetts, January 1982.[8] Mike Kong. Network Computing System Reference Manual. Prentice-Hall, Inc., 1987.ISBN 0136170854.[9] Roger Lipsett, Carl F. Schaefer, and Cary Ussery. VHDL: hardware description anddesign. Kluwer Academic Publishers, 1989. ISBN 0-7923-9030–X.[10] Motorola. MC88100 RISC MICROPROCESSOR USER'S MANUAL, 1989. OrderNumber: MC88100UMAD/AD, page 1-1 — 1-13.[11] Gerald W. Neufeld and Son Vuong. An Overview of ASN.1. Computer Networks andISDN Systems, 23:393-415, 1992.[12] Gerald W. Neufeld and Yueli Yang. The Design and Implementation of an ASN.1–C Compiler. IEEE Transactions on Software Engineering, 16(10):1209-1220, October1990.78[13] U.S. Department of Commerce. NBS, "User Guide for the NBS Prototype Compilerfor Estelle, final report". Technical report, National . Bureau of Standards, October 1987.Report No. ICST/SNA-87/3.[14] D. Pucknell and K. Eshraghian. Basic VLSI Design Principles and Applications.Prentice-Hall, 1985. ISBN 0-13-067851-1.[15] Mike Sample and Gerald W. Neufeld. A High Performance ASN.1 Compiler. inpreparation at UBC Computer Science, 1992.[16] Mike Sample and Gerald W. Neufeld. Support for ASN.1 within a Protocol TestingEnvironment. In Proceedings of the IFIP TCIWG 6.1 Third International Conference ofFormal Description Techniques for Distributed Systems and Communications Protocols,pages 295-302, Madrid, Spain, November 5-8 1990.[17] W. Stallings. Data and Computer Communication. New York: Macmillan, 1988.[18] Sun Microsystems Inc. A RISC Tutorial, 1988. Part Number: 800-1795-10, pages 1-15, Revision A of 9 May.[19] Len Takeuchi. STUDY OF OSI PROTOCOL PROCESSING ENGINES. Master'sthesis, Department of Electrical Engineering, University of British Columbia, 1991.[20] Andrew S. Tanenbaum. Computer Networks. Prentice-Hall, 1988. ISBN 0-13-162959—X.[21] Kenneth J. Thurber, E. Douglas Jensen, Larry A. Jack, Larry L. Kinney, Peter C. Patton,and Lynn C. Anderson. A systematic approach to the design of digital bussing structures.In AFIPS Conference Proceedings, Fall Joint Computer Conference, volume 41 part 2,pages 719-740, 1972.[22] N. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective.Addison-Wesley, 1985.[23] The Wollongong Group, 1129 San Antonio Rd. Palo Alto, CA, USA. ISODE, The ISODevelopment Environment: User Manual, February 1990. Version 6.102 Volume 1 and 4.[24] W. Wu, M. Bilgic, and B. Sarikaya. VHDL MODELING AND SYNTHESIS OFAN ASN.1 ENCODER/DECODER. In Canadian Conference on Very Large ScaleIntegration, CCVLSI '90, pages 1.5.1-1.5.8, Westin Hotel, Ottawa, Ontario, October21-23, 1990.[25] Wayne Wu. VASN1: A HIGH-SPEED MULTI-RISC EMBEDDED SYSTEM FOR79ASN.1 ENCODING AND DECODING. Master's thesis, Department of ElectricalEngineering, Concordia University, September 1991.[26] Recommendation X.208. Specification of Abstract Syntax Notation One (ASN.1), pages57-130. CCITT Blue Book, Volume V.III, Fascicle VIII.4, International Telecommuni-cations Union, Geneva, Switzerland, 1989.[27] Recommendation X.209. Specification of Basic Encoding Rules for Abstract SyntaxNotation One (ASN.1), pages 131-151. CCITT Blue Book Volume V.III, Fascicle VIII.4,International Telecommunications Union, Geneva, Switzerland, 1989.[28] Yueli Yang. ASN.1—C COMPILER FOR AUTOMATIC PROTOCOL IMPLEMENTA-TION. Master's thesis, Department of Computer Science, University of British Colum-bia, 1988.80Appendix A Acronym DefinitionsASN.1 Abstract syntax notation oneBER^Basic encoding rulesFIFO^First in first outIC^Integrated circuitIPC^Inter-process communicationISO^International Standards OrganizationPDU^Protocol data unitPE Processing elementPPDU^Presentation protocol data unitRDC^Rotating daisy chainTLV^Tag, length and valueTS Transfer syntaxVHSIC Very high speed integrated circuitsVHDL^VHSIC hardware description languageUBC^University of British ColumbiaAppendix B Cost per Instruction MappingAction Costallocate memory 5assignment 2complex assign 3write to q 2logic operation 1add or sub 1stack push 3stack pop 2Note that CASN.1 does not do any bounds checking on thestacks. The cost increases if CASN.1 does.Thesis parallel algorithm.EncodePrimitive TagCost^Operation5 1 allocate memory10 5 assignments1^1 write to queueEncodePrimitive Non TagsCost^Operation5 1 allocate memory18 9 assignments1^1 write to queueEncodeStructBegin New Level and NodeCost^Operation2 1 if statement5 1 allocate memory28^14 assignmentsEncodeStructBegin New NodeCost^Operation2 1 if statement5^•^1 allocate memory22 11 assignments82EncodeStructEnd (non root level)Cost^Operation4 2 assignments2 1 if statementEncodeStructEnd (root level)Cost/node^Operation5^1 allocation16 8 assignments1 1 logic OR1^1 write to queue2 1 loop statementCost/level^Operation4^2 assignments1 1 write to queue2 1 branchCASN.1EncodePrimitive20 for boolean (this is a guess)60 for int (this is a guess)300 for real (this is a guess)EncodeStructBeginCost^Operation3 1 complex assignment9 3 stack push10^encode tagEncodeStructEnd (for short form definite length)Cost^Operation4 2 pops8 4 assignment3^1 complex assignment5 1 allocate memory2 1 assignment (IO_OUT)2^1 branch7 or 10^link (explained below)9 3 complex assignmentlink with empty stack (used in EncodeStructEnd)2^1 pop2 1 if not chosen3 1 push83link with non empty stack (used in EncodeStructEnd)2 1 pop2 1 if chosen3 1 assignment3 1 pushAppendix C Pros and Cons of Static AnalysisCompilation of CASN.1 primitive routines into assembler to generate approximateinstruction count has two problems.The first problem is that the analysis of the C code is a static one. The primitiveencode/decode instruction count generated is static so one should take into account thedifference between a static analysis and a dynamic analysis. Generally not all the assemblerinstructions are executed, hence a static analysis count results in a greater overall instructioncount. Also within the fragments of code there are loops that execute a number of times andthese extra instructions are unaccounted for in the static analysis. Finally, within the codesome routines get called more than once but the assembled code includes all subroutinesonce. This reduces the total instruction count in the static analysis.The second problem is that of application-dependent extra code included in the staticassembler count. The error checking code for CASN.1 should not be considered instructionsfor encoding/decoding primitive types. This extra code for the encoding/decoding processis specific to the implementation. To get an accurate count of the instructions that need tobe executed for a primitive encode/decode one should construct the primitive routines inassembler and then analyze the resulting code by hand. By using varied data sets to testthe primitive routine, instruction counts can be calculated by using loop unrolling and othersuch methods. Note that even though the implementation-dependent code does not constitutethe minimum instructions for encoding/decoding it does represent the overhead incurred inimplementing a functional algorithm.Although the instruction counting method is limited, it is presented to show that asignificant number of instructions must be executed to encode/decode a primitive type andhence parallelism is applicable.85Memory R adClocklClockAddr.ReadDataReadyWriteClocklClockAddr.WriteDataReadyReadAddressMemory WriteX ValidValid DataX Valid AddressValid DataAppendix D Processors MemoryRead and Write CycleFigure 43 Memory Read/Write Timing Diagrams86Appendix E ASN.1 PersonnelRecord TypePersonnelRecord ::= [APPLICATION 0]^IMPLICIT SET {titledateOfHirenameOfSpousechildren[0][1][2][3]Name,IA5String,EmployeeNumber,Date,Name,IMPLICIT SEQUENCE OFChildlnformation DEFAULT 0}Childlnformation ::= SET {dateOfBirth [0]Name,DateName ::= [APPLICATION 1] IMPLICIT SEQUENCE {givenName^IA5String,initial IA5String,familyName IA5StringEmployeeNumber ::= [APPLICATION 2] IMPLICIT INTEGERDate^[APPLICATION 3] IMPLICIT IA5String^YYYYMMDDvalue instance{ {givenName "John", initial "E", familyName "Smith"},title"The Big Cheese",99999,dateOfHire"19820104",nameOfSpouse{givenName "Mary", initial "L", familyName "Smith"},children {{{givenName "James", initial "R", familyName "Smith"},87dateOfBirth "19570210"),{{givenName "Lisa", initial "M", familyName "Smith"},dateOfBirth "19590621"})}88Appendix F ASN.1 Types Instruction CountsThe instructions per byte values that are used in the simulation are generated by profilinga2c. Table 10 presents the average instructions executed per byte for three different datastructures. The first is the PersonnelRecord ASN.1 type specified in Appendix E. The seconddata structure is specified in 44 and consists of ten integers. The final data structure is tenreals. The ten real type is the same as the ten integer type, with the integers replaced by reals.teninstType^SEQUENCE fIntl INTEGER,int2 INTEGER,int3 INTEGER,int4 INTEGER,int5 INTEGER,int6 INTEGER,int7 INTEGER,int8 INTEGER,int9 INTEGER,into INTEGER}value instance = (5,5,5,5,5,5,5,5,5,5)Figure 44 Ten Integers TypePersonnelRecord Ten Integers Ten RealsEncode 14 21 13Decode 35 46 66Table 10 Simulations Instruction per Byte Values.89


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items