Architecture of a High Speed JBIG Processing Engine
by Vincent Chan

B.A.Sc., The University of British Columbia

A thesis submitted in partial fulfilment of the requirements for the degree of Master of Applied Science in The Faculty of Graduate Studies, Department of Electrical and Computer Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 2001
© Vincent Chan, 2001

UBC Special Collections - Thesis Authorisation Form

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical and Computer Engineering
The University of British Columbia
Vancouver, Canada

Abstract

The high volume of traffic in facsimile transmissions has led to demands for high-speed image processors for use in dedicated multi-channel fax devices. Image processors in these devices must provide a high throughput of data as well as the ability to easily switch between processing of different images. In this thesis, the architecture of an image processing engine for the encoding and decoding of images based on the Joint Bi-level Image Group (JBIG) standard is proposed. When used in conjunction with another processing engine for the coding of modified Huffman (MH), modified Read (MR) and modified modified Read (MMR) images, the engine is capable of compressing and converting between these facsimile standards on the fly at a high data rate.
Another application of the JBIG engine is in the controller of digital printing or copying devices, where image compression can effectively reduce memory usage. For efficient implementation of the standard, the JBIG facsimile standard is analyzed to exploit parallelisms in the algorithm. The proposed JBIG engine utilizes a simple pipelined architecture to achieve a throughput of close to 1 pixel per clock cycle. Synthesis results show that the JBIG engine can run at a clock rate of 50 MHz on an Altera Flex 10KE family -1 speed grade programmable logic device.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Image Compression
  2.1 Arithmetic Coding
  2.2 Adaptive Modeling
  2.3 Context Models
Chapter 3 The JBIG Bi-Level Image Compression Standard
  3.1 Overview
  3.2 Features of the JBIG Facsimile Standard
  3.3 Data Organization
    3.3.1 Bi-Level Image Header (BIH)
    3.3.2 Bi-Level Image Data (BID)
      3.3.2.1 Floating Marker Segments
      3.3.2.2 Stripe Data Entity (SDE)
  3.4 Adaptive Templates
  3.5 Typical Prediction
  3.6 QM-Coder
Chapter 4 High Level Design of the JBIG Processing Engine
  4.1 Engine Capabilities
    4.1.1 Intended Use
    4.1.2 Supported JBIG Features
      4.1.2.1 BIH
      4.1.2.2 Typical Prediction
      4.1.2.3 Markers
      4.1.2.4 Variables' Range
    4.1.3 Data Access
    4.1.4 Context Switching
  4.2 Engine Architecture
    4.2.1 Encoder Design
      4.2.1.1 JBIG Processing and Control Unit
      4.2.1.2 CX Construction Unit
      4.2.1.3 QM-Encoder Unit
    4.2.2 Decoder Design
      4.2.2.1 JBIG Processing and Control Unit
      4.2.2.2 CX Construction Unit
      4.2.2.3 QM-Decoder Unit
Chapter 5 Detailed Description of the JBIG Processing Engine
  5.1 QM-Coder Unit
    5.1.1 Pipelined Design
    5.1.2 Adapter Unit
      5.1.2.1 Encoder
      5.1.2.2 Decoder
    5.1.3 Coder Unit
      5.1.3.1 Encoder
      5.1.3.2 Decoder
  5.2 CX Construction Unit
  5.3 JBIG Processing and Control Unit
  5.4 Implementation Results
Chapter 6 Conclusions and Future Research
References
List of Acronyms

List of Tables

Table 1: Example of Arithmetic Encoding
Table 2: Example of Arithmetic Decoding
Table 3: JBIG Improvement Factors
Table 4: BIH Format in JBIG Facsimile Standard
Table 5: Options Field Format
Table 6: BID Format
Table 7: SDE Format
Table 8: Interface Signals of JBIG Processing and Control Unit in JBIG Encoder
Table 9: Interface Signals of CX Construction Unit in JBIG Encoder
Table 10: Interface Signals of QM-Encoder Unit
Table 11: Interface Signals of JBIG Processing and Control Unit in JBIG Decoder
Table 12: Interface Signals of CX Construction Unit in JBIG Decoder
Table 13: Interface Signals of QM-Decoder Unit in JBIG Decoder
Table 14: Synthesis Results for FLEX 10K200S at 50 MHz
Table 15: Average Throughput of JBIG Engine
Table 16: Performances of JBIG Processors

List of Figures

Figure 1: Example of Arithmetic Encoding
Figure 2: Example of Arithmetic Decoding
Figure 3: A Seven-pixel Context for Image Compression
Figure 4: Compression Factors of Templates of Different Sizes
Figure 5: Adaptive Templates in JBIG: (a) Three-Line Template, (b) Two-Line Template
Figure 6: Reused Contexts for Coding the SLNTP Pseudo-pixel with (a) Three-Line Template and (b) Two-Line Template
Figure 7: QM-coder Interval Structure
Figure 8: Block Diagram of JBIG Encoder
Figure 9: Block Diagram of JBIG Decoder
Figure 10: Stages of Processing for Each Pixel in QM-Coder
Figure 11: Possible Schedule of QM-Coder Unit's Pipeline
Figure 12: Architecture of Adapter in QM-Encoder Unit
Figure 13: Implementations of New I and Qe Logic with
Figure 14: Architecture of Adapter in QM-Decoder Unit
Figure 15: Architecture of Coder in QM-Encoder Unit
Figure 16: Algorithms for the CLEARBITS procedure
Figure 17: Architecture of Coder in QM-Decoder Unit
Figure 18: Bit Assignments of CX in (a) Three-Line Template and (b) Two-Line Template
Figure 19: State Transition Diagram of JBIG Encoding Engine
Figure 20: State Transition Diagram for JBIG Decoding Engine

Chapter 1 Introduction

The ability to transmit and receive exact copies of printed documents between any locations around the world in a simple, rapid and low-cost manner has made the use of fax equipment increasingly popular. Over the years, fax machines have become a necessary and indispensable part of many businesses, from home offices to major corporations. Although some people believe that popular electronic document transmission techniques, such as the World Wide Web and electronic mail, will replace the use of fax machines, this scenario has yet to happen. The reason is mainly because electronic transmission requires an electronic copy of the document. As most documents are available only in printed form, facsimile transmission remains the fastest and easiest way of transmitting a document between remote locations. As a result, the volume of data in the traffic of facsimile transmissions is ever increasing.

The current facsimile technology involves scanning, compressing, storing and transmitting of bi-level (black and white) image documents.
After establishing a connection, the transmitting and receiving sides must agree on a standard for compression before the document is compressed and transmitted. Several popular compression algorithms used for digital facsimile include the modified Huffman (MH), modified Read (MR), modified modified Read (MMR) and Joint Bi-level Image Group (JBIG) coding standards. Amongst the four coding standards, JBIG requires the most processing power, but also provides the highest compression efficiency. In many cases, an image is first compressed and stored in one standard before transmission. The compressed data is then converted on the fly to another format that is compatible with the terminal at the receiving end. JBIG is a desirable format in which to store images due to the small file size it generates. However, there is not a large base of fax machines currently available that can receive JBIG images, although the number is increasing. Therefore, the demand for an image processing chip capable of converting images between the above-mentioned compression standards certainly exists.

In applications such as network fax boards, a single board handles many incoming or outgoing fax channels at one time, sending or receiving images depending on the usage of each channel. Such fax boards are useful in replacing a bank of separate fax machines in major corporations, performing advanced functions such as fax broadcasts, and providing services so that fax machines can communicate with other devices such as printers and e-mail systems. In these situations, the image processing chip embedded on the fax board must possess enough processing power to process the images in all the channels in a timely manner. Various real-time constraints are imposed on facsimile transmissions in order to detect failures in connections.
These timing constraints dictate that a fax board must periodically process data in each fax channel to ensure successful transmissions and receptions.

Although several image processors capable of compressing images according to, and converting between, MH, MR, MMR and JBIG are currently available, most are oriented towards operations involving whole pages of data. These processors are adequate for fax devices that have a single phone line. In a multi-channel fax device, processing a page of data in one channel at a time may cause too much latency to supply data to another channel. Some products allow task switching after a preset number of lines has been processed [1][2]. This methodology is difficult to work with when there is no easy way of determining whether enough data are available to complete the processing of the pre-defined section, as in the case of fax services. The need to wait for data in one fax channel creates unnecessary idle time in the processor, and prevents the processor from servicing other channels with data already available. Hence, image processors used in multi-channel fax applications must be capable of switching between images at any time for maximum utilization.

The objective of this project is to design and develop a processing engine capable of encoding and decoding images in compliance with the JBIG facsimile standard. This engine, when combined with another processing engine for the MH, MR and MMR standards, forms the core of an image processing chip that can compress, uncompress and convert images between these coding standards. One application of this image processing chip is in fax devices with a large number of incoming and outgoing channels. As mentioned before, these devices require processors having a high processing speed, as well as being able to switch processing between images in order to service all fax channels in a timely and efficient manner.
Therefore, the JBIG engine must have a high processing throughput and provide the capability of switching between tasks at any time. Furthermore, the engine interface must provide enough flexibility so that different configurations can be made to fit the specific applications for which the engine is being used.

In addition to uses in facsimile applications, the JBIG engine can also be used in microcontrollers for digital copiers and printers. In high-resolution printing devices, the application of image compression can ease the requirements for controller memory. Instead of storing and printing an uncompressed image directly, a printing device can compress an image before storage, and then uncompress the stored image for printing. The JBIG standard offers a high average compression ratio as well as good worst-case compression behaviour, making it suitable for such applications.

JBIG is an advanced image compression standard based on arithmetic coding. Various studies have shown that the arithmetic coder specified in JBIG can be implemented in many different ways [3]-[14]. While some designs focus on circuit size and simplicity, others make modifications to the JBIG algorithms for performance gain. These techniques have been carefully studied in this project to efficiently implement the JBIG algorithms in hardware circuits.

This thesis studies the JBIG standard and proposes the architecture of a JBIG processing engine. The organization of this thesis is as follows. Chapter 2 introduces three compression techniques used by JBIG in its compression standard: arithmetic coding, adaptive modeling and context models. An understanding of these techniques is essential for learning the JBIG compression standard. Chapter 3 outlines the details of the JBIG compression standard.
Since the research in this project focuses mainly on facsimile applications, the facsimile profile of the JBIG standard is highlighted. The chapter first provides an overview of JBIG. It then points out various restrictions imposed on the standard when used in facsimile applications before describing in detail how the standard compresses an image. Chapter 4 describes the general architecture of the JBIG engine. The encoding and decoding parts of the engine are described separately. Each part of the engine is broken into functional modules. A description of the functionality and signal interface of each module is provided. Chapter 5 presents a detailed description of the JBIG engine. Different sections in this chapter describe, in detail, how the JBIG engine implements the JBIG algorithms in hardware to encode and decode images. Synthesis and simulation results are also given. Finally, Chapter 6 concludes the thesis by summarizing the research results and pointing out possible directions for future research based on the architecture of the JBIG engine.

Chapter 2 Image Compression

This chapter introduces the fundamentals of several image compression techniques. An understanding of these techniques is essential in this project because they are employed by JBIG in defining its standard for bi-level image compression. The first section in this chapter outlines how arithmetic coding can efficiently compress data from a given source. When compressing an image, arithmetic coding is usually used in conjunction with adaptive modeling and context models. Adaptive modeling and context models are presented in the succeeding sections of this chapter.

2.1 Arithmetic Coding

The fundamental principle of data compression is to decompose a data set into a set of independent events, and then represent the events with as few symbols as possible.
Compression is accomplished if shorter codewords can be used to represent more probable events and longer codewords to represent less probable events. Given a source that outputs a set of independent events, Ai, from the data set, S, the average number of bits required to code the output of the source is given by:

H(S) = -Σ P(Ai) log P(Ai),

where P(Ai) is the probability of the event Ai [15]. The quantity H is called the entropy of S. Shannon proved that, for lossless compression, the smallest possible average number of bits an algorithm can use to encode an output of a source is equal to the entropy of the source [16].

Arithmetic coding is a popular method of generating variable-length codes. It is capable of coding data at a rate arbitrarily close to the entropy. When dealing with sources with highly skewed probabilities, arithmetic coding provides better compression rates than other coding methods, such as Huffman coding [15]. Arithmetic coding can also be used in conjunction with an adaptive model that provides probability estimation of events based on the statistical history of the source. These characteristics make arithmetic coding very well suited to be used as the coder for image compression.

In arithmetic coding, a unique identifier or codeword is assigned to the sequence of data being encoded. The codeword is chosen from the unit interval [0, 1). Since there is an infinite number of available codewords in this interval, any sequence of data can be uniquely identified. As we will see, longer sequences are represented by more precise, and therefore longer, fractions than shorter ones.

When encoding a sequence of data, the unit interval is first divided into subintervals using the cumulative distribution function (cdf) of the random variable associated with the source. In other words, the size of the subintervals is proportional to the probabilities of the symbols.
Based on the first symbol in the sequence, the algorithm chooses the corresponding interval as the current interval. The current interval is again divided into subintervals using the same approach. The next symbol in the sequence dictates which subinterval is chosen. This process is repeated until the end of the sequence is reached. Finally, the algorithm selects a number in the final interval as the codeword to represent the original sequence.

Figure 1: Example of Arithmetic Encoding

Figure 1 illustrates an example of an encoding process [16]. In this example, the sequence b1a2b3 is encoded. The probability distributions for the first, second and third symbols in the sequence are p(a1) = 2/3 and p(b1) = 1/3, p(a2) = 1/2 and p(b2) = 1/2, and p(a3) = 2/3 and p(b3) = 1/3, respectively. The unit interval is first divided into subintervals [0, 2/3) and [2/3, 1) using p(a1) and p(b1). Since the first symbol is b1, the [2/3, 1) subinterval is chosen. The [2/3, 1) interval is then divided into subintervals [2/3, 5/6) for a2 and [5/6, 1) for b2. Interval [2/3, 5/6) is chosen because the next symbol in the sequence is a2. Using p(a3) and p(b3), subintervals [2/3, 7/9) and [7/9, 5/6) are created next. The final symbol is b3; therefore, the final interval corresponding to the original sequence is [7/9, 5/6). The binary representation of this interval is [0.110001..., 0.110101...). Any number within the final interval can uniquely represent the sequence b1a2b3. Since all numbers beginning with 0.11001 are within this interval, this number will be used as the codeword. In fact, only the binary code 11001 is needed because the codeword is always a fraction between 0 and 1. Table 1 summarizes the example.

Step | Current Interval | Subintervals           | Symbol
1    | [0, 1)           | [0, 2/3), [2/3, 1)     | b1
2    | [2/3, 1)         | [2/3, 5/6), [5/6, 1)   | a2
3    | [2/3, 5/6)       | [2/3, 7/9), [7/9, 5/6) | b3
final interval: [7/9, 5/6); codeword: 0.11001

Table 1: Example of Arithmetic Encoding

The decoding process is just the reverse of the encoding process. To decode a sequence from a codeword, the decoder must have the same statistical model used by the encoder. As in encoding, the decoding process starts by dividing the unit interval into subintervals using the cdf of the first symbol. The subinterval containing the codeword is chosen to be the current interval, and the first symbol in the sequence is the symbol corresponding to that subinterval. The current interval is then divided into subintervals using the cdf of the second symbol. Checking which subinterval contains the codeword, the second symbol is decoded, and the current interval is again narrowed down. This process repeats itself until the entire sequence is decoded.

Figure 2: Example of Arithmetic Decoding

Figure 2 illustrates the decoding process of the sequence encoded in the previous example. From the binary code 11001, the decoder first constructs the codeword C = 0.11001, or 0.78125 in decimal. Any arbitrary sequence can be appended to the right of the codeword; 0 is used in this case for convenience. Using the same statistical model as the previous example, the unit interval is first divided into subintervals [0, 2/3) and [2/3, 1). Since C = 0.78125 ∈ [2/3, 1), it is selected to be the current interval, and the first symbol is b1. The [2/3, 1) interval is then divided into [2/3, 5/6) and [5/6, 1) for a2 and b2 respectively. [2/3, 5/6) and a2 are chosen because C ∈ [2/3, 5/6). Next, the [2/3, 5/6) interval is divided into subintervals [2/3, 7/9) and [7/9, 5/6).
The final symbol in the sequence is b3 since C ∈ [7/9, 5/6). The decoded sequence is b1a2b3, which is exactly the same as the original sequence. Table 2 summarizes the decoding process.

Step | Current Interval | Subintervals           | Symbol
codeword: 0.11001 (binary) = 0.78125
1    | [0, 1)           | [0, 2/3), [2/3, 1)     | b1
2    | [2/3, 1)         | [2/3, 5/6), [5/6, 1)   | a2
3    | [2/3, 5/6)       | [2/3, 7/9), [7/9, 5/6) | b3

Table 2: Example of Arithmetic Decoding

2.2 Adaptive Modeling

There are many ways to estimate the probabilities of events in a source. Effective estimation schemes often make use of some prior knowledge of the source, such as the type of the file or statistics from a previous sample of the file. Most probability estimation models fall into three categories: static, semi-static (also called semi-adaptive) and adaptive modeling. The method that uses the same model for all file types is called a static model. A static model is normally used only if it is known that all files being compressed have a very similar statistical model - for example, all English text. Compression performance will be poor if a file has statistical behaviour that is quite different from the model. To achieve good performance with many varieties of files, one can generate a specific model for each file being compressed. The semi-static approach uses a preliminary pass through the file to gather statistics before actually compressing the file. Because each file has its own probability model, this approach provides better compression than the static approach. However, compression time is generally longer because two passes are needed for processing. In addition, the model used for compression that was obtained in the preliminary pass must be transmitted before the actual encoded data. For complex models, the cost of transmitting the model may become a significant overhead. Adaptive modeling provides compression performance that is just as good as semi-static modeling.
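The interval-narrowing arithmetic traced in Tables 1 and 2 can be reproduced with a short sketch. Exact fractions are used to avoid floating-point rounding; the per-position probabilities are those of the example above, and the codeword 0.11001 (binary) equals 25/32.

```python
from fractions import Fraction as F

# Per-position distributions from the example: (P(a_i), P(b_i)) for i = 1, 2, 3.
probs = [(F(2, 3), F(1, 3)), (F(1, 2), F(1, 2)), (F(2, 3), F(1, 3))]

def encode(seq):
    """Narrow [0, 1) once per symbol; 'a' takes the lower subinterval."""
    low, high = F(0), F(1)
    for sym, (pa, _) in zip(seq, probs):
        split = low + (high - low) * pa
        low, high = (low, split) if sym == 'a' else (split, high)
    return low, high

def decode(code, n):
    """Reverse the narrowing: pick the subinterval containing the codeword."""
    out, low, high = [], F(0), F(1)
    for i in range(n):
        split = low + (high - low) * probs[i][0]
        if code < split:
            out.append('a'); high = split
        else:
            out.append('b'); low = split
    return ''.join(out)

low, high = encode('bab')                  # the sequence b1 a2 b3
assert (low, high) == (F(7, 9), F(5, 6))   # final interval of Table 1
assert low <= F(25, 32) < high             # 0.11001 lies inside it
assert decode(F(25, 32), 3) == 'bab'       # Table 2's round trip
```

With exact rationals the final interval matches Table 1 exactly; a practical coder such as the QM-coder described later works instead with fixed-precision approximations of the same narrowing process.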
It requires only one pass through the file and does not incur the overhead of first transmitting the probability model. An adaptive model dynamically estimates the probability of each event based on previously encountered events in the file. For example, an encoder of text files may use previously encoded strings to estimate probabilities; an image compression engine may make use of patterns it has previously seen for the same purpose. In general, an adaptive model begins with a bland probability distribution and gradually alters it as more symbols are encountered. By decoding the file in the same order as it was encoded, the decoder can make use of the same events to produce exactly the same probability estimates. Therefore, it is not necessary for the encoder to transmit any statistical information. This is also the main disadvantage of adaptive modeling - all files must be decoded from the beginning. This behaviour makes adaptive modeling unsuitable for applications that require random access to files.

2.3 Context Models

Adaptive models are usually used in conjunction with context models to predict the probability of an upcoming event. A context model for text files could be a window containing a number of characters appearing just before the current one. For each context that occurs, both the encoder and decoder maintain a probability distribution for the next character. The actual character can then be encoded or decoded with respect to this distribution using arithmetic coding. As in text files, context models can be used similarly for bi-level pictures. The probability distribution of a pixel being black or white is dependent on the colour of its preceding pixels. In contrast to text, pictures are two-dimensional in nature. Contexts for pictures must therefore contain pixels that surround the current pixel in both the vertical and horizontal directions.
One popular plan is to use a template of pixels surrounding the current one, but preceding it in raster scan order. In other words, the template contains only pixels that are above or to the left of the current pixel. Each combination of colours in this template provides a different context to the model. Figure 3 shows a seven-pixel context. The pixel marked with a cross is the one being coded. Pixels included in the template are shown in dark grey. The pixels in light grey have values yet to be determined. All other pixels have already been processed and therefore have known values.

Figure 3: A Seven-pixel Context for Image Compression

The seven pixels in the template give, at most, 128 different contexts. In each context, the number of occurrences of a white pixel and a black pixel is recorded. During coding, the pixels in the template can be matched to the contexts by concatenating the seven bits in the template to form a context number between 0 and 127. The context number is used to index the list of occurrence counts. The counts are then used to drive an arithmetic coder. For adaptive coding, both the encoder and decoder update a list of counts as they move through the image. Each pixel is coded according to the current set of counts. Context models can be used for static and semi-static coding as well. For static coding, the counts can be predetermined based on an analysis of representative images. In the semi-static case, a separate pass first computes the counts and transmits the statistics before coding actually begins.

Figure 4: Compression Factors of Templates of Different Sizes

... as well as the horizontal (tx) and vertical (ty) offsets for the new adaptive pixel.
As mentioned previously, the JBIG facsimile standard does not allow movement of the adaptive pixel in the vertical direction; therefore, the value of ty is always zero. YAT is an offset value with respect to the start of the following stripe so that, for example, if the movement is to be effective on the initial line of a stripe, YAT equals zero.

The NEWLEN marker segment changes the length of the image from the value specified in the BIH. Only one NEWLEN marker segment may appear within a BIE. This marker is necessary because in many applications, such as fax, the length of the image is often unknown when encoding begins, making it impossible to specify the image length in the BIH. The NEWLEN marker could reduce the image length to within the immediately preceding stripe, due to an unexpected termination of the image or the use of only one stripe. Since the BID must be terminated with an SDE, a null stripe (a stripe with no data) immediately follows the marker segment.

The final type of floating marker segment is the COMMENT marker. The COMMENT marker specifies the length of the comment segment in bytes, and is followed immediately by the comment segment.

3.3.2.2 Stripe Data Entity (SDE)

Field                | Size (Bytes)
PSCD                 | Varies
end-of-stripe marker | 2

Table 7: SDE Format

As shown in Table 7, each SDE contains a protected stripe coded data (PSCD) field and is terminated with an end-of-stripe marker. Two types of end-of-stripe marker exist: SDNORM and SDRST. SDNORM indicates normal stripe termination, which implies all state information is saved for the succeeding stripe. In contrast, SDRST indicates that processing of the succeeding stripe should proceed as if it were the top of the image. Furthermore, the adaptive pixel in the template should be restored to its default position if necessary.
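As a rough illustration of the framing in Table 7, the sketch below splits a BID byte stream into SDEs by scanning for end-of-stripe markers. The marker byte values assumed here (ESC = 0xFF, STUFF = 0x00, SDNORM = 0x02, SDRST = 0x03) are those of ITU-T T.82; stuffed ESC bytes (whose role is described in the text below) are skipped as data, and floating marker segments are ignored for simplicity.

```python
# Hedged sketch of walking a BID byte stream SDE-by-SDE. Marker byte values
# follow ITU-T T.82; only the SDE framing described above is modeled.
ESC, STUFF, SDNORM, SDRST = 0xFF, 0x00, 0x02, 0x03

def split_sdes(bid):
    """Return a list of (pscd_bytes, reset_flag) for each SDE in the stream."""
    sdes, start, i = [], 0, 0
    while i + 1 < len(bid):
        if bid[i] == ESC and bid[i + 1] in (SDNORM, SDRST):
            sdes.append((bid[start:i], bid[i + 1] == SDRST))
            start = i = i + 2
        elif bid[i] == ESC and bid[i + 1] == STUFF:
            i += 2          # stuffed data byte pair, not a marker
        else:
            i += 1
    return sdes

# A PSCD containing a stuffed 0xFF data byte, terminated with SDNORM:
stream = bytes([0x10, ESC, STUFF, 0x20, ESC, SDNORM])
assert split_sdes(stream) == [(bytes([0x10, ESC, STUFF, 0x20]), False)]
```

The returned `reset_flag` distinguishes SDRST from SDNORM, mirroring the choice between carrying coder state into the next stripe and restarting as if at the top of the image.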
A JBIG encoder compresses an image by producing stripe coded data (SCD) for each stripe in the image from its encoding engine. In order to protect against inadvertent appearances of markers in the SCD, an encoder must insert STUFF bytes after all occurrences of ESC bytes in the SCD, thereby creating PSCD. A JBIG decoder must create SCD from PSCD by replacing all STUFF markers (an ESC byte followed by a STUFF byte) with a single ESC byte before feeding the SCD to its decoding engine.

3.4 Adaptive Templates

The JBIG standard uses model templates to define a neighbourhood surrounding a pixel to be coded. The values of the pixels covered by the template define the context (CX) of the particular target pixel to be coded. Adaptive probabilities are kept separately for each value of CX. In other words, pixels associated with different contexts are coded with different adaptive probabilities in the arithmetic coder.

Figure 5: Adaptive Templates in JBIG: (a) Three-Line Template, (b) Two-Line Template

Figure 5 shows the two defined templates in the JBIG facsimile standard. The pixel denoted by "?" corresponds to the target pixel and is not part of the template. The pixels denoted by "X" correspond to ordinary pixels in the template, while the pixel denoted by "A0" corresponds to the adaptive (AT) pixel. The AT pixel is a special pixel in the template whose position is allowed to change during the encoding of an image. In the facsimile standard, the range of allowable positions of the AT pixel, when moved from its default position, is limited to a maximum of 127 pixels to the left of, and on the same line as, the target pixel.
Since the AT pixel is not allowed to overlap with any regular pixels in the template, the number of allowable positions of the AT pixel in the two templates is slightly different. The allowable ranges of the AT pixel in the three-line and two-line templates are shown in Figure 5(a) and Figure 5(b), respectively, as extensions to the left of the templates. The new position of the AT pixel is specified in the ATMOVE marker segment as an offset with respect to the target pixel. An offset of zero denotes the default position. The movement of the AT pixel allows an encoder to capture a structure, such as a halftone pattern, that may exist in the image.

When the target pixel is near the edges of an image, it is possible for the template to reference pixels out of the bounds of the image. These out-of-bounds pixels are assumed to have the same colour as the background colour of the image. Furthermore, the template may reference pixels across stripe boundaries. The actual value of the pixel is used if the pixel lies within the image; otherwise, the background colour is used.

The order in which the pixels in the template are combined to form CX is not important. However, each pixel in the template must contribute one bit to CX. Since both allowable templates in the JBIG facsimile standard contain 10 pixels, 1024 different values of CX can be generated. The implementation can choose any order in which to form CX, as long as a consistent scheme is used throughout the coding process. The decoding of an image does not necessarily use the same order to form CX as that used by the encoding process of the same image.

3.5 Typical Prediction

Typical prediction allows certain pixels in the image to be coded without going through the arithmetic coder.
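Context formation can be illustrated with a small software model. The template offsets below are only an illustrative 10-pixel, three-line layout (the exact pixel positions and the AT default position come from the standard), but the model shows the points made above: out-of-bounds pixels take the background colour, and any consistent bit ordering yields a valid 10-bit CX:

```python
def build_cx_3line(image, x, y, background=0, at_dx=-4):
    """Form a 10-bit context for the target pixel at (x, y).

    image is a list of rows of 0/1 pixels. The (dx, dy) offsets below are an
    illustrative three-line layout, not the standard's exact template; the
    last entry is the adaptive (AT) pixel at an assumed default column.
    """
    offsets = [(-1, -2), (0, -2), (1, -2),          # two lines above
               (-2, -1), (-1, -1), (0, -1), (1, -1),  # one line above
               (-2, 0), (-1, 0),                    # current line
               (at_dx, 0)]                          # AT pixel (assumed default)

    def pixel(px, py):
        if py < 0 or py >= len(image) or px < 0 or px >= len(image[0]):
            return background   # out-of-bounds pixels take the background colour
        return image[py][px]

    cx = 0
    for dx, dy in offsets:      # any fixed order works, if used consistently
        cx = (cx << 1) | pixel(x + dx, y + dy)
    return cx
```

Because each of the 10 template pixels contributes exactly one bit, the result always lies in the range 0 to 1023.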
With typical prediction enabled, a line is said to be typical if it is exactly the same as the line immediately above it. A line-not-typically-predicted (LNTP) bit for each line indicates whether the line is typical or not. Prior to coding each line in the image, a JBIG encoder first checks whether the line is typical. Then, a pseudo-pixel SLNTP, which is equal to the change in LNTP value between the current and the preceding line, is encoded. If the current line is non-typical, pixels in the line are encoded sequentially through the arithmetic coder. On the other hand, pixels in a typical line are known to have exactly the same values as the pixels immediately above; their values, therefore, need not be encoded arithmetically. A JBIG decoder with typical prediction follows a similar process to find out whether each line is typical before deciding whether to arithmetically decode values in the line. Since the time needed to compare or copy the values of pixels is, in general, much shorter than the time needed to code pixels arithmetically, typical prediction can shorten the processing time needed to code most images.

[Figure 6: Reused Contexts for Coding the SLNTP Pseudo-pixel with (a) the Three-Line Template (context bits 0 0 1 1 1 0 0 1 0 1) and (b) the Two-Line Template (context bits 0 1 1 0 0 1 0 1 0 1)]

Figure 6 shows the reused contexts for coding the SLNTP pseudo-pixel with the three-line and two-line templates. A "0" in the figure denotes the background colour, while a "1" denotes the foreground colour. When determining the value of SLNTP for the first line of the image, pixels above the image are assumed to have the background colour, and the value of LNTP in the preceding line is assumed to be one.

3.6 QM-Coder

The compression engine that JBIG defines in its standard is called the QM-coder.
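As a sketch of the bookkeeping described above (assuming, for illustration, that the SLNTP pseudo-pixel is the XOR of the current and preceding LNTP bits; the standard's exact polarity may differ):

```python
def slntp_bits(lines, background=0):
    """For each line, compute (LNTP, SLNTP).

    LNTP is 0 when the line is typical (identical to the line above) and 1
    otherwise. SLNTP is modelled here as LNTP XOR the previous line's LNTP,
    i.e. the "change in LNTP" described in the text; for a typical line this
    single pseudo-pixel is the only bit that goes through the arithmetic coder.
    """
    width = len(lines[0])
    prev_line = [background] * width  # pixels above the image are background
    prev_lntp = 1                     # LNTP of the virtual preceding line is one
    out = []
    for line in lines:
        lntp = 0 if line == prev_line else 1
        slntp = lntp ^ prev_lntp
        out.append((lntp, slntp))
        prev_line, prev_lntp = line, lntp
    return out
```

Runs of identical lines therefore cost one pseudo-pixel each instead of one arithmetic coding operation per pixel.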
The QM-coder is a finite precision implementation of the adaptive arithmetic coder for a binary source. As described in section 2.1, an arithmetic coder represents a sequence of data by sequentially dividing an interval into sub-intervals with sizes proportional to the probabilities of occurrence of each symbol in the sequence, then choosing a sub-interval based on the appearance of each symbol in the sequence. The examples given in section 2.1 represent an interval by keeping track of the upper and lower limits of the interval. An equally effective way of representing an interval is to keep track of one endpoint and the size of the interval. Such an approach is adopted in the QM-coder. The QM-coder uses the C register to keep track of the base of the interval, and the A register to keep track of the size of the interval.

Instead of dealing directly with the 0s and 1s in the source, the QM-coder categorizes each symbol in the source as either a More Probable Symbol (MPS) or a Less Probable Symbol (LPS). The definition of whether a symbol is more probable is based on previous occurrences of symbols within the same context. In dividing an interval into sub-intervals, the interval for the LPS is always above the interval for the MPS in the QM-coder. Denoting the probability of occurrence of the LPS as Qe, the sizes of the LPS and MPS sub-intervals are A * Qe and A * (1 - Qe), respectively. An occurrence of MPS, therefore, results in the following updates:

C = C
A = A * (1 - Qe) = A - A * Qe.

An occurrence of LPS results in the following updates:

C = C + A * (1 - Qe) = C + A - A * Qe
A = A * Qe.

To avoid the computationally expensive multiplications in the updates of the C and A registers, the QM-coder assumes that the value of A is close to one. By making this assumption, the multiplication A * Qe can be approximated with Qe.
The update equations, therefore, become:

For MPS:
C = C
A = A - Qe

For LPS:
C = C + A - Qe
A = Qe.

[Figure 7: QM-coder Interval Structure. The LPS sub-interval spans from C + A - Qe up to C + A, with the MPS sub-interval below it.]

Figure 7 illustrates the interval structure in the QM-coder. In order to keep the assumption concerning A valid, the QM-coder always keeps the value in the A register in the range [0.5, 1], or [0x8000, 0x10000] in hexadecimal. Whenever the value of A falls below 0x8000 as a result of an update, its value is doubled recursively (via left shifts) until it falls back into the allowable range. To keep all parameters in sync, the value of C is doubled at the same time. This process is called renormalization. The approximation made when updating the A register sometimes makes the size of the LPS sub-interval larger than the size of the MPS sub-interval. To avoid an inversion in interval sizes, the MPS and LPS updates are exchanged whenever such an inversion is detected.

The bits shifted out of the C register as a result of the renormalization process make up the output of the QM-encoder. Periodically, the QM-coder produces a byte of data from the high order bits of the C register. These bits in the C register are subsequently cleared. In other words, the C register contains only the least significant bits of the QM-encoder's output at any time. Since the output of the encoder represents all the pixels being coded in a stripe, JBIG refers to the output as the stripe coded data (SCD).

Instead of calculating the probability of each symbol on the fly, the QM-coder contains a static table that lists a pre-defined set of probabilities. Each value of Qe in the table corresponds to an index (I). The QM-coder keeps track of the values of I and MPS corresponding to each possible value of CX.
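The approximated updates, the conditional exchange and renormalization can be modelled as follows. This is an illustrative sketch with A and C held as integers scaled so that 1.0 corresponds to 0x10000; byte-wise output and carry propagation, which the real QM-coder must handle, are omitted:

```python
def encode_symbol(C, A, Qe, is_mps, out_bits):
    """One step of the approximated interval update described above.

    C is the interval base, A the interval size, Qe the LPS probability
    estimate (all scaled by 0x10000). out_bits collects bits shifted out of C
    during renormalization; carry handling and byte-out are omitted.
    """
    mps_size = A - Qe                   # approximation of A * (1 - Qe)
    if is_mps:
        if mps_size >= Qe:              # normal case: keep the lower sub-interval
            A = mps_size
        else:                           # inversion: take the (larger) LPS slot
            C, A = C + mps_size, Qe
    else:
        if mps_size >= Qe:              # normal case: LPS sub-interval sits on top
            C, A = C + mps_size, Qe
        else:                           # inversion: take the (larger) lower slot
            A = mps_size
    while A < 0x8000:                   # renormalization: double A and C together
        A <<= 1
        C <<= 1
        out_bits.append((C >> 16) & 1)  # the shifted-out bit becomes output (sketch)
        C &= 0xFFFF
    return C, A
```

Note that the conditional exchange in each branch implements the swap of the MPS and LPS updates whenever the approximation would make the LPS sub-interval the larger of the two.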
Together, the values of I and MPS completely capture the adaptive probability estimate associated with the particular context. At the top of the image, the indexes for all values of CX are initialized to 0, while the values of MPS are assumed to be the background colour.

In addition to indexing a value of Qe, I also indexes the values NMPS, NLPS and SWITCH. NMPS and NLPS give, respectively, the next value of I when an MPS and an LPS are coded. The update to I happens only when renormalization occurs, regardless of whether an MPS or an LPS is being coded. SWITCH is a boolean value indicating that occurrences of the LPS corresponding to a particular CX have happened often enough that its value should be assigned to MPS instead. The QM-coder tests the value of SWITCH only when an update of I given by NLPS occurs.

The operation of the QM-decoder mimics the encoder operation in many ways. In the decoder, the most significant bits of the SCD are shifted into the C register to form the base of the interval. By checking whether the interval lies in the upper or lower sub-interval, the decoder determines whether the symbol being decoded is an LPS or an MPS. The C and A registers are then updated accordingly to reflect the base and size of the corresponding interval. Renormalization and the updates of I and MPS are performed in the same fashion as in the encoder.

Chapter 4
High Level Design of the JBIG Processing Engine

With the JBIG standard introduced, the design of the JBIG processing engine is ready to be presented. This chapter provides a description of the high level design of the JBIG processing engine. The first section of this chapter describes the intended use of the engine, along with its capabilities and limitations.
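The probability-state adaptation described in section 3.6 can be sketched as follows. The table rows shown are placeholders standing in for the full Qe table of the standard; only the shape of each entry (Qe, NMPS, NLPS, SWITCH) matches the description above:

```python
# Illustrative Qe table rows: (Qe, NMPS, NLPS, SWITCH).
# These are placeholder values, not the standard's actual table.
QE_TABLE = [
    (0x5A1D, 1, 1, 1),
    (0x2586, 2, 0, 0),
    (0x1114, 3, 1, 0),
    (0x080B, 3, 2, 0),
]

def update_state(I, MPS, coded_mps, renormalized):
    """Update the (I, MPS) pair kept for one context.

    The index moves only when coding the symbol caused renormalization.
    On an LPS-path update, SWITCH may flip the sense of the MPS.
    """
    if not renormalized:
        return I, MPS
    qe, nmps, nlps, switch = QE_TABLE[I]
    if coded_mps:
        return nmps, MPS
    if switch:              # LPS seen often enough: flip the MPS sense
        MPS ^= 1
    return nlps, MPS
```

Both the encoder and the decoder run this same rule, which is why their probability estimates stay in lock step.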
The next section then presents the design of the JBIG processing engine by separating it into encoding and decoding parts. The design of each part of the engine is broken into functional blocks and is illustrated with a functional block diagram. Subsections following the block diagrams describe in detail the functions and signal interfaces of all functional blocks.

4.1 Engine Capabilities

The main functions of the JBIG processing engine are to compress and decompress images according to the JBIG facsimile profile. In other words, the engine must be able to encode data from uncompressed images into JBIG compliant datastreams; likewise, the engine must be able to decode valid JBIG datastreams generated by other JBIG compression engines into uncompressed images. This section describes in detail the intended use of the engine, the features of JBIG supported by the engine, and the interface of the engine.

4.1.1 Intended Use

The intended use of the engine is as a co-processor for specific operations. Upon receiving a task for encoding or decoding a JBIG file, the host processor, or other supporting circuit, is responsible for setting up the JBIG engine correctly and passing appropriate resources to the engine for processing. Setting up the engine requires assigning values of XD, YD, L0, MX, LRLTWO, VLENGTH, TPBON and background colour to the registers in the engine. Once the engine is in operation, the host processor is free to perform other tasks. The host processor has the option of interrupting the engine's operations at any time. This option allows the host processor to re-claim resources for higher priority tasks, or to simultaneously process multiple images using the task switching capability of the engine.
4.1.2 Supported JBIG Features

As mentioned previously, the original intent was to build an engine that supported the full range of functionality in the JBIG facsimile standard. However, in the design of the engine, support for certain features was dropped because these features either added too much to the complexity of the JBIG engine, or were more logically processed by the host processor instead of by the JBIG engine.

4.1.2.1 BIH

The BIH specifies all the information needed to set up the JBIG engine for processing. Since the host processor must set up the JBIG engine prior to each encoding or decoding task, support for the BIH is more appropriately placed in the host processor instead of in the JBIG engine. The engine does not perform any processing on the BIH; the host processor must attach the 20-byte BIH at the beginning of the output stream for encoding. For decoding, the host processor must first process the BIH to set up the engine, and then feed only the BID as input to the engine for processing.

4.1.2.2 Typical Prediction

Hardware implementation of typical prediction in the encoding engine is expensive in terms of both circuit complexity and memory bandwidth. Recall from section 3.5 that typical prediction requires the JBIG encoder to check if a line is exactly the same as the previous line to determine whether the line is typical. Pixels in a line are encoded only after determining that the line is non-typical. For implementation, the JBIG encoder must first read pixels in the current and previous lines for comparison. Then, if a mismatch is found, the encoder moves back to the beginning of the line, and re-reads pixels in the current line and, depending on the template being used, the previous 1 or 2 lines for arithmetic encoding.
Although the pixel-to-pixel comparison circuit is simple, the circuitry needed to re-fetch data at the beginning of each non-typical line adds greatly to the complexity of the engine. In addition, the amount of memory access might also increase significantly when the majority of lines in an image are non-typical. Given these disadvantages, the encoder in the JBIG engine does not support typical prediction, even though the feature can improve compression speed for most images. Since an encoder is considered JBIG compliant as long as it produces output that other JBIG decoders can decode, dropping support for typical prediction does not affect the ability of the engine to produce JBIG compliant data.

In contrast, the decoding engine must support typical prediction because of the possibility that other JBIG encoders may incorporate this option in their datastreams. Fortunately, implementation of typical prediction in the decoder requires minimal additional hardware and does not increase the required memory bandwidth. By decoding a single bit in the JBIG datastream, the engine can find out whether the upcoming line is typical or not. For typical lines, the engine reads pixels in the line above and quickly copies them onto the current line as decoded pixels. For non-typical lines, the engine proceeds normally and decodes each pixel using the arithmetic decoder. Unlike the encoder, the JBIG decoder does not need to re-fetch data from the beginning of a line even with the typical prediction option enabled. Implementing typical prediction requires only the extra hardware to copy data from one line to another, which is relatively simple.

4.1.2.3 Markers

The JBIG decoding engine supports all markers defined in the JBIG standard, for the same reason as in the case of typical prediction.
However, the JBIG encoding engine does not support several markers, for various reasons. Amongst the seven defined markers in the JBIG standard, SDNORM, SDRST and NEWLEN are fully supported by the JBIG encoding engine. SDNORM and SDRST separate the floating marker segments from the SDEs in the BID and must be supported by the JBIG engine. NEWLEN is another very important marker in the JBIG facsimile profile, because the true length of an image is often unknown at the beginning of processing in such applications. Since one of the primary usages of the JBIG engine is digital facsimile, the JBIG encoding engine also supports the NEWLEN marker.

On the other hand, the encoding engine does not support the ATMOVE, ABORT, COMMENT and RESERVED markers. Although ATMOVE can improve compression for some halftone images, the logic for deciding when and where the adaptive pixel should be moved is too complicated for hardware implementation. ABORT is not supported because detection of encoding failures must be done outside the engine using mechanisms such as timeouts. Therefore, it is the host processor's responsibility to insert ABORT markers upon discovering encoding failures. COMMENT and RESERVED are not supported because there is simply no use for them in the encoding engine, although the host processor may insert these markers during post-processing.

4.1.2.4 Variable Ranges

To reduce the size of the JBIG engine, support for certain variables in the JBIG standard has been reduced to a smaller, more reasonable range. JBIG specifies the XD, YD and L0 variables as 32-bit values. In most real-life applications, however, 16 bits are sufficient to represent the values of these variables. Therefore, the engine needs only to support these variables up to the maximum specified by a 16-bit value.
In other words, the engine can only encode and decode images with widths and lengths of at most 65535 pixels. Since the variable YAT in the ATMOVE marker and the variable YD in the NEWLEN marker are associated with the stripe length and image length respectively, support for these variables is also reduced to the 16-bit range. Similarly, the supported maximum length of comments is also reduced to a 16-bit value.

4.1.3 Data Access

The JBIG engine core does not contain any memory accessing circuitry. Instead, the engine relies on external circuits to move data between itself and memory. This approach allows for flexibility when designing the memory access circuitry to suit the applications and resources used with the JBIG engine. Recall that JBIG processing requires the use of pixels from the preceding 1 or 2 lines for the construction of CX. These pixels, by definition, have previously been either read or written by the JBIG engine. If the available memory bandwidth is small, the designer can choose to cache inputs of the encoder and outputs of the decoder to reduce access to main memory. On the other hand, if there is a significant amount of context switching, the designer can choose to access all data directly from main memory to reduce the overhead associated with context switching.

Data moves in and out of the engine via data I/O ports provided by the JBIG engine. In addition to the main data input and output ports, the JBIG engine also has two reference input ports for reading data from the preceding two lines in the uncompressed image. All I/O ports are 16 bits wide with separate handshake signals for data communications.

4.1.4 Context Switching

With help from the host processor, the JBIG engine can process multiple images simultaneously by switching between multiple tasks.
To perform a task switch, the host processor can interrupt processing of the JBIG engine at any time. Once an acknowledgement is received, the host processor should first save the internal state of the engine for the current image. The previously saved state for the next image should then be restored to the engine. After that, the host processor can enable the JBIG engine to continue processing the image from the previously interrupted state.

The JBIG engine provides an interface to access internal registers so that the engine state can be saved and restored. All registers that affect the engine state can be accessed via their assigned addresses. Each address allows access to up to 16 bits of the engine state. Since parameters such as image width and image length are considered part of the engine state, the initial setup of the engine is done via the same interface, with different addresses assigned to various image parameters.

4.2 Engine Architecture

The JBIG engine can be viewed as two separate entities providing different functions: encoding and decoding. Although combining the encoder and decoder could reduce engine size, having separate entities allows the encoding and decoding units to process simultaneously. For clarity, the encoder and decoder architectures are described separately.
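The host-side task-switch sequence described in section 4.1.4 might look roughly as follows. The register address span, the MockEngine class and its methods are entirely hypothetical, standing in for whatever host bus access the surrounding system provides:

```python
# Hypothetical span of state-register addresses on the 16-bit register interface.
STATE_ADDRESSES = range(0x00, 0x20)

class MockEngine:
    """Minimal stand-in for the engine's register/control interface (illustrative)."""
    def __init__(self):
        self.regs = {a: 0 for a in STATE_ADDRESSES}
        self.running = True
    def assert_context_sw(self): self.running = False   # request interruption
    def wait_ack(self): pass                            # engine signals readiness
    def read_reg(self, addr): return self.regs[addr]
    def write_reg(self, addr, value): self.regs[addr] = value
    def enable(self): self.running = True

def context_switch(engine, saved_state_next):
    """Host-side task switch: pause, save the current state, restore the next
    image's state, then resume. Returns the saved state of the paused task."""
    engine.assert_context_sw()
    engine.wait_ack()
    saved_state_prev = {a: engine.read_reg(a) for a in STATE_ADDRESSES}
    for addr, value in saved_state_next.items():
        engine.write_reg(addr, value)   # restore the next image's state
    engine.enable()                     # continue from the restored state
    return saved_state_prev
```

Because image parameters live behind the same register interface, the initial setup of a task is just a restore with freshly initialized state.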
4.2.1 Encoder Design

[Figure 8: Block Diagram of JBIG Encoder]

The JBIG encoder is divided into three main functional modules: the JBIG Processing and Control Unit, the CX Construction Unit and the QM-Encoder Unit. Figure 8 illustrates the block diagram of the JBIG encoder.

4.2.1.1 JBIG Processing and Control Unit

As its name suggests, the JBIG Processing and Control Unit has two main functions: management of the JBIG datastream and control of operations in the entire engine. It manages the JBIG datastream by attaching appropriate markers and stuff bytes to the coded data produced by the QM-Encoder Unit. Control of the JBIG engine includes accepting image parameters, initializing the CX Construction Unit and QM-Encoder Unit at the start of each stripe, overseeing all I/O handshakes to pause and resume operations of the engine when necessary, decoding internal register addresses for context switching, and keeping track of all variables related to the image.

Referring to Figure 8, the I/Os on the right hand side of the JBIG Processing and Control Unit comprise the interface for context switching and engine setup. The host processor can access all registered data in the engine via this interface.
Data is relayed from/to the CX Construction Unit and the QM-Encoder Unit when accessing registers in these units. To set up the engine, the host processor must write the image parameters XD, YD, L0, MX, LRLTWO, VLENGTH, TPBON and background colour to their assigned addresses before enabling the engine for encoding.

The engine control I/O on the left hand side of the JBIG Processing and Control Unit in Figure 8 includes an engine enable signal to start and pause processing of the engine, a context switch signal to prepare the engine for context switching, and an engine acknowledge signal to indicate that the engine is ready for context switching or that the image is completely encoded. Data out is the main data output of the JBIG encoding engine. Encoded JBIG image data are written through this 16-bit interface. The handshaking signals for Data out include a write strobe from the engine and a busy signal from the outside to temporarily pause the engine from writing data in case external memory is busy.

Image parameters that are passed to the CX Construction Unit include image width, stripe length, template selection, typical prediction, usage of the NEWLEN marker and background colour. Image length is not part of the output because the CX Construction Unit is concerned only with stripe information, not image information. However, the JBIG Processing and Control Unit does indicate whether the last line of the image is being encoded, in case the image length does not divide evenly by the stripe length. Other image information and control signals are exchanged between the JBIG Processing and Control Unit and the CX Construction Unit during the operation of the engine. The CX Waiting I/O signal indicates whether the CX Construction Unit is currently waiting for data from one of its inputs.
If the engine is waiting for any data I/O, the JBIG Processing and Control Unit will disable all operations of the engine via the CX-unit Enable and QM-unit Enable signals until data become available. The CD signal from the QM-Encoder Unit is the main output of the arithmetic encoder. The CD write signal indicates that data on CD are ready for reading. The JBIG Processing and Control Unit attaches stuff bytes and markers to the data on CD to produce JBIG compliant image data. Table 8 summarizes the signals connected to the JBIG Processing and Control Unit.

Signal | DIR | Connection | Description
Clock | IN | Outside | Synchronous clock for the engine
Reset | IN | Outside | Asynchronous reset
Reg Data In | IN | Outside | Data input for internal registers
Reg Data Out | OUT | Outside | Data output for internal registers
Reg Address | IN | Outside | Address lines for internal registers
Reg Select | IN | Outside | Selected register writes data on Reg Data Out when asserted
Reg Write | IN | Outside | Selected registers read data on Reg Data In at the rising clock edge when asserted
CX Reg Data | IN | CX Unit | Registered data from CX Construction Unit
QM Reg Data | IN | QM Encoder | Registered data from QM-Encoder Unit
Internal Reg Data | OUT | CX Unit and QM Encoder | Relays data on Reg Data In
Internal Reg Writes | OUT | CX Unit and QM Encoder | Write enable signals for the selected register
Engine EN | IN | Outside | Start/pause operation of the engine
Context SW | IN | Outside | Prepares engine for context switching
Engine Ack | OUT | Outside | Indicates engine is ready for context switching or image is completely encoded
Data Out | OUT | Outside | Main output of the encoder; outputs BID
Data Out Wr | OUT | Outside | Write enable signal for Data Out
Data Out FFF | IN | Outside | Engine must not output data when asserted
Image Width | OUT | CX Unit | Image parameter
Stripe Width | OUT | CX Unit | Image parameter
LRLTWO | OUT | CX Unit | Image parameter: two-line template on/off
Vlength | OUT | CX Unit | Image parameter: NEWLEN marker in use
Background colour | OUT | CX Unit | Image parameter
SDRST | OUT | CX Unit | Indicates the SDRST marker is used instead of SDNORM
Current Line | OUT | CX Unit | Line number of the image currently being encoded
Last Line | OUT | CX Unit | Indicates the last line of the image is being encoded
End of Line | IN | CX Unit | Indicates the current line is completely encoded
Stripe Completed | IN | CX Unit | Stripe is completely encoded
End of Page | IN | CX Unit | Image data is read entirely
Unit Waiting I/O | IN | CX Unit | Indicates CX Construction Unit is waiting for input data
CX Unit EN | OUT | CX Unit | Start/pause operation of CX Construction Unit
Reset Units | OUT | CX Unit and QM Encoder | Resets the CX Construction Unit and QM-Encoder Unit at the start of each stripe
QM Unit EN | OUT | QM Encoder | Start/pause operation of QM-Encoder Unit
CD | IN | QM Encoder | Compressed data from arithmetic encoder
CD write | IN | QM Encoder | Write enable signal for CD

Table 8: Interface Signals of JBIG Processing and Control Unit in JBIG Encoder

4.2.1.2 CX Construction Unit

The CX Construction Unit is responsible for controlling the operation of the QM-Encoder Unit according to the encoder flow diagram defined by JBIG and for providing inputs to the QM-Encoder Unit for arithmetic encoding. Since the arithmetic encoder requires the value of the pixel (D) and its associated context (CX) for encoding, the main function of the CX Construction Unit is to produce values of CX and D for each pixel in the image.

Since the image is encoded on a stripe-by-stripe basis, the CX Construction Unit keeps track of all stripe related information. In fact, the CX Construction Unit maintains the X and Y position of the pixel currently being encoded with respect to the start of the stripe.
The exact location of the pixel is needed for two reasons: first, the starting and ending points of the stripe are easily determined; second, the construction of CX depends highly on the location of the associated pixels (see section 3.4). The CX Construction Unit tells the QM-Encoder Unit to perform the INITENC procedure at the start of each stripe. When INITENC is completed, the CX Construction Unit prepares itself to produce CX and D for the first pixel of the image, and tells the QM-Encoder Unit to perform the ENCODE procedure. After all the pixels in the stripe have been encoded, the CX Construction Unit instructs the QM-Encoder Unit to perform the FLUSH procedure. Encoding for the stripe is completed when the FLUSH procedure is finished.

To construct CX, the CX Construction Unit covers the image with the selected two- or three-line model template. The value of each bit in CX is determined by selecting either the value of the pixel covered by the template or a hard-coded value. The value of the template-covered pixel is used in the normal case, whereas a hard-coded value is used when the pixel is out of the bounds of the image. The template is moved one pixel at a time in raster scan order, starting from the top left hand corner of the image. As the template moves through the image, image data are read from external memory when necessary. The JBIG engine has separate interfaces for reading pixels on different lines in the template. Since the true image length may not be known during engine setup, an image control signal indicates the end of the image so that the JBIG engine can terminate properly. Table 9 summarizes the interface for the CX Construction Unit.
Signal | DIR | Connection | Description
Clock | IN | Outside | Synchronous clock for the engine
Reset | IN | Control Unit | Asynchronous reset
Reg Data | OUT | Control Unit | Internal registered data of CX Construction Unit
Internal Reg Data | IN | Control Unit | Data from Reg Data In
Internal Reg Writes | IN | Control Unit | Write enable signals for the selected register
Data In | IN | Outside | Main input of the encoder for reading pixels on the current line
Data In Rd | OUT | Outside | Read enable signal for Data In
Data In FFE | IN | Outside | Indicates data is not available on Data In; engine must not read data when asserted
Ref0 In | IN | Outside | Secondary input of the encoder for reading pixels on the line directly above the current line
Ref0 In Rd | OUT | Outside | Read enable signal for Ref0 In
Ref0 In FFE | IN | Outside | Indicates data is not available on Ref0 In; engine must not read data when asserted
Ref0 Enable | OUT | Outside | Enables usage of the Ref0 In stream
Ref1 In | IN | Outside | Secondary input of the encoder for reading pixels on the second line above the current line
Ref1 In Rd | OUT | Outside | Read enable signal for Ref1 In
Ref1 In FFE | IN | Outside | Indicates data is not available on Ref1 In; engine must not read data when asserted
Ref1 Enable | OUT | Outside | Enables usage of the Ref1 In stream
Unit EN | IN | Control Unit | Start/pause operation of the CX Construction Unit
Context SW | IN | Control Unit | Prepares engine for context switching
Context SW ready | OUT | Control Unit | Indicates unit is ready for context switching
Image Width | IN | Control Unit | Image parameter
Stripe Width | IN | Control Unit | Image parameter
LRLTWO | IN | Control Unit | Image parameter: two-line template on/off
Vlength | IN | Control Unit | Image parameter: NEWLEN marker in use
Background colour | IN | Control Unit | Image parameter
SDRST | IN | Control Unit | Indicates the SDRST marker is used instead of SDNORM
Current Line | IN | Control Unit | Line number of the image currently being encoded
Last Line | IN | Control Unit | Indicates the last line of the image is being encoded
End of Line | OUT | Control Unit | Indicates the current line is completely encoded
Stripe Completed | OUT | Control Unit | Stripe is completely encoded
End of Page | OUT | Control Unit | Data of image is read entirely
Unit Waiting I/O | OUT | Control Unit | Indicates the unit is waiting for input data
CX | OUT | QM-Encoder | Value of CX
D | OUT | QM-Encoder | Value of pixel to be encoded
INITENC | OUT | QM-Encoder | Calls INITENC procedure
Clear CX Memory | OUT | QM-Encoder | Instructs INITENC to set all values of CX memory to 0
ENCODE | OUT | QM-Encoder | Calls ENCODE procedure
FLUSH | OUT | QM-Encoder | Calls FLUSH procedure
QM Ready | IN | QM-Encoder | Indicates procedure call is completed

Table 9: Interface Signals of CX Construction Unit in JBIG Encoder

4.2.1.3 QM-Encoder Unit

The QM-Encoder Unit performs all the arithmetic coding operations needed to compress an image. It executes three main procedures, INITENC, ENCODE and FLUSH, when encoding each stripe of the image. Prior to encoding each stripe, the QM-Encoder Unit must be properly reset. The first procedure call after reset must be INITENC.
Then, the ENCODE procedure must be executed for each pixel in the stripe. The FLUSH procedure terminates the encoding after all the pixels in the stripe have been encoded. Once FLUSH is executed, the QM-Encoder Unit will not accept any procedure calls until it is reset to its initial state.

The QM-coder requires 8 bits of memory space, 7 bits for I and 1 bit for MPS, for each value of CX. Since CX is a 10-bit value, a total of 1 kilobyte of memory is needed. In addition, the processing of each pixel in the QM-coder requires one read operation and one write operation to memory. To achieve the targeted engine average throughput of close to one pixel per clock, a dual-port RAM that allows read and write operations to occur simultaneously is needed. The QM-Encoder Unit breaks down the processing of individual pixels into multiple stages in such a way that encoding can be pipelined. Look-ahead circuits are used to prevent data dependency problems.

The design of the QM-Encoder does not include the dual-port RAM, to provide flexibility during implementation. Instead of using a single 1-kilobyte RAM, the designer can choose to connect multiple memories to the engine via a memory selection circuit. This approach allows for quick swapping between images by simply selecting the appropriate memory space for each image.

Referring to Figure 8, the signals on the right-hand side of the QM-Encoder Unit provide the interface with the dual-port RAM. The interface uses 8-bit address lines and 32-bit data lines. The RAM must have a fast response, with an asynchronous read interface and a synchronous write interface. The clock rate of the synchronous write interface must be the same as that of the engine.

The QM-Encoder Unit provides an interface for executing its three supported procedures: INITENC, ENCODE and FLUSH.
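The sizing just described (7-bit I plus 1-bit MPS per 10-bit context, giving 1 kilobyte of adaptive state) can be modelled behaviorally. This is a sketch of mine, not the thesis RTL; the function names and the bit-packing order (I in the upper seven bits, MPS in bit 0) are illustrative assumptions.

```python
# Behavioral sketch (illustrative, not the thesis hardware): the QM-coder keeps
# an 8-bit record per context CX -- a 7-bit probability-state index I and a
# 1-bit MPS colour. A 10-bit CX gives 1024 contexts, i.e. 1 KB of state.

NUM_CONTEXTS = 1 << 10           # 10-bit CX -> 1024 entries
state = bytearray(NUM_CONTEXTS)  # one byte per context: I in bits 7..1, MPS in bit 0

def read_state(cx):
    """Return (I, MPS) for a 10-bit context value."""
    byte = state[cx]
    return byte >> 1, byte & 1

def write_state(cx, i, mps):
    """Pack a 7-bit index and a 1-bit MPS back into the context memory."""
    assert 0 <= i < 128 and mps in (0, 1)
    state[cx] = (i << 1) | mps

write_state(0x2A7, 13, 1)
print(read_state(0x2A7))   # -> (13, 1)
print(len(state))          # -> 1024, i.e. 1 KB of context state
```

Per pixel, one `read_state` and one `write_state` correspond to the one read and one write the text says each pixel costs, which is why a dual-port RAM is needed to sustain one pixel per clock.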
For each pixel, execution of ENCODE must be accompanied by its associated input parameters, CX and D. An acknowledge signal from the QM-Encoder Unit indicates that it is ready to execute the next procedure. As the unit encodes data, it sends the encoded data, 8 bits at a time, to the JBIG Processing and Control Unit, where the data are further processed for output. Table 10 summarizes the interface of the QM-Encoder Unit.

Signal | DIR | Connection | Description
Clock | IN | Outside | Synchronous clock for the engine
Reset | IN | Control Unit | Asynchronous reset
Reg Data | OUT | Control Unit | Internal registered data of QM-Encoder Unit
Internal Reg Data | IN | Control Unit | Data from Reg Data In
Internal Reg Writes | IN | Control Unit | Write enable signals for the selected register
SRAM Data In | IN | Outside | Data input from dual-port RAM
SRAM Read | OUT | Outside | Read enable signal for dual-port RAM
SRAM Read Addr | OUT | Outside | Address for reading data from dual-port RAM
SRAM Data Out | OUT | Outside | Data output to dual-port RAM
SRAM Write | OUT | Outside | Write enable signal for dual-port RAM
SRAM Write Addr | OUT | Outside | Address for writing data to dual-port RAM
Unit EN | IN | Control Unit | Start/pause operation of the QM-Encoder Unit
CD | OUT | Control Unit | Compressed data
CD Write | OUT | Control Unit | Write enable signal for CD
CX | IN | CX Unit | Value of CX
D | IN | CX Unit | Value of pixel to be encoded
INITENC | IN | CX Unit | Calls INITENC procedure
Clear CX Memory | IN | CX Unit | Directs INITENC to set all values of CX memory to 0
ENCODE | IN | CX Unit | Calls ENCODE procedure
FLUSH | IN | CX Unit | Calls FLUSH procedure
QM Ready | OUT | CX Unit | Indicates procedure call is completed

Table 10: Interface Signals of QM-Encoder Unit

4.2.2 Decoder Design

Figure 9: Block Diagram of JBIG Decoder

Like the JBIG Encoder, the JBIG Decoder is divided into three main functional modules: the JBIG Processing and Control Unit, the CX Construction Unit and the QM-Decoder Unit. Figure 9 illustrates the block diagram of the JBIG decoder.

4.2.2.1 JBIG Processing and Control Unit

As in the encoding engine, the JBIG Processing and Control Unit in the decoding engine manages the JBIG datastream and controls the operations of the entire engine. The controlling tasks the unit performs are exactly the same as those of its encoding counterpart, whereas in the area of datastream management it performs the exact opposite tasks of the encoder. In the JBIG decoder, the JBIG Processing and Control Unit reads BID from an external source, processes the floating marker segments in BID, and provides the QM-Decoder Unit with SCD by removing all stuff bytes and end-of-stripe markers. The unit also appends 0x00 bytes to the end of the SCD of each stripe when necessary.

Because its functionality closely mirrors that of its counterpart in the encoder, the interface of the JBIG Processing and Control Unit in the decoder is very similar. The interface signals for accessing internal data are exactly the same as those in the encoder. The directions of several signals are reversed as a result of the opposite directions of data flow.
As opposed to reading coded data from the arithmetic encoding unit and writing data to external memory, the decoder reads data from the engine's main data input and sends data to the arithmetic decoder for image processing. Because adaptive pixel movement is supported in the decoder, the current position of the adaptive pixel is passed to the CX Construction Unit as an image parameter. To support typical prediction, the JBIG Processing and Control Unit provides storage space for the LNTP value. The LNTP value is stored in this unit, instead of in the CX Construction Unit, because it is an image variable whose value is normally carried across stripes. Table 11 summarizes the signals connected to the JBIG Processing and Control Unit.

Signal | DIR | Connection | Description
Clock | IN | Outside | Synchronous clock for the engine
Reset | IN | Outside | Asynchronous reset
Reg Data In | IN | Outside | Data input for internal registers
Reg Data Out | OUT | Outside | Data output for internal registers
Reg Address | IN | Outside | Address lines for internal registers
Reg Select | IN | Outside | Selected register writes data on Reg Data Out when asserted
Reg Write | IN | Outside | Selected registers read data on Reg Data In at the rising clock edge when asserted
CX Reg Data | IN | CX Unit | Registered data from CX Construction Unit
QM Reg Data | IN | QM Decoder | Registered data from QM-Decoder Unit
Internal Reg Data | OUT | CX Unit and QM Decoder | Relays data on Reg Data In
Internal Reg Writes | OUT | CX Unit and QM Decoder | Write enable signals for the selected register
Engine EN | IN | Outside | Start/pause operation of the engine
Context SW | IN | Outside | Prepares engine for context switching
Engine Ack | OUT | Outside | Indicates engine is ready for context switching or image is completely decoded
Data In | IN | Outside | Main input of the decoder, provides BID
Data In Rd | OUT | Outside | Read enable signal for Data In
Data In FFE | IN | Outside | Engine must not read data when asserted
Image Width | OUT | CX Unit | Image parameter
Stripe Width | OUT | CX Unit | Image parameter
Tx | OUT | CX Unit | Image parameter: current position of adaptive pixel
LRLTWO | OUT | CX Unit | Image parameter: two-line template on/off
TPBON | OUT | CX Unit | Image parameter: typical prediction on/off
Background Colour | OUT | CX Unit | Image parameter
SDRST | OUT | CX Unit | Indicates the SDRST marker is used instead of SDNORM
Current Line | OUT | CX Unit | Line number of the image currently being decoded
Last Line | OUT | CX Unit | Indicates the last line of the image is being decoded
LNTP | IN | CX Unit | Decoded LNTP value for current line
LNTP Write | IN | CX Unit | Write enable signal for LNTP
Prev LNTP | OUT | CX Unit | Stored LNTP value for the previous line
End of Line | IN | CX Unit | Indicates the current line is completely decoded
Stripe Completed | IN | CX Unit | Stripe is completely decoded
Unit Waiting I/O | IN | CX Unit | Indicates CX Construction Unit is waiting for input data
CX Unit EN | OUT | CX Unit | Start/pause operation of CX Construction Unit
Reset Units | OUT | CX Unit and QM Decoder | Resets the CX Construction Unit and QM-Decoder Unit at the start of each stripe
QM Unit EN | OUT | QM Decoder | Start/pause operation of QM-Decoder Unit
CD | OUT | QM Decoder | Compressed data to arithmetic decoder
CD Read | IN | QM Decoder | Read enable signal for CD

Table 11: Interface Signals for JBIG Processing and Control Unit in JBIG Decoder

4.2.2.2 CX Construction Unit

The CX Construction Unit in the JBIG decoder controls the flow of the arithmetic decoder and provides the CX input to the DECODE procedure. In addition, the unit collects sequentially decoded pixels from the QM-Decoder Unit and outputs them 16 bits at a time to external memory. Buffers in the unit provide temporary storage space for CX construction. Pixels on the current line are received from the QM-Decoder Unit, while pixels on the reference lines are read from external memory.
Information stored in the CX Construction Unit is stripe-specific and must be reset prior to decoding the pixels in each stripe.

By definition, the reference lines provide previously decoded pixels to the engine for use during decoding. The JBIG engine provides enable signals for each of the reference line inputs. External memory controllers use these enable signals to feed reference data to the engine at the appropriate time, so that reference data are not accessed before the engine has produced them. The JBIG engine turns on these enable signals after it has sent sufficient data to its output.

At the start of each stripe, the CX Construction Unit requests that the QM-Decoder Unit execute the INITDEC procedure. Subsequently, the unit uses the image parameters from the JBIG Processing and Control Unit to construct the CX used to decode each pixel in the stripe. As the engine moves through the stripe, the CX Construction Unit communicates image information to the JBIG Processing and Control Unit. Decoding of a stripe continues until the QM-Decoder Unit produces the value of the last pixel in the stripe. Table 12 summarizes the interface signals of the CX Construction Unit.

Signal | DIR | Connection | Description
Clock | IN | Outside | Synchronous clock for the engine
Reset | IN | Control Unit | Asynchronous reset
Reg Data | OUT | Control Unit | Internal registered data of CX Construction Unit
Internal Reg Data | IN | Control Unit | Data from Reg Data In
Internal Reg Writes | IN | Control Unit | Write enable signals for the selected register
Data Out | OUT | Outside | Main output of the decoder for writing decoded pixels
Data Out Wr | OUT | Outside | Write enable signal for Data Out
Data Out FFF | IN | Outside | Indicates external storage is not available; engine must not write data when asserted
Ref0 In | IN | Outside | Secondary input of the decoder for reading pixels on the line directly above the current line
Ref0 In Rd | OUT | Outside | Read enable signal for Ref0 In
Ref0 In FFE | IN | Outside | Indicates data is not available on Ref0 In; engine must not read data when asserted
Ref0 Enable | OUT | Outside | Enables usage of the Ref0 In stream
Ref1 In | IN | Outside | Secondary input of the decoder for reading pixels on the second line above the current line
Ref1 In Rd | OUT | Outside | Read enable signal for Ref1 In
Ref1 In FFE | IN | Outside | Indicates data is not available on Ref1 In; engine must not read data when asserted
Ref1 Enable | OUT | Outside | Enables usage of the Ref1 In stream
Unit EN | IN | Control Unit | Start/pause operation of the CX Construction Unit
Image Width | IN | Control Unit | Image parameter
Stripe Width | IN | Control Unit | Image parameter
Tx | IN | Control Unit | Image parameter: current position of the adaptive pixel
LRLTWO | IN | Control Unit | Image parameter: two-line template on/off
TPBON | IN | Control Unit | Image parameter: typical prediction on/off
Background Colour | IN | Control Unit | Image parameter
SDRST | IN | Control Unit | Indicates the SDRST marker is used instead of SDNORM
Current Line | IN | Control Unit | Line number of the image currently being decoded
Last Line | IN | Control Unit | Indicates the last line of the image is being decoded
End of Line | OUT | Control Unit | Indicates the current line is completely decoded
Stripe Completed | OUT | Control Unit | Stripe is completely decoded
Unit Waiting I/O | OUT | Control Unit | Indicates the unit is waiting for input data
LNTP | OUT | Control Unit | Decoded LNTP value for current line
LNTP Write | OUT | Control Unit | Write enable signal for LNTP
Prev LNTP | IN | Control Unit | Stored LNTP value for the previous line
CX | OUT | QM-Decoder | Value of CX
D | IN | QM-Decoder | Value of decoded pixel
D Write | IN | QM-Decoder | Write enable signal for D
INITDEC | OUT | QM-Decoder | Calls INITDEC procedure
Clear CX Memory | OUT | QM-Decoder | Directs INITDEC to set all values of CX memory to 0
DECODE | OUT | QM-Decoder | Calls DECODE procedure
QM Ready | IN | QM-Decoder | Indicates unit is ready for next procedure call
NOP | OUT | QM-Decoder | Indicates no operations for the pipeline
CX_0_valid | OUT | QM-Decoder | Indicates validity of bit 0 in CX
CX_1_valid | OUT | QM-Decoder | Indicates validity of bit 1 in CX

Table 12: Interface Signals of CX Construction Unit in JBIG Decoder

4.2.2.3 QM-Decoder Unit

The QM-Decoder Unit performs all the arithmetic decoding operations needed to determine the value of each pixel from the encoded data. It has an interface to execute the two main procedures, INITDEC and DECODE, of the decoding algorithm. Like the QM-Encoder Unit, the QM-Decoder Unit provides an interface to access 1 kilobyte of external dual-port RAM via 8-bit address lines and 32-bit data lines.

Prior to decoding each stripe, the QM-Decoder Unit must be properly reset. The first procedure call after reset must be INITDEC. The unit accepts DECODE procedure calls only after the completion of INITDEC.

Since the processing of each pixel is done in a multi-stage pipeline, the QM-Decoder Unit continues to accept DECODE procedure calls for subsequent pixels before producing the value of a pixel decoded earlier. This pipelining behaviour has two implications. First, the least significant bits of CX produced by the CX Construction Unit may not be valid, because the values of the corresponding pixels are not known in time. Instead, the QM-Decoder Unit determines the final value of CX after it has decoded the previous pixels. Second, in the case of typical prediction, the LNTP bit must be determined before deciding whether the pixels need to be decoded arithmetically.
As a result, the QM-Decoder Unit must execute NOP operations in the pipeline between the beginning and the completion of decoding the LNTP bit. Table 13 summarizes the interface signals of the QM-Decoder Unit.

Signal | DIR | Connection | Description
Clock | IN | Outside | Synchronous clock for the engine
Reset | IN | Control Unit | Asynchronous reset
Reg Data | OUT | Control Unit | Internal registered data of QM-Decoder Unit
Internal Reg Data | IN | Control Unit | Data from Reg Data In
Internal Reg Writes | IN | Control Unit | Write enable signals for the selected register
SRAM Data In | IN | Outside | Data input from dual-port RAM
SRAM Read | OUT | Outside | Read enable signal for dual-port RAM
SRAM Read Addr | OUT | Outside | Address for reading data from dual-port RAM
SRAM Data Out | OUT | Outside | Data output to dual-port RAM
SRAM Write | OUT | Outside | Write enable signal for dual-port RAM
SRAM Write Addr | OUT | Outside | Address for writing data to dual-port RAM
Unit EN | IN | Control Unit | Start/pause operation of the QM-Decoder Unit
CD | IN | Control Unit | Compressed data
CD Read | OUT | Control Unit | Read enable signal for CD
INITDEC | IN | CX Unit | Calls INITDEC procedure
Clear CX Memory | IN | CX Unit | Directs INITDEC to set all values of CX memory to 0
DECODE | IN | CX Unit | Calls DECODE procedure
CX | IN | CX Unit | Value of CX
D | OUT | CX Unit | Value of decoded pixel
D Write | OUT | CX Unit | Write enable signal for D
QM Ready | OUT | CX Unit | Indicates ready to accept next procedure call
NOP | IN | CX Unit | Indicates no operations for the pipeline
CX_0_valid | IN | CX Unit | Indicates validity of bit 0 in CX
CX_1_valid | IN | CX Unit | Indicates validity of bit 1 in CX

Table 13: Interface Signals of QM-Decoder Unit in JBIG Decoder

Chapter 5 Detailed Description of the JBIG Processing Engine

This chapter presents a detailed description of the JBIG processing engine.
As mentioned in the previous chapter, the engine is broken into three functional blocks: the JBIG Processing and Control Unit, the CX Construction Unit and the QM-Coder Unit. The first section of this chapter describes the architecture of the QM-Coder Unit. The design of the multi-stage pipeline used in the QM-Coder Unit is described first, followed by a detailed description of the structures of the QM-Encoder Unit and of the QM-Decoder Unit. The second section describes in detail how the CX Construction Unit produces the context of each pixel in the image. Finally, the third section describes the functions of the JBIG Processing and Control Unit, as well as the state transitions during the processing of each image.

5.1 QM-Coder Unit

The QM-Coder Unit (QM-Encoder in the encoding engine and QM-Decoder in the decoding engine) performs all the arithmetic processing in the JBIG engine. As the most complex unit of the JBIG engine, the design of the QM-Coder Unit is critical to the performance of the engine. Since the QM-Coder Unit is the hardware implementation of the QM-coder algorithms specified by the JBIG standard, the algorithms were carefully studied to exploit as much parallelism as possible for optimal performance. The result is a multi-stage pipeline design that provides an average throughput of close to one pixel per clock cycle.

In both the JBIG encoder and decoder, the QM-Coder Unit comprises two subunits: the adapter and the coder. When the engine is in operation, the adapter manages the adaptive statistics, while the coder performs the arithmetic and logical operations to generate the appropriate output. The operations of the two subunits are pipelined to hide the memory access latency and to maximize performance.
5.1.1 Pipelined Design

Examining the JBIG encoding and decoding algorithms, one can break down the processing of each pixel into four stages, as illustrated in Figure 10. Upon receiving a request to code a pixel, the engine first reads the values MPS and I corresponding to the given CX from memory. MPS is a 1-bit value that indicates the colour of the more probable symbol for the pixel; I is a 7-bit value that indexes the probability estimation table, which contains the numeric values needed for coding. Together, they capture the adaptive probability estimate associated with a particular context. Using I as an index, the engine then reads the values Qe, NMPS, NLPS and SWITCH from an internal ROM table. Qe is the size of the LPS sub-interval; NLPS and NMPS give the next probability estimation state for an observation of the LPS and the MPS, respectively, when renormalization is necessary; lastly, SWITCH indicates whether the value of MPS should be inverted whenever a change in I given by NLPS is necessary. After reading from the ROM table, the engine performs the arithmetic and logical operations necessary to process the particular pixel, and updates the values of registers and memory as needed. Depending on the results of the arithmetic and logical operations, renormalization of the A and C registers may be needed. The JBIG engine renormalizes and updates the appropriate registers when necessary to complete the processing of a pixel.
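The four-stage flow just described can be sketched as a simple behavioral model. This is a simplification of mine, not the thesis design or the standard: the ROM holds three placeholder entries rather than the full probability estimation table, and the "code" stage handles only the MPS path, omitting the C-register update and the conditional exchange of the real QM-coder.

```python
# Behavioral sketch of the four per-pixel stages: read RAM, read ROM, code,
# renorm. ROM entries are placeholders; only the MPS path is modelled.

ROM = {  # I -> (Qe, NMPS, NLPS, SWITCH); illustrative values, not the standard's
    0: (0x5601, 1, 0, 1),
    1: (0x3401, 2, 0, 0),
    2: (0x1801, 2, 1, 0),
}
cx_mem = {cx: [0, 0] for cx in range(1024)}  # CX -> [I, MPS], zeroed as by INITENC

A = 0x10000   # interval register after initialization

def encode_mps(cx):
    """One pass through the pipeline stages for a pixel equal to its MPS."""
    global A
    i, mps = cx_mem[cx]                  # stage 1: read RAM (I, MPS for this CX)
    qe, nmps, nlps, switch = ROM[i]      # stage 2: read ROM (coding constants)
    A -= qe                              # stage 3: code (narrow the interval)
    if A < 0x8000:                       # renormalization required
        cx_mem[cx][0] = nmps             #   probability state adapts on renorm
        while A < 0x8000:                # stage 4: renorm (1-bit shifts of A)
            A <<= 1

encode_mps(5)
encode_mps(5)
print(hex(A), cx_mem[5])   # interval back above 0x8000; state moved to NMPS
```

Stages 1 and 2 touch only the memory and ROM, stage 3 only the registers, which is why the text can overlap them across consecutive pixels; only the renorm loop serializes.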
Figure 10: Stages of Processing for Each Pixel in QM-Coder (read RAM: read I(CX) and MPS(CX) from memory; read ROM: read Qe(I), NMPS(I), NLPS(I) and SWITCH(I) from the ROM table; code: perform arithmetic operations, update the A and C registers and write I(CX), MPS(CX) back to memory; renorm: if necessary, renormalize and update the A, C and CT registers)

To improve the throughput of the engine, one can create a multi-stage pipeline so that the JBIG engine can overlap the processing of consecutive pixels. The resources used in the first three stages illustrated in Figure 10 are non-conflicting, so their operations can overlap. In other words, while the engine is performing the code stage for pixel n, it can also perform the read ROM stage for pixel n+1 and the read RAM stage for pixel n+2 at the same time. A data dependency conflict may occur if the contexts of consecutive pixels are the same, because the processing of earlier pixels may change the values of I and MPS used for the processing of later pixels. Introducing a data forwarding mechanism between the stages solves this problem, because the engine can replace outdated data from external memory with the updated data held within the engine.

In contrast to the first three stages, the fourth stage, renorm, cannot run in parallel with the code stage, because both stages operate on values in the A and C registers, resulting in a resource conflict. As a result, the processing of the code stage for the succeeding pixel cannot begin until the processing of the renorm stage for the preceding pixel has been completed. In other words, the pipeline stalls temporarily whenever the renorm stage is entered. Fortunately, slight modifications to the code and renorm stages can minimize the time the pipeline is stalled. The renormalization stage doubles the values of the C and A registers until A is greater than 0x8000.
In hardware, this means left-shifting C and A until bit 15 of the A register becomes one. By implementing barrel shifters and circuits to detect the number of leading zeros in the A register, the renorm stage can easily be combined with the code stage to eliminate the pipeline resource conflict. The additional hardware, however, significantly increases the delay of the critical path of the engine, thus lowering the clock speed.

A compromise solution involves recognizing that when an MPS is coded, the A register needs to be doubled at most once to make its value greater than 0x8000. Recall that an arithmetic coder codes a symbol by reducing the interval to the size of the symbol being coded. By definition, the statistical interval containing the MPS is always greater than that containing the LPS. Hence, when the QM-coder codes an MPS, the A register, which contains the interval size, will never be reduced to less than half its original value. Consequently, doubling the reduced value of A will always make it no smaller than its original value. Since A is never less than 0x8000 at the start, only one doubling operation is needed when an MPS is coded. Taking advantage of this property, and of the fact that the majority of symbols are MPSs, one can append 1-bit shifter logic to the arithmetic data path so that the frequently occurring single-bit shifts do not require extra cycles for processing. At the same time, the depth of the shifter is small enough that it does not add significantly to the cycle time of the engine. Furthermore, the additional cycles needed for multi-bit shifts when LPSs are encountered do not significantly decrease the overall throughput of the engine, because such shifts occur infrequently.
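The at-most-one-shift claim for MPS coding can be checked numerically. The sampled A and Qe values below are arbitrary choices of mine, and `max(A - Qe, Qe)` models the rule (conditional exchange in the QM-coder) that the MPS always receives the larger sub-interval, which is what guarantees the coded interval stays at or above half of A.

```python
# Numeric check of the single-shift claim: the MPS sub-interval is
# max(A - Qe, Qe) >= A / 2, and A >= 0x8000 before coding, so one doubling
# always brings the interval back into renormalized range.

def shifts_needed(a):
    """Number of 1-bit left shifts until a >= 0x8000 (the renorm loop)."""
    n = 0
    while a < 0x8000:
        a <<= 1
        n += 1
    return n

worst = 0
for a in range(0x8000, 0x10000, 0x111):    # legal A values before coding a pixel
    for qe in range(0x0001, a, 0x0203):    # any LPS estimate smaller than A
        mps_interval = max(a - qe, qe)     # MPS always gets the larger half
        worst = max(worst, shifts_needed(mps_interval))

print(worst)   # -> 1: an MPS renormalization never needs more than one shift
```

An LPS, by contrast, can leave the interval as small as the smallest Qe, so `shifts_needed` can be several bits, which is exactly why the text accepts occasional multi-cycle renorm stalls for LPSs.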
Figure 11: Possible Schedule of QM-Coder Unit's Pipeline

Figure 11 shows a possible schedule of the QM-Coder Unit's pipeline. A filled box indicates that the pipeline stage is active during a particular clock cycle, while an empty box indicates an inactive pipeline stage. The number in each box corresponds to the pixel being processed at the particular stage and time. As shown in Figure 11, extra cycles are necessary when renormalizations of more than one bit are required (pixels 3 and 5). During renormalization, no other stages can process additional information. The renorm stage uses the same 1-bit shifter logic that is used in the code stage. Therefore, the engine must execute the renorm stage in consecutive cycles until the value in the A register is greater than 0x8000 (pixel 5).

5.1.2 Adapter Unit

The adapter implements the memory interface for retrieving and updating the I and MPS pair for each possible value of CX, and the ROM tables for converting I into the information used in the arithmetic coding process. In other words, the adapter provides the resources to execute the read RAM and read ROM stages, as well as the memory update part of the code stage in the engine's pipeline. Although the encoder and decoder share the same memory and ROM table, differences in the way CX is generated make the architecture of the adapter slightly different in these two units. The following subsections describe, separately, the adapter for the encoder and for the decoder.

5.1.2.1 Encoder

Figure 12 shows the block diagram of the adapter in the encoding engine.
The CX memory block in the figure represents the external dual-port memory, which is not included as part of the JBIG engine but is shown in the figure for the sake of clarity. The CX memory is configured as a 32-bits-by-256-entries entity, as opposed to the more obvious 8-by-1024 approach. This configuration is needed in the decoder so that four pairs of I and MPS in the adaptive memory can be accessed at a time; the same configuration is kept in the encoder for the sake of design consistency. Data are read asynchronously from, and written synchronously to, the CX memory. The adapter reads the CX and D inputs from the CX Construction Unit and provides the coder with the values of Qe, MPS and D associated with the same pixel for processing. Furthermore, the adapter updates the CX memory and internal registers using the results of the coder.

Figure 12: Architecture of Adapter in QM-Encoder Unit

Once the ENCODE procedure is executed, the adapter first reads the value of CX from the CX Construction Unit. It then uses the first eight bits of CX as the address to read data from the CX memory, and the last two bits of CX to select the I-MPS pair that corresponds to the full value of CX. The selected result is stored in a register to complete the read RAM stage of the pipeline. Using I as the address, the adapter reads the value of Qe from a ROM table and stores it in a register. At the same time, the values of D, I and MPS are registered for processing by the coder. The result of the coder dictates whether an update to CX memory is necessary.
If so, the adapter uses the result of the coder and the current values of I and MPS to determine the new values of Qe, I and MPS. Since each memory address contains data for four separate CXs, the adapter creates the data to write into memory by combining the new values of I and MPS with partial results cached from memory in earlier stages. Feedback paths exist between the next-Qe, I and MPS logic and the registers in the adapter. The registers read data from the feedback paths instead of the main paths if the updated data for the current pixel affect data already read from memory into the engine.

Figure 13: Implementations of New I and Qe Logic with (a) Longer Delay, (b) Shorter Delay

Figure 13 shows two possible implementations of the logic that determines the new values of I and Qe. Figure 13(a) shows a straightforward approach that first determines the new value of I, and then uses it as an index to look up the corresponding value of Qe. Unfortunately, this approach requires stacking two table lookups, which causes a long delay between the original input I and the final output, the new Qe. To remove this undesirable effect, the logic shown in Figure 13(b) is implemented. A new table is created that effectively combines the NLPS-NMPS table and the Qe table. This table takes I as an input and returns the values of Qe corresponding to NLPS and NMPS as defined in the original table by JBIG. The final value of Qe can then be selected using the same logic that selects the new value of I. This approach has the drawback of requiring more hardware than the original approach.
The newly defined table returns two 16-bit values of Qe, compared to just one in the original Qe table; its size is therefore twice that of the Qe table. In addition, an extra multiplexer is required to select the final value of Qe. Nevertheless, this approach is chosen to minimize delays in the circuit.

5.1.2.2 Decoder

Figure 14: Architecture of Adapter in QM-Decoder Unit

Figure 14 shows the block diagram of the adapter in the decoding unit. As mentioned in section 5.1.2.1, the CX memory has a 32-bit-wide data path, so that four pairs of I and MPS corresponding to four different CXs can be accessed at a time. This configuration is required in the decoder because of the pipelined operations in the engine and the way CX is generated from the image. Since both JBIG templates (Figure 5) contain pixels immediately preceding the target pixel, the values of these pixels determine the final value of CX. In the pipeline, the memory read stage happens two stages ahead of the coding stage, where the pixel is decoded; therefore, the two bits in CX corresponding to the two pixels immediately preceding the target pixel are not generated in time to become part of the memory address. By using only the available eight bits of CX as an address, the adapter reads the values of I and MPS for all four possible values of CX. The appropriate pair can be selected with fast multiplexers once the final two bits of CX are decoded.

During decoding, the adapter performs the read RAM stage by using the first eight bits of CX as the address to read data from the CX memory.
At the same time, the coder decodes the value of the pixel that corresponds to the ninth bit of CX. As soon as the value of that pixel is known, the number of possibilities for CX is reduced to two. The adapter stores either the upper 16 bits or the lower 16 bits of the CX memory data into an internal register, depending on whether a 1-bit or a 0-bit is decoded. In the readROM stage, as the coder is decoding the last bit of CX, the adapter converts the two stored indexes into Qe values using two Qe tables running in parallel. The final values of Qe, I and MPS are selected and stored into registers as soon as the coder has decoded the pixel. The adapter provides these values to the coder for decoding of the current pixel in the code stage. As in the encoder, the adapter updates the CX memory and internal registers when necessary.

5.1.3 Coder Unit

The coder unit implements all the arithmetic operations and decision logic defined in the QM-coder. Using data provided by the adapter, the coder manipulates values in the internal C and A registers to produce compressed data in the encoder, and to decode the value of each pixel in the image from compressed data in the decoder. During the design process, the arithmetic of the QM-coder algorithms is analyzed carefully in order to implement the coder efficiently. The following subsections describe the architecture of the coder in the encoding and decoding engines.

5.1.3.1 Encoder

Figure 15 shows the architecture of the coder in the QM-Encoder Unit. The C and A registers have precisions of 28 bits and 16 bits, respectively. Furthermore, the C register is divided into two parts: Clow and Chigh. Clow represents the lower 16 bits of the C register, while Chigh represents the upper 12 bits.
Analyzing the ENCODE procedure of the QM-coder, we can see that the following arithmetic and logical operations are always executed, regardless of whether an MPS or an LPS is coded:

    temp_A = A - Qe        (1)
    temp_C = C + temp_A    (2)
    temp_A < Qe            (3)

Figure 15: Architecture of Coder in QM-Encoder Unit (datapath diagram; shows the Chigh and Clow halves of the C register with the carry-select adder, the inline 1-bit shifters, the A register datapath and the decision logic driving the adapter)

These operations are therefore executed without first determining whether the target pixel is MPS or LPS. Furthermore, by substituting equation (1) into equation (3), and then adding Qe to both sides, the comparison in equation (3) can be replaced by

    A < 2 * Qe.            (4)

Hardware implementation of equation (4) is preferable to that of equation (3), because the comparison can be done without waiting for the result of equation (1). Moreover, implementing equation (4) requires no more hardware than equation (3), because the doubling of Qe can be implemented as a left-shift operation. Because of the difference in precision between the C and A registers, the addition in equation (2) is implemented with a carry-select adder to reduce the carry-ripple delay. In the carry-select adder, the 16-bit addition (Clow + temp_A) determines the lower 16 bits of temp_C. Depending on the value of the carry bit of that addition, the adder selects either Chigh or (Chigh + 1) as the upper 12 bits of temp_C.
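A small numeric sketch of this rewriting (register widths as in the text; the function name is illustrative, not from the thesis):

```python
# The always-executed ENCODE operations (1)-(2) and the rewritten comparison
# (4): substituting (1) into (3) and adding Qe to both sides turns
# temp_A < Qe into A < 2*Qe, which needs only a 1-bit left shift of Qe.
def encode_common_ops(A, C, Qe):
    temp_A = (A - Qe) & 0xFFFF         # (1), 16-bit A register
    temp_C = (C + temp_A) & 0xFFFFFFF  # (2), 28-bit C register
    cmp4 = A < (Qe << 1)               # (4), same truth value as (3)
    return temp_A, temp_C, cmp4
```

For A = 0x8000 and Qe = 0x5A1D, for instance, form (4) gives 0x8000 < 0xB43A (true), agreeing with form (3): temp_A = 0x25E3 < 0x5A1D.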
    TEMP = (A - 1 + C) AND 0xFFFF0000
    if TEMP < C then
        C = TEMP + 0x8000
    else
        C = TEMP
    end if

Figure 16: Algorithm for the CLEARBITS procedure [18]

The flushing of the C register executed at the end of each stripe includes a CLEARBITS procedure designed to clear the trailing bits of the compressed datastream to zeros. Figure 16 shows the algorithm for the CLEARBITS procedure in the QM-coder [18]. The calculation of (TEMP = (A - 1 + C) AND 0xFFFF0000), followed by the comparison (TEMP < C), is equivalent to calculating the value (A - 1 + Clow) and then checking the carry bit of that calculation. A carry bit of zero means the comparison (TEMP < C) is true, whereas a carry bit of one or Clow = 0 means the comparison is false. Therefore, during execution of the CLEARBITS procedure, the selection part of the carry-select adder is reused to select the value of Chigh. However, the logic that determines how the selection is made differs from the logic used in the encoding process.

As mentioned previously, the coder employs inline 1-bit shifters to handle single-bit renormalization of the C and A registers without the need for additional clock cycles. This creates a number of possibilities when assigning the next values of the C and A registers. Possible assignments of the A register are:

  • A = temp_A,
  • A = temp_A << 1,
  • A = Qe << 1, and
  • A = A << 1.

Since the assignment of Qe to the A register corresponds to the coding of an LPS, this assignment is always followed by a renormalization. The assignment A = A << 1 is used in the renorm stage only, when multi-bit renormalization is needed.
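The four next-A choices above amount to one multiplexer. A flat sketch, with illustrative flag names standing in for the decision logic (not the thesis's actual signal names):

```python
# Multiplexer model for the next value of the 16-bit A register, with the
# inline 1-bit renormalization shift folded into the selection choices.
def next_A(A, Qe, coded_lps, renorm_now, multi_bit_renorm):
    temp_A = (A - Qe) & 0xFFFF
    if multi_bit_renorm:
        return (A << 1) & 0xFFFF       # A = A << 1: renorm stage only
    if coded_lps:
        return (Qe << 1) & 0xFFFF      # A = Qe << 1: LPS always renormalizes
    if renorm_now:
        return (temp_A << 1) & 0xFFFF  # A = temp_A << 1: MPS with renorm
    return temp_A                      # A = temp_A: MPS, no renorm
```

The flags are mutually exclusive in this sketch; in the hardware they are produced by the coder's decision logic.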
Possible assignments of the C register are:

  • C = (C + temp_A) << 1,
  • C = C << 1,
  • C = (TEMP + 0x8000) << 1, and
  • C = TEMP << 1.

The coder uses multiplexers to select the final values assigned to the C and A registers. Because of the use of a carry-select adder when calculating the next value of C, Chigh and Clow are selected and assigned separately. The decision logic in the coder dictates how the coder selects the correct values for the C and A registers, and whether or not the adapter needs to update data in the CX memory.

5.1.3.2 Decoder

Figure 17: Architecture of Coder in QM-Decoder Unit (datapath diagram; shows the 16-bit A register datapath, the Chigh datapath with its shared subtractor, and the decision logic driving the adapter and producing the decoded pixel D)

Figure 17 shows the architecture of the coder in the QM-Decoder Unit. The A register has 16 bits of precision, as in the encoder. The C register, on the other hand, requires only 24 bits of precision. All arithmetic operations in the coder are 16-bit operations, because arithmetic operations on the C register involve only its most significant 16 bits, Chigh. The lower 8 bits of C are used only for reading the incoming compressed datastream, which is shifted into Chigh when renormalization occurs. In the QM-Decoder Unit, the following operations are first calculated in the coder:

    temp_A = A - Qe
    A < 2 * Qe
    Chigh < temp_A                 (5)
    temp_Chigh = Chigh - temp_A    (6)

Equation (5), however, can be replaced by

    Chigh - temp_A < 0.            (7)

Equation (7) has an advantage over equation (5), because the subtraction in (7) can share the same hardware used in executing equation (6). The coder can then make the comparison simply by checking the sign bit of temp_Chigh, the result of the subtraction. Thus, execution of the 16-bit comparison in equation (5) is not necessary.
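The sharing of the subtractor between equations (6) and (7) can be sketched as follows (a flat model; the function name is illustrative):

```python
# One decoder compare step: a single subtraction Chigh - temp_A yields both
# temp_Chigh (eq. 6) and, via its sign, the result of Chigh < temp_A
# (eqs. 5/7), so no separate 16-bit comparator is needed.
def decoder_compare(A, Chigh, Qe):
    temp_A = (A - Qe) & 0xFFFF
    diff = Chigh - temp_A
    temp_Chigh = diff & 0xFFFF     # eq. (6), kept to 16 bits
    chigh_below = diff < 0         # sign bit replaces the compare in (5)
    assert chigh_below == (Chigh < temp_A)
    return temp_A, temp_Chigh, chigh_below
```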
When combined with the inline 1-bit shifter for renormalization, the possible new values of the A register are:

  • A = temp_A,
  • A = temp_A << 1,
  • A = Qe << 1, and
  • A = A << 1.

Possible new values for the Chigh register are:

  • Chigh = Chigh << 1,
  • Chigh = temp_Chigh << 1, and
  • Chigh = Chigh << 8.

Since Chigh and Clow represent different parts of the C register, the most significant bits of the Clow register are shifted into the least significant bits of the Chigh register when performing the left-shift operations. The 8-bit shift operation is used only when initializing the C register at the beginning of each stripe. In addition to determining the appropriate selections for assignment of new values to the C and A registers, the decision logic in the coder also dictates the colour of the target pixel, as well as whether or not data in the CX memory are updated.

5.2 CX Construction Unit

The CX Construction Unit manages the compression and decompression of pixels in a stripe of an image. During the processing of a stripe, the unit reads pixels from memory to generate the context (CX) for the QM-Coder Unit. It also generates the target pixel (D) for compression in the encoding engine and assembles the image from the decompressed pixels in the decoding engine. In addition, the CX Construction Unit controls the flow of the QM-Coder Unit by calling appropriate procedures of the QM-coder throughout the processing of a stripe.

  (a) three-line template:      (b) two-line template:

        9 8 7                       9 6 5 4 3 2
      6 5 4 3 2                         8 7 1 0 ?
          1 0 ?
Figure 18: Bit Assignments of CX in (a) Three-Line Template and (b) Two-Line Template

The CX Construction Unit allows the use of both the three-line and the two-line templates for context generation. Figure 18 shows the bit assignments of CX in the three-line and two-line templates. The number in each box of the templates identifies the assigned bit number in CX. In both templates, the least significant two bits of CX are assigned to the two pixels in the template immediately preceding the target pixel, for convenience in choosing the memory address when accessing the CX memory. Bit number 2 of CX is assigned to the adaptive pixel. Although Figure 18 shows the adaptive pixel in its default position, the assignment of bit 2 of CX follows the movement of the adaptive pixel in the template. Overall, the bit assignments in the three-line and two-line templates are designed so that the difference between the two templates affects the minimum number of bits in CX. The unit changes only the assignments of bits 7 through 9 of CX when switching between the three-line and two-line templates. Furthermore, the designed bit assignments dictate that the reused context for coding the typical prediction pseudo-pixel in both templates, shown in Figure 6, be mapped to the same value of 0011100101.

The CX Construction Unit implements three buffers for the storage of pixels. The three buffers represent the three lines in the three-line template and hold only the active parts of the image. Since the engine does not store a complete line for referencing, most pixels in the image are read multiple times into the engine. To minimize memory accesses, one of the buffers is inactive when using the two-line template. When coding a stripe, the CX Construction Unit first reads 16 pixels of each line in the template into the buffers.
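The layout invariants just described can be checked with a toy sketch: the low 7 bits of CX (including the two preceding pixels in bits 1-0 and the adaptive pixel in bit 2) are packed identically in both templates, and only bits 9-7 are re-sourced when templates are switched. The helper name is illustrative, not from the thesis.

```python
# Toy model of the shared CX bit layout: template-independent low 7 bits
# combined with the 3 template-dependent high bits into a 10-bit context.
def pack_cx(low7, high3):
    assert 0 <= low7 < 128 and 0 <= high3 < 8
    return (high3 << 7) | low7

# The reused typical-prediction pseudo-pixel context maps to the same
# 10-bit value, 0b0011100101, under both templates.
TP_CX = 0b0011100101
```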
It then shifts pixels through the buffers as the template moves through each pixel in the image. Additional pixels are read into the buffers when necessary. To simplify the design, the unit synchronizes all read requests. The reference line buffers are 19 and 21 bits long. The current line buffer is 22 bits long in the encoder and 128 bits long in the decoder. The longer current line buffer in the decoder is necessary to accommodate movement of the adaptive pixel. In addition to acting as the template, the current line buffer in the decoder also serves as the output buffer of the engine. The CX Construction Unit writes data to memory after every 16 pixels are decoded. Two 16-bit counters keep track of the current X and Y positions in the stripe and are updated after each pixel in the stripe is processed.

5.3 JBIG Processing and Control Unit

The JBIG Processing and Control Unit controls the overall operation of the JBIG engine and manages the JBIG datastream. The unit oversees the availability of, and requests to, all I/O streams. When the unit detects a request to access an unavailable I/O, it pauses the operation of the engine temporarily until data on that particular I/O becomes available. Sixteen-bit registers are used for the storage of image parameters such as image length, image width, stripe length and the line number of the image currently being processed. Additional registers used in the unit include 16 bits for the input buffer, 8 bits for the current SCD value, 7 bits each for Mx and Tx, and 16 bits each for YAT and Lc. Furthermore, the unit contains an address-decoding table for converting values in the internal registers into data in memory space. The host processor can therefore access each internal register in the engine via its assigned address during context switching.
Figure 19: State Transition Diagram of JBIG Encoding Engine

The most important function of the JBIG Processing and Control Unit, however, is to manage the processing of the JBIG datastream. After the unit has accepted all the parameters of the image, it directs the CX Construction Unit and the QM-Coder Unit to process data in each stripe of the image until the end of the image is reached. Figure 19 shows the state transition diagram of the JBIG encoding engine. The unit starts encoding an image by first setting the CX Construction Unit and the QM-Encoder Unit to their initial states. Encoding of each stripe in the image then begins. As the QM-Encoder Unit produces output, the unit redirects the produced data to the output of the engine. If the QM-Encoder Unit produces an ESC byte, which marks the beginning of a marker segment, during encoding, the JBIG Processing and Control Unit inserts a stuff byte into the output datastream. The unit inserts an end-of-stripe marker after the encoding of each stripe is finished. The unit also inserts a NEWLEN marker when a change in image length is detected. The engine detects a change in image length when the host processor indicates that the end of the image data has occurred, but the engine has not completed encoding of the image according to the YD image parameter. The processing of the engine stops when the end of the image is reached.

Figure 20: State Transition Diagram of JBIG Decoding Engine

By comparison, the management of the JBIG datastream in the JBIG decoding engine is more complicated than in the encoding engine. This is largely due to the need for the decoder to support various options in the JBIG standard.
Figure 20 shows the state transition diagram of the JBIG decoding engine. Processing of each stripe begins by first detecting the existence of floating marker segments in the JBIG datastream. The JBIG Processing and Control Unit processes each marker segment appropriately when detected. The absence of a marker means the start of an SDE segment. In this case, the unit resets the CX Construction Unit and the QM-Decoder to start the processing of the stripe. During processing, the unit removes all stuff bytes present in the SDE. The engine continues to decode the stripe until the end-of-stripe marker is received. Because of the possibility of a NEWLEN marker changing the length of the image, the unit must look for its existence beyond the end-of-stripe marker in each SDE. If the unit detects a NEWLEN marker, it immediately processes the change in image length. The unit then appends 0x00 bytes to the end of the SCD stream to continue the decoding of the rest of the stripe. Processing of the engine stops when the end of the image is reached, or when an ABORT or invalid marker is received in the JBIG datastream.

5.4 Implementation Results

            Logic Cells   D Flip-Flops   Memory Bits   Carry Bits   Cascade Bits
  Encoder   1560          356            6780          173          709
  Decoder   2587          576            8475          149          1239

Table 14: Synthesis Results for FLEX 10K200S at 50 MHz

The model of the JBIG engine is developed using the VHDL hardware description language. The model is synthesized using the Exemplar LeonardoSpectrum Level 1 Altera OEM version [24] synthesis software. Using an Altera FLEX 10K200S device library with a -1 speed rating [25], both the encoder and decoder achieve the target clock rate of 50 MHz. Table 14 shows statistics on the results of the synthesis. As shown in the table, the JBIG decoder uses about 66% more logic cells and about 62% more D flip-flops than the JBIG encoder.
Even though the arithmetic coder in the decoder is less complex than the one in the encoder, the decoding engine is bigger, mainly because of the more complicated state machine in the JBIG Processing and Control Unit needed to support the various options in the JBIG facsimile standard, and because of the extra buffers needed to store 128 pixels of the line currently being decoded. Moreover, the decoder uses more memory bits than the encoder because of the parallel Qe tables it employs in the adapter unit.

                                 Average Engine Throughput
                                 pixels/cycle   million pixels/sec @ 50 MHz
  Typical Prediction Disabled    0.9761         48.80
  Typical Prediction Enabled     1.3676         68.38

Table 15: Average Throughput of JBIG Engine

Simulation of the JBIG engine model is done using the Model Technology ModelSim-Altera software, version 5.3d. Table 15 shows the average throughput of the JBIG engine with and without typical prediction enabled. Test results are obtained by decoding a set of 22 test images; the encoding performance on non-typically predicted images is similar. The test images are encoded using real-world fax machines at resolutions of 200 x 100 dpi and 200 x 200 dpi. The statistics show that the JBIG engine is capable of processing images at a rate of close to 1 pixel per clock cycle. Furthermore, the performance of the engine improves by approximately 40% when the typical prediction option is enabled. For letter-sized images at 600 x 600 dpi resolution, the engine is capable of processing about 80 ppm without typical prediction and about 110 ppm with typical prediction enabled.
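As a rough sanity check of these page rates (assuming a full 8.5 in x 11 in page at 600 x 600 dpi; the helper is illustrative, not from the thesis):

```python
# Back-of-the-envelope conversion from sustained pixel rate to pages per
# minute for letter-size pages at 600 x 600 dpi (full-page area assumed).
def pages_per_minute(mpixels_per_sec, width_in=8.5, height_in=11.0, dpi=600):
    pixels_per_page = (width_in * dpi) * (height_in * dpi)  # ~33.7 Mpixels
    return mpixels_per_sec * 1e6 * 60.0 / pixels_per_page
```

At 48.80 Mpixels/s this gives about 87 ppm, and at 68.38 Mpixels/s about 122 ppm, as upper bounds; the quoted figures of roughly 80 and 110 ppm are consistent with some per-page overhead.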
                            IP00C401™   IPB SG3™   PM-22™
  Maximum Clock Rate (MHz)  66          66         75
  Throughput (Mpixels/sec)  66          80         75

Table 16: Performance of JBIG Processors [26][27][28]

Table 16 shows the performance of three processors providing JBIG encoding and decoding functionality: Sumitomo Metal's IP00C401 [26], Advanced Hardware Architectures' IPB SG3 [27] and Pixel Magic's PM-22 [28]. A comparison shows that the throughput of the JBIG engine is very similar to that of these processors on a pixels-per-clock basis.

Chapter 6: Conclusions and Future Research

The JBIG engine is a high-speed image processing engine capable of compressing and decompressing images in compliance with the JBIG facsimile standard. Applications of the JBIG engine include high-speed facsimile transmission services and digital printer controllers. The ability of the JBIG engine to switch between the processing of different images at any time makes it easy to satisfy the time constraints when multiple images are processed simultaneously in these applications.

In this project, the JBIG facsimile standard is first studied to gain a full understanding of its data structure and coding techniques. The standard supports 32 bits of precision in most of its parameters. For the target applications of the JBIG engine, however, 16 bits are often enough to represent the values of these parameters. After careful consideration, implementation of the full 32-bit range supported by JBIG is deemed too costly. Therefore, the JBIG engine supports only 16 bits of precision in parameters such as image width, image length and stripe length. Furthermore, the encoding engine does not support options such as AT pixel movement and typical prediction, because of similar cost concerns. The decoding engine, on the other hand, supports all options in the JBIG standard.
The JBIG engine is divided into three functional units: the JBIG Processing and Control Unit, the CX Construction Unit and the QM-Coder Unit. The JBIG Processing and Control Unit is responsible for processing the various control markers in the compressed datastream, as well as for monitoring all I/Os in the engine for the readiness of data and for context switching. The CX Construction Unit assembles the image from uncompressed data to construct the context of each pixel. It also maintains all stripe-related information, so that the QM-coder can be initialized and terminated properly at the beginning and completion of each stripe. The QM-Coder Unit carries out all arithmetic and logical operations during the processing of an image. The probability estimation states and the MPS corresponding to each value of CX are maintained in an external 1-kilobyte CX memory, which the QM-Coder Unit updates when necessary.

An analysis of the QM-coder shows that the QM-Coder Unit can be implemented with a pipelined architecture for maximum engine throughput. The pipelined architecture of the QM-Coder Unit requires the CX memory to have a dual-ported interface for simultaneous read and write operations. In addition, the QM-Coder Unit accesses four I-MPS pairs at a time. The CX memory is therefore arranged as a table of 256 entries of 32 bits each. Simulation of the JBIG engine shows that an average throughput of 0.9761 pixels/cycle is obtained for images with typical prediction disabled, and 1.3676 pixels/cycle for images with typical prediction enabled. When running at a clock rate of 50 MHz, the equivalent throughputs are 48.80 and 68.38 million pixels/sec, respectively.
The work done in this project not only proposes a JBIG processing engine that is practical for current applications, but also proposes an architecture that can provide the basis for future research. Some possible directions for future research are as follows:

  • Simulation in this project shows that typical prediction improves the speed of the decoder by about 40%. The use of typical prediction in the encoder will provide a similar, although smaller, improvement in compression speed. Support for typical prediction in the encoding engine should therefore be considered when the JBIG engine is configured for other applications.

  • Other studies have used loop unrolling and speculative execution techniques to implement another finite-precision arithmetic coder, the Q-coder, with a throughput of over 1 pixel per clock cycle [3]. A similar analysis can be made of whether or not this approach can be applied to an implementation of the JBIG algorithms. Tradeoffs between area and speed should also be studied.

  • Separate implementations of the encoder and the decoder allow for parallel execution of these units. However, a combined engine supporting both encoding and decoding operations can dramatically reduce circuit size, as most buffers and registers can be shared between the encoder and decoder, and in many applications a single engine that performs both encoding and decoding tasks is sufficient. Research can be done to study how such a combined engine can be implemented efficiently.

  • The JBIG engine can be used as a basis for developing processing engines for the emerging JBIG2 [29][30] and JPEG2000 [31] standards. In particular, the MQ-coder specified in these new standards is very similar to the QM-coder used in JBIG.
Therefore, the QM-coder Unit can easily be modified to implement the MQ-coder algorithms, thus forming the code generation engine for further development of processors for the JBIG2 and JPEG2000 standards.

References

[1] "PM-2m Family User's Guide," Revision 1.1, Pixel Magic Incorporated, June 1997
[2] "PM-2m Programming Guide," Revision 1.0, Pixel Magic Incorporated, 1997
[3] G. Feygin, P. G. Gulak, P. Chow, "Architectural Advances in the VLSI Implementation of Arithmetic Coding for Binary Image Compression," IEEE 1068-0314/94, 1994
[4] M. Tarui, M. Oshita, T. Onoye, I. Shirakawa, "High-Speed Implementation of JBIG Arithmetic Coder," IEEE 0-7803-5739-6/99, 1999
[5] A. Higashi, H. Ohwada, K. Tamaki, N. Tahara, G. Goto, "A Multipurpose Single Chip JBIG based Compression/Decompression System," IEEE 0-7803-2140-5/95, 1995
[6] P. Tong, P. Ang, "A JBIG Arithmetic Coder-Decoder Chip," IEEE 0-7803-0768-2/92, 1992
[7] H. Horie, H. Shirai, Y. Iizuka, M. Takahata, "Bi-level Image High Speed Code Conversion Processor: ImPC2," IEEE 0-7803-5146-0/98, 1998
[8] H. Y. Lee, L. S. Lan, M. H. Sheu, C. H. Wu, "A Parallel Architecture for Arithmetic Coding and Its VLSI Implementation," IEEE 0-7803-3636-4/97, 1997
[9] G. Cena, P. Montuschi, L. Ciminiera, A. Sanna, "A Q-Coder Algorithm with Carry Free Addition," IEEE 1063-6889/97, 1997
[10] M. J. Slattery, J. L. Mitchell, "The Qx-coder," IBM Journal of Research & Development, 1998
[11] P. S. Colyer and J. L. Mitchell, "The IBM JBIG-ABIC Verification Suite," IBM Journal of Research & Development, 1998
[12] K. M. Marks, "A JBIG-ABIC Compression Engine for Digital Document Processing," IBM Journal of Research & Development, 1998
[13] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, 1993
[14] J. L. Mitchell and W. B. Pennebaker, "Optimal Hardware and Software Arithmetic Coding Procedures for the Q-Coder," IBM Journal of Research & Development, 1988
[15] Khalid Sayood, Introduction to Data Compression, Morgan Kaufmann Publishers, 1996
[16] P. G. Howard and J. S. Vitter, "Arithmetic Coding for Data Compression," Proceedings of the IEEE, 82(6), June 1994, 857-865
[17] I. H. Witten, A. Moffat, T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd Edition, Morgan Kaufmann Publishers, 1999
[18] "Coded Representation of Picture and Audio Information - Progressive Bi-level Image Compression," ITU-T Recommendation T.82, March 1993
[19] "Standardization of Group 3 Facsimile Apparatus for Document Transmission," CCITT, 1980
[20] "Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus," CCITT, 1984
[21] Jerry D. Gibson, Toby Berger, Tom Lookabaugh, Dave Lindbergh, Richard L. Baker, Digital Compression for Multimedia, Morgan Kaufmann Publishers, 1998
[22] "Application Profile for Recommendation T.82 - Progressive Bi-level Image Compression (JBIG Coding Scheme) for Facsimile Apparatus," ITU-T Recommendation T.85, August 1995
[23] Dennis Bodson, Kenneth R. McConnell, Richard Schaphorst, FAX: Digital Facsimile Technology and Applications, Second Edition, Artech House, 1992
[24] "Application Note 102 (Improving Performance in FLEX 10K Devices with LeonardoSpectrum Software)," Version 1.01, Altera Corporation, June 1999
[25] "FLEX 10KE Embedded Programmable Logic Devices Data Sheet," Version 2.2, Altera Corporation, January 2001
[26] "IP00C401™ (JBIG) High-Speed Lossless Image Compression/Expansion LSI," Sumitomo Metal Industries, Ltd., http://www.smi-lsi.com/ip00c401.html
[27] "IPB SG3™ Single-Chip Multi-Function Image," Advanced Hardware Architectures Inc., http://www.aha.com/technology/showproduct.asp?iId=33
[28] "PM-22 Multifunction Image Processor," Oak Technology, Inc., http://www.oaktech.com/products/imaging/processors/icodec/pm22.html
[29] P. G. Howard, F. Kossentini, B. Martins, S. Forchhammer, W. J. Rucklidge, F. Ono, "The Emerging JBIG2 Standard," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 5, September 1998
[30] Working Draft 14492, JBIG Committee, August 1998
[31] JPEG 2000 Part I Final Committee Draft, Version ISO/IEC FCD15444-1, March 2000

List of Acronyms

ABORT    Abort Marker
AT       Adaptive
ATMOVE   AT Pixel Movement Marker
BID      Bi-level Image Data
BIE      Bi-level Image Entity
BIH      Bi-level Image Header
CCITT    Consultative Committee on International Telephone and Telegraph
cdf      Cumulative Distribution Function
Chigh    High-Order 16 Bits of C Register
Clow     Low-Order 16 Bits of C Register
COMMENT  Comment Marker
CX       Context
dpi      Dots per Inch
ESC      Escape Byte
G3       Group 3
G4       Group 4
IEC      International Electrotechnical Commission
I/O      Input/Output
ISO      International Standards Organization
JBIG     Joint Bi-level Image Group
JPEG     Joint Photographic Expert Group
LNTP     Line Not Typically Predicted
LPS      Less Probable Symbol
MH       Modified Huffman
MHz      Megahertz
MMR      Modified Modified Read
MPS      More Probable Symbol
MR       Modified Read
NEWLEN   New Length Marker
PSCD     Protected Stripe Coded Data
ppm      Pages per Minute
RAM      Random Access Memory
renorm   Renormalization
RESERVE  Reserved Marker
ROM      Read Only Memory
SCD      Stripe Coded Data
SDE      Stripe Data Entity
SDNORM   Normal Stripe Termination Marker
SDRST    Reset Stripe Termination Marker
SLNTP    Switch in Line Not Typically Predicted
STUFF    Stuff Byte

Architecture of a High Speed JBIG Processing Engine. Vincent Chan. Master of Applied Science thesis, Department of Electrical and Computer Engineering, University of British Columbia, 2001. doi:10.14288/1.0065089. http://hdl.handle.net/2429/11968