UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

HDR image construction from multi-exposed stereo LDR images 2012

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata

Download

Media
ubc_2012_spring_sun_ning.pdf
ubc_2012_spring_sun_ning.pdf [ 2.47MB ]
ubc_2012_spring_sun_ning.pdf
Metadata
JSON: 1.0072542.json
JSON-LD: 1.0072542+ld.json
RDF/XML (Pretty): 1.0072542.xml
RDF/JSON: 1.0072542+rdf.json
Turtle: 1.0072542+rdf-turtle.txt
N-Triples: 1.0072542+rdf-ntriples.txt
Citation
1.0072542.ris

Full Text

HDR Image Construction from Multi-exposed Stereo LDR Images by Ning Sun A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) January 2012 © Ning Sun, 2012 ii Abstract The vast majority of cameras in the market nowadays can only capture a limited dynamic range of a scene. To generate high dynamic range (HDR) images, most existing methods use multiple images obtained from a single low dynamic range (LDR) camera at consecutive instances. These methods can obtain good quality HDR images for still or slow motion scenes but not for scenes with fast motion. In this thesis, we propose the use of two LDR cameras which have different exposures. To generate an HDR image, the two differently exposed LDR images of the same scene are used. The two LDR images should be captured at the same instance, so as to deal with scenes with fast motion. The most challenging step in this approach is to obtain accurate estimates of the disparity maps of the scenes. This will allow us to correctly align the pixels from the two differently exposed pictures when forming the HDR images. Very few state-of-the-art stereo matching algorithms can deal with the problem of obtaining accurate estimates of the disparity map from two differently exposed images. This is because the input LDR images that are used to construct HDR images have large radiometric changes. In addition, the two input LDR images usually have saturations in different areas. To obtain accurate disparity maps, we present a novel algorithm that obtains an initial estimate of the disparity map. Then a refinement step is used to minimize the edge effect and interpolates the values in the saturated regions. Compared to other state-of-the-art methods, our algorithm has a simpler set up with only two standard commercial LDR cameras. The offline processing of the LDR images has a simpler cost function, especially the cost function we use in the refinement step of the disparity map. This reduces the computational complexity and thus the processing time of the LDR images to form the HDR image. Moreover, the disparity map computed by our algorithm can tolerate greater radiometric changes and saturations. Therefore, the HDR images constructed by our algorithm are smoother and have fewer defects than those constructed by other methods. iii Preface A version of chapter 3 has been published. [Ning Sun], M. Hassan, and R. Ward. (2010) HDR Image Construction from Multi-exposed Stereo LDR Images. 2010 17th IEEE International Conference on Image Processing. 2793 – 2796. I proposed the algorithm and conducted the experiments. I also wrote the manuscript. M. Hassan and R. Ward reviewed the manuscript and R. Ward provided supervision for this project. iv Table of Contents Abstract .................................................................................................................................... ii Preface ..................................................................................................................................... iii Table of Contents ................................................................................................................... iv List of Tables .......................................................................................................................... vi List of Figures ........................................................................................................................ vii List of Abbreviations .............................................................................................................. x Acknowledgements ................................................................................................................ xi Chapter 1 Introduction ....................................................................................................................... 1 1.1 Background ....................................................................................................................... 1 1.2 Thesis Objective ................................................................................................................ 3 Chapter 2 HDR Image Construction .................................................................................................. 5 2.1 Introduction ....................................................................................................................... 5 2.2 HDR Image Construction from a Single LDR Image ....................................................... 5 2.2.1 Tone Mapping .............................................................................................................. 5 2.2.2 Inverse Tone Mapping.................................................................................................. 6 2.3 HDR Image Construction from Multiple LDR Images ..................................................... 7 2.3.1 Algorithms Using Single Camera ................................................................................. 7 2.3.2 Algorithms Using Multiple Cameras ......................................................................... 11 2.4 Stereo Matching .............................................................................................................. 12 2.4.1 Disparity Map ............................................................................................................. 12 2.4.2 Local Stereo Matching Algorithm .............................................................................. 15 2.4.3 Global Stereo Matching Algorithm ............................................................................ 16 2.4.4 Adaptive Normalized Cross Correlation Function ..................................................... 17 2.4.5 Multi-view Multi-exposure Stereo Matching ............................................................. 19 Chapter 3 Constructing HDR Image from Two LDR Images ......................................................... 22 3.1 Overview ......................................................................................................................... 22 3.2 Disparity Map Computation ............................................................................................ 29 3.2.1 Imaging Model ........................................................................................................... 30 3.2.1.1 Gamma Correction ............................................................................................ 31 v3.2.1.2 Polynomial Camera Response ........................................................................... 32 3.2.2 Computing the Disparity Map .................................................................................... 33 3.2.2.1 Pixel Dissimilarity ............................................................................................. 33 3.2.2.2 Pixel Smoothness ............................................................................................... 35 3.2.2.3 Initial Disparity and Camera Response ............................................................. 36 3.2.3 Refinement ................................................................................................................. 40 3.3 Image Synthesis ............................................................................................................... 42 Chapter 4 Experimental Results ....................................................................................................... 45 4.1 Disparity Map Accuracy ................................................................................................. 45 4.1.1 True Disparity Maps ................................................................................................... 45 4.1.2 Disparity Maps Obtained Using Our Algorithm ........................................................ 45 4.1.3 Disparity Maps Obtained Using ANCC and Multi-view Multi-exposure Algorithms  53 4.2 Dynamic Range ............................................................................................................... 60 Chapter 5 Conclusion ....................................................................................................................... 61 Bibliography .......................................................................................................................... 65 Appendices ............................................................................................................................. 68 vi List of Tables Table 4.1 The root mean square error and percentage of invalid pixels in the final disparity maps using our algorithm ....................................................................................................... 58 Table 4.2 The root mean square error and percentage of invalid pixels in the final disparity map using ANCC as the matching cost function [10] ............................................................ 58 Table 4.3 The root mean square error and percentage of invalid pixels in the final disparity maps using multi-view multi-exposure algorithm[7] ............................................................. 59 vii List of Figures Figure 2.1 Epipolar geometry for the case of using two cameras to capture stereo images ... 13 Figure 2.2 Relationship between the disparity in the disparity map and the distance of the object from the image plan of the cameras ............................................................................. 14 Figure 3.1 It shows the input LDR images of the scene Clothes. In the first image has scattered areas of pixels with very low brightness. These patches are of relatively small size. The second image has small patches of saturated pixels in the center and right. ................... 23 Figure 3.2 This figure shows the input LDR images for the scene Dolls. This pair of images show larger areas of saturated and unsaturated pixels than those in the scene Clothes. In addition, the saturation and unsaturation occurs at the boundaries of the objects in the scene. For example, in the second image, the head of the central doll merged with the right arm of the bear above her. .................................................................................................................. 24 Figure 3.3 This figure shows the input LDR images for the scene Arts. This scene is more difficult to process than Clothes and Dolls. The head of the statue in the second image is saturated. The top part of the pen is also saturated in the second image. The two saturated regions are merged, making it hard to assign correct values at the edges of these two regions in the disparity map. In addition, it has thin objects such as the sticks to the right of the scene. Some parts of the copper kettle to the left of the image are also saturated (nearly white color) in the second image while the corresponding pixels in the first image are brown. ................ 25 Figure 3.4 This figure shows the input LDR image for the scene Baby. The radiometric difference between the two images is very large. In addition, because the background has high texture, it imposes additional challenge to the matching process because some of the textures shown in the left image is occluded by the baby and book in the right image. ........ 26 Figure 3.5 Our proposed scheme for HDR construction ........................................................ 27 Figure 3.6 Initial disparity map obtained by our algorithm after running step 1: Clothes ..... 38 Figure 3.7 Initial disparity map obtained by our algorithm after running step 1: Dolls ......... 38 Figure 3.8 Initial disparity map obtained by our algorithm after running step 1: Arts ........... 39 Figure 3.9 Initial disparity map obtained by our algorithm after running step 1: Baby ......... 39 Figure 4.1  Reference disparity map obtained using input images of the same exposure: Clothes .................................................................................................................................... 47 viii Figure 4.2 The final disparity map obtained using our algorithm: Clothes. The values at the erroneous pixels in the initial disparity map are successfully obtained by the refinement step. This is because the saturated regions in the input images are small and scattered. ................ 47 Figure 4.3 Reference disparity map obtained using input images of the same exposure: Dolls ................................................................................................................................................. 48 Figure 4.4 The final disparity map obtained using our algorithm: Dolls. Most of the pixels have the correct disparity value except the head of the left doll at the center. This is because the area is over exposed in the input images, causing the interpolation process in the refinement step to take the wrong value. ................................................................................ 48 Figure 4.5 Reference disparity map obtained using input images of the same exposure: Arts ................................................................................................................................................. 49 Figure 4.6 The final disparity map obtained using our algorithm: Arts. The disparity map is generally accurate and smooth except at the edges of different objects and the pencils. This is because it is very difficult to completely remove the fattening effect caused by using a window based cost function in stereo matching. .................................................................... 49 Figure 4.7 Reference disparity map obtained using input images of the same exposure: Baby ................................................................................................................................................. 50 Figure 4.8 The final disparity map obtained using our algorithm: Baby. The disparity map is not as smooth as the previous three maps. This is because of the large areas of under- saturated and over-saturated regions in the input images. However, the refinement step has still assigned correct values close to the true disparity values of the erroneous pixels in the initial disparity map, Fig. 3.13. ............................................................................................... 50 Figure 4.9 The tone-mapped reconstructed HDR image of Clothes. The image shows the details in the folds which are under-saturated in one input image. The image also shows the textures of the central piece of the cloth which are over exposed in the other input image. .. 51 Figure 4.10  The tone-mapped reconstructed HDR image of Dolls. The picture shows the detail of both dark and bright cloths the dolls are wearing ..................................................... 51 Figure 4.11 The tone-mapped reconstructed HDR image of Arts. The statue at the center and the pot on the left of the image are not saturated. It also displays the details in the dark region of the paint in the background wall. ........................................................................................ 52 ix Figure 4.12 The tone-mapped reconstructed HDR image of Baby. The picture displays both the words and equations in the book and the details of the maps on the wall. ....................... 52 Figure 4.13  The disparity maps of Clothes obtained using ANCC [10] ................................ 54 Figure 4.14  The disparity maps of Dolls obtained using ANCC [10] ................................... 54 Figure 4.15  The disparity maps of Arts obtained using ANCC [10] ..................................... 55 Figure 4.16  The disparity maps of Baby obtained using ANCC [10] ................................... 55 Figure 4.17 The disparity maps of Clothes obtained using multi-view multi-exposure algorithm [7] ........................................................................................................................... 56 Figure 4.18 The disparity maps of Dolls obtained using multi-view multi-exposure algorithm [7] ............................................................................................................................................ 56 Figure 4.19 The disparity maps of Arts obtained using multi-view multi-exposure algorithm [7] ............................................................................................................................................ 57 Figure 4.20 The disparity maps of Baby obtained using multi-view multi-exposure algorithm [7] ............................................................................................................................................ 57 xList of Abbreviations ANCC: Adaptive normalized cross correlation CCD: Charge coupled device CMOS: Complementary metal–oxide–semiconductor HDR: High dynamic range LoG: Laplacian of Gaussian LDR: Low dynamic range NCC: Normalized cross correlation. SAD: Sum of absolute difference xi Acknowledgements I owe my gratitude to my supervisor, Dr. Rabab Ward, who has always provided me good advice and academic guidance where needed. She was also generous on making herself available over the weekends because I started to work full time after attending three semesters at UBC. I thank Mansour Hassan for sharing his knowledge and ideas. The help and encouragement I received from him helped me to overcome the difficulties I encountered during my research. I also thank NSERC for funding my research in my first year and the UBC scholarship board for funding my research in my second year. Tanaya, Di, Anshual, Qiang, Jack, Xinyi and all the people from the Image Processing Lab. I wish to say thank you very much for the peer support and good company during my Master study. The friendly and positive environment kept me motivated in the past few years. Finally, I would like to thank my family for their support during the hard times of my thesis completion period and for their investment in my education.              Ning Sun The University of British Columnbia April 2011 1Chapter 1 Introduction 1.1 Background We now live in a world with overwhelming information. One of the most common medium we use to communicate and store the information in is that of images. It is very important that the images can accurately represent the details in the original scene in order to preserve all the information and avoid miscommunication. The first digital camera, a camera that uses a charged coupled device (CCD) imager and digitizes the captured scene and store the digital info on a standard cassette, was invented in 1975. Since then, the imaging technology has leaped into a new era by significantly extending the dynamic range in capturing real world luminance. However, the majority of cameras in the market nowadays are still incapable of capturing the entire dynamic range of a scene. For a scene, the dynamic range refers to the ratio between the brightest and darkest parts of the scene. The dynamic range of real-world scene can be quite high. Ratios of 100,000:1 is quite common in the natural world. Human visual system can process a simultaneous dynamic range of 50,000:1 or more and can adapt to a much larger range. The images captured by the cameras in the market now can only have dynamic ranges between 300:1 to 1,000:1.  Therefore, these images are considered as low dynamic range (LDR) images. High dynamic range (HDR) imaging provides the capacity to represent the wider dynamic range of the human visual system (HVS) in digital form. It assigns pixels with floating point values. The use of floating point values gives HDR images several advantages over LDR images. In an LDR image, areas that are too dark are clipped to black and areas that are too bright are clipped to white. In an HDR image, pixel values are normalized between 0.0 and 1.0 with 0.0 representing black and 1.0 representing white. Dark areas and bright areas in HDR images are assigned different values close to 0.0 and 1.0. Therefore, HDR images can preserve details in a scene with large dynamic range. The use of the floating point also gives HDR images perceptual cues which increase the apparent brightness. Another advantage of 2HDR images is that they preserve optical phenomena such as reflections, refractions and transparent materials such as glass. In LDR images, the pixels representing all the bright light sources in a scene such as the sun are assigned to have the maximum possible integer value. However, the reflected light should have value less than the light source. In HDR images, the reflected light is assigned with values close but less than 1.0, while the bright source can assume values that equal to 1.0. This allows reflections off surfaces to maintain realistic brightness for bright light source. Therefore, HDR images are able to better represent scenes perceived by human eyes. In the past, there has been a strong effort to develop high-dynamic-range (HDR) display and camera hardware, as well as the supporting processing algorithms. The idea of using several differently exposed images to capture details in the extreme range of luminance was pioneered by Gustave Le Gray in 1850s. He tried to render images showing both the sky and the sea. The use of high dynamic range imaging (HDRI) in computer graphics was first introduced by Greg Ward in 1985 with his open-source Radiance rendering and lighting simulation software which created the first file format to retain a high-dynamic-range image. However, due to limitations imposed by the computer processing power and displaying technologies in the past, HDR images started to gain wider usage only in the past decade. In 2005, Photoshop CS2 introduced the “Merge to HDR” function which combines LDR images of a still scene taken under different exposures at consecutive time instances to form an HDR image. This year, iphone 4 also introduced HDR photography functionality in iOS version 4.1. Besides a wider usage of HDR images, there is also an increasing demand for HDR videos Modern movies are often filmed with cameras featuring a higher dynamic range, and legacy movies can be upgraded even if manual intervention would be required for some frames. In addition, special effects, especially those in which real and synthetic footage are seamlessly mixed, require both HDR shooting and rendering. HDR video is also required in all applications in which capturing temporal aspects of changes in the scene is required with high accuracy. This is important in particular in monitoring of some industrial processes such as welding, predictive driver assistance systems in automotive industry, surveillance systems, 3to name just a few possible applications. HDR video can also speed up the image acquisition in all applications where a large number of static HDR images are required, as for example in image-based techniques in computer graphics. Finally, with the spread of TV sets featuring enhanced dynamic range, broadcasting HDR video will be important, but may take a long time to actually occur due to standardization issues. For this particular application, enhancing current LDR video signal and transforming them to HDR videos by intelligent TV sets seems to be a more viable near-term solution. Even though there are a number of products in the market labeled with HDR capabilities, the dynamic range, the accuracy and the tolerance to the movement in the scene of the output images and videos are far from satisfactory. The major difficulty is in developing the hardware for HDR capturing devices that can be widely available in the market. It is very expensive to manufacture sensors that can register contrasts beyond low dynamic range. Few studios have so far managed to develop HDR cameras. However, their solutions are expensive and require a long time to capture the full dynamic range. A few companies such as RED and Arri have been developing digital sensors capable of a higher dynamic range, but have yet to be released or made affordable. Therefore, there is a need for low cost solutions that can synthesize HDR content using only LDR capturing devices before the hardware obstacle can be overcome. 1.2 Thesis Objective As mentioned in the previous chapter, the HDR cameras with a single sensor may not be available in the market soon due to the limitation of the hardware. Efforts have been put to derive algorithms that can generate HDR images from LDR images captured by LDR camera. Most of the existing methods perform well for still scenes but fail for scenes with motion. Videos however have become a major source for communicating and storing information nowadays. There is an increasing demand for algorithms to synthesis HDR videos with reasonable computational complexity. In this thesis, we aim to develop a method that generates HDR images from LDR ones and that are better than existing methods in three aspects: 4x The algorithm should require a minimum number, namely two, LDR cameras so that the equipment is portable. x The algorithm should not be computationally intensive. This is one of the key factors for a technology to be available in the market x The algorithm should be able to construct more accurate images for different cases ranging from stationary scenes to scenes with fast motion. The algorithm can be applied easily to generate HDR videos by combine corresponding frames of the input LDR videos. Chapter 2 reviews the existing algorithms in constructing HDR images from LDR images. It also reviews the state-to-art technologies in stereo matching. This is because the most challenging and critical part in the algorithm presented by this thesis lies in finding accurate special shifts between corresponding pixels in the input LDR images. In particular, two algorithms, Adaptive Normalized Cross Correlation (ANCC) [9] and Multi-view Multi- exposure algorithm [7], are covered in details. They are the most relevant algorithms which inspired me in developing the algorithm presented by this thesis. Chapter 3 gives the explanation of our algorithm. It puts special emphasis on the image model and the cost functions used by our algorithm to find accurate spatial displacement between corresponding pixels in the input LDR images. Chapter 4 compares the experimental results obtained using our algorithm. It also provides the statistics of the HDR images constructed as compared with the HDR images generated using the two state-of-the-art algorithms that have inspired me. The conclusion and proposals for future work are discussed in Chapter 5. The appendix describes a filter used in our algorithm: the Census filter. 5Chapter 2 HDR Image Construction 2.1 Introduction In recent years, several approaches have been developed to produce scenes with expanded dynamic ranges using LDR images. These approaches can be categorized into three types depending on the number of cameras and the number of LDR images used to construct the HDR images: x Single camera with single LDR image. These approaches are most useful to recover the details in the old low quality images with valuable information x Single camera with multiple LDR images taken under different exposures at consecutive time instances. x Multiple cameras with multiple LDR images taken under different exposures at the same time instance. 2.2 HDR Image Construction from a Single LDR Image For still scenes, one approach is to use a single LDR image and computes the inverse tone mapping curve of the camera used to capture this image. A tone mapping function converts a high dynamic range to a low dynamic range. Various tone mapping curves that compress the pixel values in images with high dynamic ranges to images with low dynamic ranges have been developed. By finding the inverse of these tone mapping curves, the pixel values in an LDR image are converted back to the high dynamic range [1]. 2.2.1 Tone Mapping Tone mapping is a process that converts a set of high dynamic range signals to a set of lower dynamic range signals. It addresses the problem of strong contrast reduction from the scene values to the displayable range while preserving the image details and color appearance. This is important as, in general, viewers appreciate viewing the original scene content. Tone mapping is widely used in image processing and computer graphics to display image signals 6on various displays with limited dynamic ranges such as print-outs, CRT or LCD monitors and projectors. Tone mapping are achieved by applying tone mapping operators on the high dynamic range signals. Various tone mapping operators have been designed to reproduce visibility and the overall impression of brightness, contrast and color of the real world onto a limited dynamic range display. Tone mapping operators can be classified into two categories: global and local. Global tone mapping operators are not computationally intensive. An example of a global tone mapping operator is the exponential function proposed in [REFERECE 21 in FRAMEWO]:      Z Z / S / G/ S H    where  G/ S  is the compressed luminance value at pixel S .  Z/ S  is the world luminance at pixel S . Z/  represents the average luminance of the scene. Although global tone mapping operators have the advantage of low computational cost, they do not cope well with huge contract ratios. Local tone mapping operators have mapping functions that vary spatially depending on the neighbourhoods of pixels. Such operators concentrate on preserving contrast between neighbouring regions rather than absolute value. They are motivated by the fact that the human perception is most sensitive to contrast in images rather than absolute intensities. These operators are generally capable of compressing greater dynamic range. They usually also produce very sharp images, which preserve small contrast details very well. An example of a local tone mapping operator is the Gradient domain high dynamic range compression operator [18]. However, local operators often produce artefacts around the edges with high contrast. 2.2.2 Inverse Tone Mapping One fairly straight way to recover the HDR image from the tone mapped LDR image is to find the inverse tone mapping operators. A framework for such an approach is laid out in [1]. The framework consists of four steps: 7x Generate an initial HDR image by applying the calculated inverse tone mapping operator. The inverse tone mapping operators of many monotonically increasing tone mapping operators can be found easily. x Identify the areas of high luminance using methods associated with importance sampling of light sources such as median cut. x Create a map from density estimation of the area identified in step 2. This is to levitate the visual artefacts in the final HDR image due to the saturation in the bright areas in the LDR image. x The final HDR image is composed through weighted linear interpolation of the input LDR image with the inverse tone mapping operator found in step one. The weight is calculated based on the map found in step 3. Although this approach tries to solve the problem of the lost details in the saturated or coarsely quantized regions of the LDR image, it still cannot recover all the lost information in the areas. As a result, the constructed HDR images have errors and blurred regions, which are unacceptable for commercial use. However, these algorithms are very useful when upgrading old images and videos. 2.3 HDR Image Construction from Multiple LDR Images 2.3.1 Algorithms Using Single Camera A better approach than that of using inverse tone mapping operators to construct HDR images is to use a single camera to capture multiple LDR images but with different exposures and at consecutive time instances. Each LDR image registers one part of the entire dynamic range of the scene. For example, one LDR image can have very low exposure to capture the details in the bright regions. Another LDR image can have very high exposure to capture the dark areas. A final HDR image that spans the entire dynamic range can be generated by combining these LDR images [5]. Such an approach avoids the loss of information due to saturation. This problem arises when inverse tone mapping operators are used to construct HDR images. Such approaches generally consist of the following four steps: 8x Extract the luminance component of the input LDR images. This is because the change in the exposure only affects the luminance channel x Find the camera response function from the input LDR images. The camera response function is a radiometric model which represents the mapping of the radiance (falling on the camera sensor) to the luminance pixel values in the LDR images. x Obtain the inverse camera response function. Use the inverse camera response function to obtain the radiance maps of the LDR images. Combine the radiance maps to form the HDR image radiance map. x Construct the final HDR image by integrating the HDR image radiance map with the chrominance components in the input LDR images. The resultant integration forms the HDR image. The key step in such approaches is to find an accurate camera response function so that the pixel values in the LDR images can be correctly mapped to the radiance falling on the camera sensor. When the image of the scene is captured, the camera processes the amount of light (radiance) fallen on the sensor in two main steps, to convert the radiance to the luminance pixel values in the output image. In the first step, the camera compresses the dynamic range of the scene to the dynamic range that can be handled by the display system. In the second step, the compressed values are quantized to integer values. The integer values form the output as the pixel values in the image. This whole process can be represented by a non-linear function, called the camera response function. It is a function that represents how the radiance arriving on an image film/CCD, after passing through the lens, is transformed to actual pixel "brightness" values. The most popular function used by cameras for compressing the dynamic range of the scene is gamma correction function. Therefore, most algorithms use a gamma correction to model the camera response function. Various algorithms have been developed in the past to find an accurate camera response function. Most algorithms start with modeling the camera response function as a gamma correction function because most of the cameras use such a function to compress the dynamic range of the scene as explained above. Then they adopt the model suggested by Mitsunaga in 9[5]. Mitsunaga models the camera response function as a high order polynomial because it is proved that the gamma correction function can be approximated by a high order polynomial with satisfactory accuracy [5].  We have adopted this model in our algorithm presented in this thesis and will explain it in details in the next chapter. The algorithms that use a single LDR camera to capture multiple images at consecutive time instances and then construct HDR images work will for still scenes. Some of the algorithms have been implemented for commercial use, such as the HDR function in Photoshop. Many photographers also use this approach to synthesis HDR images. Such approaches however result in defects when there is motion in the scene. This is because the positions of corresponding pixels in different LDR images shift their locations in the area of an image having motion. If the HDR images are constructed by directly summing the pixels with the same coordinated in the left and right LDR images, errors appear at the areas with motion. Such a requirement limits the use of these algorithms because most of the scenes in the world have motion. It also restricts the use of these algorithms in constructing HDR videos from LDR ones. The static scene (i.e. that with no motion) requirement can be removed by designing special capturing systems. Beam splitters are used to generate multiple copies of the optical image detector whose exposure is pre-set by using an optical attenuator or by changing the exposure time of the detector. This approach is able to produce high dynamic range images in real time. Since the LDR images captured by such method are taken at the same instances, the scene objects are free to move during the capture process. However, such a set up is expensive. This is because it requires multiple image detectors, precision optics for the alignment of all the acquired images. It also requires additional hardware to process the LDR images. Another approach is to use a different CCD design [19]. In this approach, each detector cell includes two sensing elements (potential wells) of different sizes (and hence sensitivities). When the detector is exposed to the scene, two measurements are made within each cell. The camera also has a special chip with an algorithm that processes the signals and outputs the final HDR image directly. However, this technique is expensive as it requires a sophisticated 10 detector to be fabricated. In addition, the spatial resolution is reduced by a factor of two since the two potential wells take up the same space as two pixels in a conventional image detector. Moreover, the technique is forced to process the outputs of the two wells on the chip. This forces the algorithm to be simple because the chip power is limited. Therefore, it limits the accuracy of the results. A novel idea has been proposed to set up sensors that have spatially varying pixel exposures [2]. In this configuration, pixels in the image are grouped into a square of four.  Each of the four pixels has different exposures. This pattern repeats over the entire images. The key feature of this set up is that it simultaneously samples the spatial dimensions as well as the exposure dimensions of image radiance. If one pixel in the image is saturated, its surrounding pixels (which would be captured at different exposures) are unlikely to be saturated. When a pixel produces zero brightness (under-saturation because the area is dark and the exposure time is short), it is likely to have a neighborhood pixel that produces non-zero brightness. Since images are expected to be smooth, except at the edges, the unsaturated surrounding pixels can be used to interpolate the central saturated pixel. The camera response function is calculated from the valid pixel values. The pixel values with zero brightness and maximum brightness are recovered by interpolating its neighboring unsaturated pixels. The advantage of this set up is that it does not require the scene to be still nor it requires any on chip processing of the signals before outputting the image. The spatially varying exposure can be achieved by directly etching the pattern on the CCDs. It can also be achieved by lower cost solutions such as placing a mask outside the imaging lens next to the detector plane. A primary lens is used to focus the scene onto the mask plane. The light rays that emerge from the mask are received by the imaging lens and focused onto the detector plane. A diffuser may be used to remove the directionality of rays arriving at the mask. Then the imaging lens is focused at the diffuser plane. However, this setup would increase the cost of the LDR cameras as manufacturers have to redesign their production process to include the mask in the camera. In addition, it also reduces the effective resolution of the resulting HDR because some of the pixels with high exposure are expected to be saturated and some of the pixels with very low exposure are expected to produce low and 11 noisy intensities [2]. Due to the high cost and low effective resolution, this solution is also not suitable for manufacturing commercial cameras. Recently, another algorithm that uses a single LDR camera to capture multiple LDR images that are differently exposed has been proposed [3]. This algorithm use commercial LDR camera to capture images at consecutive instances and process the LDR images offline to generate HDR image. This algorithm distinguishes itself from other algorithm in following two aspects: x The capturing system is programmed to automatically vary the exposures under which consecutive LDR images are taken. x The offline algorithm calculates the motion vectors between consecutive images. The HDR image is formed by combining different LDR images after spatial adjustment [4]–[6] using the motion vectors. Since the algorithm calculates the motion vectors, it is not limited to still scenes and can be used to generate HDR videos. However, the post processing algorithm is computationally expensive. In addition, the resulting pictures have significant artifacts in those scenes with high motion because computing accurate motion vectors for scenes with fast motions is not easy [3]. 2.3.2 Algorithms Using Multiple Cameras Because constructing the fast motion scene in HDR images is difficult to achieve using a single camera, the use of multiple cameras to obtain multi-view, multi-exposed LDR images have been proposed [7]. Our algorithm also falls into this category. The two major advantages of such an approach are: x They use standard commercial LDR cameras. They do not require any special designs of CCDs. Therefore, the set up cost using such an approach is low and thus possible for commercial uses. x This approach takes LDR images at the same instance. No temporal adjustment (calculation of motion vectors) is required to find the corresponding pixels in the 12 different LDR images. Therefore, they are capable of construct HDR images for scenes with high motion. However, one challenge faced by such an approach is to find the spatial adjustment for corresponding pixels in the LDR images. This is because the LDR cameras are placed adjacent to each other. The pixels that register the same area in a scene appear at slightly different locations in the LDR images. It is important to correctly find the shifts for each pixel in the LDR images for the following two purposes: x To accurately calculate the camera response function. The camera response function is calculated by exploiting the difference in the values of corresponding pixels under different exposures. x To accurately construct the HDR images. The pixel values of the HDR images are usually calculated as a weighted sum of the corresponding pixels in the input LDR images. 2.4 Stereo Matching Stereo matching is a field in computer vision that has matured over the last few decades. It is mainly used to estimate the distance between the camera and the objects in the scene. Such distance is called the depth of the objects in the scene. In order to find the depth of the objects, more than one camera are aligned and placed next to each other. Images of the same scene are captured by these cameras. Since images are taken at slightly different locations, the positions of pixels representing the same objects shift their locations in these images. The pixels representing objects nearer to the camera have larger shifts than those pixels representing objects further away from the camera. 2.4.1 Disparity Map A convenient way of representing the depth of the objects is to plot an image, named disparity map. The horizontal (vertical) disparity is defined as a difference between the column (row) coordinates of pixel locations of corresponding pixels in the left and right 13 image of a stereo pair. In stereo matching, images are usually rectified first before calculating the disparity map so that the search for the corresponding pixels can be performed in one direction (either horizontal or vertical) only. This is because the cameras used to capture the stereo images are usually very hard to be aligned to have the exactly same image plane. Assume the cameras used to capture the stereo images can be approximated by the pin-hole camera model with positions (exaggerated) shown in Figure 2.1. /2 and 52  represent the focal points of left and right cameras. /3  and 53  represent the image plane of the left and right cameras. / /3 (  and 5 53 ( are called the epipolar lines of the left and right images. If the two lines are not in the same plane, the displacement of pixel register the point 3 in the right image relative to the left images is in both horizontal and vertical direction. This complicates the matching process because the algorithm has to search in both directions for each pixel in the left image to find the corresponding ones in the right images. In order to simplify the matching process, the image coordinates from the cameras can be transformed to emulate having a common image plane so that their epipolar lines coincide with each other. Namely,  / / 5 53 ( 3 ( . This process is called image rectification. As a result, for each point in one image, its corresponding point in the other image can be found by looking only along a horizontal line. Figure 2.1 Epipolar geometry for the case of using two cameras to capture stereo images In a disparity map, pixels representing the objects nearer to the camera are assigned greater values, and pixels representing the objects further away from the camera are assigned with 14 smaller values. Equation (1) shows such a relationship between the disparity value and the depth of the object.  The depth is inversely proportional to disparity and is obtained using the property of similarity triangles shown in Figure 2.2.  EI' G O  ሺͳሻ where  SL[HOVPHWHUO   ሺʹሻ ' represents the radial distance, which is the perpendicular distance between the object and the baseline of the two cameras. I represents the focal length of the camera. E is the distance between the two cameras along the baseline. G represents the displacement in terms of the number of pixels of the corresponding pixel between the two images. Figure 2.2 Relationship between the disparity in the disparity map and the distance of the object from the image plan of the cameras 15 As mentioned in the previous section, if we use multiple cameras placed next to one another to capture multiple LDR images under different exposures and at the same time instance to construct HDR images, one of the key step is to find an accurate disparity map for the input LDR images. All the stereo correspondence algorithms have a way of measuring the similarity of image locations in order to construct the disparity map. The function used to measure this similarity is called the matching cost. For each pixel in the image, the matching cost is for all the possible disparities. The disparity value that produces the smallest matching cost is chosen to be the value of this pixel in the disparity map. Depending on how this matching cost function is defined, stereo matching algorithms can be categorized into two categories: local and global. 2.4.2 Local Stereo Matching Algorithm Local stereo algorithms are window based. They usually aggregate the sum of a certain cost measure over a window and use it as the cost function. Simple cost functions are the sum of absolute differences (SAD), sum of squared differences, normalized cross correlation (NCC) or sampling-insensitive absolute differences [9]. More complicated cost functions involve applying filters such as Laplacian of Gaussian (LoG) filter, Rank filter and Census filter (explained in the Appendix) to the images first and then calculating the absolute differences (for the LoG filter) or Hamming distance (for Rank and Census filters). The latter functions are more robust to noise [9]. This is because of their ability to tolerate small differences between corresponding pixel values in different input images. The matching decision for a pixel (when using local stereo methods) is made solely based on its surrounding local pixels within the window the method uses. The main problem with such methods is how to determine the optimal size, shape of the window and the weight distribution of the pixels within the window relative to the central pixel. These restrictions cause the local stereo methods to form disparity maps with more errors. In addition, since the local stereo methods lack global information about the image, the disparity maps they generate tend to have less smoothness and less well defined edges. 16 2.4.3 Global Stereo Matching Algorithm Global stereo methods, in most cases, rely on Bayesian stereo matching. The relationships among all the pixels in the two input images are encoded into a global objective function which usually consists of a cost term and a smoothness term. Therefore, the global objective function avoids the optimization problems faced by local stereo matching methods, related to the size and shape of the matching window. Most global stereo methods share the following two common assumptions: x The stereo images are rectified (explained in the previous chapter) so that the disparity between two pixels in two images is in the horizontal direction. x The disparity map between two rectified images can be modeled as a Markov Random field (MRF). MRF is very powerful at modeling spatial relationships. The global objective function usually consists of an observation term and a smoothness term. The observation term usually represents the data in the images, while the smoothness term ensures the continuity in the disparity map. The disparity map is calculated by optimizing this global objective function usually via belief propagation or graph cuts [8]. Most local and global algorithms share the common assumption that corresponding pixels in the stereo images have similar values, i.e. they have small radiometric changes between corresponding pixels. Thus, these methods perform well on images of the same illumination and exposure, which is not the case in our present application. Global stereo matching methods are more robust to noise introduced by a camera during the image capture process. However, when the radiometric variations between the input images are large, very few algorithms (such as the adaptive normalized cross-correlation (ANCC) algorithm proposed in [10]) can successfully estimate accurate disparity maps [9].  Even fewer algorithms can tolerate the presence of saturated regions. ANCC has been shown to fail in saturated image areas [10]. As mentioned earlier, the LDR images in our setup may have large radiometric changes and saturated regions. 17 2.4.4 Adaptive Normalized Cross Correlation Function Recently, the adaptive normalized cross-correlation (ANCC) function was proposed as the matching cost function to find the disparity maps of stereo images captured under illumination and camera variations [10]. It uses a modified version of the standard matching cost function NCC and performs matching in the Log space to enable the algorithm to find disparity maps for the stereo pair which has the large radiometric changes between the corresponding pixels. Before deriving ANCC function, the paper [10], as most of other algorithms, models the camera response function as a gamma correction as discussed in the previous chapter:  ( ) ( ) ( )JU N c cC p p R p  ሺ͵ሻ where ( )C p  is the value at pixel p  in channel c . This value measures the amount of light ( )cR p , namely radiance, falling on the sensor of the camera. ( )pU  is a function that depends on the angle between the light direction and the surface normal at the point registered by pixel p. The motivation of the paper [5] is to propose a cost function that is independent of factors ( )pU  and k . This is achieved by transforming the pixel value in each channel through a series of steps we are going to discussed shortly. As a result, the algorithm can tolerate large radiometric changes in the input images. This is achieved in the following steps. First, the algorithm transforms the pixel values in R, G and B channels to the Log space. This is for the use of the invariant property of NCC to the affine transformation of pixel values in the images [19] [20]. Assume /N , 5N , /Y  and 5Y  are constants and /,  and 5, are the pixels values corresponding to each other in the left and right images , the invariant property of NCC enables us to use NCC as the cost function to match pixels in the left and right stereo images if the pixels values in images undergo the following transformation:  / / / /, , YNo   18  5 5 5 5, , YNo   Next, choromaticity normalization is used to eliminate the effect of light direction, namely the factor ( )pU . This is achieved by subtracting the average value of a pixel over three color channels from its actual value in each of the color channel. Assume ( )rR p , ( )gR p  and ( )bR p  are the values of the pixel p  in each of the channel R, G and B. rk , gk  and bk  are the factors defined in equation (3) in each of the channel R, G and B. The effect of factor ( )pU  can be eliminated by applying the following equation to the pixel values in each of the R, G and B channel.  '' ' '( ) ( ) ( ) C p C p C p  where ' ( )C p  is the value of pixel S in the color channel F  in the Log space. It is given by   ORJ ORJ ORJF F& S S 5 SU N J    and ' ( )C p  is given by           ORJORJORJ   5 * % 5 * %5 * % 5 3 5 S 5 S& S 5 S 5 S 5 SS JN N NU       After rearranging the log terms, the pixel values in the left image after transformation can be expressed as          ORJ ORJ 5 * % 5 / / 5 * % / / / / / 5 S& S 5 S 5 S 5 S S N JN N N D J    ƒ  Similarly, the pixel values in the right image after the transformation can be expressed as:    S 5 5 S5& S I S ID J  ƒ  Since the corresponding pixels in the left and right images register the same point in the scene,   / Sƒ  is expected to be the same as  5 SS Iƒ  . Using the invariant property of 19 NCC, we can use NCC as the cost function to match for   /& S  and    S5& S I  as well. This means that after the above transformations of the pixels values, the light direction and radiometric changes do not affect the matching result if NCC is used as the cost function. This modified version of NCC (after transforming the pixels values using above mentioned steps), named ANCC, is able to find disparity maps for the images with large radiometric changes. However, there is no interpolation process designed in ANCC function. It purely relies on the presence of textures to find the corresponding pixels in the input stereo images. As a result, ANCC cannot handle the stereo image with saturated areas [10] because the texture information is lost in the saturated areas and thus the matching process fails. 2.4.5 Multi-view Multi-exposure Stereo Matching Another approach that computes the disparity map from multiple images is proposed in [7]. The algorithm can handle large radiometric changes between the input stereo images as well as the presence of saturated regions in the stereo images. The algorithm can be summarized in following steps: x Compute the initial disparity maps from the unsaturated pixels in the multiple (more than two) input LDR images captured by multiple LDR cameras with the same camera response function. x To compensate for the difference in exposure among the input LDR images, compute the common camera inverse response function from the initial disparity maps. Then convert each input LDR images to its radiance map using this inverse response function. x Refine the initial disparity map by running step 1 again, using the radiance maps obtained in step 2. In order to improve the accuracy of the disparity map, the study in [7] focused on deriving a method that can estimate the camera inverse response function from pixels aligned incorrectly using inaccurate disparity maps. 20 The algorithm defines the camera response function as:      , S I H5 S  where I is a non-linear function that maps the scene radiance 5  at pixel S to the intensity value , when the image is captured under exposure H . It uses the Brightness Transfer Function (BTF) [21] to solve this camera response function. For a pair of stereo images, the BTF is defined as:   L M L M7 , ,  where L,  and M,  are the corresponding pixel brightness in the left and right images. The BTF is usually semi-monotonic. Depending on the exposure ratio L MH  between the two input images, following constrains are applied:        L M L M L M L M L M L M 7 , , H 7 , , H ­ d d°® t t°̄  To compute the BTF between the input images, the algorithm groups all the input images into pairs. For each pair of input images, it plots a two-variable joint histogram - , in which a give entry  L M- , ,  stores the corresponding pixel values in the two input images. The joint histogram is then partitioned into two triangles along the line L M, , . To maintain the semi-monotonicity of the BTF, the following conditions are applied:             L M L M L M L M XSSHUWULDQJH ORZHUWULDQJH L M L M L M L M XSSHUWULDQJH ORZHUWULDQJH 7 7 , , - , , - , , 7 , , - , , - , ,  ­ d t°® t d°̄ ¦ ¦ ¦ ¦  The algorithm then uses the dynamic programming technique proposed by Kim and Pollefeys [22] to estimate the BTF that satisfies the above conditions. Once the two BTFs for each pair of the input LDR images,  L M L M7 , ,  and  M L M L7 , , , are calculated, the algorithm 21 uses the EMOR model of Grossberg and Nayar [23] to find the inverse camera response function. When calculating the inverse camera response function, the above algorithm is more robust to large changes in the brightness of the corresponding pixels in the input images. As a result of a more accurate inverse camera response function, the radiance maps recovered in step 2 are expected to have fewer errors than the radiance maps recovered using other techniques. This in turn improves the accuracy of the disparity maps obtained in the refinement step by re-running the matching process used to obtain the initial disparity maps. However, this approach involves a complex capturing system because it uses more than two cameras. This increases the amount of work to align the cameras and to rectify the captured images. In addition, the input images are grouped into pairs and processed individually. This also increases the amount of the computations and thus limits its use in the industry. 22 Chapter 3 Constructing HDR Image from Two LDR Images 3.1 Overview In this chapter we present and describe a method that obtains an HDR image from two LDR images. In [7], a method that uses many (more than two) images obtained from different cameras is proposed. The images are all captured with different exposures and at the same instance of time. Our approach is inspired by the work in [7] where we propose the use of two cameras. Two images are captured using different exposures and at the same instance. The four sets of input LDR images we used to test our algorithm are shown in Fig. 3.1 to Fig. 3.4. The ones with the lower exposure register the information in the dark regions of the scene. The ones with higher exposure register the information in the bright regions of the scene. Each set of two images therefore complement each other in that they contain more information for forming a HDR image with a higher dynamic range. This is because one image has more details about the dark regions of the captured scene and the other contains more information about the bright regions. Thus more information is available than that obtained from one image. Using two cameras allows us to overcome the temporal adjustment in the fast motion scenes as the two LDR images are captured at the same instance of time .The use of more than one camera however has the problem of image spatial disparity. Thus the key step in such approach is the accurate computation of the disparity map, so that the pixels from the different LDR images can be aligned correctly to form the HDR image. This is a challenging task because the input LDR images have large radiometric changes as well as saturated regions. Not many studies that deal with such cases have been proposed. The four sets of the input LDR images we chose to test the performance of our algorithm impose different degrees of challenges for calculating the disparity map. Our algorithm is inspired by [7] but faces a greater challenge in computing accurate disparity maps. In [7], many (more than two) input LDR images are used to construct the HDR image. 23 If one pixel is saturated in one picture, it is unlikely that its corresponding pixels in the rest of the images are saturated. Therefore, the saturated pixels can be ignored. Instead, the pixels values in the rest of images are used calculate the disparity map and construct the HDR image. Figure 3.1 It shows the input LDR images of the scene Clothes. In the first image has scattered areas of pixels with very low brightness. These patches are of relatively small size. The second image has small patches of saturated pixels in the center and right. 24 Figure 3.2 This figure shows the input LDR images for the scene Dolls. This pair of images show larger areas of saturated and unsaturated pixels than those in the scene Clothes. In addition, the saturation and unsaturation occurs at the boundaries of the objects in the scene. For example, in the second image, the head of the central doll merged with the right arm of the bear above her. 25 Figure 3.3 This figure shows the input LDR images for the scene Arts. This scene is more difficult to process than Clothes and Dolls. The head of the statue in the second image is saturated. The top part of the pen is also saturated in the second image. The two saturated regions are merged, making it hard to assign correct values at the edges of these two regions in the disparity map. In addition, it has thin objects such as the sticks to the right of the scene. Some parts of the copper kettle to the left of the image are also saturated (nearly white color) in the second image while the corresponding pixels in the first image are brown. 26 Figure 3.4 This figure shows the input LDR image for the scene Baby. The radiometric difference between the two images is very large. In addition, because the background has high texture, it imposes additional challenge to the matching process because some of the textures shown in the left image is occluded by the baby and book in the right image. 27 In our set up, we only have two input LDR images. From these images the disparity map is calculated and then used to construct the HDR image. The saturated pixels in one image cannot be ignored when calculating the disparity map and constructing the final HDR image. Therefore, for the saturated regions in one image, we have to derive a method to get the correct disparity value using the unsaturated information from the other image and thus assign correct values to those pixels in the constructed HDR image. In order to increase the accuracy of the disparity map and recover the lost information in the saturated regions, in [7] the focus has been to improve the accuracy of the camera response function. Our algorithm however focuses on improving the initial estimate of the disparity map as well as the refinement of the initial estimate of the disparity map. Figure 3.5 illustrates a block diagram of our proposed scheme. Our algorithm differs from the algorithm in [7] in the several aspects. Figure 3.5 Our proposed scheme for HDR construction First, our method uses only the luminance values of the input LDR images to calculate the initial disparity map. Luminance is an indicator of how bright the surface appears. Usually the value of a pixel in an image is formed by luminance and chrominance information. In [7], the disparity map is calculated using both the luminance and chrominance information. Second, our method finds an initial estimate of the disparity map using the normalized cross correlation (NCC) function as the matching cost. We use a different weighting function use in [7] to calculate NCC for each pair of pixels to be matched. The weighting function improves the accuracy of the initial disparity maps. 28 Third, instead of using all the pixels in the initial disparity maps as [7], we use a subset of the pixel values in the LDR images. This subset of pixels consists of pixels are obtained as follows: We construct two initial disparity maps for the two input LDR images. The first uses the left image as the reference and the second uses the right image as the reference. The subset of pixels we are after consists of every pixel whose values in both disparity maps match with each other, i.e. have the same value. Since we exclude the inaccurate disparity values from the initial disparity maps, we can use a less computationally intensive algorithm than the one in [7]. In addition, to refine the initial estimate of the initial disparity map, we apply the census filter (covered in details in Appendix) to the radiance maps of the two input LDR images and use the Hamming distance as the cost function. This is because the radiometric variations (the difference in the luminance values) between the left and right radiance maps are less than the variations between the two input LDR images. Therefore, the stereo matching algorithm in the refinement step can be handled by a less computational expensive cost function than that used in estimating the initial disparity map. The refinement step is also designed to interpolate the values in the saturated regions using those in the unsaturated pixels in the input LDR images. Our algorithm improves on the algorithm in [7] in three aspects: x By using two cameras, our method simplifies the set up and reduces the computational complexity. Such setup also helps improve the estimation of the disparity map. The two major factors that reduce the accuracy in computing the disparity map are occlusion and radiometric variation. Our setup minimizes the problem of occlusion as the two cameras are placed in proximity. As a result, the objects captured by one camera are less likely to be blocked in the image captured by the other camera as the angles of views of the two cameras in our setup are almost identical. x Our algorithm is less computationally intensive. This is because our algorithm uses only the intensity channel, instead of R, G, B channels [7] to in the cost function when estimating the initial disparity maps. In addition, in the refinement 29 step, we apply Census filter to the radiance maps and then use Hamming distance between two pixels, which is faster to compute than NCC, as the matching cost to calculate the disparity map. In [7], the refinement step still uses NCC between two pixels as the matching cost to calculate the initial disparity map. x  As a result of a better refinement step, our algorithm generates the disparity map with less errors and better smoothness and in turn constructs an HDR image that exhibits fewer artifacts. 3.2 Disparity Map Computation In this paper, we propose a global stereo matching method to find the disparity map of two input LDR images that have 1) different exposures; 2) large radiometric changes; 3) areas with saturated regions. We model the matching problem via a Bayesian approach. In this approach, we use a vector  to represent the disparity pf ’s of every the pixel S  in the disparity map.  The optimal vector  should contain disparities pf ’s that minimize an energy function  defined as equation(4) [11]. The energy function is composed of a pixel dissimilarity term  and a pixel smoothness term . It measures the degree of difference between corresponding pixels and the discontinuities of the disparity map for the disparity vector .  ( ) ( ) ( )d sE f E f E f   ሺͶሻ  and  are  ( ) ( )d p p p E f D f ¦  ሺͷሻ  , ( ) ( ) ( , )s pq p q p q N p E f V f f   ¦  ሺ͸ሻ where  measures the summation of dissimilarities of all pixels in the disparity map and  controls, i.e. maximizing, the smoothness of the disparity map. The terms  and  in (3) are the disparity values corresponding to a pixel  and a neighboring pixels  that lies in a 30 window  1 S  centered at pixel S .  measures the cost associated with matching a pixels  in the left LDR image and   in the right LDR image.  is a function at pixel  in the left LDR image that measures the sum of the differences between the disparity ( ) at pixel  and the disparities  of every neighboring pixel T  of  in the window  1 S . Thus, minimizing the function  will maximize the smoothness in the disparity map at pixel . Before obtaining the energy terms  and , in the rest of this section, we will first discuss how the imaging model and the differences in lighting and exposure affect our choice of these energy terms in the rest of this section. 3.2.1 Imaging Model To determine the scene radiance from the measured pixel data, imaging models are used. Different imaging models have been presented in the literature. In this paper, we employ two models. The first model is the gamma correction model, equation (8) and (9) below, as many other methods mentioned in the previous chapter. This model approximates the image luminance values in terms of the scene radiance. The second model the polynomial model, equation (10). This model is used to approximate the inverse of first model. It obtains the scene radiance at a pixel in terms of its luminance values in the image. We use the relationship between the pixel luminance values and the scene radiance, and the first gamma correction model to obtain a suitable cost function for the dissimilarities in the disparity map . This cost function is used to find an initial disparity map. Then we use the initial disparity map to compute the coefficients in the second polynomial model [5] so that we can obtain the scene radiance from the image intensities. Finally, we refine our initial disparity estimate of the true disparity map using the scene radiance estimated from the polynomial model. 31 3.2.1.1 Gamma Correction Every camera defines and quantizes an estimate of the scene radiance  explained in the previous chapter. In a stereo setup, both images are obtained from the same scene radiance but the recorded intensities have horizontal shifts in the pixel locations of both images. The image intensities recorded by a camera can be modeled as scaled gamma corrections of the scene radiance :       F F, S S N5 S JU  ሺ͹ሻ This is the same model used in method proposed in [5] to derive ANCC function.  is the value at pixel  in the color channel . It is approximated as a scaled gamma correction of the radiance , i.e., the amount of light that falls on the sensor of the camera corresponding to pixel .  is dependent on the angle between the light direction and the surface normal at the point registered by pixel  .  is the correction factor employed by the camera response curve to convert the amount of light  fallen on the camera sensor to image intensities. In our set up, the two cameras are place very close to each other. As a result, the angles between the light directions and the surfaces normal to the corresponding pixels registered by the two cameras are to be almost the same. This means that the above gamma correction model defined by equation (7) can ignore the factor  and N . As a result, the chrominance channels of the input LDR images are expected to have little difference. The only difference is introduced by different exposures under which the images are taken. Since the exposure only affects the luminance values of the pixels in the image, we perform stereo matching using the information in the luminance channels of the input LDR images only. Therefore, the imaging model at pixel  is reduced to the following expressions [6]:  lI RJ  ሺͺሻ  ( )rI eR J  ሺͻሻ 32 where  and  are the left and right image intensities,  is the exposure ratio between the left and right images. The  factor can be eliminated by converting the luminance value of each pixel to the log space. 3.2.1.2 Polynomial Camera Response In [3], different camera response functions that model the relationship between the radiance fallen on the senor and the luminance value at a pixel in the image, are compared. It shows that the response curve can be modeled as an nth order polynomial function [7] of the pixel values I in the image.  ( ) nn n R I c I ¦  ሺͳͲሻ where ( )R I  represents the radiance fallen on the sensor at a pixel with intensity value I . The study also showed that it is sufficient to use  to build an accurate model to find the values of the radiance from the pixel intensities. To estimate the values of , we obtain two disparity maps  and  using the left and the right input LDR images as the reference respectively. For our algorithm, only those pixels in the two disparity maps that match each other correctly are used in obtaining the coefficients ,  i.e. those pixels that satisfy equation (11):  ( ) ( )l l r rDisp p Disp p  ሺͳͳሻ where ( )l lDisp p  is the value of a pixel lp  in the initial disparity map calculated using the left image as the reference. ( )r rDisp p  is the value of the pixel rp  in the initial disparity map calculated using the right image as the reference. lp  and lp  are the corresponding pixels in the input LDR images. Their relationship is given by:  ( )r l lp p Disp p   33 We also exclude the pixels with values equal to 0 so as to minimize the error introduced by under-saturation. The polynomial coefficients  are then found by minimizing the cost function:  2 ( ) ( ) ( )n nn n l n r p P n n J c c I p e c I p  ª º « »¬ ¼¦ ¦ ¦  ሺͳʹሻ where P  is the set of valid pixels defined by equation (11), nc  are the polynomial coefficients, and e  is the exposure ratio between the two images. 3.2.2 Computing the Disparity Map The disparity map represents the integer values that characterize the lateral displacement of pixels of an object in the left image compared to its position in the right image. This map is represented by the vector  and SI  is an element in the vector . To compute the vector , we minimize the energy function  defined in equation (4). However, we must first define the dissimilarity term  and the smoothness term . 3.2.2.1 Pixel Dissimilarity We choose the normalized cross correlation (NCC) as the pixel similarity measure. In [7] and [9], it is shown that NCC is the best cost function that copes with exposure variations. Before computing NCC, we convert the image pixel intensities to the log space so that the exposure ratio  does not affect the results  ' log logl lI I RJ   ሺͳ͵ሻ  ' log log logr rI I e RJ J    ሺͳͶሻ For a pixel p  whose corresponding disparity is pf , NCC is given by the following expression: 34      22 , , ( ) ( ) ( , ) , ( ) , ( )    u    ¦     l r p t l r p p l l r p t r p w p t w p f t f I p I p f NCC p f w p t I p w p f t f I p f  ሺͳͷሻ where pf f  is the disparity of pixel p , lw  and rw  are bilateral weights defined over a window ( , )W p t  centered at pixel p  in the left image and ( )pp f  in the right image respectively, and lI  and rI  are functions of the pixel values in the left and right images respectively which we will define below. W is a neighbouring pixel of the central pixel p in the window ( , )W p t . Notice that the effects of J  and e  in equation (13) and (14) are cancelled in equation (15). The fattening effect is one of the major drawbacks of the matching cost function that use windows [12]. The errors caused by the fattening effect occur all over the image, especially at the edges objects with strong depth discontinuity. This is because the central pixel of a window tends to inherit the disparity of the more contrasted pixels in the block [24]. As a result, at the edges of objects that have different distances from the camera, a brighter object tends to appear larger than its actual size. In our algorithm, we include bilateral weights ( , )lw p t  and ( , )rw p t  in NCC, equation (15) to reduce the fattening effect. The bilateral weights are functions of the pixel p and its neighbouring pixels W . The weight function is given by the following expression:  22 ' ' 2 2 ( ) ( ) ( , ) exp 2 2V V ª º« »  « »¬ ¼d s I t I pp t w p t  ሺͳ͸ሻ where  and  are the space and range smoothing parameters, respectively. t  represents a pixels surrounding the central pixel p  in the window ( )W p . The first term in the exponent measures the spatial difference between pixels p and t . The second term measures the difference in the intensities registered by pixels p and t . 35 The NCC is effective at finding similarities in highly textured surfaces. Therefore, we subtract the low frequency image components before performing the similarity matching. The functions I  are then chosen so that:     ' ( , ) ( , )' ( , ) ( , ) ( , ) ( , ) log log ( , ) ( , ) J    ª º« »   « »« »¬ ¼ ¦ ¦ ¦ ¦ l t W p t t W p t l l t W p t t W p t w p t I p w p t R p I p I p R p w p t w p t  ሺͳ͹ሻ Similarly,      ( , ) ( ) ( , ) ( , ) ( , )(log log ) (log log ) ( , ) ( , ) log log ( , ) J J     ª º« »  « »« »¬ ¼ ª º« » « »« »¬ ¼ ¦ ¦ ¦ ¦  t W p t r t W p t W p t t W p t w p t e R p I p e R p w p t w p t R p R p w p t  ሺͳͺሻ Note that equation (17) and (18) show that the NCC measure when calculated using I  is unaffected by J  and e . As NCC measures the similarity between pixels  O, S  and  U, S , the dissimilarity term can then be expressed as follows:    ( ) ( ) 1 ,d p p p p p E f D f NCC p f  ¦ ¦  ሺͳͻሻ 3.2.2.2 Pixel Smoothness In stereo imaging, the pixels representing the same solid object should have similar disparity values. Therefore, the disparity map should be smooth within area corresponding to solid objects. The potential function pqV  in the smoothness term ( )sE f  in equation (6) can be a quadratic function, a delta function, a truncated quadratic function or an even more complicated function. We express the smoothness term ( )sE f in terms of a total variation 36 function pqV  regularized by weights ( , )p qO  which are calculated using the perceptually uniform CIELab color space. Denote a pixel that falls within a neighbourhood window W centered at pixel p  by ( )q W p . The variation term pqV  is expressed as follows:       2 max, , min ,pq p q p qV f f p q f f VO   ሺʹͲሻ where maxV  is the maximum upper bound on the smoothness. The upper boundary is introduced so that all the discontinuities in the image have the same reasonable potential instead of having different big values that blows up the function. The regularizing parameter  ,p qO  is given by:  2 2 22 2 2 2 2, exp 2 2 2 2 L L a a b b s r r r I p I q I p I q I p I qp q p qO V V V V ª º  « »    « »¬ ¼ ሺʹͳሻ where , ,L a bI I I  are the CIELab color space components that best represent human perception of colors among all the available color spaces. Recent studies have shown that by grouping pixels with similar colors before matching can improve the accuracy of the resulting disparity map. Instead of segmenting images using computational intensive algorithms, this grouping can be simply coded by including bilateral weights, equation (21), in the smoothness term [13]. Introducing such bilateral weights in the smoothness term forces the resulting disparity map to agree with the color discontinuities in the reference image. The final smoothness term is expressed as follows:   , ( , )O   ¦ ¦s pq p q p q N p E f p q V f f  ሺʹʹሻ 3.2.2.3 Initial Disparity and Camera Response The total energy function given in equation (4) is then minimized using the graph cut algorithm [9, 14] to produce the initial disparity estimate. The resulting disparity map contains errors mainly in the over-exposed and under-exposed regions of the images. To obtain estimates for these regions, we calculate two disparity maps. The first has the left image as the reference and the second has the right image as the reference. Then we cross 37 validate the resulting maps. The pixels in the two disparity maps that match are treated as valid disparities. The remaining pixels are marked as erroneous and are represented by black pixels for further correction. The initial disparity map is formed of the valid disparities and the erroneous black pixels. Figure 3.6 to Figure 3.9 show the initial disparity map obtained after matching the left and right disparities for the images: Cloths, Dolls, Arts and Baby. The matched pixels in the two disparity maps are considered as valid disparity values. Therefore, only these pixels are used to compute the coefficients in the polynomial model of the camera inverse response function using the algorithm in [5]. The coefficients  are found by minimizing the cost function given by equation (12). 38 Figure 3.6 Initial disparity map obtained by our algorithm after running step 1: Clothes Figure 3.7 Initial disparity map obtained by our algorithm after running step 1: Dolls 39 Figure 3.8 Initial disparity map obtained by our algorithm after running step 1: Arts Figure 3.9 Initial disparity map obtained by our algorithm after running step 1: Baby 40 3.2.3 Refinement After finding the coefficients of the inverse polynomial camera response function, equation (10), we use this inverse polynomial camera response function to convert the left and right images into their respective radiance maps in radiance space . The radiance values of the brighter input image are also multiplied by the exposure ratio between the two input images so that the radiance of the corresponding pixels in the left and right images has the same value. We use the two resulting radiance maps to correct the erroneous pixels identified in the initial disparity map. This is done using the following interpolation method. We formulate the interpolation problem as a minimization problem of an energy function defined by equation (4).  In this step, we define a different pixel dissimilarity cost . In addition, the values of some entries in vector I  are already known. The known pixels are the pixels with the disparity values that match each other, obtained as the initial estimates when each of the left and the right LDR images was used as the reference. Therefore, we force the solution of the new energy function to pick up the same value at these pixels. This is achieved by assigning the minimum possible cost 0 to each pixel  at the desired value and a large cost at the disparity values other than this known value. Namely, assume is the valid known disparity at a pixel , then  0 p p p p p p if f f D f K if f f ­  ° ® z°̄    ሺʹ͵ሻ where  is a large number. This means that our refinement step finds the vector  that minimizes the new energy function and this vector has some of its values as already determined. The smoothness term in the energy function ensures that similar disparity values are assigned to pixels representing the same object. Therefore, an accurate disparity value that was determined in the initial 41 estimate for a pixel is expected to propagate to the neighboring pixels that belong to the same object but had erroneous values in their initial estimates. For the erroneous pixels in the initial disparity map, instead of using equation (19) as the cost function for stereo matching before, we define the cost function as the sum of two terms: the differences between the radiance of the corresponding left and right pixels and a Hamming distance, i.e.         , , ,p p l r p p p l rD f R p R p f C f W p R R        (24) where lR  is the radiance map of the left LDR image after applying the Census transform [15] over the window  W p . lR  is the radiance map of the right LDR image after applying the Census transform [16] over the window   pW p f  .   , , ,p p l rC f W p R R   is a cost function that calculates the Hamming distance between pixel p in the left radiance map lR  and pixel pp f  in the right radiance map. The Census transform is a non-parametric summary of local spatial structure. Let  be a pixel with intensity  and let  be the set of pixels in a square neighborhood of diameter surrounding pixel . The census transform  maps the pixels in  to a bit string representing the set of neighboring pixels whose intensity is less than  , the value of the central pixel . The value of a neighboring pixel in the bit string is set to 1 if its intensity is less than  and to 0 if its intensity is greater than  . Such a transform tolerates the presence of small variations in the values of the corresponding pixels in the left and right radiance estimates. This is because the variations are unlikely to alter the relationship between the surrounding pixels and the central pixel. Detailed explanation of Census transform is covered in the Appendix. 42 We apply the Census filter to the left and right radiance estimates. Then we add the Hamming distance to the cost function in the error correction step for the following two reasons: x As a non-parametric local image transformation, the Census filter differs from other operations in that it only responds to the relative ordering of intensities instead of the intensities themselves. This makes it robust to the variations in intensities between the two input images in our setting. x The speed of the algorithm: after the Cencus filter is applied, the image patches corresponding to the two LDR radiance maps become two strings containing only binary numbers. The correlation between the two patches is calculated from the Hamming distance which is the number of bits the have different values between the two binary strings. Binary operators greatly reduce the computational complexity in determining the correlation between two image patches. Notice that the new dissimilarity cost function pD  for erroneous pixels is composed of two terms. The first term ensures a smooth transition across object boundaries in the radiance map, while the second term ensures that pixel locations are accurately matched. However, strict disparity matching can cause edge artefacts in the final HDR image which result from occlusions in the stereo setup. Therefore, we enforce smooth transitioning in the radiance map so as to remove any possible artefacts that may arise. Finally, in order to speed up the minimization process, we bound the search range of feasible disparity values by the minimum ,minvf  and maximum ,maxvf  valid disparity values found in the initial disparity estimate Ц5 , such that ,min ,maxˆd dv vf f f . 3.3  Image Synthesis Once the disparity map and the coefficients  in equation (10) are computed, the left and right images can be fused into a single HDR radiance image [3]. This is achieved by the following three steps: 43 1) we construct the radiance maps of the two input LDR images. Each of the pixel S in the input LDR images is converted to a radiance value that represent the amount of light fallen onto the sensor at pixel S  using the inverse camera response function defined by equation (10). 2) We normalize the values in the two radiance maps by dividing the values in the right radiance map (with greater exposure) by the exposure ratio between the two input LDR images. According to the image model defined by equation (8) and (9), the amount of light fallen on the camera sensor depends on both the scene radiance and the exposure value used to capture the image. Therefore, the two radiance maps obtained in the first step differs by a scale equal to the exposure ratio between the two input LDR images. 3) We assign pixels values in the HDR image as a weighted average of the corresponding pixels in the two LDR images:       O O U U+'5 O U Z 5 S Z 5 S, S Z Z    In order to determine the weights used to calculate the pixel values in the HDR image, we put great emphasis on the signal-to-noise ratio (SNR). A good estimate of the radiance in the HDR image should maximize both the SNR and the sensitivity to radiance changes. The SNR for the scaled radiance value is given by:      'N R I SNR I R IV  ሺʹͷሻ where,  N IV  is the standard deviation of the measurement noise and can be assumed to be independent of I  in our case. In addition, a large value  R I  indicates that the sensor is set at the right exposure to detect the changes in the scene radiance. Therefore, we define the weighting function used to combine the input radiance maps as    ' R I w R I    This weighting function ensures that for the two corresponding pixels in the input LDR images, the pixel with a greater SNR and sensitivity to radiance change is assigned a greater 44 weight than the other pixel. As a result, the pixel value that contains clearer scene information and less noise in the input LDR image determines the value of this pixel in the constructed HDR image. 45 Chapter 4 Experimental Results 4.1  Disparity Map Accuracy We tested our algorithm using stereo images provided by Middlebury College [16]. The images provided are rectified so that we only need to search along the horizontal line for each point in the image when finding disparity maps. Before we can evaluate the quality of the disparity maps calculated using our algorithms, we have to find the true (ideal) disparity maps for the four scenes, Figure 3.1 to Figure 3.4, as the standard. The reference disparity map for each scene in Figure 3.1 to Figure 3.4 accurately represents the horizontal displacement of each pixel in the higher exposed image using the less exposed image as the reference. 4.1.1 True Disparity Maps In this paper, we present the results for the scenes Arts, Dolls, Baby and Clothes using input LDR images shown in Figure 3.1 to Figure 3.4. Figure 4.1, Figure 4.3, Figure 4.5 and Figure 4.7 show the reference disparity maps of the scenes Clothes, Dolls, Arts and Baby we used to test the performance of our algorithm. The true disparity maps are calculated using our algorithm when the input LDR images are captured under the same exposure. 4.1.2 Disparity Maps Obtained Using Our Algorithm The disparity maps for the four scenes obtained by using our algorithm are presented in Figure 4.2, Figure 4.4, Figure 4.6 and Figure 4.8. The input LDR images of Clothes and dolls, shown in Figure 3.1 and Figure 3.2, contain small saturated areas scattered over the images whereas the input LDR images Arts and Baby, shown in Figure 3.3 and Figure 3.4, have large areas of both under-saturated and over-saturated pixels. In our experiments, the size of the window used by our algorithm is (5 5)u  pixels. The window size is chosen to be small in order to reduce the amount of computation for the NCC 46 for every disparity value under consideration. After running the algorithm several times with different parameters, we found the following parameters give optimal results. The standard deviations s  and r  used in calculating the bilateral weights, equation (16) when estimating the initial disparity map are set to 2.6  and 14.0 . The standard deviation s  and r  used in calculating  S TO , equation (21), in the potential function  ,pq p qV f f , equation (20) are set to 2.6  and 16.0 . Comparing the disparity maps obtained by our algorithm, shown in Figure 4.2, Figure 4.4, Figure 4.6 and Figure 4.8 with the true disparity maps, shown in Figure 4.1, Figure 4.3, Figure 4.5 and Figure 4.7, we obtain the following conclusions: x If the input images contain scattered saturated regions of small areas, such as in the case of the scenes Clothes and Dolls, shown in Figure 3.1 and Figure 3.2, the disparity maps obtained by our algorithm, shown in Figure 4.2 and Figure 4.4, follow closely the true disparity maps, shown in Figure 4.1 and Figure 4.3. x If the input LDR images have large saturated regions such as the case of the scenes Arts and Baby, shown in Figure 3.3 and Figure 3.4, the final disparity maps obtained using our algorithm, shown in Figure 4.6 and Figure 4.8 has discernible difference from their true disparity maps, shown in Figure 4.5 and Figure 4.7. x The error pixel values in the disparity maps obtained by our algorithm have little effect on the constructed HDR images for Cloths, Dolls, Arts and Baby, shown in Figure 4.9 to Figure 4.12. This is because of the first term in the cost function (24) in our refinement step ensures that the pixels in one input LDR radiance map are mapped to the pixels in the other LDR radiance map with similar radiance. This is the case even for pixels with erroneous disparity values. x The constructed HDR images show in Figure 4.9 to Figure 4.12 have a higher dynamic range than either of the input LDR images for the scenes in Figure 3.1 to Figure 3.4. The constructed HDR images in Figure 4.9 to Figure 4.12 clearly display the details in both dark and bright regions of the scenes in Figure 3.1 to Figure 3.4. 47 Figure 4.1  Reference disparity map obtained using input images of the same exposure: Clothes Figure 4.2 The final disparity map obtained using our algorithm: Clothes. The values at the erroneous pixels in the initial disparity map are successfully obtained by the refinement step. This is because the saturated regions in the input images are small and scattered. 48 Figure 4.3 Reference disparity map obtained using input images of the same exposure: Dolls Figure 4.4 The final disparity map obtained using our algorithm: Dolls. Most of the pixels have the correct disparity value except the head of the left doll at the center. This is because the area is over exposed in the input images, causing the interpolation process in the refinement step to take the wrong value. 49 Figure 4.5 Reference disparity map obtained using input images of the same exposure: Arts Figure 4.6 The final disparity map obtained using our algorithm: Arts. The disparity map is generally accurate and smooth except at the edges of different objects and the pencils. This is because it is very difficult to completely remove the fattening effect caused by using a window based cost function in stereo matching. 50 Figure 4.7 Reference disparity map obtained using input images of the same exposure: Baby Figure 4.8 The final disparity map obtained using our algorithm: Baby. The disparity map is not as smooth as the previous three maps. This is because of the large areas of under-saturated and over- saturated regions in the input images. However, the refinement step has still assigned correct values close to the true disparity values of the erroneous pixels in the initial disparity map, Fig. 3.13. 51 Figure 4.9 The tone-mapped reconstructed HDR image of Clothes. The image shows the details in the folds which are under-saturated in one input image. The image also shows the textures of the central piece of the cloth which are over exposed in the other input image. Figure 4.10  The tone-mapped reconstructed HDR image of Dolls. The picture shows the detail of both dark and bright cloths the dolls are wearing 52 Figure 4.11 The tone-mapped reconstructed HDR image of Arts. The statue at the center and the pot on the left of the image are not saturated. It also displays the details in the dark region of the paint in the background wall. Figure 4.12 The tone-mapped reconstructed HDR image of Baby. The picture displays both the words and equations in the book and the details of the maps on the wall. 53 4.1.3 Disparity Maps Obtained Using ANCC and Multi-view Multi-exposure Algorithms Figure 4.13 to Figure 4.16 show the disparity maps obtained using ANCC [10]. Figure 4.17 to Figure 4.20 show the disparity maps obtained using the multi-view multi-exposure algorithm [7]. In this paper, we evaluate the performance of the multi-view multi-exposure algorithm by using three LDR images captured under different exposures as the input for the scenes Clothes, Dolls, Arts and Baby. The disparity maps of Clothes and Dolls have fewer errors because the saturated regions in these images are scattered small patches. The smoothing term in the energy function is successful in propagating the correct disparity values to the saturated pixels from their surrounding unsaturated pixels. However, the disparity maps of Arts and Baby contain significant errors because both matching cost functions in [7] and [10] do not have any term in the energy function designed specifically to propagate the correct disparity values for pixels in the saturated areas from the disparity values of the surrounding unsaturated pixels in the input LDR images. As a result, when the input LDR images have large saturated regions, instead of assigning correct disparity values to saturated pixels, the smoothing term tends to propagate the erroneous disparity values and thus assign wrong disparity values at pixels surrounding the saturated regions. Compared to the disparity maps for Arts obtained using the algorithms proposed in [7] and [10], the disparity map we computed has fewer errors and better smoothness. Our calculated disparity maps compared to those obtained using algorithms in [7] and [10] are presented in Table 4.1 to Table 4.3. The root mean square error (RMSE) of invalid pixels and the numbers of error pixels in the disparity maps obtained by our algorithm are significantly less than those obtained using algorithms in [7] and [10]. The percentage of error pixels in the disparity maps obtained by our algorithm is on average 16% less than that in [7] and 20% less than that in [10]. 54 Figure 4.13  The disparity maps of Clothes obtained using ANCC [10] Figure 4.14  The disparity maps of Dolls obtained using ANCC [10] 55 Figure 4.15  The disparity maps of Arts obtained using ANCC [10] Figure 4.16  The disparity maps of Baby obtained using ANCC [10] 56 Figure 4.17 The disparity maps of Clothes obtained using multi-view multi-exposure algorithm [7] Figure 4.18 The disparity maps of Dolls obtained using multi-view multi-exposure algorithm [7] 57 Figure 4.19 The disparity maps of Arts obtained using multi-view multi-exposure algorithm [7] Figure 4.20 The disparity maps of Baby obtained using multi-view multi-exposure algorithm [7] 58 Table 4.1 The root mean square error and percentage of invalid pixels in the final disparity maps using our algorithm  Exposure Ratio RMSE Error Percentage Clothes 4 0.9934 8.23 16 0.976 8.82 Dolls 4 0.8454 4.77 16 0.8591 5.58 Arts 4 1.5459 7.43 16 1.1556 8.15 Baby 4 1.432 9.42 16 1.4642 10.13 Table 4.2 The root mean square error and percentage of invalid pixels in the final disparity map using ANCC as the matching cost function [10]  Exposure Ratio RMSE Error Percentage Clothes 4 20.218 29.0 16 21.268 29.2 Dolls 4 15.689 14.4 16 16.512 15.3 Arts 4 10.216 9.35 16 11.653 9.72 Baby 4 28.265 38.0 16 28.044 37.6 59 Table 4.3 The root mean square error and percentage of invalid pixels in the final disparity maps using multi-view multi-exposure algorithm[7]  Exposure Ratio RMSE Error Percentage Clothes 4 25.705 53.3 16 25.798 53.7 Dolls 4 19.983 32.8 16 20.385 33.7 Arts 4 20.665 31.6 16 20.989 32.1 Baby 4 46.767 88.9 16 48.246 90.3 60 4.2 Dynamic Range The quality of an HDR image is determined not only by the accuracy of the pixel values, but also by the range of the scene radiance it can capture. The dynamic range of constructed HDR images depends on the number of input LDR images and the exposure values of the input images. It is bounded by the lowest radiance value in the image with the shortest exposure and the highest radiance value in the image with the greatest exposure. The larger the number of the input LDR images and the greater the difference among their exposure values, the larger the dynamic range of the scene radiance that can be captured by the reconstructed HDR image. The computational complexity however, increases exponentially with the increase in the input LDR images. Reducing the computational complexity is crucial to transfer a technology from the lab to the market. This is also the reason why we limit our input LDR images to two. In addition, Figure 4.2, Figure 4.4, Figure 4.6 and Figure 4.8 show that a large exposure ratio between the two input LDR images results in a decrease in the accuracy of the disparity map generated by our algorithm. This is because when the exposure ratio between the two input images is large, the amount of the overlapping radiance ranges between the two input LDR images decreases. This results in the deviation of the calculated camera inverse response function from its true value increases. Therefore, the exposure values of the input LDR images should be chosen carefully so that the dynamic range of the reconstructed HDR image is maximized and the error in the calculated camera inverse response function is minimized. One way to find the optimal exposure values for a scene is to minimize an objective function that is based on the derivative of the camera response function [17]. In order to reduce the amount of calculation during the runtime, the exposure ratios of the different kinds of camera response function can be pre-calculated and stored in a table. For the cameras whose inverse response functions are modeled by a Gamma curve, equation (3), the optimal ratio of the exposure values is 1:3.094 [17]. 61 Chapter 5 Conclusion In this thesis, we address the problem of obtaining a HDR image using two LDR images. The two images should be captured at the same instance by the two cameras, which should be placed close to each other. The images should have different exposures. One image captured under lower exposure registers the bright part of the scene, i.e. it does not have over-saturated but under-saturated areas. The other image captured under higher exposure register the details in the dark part of the scene, i.e. it does not have under-saturated but over-saturated areas. We presented a novel algorithm that calculates the disparity map of two differently exposed LDR images to generate the HDR images. Compared to existing methods, our algorithm has several advantages. In many stereo matching algorithms, such as the ANCC [10] and the Multi-view, Multi- exposure [7] algorithms we discussed in detail how the three R, G and B channels are used in the matching cost functions to find the disparity map of the input LDR images. These algorithms are developed for the case when the multiple input LDR images are taken by multiple cameras. Under such set up, the cameras at the two ends are not placed in proximity. The angle between the light and the norm of the surface, i.e. the factor  SU  in equation (3), affects the amount of light fallen on the sensor of cameras placed further apart from each other. This means that the amount of chrominance form one channel fallen on one of the input LDR images are not the same as that fallen on another input LDR images. As a result, most stereo matching algorithm such as the ones proposed in [7] and [10] cannot use the simplified camera model as is the case in our algorithm (equation (8) and (9)). These algorithms have to include both the chrominance and luminance information when computing the matching cost for every disparity value under consideration. Our set up enable our algorithm to simplify the camera model and thus uses only the information in the luminance channel of input LDR images to evaluate the pixel dissimilarities, equation (15). As the computation of the cost function is done for one channel 62 in our case compared to three channels in other stereo matching algorithms such as [7] and [10], our algorithm is less computationally intensive. Our algorithm also designs a unique refinement process to cope with changes in the exposures under which the two input LDR images are captured. This refine process also copes with the existence of saturated regions in the input LDR images. Most of the available algorithms such as [10] just run the matching process once. These algorithms focus on designing a single cost function which can tolerate radiometric changes between and the existence of saturated areas in the input stereo images. Some algorithms such as [7] have a refinement step. Their refinement step is simply a rerun, i.e. it applies the first stereo matching process again either once or several time, using the same matching cost using radiance maps of the input LDR images. Such refinement step increases the accuracy of the disparity map because there are less radiometric changes among the radiance maps of the input LR images. However, the refinement step does not increase the probability of propagating the correct disparity values of unsaturated pixels to the nearby saturated pixels in the input LDR images. Our algorithm proposed a refinement step which differs from the refinement steps in other state-of-the-art algorithms in the following steps: x We take the pixels with accurate disparity values found in the initial estimate of the disparity maps into consideration. This is done by pre-setting the disparity values of these pixels in the solution vector I when solving the energy function. As a result, we ensure that the accurate disparity values of unsaturated pixels are assigned to the nearby saturated pixels belonging to the same object. x  We take the luminance discontinuities in the input LDR images into consideration. Include this information in the smoothness term in the energy function also increase the accuracy and sharpness of the edges of objects in the image. x We use a different matching cost function in energy function to measure the dissimilarities between two pixels in the radiance maps of the input images. We apply Census filter first to the radiance maps and then calculated the Hamming 63 distance between corresponding pixels. This cost function is much less computationally intensive than NCC. As a result, the disparity maps calculated using our algorithms show less errors and better smoothness than other state-of-art algorithm for input stereo images with saturations and large radiometric changes. Our algorithm can be used with fast motion scenes since the proposed setup captures images with different exposures at the same instance. Therefore, no temporal adjustment is required. Every pair of images represents the same scene and is used to generate the HDR image for that instance. For constructing HDR videos, we can use the same set up, i.e. two LDR cameras placed next to each other to record the same video. Then the corresponding frames of each video can be paired up. Each pair can be treated separately as one pair of stereo LDR images. Our algorithm can then be used to process these pairs of images to form the HDR images. Finally, the constructed HDR images from each pair for input LDR frames from the LDR videos can be combined together to form the HDR video. In the future, we would like to explore the field of constructing 3D HDR images or videos since our algorithm is capable of computing accurate disparity map for a variety of scenes such as scenes with large radiometric change, existence of saturated areas and motions. There are many interesting and useful work and projects that can continue from this work and that we would like to explore in the future. These include: x Extension of our algorithm to the field of constructing HDR videos. This will be done using the framework outlined in the previous paragraph to obtain each frame of the HDR video.  One of the problems we might want to address is the flickering problem. The defects in constructed HDR frame may results in shifts of corresponding pixels between consecutive frames. Special steps may be required to make the video appear smooth. x We would also like to find the optimal exposure ratio which results in HDR images with the largest possible dynamic range, following the direction proposed in [17]. The algorithm in [17] takes several predefined camera response functions. These are the response functions of several popular LDR cameras in the market. 64 Such limitation prevents it from being accurate for all the cameras in the market. We would like to modify the algorithm so that it can calculated the camera response function dynamically and calculates the optimal exposure ratio specifically for the camera being used. 65 Bibliography [1] F. Banterle, P. Ledda, K. Debattista, A. Chalmers, and M. Bloj, “A framework for inverse tone mapping,” The visual Computer: International Journal of Computer Graphics, vol. 23, pp. 467–478, 2007. [2] P. E. Debevec and J. Malik, “Recovering high dynamic range radiance maps from photographs,” in Proceedings of SIGGRAPH 97, August 1997, pp. 369–378. [3] T. Mitsunaga and S. K. Nayar, “Radiometric self calibration,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 1999, pp. 374–380. [4] R. A. Varkonyi-Koczy, R. Hashimoto, S. Balogh, and S. Y., “Gradient based synthesized multiple exposure time HDR image,” in Instrumentation and Measurement Technology Conference Proceedings, May 2007, pp. 1–6. [5] T. Mitsunaga and S. K. Nayar, “High dynamic range imaging: spatially varying pixel exposures,” ACM Transaction On Graphics, vol. 3, pp. 267–276, 2002. [6] S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High dynamic range video,” in International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2003, 2003, pp. 319–325. [7] B. Troccoli, S. B. Kang, and S. Seitz, “Multi-view multi-exposure stereo,” in Third International symposium on 3D Data Processing, Visualization, and Transmission, June 2006, pp. 861–868. [8] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001. 66 [9] H. Hirschnuller and D. Scharstein, “Evaluation of cost function for stereo matching,” in Proceedings of Computer Vision and Pattern Recognition, 2007. [10] Y. S. Heo, K. M. Lee, and S. U. Lee, “Illumination and camera invariant stereo matching,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008. [11] S. Z. Li, “Markov random field models in computer vision,” in ECCV ’94: Proceedings of the Third European Conference-Volume II on Computer Vision. London, UK: Springer- Verlag, 1994, pp. 361–370. [12] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proceedings of International Conference on Computer Vision, 1998. [13] R. Brockers, “Cooperative stereo matching with color-based adaptive local support,” in Proceedings of the 13th International Conference on Computer Analysis of Images and Patterns, vol. 5702, 2009. [14] http://www.adastral.ucl.ac.uk/ vladkolm/software.html. [15] R. Zabih and J. Woodfill, “Non-parametric local transforms for computing visual correspondance,” Proceedings of ECCV, pp. 131–158, 1994. [16] http://cat.middlebury.edu/stereo/data.html. [17] S. P. P. D. E. Reinhard, G. Ward, “High Dynamic Range Imaging: Acquisition, Display and Image-Based Lighting,” Morgan Kaufmann, 2005. [18] R. Fattal,  D. Lischinski, M. Werman, “Gradient domain high dynamic range compression,” Proceedings of the 29th annual Conference on Computer Graphics and Interactive Techniques, Vol. 21, Issue 3, 2002 67 [19] R. A. Street, “High dynamic range segmented pixel sensor array”, U.S. Patent 5789737, 1998 [20] M. Murakoshi, “Charge coupling image pickup device,” Japanese Patent 59-217358, 1994 [21] M. D. Grossberg and S. K. Nayr. “Determining the camera response from images: What is knowable?” IEEE Transaction Pattern Analysis and Machine Intelligence, Vol 25, No. 11, 2003 [22] S. J. Kim, M. Pollefeys, “Robust Radiometric Calibration and Vignetting Correction.” IEEE Transaction Pattern Analysis and Machine Intelligence, Vol. 30, Issue 4, pp 562-576 , 2008 [23] M. D. Grossberg and S. K. Nayar, “What can be known about the radiometric response from images?” Proceedings of the 7th European Conference on Computer Vision-Part IV, pages 189–205, London, UK, 2002. [24] T. Kanade and M. Okutomi. “A stereo matching algorithm with an adaptive window: Theory and experiment.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.12, Issue 9, 1994. 68 Appendices Appendix A  Census Transform The Census transform is one form of non-parametric local transforms for computing vision correspondence. Non-parametric local transforms rely on the relative order of the neighboring intensity values to the central pixel value, thus do not rely on the local intensity values themselves as in the case of parametric local transforms such as NCC which rely on the exact intensity values of the pixels in the input images. Therefore, non-parametric local transforms have the advantage of tolerating more significant differences between the corresponding pixel values in the left and right stereo images. It also improves the accuracy of pixel values near object boundaries in the disparity map. Similar to other local transforms, the Census transform is based on a window and computes the correspondence between pixels in the left and right stereo images. Let S be a pixel,  , S be its intensity value and  1 S  be the neighboring pixels T  within a G Gu  square window surrounding pixel S . The Census transform only depends on the sign between the intensities at pixels S  and T , i.e.  , S  and  , T . The sign is determined as:          , S , TS T , S , T[ ­ d ® !¯  ሺʹ͸ሻ The Census transform depends solely on the set of ordered pairs:         T 1 S S T S T[  (  *  ሺʹ͹ሻ where  S( , the transform of S , is a binary vector that represents the bit string of the set of neighboring pixels T  whose pixel values less than  , S . To compare the similarity of two binary vectors (each representing a pixel belonging to one of two Census transformed images), the Hamming distance is used. The Hamming distance   G [ \  between two vectors [  and \  is defined as the number of bits in which they differ. For example:   G   69 And   G   The correspondence between two pixels (one pixel belonging to the left image and the other belonging to the right image) is computed by minimizing the Hamming distance after applying the Census transform to the two images. 70 Appendix B  Code StereoMatching.cpp:  #include"stdafx.h"  ////////////////////////////////////////////////////////////////////////////// //ExampleillustratingtheuseofGCoptimization.cpp // ///////////////////////////////////////////////////////////////////////////// // //Optimizationproblem: //isasetofsites(pixels)ofwidth10andhight5.Thusnumberofpixelsis50 //gridneighborhood:eachpixelhasitsleft,right,up,andbottompixelsasneighbors //7labels //Datacosts:D(pixel,label)=0ifpixel<25andlabel=0 //:D(pixel,label)=10ifpixel<25andlabelisnot0 //:D(pixel,label)=0ifpixel>=25andlabel=5 //:D(pixel,label)=10ifpixel>=25andlabelisnot5 //Smoothnesscosts:V(p1,p2,l1,l2)=min((l1Ͳl2)*(l1Ͳl2),4) //Belowinthemainprogram,weillustratedifferentwaysofsettingdataandsmoothnesscosts //thatourinterfaceallowandsolvethisoptimizaitonproblem  //Formostoftheexamples,weusenospatiallyvaryingpixeldependentterms. //Forsomeexamples,todemonstratespatiallyvaryingtermsweuse //V(p1,p2,l1,l2)=w_{p1,p2}*[min((l1Ͳl2)*(l1Ͳl2),4)],with //w_{p1,p2}=p1+p2if|p1Ͳp2|==1andw_{p1,p2}=p1*p2if|p1Ͳp2|isnot1  #include<sstream> #include"LocalCost.h" #include"ImageIO.h" #include"PostProc.h"  usingnamespacestd;  int*resultl,*resultr; LocalCost*lc=newLocalCost();  //////////////////////////////////////////////////////////////////////////////// //smoothnessanddatacostsaresetuponebyone,individually //gridneighborhoodstructureisassumed 71 //////////////////////////////////////////////////////////////////////////////// intfactor=6; intsmoothFn(intp1,intp2,intl1,intl2) {  intw=768;  inth=768;  intcol1=p1%w;  introw1=(p1Ͳcol1)/w;  intcol2=p2%w;  introw2=(p2Ͳcol2)/w;  doublepower=0;   doubleexp1=0;  doubleexp2=0;   exp1=((row1Ͳrow2)*(row1Ͳrow2)+(col1Ͳcol2)*(col1Ͳcol2))/2.04/2.04;    if((col2+p2)<w&&(col2+p2)>=0&&(col1+p1)<w&&(col1+p1)>=0&&(p2+l1Ͳl2)<w &&(p2+l1Ͳl2)>=0)  {   exp2=pow((::lcͲ>images.left[p1]Ͳ::lcͲ>images.left[p2+l1Ͳl2]),2.0);   //exp2=pow((::lcͲ>images.left[p1]Ͳ::lcͲ>images.left[p2]),2.0);   exp2=exp2/16.0/16.0;   doublecost=(l1Ͳl2)*(l1Ͳl2)<=5?(l1Ͳl2)*(l1Ͳl2):5;   return(int)(cost*factor*exp(Ͳ0.5*(exp1+exp2)));  }  else  {   intcost=(l1Ͳl2)*(l1Ͳl2)<=5?(l1Ͳl2)*(l1Ͳl2):5;   returncost*factor;  } }  void GridGraph_Individually(int width,int height,int num_pixels,int num_labels, int reverse, int window,intlo) {  try{   GCoptimizationGeneralGraph *gc = new GCoptimizationGeneralGraph(num_pixels,num_labels);    //firstsetupdatacostsindividually   for(inti=0;i<height;i++) 72   {    for(intj=0;j<width;j++)    {     for(intl=0;l<num_labels;l++)     {      if(reverse==0)       gcͲ>setDataCost(i*width+j,l,lcͲ >dataCoFn_ncc(i*width+j,l,window,width,height,lo));      else       gcͲ>setDataCost(i*width+j,l,lcͲ >dataCoFn_ncc(i*width+j,Ͳl,window,width,height,lo));     }    }   }    //nowsetupagridneighborhoodsystem   //firstsetuphorizontalneighbors   for(inty=0;y<height;y++)    for(intx=1;x<width;x++)    {     gcͲ>setNeighbors(x+y*width,xͲ1+y*width);     //gcͲ>setNeighbors(x+y*width,xͲ2+y*width);     if((yͲ1)>=0)     {      gcͲ>setNeighbors(x+y*width,xͲ1+(yͲ1)*width);      //gcͲ>setNeighbors(x+y*width,xͲ2+(yͲ1)*width);     }     if((y+1)<height)     {      gcͲ>setNeighbors(x+y*width,xͲ1+(y+1)*width);      //gcͲ>setNeighbors(x+y*width,xͲ2+(y+1)*width);          }     /*if((yͲ2)>=0)     {      gcͲ>setNeighbors(x+y*width,xͲ1+(yͲ2)*width);      gcͲ>setNeighbors(x+y*width,xͲ2+(yͲ2)*width);     }     if((y+2)<height)s     {      gcͲ>setNeighbors(x+y*width,xͲ1+(y+2)*width);      gcͲ>setNeighbors(x+y*width,xͲ2+(y+2)*width); 73     }*/    }    //nextsetupverticalneighbors   for(inty=1;y<height;y++)   {    for(intx=0;x<width;x++)    {     gcͲ>setNeighbors(x+y*width,x+(yͲ1)*width);     //gcͲ>setNeighbors(x+y*width,x+(yͲ2)*width);    }   }    //nextsetupsmoothnesscostsindividually   /*for(intl1=0;l1<num_labels;l1++)    for(intl2=0;l2<num_labels;l2++){     intcost=(l1Ͳl2)*(l1Ͳl2)<=5?(l1Ͳl2)*(l1Ͳl2):5;     gcͲ>setSmoothCost(l1,l2,cost*10);    }   */   gcͲ>setSmoothCost(&smoothFn);   printf("\nBeforeoptimizationenergyis%d",gcͲ>compute_energy());   gcͲ>expansion(3);// run expansion for 2 iterations. For swap use gcͲ >swap(num_iterations)   //gcͲ>swap(5);   printf("\nAfteroptimizationenergyis%d\n",gcͲ>compute_energy());    for(inti=0;i<num_pixels;i++)   {    if(reverse==0)     resultl[i]=gcͲ>whatLabel(i);    else     resultr[i]=gcͲ>whatLabel(i);   }   deletegc;  }  catch(GCExceptione){   e.Report();  } }  intmain(intargc,char**argv) 74 {  intnum_labels=0;  char*leftImageFileName="";  char*rightImageFileName="";  char*leftIntermediateDispFileName="";  char*rightIntermediateDispFileName="";  char*leftDispFileName="";  char*rightDispFileName="";  intwindow=3;  intlo=8;    intwidth=768;  intheight=768;   try  {   leftImageFileName=argv[1];   rightImageFileName=argv[2];   leftIntermediateDispFileName=argv[3];   rightIntermediateDispFileName=argv[4];   leftDispFileName=argv[5];   rightDispFileName=argv[6];   window=atoi(argv[7]);   num_labels=atoi(argv[8]);   lo=atoi(argv[9]);  }  catch(char*e)  {   //askuserfortheinputsifit'snotprovided   cout << "This program takes in the intensity values of left and right images and outputsthedisparitymaps\n.";   cout<<"Filenameofleftimage(.txtfile):";   cin>>leftImageFileName;   cout<<"Filenameofrightimage(.txtfile):";   cin>>rightImageFileName;   cout<<"Pleasespecifythefilewhichstorestheresultof initialmatchingfrom left imageview:";   cin>>leftIntermediateDispFileName;   cout<<"Pleasespecifythefilewhichstorestheresultofinitialmatchingfromright imageview:";   cin>>rightIntermediateDispFileName; 75   cout<<"Pleasespecifythefilewhichstoresthefinaldisparitymapfromleftimage view:";   cin>>leftDispFileName;   cout<<"Pleasespecifythefilewichstoresthefinaldisparitymapfromrightimage view:";   cin>>rightDispFileName;   cout<<"Windowsize(recommendedvalue:3):";   cin>>window;   cout<< "Numberof labels considered in graph cut (Thisplaces a capnumber to reducetheamountofcomputation):";   cin>>num_labels;   cout<<"Minimum labelvalueaccepted inthefinallydisparitymap(This isused in thetuningstepformoreaccurateresult,recommendedvalue:8Ͳ10):";   cin>>lo;   factor=6;  }   intnum_pixels=width*height;    //factor=atoi(argv[10]);    resultl=newint[num_pixels];//storesresultofoptimization  resultr=newint[num_pixels];   //Readimagefiles  ImageIO*imgio=newImageIO(height,width);  imgioͲ>openImage(leftImageFileName,rightImageFileName);    //smoothnessanddatacostsaresetuponebyone,individually  GridGraph_Individually(width,height,num_pixels,num_labels,0,window,lo);  imgioͲ>writeData(leftIntermediateDispFileName,resultl,num_pixels);    imgioͲ>swapImage();  GridGraph_Individually(width,height,num_pixels,num_labels,1,window,lo);  imgioͲ>writeData(rightIntermediateDispFileName,resultr,num_pixels);   printf("\nFinished %d (%d) clock per sec%d\n",clock()/CLOCKS_PER_SEC,clock(),CLOCKS_PER_SEC);   int*disp=newint[num_pixels];  PostProc*pproc=newPostProc(width,height,num_labels);  //pprocͲ>txtRead("disp_books_f11.txt",resultl,num_pixels); 76  //pprocͲ>txtRead("disp_books_r11.txt",resultr,num_pixels);  pprocͲ>MatchLeftRightDisp(resultl,resultr,disp);  pprocͲ>RemoveSmallAreas(num_labels,leftDispFileName,disp);  pprocͲ>MatchRightLeftDisp(resultl,resultr,disp);  pprocͲ>RemoveSmallAreas(num_labels,rightDispFileName,disp);   //cleanthememory  deleteimgio;  delete[]resultl;  delete[]resultr;  deletepproc;  deletelc;  delete[]disp;  printf("Finish.....\n");  return0; }                           77 LocalCost.h:  #ifndef__LOCALCOST_H__ #define__LOCALCOST_H__  #include<stdio.h> #include<stdlib.h> #include<math.h> #include<string.h> #include<time.h> #include<iostream> #include<fstream>  #include"GCoptimization.h"  structImages {  doubleleft[768*768];  doubleright[768*768]; }; classLocalCost { public:  LocalCost(void);  virtual~LocalCost(void);  structImagesimages;  intdataCoFn_ncc(intp,intl,intwindow,intwidth,intheight,intlo);  intdataCoFn_Census(intp,intl,intwindow,intwidth,intheight,intlo);  private:  voidsubAvg(intwidth,intheight);  doubleweightingFn(intcolt,introwt,intcolp,introwp,intwidth,intheight,intleft);  doublenormConst(double*weights,intsize);  int ncc(int p, int l, int row, int col, int width, int height, double *weights_l, double *weights_r,doubleleftAvg,doublerightAvg,intoffset); };  #endif     78 LocalCost.cpp:  #include"stdafx.h" #include"LocalCost.h"  LocalCost::LocalCost(){} LocalCost::~LocalCost(){}  voidLocalCost::subAvg(intwidth,intheight) {  doublesuml=0.0;  doublesumr=0.0;  intnum_pixels=width*height;  for(inti=0;i<num_pixels;i++)  {   suml=suml+images.left[i];   sumr=sumr+images.right[i];  }  suml=suml/num_pixels;  sumr=sumr/num_pixels;  for(inti=0;i<num_pixels;i++)  {   images.left[i]=images.left[i]Ͳsuml;   images.right[i]=images.right[i]Ͳsumr;  } }  doubleLocalCost::weightingFn(intcolt,introwt,intcolp,introwp,intwidth,intheight,intleft) {  doublesigmad=14.0;  doublesigmas=3.8;  doubleweight=0.0;    intt=rowt*width+colt;  intp=rowp*width+colp;   if(colt>=0&&colt<width&&rowt>=0&&rowt<height)  {   weight=Ͳ(double)((coltͲcolp)*(coltͲcolp)+(rowtͲrowp)*(rowtͲrowp))/2.0/sigmad /sigmad;   if(colt>=0&&colt<width&&rowt>=0&&rowt<width)   { 79    if(left==0)     weight = weight Ͳ (images.left[p]Ͳimages.left[t])*(images.left[p] Ͳ images.left[t])/2.0/sigmas/sigmas;    else     weight=weightͲ(images.right[p]Ͳimages.right[t])*(images.right[p]Ͳ images.right[t])/2.0/sigmas/sigmas;   }   else   {    if(left==0)     weight=weight Ͳ (images.left[p]Ͳ0.0)*(images.left[p] Ͳ0.0) /2.0 / sigmas/sigmas;    else     weight=weightͲ(images.right[p]Ͳ0.0)*(images.right[p]Ͳ0.0)/2.0/ sigmas/sigmas;   }    returnexp(weight);  }  else  {   return0.0;  } }  doubleLocalCost::normConst(double*weights,intsize) {  doublesum=0.0;   for(inti=0;i<size;i++)  {   sum=sum+weights[i];  }  returnsum; }  int LocalCost::ncc(int p, int l, int row, int col, int width, int height, double *weights_l, double *weights_r,doubleleftAvg,doublerightAvg,intoffset) {  //CalculateNCC  doublenumerator=0.0;  doubledenorminator1=0.0; 80  doubledenorminator2=0.0;  doublepleft=0;  doublepright=0;   doubleweightl=0.0;  doubleweightr=0.0;  intindexl=0;  intindexr=0;   intcount=0;   for(inti=rowͲoffset;i<=row+offset;i++)  {   for(intj=colͲoffset;j<=col+offset;j++)   {    indexl=i*width+j;    indexr=i*width+j+l;    if(i>=0&&i<height&&j>=0&&j<width)     pleft=images.left[indexl];    if(i>=0&&i<height&&j+l>=0&&j+l<width)     pright=images.right[indexr];     weightl=weights_l[count];    weightr=weights_r[count];    count++;     numerator = numerator + fabs((pleftͲleftAvg) * (prightͲ rightAvg))*weightl*weightr;    denorminator1=denorminator1+pow((pleftͲleftAvg)*weightl,2);    denorminator2=denorminator2+pow((prightͲrightAvg)*weightr,2);   }  }   doubledenorminator=sqrt(denorminator1)*sqrt(denorminator2);  intcost=(1.0Ͳfabs(numerator/denorminator))*400;  delete[]weights_l;  delete[]weights_r;  returncost; }  intLocalCost::dataCoFn_Census(intp,intl,intwindow,intwidth,intheight,intlo) {  intcol=p%width; 81  introw=(pͲcol)/width;   if(abs(l)<lo)   return400;   if(col+l>=width||col+l<0)   return400;    intoffset=(windowͲ1)/2;  intwPixels=window*window;  doublepleft=0;  doublepright=0;  intindexl=0;  intindexr=0;  int*label1,*label2;  intcount1=0;  intcount2=0;   label1=newint[wPixels];  label2=newint[wPixels];   doublecenter1=images.left[p];  doublecenter2=images.right[p+l];    //applyCensustransformtotheimage  for(inti=rowͲoffset;i<=row+offset;i++)  {   for(intj=colͲoffset;j<=col+offset;j++)   {    if(j>=0&&j<width&&i>=0&&i<height)    {     indexl=i*width+j;     pleft=images.left[indexl];     if(pleft<center1)       label1[count1]=1;     else      label1[count1]=0;     count1++;    }    else    {     label1[count1]=0; 82     count1++;    }        if(j+l>=0&&j+l<width&&i>=0&&i<height)    {     indexr=i*width+j+l;     pright=images.right[indexr];     if(pright<center2)      label2[count2]=1;     else      label2[count2]=0;     count2++;    }    else    {     label2[count2]=0;     count2++;    }   }  }    //CalculatetheHammingdistancebetweenthetwobitstreams  inthammingd=0;   for(inti=0;i<wPixels;i++)  {   if(label1[i]!=label2[i])    hammingd++;  }  intcost=((double)hammingd)/wPixels*400;  delete[]label1;  delete[]label2;  returncost; }  intLocalCost::dataCoFn_ncc(intp,intl,intwindow,intwidth,intheight,intlo) {  intcol=p%width;  introw=(pͲcol)/width;  intoffset=(windowͲ1)/2;  doublesuml=0.0;  doublesumr=0.0; 83  doubleweightl=0.0;  doubleweightr=0.0;  intindexl=0;  intindexr=0;   if(abs(l)<lo)   return400;   if(col+l>=width||col+l<0)   return400;   double*weights_l,*weights_r;  intwPixels=window*window;  weights_l=newdouble[wPixels];  weights_r=newdouble[wPixels];  intcount=0;   //Calculateaverageofleftandrightwindow  for(inti=rowͲoffset;i<=row+offset;i++)  {   for(intj=colͲoffset;j<=col+offset;j++)   {    indexl=i*width+j;    indexr=i*width+j+l;    weightl=weightingFn(j,i,col,row,width,height,0);    weightr=weightingFn(j+l,i,col+l,row,width,height,1);        weights_l[count]=weightl;    weights_r[count]=weightr;    count++;        if(i>=0&&i<height&&j>=0&&j<width)     suml=suml+images.left[indexl]*weightl;    if(i>=0&&i<height&&j+l>=0&&j+l<width)     sumr=sumr+images.right[indexr]*weightr;   }  }    doubleleftAvg=suml/normConst(weights_l,wPixels);  doublerightAvg=sumr/normConst(weights_r,wPixels);  returnncc(p,l,row,col,width,height,weights_l,weights_r,leftAvg,rightAvg,offset);  } 84 ImageIO.h:  #ifndef__IMAGEIO_H__ #define__IMAGEIO_H__  #include<stdio.h> #include<stdlib.h> #include<math.h> #include<string.h> #include<time.h> #include<iostream> #include<fstream>  #include"LocalCost.h"  classImageIO { public:  ImageIO(intheight,intwidth);  virtual~ImageIO(void);  voidopenImage(char*leftn,char*rightn);  voidswapImage(void);  voidwriteData(char*name,int*data,intnum_pixels);  private:  inth,w; };  #endif              85 ImageIO.cpp: #include"stdafx.h" #include"ImageIO.h"  usingnamespacestd; externLocalCost*lc;  ImageIO::ImageIO(intheight,intwidth) {  w=width;  h=height; }  ImageIO::~ImageIO(){}  voidImageIO::openImage(char*leftn,char*rightn) {  intvalue;  ifstreaminFile;     inFile.open(leftn);  if(!inFile)  {   cout<<"Unabletoopenfile";   exit(1);  }    for(inti=0;i<w*h;i++)  {   inFile>>value;   ::lcͲ>images.left[i]=log((double)value);  }  inFile.close();   inFile.open(rightn);  if(!inFile)  {   cout<<"Unabletoopenfile";   exit(1);  }    for(inti=0;i<w*h;i++) 86  {   inFile>>value;   ::lcͲ>images.right[i]=log((double)value);  }  inFile.close(); }  voidImageIO::swapImage() {  doubletmp=0;  for(inti=0;i<w*h;i++)  {   tmp=::lcͲ>images.left[i];   ::lcͲ>images.left[i]=::lcͲ>images.right[i];   ::lcͲ>images.right[i]=tmp;  } }  voidImageIO::writeData(char*name,int*data,intnum_pixels) {  ofstreamoutFile;  outFile.open(name);  for(inti=0;i<num_pixels;i++)  {   outFile<<data[i];   outFile<<"\t";  }  outFile.close(); }              87 Segmentation.h:  #ifndef__SEGMENTATION_H__ #define__SEGMENTATION_H__  #include<stdio.h> #include<stdlib.h> #include<math.h> #include<string.h> #include<time.h> #include<iostream> #include<fstream> #include<stack> #include<map>  classSegmentation { public:  Segmentation(intwidth,intheight);  virtual~Segmentation(void);  intAssignLabel(int*disp);  voidInvalidateSmallSegments(int*disp,intlabel);   int*seg;  private:  intw,h;  intMIN_REGION_SIZE; };  #endif            88 Segmentation.cpp:  #include"stdafx.h" #include"Segmentation.h" usingnamespacestd;  Segmentation::Segmentation(intwidth,intheight) {  w=width;  h=height;  MIN_REGION_SIZE=150;  seg=newint[w*h]; }  Segmentation::~Segmentation() {  delete[]seg; }  intSegmentation::AssignLabel(int*disp) {  intlabel=0;  intleft=Ͳ999;  intup=Ͳ999;  intp=0;    for(inti=0;i<h;i++)  {   for(intj=0;j<w;j++)   {    p=disp[i*w+j];        if(p==0)    {     seg[i*w+j]=0;     continue;    }        if(i>0)     up=disp[(iͲ1)*w+j];    else     up=Ͳ999; 89     if(j>0)     left=disp[i*w+jͲ1];    else     left=Ͳ999;     if(abs(pͲleft)<3)     seg[i*w+j]=seg[i*w+jͲ1];    else    {     if(abs(pͲup)<3)      seg[i*w+j]=seg[(iͲ1)*w+j];     else     {      label=label+1;      seg[i*w+j]=label;     }    }    if(abs(pͲleft)<3&&abs(pͲup)<3&&abs(leftͲup)<3&&seg[(iͲ1)*w+j]!= seg[i*w+jͲ1])    {     intmin=0;     intmax=0;     //mergethetwolabels     if(seg[(iͲ1)*w+j]>seg[i*w+jͲ1])     {      max=seg[(iͲ1)*w+j];      min=seg[i*w+jͲ1];     }     else     {      min=seg[(iͲ1)*w+j];      max=seg[i*w+jͲ1];     }     for(intm=0;m<=i;m++)      for(intn=0;n<=j;n++)      {       if(seg[m*w+n]==max)        seg[m*w+n]=min;      }    }   } 90  }  returnlabel; }  voidSegmentation::InvalidateSmallSegments(int*disp,intlabel) {  stack<int>*index=newstack<int>[label];  int*count=newint[label];  intpIndex=0;  intl;   for(inti=0;i<label;i++)  {   count[i]=0;  }   //countthesizeofeachsegment  for(inti=0;i<h;i++)  {   for(intj=0;j<w;j++)   {    pIndex=i*w+j;    l=seg[pIndex];    count[l]++;    if(count[l]<MIN_REGION_SIZE)     index[l].push(pIndex);   }  }   //checkinvalidesegments  for(inti=0;i<label;i++)  {   if(count[i]<MIN_REGION_SIZE)   {    while(!index[i].empty())    {     disp[index[i].top()]=0;     index[i].pop();    }   }  } } 91 PostProc.h:  #ifndef__POSTPROC_H__ #define__POSTPROC_H__  #include<stdio.h> #include<stdlib.h> #include<math.h> #include<string.h> #include<time.h> #include<iostream> #include<fstream> #include<vector>  #include"Segmentation.h"  classPostProc { public:  PostProc(intwidth,intheight,intnum_labels);  virtual~PostProc(void);  voidRemoveSmallAreas(intnum_labels,char*filename,int*disp);  voidwriteData(char*name,int*data,intnum_pixels);  voidtxtRead(char*name,int*data,intnum_pixels);  voidMatchLeftRightDisp(int*displ,int*dispr,int*disp);  voidMatchRightLeftDisp(int*displ,int*dispr,int*disp);  //voidgetDisp(int*d);  //voidgetValidLabels(int*labels,intnum_labels);  private:  int*pixels,*flag;//*disp,  intw,h,min_region;  //voidMatchCbCrDisp(int*dispcb,int*dispcr); };  #endif       92 PostProc.cpp:  #include"stdafx.h"  #include"PostProc.h" #include"ImageIO.h"  usingnamespacestd;  PostProc::PostProc(intwidth,intheight,intnum_labels) {  w=width;  h=height;  //disp=newint[h*w];  min_region=1000;  pixels=newint[min_region];  flag=newint[num_labels];  }  PostProc::~PostProc() {  //delete[]disp;  delete[]pixels;  delete[]flag; }  voidPostProc::txtRead(char*name,int*data,intnum_pixels) {  intvalue;  ifstreaminFile;     inFile.open(name);  if(!inFile)  {   cout<<"Unabletoopenfile";   exit(1);  }    for(inti=0;i<num_pixels;i++)  {   inFile>>value; 93   data[i]=(int)value;  }  inFile.close(); }  /*voidPostProc::getDisp(int*d) {  for(inti=0;i<h*w;i++)  {   d[i]=disp[i];  } }*/  /*voidPostProc::getValidLabels(int*labels,intnum_labels) {  for(inti=0;i<num_labels;i++)   labels[i]=0;   for(inti=0;i<h*w;i++)  {   if(disp[i]!=0)    labels[disp[i]]++;  } }*/  voidPostProc::writeData(char*name,int*data,intnum_pixels) {  ofstreamoutFile;  outFile.open(name);  for(inti=0;i<num_pixels;i++)  {   outFile<<data[i];   outFile<<"\t";  }  outFile.close(); }  voidPostProc::MatchLeftRightDisp(int*displ,int*dispr,int*disp) {  intnum_pixels=w*h;  intdl=0;  intdr=0; 94    for(inti=0;i<h;i++)  {   for(intj=0;j<w;j++)   {    dl=displ[i*w+j];    dr=dispr[i*w+j+dl];    if(abs(dlͲdr)<3)     disp[i*w+j]=dl;    else     disp[i*w+j]=0;   }  }  //outputthematcheddisparitymap  //writeData("disp_statue.txt",disp,h*w); }  voidPostProc::MatchRightLeftDisp(int*displ,int*dispr,int*disp) {  intnum_pixels=w*h;  intdl=0;  intdr=0;    for(inti=0;i<h;i++)  {   for(intj=0;j<w;j++)   {    dr=dispr[i*w+j];    if(jͲdr<0)     disp[i*w+j]=0;    else    {     dl=displ[i*w+jͲdr];     if(abs(dlͲdr)<3)      disp[i*w+j]=dr;     else      disp[i*w+j]=0;    }   }  }  //outputthematcheddisparitymap  //writeData("disp_statue.txt",disp,h*w); 95 }  voidPostProc::RemoveSmallAreas(intnum_labels,char*filename,int*disp) {  Segmentation*segProc=newSegmentation(w,h);  intnum_l=segProcͲ>AssignLabel(disp);  cout<<"Numberoflabelsis"<<num_l<<endl;  segProcͲ>InvalidateSmallSegments(disp,num_l+1);   //outputthematcheddisparitymap  //writeData("statue_labels.txt",segProcͲ>seg,h*w);    //outputthematcheddisparitymap  writeData(filename,disp,h*w);  deletesegProc; }                           96 GenerateHDR.m  functionhdrImg=generateHDR(dimg1,dimg2,nleft,nright,disp,height,width,c)  num_pixels=width*height;  left_image=reshape(textread(nleft,'%d',num_pixels),width,height)'; right_image=reshape(textread(nright,'%d',num_pixels),width,height)';  left_image=double(left_image)/255; right_image=double(right_image)/255;  hdrImg=dimg1; %generateHDRimage form=1:size(dimg1,1) forn=1:size(dimg2,2)Ͳ30 r1=dimg1(m,n); ifr1<0 r1=0; end l1=left_image(m,n);  r2=dimg2(m,n); ifr2<0 r2=0; end if(n+disp(m,n))<=width l2=right_image(m,n+disp(m,n)); else r2=0; l2=0; end  w1=r1/diff(c,l1); w2=r2/diff(c,l2); ifr1>0||r2>0 hdrImg(m,n)=(r1*w1+r2*w2)/(w1+w2); else hdrImg(m,n)=0; end end end 97  functionv=diff(c,b) power=length(c); v=0; forp=2:power v=v+c(p)*b^(pͲ2); end v=abs(v);   ToneMapping.m  functionhdr=toneMapping(img) height=size(img,1); width=size(img,2); Lw=exp(sum(sum(log(img+10^Ͳ6)))/width/height); L=0.18*img/Lw; Ld=L./(1+L); Ld=Ld/max(max(Ld)); hdr=uint8(Ld*255); %hdr=uint8((0.9Ͳpower(LdͲ1,2))*255);

Cite

Citation Scheme:

    

Usage Statistics

Country Views Downloads
China 10 3
United States 4 3
Singapore 1 2
City Views Downloads
Beijing 10 0
Ashburn 3 0
Unknown 2 2

{[{ mDataHeader[type] }]} {[{ month[type] }]} {[{ tData[type] }]}

Share

Share to:

Comment

Related Items