Single Exposure High Dynamic Range Imaging with a Conventional Camera Using Cross-Screen Filters

by Mushfiqur Rouf
M.Sc., Bangladesh University of Engineering and Technology, 2007
B.Sc., Bangladesh University of Engineering and Technology, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
October 2009
© Mushfiqur Rouf 2009

Abstract

Real-world scenes often contain both bright and dark regions, resulting in a high contrast ratio beyond the capabilities of conventional cameras. For such scenes, High Dynamic Range (HDR) images can be captured with expensive hardware or by taking multiple exposures of the same scene. However, these methods cost extra resources: either spatial or temporal resolution is sacrificed, or more than one piece of hardware is needed. In this thesis, a novel technique is presented that is capable of capturing High Dynamic Range images in only one exposure of a conventional camera. We observe that in most natural HDR images, only 2–5% of the pixels are too bright, compared to the rest of the scene, to fall inside the dynamic range of a conventional camera. Our method spreads energy from these bright regions into the neighboring unsaturated pixels by defocus blurring. Bright pixels still get clipped in the captured image due to saturation of the sensor, but some information about these clipped pixels gets encoded, or multiplexed, in the form of superimposed glare patterns in the image. Frequency preservation and decoding of this information can be further improved by using a cross-screen filter instead of defocus blur. The superimposed glare patterns are recovered with the help of natural image statistics. These glare patterns provide information about how much energy there is in the saturated pixels, which allows a tomography-like reconstruction of the saturated regions.
Once the saturated regions are known, the rest of the image can be restored by removing the estimated glare patterns.

Table of Contents

Abstract
Table of Contents
List of Figures
Acknowledgements

1 Introduction
  1.1 High Dynamic Range Photography
  1.2 Motivation
  1.3 Objective and Scope
  1.4 Outline
  1.5 Symbols

2 Literature Review
  2.1 HDR Image Acquisition
    2.1.1 Nonlinear Sensors
    2.1.2 Multiple Exposures
    2.1.3 Spatially Varying Techniques
    2.1.4 Multiple Exposures for Video
  2.2 Multiplexing Additional Scene Information
  2.3 Restoration of Clipped Signal
  2.4 Deconvolution

3 HDR by Deconvolution
  3.1 Model
  3.2 Clear Aperture
    3.2.1 Motivation
    3.2.2 Method
    3.2.3 Artificial PSFs
    3.2.4 Results and Analysis
    3.2.5 Observations
  3.3 Split Aperture
    3.3.1 Motivation
    3.3.2 Method
    3.3.3 Results
  3.4 Observations

4 HDR with Cross-Screen Filters
  4.1 Motivation and Overview
    4.1.1 Chapter Outline
  4.2 Image Formation with Cross-Screen Filters
    4.2.1 Split Kernel
  4.3 Limited Interaction Model
  4.4 Method
  4.5 Removing Unsaturated Pixel Interactions
  4.6 Glare Estimation
    4.6.1 Model of Glare
    4.6.2 Estimating Line Integrals
  4.7 Recovering Saturated Regions
  4.8 Removing Glare

5 Results
  5.1 Environment
  5.2 Simulated Test Cases
    5.2.1 Effect of Noise
  5.3 Effect of Scene Brightness
    5.3.1 Day Shots
    5.3.2 Night Shots
  5.4 Effect of Smoothness Regularizer
  5.5 Failure Case: Napa Valley Image
  5.6 Limitations

6 Conclusion
  6.1 Conclusion
  6.2 Future Work
    6.2.1 Additional Constraints for Recovering Saturated Regions
    6.2.2 Generalized Hough Transform
    6.2.3 Better PSF Estimation
    6.2.4 Automatic Glare Orientation Detection
    6.2.5 HDR Video

Bibliography

List of Figures

1.1 LDR vs HDR
1.2 Dynamic range of different systems
1.3 Bright feature statistics
1.4 Light sources are very bright compared to the rest of the scene
2.1 Multiexposure HDR reconstruction example
3.1 Clear aperture (disk) filter
3.2 Using artificial clear aperture
3.3 Clear aperture result
3.4 Clear aperture result for a large blur radius
3.5 Clear aperture method on cases with clipping
3.6 Concept of split aperture approach
3.7 Simulated split aperture filter
3.8 Split aperture synthetic test case
4.1 Imaging with a cross-screen filter
4.2 Glare patterns give aggregate information
4.3 Different glare rays give different aggregate views of the saturated region
4.4 A 4-point cross-screen filter
4.5 How blurring occurs due to a 2-point cross-screen filter
4.6 Cross-screen filter profile
4.7 Cross-screen filter profile estimation
4.8 Black level in filtered images
4.9 Glare is in focus
4.10 Split kernel
4.11 Variation in strengths of glare rays
4.12 Limited interaction
4.13 Saturated and unsaturated pixels
4.14 Split kernel convolution when clipping occurs
4.15 Four kinds of pixel interactions
4.16 Glare contribution from one side of an unsaturated region
4.17 Glare contribution
4.18 Left-right agreement constraint
5.1 Simulated test case
5.2 Simulated test case
5.3 Effect of additive noise and quantization
5.4 Effect of noise
5.5 Day shot example
5.6 Removing glare from a night shot
5.7 Removing glare from a night shot
5.8 Recovering hidden detail
5.9 Effect of smoothness regularizer
5.10 Failure case: Napa Valley image
6.1 Edge-like tomography artifacts
6.2 Generalized Hough transform

Acknowledgements

Dr. Wolfgang Heidrich, my supervisor, gave me the great opportunity of working with him, and offered constant guidance and encouragement. The almost insurmountable challenges he placed before me made me even more interested in pursuing a PhD.

The second reader, Dr. Uri Ascher, read through my thesis and gave valuable advice.

Dr. Rafał Mantiuk deserves the credit for the idea of using cross-screen filters. He worked hard throughout this project, provided new ideas and answered my questions on numerous occasions. He is also the one who formalized most of my raw intuitive ideas into rigorous mathematics.

Dr. Ivo Ihrke provided valuable input regarding the tomography-like reconstruction problem.

I am also indebted to my predecessors, Matthew Trentacoste and Cheryl Lau, for their invaluable contribution in laying the groundwork of this research project.

Benjamin Cecchetto, Cheryl Lau and Allan Rempel helped me with revising the thesis, correcting the writing style and formal writing in general.

My previous M.Sc. supervisor at BUET, Dr. Mostofa Akbar, an Associate Professor in the Dept. of CSE, has always been an invaluable source of inspiration and support.
Jamiur Rahman, one of my students at BRAC University, Bangladesh, made me realize that Computer Graphics is the research area I should focus on.

I got the aspirations and inspirations for pursuing my goals in academic research from my mother, Begum Shirin Akhtar, the first female computer programmer in Bangladesh, and from my father, Dr. Md. Abdur Rouf, a Professor of Civil Engineering at BUET. My wife, Maliha Sultana, has always been understanding, encouraging and, above all, the best of friends. She taught me why and how to have an optimistic view towards everything in life. My sister Farzana Rouf (Alice) and my brother Shahriar Rouf (Nafi) always kept my morale high. My uncle Md. Abdur Rafiq introduced me to the fantastic world of Mathematics by giving me the book Mathematics Can Be Fun by Y. Perelman. These wonderful people taught me how to be independent, honest and confident, and always emphasized learning the answers to the how and why questions, which I believe laid the foundations of my interest in pursuing a research career.

I have to thank the people behind the Battlestar Galactica (reimagined) show for making such a wonderful show, which helped me get through some bad times.

Finally, I would like to thank all my friends, and foes, for making me what I am today.

Chapter 1
Introduction

Figure 1.1: LDR vs HDR. (a) LDR image; (b) tonemapped HDR image. A lot of detail can be missing in a photograph taken by a conventional camera. The HDR image "Atrium Night" has been taken from http://www.mpiinf.mpg.de/resources/hdr/gallery.html, courtesy of Frédéric Drago. Greg Ward's tonemapping operator has been used.

The term dynamic range indicates the ability of an image capture or display device to adequately image the bright and dark regions of a scene. Informally, an image (a device) is high dynamic range or HDR (capable) if it has (can capture or display) details in the bright regions of the image as well as in the dark regions.
Figure 1.2: Dynamic range of different systems (LCD, CRT screen, projection, cameras, the human eye both simultaneous and over time, and natural scenes), plotted on a log2 dynamic range axis.

Although brightness or luminance is a continuous quantity, a minimum brightness or luminance is required for an object to be perceivable. The ratio of the luminance of the brightest perceivable object to that of the darkest gives the dynamic range of a scene. To capture or display every perceivable detail in a scene, an imaging device must be capable of handling a wide dynamic range.

Natural scenes can have both very bright objects (such as light sources and specular reflections) and very dark regions (such as objects in shadow), which results in a large dynamic range, often on the order of 10^8:1. Due to limitations of optical systems and hardware, imaging devices often have a limited dynamic range. This limitation is expressed as having a low dynamic range or LDR. The images they capture do not have HDR information, and therefore are LDR images. Conventional digital sensors have a dynamic range of about 10^3.5:1, while the human visual system typically has a 10^2:1 simultaneous dynamic range, which can go up to 10^6:1 over time [20, 24] (Figure 1.2). Clearly, conventional cameras cannot capture all the details that can be seen with the naked eye.

Not surprisingly, display devices also often have a limited dynamic range, and therefore an HDR image, when presented on an LDR device, needs to be tone-mapped [21] to make it fit into the limited dynamic range such that details are not lost in different parts of the image (Figure 1.1). Evidently, HDR images better match real-world experience. That is why research around HDR capture and display devices has become a center of attention in the last decade.
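To relate these contrast ratios to the log2 axis of Figure 1.2, they can be converted to photographic stops (log2 units). A small illustrative computation, not taken from the thesis:

```python
import math

def stops(contrast_ratio):
    """Express a contrast ratio in log2 units ("stops"), as on the
    horizontal axis of Figure 1.2."""
    return math.log2(contrast_ratio)

camera = stops(10 ** 3.5)  # conventional digital sensor: about 11.6 stops
scene = stops(10 ** 8)     # bright natural scene: about 26.6 stops
```

The roughly 15-stop gap between the two is the shortfall that HDR capture techniques have to make up.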
Technologies are available today that can fulfill this goal, but they are still either too expensive or too impractical to be as ubiquitous as LDR technologies are today. This thesis addresses the HDR image acquisition problem and proposes a new method to capture HDR images using conventional LDR cameras, with a view to making this technology accessible today.

1.1 High Dynamic Range Photography

As long as the dynamic range of the scene is smaller than that of the camera, the scene can be captured without any quality degradation. However, for larger scene dynamic ranges, special techniques need to be used to capture the full dynamic range. Otherwise bright regions may saturate and hence get clipped, while locations that are too dark produce numerically zero values. In both cases, image detail is lost and cannot be reconstructed.

The dynamic range of an image is defined as the ratio of the brightest pixel value to the smallest nonzero pixel value, or to the smallest noticeable difference [24]. Since image acquisition techniques inherently incur noise, the smallest pixel value does not always give a reasonable contrast ratio. Instead, the smallest pixel value greater than the expected noise may be used as the smallest discernible value [24]. Note that one of the components of the noise incurred by capture devices is photon shot noise, which is Poisson noise and therefore related to the brightness of a pixel [9].

However, when HDR is mentioned, the following qualities are loosely referred to:

• High contrast ratio: bright areas need to be captured, as well as the dark regions. Black should appear black and bright objects should not look dim.

• High bit depth: enough to encode values with quantization levels as small as the just noticeable difference, so that no stepping is visible in a smooth color/intensity gradient.
It should be noted that the human visual system has a close to logarithmic response curve; therefore, encoding the captured luminances in a way that makes the best use of the bits is implied. For example, the quantization levels at lower intensities should have finer granularity than those at higher intensity levels [12].

• Details are preserved: there is no or little clipping due to over- or under-saturation.

1.2 Motivation

Figure 1.3: Bright feature statistics. The graphs show the dynamic range necessary to adequately capture different percentages of image pixels (minimum, quartiles and maximum over a set of images): (a) 10–100% of pixels; (b) zoomed into the 95–100% region, which clearly shows that the brightest 5% of pixels are responsible for increasing the dynamic range from about 10^3.5:1 to about 10^6:1 in most cases. A few well-known HDR images were used to produce these statistics.

Natural scenes often contain certain bright features such as light sources, specular reflections and highlights. These features are much brighter than the rest of the scene (often on the order of 1000:1), but they occupy very few pixels (2–5%) in an image (Figure 1.3). A typical example is shown in Figure 1.4.

Figure 1.4: Light sources are very bright compared to the rest of the scene; usually they also take up only a few pixels in the image. (a) shows the full histogram of 2,316,480 values.
(b) shows a simulated long exposure, which captures over 98.8% of the pixel data without clipping. (c) shows a short exposure, which captures the remaining pixels that comprise the highlight. The histograms below the simulated exposures (b) and (c) show the part of the full histogram covered by each exposure.

While HDR capture is important for visually pleasing images, the human visual system cannot perceive details in highlights because of intra-ocular light scatter and limitations in local adaptation [20]; therefore, capturing highlights very accurately is rarely needed. Also, there are applications where accurate highlight detail is not necessary but knowledge of the overall energy can be useful, such as global illumination using an environment map.

Single-exposure dynamic range can only be increased at the cost of a lower signal-to-noise ratio. Other existing methods of capturing HDR images either require more images to be captured at different exposure settings, or divide the sensor array into multiple sub-arrays with different cell sizes or neutral density filtering (Section 2.1). These observations led to this research on developing a method that does not use up more resources, yet can perform reasonably well under certain conditions.

1.3 Objective and Scope

The objective of this thesis is to capture High Dynamic Range images with only one exposure of a conventional camera. We propose a method that takes a computational photography approach towards HDR imaging. We modify the light path such that some additional information is encoded or multiplexed in the image. A postprocessing step is then required to demultiplex this information. Our postprocessing step results from the observation that some of the information present in a natural image is redundant, and a suitable prior, for example the sparse gradient prior for natural images [15, 28], can help reconstruct an image, to some extent, if part of the information is missing.
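The sparse gradient prior can be illustrated in a few lines: natural images tend to consist of smooth regions separated by a few sharp edges, so a penalty on |∇i|^0.8 scores one sharp edge lower than a gradual ramp with the same total change. A minimal NumPy sketch (the function and the toy images are illustrative, not from the thesis):

```python
import numpy as np

def sparse_gradient_penalty(img, p=0.8):
    """Sum of |gradient|^p over the image; p < 1 favors the sparse,
    heavy-tailed gradient distributions typical of natural images."""
    dx = np.diff(img, axis=1)  # horizontal finite differences
    dy = np.diff(img, axis=0)  # vertical finite differences
    return np.sum(np.abs(dx) ** p) + np.sum(np.abs(dy) ** p)

step = np.zeros((8, 8))
step[:, 4:] = 1.0                                 # one sharp edge per row
ramp = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))  # gradual transition
# Both rise by 1 across each row, but the sharp edge is cheaper under p < 1:
assert sparse_gradient_penalty(step) < sparse_gradient_penalty(ramp)
```

This preference for a few large gradients over many small ones is what lets the prior fill in plausible image content where observations are missing.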
A tomography-like reconstruction technique will be used to reconstruct the highlight. Only a very small number of views (typically 8) will be used; this degrades the conditioning of the system to be solved, which puts a restriction on the size of the high-intensity regions that can be recovered.

Recovering the complete image information is very hard, if not impossible, in cases where the saturated region is large. In our proposed work, preference is given to producing a visually pleasing result and failing gracefully, rather than producing bad artifacts.

This thesis does not address under-saturation. Regions that are too dark give no information, and hence they cannot be recovered using the multiplexed-information approach. However, under-saturation can be avoided with a sufficiently long exposure.

1.4 Outline

The rest of the thesis is organized as follows. Related work is discussed in Chapter 2. The fundamental concept behind the proposed method is described in Chapter 3, along with a few failed attempts that we pursued at the onset of the project and their analysis. The proposed method is detailed in Chapter 4, which is followed by result analysis in Chapter 5. Finally, Chapter 6 points to a few possible future directions of research and concludes the thesis.

1.5 Symbols

Throughout the thesis, bold uppercase letters (A, B, ...) denote matrices, and bold lowercase letters (a, b, ...) denote vectors. The same letter is reserved for the same entity: for example, J denotes the observed image and j denotes its vectorized form. We use column-major ordering for vectorizing matrices. Scalars are denoted by italicized lowercase letters (a, b, ...), while sets are denoted by uppercase blackboard bold letters (A, B, ...). Other specific symbols are defined as they are first used in the text.

Chapter 2
Literature Review

This chapter presents a literature review of related areas.
A brief analysis of high dynamic range imaging techniques is given in Section 2.1 to describe the context of this research. Since the method proposed in this thesis employs the general idea of encoding or multiplexing scene information in the image, similar publications are listed in Section 2.2. The method also relates closely to image restoration and enhancement, so a brief summary of relevant research is presented in Section 2.3.

2.1 HDR Image Acquisition

Conventional cameras act as photon counters and therefore have a linear response. Human perception, by contrast, exhibits an approximately logarithmic response, a consequence of the Weber-Fechner law; this is due to perceptual adaptation mechanisms. A comprehensive quantitative analysis and comparison of HDR image acquisition techniques can be found in [7].

2.1.1 Nonlinear Sensors

Nonlinear sensors, for example logarithmic sensors, can also capture high dynamic range information. The logarithmic compression is achieved by utilizing the exponential I-V characteristics of MOS transistors in the subthreshold region. While linear sensors accumulate charge over an exposure period, logarithmic sensors directly convert photocurrent to voltage for readout. This very non-integrating property limits the maximum possible signal-to-noise ratio (SNR) [7].

2.1.2 Multiple Exposures

Conventional digital cameras can only capture at most a 10^3.5:1 contrast ratio [23]. One solution to obtain an HDR image is to take photos at different exposure settings and blend them together. Longer exposures capture details in the dark areas of an image while shorter exposures capture the bright areas (Figure 2.1). Although this method was known even before the mid-twentieth century [21, 8], the underlying mathematics was first formalized by Mann and Picard [19]. Debevec and Malik [6] introduced this method to the Computer Graphics community.
They gave a method to recover an HDR image and the camera response curve simultaneously. An improved method was published later by Robertson et al. [31]. Both of these methods compute the camera's response function, and combine all unsaturated samples of the same image pixel weighted by some noise model, instead of using a simple "last sample before saturation" rule, which would produce artifacts.

Figure 2.1: Multiexposure HDR reconstruction example. The graph below each image shows the log-log response curve (log value vs. log luminance) of that image. The first column shows three different exposures, from lowest to highest exposure time. The second column shows the images after clipping values close to 0 or 1, and then scaling the values according to their exposure time. The image in the last column shows the HDR image after merging these three LDR images; the dynamic range has been compressed by a simple linear tone mapping.

However, the biggest limitation of this process is that the scene needs to remain static and lighting conditions have to remain the same throughout an exposure sequence. Even subpixel movements of the camera require proper image alignment before merging the exposures into an HDR image. These approaches are also limited by ghosting and related misalignment problems, which are still largely unsolved for general cases [22].

2.1.3 Spatially Varying Techniques

On the other extreme of this space-time tradeoff lie spatially varying exposure techniques [25], or assorted pixels [26]. Such a technique emulates multiple capture but sacrifices spatial resolution. It modifies the Bayer pattern by adding an array of neutral density (ND) filters. Adjacent pixels are thus exposed differently, giving a number of lower-resolution images, each with a different exposure setting and offset by at least one pixel.
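The weighted multi-exposure merging of Section 2.1.2 can be sketched in a few lines; the hat-shaped weighting function below is a simplified stand-in for the noise-model-based weights of [19, 31], and all names are illustrative:

```python
import numpy as np

def merge_exposures(images, times, eps=1e-6):
    """Merge linear LDR exposures (values in [0, 1]) into one HDR radiance
    map: divide each sample by its exposure time and average with a hat
    weight that distrusts values near 0 (noise) and near 1 (clipping)."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for img, t in zip(images, times):
        w = 1.0 - np.abs(2.0 * img - 1.0)  # zero weight at 0 and at 1
        num += w * img / t                 # radiance estimate = value / time
        den += w
    return num / np.maximum(den, eps)

# Two simulated exposures of a two-pixel scene with radiances [0.2, 5.0]:
scene = np.array([0.2, 5.0])
long_exp = np.clip(scene * 1.0, 0.0, 1.0)   # the bright pixel clips
short_exp = np.clip(scene * 0.1, 0.0, 1.0)  # the bright pixel fits
hdr = merge_exposures([long_exp, short_exp], [1.0, 0.1])  # ~[0.2, 5.0]
```

Each pixel of the result is dominated by whichever exposure recorded it well inside the sensor's range, which is exactly why the exposures must be aligned and the scene static.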
The space-time tradeoff can be won only if the exposure setting can be manipulated on a per-pixel basis. Direct readout limits SNR, as in logarithmic sensors; instead, synchronous (and asynchronous) self-reset with multiple capture methods [7] read out values to update per-pixel exposure settings such that the pixels do not get saturated. If a pixel is saturated, its exposure interval is halved, the pixel reading is discarded, and the pixel is exposed again. However, these techniques suffer from spatial incoherence in the image if bright objects in the scene are moving too fast.

2.1.4 Multiple Exposures for Video

Conventional video cameras have the same dynamic range limitation as still cameras. As a result, trivial HDR video acquisition solutions can be constructed using multi-exposure techniques; however, since a number of frames have to be captured in rapid succession, either a high-speed camera has to be used, with interleaved frames having different exposure or aperture settings, or a number of cameras can be combined using a beam splitter to form a compound HDR camera.

Kang et al. [11] presented a technique to capture HDR video with off-the-shelf camcorders. They capture a sequence of video frames while rapidly changing the exposure settings. Since the frames at different exposure settings are captured at slightly different points in time, they use global and local registration schemes to warp the images before combining color information in dark and bright regions.

2.2 Multiplexing Additional Scene Information

Images often contain redundant information which can be predicted reasonably well using suitable prior information. Since the total amount of information (e.g., in terms of number of bits) per image remains the same, some redundant information has to be given up in order to make room for additional information about the scene.
Additional information of interest can include, for example, the depth or velocity of objects. The fluttered shutter approach of Raskar et al. [29] encodes motion by opening and closing the shutter in a binary pattern. The idea is to produce a blur pattern that preserves high frequencies and therefore can be inverted. Levin et al. [17], on the other hand, achieve motion invariance by driving the camera in a parabolic motion along a line parallel to the motion of the object of interest.

The recovery process is an inverse problem and, in most cases, it is heavily underdetermined. To improve the overall conditioning of the problem, the captured information can be shaped such that a slightly wrong reconstruction amplifies artifacts and makes them detectable or measurable. Finding the solution then simply amounts to finding the parameter values that produce the result with the smallest artifacts.

Levin et al. [15] multiplex depth by utilizing defocus blur and a coded aperture. To measure depth, they use the defocus blur kernel, whose radius at different parts of the image is related to the depth of the scene in that part. They designed a binary patterned aperture filter that increases the ringing artifacts in a reconstructed image when it is deconvolved with a point spread function of the wrong radius. This paper also uses the sparse gradient prior for natural images: the heavy-tailed distribution of natural image gradients is approximated by optimizing the L0.8 norm of the image gradients. The deconvolution technique, which utilizes Iteratively Reweighted Least Squares, is described in detail in [15, 28].

Bando et al. [4] do digital refocusing with a single image. They perform color image segmentation and apply their proposed blur radius estimation method to measure the depth of each segment. They look for the maximum blur radius that produces ripple artifacts with an oscillation measure below some threshold.
However, their method was found to fail on images with clipping due to overexposure.

2.3 Restoration of Clipped Signal

Probably the most challenging kind of image distortion to restore is clipping due to over- or under-saturation. Saturation occurs due to the insufficient dynamic range of a sensor. For band-limited 1D signals, reconstruction algorithms have been proposed for the cases when the number of missing samples is low [1] and when a statistical model of the undistorted signal is known [27]. However, neither of these approaches can be trivially extended to images, because of the difficulties in modeling the statistics of complex images, and especially of clipped image features.

In the case of color images, pixels that are clipped in only one or two color channels can be faithfully estimated using cross-channel correlation [34]. This technique models the pixel values as a single three-dimensional Gaussian distribution. More complex techniques that utilize the linear color model [10] to model an image's pixel color distribution as a combination of Gaussians are also possible.

Inpainting techniques [5, 32, 33], although designed to fill in missing pixels, are not well suited for restoration of a clipped signal, since they tend to smooth out (interpolate) the missing pixels, which, on the contrary, should be much brighter than the neighboring pixels used for interpolation.

2.4 Deconvolution

Image deconvolution is a well-explored area, yet a generally satisfactory algorithm is yet to be developed. Richardson [30] and Lucy [18] gave the first deconvolution algorithms based on Bayesian statistics. With stronger prior knowledge, better deconvolution methods can be designed. One such algorithm for natural images is due to Levin et al. [15, 14]; they used the sparse gradient prior for natural images [15, 28]. Joshi et al.
[10] used the color line model, a local color statistics model, in combination with the sparse gradient prior for deconvolving and denoising blurred images. Chapter 3 HDR by Deconvolution It is often the case that saturated regions are small in size. The energy in these regions is usually too high to capture with a conventional camera. However, many of the high bits at the neighboring unsaturated pixels go unused, simply because the pixel values there are not large enough to use the full bit depth. Information about the high-energy regions can be transferred in a systematic way to these neighboring unsaturated areas, and these unused bits can be used to hold the extra information, provided that the original information can be separated from the encoded information at these pixels. In theory, if information about saturated regions can be spread to neighboring unsaturated areas without destroying any underlying information, the process can be reversed and the saturated regions can be reconstructed. One way of performing this systematic transfer is by convolution. Convolution can spread the energy to neighboring areas, and reconstructing the original image can then be posed as a deconvolution problem [13]. Convolution is a linear operation that can easily be applied with a camera, for example, by defocus or motion. However, a suitable blur kernel is needed so that the convolution can be inverted to recover the original image. 3.1 Model An observed image J can be expressed as a convolution of the (unaffected) original image I with the convolution kernel H, J = H ⊗ I + N, (3.1) where N is noise. In matrix notation, j = H i + n, (3.2) where i, j and n are the vectorized forms of I, J and N. In our problem, noise will be assumed to be negligible and will be ignored in the derivations that follow. The idea here is to deconvolve J in order to compute I.
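As a concrete illustration of this forward model (a sketch only; the kernel weights and brightness values below are hypothetical, not the thesis's measured data), Equation 3.1 followed by sensor clipping can be simulated with NumPy and SciPy:

```python
import numpy as np
from scipy.signal import convolve2d

def capture(I, H, sat_level=1.0):
    """Simulate J = H (*) I followed by sensor clipping (noise ignored,
    as in the derivations of Section 3.1)."""
    J = convolve2d(I, H, mode="same", boundary="symm")
    return np.minimum(J, sat_level)

# A single highlight 10x brighter than the saturation level, blurred by a
# hypothetical kernel with a strong central peak and a weak surround.
I = np.zeros((9, 9))
I[4, 4] = 10.0
H = np.full((5, 5), 0.1 / 24)   # 10% of the energy spread to neighbours
H[2, 2] = 0.9                   # 90% passes straight through
J = capture(I, H)
# The centre pixel clips to 1.0, but the spread energy survives in the
# unsaturated neighbours and encodes the highlight's true brightness.
```

The example shows the central observation of this chapter: the clipped pixel itself is lost, but the convolution leaves a measurable trace of its true energy in the unsaturated neighbourhood.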
This can be done by solving the above system of linear equations under the assumption of some noise distribution. There are, however, several fundamental difficulties with this approach: • Deconvolution is an inverse problem and therefore ill-posed. This problem can be solved, at least partially, by using a frequency-preserving convolution kernel. • The solution may enhance noise, since deconvolution is inherently a sharpening operation and noise is a high-frequency component. • Each pixel of J contributes one equation: the pixel value J(x, y) is expressed as a linear sum of pixel values of the original image I. However, since observed values are clipped at saturated pixels, this linear relation breaks down there. These pixels do not give valid constraints, and hence they cannot contribute towards the problem formulation. With clipping there are therefore even fewer equations, and the problem is underdetermined. Some prior information about the image can add more constraints to the system. For example, in natural images, gradients tend to show a heavy-tailed distribution [15, 28]; a smoothness prior that imposes this property on the solution can be added to the optimization target. Thus, in general, the problem can be formulated as an optimization problem, i = arg min_i ( ‖j_U − H_U i‖² + λ ‖∇i‖_0.8 ), (3.3) where j_U denotes the unsaturated observed pixels, H_U is a (possibly nonsquare) submatrix of H that produces values at only the unsaturated pixel locations, ∇ represents the gradient operator, and λ is the smoothness regularization coefficient. The second term above enforces the sparse gradient prior [15, 28]. Unfortunately, the sparse gradient prior cannot be utilized to recover a large volume of missing information. Thus this approach can only be expected to recover small saturated regions, or to produce a blurry reconstruction of saturated regions.
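Optimization problems of the form of Equation 3.3 are typically solved with Iterative Reweighted Least Squares [15, 28]. The following 1-D sketch is our own illustration of that structure (the matrix construction, parameter values, and test signal are not from the thesis): the L0.8 gradient term is handled by the standard IRLS reweighting, and clipped samples are simply dropped from the data term.

```python
import numpy as np

def irls_deconv(j, H, unsat, lam=1e-3, p=0.8, iters=10, eps=1e-4):
    """Minimize ||j_U - H_U i||^2 + lam * ||grad i||_p^p by IRLS.
    Clipped samples (unsat == False) contribute no data constraints."""
    n = len(j)
    HU, jU = H[unsat], j[unsat]          # keep unsaturated equations only
    D = np.diff(np.eye(n), axis=0)       # forward-difference gradient
    i = j.copy()
    for _ in range(iters):
        w = (np.abs(D @ i) + eps) ** (p - 2)   # IRLS weights for the L_p term
        A = HU.T @ HU + lam * D.T @ (w[:, None] * D)
        i = np.linalg.solve(A, HU.T @ jU)
    return i

# Mild 1-D circulant blur, no clipping: the solver sharpens j back towards x.
n = 32
x = np.zeros(n); x[10:20] = 1.0                 # piecewise-constant signal
H = np.zeros((n, n))
for r in range(n):
    H[r, (r - 1) % n] = 0.2; H[r, r] = 0.6; H[r, (r + 1) % n] = 0.2
j = H @ x
i_hat = irls_deconv(j, H, unsat=np.ones(n, bool))
```

Note how the reweighting makes flat regions heavily smoothed while large gradients are penalized only weakly, which is exactly the effect of the heavy-tailed prior.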
The proposed technique using cross-screen filters will be discussed at length in Chapter 4. However, before we formulated that method, we designed a few other methods that helped us improve our solution to the problem. The two most significant of these are the clear aperture method and the split aperture method. What we learned from these methods helped shape the proposed method, so they are worth discussing in their own right. The first approach uses a clear aperture to reduce diffraction (Section 3.2), while the second uses a split aperture to increase frequency preservation (Section 3.3). 3.2 Clear Aperture 3.2.1 Motivation Coded aperture approaches have proven useful in many publications; for example, [29] and [15] use binary coded apertures. However, for the purpose of capturing HDR images, these apertures introduce a number of limitations: • They create diffraction patterns, as they tend to introduce many edges into the aperture. • The full aperture is not used, hence the energy spread capacity is greatly reduced. Since energy spread is the central idea of the proposed method, blocking off a significant portion of the aperture must be considered a limitation. Since regular apertures can create the same kind of diffraction pattern due to the diaphragm blades [15] that control the size of the aperture of a camera, we used a clear aperture instead. Initially, we hypothesized that the diffraction patterns would be minimal. 3.2.2 Method Bando et al. [4] used clear aperture blurring, which they found not to work for images with clipping due to saturation. Knowing that, the initial target was to spread energy so that pixels having values greater than the saturation limit can spread their energy to neighboring unsaturated regions rather than get clipped.
Also, since the size of the defocus blur differs depending on the depth of the scene at different pixels, initially only scenes with objects at optical infinity were considered. The goal was to use a method similar to [4] to estimate depth and to use different kernels to deblur different parts of the image. Bando et al. [4] used a disk function as their PSF estimate; in our case, however, the PSF was measured. How we measure the kernels is described next. Kernel Measurement Figure 3.1: Clear aperture (disk) filter. 4 different radii are shown; the original set contained 11 different radii. Top: measured filter images. Bottom: Fourier transforms of the filters above. Note that the Fourier transforms are not perfect 2D Bessel functions due to diffraction and noise present in the filter measurement. The Fourier transforms also show that these filters preserve the low frequencies very well, but most of the high frequencies are destroyed. Also, the larger the kernel, the smaller the range of frequencies it can preserve. Actual point spread functions (PSFs) or blur kernels were measured by a simple deconvolution technique, described below. To measure the defocus blur kernel, one image of a test pattern was taken in focus, and then a number of images were taken out of focus. Different amounts of defocus yielded PSFs of different kernel sizes (Figure 3.1). For a given pair of images, one blurred, B, and the other sharp, I, the computation of the blur kernel K of size k×k becomes an optimization problem. B is a convolution of I and K, B = I ⊗ K = K ⊗ I, (3.4) by commutativity of the convolution operator ⊗. Since the convolution operator can be expanded into multiplications and additions, one equation can be formed per blurred image pixel B_{i,j}, i = 1, 2, …, n, j = 1, 2, …, m, where n×m is the image size, ∑_{p,q=−k}^{k} I_{i+p,j+q} · K_{p,q} = B_{i,j}. (3.5) Stacking up these equations for all i = 1, 2, …, n, j = 1, 2, …, m in a large linear system gives an overdetermined system, the solution of which in a least-squares sense gives the best fit for the PSF estimate K̂, K̂ = arg min_K ∑_{i,j} ( ∑_{p=−k}^{k} ∑_{q=−k}^{k} I_{i+p,j+q} · K_{p,q} − B_{i,j} )². (3.6) In practice, defocusing also introduces a small scaling of the image. Therefore, proper image registration and warping were carried out before running the optimization step. Deconvolution For deconvolution, the method described in [15, 28] was used. This method makes use of the sparse gradient prior for natural images. The clear aperture kernel acts as a two-dimensional (rotated) box filter, whose frequency-domain response is a Bessel-type function. Clearly, some of the frequencies will be destroyed, but the hope was that the lost information could be reconstructed with the help of the sparse gradient prior. Clipping We also tried to deal with clipping in the model described in Section 3.1 by simply discarding the clipped observed values from the optimization problem described by Equation 3.6. 3.2.3 Artificial PSFs PSF estimation was difficult since it is by itself a deconvolution problem. That is why artificial PSFs were also explored (Figure 3.2). (a) Artificial PSF (b) Real PSF (c) (Real − Artificial) × 10 + .5 Figure 3.2: Using an artificial clear aperture. In (c), 50% grey (the background) denotes no difference. 3.2.4 Results and Analysis This method worked reasonably well for very small blur kernels, smaller than 5 × 5 pixels in size (Figure 3.3), but failed for larger blur kernels (Figure 3.4). (a) Blurred (b) Ground Truth Figure 3.3: Clear aperture result. The deconvolution algorithm uses the sparse gradient prior as described in [15, 28]. p is the measured PSF number, and w is the smoothness regularizer. Each column represents deconvolution with a measured PSF at a different regularization strength.
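The kernel measurement of Equations 3.5–3.6 amounts to ordinary least squares. A minimal sketch of that step (our own illustration, assuming a symmetric PSF so that convolution and correlation coincide; real measurements would additionally need the registration and regularization mentioned above):

```python
import numpy as np

def estimate_psf(I, B, k):
    """Estimate a (2k+1) x (2k+1) blur kernel from a sharp image I and its
    blurred counterpart B: every interior pixel of B gives one linear
    equation in the kernel entries (Eq. 3.5); solve them all in the
    least-squares sense (Eq. 3.6)."""
    n, m = I.shape
    rows, rhs = [], []
    for i in range(k, n - k):
        for j in range(k, m - k):
            rows.append(I[i - k:i + k + 1, j - k:j + k + 1].ravel())
            rhs.append(B[i, j])
    K, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return K.reshape(2 * k + 1, 2 * k + 1)

# Synthetic check: blur a random sharp image with a known symmetric kernel.
rng = np.random.default_rng(1)
I = rng.random((12, 12))
K_true = np.array([[0.0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 8.0
B = np.zeros_like(I)
for a in range(1, 11):
    for b in range(1, 11):
        B[a, b] = np.sum(I[a - 1:a + 2, b - 1:b + 2] * K_true)
K_est = estimate_psf(I, B, 1)
```

With noise-free data the overdetermined system is consistent and the least-squares solution recovers the kernel exactly.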
(a) Blurred (b) Ground Truth Figure 3.4: Clear aperture result for a large blur radius. The deconvolution algorithm uses the sparse gradient prior as described in [15, 28]. p is the measured PSF number, and w is the smoothness regularizer. Each column represents deconvolution with a measured PSF at a different regularization strength. From these images, it can be deduced that: • When deconvolved with a wrong PSF, the result either remains blurry, if a smaller blur radius is used, or becomes oversharpened, if a larger blur radius is used. • The smoothness regularization parameter dominates if it is too high. When it dominates, virtually no deconvolution takes place and the method returns the original image. Thus, blurring occurs in the bottom-left portion of the grid, and oversharpening occurs in the top-right portion. A suitable solution can then be found somewhere in the middle section lying between these two portions. However, it can be observed that if an image is originally convolved with a large PSF (larger than 5 × 5), the blurred bottom-left portion overlaps with the oversharpened top-right portion, and no space is left in the middle to contain a good deconvolved result. Thus this method fails for large PSFs. Artificial PSFs were tested to see if fractional-sized PSFs could give a better result for certain images. However, this approach showed little or no improvement in the results. (a) Ground Truth (b) Deconvolved results (c) Blurred Figure 3.5: Clear aperture method on cases with clipping. Note that the deconvolution results contain ringing artifacts. The smoothness regularizer falls off from left to right, and so does the ringing artifact, but at the same time the results become more blurry. This algorithm also failed for cases with clipping due to saturation (Figure 3.5); image reconstruction was very poor even without clipping.
Also, a high energy spread requires quite a large blur kernel, but then there are more zeros in the frequency response and more information gets lost. The sparse gradient prior was not found to be strong enough to recover this lost information. 3.2.5 Observations It was observed that frequency preservation makes faithful reconstruction easier, since much of the information is then already in the image. This led to the next method, the split aperture. 3.3 Split Aperture Figure 3.6: Concept of the split aperture approach. The green (top) beam shows the light path that is in focus. The blue beam (bottom) goes through a lens with a slightly different focal length and creates a blurred image on the image plane. 3.3.1 Motivation Frequency preservation seemed to be the key criterion for a good reconstruction of the in-focus image. Therefore, adding a high-frequency component to the PSF, such as a Dirac peak, simplifies the problem. 3.3.2 Method Since convolution distributes over addition, kernels can be added if the final image is a superimposition of two individually blurred images. That is, if B = K1 ⊗ I + K2 ⊗ I, then B = K ⊗ I where K = K1 + K2. Figure 3.7: Simulated split aperture filter. The split aperture filter (c) is simulated by adding a 75% Dirac peak (b) to the 25%-weighted measured disk kernel shown in (a). (d)–(f) give the Fourier transforms of the PSFs in (a)–(c). Note the significant improvement in the frequency domain: there are no zeros. The straightforward way to implement this is to take two images, one blurred and one sharp, and add them up. This can be done before capturing the image simply by dividing the light path.
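The claim that adding a Dirac peak removes the spectral zeros can be checked numerically. In the small sketch below (the disk radius, grid size, and 75%/25% split follow the spirit of Figure 3.7, but the construction itself is ours), the split PSF's spectrum is bounded away from zero because |F(split)| = |0.75·F(dirac) + 0.25·F(disk)| ≥ 0.75 − 0.25 = 0.5:

```python
import numpy as np

def disk_kernel(r, size):
    """Normalized disk PSF, i.e., the clear-aperture defocus blur."""
    y, x = np.mgrid[:size, :size] - size // 2
    d = (x * x + y * y <= r * r).astype(float)
    return d / d.sum()

size = 33
disk = disk_kernel(6, size)
dirac = np.zeros((size, size)); dirac[size // 2, size // 2] = 1.0
split = 0.75 * dirac + 0.25 * disk          # split aperture PSF

f_disk = np.abs(np.fft.fft2(disk))
f_split = np.abs(np.fft.fft2(split))
# The disk alone has near-zeros in its spectrum (lost frequencies);
# the Dirac component lifts the whole spectrum away from zero, so the
# split aperture filter is invertible.
```

This is exactly why the method trades energy spread (the disk part) against invertibility (the Dirac part).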
By splitting the aperture, different parts of it can be given different focal lengths, one part focusing on the image plane while the other focuses a small distance away (Figure 3.6). This superimposes a sharp image on a blurred one. In effect, the blur kernel of this optical system is a linear combination of a Dirac peak filter and a disk filter (Figure 3.7). The strength of the Dirac peak can be controlled by the proportion of the aperture dedicated to the in-focus part. The blur radius can be controlled by the focal length difference of the two aperture parts (and, of course, the depth of the scene, but for now the scene was assumed to be at optical infinity). This can be achieved with a bifocal lens, or by introducing a second, smaller lens that bends light rays very slightly. This forms a combined lens at that part of the aperture and in effect creates a bifocal lens (Figure 3.6). Note that this method was developed independently of [16], but both approaches use a multi-focal lens. Simulation The simulated test cases used the measured clear aperture kernels, with a Dirac peak added at the center (Figure 3.7). 3.3.3 Results Simulated cases showed that the deconvolution results were quite good (Figure 3.8). But even in simulated cases, a dynamic range increase (with faithful reconstruction) higher than 60% could not be achieved. We did not capture real images. The strength of the method was found to be the Dirac peak embedded in the PSF. Since a Dirac peak has nonzero components at all frequencies in the Fourier domain, this bifocal filter can easily be inverted to get the original image back. However, there were several limitations: • The energy spread was found not to be strong enough, resulting in a very limited dynamic range increase. • In addition, clipping was still hard to handle.
If the saturated pixel value is too high, not all of the energy can be spread to neighboring areas, and clipping will still occur. Test data showed that even for small saturated regions the reconstruction was very poor, with prominent ringing artifacts. (a) Synthetically blurred with an artificial PSF (b) Recovered (c) Ground truth (d) Synthetically Blurred = Captured Blurred + Captured in-focus (e) Recovered (f) Captured in-focus (ground truth) Figure 3.8: Split aperture synthetic test case. In (a)–(c) (the cameraman image), the ground truth image was convolved with a mixed PSF with a 20% Dirac peak and 80% Gaussian (σ = 6.67) and then quantized to 12 bits to produce the blurred test image. The vignetting-type effect in the recovered image is created by the assumed boundary condition. In (d)–(f), the blurred image was formed by superimposing a captured in-focus image on a clear aperture defocus-blurred image. In this example, the in-focus image was weighted 30% and the out-of-focus image 70% when they were superimposed. The point spread function is estimated by using the appropriate measured clear aperture filter kernel with a Dirac peak added to it, the weights being the same as before. • Defocus blur is achieved by moving the CCD towards or away from the focal plane; thus blurring also involves a scaling operation. Hence the blur kernel is no longer completely shift-invariant, the operation cannot be expressed as a convolution, and the idea breaks down. 3.4 Observations The approaches presented in this chapter show that highlights have to be allowed to clip in order to obtain a better dynamic range increase. Information about them has to be multiplexed into the image, within the unsaturated pixels, and a postprocessing step is then necessary to demultiplex that information.
These observations led to the use of cross-screen filters, which have the strengths of the approaches described so far but not their weaknesses. The technique that uses cross-screen filters is proposed in Chapter 4, and its results are analyzed in Chapter 5. Chapter 4 HDR with Cross-Screen Filters 4.1 Motivation and Overview (a) Ground truth HDR image (b) An LDR image captured using a cross-screen filter Figure 4.1: Imaging with a cross-screen filter. The deconvolution approach presented in the previous chapter has an optimization step which is highly underdetermined: there are many possible solutions, each of which can satisfy the constraints. In other words, there are many sharp images, all of which can produce the same blurred image. On one hand, increasing the error tolerance would increase the number of candidate solutions; on the other hand, reducing it would result in overfitting, and noise would dominate. Due to this inherent ill-conditioning of deconvolution problems, a better-constrained energy dissipation and reconstruction approach is needed to construct a more reliable method. A cross-screen filter provides just that. A cross-screen filter can be used effectively to dissipate energy from saturated regions (Figure 4.1). In addition, since the filter kernel has sharp features, the filter is not band-limited. Therefore, information is not destroyed, but only multiplexed within the image. This means that, at least theoretically, a good reconstruction can be achieved if this encoded information can be demultiplexed properly. The cross-screen filter acts as a convolution kernel, and therefore the convolution equation (Equation 3.2) applies. Although a deconvolution problem can be massively underdetermined, a cross-screen filter fortunately has several properties that help us pose a better-constrained optimization problem.
It transmits most energy directly; only a small part is spread into the star-shaped glare patterns. The glare patterns are therefore primarily created around the bright points in the scene, as can be seen in Figure 4.1(b). The brighter the point, the stronger the glare pattern. Figure 4.2: Glare patterns give aggregate information. The glare patterns around a clipped saturated region multiplex aggregate information about that region into a number of glare rays (Figure 4.2). Using natural image statistics, this glare can be separated from the underlying scene detail and the aggregate information can be recovered. Each glare ray around a saturated region gives one aggregate 1-D view of the 2-D clipped region (Figure 4.3). All of these views can be combined to form a tomography-like reconstruction problem for the unknown clipped regions. Figure 4.3: Different glare rays give different aggregate views of the saturated region. Once saturated regions are reconstructed, an optimization step helps restore the glared regions of the observed image. 4.1.1 Chapter Outline The following sections begin by analyzing the cross-screen filter and constructing an empirical model of it in Section 4.2. Then a model of the pixel interactions due to this filter, namely the limited interaction model, is derived in Section 4.3. In the rest of this chapter, this model and a few suitable priors map the problem into a tomography-like reconstruction problem. Section 4.4 gives an overview of the whole algorithm, which is detailed in the remaining sections. Section 4.5 details the first step, how to remove unsaturated pixel interactions.
Section 4.6 describes how natural image statistics and other empirical constraints can help compute aggregate information about the saturated regions; this information is then combined in a linear system to solve a tomography-like reconstruction problem in Section 4.7, reconstructing the saturated pixels. Section 4.8 gives a brief description of how to recover the glared region. 4.2 Image Formation with Cross-Screen Filters Figure 4.4: A 4-point cross-screen filter. A cross-screen filter (also known as a star filter) is a transparent photographic filter with parallel scratch marks or grooves on its surface (Figure 4.4). When mounted in front of a camera lens, it creates linear glare in a number of directions. These glare directions are perpendicular to the direction of the grooves on the surface, i.e., horizontal grooves on the filter produce vertical glare in the image. This glare is usually very faint, and hence stars are often noticeable only around very bright areas of the scene. Determining the precise underlying physics of the star patterns is beyond the scope of this thesis, but the pattern is generated by one or more of the following effects: • diffraction grating, • prismatic effect, and • multiple reflections within the filter. Most of the light passes through the unscratched part of the filter. Part of the light hits the grooves and gets blurred along one dimension (Figure 4.5). Usually the glare patterns generated around a point are equally spaced. The filter is called p-point if it creates p glare rays around a bright point; p is even, since opposite rays are created by the same set of grooves on the surface. Since light is linear, a p-point cross-screen filter can be modeled as a combination of p/2 individual 2-point cross-screen filters. Figure 4.5: How blurring occurs due to a 2-point cross-screen filter. (a) For a horizontal groove, the incoming beam spreads out vertically (side view). (b) But it stays on the same course horizontally; viewed from the top, the beam looks as if it has not changed direction, so it is focused horizontally at the image plane (top view). (c) Multiple parallel grooves produce the same glare pattern, superimposed (Section 4.2.1) on the same vertical line at the focal plane (side view); different colors identify parts of the beam of light. (d) Horizontally they are in focus, just like the single-groove case (top view). First, we list some assumptions regarding a 2-point cross-screen filter and their impact: • It is a one-dimensional filter; therefore images can be processed on a row-by-row basis, as there is no crosstalk between adjacent rows when this filter is applied. This assumption is important for modeling the glare gradient perpendicular to the glare direction. • It has a Dirac peak component (Figure 4.2) which helps preserve all frequencies. Since high-frequency information is preserved, image detail is not completely lost, except in the clipped regions. Figure 4.6: Cross-screen filter profile. A point light source seen through an 8-point cross-screen filter. 8 rays emanate from the center along 4 glare directions.
This filter profile has been estimated by taking a photo of a point light source placed in front of a black background. Since it is hard to get a true point light source, the captured image gives a reasonable estimate. • If the lens flare around the Dirac peak of the PSF is ignored, each pixel can be assumed to interact with other pixels only along a number of glare rays. For a p-point cross-screen filter, p glare rays emanate from each pixel. • There is a sharp drop beyond the central Dirac peak, amounting to 3–4 orders of magnitude. This observation leads to a simplification of the pixel interaction structure and helps greatly reduce the complexity of the problem. • The profile has been empirically found to exhibit an exponential dropoff beyond the Dirac peak (Section 4.2), with some noisy oscillating features (Figure 4.7). These oscillations will be ignored in our model. This exponential approximation is critical, since superimposed glare from multiple saturated pixels then gives glare with a similar exponential dropoff, only with a different magnitude (Section 4.6.1). This allows formulating the saturated region reconstruction problem as a tomography-like reconstruction problem. • Due to the glare, objects in the scene situated beyond the image frame can affect the image. This is visible as an elevated black level in images taken with a cross-screen filter (Figure 4.8). For example, strong light sources that are situated outside the image frame, but close to it, produce glare rays. Figure 4.7: Cross-screen filter profile estimation. The exponential falloff approximates the trend of the falloff and ignores the wavy details. The model of the blur kernel will be formalized in the next section, and based on that, the limited interaction model will be developed in Section 4.3.
This limited interaction leads to the method we propose in Section 4.4. 4.2.1 Split Kernel Assuming that all the parallel grooves on the filter have the same profile, it can be shown that the glare patterns are in focus. This is because light is distorted in the same way by each groove, and parallel distorted rays get focused at the same point on the focal plane (Figure 4.9). Since the glare patterns are in focus, the aperture is effectively divided up into a number of portions. Figure 4.8: The black level is elevated in the filtered image. These two images, (a) without filter and (b) with a 6-point star filter, were taken under identical conditions except for the filter. Figure 4.9: Glare is in focus. The scratch marks create diffraction and/or prismatic patterns, and these patterns are in focus because similar rays in these patterns are parallel for a parallel input beam. For an object not at infinity, similar behavior can be expected. • Most of the light passes right through the unscratched part and gets focused by the lens at the image plane in the usual manner. This is represented by the Dirac delta function in the filter profile (Figure 4.10). This portion of the incoming light from the scene can be thought of as being convolved with a Dirac delta function, i.e., it remains unaltered. Thus, in effect, the largest part of the aperture remains dedicated to the Dirac delta component of the cross-screen filter. This is why the star-shaped glare patterns are visible only around the bright objects in the scene; most of the image remains unaltered. This is exactly the strength of the method we are going to propose in Section 4.4. • The grooves of the cross-screen filter are responsible for producing glare along different directions, where the direction is determined by the orientation of the
grooves. Grooves that are parallel to one another produce the same glare pattern, and all of these instances get focused on the image plane. Figure 4.10: Split kernel. Our estimated kernel can be split into a Dirac delta function and an exponential dropoff. Note that the vertical axes of the plots are in log scale. Therefore, a 2-point cross-screen filter PSF can be split into two components (Figure 4.10): 1. a Dirac delta function representing the peak, which acts as a scaling factor α, and 2. a low-pass exponential filter L with slope or falloff rate parameter m, scaled down to a very low amplitude β, L(x) = e^(−m|x−x0|), (4.1) where x0 is the location of the saturated pixel. In this formulation, the cross-screen filter acts as an aperture filter, implying that this approach is another implementation of the split aperture approach (Section 3.3). Evidently, the cross-screen filter acts as a convolution operator, and therefore the convolution equation (Equation 3.2) applies. The convolution matrix H can be expressed as a combination of the two components mentioned above: H = α I + β L, (4.2) where I is the identity matrix. The parameters α and m above are specific to a particular glare direction (Figure 4.11) of a particular cross-screen filter, and these quantities are measured. Note that it is important to have α much greater than β, so that most of the cross-screen filter is dedicated to the Dirac delta function. Figure 4.11: Variation in the strengths of glare rays: (a) measured filter, (b) measured after rotating the filter 45° CCW. The kernels have been measured with a point light source. The image on the right verifies that the variation in the strengths is due to filter properties only. Figure 4.11 shows a measured 16-point filter in which not all glare rays are equally bright.
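In matrix form, Equation 4.2 for a single image row can be built directly. A small sketch (α, β and m below are placeholder values, not the thesis's measured quantities):

```python
import numpy as np

def split_kernel_matrix(n, alpha, beta, m):
    """H = alpha * I + beta * L for one image row (Eq. 4.2), where
    L(x) = exp(-m * |x - x0|) is the exponential glare falloff (Eq. 4.1)
    centred at each source pixel x0."""
    x = np.arange(n)
    L = np.exp(-m * np.abs(x[:, None] - x[None, :]))
    return alpha * np.eye(n) + beta * L

H = split_kernel_matrix(64, alpha=0.95, beta=1e-3, m=0.5)
# alpha >> beta: the diagonal (direct transmission) dominates, and the
# off-diagonal glare decays exponentially with pixel distance.
```

Note that L itself contributes β at the diagonal; since α ≫ β, this is negligible against the Dirac component.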
This can happen because the appearance of the glare rays depend on the properties of the grooves on the surface of a filter. Modeling the profile of the grooves and the underlying physical phenomenon is beyond the scope of this thesis. However, we have found that our approximation is sufficient for this problem. 4.3 Limited Interaction Model First, a successive approximation method to deconvolve images blurred by a crossscreen filter will be derived for cases without any clipping due to saturation. Then, the limited interaction model will be proposed. When there is no clipping due to saturation, general convolution described by Equation 3.2 remains applicable for the split kernel model (Equation 4.2). Combining these 37 4.3. Limited Interaction Model two equations gives the split kernel convolution equation, j = α i + β L i. (4.3) Since β is very small to which L is scaled down, and since the low-pass filter L removes high frequency information from an image, only a coarse approximation of i is needed to compute the term L i. Thus, a successive approximation approach can be developed. Rewriting Equation 4.3 gives i= 1 (j − β L i) . α (4.4) This gives a simple fixed-point iteration method for computing i, i(0) = j, and 1 β j − L i(k−1) , for k > 0 α α 2 1 β β β = j− L j+ L j− L α α α α i(k) = = 1 α k−1 ∑ i=0 β − L α 3 j + · · · + (−1)k β L α k j (4.5) i j, where i(k) for k = 0, 1, . . . are the successive approximations of i. Since i is unknown, j needs to be successively convolved and alternately added or subtracted to compute i from j as shown above. All the terms except the first one on the right hand side are thus acting as correction terms that successively approximate i. The term interaction will be used to express the successive correction terms due to blurring. k-th order interaction is the interaction among observed pixels that is described by the k-th correction term, k β − L j. 
In the above iterative method, the k-th error term is

    i^(k) - i = -(β/α) L (i^(k-1) - i).    (4.6)

Since β ≪ α (Section 4.2.1), the above iteration converges fast. Intuitively, (β/α) L transfers only a small amount of energy, so the correction terms have diminishing energy, i.e.,

    ‖(β/α) L j‖ > ‖((β/α) L)^2 j‖ > ‖((β/α) L)^3 j‖ > ... .    (4.7)

Figure 4.12: Limited interaction, simulated test case. (a) Synthetically blurred j; (b) (β/α) L j ×10; (c) ((β/α) L)^2 j ×100; (d) ground truth i; (e) first iteration i^(1); (f) second iteration i^(2). (e) shows the image obtained by removing only the first order interactions (b) from the blurred image. Note that it is hard to find any visible difference between (e) and the original image (d). Correction due to the second order interactions is shown in (c). The result of removing these second order interactions is shown in (f); (f) shows no visible improvement over (e). All computations were done on HDR images, since successive approximation only works when there is no clipping. Since the strength of higher order interactions drops off rapidly, image intensities in (b) and (c) are multiplied by 10 and 100, respectively.

In practice, good results were obtained even with a single iteration (Figure 4.12),

    i = i^(1) + O(β/α) ≈ i^(1)
      = (1/α) (j - β L i^(0))
      = (1/α) (j - β L j).    (4.8)

This is equivalent to removing only the first order interaction. This model of ignoring higher order interactions will be referred to as the limited interaction model.

When capturing natural images, some of the image intensities may get clipped due to saturation. Saturation destroys information, and the linear relation described by the split kernel convolution equation (Equation 4.3) breaks down. That means the limited interaction model would fail if provision for clipping is not made.
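The fixed-point iteration of Equations 4.4 and 4.5 can be sketched numerically for the clipping-free case. This is an illustrative Python sketch with placeholder values for α, β, m and the test signal, not the thesis's Matlab implementation:

```python
import math

def exp_blur(signal, m, radius):
    """The low-pass exponential filter L (Eq. 4.1), applied as 1-D convolution."""
    n = len(signal)
    out = []
    for x in range(n):
        s = 0.0
        for k in range(-radius, radius + 1):
            if 0 <= x - k < n:
                s += math.exp(-m * abs(k)) * signal[x - k]
        out.append(s)
    return out

def fixed_point_deblur(j, alpha, beta, m, radius, iters=1):
    """Successive approximation (Eq. 4.5):
    i^(0) = j,  i^(k) = (j - beta * L i^(k-1)) / alpha."""
    i = list(j)
    for _ in range(iters):
        Li = exp_blur(i, m, radius)
        i = [(jx - beta * lx) / alpha for jx, lx in zip(j, Li)]
    return i

# Simulate: blur a smooth signal with the split kernel, then invert.
alpha, beta, m, radius = 0.9, 0.001, 0.05, 40
i_true = [math.sin(0.1 * x) + 2.0 for x in range(80)]
Li = exp_blur(i_true, m, radius)
j = [alpha * ix + beta * lx for ix, lx in zip(i_true, Li)]

i1 = fixed_point_deblur(j, alpha, beta, m, radius, iters=1)
```

Consistent with Equation 4.6, each iteration shrinks the error by roughly the factor (β/α)·‖L‖, so a single iteration already recovers the signal to high accuracy.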
Therefore, this model needs to be applied to saturated and unsaturated pixels separately. An image j can be expressed as a combination of saturated and unsaturated pixels (Figure 4.13),

    j = jU + jS,    (4.9)

where at any point (x, y): jU(x, y) · jS(x, y) = 0. Unless otherwise stated, throughout the rest of the thesis, subscript U denotes unsaturated pixels, and S, saturated pixels. Note that i is the true radiometric map (HDR image), and therefore it does not have saturated pixels. S and U give the locations of saturated and unsaturated pixels in the observed image j.

Figure 4.13: Saturated and unsaturated pixels. Red pixels denote pixels that do not belong to that part. i is the true HDR image; iS is the clipped pixels we need to recover. j is the captured LDR image, and therefore the saturated part of it, jS, contains only clipped pixels. jU contains the unsaturated part and the glare.

In the observed image j, saturated pixel values jS are clipped. Equation 3.2 cannot be applied to jS. However, unsaturated pixels do follow that relation. The contribution, or the energy transferred, from other pixels to an unsaturated pixel due to pixel interaction is a sum of contributions from saturated pixels and unsaturated pixels. When these two types of interaction are modeled separately, the split kernel convolution equation (Equation 3.2) becomes

    jU =U H iU + β L iS,    (4.10)

where =U denotes that the equality holds for the unsaturated pixels only. The first term models the effect of unsaturated pixels on unsaturated pixels (U→U), and the second term models the effect of saturated pixels on unsaturated pixels (S→U). Figure 4.14 illustrates this relation. Contributions to saturated pixels from unsaturated pixels (U→S) are ignored because they are too small to be considered. True values at saturated pixels are so large that these small contributions do not make any noticeable difference.
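The decomposition of Equation 4.9 amounts to a threshold test per pixel. A minimal sketch, assuming a normalized clipping value of 1.0 (the constant `SAT` is an assumption of this sketch, not a value from the thesis):

```python
SAT = 1.0  # normalized sensor clipping value (assumed for this sketch)

def split_saturation(j, sat=SAT):
    """Decompose an observed scanline into jU + jS (Eq. 4.9).
    Each pixel belongs to exactly one part, so jU(x) * jS(x) = 0 everywhere."""
    jU = [v if v < sat else 0.0 for v in j]
    jS = [v if v >= sat else 0.0 for v in j]
    return jU, jS

j = [0.2, 0.7, 1.0, 1.0, 0.4]  # two clipped pixels in the middle
jU, jS = split_saturation(j)
```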
Contributions to saturated pixels from other saturated pixels (S→S) do not affect our algorithm, since such glare would get clipped due to saturation and hence cannot be captured or measured. However, there can be some clipped pixels that would not get clipped without the (S→S) interactions. This is why it cannot be assumed that the original pixel intensity at any clipped pixel is greater than or equal to the clipping value. All four types of contributions are summarized in Figure 4.15.

Figure 4.14: Split kernel convolution when clipping occurs (jU =U H iU + β L iS, with the second term shown ×10). Saturated pixels are marked with red. The equality does not hold at these pixels.

Figure 4.15: Four kinds of pixel interactions. Saturated pixels are few, clipped, and usually of high magnitude; unsaturated pixels are many, not clipped, and usually of low magnitude. Contributions to the saturated pixels (S→S and U→S) are ignored because they are too small compared to the true values at these pixels.

4.4 Method

An overview of the method to be presented in the rest of the chapter is given below.

1. First, first order interactions among the unsaturated pixels (U→U) will be removed (Section 4.5).

2. Next, glare patterns will be separated from scene data using natural image statistics, and then aggregate views, or more precisely, line integral estimates, of the saturated regions will be computed from the p/2 glare directions (Section 4.6).

3. Once the line integral estimates are known along all p/2 glare directions, these observations will be combined in a tomography-like reconstruction step in order to compute iS (Section 4.7).

4. Finally, iU will be recovered using the line integral estimates and the computed iS values (Section 4.8).

The rationale behind performing the last two steps separately is that they need different objective functions and different sets of constraints for optimization (Section 5.4).
4.5 Removing Unsaturated Pixel Interactions

The effect of first order interactions among unsaturated pixels (U→U) can easily be reduced or almost entirely removed. We have modeled the cross-screen filter, H, as a split kernel; i.e., convolution with this kernel can be modeled as a linear combination of the original image and a blurred image convolved with an exponential dropoff profile, L (Equation 4.2). When considering only unsaturated pixels, this can be expressed as

    H iU = α iU + β L iU.    (4.11)

Therefore, the split kernel convolution equation (Equation 4.3) becomes

    jU =U (α iU + β L iU) + β L iS,    (4.12)

where =U denotes that the equality holds for the unsaturated pixels only. This relation may not hold at the saturated pixels because the left hand side is zero while the right hand side may have some nonzero value due to U→S interactions.

The limited interaction model (Section 4.3) dictates that the U→U interaction can be well estimated with jU. Let j′U denote the observed image after removing U→U interactions,

    α j′U = jU - β L iU
          =U α iU + β L iS.    (4.13)

Using Equation 4.8, j′U can be computed without knowing iU,

    α j′U ≈ jU - β L jU.    (4.14)

4.6 Glare Estimation

After removing glare due to unsaturated pixel interactions, only the glare due to S→U interactions, g, remains at the unsaturated pixels. Rearranging Equation 4.13, we get

    g = β L iS =U α (j′U - iU).    (4.15)

In the following sections, the first half of this equation will be utilized to develop a model of g, and then this model will help estimate glare using natural image statistics. The second half of this equation will be used in Section 4.8 to estimate iU by subtracting the estimated glare.

4.6.1 Model of Glare

Equation 4.15 shows that the glare at a pixel (x, y) results from the contributions of all the saturated pixels situated on the same row. If there are N saturated pixels on this
row and they are situated at locations x0, x1, ..., xN (Figure 4.16), then,

    g(x, y) = Σ_i β e^{-m|x - xi|} iS(xi, y).    (4.16)

Figure 4.16: Glare contribution from one side of an unsaturated region. x is an arbitrary point in an unsaturated segment. x0, x1, ..., x4 are the saturated pixels situated to the left of this segment. xL = x4 is the closest saturation to the left. Each saturated pixel creates its own glare profile; all of them get summed up, and only an aggregate glare can be seen in the captured image. Like glare due to only one pixel, the aggregate glare also has an exponential falloff profile and an aggregate energy lL, which we measure.

For the sake of clarity, y will be omitted in the equations that follow. If we separate the saturated pixels into two groups based on whether they are situated to the left of x or to the right, we get

    g(x) = Σ_{xi<x} β e^{-m(x - xi)} iS(xi) + Σ_{xi>x} β e^{-m(xi - x)} iS(xi)
         = β e^{-m(x - xL)} Σ_{xi<x} e^{-m(xL - xi)} iS(xi) + β e^{-m(xR - x)} Σ_{xi>x} e^{-m(xi - xR)} iS(xi)    (4.17)
         = β e^{-m(x - xL)} lL(x) + β e^{-m(xR - x)} lR(x),

where lL (lR) denotes the line integral, the aggregate energy, to the left (right),

    lL(x) = Σ_{xi<x} e^{-m(xL - xi)} iS(xi),
    lR(x) = Σ_{xi>x} e^{-m(xi - xR)} iS(xi),    (4.18)

and xL (xR) gives a reference location to the left (right).

Figure 4.17: Glare contribution. x is an arbitrary point in an unsaturated segment xL < x < xR. xL (xR) is the closest saturation to the left (right). Line integral terms lL and lR capture the aggregate energy on each side of the unsaturated segment. Adding up these two aggregate exponential falloff profiles gives g, the resultant glare over this segment.
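The identity behind Equations 4.16–4.18, namely that the sum of many per-pixel exponential glare profiles collapses into two aggregate exponentials, can be checked numerically. A small Python sketch with arbitrary placeholder positions, intensities, β and m:

```python
import math

def direct_glare(x, sat, beta, m):
    """Glare at x as a sum of per-pixel exponential profiles (Eq. 4.16).
    `sat` is a list of (location, true intensity) pairs."""
    return sum(beta * math.exp(-m * abs(x - xi)) * e for xi, e in sat)

def aggregate_glare(x, sat, beta, m):
    """The same glare via the aggregate line integrals lL, lR (Eqs. 4.17-4.18)."""
    left = [(xi, e) for xi, e in sat if xi < x]
    right = [(xi, e) for xi, e in sat if xi > x]
    g = 0.0
    if left:
        xL = max(xi for xi, _ in left)   # closest saturation to the left
        lL = sum(math.exp(-m * (xL - xi)) * e for xi, e in left)
        g += beta * math.exp(-m * (x - xL)) * lL
    if right:
        xR = min(xi for xi, _ in right)  # closest saturation to the right
        lR = sum(math.exp(-m * (xi - xR)) * e for xi, e in right)
        g += beta * math.exp(-m * (xR - x)) * lR
    return g

sat = [(10, 5000.0), (12, 8000.0), (40, 3000.0)]  # placeholder saturated pixels
beta, m = 0.001, 0.05
```

The two formulations agree exactly because e^{-m(x-xi)} factors into e^{-m(x-xL)} · e^{-m(xL-xi)}, which is what lets the method summarize an arbitrary highlight by a single number per side of each unsaturated segment.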
Without loss of generality, xL (xR) can be assumed to be the location of the closest saturation to the left (right) (Figure 4.17). The equation above shows that if more than one such exponential falloff from a single side is superimposed, the result will also have an exponential falloff profile with the same slope. Similarly, gradients of these glare profiles will have an exponential profile. Moreover, glare at each unsaturated segment is a linear combination of two exponential functions described by lL and lR.

4.6.2 Estimating Line Integrals

Estimating line integrals directly is difficult because it requires separating the unknown original image i from the observed image j affected by the filter H. However, natural image statistics can provide strong prior knowledge about iU. The sparse gradient prior for natural images [15, 28] makes a reasonable assumption about natural images: they contain sharp edges, and the regions between the edges are smooth. That is, image gradients are sparse; only a small number of pixel gradients have magnitudes significantly greater than zero. Therefore, natural image gradients follow a heavy-tailed distribution, which can be approximated by a Laplacian distribution,

    Dy i(x) ~ Laplace(µ = 0, b),    (4.19)

where Dy is a partial derivative operator in the direction orthogonal to the glare direction, µ = 0 is the location parameter and b > 0 is the scale parameter. Therefore, the probability density function is given by

    f(Dy i(x) | µ = 0, b) = (1 / 2b) e^{-|Dy i(x)| / b}.    (4.20)

Then, the maximum likelihood estimator of Dy i can be computed as

    Dy i = arg max_{Dy i} log Π_x (1 / 2b) e^{-|Dy i(x)| / b}
         = arg min_{Dy i} ‖Dy i‖_1,    (4.21)

which amounts to an L1 optimization in the gradient space, a convex optimization problem [2].
By combining Equations 4.13 and 4.17, and then taking the gradient along Y, we get an expression for image gradients in terms of glare,

    α Dy iU(x) = α Dy j′U(x) - β e^{-m(x - xL)} Dy lL(x) - β e^{-m(xR - x)} Dy lR(x),    (4.22)

where j′U is the observation after removing U→U interactions (Equation 4.13), and Dy lL(x) and Dy lR(x) are the unknowns. These quantities remain the same for all pixels within an unsaturated segment. Using Equation 4.21, we can compute these quantities by doing an L1 optimization over that segment in the gradient space, i.e.,

    [Dy lL, Dy lR] = arg min_{Dy lL, Dy lR} Σ_x | α Dy j′U(x) - β e^{-m(x - xL)} Dy lL - β e^{-m(xR - x)} Dy lR |.    (4.23)

Note that the unsaturated segments at the left (right) border of an image will have a zero lL (lR).

Because of noise and scene edges, image gradient distributions can have a nonzero central tendency or become less peaked. In such cases, the L1 optimization may give wrong estimates. Since dispersion increases as a distribution gets less peaked, the inverse of an appropriate dispersion measurement gives the level of confidence in each estimate. By introducing these confidence values as weighting factors in the linear system, we can ensure that erroneous estimates have less effect on the final solution.

Once the line integral gradient estimates Dy lL and Dy lR are computed, the line integral estimates lL and lR are computed by a simple least squares fit,

    [lL, lR] = arg min_{lL, lR} ‖ σ^{-1} ( Gy [lL, lR]^T - [Dy lL, Dy lR]^T ) ‖_2,    (4.24)

where Gy is the gradient operator that takes into account the reference locations xL and xR when computing the gradient, and σ^{-1} is a diagonal matrix with the confidence measurements on the diagonal.

A few constraints that help make the estimation process more robust are described below.

Darkest Pixel Constraint

At each pixel, the estimated glare must be less than or equal to the observed value, i.e.,

    g(x) ≤ jU(x).    (4.25)

Note that due to the presence of noise, this constraint cannot be a hard constraint.
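A minimal sketch of the one-sided version of the L1 fit in Equation 4.23 can be built on the fact that a weighted median exactly minimizes a weighted absolute-deviation objective. This is a simplification of the thesis's approach (which solves the full two-sided problem with confidence weights); the segment, parameter values and the three injected "scene edges" below are all placeholders:

```python
import math

def weighted_median(values, weights):
    """Exact minimizer over a of sum_i w_i * |v_i - a|, for positive weights."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v

def estimate_left_gradient(grad_j, xs, xL, alpha, beta, m):
    """One-sided sketch of Eq. 4.23: with no glare from the right, each pixel x
    satisfies alpha*Dy j'U(x) = alpha*Dy iU(x) + w(x)*Dy lL, w(x) = beta*e^{-m(x-xL)}.
    Treating Dy iU as sparse (mostly zero), minimizing sum_x |alpha*g_x - w_x*a|
    = sum_x w_x * |alpha*g_x/w_x - a| reduces to a weighted median."""
    ratios, weights = [], []
    for x, g in zip(xs, grad_j):
        w = beta * math.exp(-m * (x - xL))
        ratios.append(alpha * g / w)
        weights.append(w)
    return weighted_median(ratios, weights)

# Simulated segment: true line-integral gradient 500, plus three scene edges.
alpha, beta, m, xL = 0.9, 0.001, 0.05, 0
true_DylL = 500.0
xs = list(range(1, 20))
grad_j = [(beta / alpha) * math.exp(-m * x) * true_DylL for x in xs]
for k in (4, 8, 14):        # sparse scene-edge outliers
    grad_j[k] += 0.7
```

As long as the scene-edge pixels carry less than half of the total weight, the median snaps to the glare-induced gradient, which is the robustness the sparse gradient prior buys.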
Recovering Saturated Regions

Left-Right Agreement Constraint

Figure 4.18: Left-right agreement constraint. Loci of left and right line integrals of a saturated segment must agree at some x within the saturated segment.

The value of lL (lR) depends on the selection of the reference location xL (xR). The locus of all values of l will also have an exponential profile with the same slope (Figure 4.18). For a given saturated region, lL and lR represent the same amount of energy, so they should agree at some x within the saturated region. Therefore, for each saturated segment, the following should hold,

    lL e^{m|x - xL|} = lR e^{m|x - xR|}.    (4.26)

4.7 Recovering Saturated Regions

The problem of computing iS is by construction very similar to the tomographic reconstruction problem. Once all the line integral estimates are computed, estimation of iS is formulated as a linear least squares optimization problem, e.g.,

    iS = arg min_{iS} ‖ σ^{-1} (A iS - b) ‖_2^2,    (4.27)

where A is a sparse matrix that gives the line integrals in terms of elements of iS, b contains the corresponding estimates, and σ has the corresponding standard deviations along its main diagonal.

This linear system is often under-constrained, and a smoothness prior is necessary to make up for the missing information. The highlights should follow the sparse gradient prior [15, 28], which can be imposed by minimizing the L1 norm in the gradient domain. After adding this smoothness term, we get

    iS = arg min_{iS} ‖ σ^{-1} (A iS - b) ‖_2^2 + λ ‖ ∇iS ‖_1,    (4.28)

where ∇ is the gradient operator and λ gives the regularization coefficient. Boundary pixels around saturated regions are included in the gradient computation for a better reconstruction. The Iterative Reweighted Least Squares method [14] is applied for the L1 optimization.
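The tomography-like structure of Equation 4.27 can be illustrated on a toy problem. The sketch below uses unweighted line sums along four directions (a stand-in for the exponentially weighted line integrals and the 8-point filter's four glare directions) and plain gradient descent instead of the thesis's sparse solver with the L1 smoothness prior; the 2×2 "highlight" is a placeholder:

```python
def line_sums(img, n):
    """Index sets for line sums of an n x n region along four directions
    (rows, columns, two diagonals); these play the role of the rows of A."""
    eqs = []
    for r in range(n):                       # rows
        eqs.append([(r, c) for c in range(n)])
    for c in range(n):                       # columns
        eqs.append([(r, c) for r in range(n)])
    for d in range(-(n - 1), n):             # main-diagonal direction
        eqs.append([(r, r - d) for r in range(n) if 0 <= r - d < n])
    for s in range(2 * n - 1):               # anti-diagonal direction
        eqs.append([(r, s - r) for r in range(n) if 0 <= s - r < n])
    return eqs

def reconstruct(eqs, b, n, steps=4000, lr=0.05):
    """Least-squares fit of A x = b (cf. Eq. 4.27) by gradient descent."""
    x = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * n for _ in range(n)]
        for cells, target in zip(eqs, b):
            resid = sum(x[r][c] for r, c in cells) - target
            for r, c in cells:
                grad[r][c] += 2.0 * resid
        for r in range(n):
            for c in range(n):
                x[r][c] -= lr * grad[r][c]
    return x

truth = [[4.0, 1.0], [0.0, 9.0]]             # hidden saturated intensities
eqs = line_sums(truth, 2)
b = [sum(truth[r][c] for r, c in cells) for cells in eqs]
recovered = reconstruct(eqs, b, 2)
```

With four viewing directions this tiny system is overdetermined and the energy distribution inside the "highlight" is recovered exactly; with fewer directions or larger regions it becomes under-constrained, which is why the smoothness prior of Equation 4.28 is needed in practice.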
4.8 Removing Glare

Once the highlights are reconstructed, they are convolved to compute glare, which is then subtracted from jU. By rearranging Equation 4.13, we get

    iU =U j′U - (β/α) L iS,    (4.29)

where j′U is the image after removing unsaturated pixel interactions (Section 4.5). Note that to remove glare, the iS estimation process is run again, but with a small smoothness regularization coefficient. This tends to produce a noisy highlight reconstruction, but when convolved it matches the estimated glare well. Section 5.4 in the next chapter presents an example that illustrates this idea.

Glare removal is not perfect, since the estimated PSF does not model the waviness present in the real blur kernel (Figure 4.7). A possible future direction of research that can potentially provide a fix is presented in Section 6.2.3.

Chapter 5

Results

In this chapter, a few results of the algorithm proposed in the previous chapter are presented. The proposed algorithm focuses on estimating glare and recovering the saturated regions. The quality of reconstruction depends on how well glare is estimated and how well the saturated regions are reconstructed. Therefore, for each test case, these two aspects will be discussed.

First, the experiment environment is described in Section 5.1. The discussion of results starts in Section 5.2 with a few simulated test cases, which help analyze the method's performance under varying conditions such as different background texture, noise and quantization. The later sections show a few real world test cases. Finally, we discuss the known limitations of this algorithm in Section 5.6.

5.1 Environment

HDR reference images were created by merging a series of 16 bit raw images captured at different shutter speeds, ranging from 1/8192-th of a second to 32 seconds. The aperture setting varied from test case to test case. Raw images were converted to PNG format using dcraw 8.77 and ImageMagick 6.3.5, using the following commands:

    dcraw -c -o 1 -4 -v -w $file > $base.ppm
    convert $base.ppm $base.png

All test images were captured with a Canon D40 digital SLR camera placed on a tripod. We have found this camera to have a linear response curve; therefore we developed a simple HDR merging utility that neither tries to find the camera curve, nor requires a previously generated one. Some of the test images were bilinearly scaled down from their original size to reduce the size of the problem.

In all captured test cases, images have been manually aligned so that one of the cross-screen filter patterns aligns with the x-axis. First, while taking photos with the camera, we tried to manually orient the filter so that one of the directions of grooves on the filter is horizontal. Any fine tuning was done after the images were downloaded, and merged if necessary, by simple image rotations with bilinear sampling.

We have used an 8-point cross-screen filter to capture blurred images. Simulated test images were generated using a synthetic 8-point cross-screen filter PSF. We have implemented our code in Matlab 7.8.0 (R2009a) and used the linear solvers provided in the Matlab Optimization Toolbox.

5.2 Simulated Test Cases

Simulated test cases help understand the strengths and the weaknesses of the algorithm presented in this thesis. In each simulated test case, an artificially created or captured high dynamic range reference image is blurred with a synthetic cross-screen filter PSF, photon shot noise is added, and the result is quantized to 10 bits (unless otherwise specified) to produce the test image. Since only artificial PSFs have been used, the waviness in the filter profile has been ignored.

The first test case is shown in Figure 5.1. A number of disks with different strengths are placed in front of a dark background. Note that the large saturated disk at the top actually has two disks superimposed one over the other. The effect is easily noticeable in the glare; in each direction, the middle glare rays are brighter than the rest.
We have also tested the algorithm on synthetically blurred real HDR images. One such example is presented in Figure 5.2. The HDR image of some billiard balls on a table on a bright day has been used. Since the objects are shiny and the surfaces are curved, small specular reflections and inter-reflections of the sun can be seen. As a result, the highlights are very small but very bright. The reconstruction is quite good despite the fact that the glare patterns are relatively faint and the image has background texture and noise. Reconstruction artifacts can be seen only when the pixel magnitudes are multiplied by 10 in Figure 5.2(c).

Figure 5.1: Simulated test case. (a) Blurred LDR /1000; (b) glare around the double disk /1000; (c) recovered HDR; (d) original HDR. Note the double disk, pointed to by a cyan arrow in (a). The double disk gets clipped in the blurred LDR image, but by observing the glare around it one can tell there are two disks there. The actual shape of this highlight is shown in (d). The glare pattern around it, marked by a cyan box in (a), is shown in (b). The central brighter disk creates a thin strong glare, while the larger disk creates a dimmer thick glare. Also note the tomographic reconstruction artifacts shown in the inset in (c).

Figure 5.2: Simulated test case. (a) Artificially blurred; (b) original HDR; (c) recovered HDR ×10; (d) recovered HDR. A captured HDR image has been convolved with an artificial 8-point cross-screen filter to create our test data. All the highlights are created through one or more specular reflections of the sun.

5.2.1 Effect of Noise

Noise degrades the estimation process. In the presence of high noise, the estimation is disturbed and the highlight recovery step "discovers" some detail that is not there.
Figure 5.3 shows reconstruction results for different amounts of simulated additive noise added to the same simple test case. As expected, the reconstruction degrades as noise becomes stronger. Although the reconstruction can be noisy, it can adequately capture the total energy of the highlight. This is evident from the fact that the glare removal step removes glare well enough. As additive noise is increased, the estimation and reconstruction process slowly degrades to a point where noise dominates and no energy can be estimated, as can be seen in the last row. It should be noted that when noise becomes dominant, the solver produces some nonexistent pattern in the reconstructed highlights. However, the patterns created are not completely random noise patterns, because of the smoothness constraint. Figure 5.4 shows a comparison between additive and multiplicative noise components.

Effect of Quantization

Quantization is also a form of noise, and it too degrades the estimation process. As with shot noise, reconstruction quality degrades if the test image is quantized to fewer bits.

Figure 5.3: Effect of additive noise and quantization. (a) Blurred image, no noise or quantization; (b) reconstruction, no noise or quantization; (c) blurred image, 0.01% noise, 8 bit quantization; (d) reconstruction, 0.01% noise, 8 bit quantization; (e) additive uniform noise mean (none, 0.001–0.1) vs. quantization (5–14 bits). The top row shows two test cases with glare removal. It is evident that with high noise the glare detection process suffers heavily, and as a result the highlight reconstruction gets noisy. These results also show that quantization degrades the results less strongly than noise. The same noise pattern was used for computing results with different quantization levels. A default multiplicative noise of 0.01 was used in all the cases above.
The red pixels in the recovered HDR images indicate a negative value in the reconstruction.

Figure 5.4: Effect of noise. (a) Blurred image, no noise or quantization; (b) reconstruction, no noise or quantization; (c) additive uniform noise mean (none, 0.001–0.1) vs. multiplicative uniform noise mean (0.001–0.5). Multiplicative noise shows little impact compared to additive noise. Since photon shot noise is Poisson noise, it is proportional to luminance, and is therefore multiplicative. This shows that the algorithm is expected to perform reasonably well in the presence of shot noise. The synthetic test images above were not quantized, in order to observe the pure effect of noise. The red pixels in the recovered HDR images indicate a negative value in the reconstruction.

Figure 5.3 shows reconstruction results for different numbers of quantization levels with the same simple test case. As expected, the reconstruction degrades with fewer quantization levels.

5.3 Effect of Scene Brightness

Scene brightness has a significant impact on the performance of the algorithm. Since most of the highlights are either light sources or specular reflections, the rest of the scene usually is not very bright. Glare is easy to estimate if the background is relatively dark and the glare is prominent. However, in terms of the quality of the reconstruction of the glared region, a brighter background is favorable, since bright background texture can mask some artifacts in the reconstruction.

5.3.1 Day Shots

Figure 5.5: Day shot example. (a) Blurred LDR; (b) recovered HDR; (c) ground truth. Glare removal is not accurate but works well, since the artifacts are barely noticeable.
Day shots are comparatively easy in terms of removing glare: since the glare is barely visible anyway, a rough removal of the glared region looks quite good compared with outputs from night shots given the same effort. On the other hand, since only a faint glare can be seen, saturated region reconstruction becomes less accurate. Often the glare goes below the noise level and the estimation process fails.

5.3.2 Night Shots

Figure 5.6: Removing glare from a night shot. (a) Blurred HDR; (b) recovered HDR; (c) ground truth. Note that the waviness patterns are more visible in a recovered night shot. Also note the glare pattern at the top-right corner of the image, resulting from a light source lying outside the image frame. The proposed method does not handle this case, but a possible solution subject to further research is presented in the Future Work (Section 6.2.2).

Night shots have clear glare rays and little intervening background structure, making the glare estimation process easier. However, this also presents a few challenges:

• Since the glare patterns are brighter than the scene texture, more bits are allocated to the glare patterns rather than scene detail, effectively reducing the number of quantization levels for scene detail in the glared region. Even if glare can be well approximated, due to the low effective bit depth, reconstruction will be poor.

• Since photon shot noise is proportional to luminance, glared pixels will have higher noise compared to neighboring non-glared pixels. When glare is removed, the recovered scene details will have a poor signal-to-noise ratio and the noise will become visible.

• The blur kernel is not perfectly exponential; there are some oscillating features on it (Figure 4.7) which were ignored while designing the kernel model.
When glare is estimated, the algorithm finds the best fit in some sense, and when this estimated glare is subtracted from the observed image to remove glare, pixels where glare is overestimated may end up with negative values if the underlying scene structure is not bright enough to mask the error in estimation. The night shot presented in Figure 5.6 shows such an example. Alternating bright and dark patches in the glared area of the recovered image clearly demonstrate that this artifact occurred due to subtracting the best fit of the estimated glare profile from the image.

Figure 5.7: Removing glare from a night shot. (a) Recovered HDR; (b) blurred. Note the tomographic reconstruction artifacts. The inset images in (b) show the short exposure reference images. The highlight pointed to by the cyan arrow creates no glare signature because its brightness is close to the clipping value; the reconstruction algorithm fails to recover this highlight.

Figure 5.7 shows another night shot. The saturated regions are quite large. The saturated region pointed to by a cyan arrow produces no glare, which is why it cannot be reproduced. The central large saturated region produces a glare pattern quite similar to the double-disk saturation in Figure 5.1, and the algorithm could quite successfully predict the energy distribution within the highlight. Note the tomography artifacts around both of the recovered highlights shown in the insets, similar to the ones found in Figure 5.1.

The test case presented in Figure 5.8 is quite interesting because it shows that the algorithm could recover the light shade's structure, which cannot be seen in the captured HDR image, probably due to the glare of the optics used.

Figure 5.8: Recovering hidden detail. (a) Recovered HDR; (b) blurred / reference.
This image is interesting because the algorithm could recover the light shade's structure, which cannot be seen in the captured HDR image, shown in the inset of (b), probably due to glare of the optics used.

5.4 Effect of Smoothness Regularizer

Recall that we added a smoothness term to account for the missing information (Section 4.7). The tomography-like linear system is underconstrained; by utilizing the sparse gradient prior we get the additional constraints. However, how smooth the reconstructed highlights are depends on the strength of the smoothness regularizer. The best glare removal is produced by a very noisy highlight estimate, while the recovered highlight that is closest to the original may not remove glare properly (Figure 5.9).

5.5 Failure Case: Napa Valley Image

The Napa Valley image is a test case where the glare detection algorithm fails (Figure 5.10). The reason is that the gradient of the glare in the sky due to the setting sun has a close to exponential dropoff throughout the sky. The glare estimation process depends heavily on image gradients not following an exponential profile, so it fails on this image.

Figure 5.9: Effect of the smoothness regularizer, λ, on the recovered image (Section 4.7). (a) Blurred; (b) ground truth; (c) recovered HDR ×10, high λ; (d) recovered HDR ×10, low λ. Low smoothness results in a better glare estimate but noisy highlights, while high smoothness does the opposite. Note that the patches left in (d) are due to the waviness present in the real filter PSF.

The saturated region is not strong enough to produce any detectable glare. However, the algorithm does find some energy, which is evident from the glare estimate that has been removed.

Figure 5.10: Failure case, the Napa Valley image. (a) Synthetically blurred LDR; (b) recovered HDR. (a) shows the blurred test image with little or no glare present due to the cross-screen filter.
However, the gradient present in the sky due to the sunset confuses the glare detection algorithm. Subtracting the wrong estimate gives (b). 5.6 Limitations First of all, unless a large percentage of pixels can be captured without clipping, the proposed method would fail. Since large values are estimated using small values and these small values also contain noise and other random components such as background texture, the estimation process cannot reliably estimate line integral gradient estimates that are not sufficiently large. That is, if some pixel gets clipped but its intensity is not high enough to produce detectable glare, it cannot be reconstructed. This algorithm heavily relies on the fact that glare patterns have an exponential profile 63 5.6. Limitations and scene edges or gradients usually do not. However, if an exponential falloff is found the algorithm will discover some energy creating it and the solution will be off-target. With only 8 views of a highlight, only a poor quality of highlight reconstruction can be expected. With more views it could get better, but then glare patterns from different directions of grooves would overlap and separating them may become difficult. This method does not address cases with under-saturation. Sufficiently long exposure can avoid under-saturation but it can potentially increase clipping due to saturation and cause the algorithm to fail. A few possible extensions are discussed in the next chapter in Section 6.2 that may potentially overcome some of these limitations. 64 Chapter 6 Conclusion 6.1 Conclusion In this thesis, the problem of capturing High Dynamic Ranges images with only one exposusre has been addressed. A novel High Dynamic Range image acquisition technique has been proposed. This new technique has been developed through a number of major contributions — 1. Split aperture approach for HDR imaging through deblurring, 2. 
Using cross-screen filters to perform the blurring, thus incorporating a Dirac peak into the blur kernel, which encodes HDR information into the image more effectively than simple blurring,

3. Utilizing the sparse gradient prior together with the exponential profile of the glare to compute line integral gradient estimates, which paved the way to formulating the saturated pixel reconstruction problem as a tomography-like reconstruction problem.

6.2 Future Work

6.2.1 Additional Constraints for Recovering Saturated Regions

In addition to the constraints discussed in Section 4.7, a few more constraints can be implemented for better results. These constraints have not been implemented or tested, but given their theoretical strength, they can potentially make the proposed algorithm more robust.

Penalizing Edgelike Tomography Artifacts

As is common in tomographic reconstructions, false edges along the glare directions can be seen in the reconstructed saturated region (Figure 6.1). These artifacts are very prominent here because only very few line integrals are used to reconstruct the saturated regions. These edgelike artifacts can be penalized by adding a penalty term to the optimization. After adding this term, the reconstruction equation for $i_S$ (Equation 4.27) becomes

$i_S = \arg\min_{i_S} \left\| \sigma^{-1} (A\, i_S - b) \right\|_2^2 + \lambda \left\| \nabla i_S \right\|_1 + \gamma \left\| E(i_S) \right\|_2^2$,  (6.1)

where $E$ is a suitable edge detection operator detecting edges perpendicular to the glare directions and $\gamma$ is a regularization parameter.

Figure 6.1: Edgelike tomography artifacts in a simulated case. (a) Simulated blur; (b) sharp reference image / 1000; (c) recovered image / 1000.

Cross-Channel Correlation Constraint

The RGB channels are highly correlated, as dictated by the color line prior [10]. Therefore, unsaturated channel information can be used to reconstruct saturated channel information for pixels that are not saturated in all channels, using a method similar to [34].
As an added advantage, pixels that are clipped because their value is only slightly above the saturation level often have one or more unsaturated channels. This step therefore helps in cases where a saturated region does not have enough energy to produce detectable glare.

6.2.2 Generalized Hough Transform

At each pixel (x, y), 100 pixel gradients on the same column, starting at that pixel, are taken and elementwise divided by the PSF to compute a distribution of line integral gradient estimates (as described in Section 4.6.2); the median is then computed to obtain the estimate at that point, i.e., under the assumption that the point (x, y) is saturated and causing the glare on that column. Interestingly, given a fixed bin width, actually computing the histogram and taking its mode as the estimate is a generalization of the Hough transform. The Hough transform takes an image and expresses it in a different parameter space to detect a certain pattern [3]; in this case the exponential falloff profile is the pattern being searched for.

Figure 6.2: Results of a generalized Hough transform in the vertical direction: (a) an image captured with an 8-point cross-screen filter; (b) the generalized Hough transform. Green shows the detected positive gradients due to glare, and red shows the detected negative gradients due to glare.

As an added advantage, the generalized Hough transform can also be used to detect glare rays caused by bright points lying beyond the image frame. In Figure 6.2, there is a light source just above the top right corner, and only 2 of its 8 glare rays enter the image. The Hough transform has successfully captured this information. While the saturated region itself will be neither easy nor necessary to reconstruct, the glare removal step will need this information to remove glare due to light sources lying beyond the image frame.
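The per-column median estimate described in Section 6.2.2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name glare_strength_map, the purely exponential vertical PSF with per-pixel falloff psf_alpha, and the 100-gradient window are all assumptions made for the sketch.

```python
import numpy as np

def glare_strength_map(img, psf_alpha=0.9, window=100):
    """For each pixel, estimate the energy of a vertical glare ray that
    would start at that pixel, assuming an exponential glare PSF.

    img       -- 2-D luminance image (rows x cols)
    psf_alpha -- assumed per-pixel exponential falloff of the glare PSF
    window    -- number of gradients sampled below each pixel
    """
    rows, cols = img.shape
    # Vertical (downward) finite differences: a glare ray decaying
    # downward produces gradients with an exponential profile.
    grad = np.diff(img, axis=0)                              # (rows-1, cols)
    # Gradient profile of the assumed exponential PSF.
    psf_grad = np.diff(psf_alpha ** np.arange(window + 1))   # (window,)
    est = np.zeros((rows, cols))
    for y in range(rows - window - 1):
        # Divide each of the next `window` gradients by the PSF gradient;
        # if a glare ray really starts at row y, these ratios agree, and
        # their median is a robust estimate of the ray's energy.
        ratios = grad[y:y + window, :] / psf_grad[:, None]
        est[y, :] = np.median(ratios, axis=0)
    return est
```

The median plays the role the thesis assigns to the histogram mode: scene texture and noise produce ratios scattered away from the true value, and a robust order statistic suppresses them.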
6.2.3 Better PSF Estimation

Since an estimated PSF, rather than the actual PSF, is used in the process (Figure 4.7), an additional optimization step is necessary to estimate the waviness of the actual PSF from the recovered unsaturated image. This optimization needs to minimize the deviation from the estimated glare without blurring out scene edges. Appropriate image priors, such as the sparse gradient prior, along with the assumption that the waviness is constant for all glare rays in an image, can help construct the optimization problem. PSF estimation and glare estimation can then be run alternately, in an Expectation-Maximization fashion, until the algorithm converges.

6.2.4 Automatic Glare Orientation Detection

This thesis has not investigated automatic detection and rotation of images to make the glare patterns completely horizontal. It is possible that the generalized Hough transform, together with a robust cost function, could find the fine rotation necessary to make the glare patterns horizontal.

6.2.5 HDR Video

An obvious extension to this work is HDR video. As mentioned in Section 6.2.2, the success of such a method would greatly depend on being able to detect glare patterns created by objects lying outside the image frame.

Bibliography

[1] J.S. Abel and J.O. Smith III. Restoring a clipped signal. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages 1745–1748, 1991.

[2] U.M. Ascher, E. Haber, and H. Huang. On effective methods for implicit piecewise smooth surface recovery. SIAM Journal on Scientific Computing, 28(1):339, 2006.

[3] D.H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111–122, 1981.

[4] Y. Bando and T. Nishita. Towards digital refocusing from a single photograph. In Proceedings of Pacific Graphics, pages 363–372, 2007.

[5] M. Bertalmio, G. Sapiro, V. Caselles, and C.
Ballester. Image inpainting. In Proceedings of the Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH), pages 417–424, 2000.

[6] P.E. Debevec and J. Malik. Recovering high dynamic range radiance maps from photographs. In Proceedings of the International Conference on Computer Graphics and Interactive Techniques, pages 369–378, 1997.

[7] A. El Gamal. High dynamic range image sensors. Tutorial at the International Solid-State Circuits Conference, February 2002.

[8] R. Ginosar, O. Hilsenrath, and Y. Zeevi. Wide dynamic range camera. 1992.

[9] P. Horowitz and W. Hill. The Art of Electronics. Cambridge University Press, 1989.

[10] N. Joshi, C.L. Zitnick, R. Szeliski, and D.J. Kriegman. Image deblurring and denoising using color priors. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

[11] S.B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High dynamic range video. ACM Transactions on Graphics, 22(3):319–325, 2003.

[12] S.J. Kang, H.C. Do, B.G. Cho, S.I. Chien, and H.S. Tae. Improvement of low gray-level linearity using perceived luminance of human visual system in PDP-TV. IEEE Transactions on Consumer Electronics, 51(1):204–209, 2005.

[13] C. Lau, M. Trentacoste, and W. Heidrich. Poster abstract: high dynamic range from defocus. In Proceedings of the 17th Annual Canadian Conference on Intelligent Systems, page 36, 2007.

[14] A. Levin, R. Fergus, F. Durand, and W.T. Freeman. Deconvolution using natural image priors. 2007.

[15] A. Levin, R. Fergus, F. Durand, and W.T. Freeman. Image and depth from a conventional camera with a coded aperture. In Proceedings of the International Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH), 2007.

[16] A. Levin, S.W. Hasinoff, P. Green, F. Durand, and W.T. Freeman. 4D frequency analysis of computational cameras for depth of field extension.
In Proceedings of the International Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH), 2009.

[17] A. Levin, P. Sand, T.S. Cho, F. Durand, and W.T. Freeman. Motion invariant photography. In Proceedings of the International Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH), 2008.

[18] L.B. Lucy. An iterative technique for the rectification of observed distributions. The Astronomical Journal, 79(6):745–754, 1974.

[19] S. Mann and R.W. Picard. On being 'undigital' with digital cameras: extending dynamic range by combining differently exposed pictures. Perceptual Computing Section, Media Laboratory, Massachusetts Institute of Technology, 1995.

[20] R. Mantiuk. High-fidelity imaging: the computational models of the human visual system in high dynamic range video compression, visible difference prediction and image processing. PhD thesis, Universität des Saarlandes, 2006.

[21] R. Mantiuk, K. Myszkowski, and H.-P. Seidel. A perceptual framework for contrast processing of high dynamic range images. ACM Transactions on Applied Perception, 3(3):286–308, 2006.

[22] M. Markowski. Ghost removal in HDRI acquisition. In Proceedings of the 13th Central European Seminar on Computer Graphics, 2009.

[23] S. Morgenstern. Nikon D90 digital camera review. http://www.digitalcamerainfo.com/content/Nikon-D90-Digital-Camera-Review19286/Color-and-Resolution.htm, 2008.

[24] K. Myszkowski, R. Mantiuk, and G. Krawczyk. High Dynamic Range Video. Morgan & Claypool Publishers, 2008.

[25] S.K. Nayar and T. Mitsunaga. High dynamic range imaging: spatially varying pixel exposures. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 472–479, 2000.

[26] S.K. Nayar and S.G. Narasimhan. Assorted pixels: multi-sampled imaging with structural models. In Proceedings of the International Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH), 2005.

[27] T. Olofsson.
Deconvolution and model-based restoration of clipped ultrasonic signals. IEEE Transactions on Instrumentation and Measurement, 54(3):1235–1240, 2005.

[28] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.

[29] R. Raskar, A. Agrawal, and J. Tumblin. Coded exposure photography: motion deblurring using fluttered shutter. In Proceedings of the International Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH), pages 795–804, 2006.

[30] W.H. Richardson. Bayesian-based iterative method of image restoration. Journal of the Optical Society of America, 62(1):55–59, 1972.

[31] M. Robertson, S. Borman, and R. Stevenson. Dynamic range improvement through multiple exposures. In Proceedings of the International Conference on Image Processing (ICIP'99), pages 159–163, 1999.

[32] J. Shen. Inpainting and the fundamental problem of image processing. SIAM News, 36(2), 2003.

[33] P. Tan, S. Lin, L. Quan, and H.Y. Shum. Highlight removal by illumination-constrained inpainting. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'03), page 164, 2003.

[34] X. Zhang and D.H. Brainard. Estimation of saturated pixel values in digital color imaging. Journal of the Optical Society of America A, 21(12):2301–2310, 2004.
UBC Library record:
Title: Single exposure high dynamic range imaging with a conventional camera using cross-screen filters
Creator: Rouf, Mushfiqur
Publisher: University of British Columbia
Date Issued: 2009
Genre: Thesis/Dissertation
Language: eng
Degree: Master of Science - MSc
Program: Computer Science
Degree Grantor: University of British Columbia
Graduation Date: 2009-11
DOI: 10.14288/1.0051847
URI: http://hdl.handle.net/2429/24121
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)