COMPUTATIONAL SINGLE-IMAGE
HIGH DYNAMIC RANGE IMAGING

by

Mushfiqur Rouf

B.Sc., Bangladesh University of Engineering and Technology, 2005
M.Sc., Bangladesh University of Engineering and Technology, 2007
M.Sc., University of British Columbia, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies
(Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

June 2018

© Mushfiqur Rouf 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Computational single-image high dynamic range imaging

submitted by Mushfiqur Rouf in partial fulfillment of the requirements for
the degree of Doctor of Philosophy
in Computer Science

Examining Committee:

Rabab K. Ward, Electrical and Computer Engineering
Co-supervisor

James Little, Computer Science
Co-supervisor

Supervisory Committee Member

William Evans, Computer Science
University Examiner

Roger Tam, Radiology
University Examiner

Additional Supervisory Committee Members:

Jane Wang, Electrical and Computer Engineering
Supervisory Committee Member

Supervisory Committee Member

Abstract

This thesis proposes solutions for increasing the dynamic range (DR), the number of intensity levels, of a single image captured by a camera with a standard dynamic range (SDR). The DR in a natural scene is usually too high for SDR cameras to capture, even with optimum exposure settings. The intensity values of bright objects (highlights) that are above the maximum exposure capacity get clipped due to sensor over-exposure, while objects that are too dark (shades) appear dark and noisy in the image. Capturing a high number of intensity levels would solve this problem, but this is costly, as it requires the use of a camera with a high dynamic range (HDR). Reconstructing an HDR image from a single SDR image is difficult, if not impossible, to achieve for all imaging situations.
For some situations, however, it is possible to restore the scene details using computational imaging techniques. We investigate three such cases, which also occur commonly in imaging. These cases pose relaxed and well-posed versions of the general single-image high dynamic range imaging (HDRI) problem. The first case occurs when the scene has highlights that occupy a small number of pixels in the image; for example, night scenes. We propose the use of a cross-screen filter, installed at the lens aperture, to spread a small part of the light from the highlights across the rest of the image. In post-processing, we detect the spread-out brightness and use this information to reconstruct the clipped highlights. Second, we investigate the cases when highlights occupy a large part of the scene. The first method is not applicable here. Instead, we propose to apply a spatial filter at the sensor that locally varies the DR of the sensor. In post-processing, we reconstruct an HDR image. The third case occurs when the clipped parts of the image are not white but have a color. In such cases, we restore the missing image details in the clipped color channels by analyzing the scene information available in other color channels in the captured image. For each method, we obtain a maximum-a-posteriori estimate of the unknown HDR image by analyzing and inverting the forward imaging process.

Lay Summary

One aspect of photography that consumers often overlook is the dynamic intensity range, or dynamic range (DR) for short. A high DR is essential for capturing vivid photos. For example, in outdoor scenes, an object in the sun is many times brighter than an object in the shade, and standard dynamic range (SDR) cameras would be unable to faithfully capture both objects in one image due to the limited DR. What is needed is a camera with a high dynamic range (HDR). However, it is prohibitively expensive to make HDR cameras.
This thesis explores an alternative solution: we propose to perform HDR imaging with off-the-shelf SDR cameras, using computation-based solutions that come (almost) for free.

Preface

Small portions of the introductory text are modified from previously written introductory material from my master’s thesis [114] (2009) completed at the University of British Columbia.

A version of Chapter 4 has been published [118]. The work in this chapter was a progression of work done for my master’s research [114]. I was the lead researcher, responsible for all major areas of problem formulation, research and analysis. Wolfgang Heidrich, the supervisory author, and Rafal Mantiuk, coauthor and a postdoctoral fellow, were involved throughout the project in concept formation and manuscript composition.

The work discussed in Chapter 5 has been published in two separate parts. A version of Section 5.6.3 was published [120]. I was the lead researcher on this work, and I was responsible for all major areas of problem formulation, solution development, and analysis. Dikpal Reddy and Kari Pulli were involved in concept formation and in writing the paper. Rabab K. Ward was the supervisory author and was involved throughout the project in solution development and writing the paper. A version of the rest of Chapter 5 was also published [117]. I was the lead researcher, was responsible for all major areas in formulating the solution, and was involved in writing the paper. Rabab K. Ward was the supervisory author and was involved throughout the project and in the writing of the paper.

Chapter 6 contains material that resulted in three publications. I was the lead researcher on all three projects; I was responsible for the problem formulations, solution developments and analyses. I was also the main author who wrote the papers. Section 6.1 was published [119]; Wolfgang Heidrich was involved throughout the project in concept formation and writing the paper. A version of Section 6.2 was published [116]; Rabab K.
Ward was the supervisory author; she also was involved throughout the project in solution development and paper writing. A version of Section 6.3 was published [115]; Rabab K. Ward was the supervisory author; she also was involved throughout the project in solution development and paper writing.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
List of Acronyms and Initialisms
List of Algorithms
List of Symbols
Acknowledgements
Dedication
1 Introduction
  1.1 What is the dynamic range (DR): the importance of high dynamic range (HDR)
  1.2 Research objective
  1.3 Outline of the rest of this thesis
2 Literature review
  2.1 Multi-exposure high dynamic range imaging (HDRI)
    2.1.1 Deghosting: correcting object misalignment in a multi-exposure image sequence
    2.1.2 HDR video: obtaining multiple exposures for free
  2.2 Single-image HDRI
    2.2.1 Exposure-multiplexing: spatially varying exposure
    2.2.2 Multiple sensors
    2.2.3 Integration curve manipulation
  2.3 Restoration of clipped signals
    2.3.1 Single-image standard dynamic range (SDR) to HDR
    2.3.2 Restoration of clipped colors
    2.3.3 Inpainting
    2.3.4 Denoising
  2.4 Multiplexing scene information
  2.5 Summary
3 Background and mathematical framework
  3.1 The forward image formation model
    3.1.1 The human visual system (HVS)
    3.1.2 Digital imaging
      3.1.2.1 Exposure controls: aperture, shutter and exposure-index (EI)
      3.1.2.2 Analog to digital conversion of optical data
      3.1.2.3 Noise and noise modeling
    3.1.3 Image DR: an indicator of image fidelity
      3.1.3.1 Conventional multi-exposure HDRI
      3.1.3.2 We propose computational single-image HDRI
      3.1.3.3 Thought experiments: how image DR indicates image fidelity
  3.2 Displaying HDR images
  3.3 Problem statement
    3.3.1 Forward process: how an image is acquired by an SDR camera
    3.3.2 Inverting the forward process: obtaining the latent HDR image from an SDR image
    3.3.3 A Bayesian framework for solving the inverse problem
  3.4 Our computational optimization framework
    3.4.1 Vectorized representation of images for computation
    3.4.2 Computing the data-fitting term
    3.4.3 The Rudin-Osher-Fatemi (ROF) model: utilizing edge sparsity of natural images
    3.4.4 Deriving and applying image priors
    3.4.5 Computational optimization for solving the ROF model: iterative convex solvers for the mixture of ℓ1 and ℓ2 priors
    3.4.6 Iterative reweighted least squares (IRLS) solution
    3.4.7 Saddle point formulation and solution using primal-dual convex optimization
      3.4.7.1 Proximity operators
  3.5 Summary
4 Filtering at the camera aperture
  4.1 Optical encoding of pixel intensities
  4.2 Image formation model
  4.3 HDR image reconstruction
    4.3.1 Removing light streaks due to unsaturated pixels
    4.3.2 Separating the light streaks
      4.3.2.1 Gradient properties of the residual light streak, r
      4.3.2.2 Solution via computational optimization
    4.3.3 HDR reconstruction of saturated pixels
  4.4 Results: synthetic test cases
  4.5 Results: real test cases
    4.5.1 Separation of light streaks
    4.5.2 Reconstruction of clipped highlights
  4.6 Analysis
    4.6.1 Detection limits of light streaks
    4.6.2 Noise analysis
  4.7 Summary
5 Filtering at the camera sensor
  5.1 Sensitivity-multiplexing
    5.1.1 Optical sensitivity-multiplexing
    5.1.2 Electronic sensitivity-multiplexing
  5.2 Overview
    5.2.1 Background
  5.3 Forward model: image formation
    5.3.1 Estimating the response curve r
  5.4 Formulation of the HDR reconstruction problem
  5.5 A non-local self-similarity image prior
    5.5.1 Global optimization for reconstruction
    5.5.2 Discussions
  5.6 A fast image prior: smooth contour prior
    5.6.1 Intuition behind the smooth contour prior
    5.6.2 Fast edge-directed interpolation (EDI)
    5.6.3 Using EDI in the smooth contour prior for super-resolution (SR)
      5.6.3.1 Previous work on image SR
      5.6.3.2 Single-image super-resolution
      5.6.3.3 Global optimization for reconstruction
      5.6.3.4 SR results
  5.7 Adapting the smooth contour prior for single-image HDRI
    5.7.1 Modified smooth edge-guided interpolation
    5.7.2 Results
  5.8 Discussion: non-local self-similarity prior vs. smooth contour prior
  5.9 Summary
6 Cross-color-channel postprocessing for single-image DR expansion
  6.1 Gradient domain color and clipping correction
    6.1.1 Restoration of Saturated Color
    6.1.2 Method
      6.1.2.1 Image formation model
      6.1.2.2 Hue interpolation
      6.1.2.3 Cross-channel detail transfer and color restoration
      6.1.2.4 Gradient smoothing for fully clipped regions
      6.1.2.5 Discretization
    6.1.3 Results and analysis
    6.1.4 Discussions
  6.2 Using intensity-invariant patch-correspondences for single-image DR expansion
    6.2.1 Method
      6.2.1.1 Image formation model
      6.2.1.2 Stochastic search
      6.2.1.3 Adding intensity-invariance to non-local self-similarity
      6.2.1.4 Reconstruction with global optimization
    6.2.2 Results and analysis
      6.2.2.1 Limitations
      6.2.2.2 Theoretical bounds
  6.3 Retrieving information lost by image denoising
    6.3.1 Previous work
    6.3.2 Method
      6.3.2.1 Image formation model
      6.3.2.2 The estimation of the lost latent data
      6.3.2.3 Pseudocode
    6.3.3 Results and discussion
  6.4 Summary
7 Conclusions
  7.1 Our HDRI framework
  7.2 HDRI methods developed: results and conclusions
    7.2.1 Methods utilizing aperture-filtering for single-image HDRI
    7.2.2 Methods utilizing sensitivity-multiplexing for single-image HDRI
    7.2.3 Methods utilizing cross-color-channel correlation for single-image DR expansion
  7.3 Contributions
  7.4 Limitations
  7.5 Future work
    7.5.1 Auxiliary HDR camera
    7.5.2 Filtering somewhere in between the aperture and sensor
    7.5.3 Aperture-filtering and deep learning
Bibliography

List of Tables

3.1 A summary of aperture and sensor filters used in various methods presented in this thesis
3.2 A summary of natural image priors used in various methods presented in this thesis

List of Figures

1.1 An HDR image vs. an SDR image of the same scene
1.2 A comparison of the DR of various systems
2.1 Multi-exposure HDR merging example
2.2 Assorted Pixels for spatial multiplexing of exposure
3.1 Color perception by the HVS
3.2 CIE 1931 colorspace xy chromaticity diagram
3.3 Image acquisition using a lens and camera
3.4 The Bayer filter [14] enables color imaging
3.5 Spectral response of Canon 40D
3.6 Intensity response and DR limit
3.7 DR is a fixed-width window; when exposure settings are changed, this window slides up and down along the log-intensity scale
3.8 Representing an image formation model as a matrix-vector multiplication
4.1 Capturing an HDR image with a cross-screen filter
4.2 A cross-screen filter and its point-spread function (PSF)
4.3 A cross-screen filter creates light streaks that fall off exponentially
4.4 Analysis of the PSF of a cross-screen filter
4.5 Analysis of the effect of a cross-screen filter
4.6 Motivation for modeling the image gradient distribution as a Laplace distribution
4.7 Chromatic issues in light streaks created by the cross-screen filter
4.8 Reconstruction of clipped highlights from cross-screen filter light streaks
4.9 Our reconstruction results for simulated light streak from an 8-point cross-screen filter
4.10 Results of reconstruction from images captured by a camera with an 8-point cross-screen filter
4.11 Detectability of light streaks vs. distance from a saturated region
4.12 Analyzing noise added due to a cross-screen filter
5.1 The “exposure-multiplexed mode”
5.2 Overview of the exposure-multiplexed HDRI method: image formation model
5.3 Color filter array modifications for HDRI
5.4 Exposure-multiplexed imaging: corrections for the nonlinear response curve, an example
5.5 Exposure-multiplexed imaging: corrections for the nonlinear response curve using random sample consensus (RANSAC) polynomial fitting
5.6 An illustration of speed-up via precomputation of area-sums
5.7 An illustration of the two stages of EDI
5.8 A comparison of results and timings of our SR approach with sparse gradient prior, smooth contour prior and the combination of both priors
5.9 A demonstration of the combined effect of the smooth contour prior and the sparse gradient prior with a Siemens star chart
5.10 SR with smooth contour prior: dyadic examples
5.10 SR with smooth contour prior: dyadic examples (continued)
5.11 SR with smooth contour prior: nondyadic examples
5.12 Flowchart of our proposed exposure-multiplexed HDRI method
5.13 Augmenting the smooth contour prior for single-image HDRI
5.14 Exposure-multiplexed HDRI results
5.14 Exposure-multiplexed single-image HDRI results (continued)
6.1 Gradient domain color and clipping restoration for single-image DR expansion
6.2 Flowchart of the proposed gradient domain color and clipping restoration
6.3 Partial clipping resulting in color desaturation: how different channels are clipped at a different rate depending on the hue
6.4 Advantages of performing the color restoration in gradient domain as opposed to intensity domain
6.5 Gaussian infilling for gradient domain restoration in case of full saturation
6.6 Examples of color and clipping restoration with our gradient-domain method
6.7 Comparison of color restoration with other methods
6.8 Results of the gradient-domain color and clipping restoration method
6.8 Results of the gradient-domain color restoration method (continued)
6.9 A failure case of color restoration with our algorithm
6.10 An overview of single-image DR expansion using non-local self-similarity
6.11 How noise is reduced using non-local self-similarity, and the theoretical optimum
6.12 Results of our proposed single-image DR expansion method employing non-local self-similarity
6.13 An analysis of how lowering the noise level improves the dynamic range; theoretical bounds
6.14 Overview of our proposed noise correction method for single-image DR expansion
6.15 Denoising improvement: deriving statistical properties of denoising error
6.16 Denoising improvement method: results
6.16 Denoising improvement method: results (continued)
6.17 Our denoising improvement performance for different noise levels

Glossary

additive white Gaussian noise
Simple Gaussian noise with a zero mean. All frequency components of this noise term are equally strong.

aperture
Opening through a lens. The larger the opening, the more light goes through. The aperture size is not the physical size of the opening through the lens, but the size from the perspective of the front end of the lens. The light pencil size at the entry pupil (the front of the lens) is determined not by how big the physical aperture is, but by how big it looks from the perspective of that entry pupil.

aperture-filtering
Filtering at the aperture.
It spreads light from highlights into the nearby pixels in an effort to store more information in the image than an unfiltered standard dynamic range (SDR) camera can capture. A post-processing step utilizes this additional information stored in the image and generates a high dynamic range (HDR) output image. This is also known as aperture-coding. Aperture-coding was first introduced by Ables [4] and Dicke [28] for the field of astrophysics and popularized by later publications such as Fenimore and Cannon [37].

Bayes’ rule
A statistical rule that describes the probability of an event given prior knowledge of conditions that might be related to that event. For two events A and B, Bayes’ rule states that
\[ \Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}. \qquad (1) \]

Bayesian method
Optimization using Bayes’ rule [99].

beam splitter
A semi-reflective mirror (partially “silvered” sheet of glass) that splits an incident beam of light into two parts: one part refracts (keeps going in the same direction through the glass) and the rest reflects (and goes on in a second direction).

bilateral filter
A conditional Gaussian filter where the filter weight falls off with both the difference in value (“range”) and the difference in space. Both of these differences are assumed to be Gaussian distributed with zero mean and some known variance, so the weight is the product of the two Gaussian terms. Due to Tomasi and Manduchi [135]. See Section 6.1.2.2 for a discussion.

bit-depth
Bits per pixel in an image.

block-matching and 3D filtering
A denoiser [21, 24]. This denoising technique leverages natural image sparsity and non-local self-similarity.

blur kernel
The convolution kernel.

color filter array
A color image sensor has a red-green-blue (RGB) color filter array pasted on top. As a result, each pixel can only observe one of the three colors. The most common pattern of these three colors is the Bayer pattern [14].

computational imaging
An imaging technique where computation is heavily involved.
As opposed to conventional imaging techniques, where the goal is to obtain sharp imagery using precision optics only, computational imaging is a general class of methods where the optical imaging part acts as a complement to the subsequent computational postprocessing. Usually these computational postprocessing steps are compute-intensive inverse problems. This is quite different from conventional image processing (for example, unsharp masking, bilateral filtering, or various edge or other kinds of feature detection) where the “postprocessing” is basically an application of a filter, which is not an inverse problem. Inverse problems require a “search” (i.e., optimization), which makes them “harder” than the conventional “easier” image processing techniques.

convolution
An image operation where every pixel in the resulting image is a weighted sum of all neighboring pixels in the source image. The weights are called the convolution kernel or blur kernel. Convolution can happen optically or computationally. For example, camera shake can be modeled as a convolution: the motion of the camera makes everything in the captured image move the same way (assuming little or no rotational motion around the optical axis). Likewise, per-pixel image operations that are spatially invariant (such as computing the image gradients) can be expressed as a convolution.

cross-color-channel correlation
Image prior. In color images, pixel intensities across different channels are correlated. This is because the spectral responses of colors overlap. See Section 3.1.1.

cross-screen filter
A physical photographic filter made of transparent plastic with scratch marks or grooves cut on the surface. When photos are captured with this filter, star-shaped light streaks are added to the image around bright objects (Section 4.2).
Applying this filter is like an optical convolution, and removing the light streaks from a captured image is effectively a deconvolution.

Dcraw
A popular public domain software package for processing RAW image files [25].

deconvolution
Inverting convolution.

deep learning
A machine learning (ML) technique [75, 79] that trains a deep network of artificial neurons with massive amounts of training data.

demosaic
Generating an RGB image from a RAW image. RAW images have only one color channel per pixel; in demosaicking, missing color information is reconstructed using image analysis.

denoise
To separate noise from the signal. Observed data is noisy, i.e., a combination of noise and signal; noise needs to be removed from the observed data before further processing of the signal can proceed.

denoiser
Algorithm that can denoise a noisy signal.

digital SLR camera
Digital single-lens reflex camera (SLR).

discrete cosine transform
Expresses an image in a discrete cosine basis. This basis is known for concentrating energy in a few coefficients. Transforming into this domain enables some imaging tasks such as compression and denoising.

dynamic range
A measurement of image fidelity. More accurately, it is the width of the range of log-intensities a camera can faithfully capture. In other words, this is the ratio between the maximum value (i.e., the sensor saturation level) and the minimum value (i.e., the noise level) of a camera for a given exposure setting (see Section 1.1 for details).

edge-directed interpolation
An edge-aware anisotropic filter. This method estimates the direction of isophotes by examining patch structures, and then interpolates along these estimated isophote directions to avoid blurring edges.
More generally, this is a class of methods; the most canonical example is the one due to Li and Orchard [84].

exposure settings
Camera parameters that dictate the brightness of the image captured, namely, aperture (width of the pencil of light entering the camera through the lens), shutter speed (how long light from the scene is allowed to land on the sensor), and exposure-index (EI) (how sensitive the sensor is to the incident light).

exposure-bracket
Multiple exposures are captured in rapid succession. Each exposure has a different exposure setting. (Most often it is the shutter speed that is varied.) The resulting image sequence covers more dynamic range (DR) than a single SDR capture.

exposure-index
(Also known as “ISO number” [64].) Sensor sensitivity. “100 ISO” is considered the standard. “200 ISO” is twice as “fast” (i.e., an image with the same level of overall brightness can be captured in half the time). The speed gain comes at a cost; “faster” exposures contain more noise.

exposure-multiplex
Multiplexing exposure settings between pixels or rows of pixels on the same exposure of an image sensor. This is a more general form of sensitivity-multiplexing; while sensitivity-multiplexing only multiplexes EI (sensitivity), exposure-multiplexing includes multiplexing exposure duration as well.

exposure-value
Exposure value. See also: f/stops.

f/stop
(Also f-stop.) The bit-depth of a signal. A difference of 1 f-stop implies a factor of 2 in terms of amount of light [20].
It originally referred to the physical stopping of light through a lens. This number is the ratio of the focal length of a lens to the diameter of the aperture “opening”. When a lens is “wide open”, the f-stop number is the smallest possible for that lens. With focal length kept constant, a larger aperture diameter results in a lower f-number. Since area is quadratic in diameter, halving the f-stop results in 4 times as much light.
As a result, the common f-numbers form a geometric sequence with a ratio of √2: f/1.4, f/2, f/2.8, f/4, f/5.6, f/8, f/11, f/16, f/22.
The term is now used in a broader context, still implying the same halving or doubling of light. Furthermore, “stopping up” means doubling the amount of light, and “stopping down” means halving the amount of light. More generally, if exposure is varied by a factor k, then the number of stops is log(k)/log(2).
In the context of DR, f/stops, bit-depth and exposure-value are closely related concepts.

forward model
Image formation model; abstracts the image formation process.

forward process
Image formation process. The steps in the imaging process of the camera: a description of the optical transformation that incoming light goes through before hitting the sensor and being converted to an electronic signal.

gamma
A nonlinear per-pixel function often applied to intensity data; the simplest example is raising the pixel intensities to a power.

gamma correction
Display technologies often convert image intensity values nonlinearly when processing these values for output. A gamma is then used to correct for this non-linearity.

Gaussian noise
A noise term that is distributed as a Gaussian. In the context of this thesis, a Gaussian noise term η is zero-mean and has a variance of $b^2$, i.e.,
\[ \eta \sim \mathcal{N}(0, b^2), \qquad (2) \]
or,
\[ \eta \sim \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{\eta^2}{2b^2}\right]. \qquad (3) \]

ground-truth
Reference data for some observed data.

high dynamic range
Better than conventional camera DR. It is ≥ 20 bits or ≥ 20 f/stops.

high dynamic range imaging
High dynamic range imaging.

high-resolution
The image that has a higher resolution.
This term is used in the context of a super-resolution (SR) algorithm.

highlight: a bright area in an image, usually over-exposed.

human visual system: processing by the human eye and the visual cortex.

image bandwidth: total amount of information in an image (in bits) = spatial resolution (number of pixels) × pixel-depth (bits per pixel).

image prior: prior knowledge or assumptions about images. Image priors are usually based on statistical properties.

image sensor: photosensitive element in a camera that records the image incident on it.

inpaint: an imaging problem of hole-filling. If small parts of an image are missing, these methods apply various image priors aiming to reconstruct the missing pixels. The simplest such method would rely on simple diffusion; however, such inpainted results do not look plausible owing to the blurring that results from the diffusion. The simplest plausible alternative is an edge-preserving interpolation for infilling the missing data.

International Standards Organization: standards body.

isophote: an equi-intensity contour; an iso-line on an image where pixels have the same intensity.

iterative reweighted least squares: an iterative algorithm for convex optimization (Section 3.4.5). This method is particularly suitable for objective functions with one or more $\ell_p$ norm terms where $p \neq 2$. As long as $p \geq 1$ for all terms in the objective function, the problem remains convex and this algorithm can be used.

just noticeable difference: the largest intensity difference that is still too small for the human visual system (HVS) to detect.

latent: underlying, original. A latent image is the unknown original image in its pristine condition, as opposed to the captured image, which is corrupted by noise and other limitations of physical processes.

least significant bit: least significant bit of a number.

low-resolution: the image that has a lower resolution.
This term is used in the context of an SR algorithm.

machine learning: machine learning.

maximum-a-posteriori: the maximum-a-posteriori estimate of an unknown quantity.

maximum-likelihood estimator: the maximum likelihood estimator of an unknown quantity.

midtone: neither a highlight nor a shade, i.e., a part of an image that is neither too bright nor too dark. Usually well-exposed.

modulation transfer function: modulation transfer function.

multi-exposure: multiple exposures of the same scene, often obtained using exposure-bracketing.

neutral-density: photographic filter that blocks light at the same rate at every wavelength across a broad spectrum, typically all wavelengths of visible light.

noise: impurity in a signal or error in measurement. This can result from a variety of reasons, including but not limited to: the randomness present in physical signal acquisition processes, inaccuracies in the measuring devices, or lossy compression of the signal. Every signal has multiple sources of noise; some of these sources are deterministic, others are random. Deterministic noise (such as fixed-pattern noise) is easy to remove, and is therefore often not mentioned. The nondeterministic, random noise is what we worry about most of the time. Section 3.1.2.3 has a discussion of the noise model in photography.

non-local means: a denoising technique that finds similar patches within the same image and then denoises a patch by averaging them.

non-local self-similarity: image prior. In natural images, image patches tend to “repeat”, i.e., image patches with very close structure appear in multiple places in the same image.

one-dimensional: one-dimensional.

optical encoding of pixel intensities: pixel intensities are optically filtered and encoded in the intensity values of the nearby pixels.

peak signal-to-noise ratio: the ratio of peak signal (the highest possible value in a signal) to noise. A high peak signal-to-noise ratio (PSNR) implies a high fidelity of the signal.
PSNR is a “global quantity” whereas signal-to-noise ratio (SNR) is a “local quantity”.

photosensitive: sensitive to light. When light energy hits a photosensitive element, it emits electrical energy.

pixel: picture element. In the context of this thesis, a pixel is a photosensitive unit on an image sensor. A pixel can be thought of as a point sampler of the (often band-limited) optical signal (“the image”) [125].

pixel-depth: same as bit-depth.

point-spread function: in a convolution, how a pixel is spread out into its neighboring pixels because of the convolution.

Poisson noise: see shot noise.

random sample consensus: an iterative method for fitting a mathematical model to observed data that contains outliers [39].

RAW: straight out of camera without any modification such as demosaicking, gamma correction, color correction, JPEG compression, etc.

red-green-blue: the three color channels in color imaging.

Rudin-Osher-Fatemi: a noisy image formation model [121] that assumes additive white Gaussian noise (AWGN) and sparse gradient. Under the Bayesian framework, this model becomes a nonlinear optimization problem.

scale-invariant feature transform: a feature detector that detects corners in an image [85].

sensitivity-multiplexing: filtering at the sensor. It modifies pixel intensity values before these values are digitized, in an effort to store more information in the image than an unfiltered SDR camera can capture. A post-processing step utilizes this additional information stored in the image and generates an HDR output image.

shade: a dark area in an image, usually under-exposed. Pixels in this area receive a small number of photons; as a result, these pixels have high noise and low SNR.

shot noise (also Poisson noise): a noise associated with discrete events (such as photon arrival) that can be modeled with a Poisson process. For large numbers, the Poisson distribution approaches a normal distribution about its mean. The variance of shot noise is equal to its mean.
Therefore, the resulting noise term has zero mean, a variance proportional to the signal, and the shape of a Gaussian distribution. This is why shot noise acts as a signal-dependent (multiplicative) Gaussian noise.

shutter speed: how long the shutter is kept open for exposure.

signal-to-noise ratio: the ratio of signal to noise. High SNR implies a high fidelity of the signal. SNR is a “local quantity” whereas PSNR is a “global quantity”.

single-lens reflex camera: a type of camera where the same lens is used for imaging and viewfinding. It uses a mirror-prism system to put the viewfinder on the same optical path as the image sensor.

smooth contour: an image prior. It utilizes edge-directed interpolation (EDI) to enforce smooth contour structures in images.

sparse gradient: an image prior. It utilizes the fact that natural images tend to be piecewise smooth, i.e., edges are sparse in an image. This prior encourages images having piecewise smooth structure.

spatial resolution: total number of pixels in an image, often expressed in megapixels.

speeded up robust features: a feature detector [13].

standard dynamic range: conventional camera DR. It is ≤ 14 bits or ≤ 14 f/stops.

star filter: a photographic filter for creating a star-like effect around bright objects in the scene.

structural similarity index: a score of structural similarity between images [141].

super-resolution: increasing the spatial resolution of an image.

three-dimensional: three-dimensional.

total variation: $\ell_1$ sum of gradients, $|\nabla \cdot|$.

two-dimensional: two-dimensional.

unsharp masking: a simple image sharpening technique due to Yule [150]. High frequencies are boosted so that the image appears sharp.

virtual exposure: simulated SDR exposure of an HDR image. Virtual exposures simulate what an SDR camera would be able to capture in a single exposure. We use virtual exposures as a means to show our reconstruction quality. A short virtual exposure simulates a short exposure with an SDR camera and clearly shows the highlights.
A long virtual exposure simulates a long exposure with an SDR camera and clearly shows the shades. See Section 3.2 for a detailed discussion.

List of Acronyms and Initialisms

1D: one-dimensional.
2D: two-dimensional.
3D: three-dimensional.
AWGN: additive white Gaussian noise.
BM3D: block-matching and 3D filtering.
CFA: color filter array.
DCT: discrete cosine transform.
DL: deep learning.
DR: dynamic range.
DSLR: digital SLR camera.
EDI: edge-directed interpolation.
EI: exposure-index.
EV: exposure-value.
GT: ground-truth.
HDR: high dynamic range.
HDRI: high dynamic range imaging.
HR: high-resolution.
HVS: human visual system.
IRLS: iterative reweighted least squares.
ISO: International Standards Organization.
JND: just noticeable difference.
LR: low-resolution.
LSB: least significant bit.
MAP: maximum-a-posteriori.
ML: machine learning.
MLE: maximum-likelihood estimator.
MTF: modulation transfer function.
ND: neutral-density.
NLM: non-local means.
PSF: point-spread function.
PSNR: peak signal-to-noise ratio.
RANSAC: random sample consensus.
RGB: red-green-blue.
ROF: Rudin-Osher-Fatemi.
SDR: standard dynamic range.
SIFT: scale-invariant feature transform.
SLR: single-lens reflex camera.
SNR: signal-to-noise ratio.
SR: super-resolution.
SSIM: structural similarity index.
SURF: speeded up robust features.
TV: total variation.

List of Algorithms

3.1 iterative reweighted least squares (IRLS)
3.2 primal-dual convex optimization algorithm
5.1 Single-image high dynamic range imaging (HDRI) using the BM3D prior
5.2 Super-resolution (SR) using the smooth contour prior
5.3 Exposure-multiplexed HDRI using the smooth contour prior
6.1 Improved denoising using a statistical model of color noise
List of Symbols

The following list describes several symbols that are used later within the body of the thesis.

$B(\cdot)$: application of block-matching and 3D filtering (BM3D)
$\otimes$: convolution operator
$\downarrow$: downsample operator
$E(\cdot)$: application of edge-directed interpolation (EDI)
$I$: identity matrix
$F(\cdot)$: Fourier transform
$\nabla$: gradient operator
$\gamma$: exposure-index (EI)
$\hat{f}$: an estimate of the unknown high dynamic range (HDR) image
$g$: the observed standard dynamic range (SDR) image
$\Gamma$: domain of the captured SDR image $g$
$f$: the unknown HDR image
$\Omega$: domain of the unknown HDR image $f$
$\mathbb{I}$: set of integers
$\lambda$: regularization parameter
$\nabla^2$: Laplacian operator
$L(\cdot)$: likelihood
$M$: forward process $\Omega \to \Gamma$
$\eta$: noise
$\mathcal{N}$: normal distribution
$|\cdot|$: $\ell_1$ norm, sum of absolute values
$E_i$: optical filter at the aperture
$\Pr(\cdot)$: probability
$\operatorname{prox}$: proximity operator
$Q$: quantization operator, discretizes analog values
$\mathbb{R}$: set of real numbers
$\cdot|_\Gamma$: decimation operator $\Omega \to \Gamma$
$\rho$: anti-aliasing filter
$E_o$: sensor filter
$\|\cdot\|_{TV}$: total-variation norm
$\|\cdot\|$: $\ell_2$ norm

Acknowledgements

This thesis is the last milestone of a very long journey, full of surprises: some are good surprises that I wish would repeat, while others are also good but I do not wish repetitions! The worst was the constant second-guessing of various choices I have made along the way, many of which I regretted at times but not anymore. A PhD is too long a commitment; it takes up a big chunk of one’s young adult life, a time when life can be turbulent and things that otherwise could be in control can go out of control. So many things go beyond one’s control that pursuing a PhD may often feel like a constant uphill battle. I am very thankful for being so fortunate to have been granted this opportunity, for an overall positive experience, and for having a supportive family, friends, and mentors.

The PhD turned out to be a battle harder than I had imagined.
I am here now thanks to a lot of people who helped me with their time, their advice, their support and their love; I am indebted to my supervisors, my committee, my family and friends, my colleagues, my well-wishers and everyone who directly and indirectly helped me make steady progress towards a successful completion.

First and foremost, I would like to thank my supervisor and mentor Dr. Rabab Kreidieh Ward, Professor Emeritus, ECE, UBC; without her invaluable advice, kind support, deep compassion and constant guidance it would have been quite impossible for me to complete my PhD. Working with Dr. Ward helped me in many ways: the personal and professional growth, the freedom to learn new things and to try out new ideas, and most of all a space that finally helped foster my creativity. With Dr. Ward, the PhD life was finally NOT subservience, NOT destitution; Dr. Ward was a teacher, a guide, a mentor, a well-wisher, all rolled into one. She completely understood my way of working; she always embraced me with a big smile every time I returned after going into hiding in a crunch mode. In writing every paper, and most importantly this thesis, Dr. Ward has been very closely involved; she provided critical comments, helped me say what I wanted to say in the simplest possible language but no simpler, and helped improve the underlying theory. She is a true mentor: instead of taking shortcuts, she always chose to spend time guiding me to the right answer, improving my writing, and making my presentations better. I am ever-grateful for the unique experience.

I would like to thank my supervisor Dr. Jim Little, Professor, CS, UBC, for stepping in and helping me salvage my PhD, for guiding me through the process, and for helping me wrap up my thesis. Dr. Little has been fun to work with; he has given me much of the best advice throughout my program.

I wholeheartedly thank my committee members for their guidance and help through the degree program. Dr.
Jane Wang, Professor, ECE, UBC, and members of the first committee, Dr. Bob Woodham, Professor Emeritus, CS, UBC, and Dr. Nando de Freitas, Adjunct Professor, CS, UBC and Professor, Oxford, UK, have all provided timely and valuable support and advice at various points of my program. I have also been fortunate to receive valuable advice from Dr. Will Evans, Professor, CS, UBC, on various occasions, including various discussions I had with him during and after the PhD defense. I would like to thank my first supervisor Dr. Wolfgang Heidrich, then Associate Professor, CS, UBC, and now Professor, CS, KAUST, for accepting me into the PhD program. In the first few years of my PhD, I was very fortunate to have Dr. Rafał Mantiuk as a mentor; back then he was a Postdoc in CS, UBC, and currently he is a Senior Lecturer, University of Cambridge, UK. Dr. Mantiuk has been a friend and mentor throughout my PhD. From outside of UBC, Dr. Kari Pulli, whom I first met at Nvidia Research, Santa Clara, CA, and who is now at Meta Vision Inc, San Mateo, CA; Scott Daly, Senior technical staff, Dolby Laboratories, CA; and Dr. Karen Eguiazarian, Professor, CS, Tampere University of Technology, Finland, have provided guidance and support.

I have to thank UBC for providing a wide variety of opportunities: study, work, and extracurricular activities. UBC is truly “a place of mind.” At UBC, there are valuable opportunities everywhere if you are willing to get involved.

Competitive programming is what kept me going; the competitive programming club met every Saturday for problem solving practice, a time for me to seek refuge from the harsh realities of the PhD life! Shout out to Jason Chiu, Daniel Du, Aram Ebtekar, Raunak Kumar, Angus Lim, Paul Liu, Daniel Lu, Jonathan Shen, Lucca Siaudzionis, Chen Xing, Greg Zhang, David Zheng, and many others. I have been involved with coaching the programming teams from 2008 until now; this whole experience has been very rewarding and fulfilling.
The UBC contestants are some of the smartest people I have ever met; it has been a privilege being amongst them. During this time we have advanced to the prestigious Association for Computing Machinery International Collegiate Programming Contest (ACM ICPC) World Finals almost every year; ACM ICPC recognized this voluntary work by giving me a coach award at ACM ICPC World Finals 2016. I am grateful to the ACM ICPC organizing team for providing this opportunity to have a lot of fun while also giving back to the community. I am very thankful to CS, UBC, for hosting the competitive programming club, and in particular to Dr. Will Evans for always championing our case to the department.

I would like to thank my colleagues at the places I worked at while I was pursuing my PhD. These work opportunities have provided me with a much-needed identity and sometimes an escape from the PhD grind. The internship with Nvidia Research gave me a much-needed respite after some horribly difficult times in my PhD journey. I have worked at Holos Vision, an embedded imaging startup in North Vancouver, BC, as a researcher, and at Simon Fraser University, Burnaby, BC, as a sessional lecturer. Most notably, I thoroughly enjoyed my time at the Risk Management Services (RMS), UBC, in particular our privacy and information security group PRISM: Jennifer Kain, Michael Lonsdale-Eccles (thanks for the vital hint that I “got to TED-talk it” my thesis defense), Kari Martin and Pimkae Saisamorn were amazing colleagues. I am grateful to have met some of the very capable people of UBC and to be able to see them at work so closely. The compression project, “Bitsy”, on which I worked together with Braydon Batungbacal and Mohammad Adib, was a fast-paced research and development project and so much fun. Currently, I work at Ouster Inc., an autonomous driving technology startup, which is an interesting opportunity for me; not only do I apply what I am strong at, but I am also able to explore and learn new things.
Mark Frictl, the CTO, and Angus Pacala, the CEO, have been very patient during the final months of writing this thesis. My colleagues Dima Garbuzov, Daniel Lu, Raffi Mardirosian, Harish Raja, Brionna Seley, Daniel Sohn, H “Jean-Claude” W (thanks for telling me not to joke during the defense, I joked only 100 times!), Kai Wong and others could not have been more supportive.

My PhD life, in general, was so much fun and enjoyable thanks to the many friends I have. Many of them are going through the same PhD grind, many are freer, and everyone is cheerful, energetic and supportive of me through my difficult times. Shout out to Dr. Zaki Abdullah, Dr. Hasanat Alamgir, Raged Anwar, Dr. Usman Chowdhury, Dr. Joydip Das, Ninadh D’Costa, Dr. Masrur Hossain, Dr. Sajjad Hossain, Nafis Jalil, Dr. Md Kamruzzaman, Ash Khan, Abdullah Mahmud, Tasnuva Mariam, Dr. Mouri Sharmin, Dr. Jonatan Schroeder and others for being there for me through thick and thin, for wishing me well! Maliha Sultana supported me emotionally and morally for a long time, and I am very grateful for all the good times and great memories! My sincere gratitude also to my uncle “chachchu” M Abdur Rafiq, cousin Dr. Raihan Rafique, and aunt “sweet khalamoni” Samsun Wazid for always keeping in touch and for providing constant moral support. Thanks so much to my MSc thesis supervisor Dr. M Mostofa Akbar, Professor, CSE, BUET, and my competitive programming coach Dr. M Kaykobad, Professor, CSE, BUET, for always wishing me the best!

I often like to break away from the confinement of one desk and fixed work hours! To all the coffee shops where I spent long days and sleepless nights, the Beanery at the Acadia Park residence in UBC, Calhoun’s on Broadway in Kitsilano (sadly now closed), Tim Hortons locations near UBC, and Cartems also on Broadway in Kitsilano: thank you! The sushi places I lived on are One More Sushi in UBC village, Miku at Waterfront, The Eatery on Broadway near Alma, I-Sushi on Arbutus near 12th, and Green Leaf Sushi on Broadway near Alma.
Granted, coffee kept me awake, but it was sushi that kept me rolling!

I have been so fortunate to have lived in Vancouver, BC for much of my life! I believe this is the best place to live; why that is true is itself a PhD topic. Yes, we sometimes complain about the weather, but the summer more than makes up for that! The rich cultural mosaic in Vancouver, all the events and all the places to eat. The mountains, the beaches, and Stanley Park! All the hiking, all those camping trips, road trips, bicycle trips. Vancouver lets you be truly free! What if it is all downhill from here? I truly miss Vancouver, and I truly wish to go back!

My family has been beside me at every turn; my gratitude and love for them are way beyond my capability for articulating feelings. They have always provided me with invaluable moral and psychological support: my brilliant brother Shahriar “Nafi” Rouf, Software Engineer, Google, Mountain View, CA; my courageous sister Farzana “Alice” Rouf; my caring mother Begum Shirin Akhtar, Deputy General Manager, MIS, BJMC, Bangladesh; and my watchful father Dr. M Abdur Rouf, Professor, Civil Engineering, BUET. My mother is a computer programmer by training; she was a professional software engineer before computers became cool. She has always encouraged me to succeed, believed that I would, loved me for succeeding, and loved me just as much when I failed. My father has always been a constant inspiration for me; my life is an iterative search problem to be more like him, and still failing. My father has given me continuous moral support through my PhD and directly helped me with writing, proofreading and correcting this thesis. I learned how to code from my father; one fine day he gave me a piece of code, written in QuickBASIC, doing some simple computer animation and challenged me with the task of modifying it. That was probably day zero of the career path in Computer Vision I am pursuing today.
I do not believe I could have been where I am now without their constant support and encouragement. I have been very fortunate to have my wonderful family; I realize it a little more every day, and I don’t believe I appreciate it enough, but I will do my best.

Mushfiqur “Nasa” Rouf
June 11, 2018
San Francisco, CA

Dedication

Chapter 1
Introduction

Technology around photography is ever-improving; among the various aspects of this advancement, the image fidelity, known as dynamic range (DR), has not improved as fast. The DR limitation is rooted in physical constraints.

On May 7th, 2016, a Tesla Model S car was involved in a fatal crash. At the time of the accident, the vehicle was driving itself; i.e., Tesla’s autonomous driving mode known as Autopilot was turned on. According to an official statement from Tesla, “Neither Autopilot nor the driver noticed the white side of the tractor trailer against a brightly lit sky” [5, 131]. This accident demonstrates the camera DR limitation: because of the limited DR of a camera, both a white object and a bright sky would “wash out” in the captured image, making the washed-out white object undetectable against the washed-out bright sky. A limited DR is a crucial limiting factor in any imaging technology, and yet it is mostly unsolved.

This thesis focuses on high dynamic range (HDR) for consumer photography of natural images. But as the example above shows, the DR limit is a fundamental problem in every form of image acquisition, including machine vision (e.g., autonomous driving) and scientific imaging. On the other hand, in cinematography and other professional photography, where art is the main focus and not accurate measurements of light, the DR limit is still a problem.
This is because it is crucial that captured images (and videos) are of high fidelity so that artistic or professional postprocessing tools (e.g., cinematic color grading) can be applied.

The fidelity of a captured image directly depends on the amount of light that a camera receives while the sensor is exposed to the incident light. The resulting fidelity depends not only on the overall illumination but also on how this illumination varies over the scene. The camera exposure settings control the overall intensity of light received by a camera during an exposure. These settings are: camera lens aperture (the more “open” the lens is, the more light comes in), shutter speed (the slower the speed, the longer the exposure is, and the more light comes in as a result) and exposure-index (EI) (the sensitivity of the sensor). However, illumination variation in the scene can still make it difficult for the camera to image the scene with high fidelity. Pixels that receive bright light would saturate and lose scene detail. Pixels that receive very little light would be noisy.

This problem is compounded by the fact that consumers want cameras to be smaller and lighter as well as to have sharper images and more pixels. Cameras are getting smaller, and the number of pixels is getting higher, but the sensor technology cannot quite catch up to improve fidelity with the limited amount of light captured by the camera. This fundamental limitation outlines the camera DR issue. Camera DR is limited, but we always need a wider DR than what is available with current technology. When next-generation cameras arrive, whatever progress we may have made in electronics is absorbed by higher demands for spatial resolution (“megapixels”) and smaller cameras. The DR issue never disappears.

In this thesis, we address this limited DR problem. The problem of having an HDR in a single captured image is challenging, if not impossible, to solve for all possible cases.
Instead, we consider a few common cases of HDR imaging. We propose a few single-image high dynamic range imaging (HDRI) methods for these particular cases. We use images captured by consumer-grade off-the-shelf technology only. We value robustness: we aim to increase the DR of a single image. Our techniques expand the capabilities of today’s off-the-shelf cameras for a few cases and aim to produce the best image possible in these cases. Additionally, unlike common multi-exposure HDRI methods, our methods can capture dynamic scenes and do not require the user to mount the camera on a tripod. To this end, we have developed HDR imaging methods that apply to a single image obtained by only one exposure of a conventional SDR camera. To overcome the fundamental physical limit of the DR of SDR cameras, we take the computational approach. From the input image (a single SDR image captured with an inexpensive, off-the-shelf consumer camera), we reconstruct the underlying HDR image using image analysis and computational optimization.

(a) The standard dynamic range (SDR) image of a scene, typical of a single exposure of a conventional consumer camera, captures only limited details
(b) A high dynamic range (HDR) image captures the full details of the same scene just as we perceive it
Figure 1.1: SDR vs. HDR photographs of Brandywine Falls near Whistler, BC, Canada. Left: A lot of detail is missing in a photograph taken by a conventional camera with an SDR due to the wide brightness variations in outdoor settings. Right: An HDR image of the same scene. In this image, the whole scene is visible with all the details. The goal of photography is to capture real-world scenes just as we perceive them; while the image on the left contains the correct radiometric representation of a scene that is partly in bright sunlight and partly in the shade, the image on the right fulfills the true goal of photography.

Figure 1.2: A comparison of the dynamic range (DR) of various systems. Natural scenes can have a very high DR, much higher than conventional cameras can capture and conventional displays can display. The human visual system (HVS) has a remarkable DR via adapting to current scene brightness. As a result, while the simultaneous DR of what the human eye can perceive is low, when brightness adaptation is taken into account the human eye can perceive a wide variety of brightness levels. It may be noted that cameras can also “adapt” to different brightness conditions, but a “faithful” image of a scene requires a DR as great as that of the scene itself. This requirement stems from the fact that the purpose of photography is to capture scene details in the original form such that when the image is presented on an HDR display, it looks just like the original scene to a human observer. Every detail of the scene needs to be captured so that the HVS can adapt depending on content and have an experience of looking at the original scene. Most conventional displays are also DR limited, although the DR performance of displays has been improving.

At a very high level, images are merely color information organized on a grid. In the case of SDR imaging, the captured image loses some of the detail information of the original scene. To obtain an HDR image, we need to recover this lost information. We use a computational approach to do exactly that: we leverage prior knowledge about natural images and attempt to reconstruct the missing data.

1.1 What is the dynamic range (DR): the importance of high dynamic range (HDR)

Dynamic intensity range of a sensor, or dynamic range (DR) for short, does not have a universal definition. We follow what seems to be the most widely accepted definition: the ratio of the highest detectable pixel intensity to the lowest.¹

DR depends on the size of the pixel and the amount of light it can register. This makes it a physical limiting factor.
DR of a pixel cannot be physically modified, but the overall DR of the image can be improved via image analysis and processing.

Image DR is measured in bits, or more accurately, exposure-value (EV) differences (known as f/stops). If a camera outputs values 1, 2, ..., 256, then the DR is 256 : 1, or using the photography jargon, “8 f/stops” or “8 EV”. Depending on the target audience this same figure is also sometimes expressed as 8 bits, or $20 \log_{10} 256 \approx 48$ dB. High-end consumer cameras today have come a long way; they can capture up to 14 bits of data per pixel.

¹ This is a simplified definition. More generally, the DR of a camera (or a display) is directly related to the number of quantization levels, i.e., the number of different perceptible “steps”, it can capture (or produce). In theory, it is possible to have uneven quantization. When quantization is uneven, the quantization noise would vary, and the additive white Gaussian noise (AWGN) assumption (see Section 3.1.2.3 for details) made in this thesis would break down. In such cases, a variance-stabilizing technique such as the Anscombe transformation [8] needs to be applied to the data-fitting term (such as the one used in our framework in Section 3.3.2). However, in practice, cameras use a fixed step-size, and furthermore the step-size and the smallest captured value can be used interchangeably. For clarity of exposition, we therefore use the simplified definition of DR.

To illustrate what DR is, let us draw an analogy to something more familiar: DR is like the spatial resolution but along the intensity dimension. Just as the spatial resolution is fixed for every sensor, the DR is also fixed for every sensor. Spatial resolution is dimensionless: it is essentially the ratio of the spatial dimensions of the largest object the sensor can image (spanning the whole sensor) to the spatial dimensions of the smallest detail it can resolve (contained in just one pixel).
Theabsolute dimensions, of course, can vary from photo to photo; it depends on thefocal length of the lens used. A lens with a long focal length can image an entirecity from a long distance, while a macro lens with a very short focal length canimage a small bug on the same sensor. While the actual physical dimensions vary,the ratio, i.e., the spatial resolution, remains fixed.Analogously, while DR is fixed, the absolute intensity numbers can vary. Theabsolute intensity range depends on what exposure settings are being used, suchas lens aperture, shutter speed and exposure-index (EI) or “ISO settings” [64]. Nomatter what these absolute numbers are, the ratio, i.e., the DR, remains fixed.Together, the product of the spatial resolution (in other words, the number ofpixels) and the DR (in other words, the depth of each pixel) of a sensor gives theimage bandwidth (the total bits of information per image) the sensor can capture inone exposure. This image bandwidth is fixed for a given sensor. For example, let usassume we have a standard dynamic range (SDR) image with 8 bits per pixel depth(i.e., 8 f/stops in photography jargon). This image would have 256 unique intensityvalues 1, 2, . . . , 256, i.e., a DR of 256 : 1. Now, if we bilinearly downsamplean image by 2 × 2 (as in, by averaging every block of 2 × 2 pixels to obtain 1pixel in the downsampled image, pixels outside the image are assumed to havezero intensity), the pixel count would decrease but the pixel depth would increase:The downsampled image would have one-fourth as many pixels (half as many rowsand half as many columns as that of the original image), and four times of theoriginal DR. This is the case since in the downsampled image 1024 different pixelvalues 0.25, 0.5, 0.75, 1, 1.25, . . . , 256 are possible now, resulting from averagingfour pixels. It may be noted that the representation of these intensity levels does notmatter as far as information content is concerned: 1, 2, 3, . . . 
, 1024 would convey the same information, just scaled. The DR is then either 256 : 0.25 or 1024 : 1, both of which convey the same range: 10 bits or 10 f/stops. By downsampling, 4 × 8 bits of information were reduced to 10 bits of information; we have sacrificed resolution and image bandwidth to gain DR. This loss of image bandwidth is not ideal. In this thesis, we improve the DR and keep the spatial resolution unchanged, effectively increasing the image bandwidth.

The DR limit is a physical limiting factor that is baked into the sensor design, as will be presented in Chapter 3. Unfortunately, the SDR of a conventional camera is very limiting; it is only enough for scenes with a uniform lighting condition such as indoor scenes and outdoor scenes without dark shadows.

A crucial decision in any sensor design is the trade-off between DR and spatial resolution: increasing one reduces the other. Sensors cannot be made arbitrarily large since the cost of fabricating sensors and manufacturing lenses and other optical components goes up exponentially. For a fixed sensor size, an optimal DR needs to be determined, which would then determine the pixel size and spatial resolution.

• A high DR would require enlarged sensor pixels, reducing image bandwidth. Similar to the downsampling example above, increasing the physical size of each pixel on a sensor increases the number of photons a pixel can sense before it is saturated, and that increases the detectable quantization levels and consequently the DR. However, fewer of the larger pixels can be packed into the same image sensor, reducing the image bandwidth.

• A low DR would cause fewer of the pixels to contain useful information: as soon as the intensity range in the original scene goes beyond the camera DR, scene details are lost: brighter parts of the scene (“highlights”) saturate the sensor, and their intensities are clipped. On the other hand, the darker parts of the scene (“shades”) barely register on the sensor; such image pixels appear dark and noisy.
In either case, scene details are lost due to limited DR. Clipping and noise cause information loss and therefore reduce the effective image bandwidth.

The key here is in making the sensor most useful by having the best expected effective bandwidth: the sensor needs to perform well in photographing the most commonly photographed types of scenes.

Traditionally, regardless of imaging technology advances, spatial resolution has improved faster than DR. The consumer market prefers a higher spatial resolution; so mainstream consumer camera sensor development has largely been pushing for a higher spatial resolution. DR improvement has been largely ignored. In this thesis, we restrict ourselves to using only consumer-grade off-the-shelf cameras and hardware.

We define DR more rigorously in Chapter 3.

1.2 Research objective

The primary objective of this thesis is to develop methods that extend the dynamic range (DR) of conventional cameras to a high dynamic range (HDR). Since standard dynamic range (SDR) cameras are fundamentally limited with respect to their DR, we approach this objective by using combinations of simple modifications to the camera and applications of computational imaging algorithms. We focus on the common high dynamic range imaging (HDRI) cases and develop methods that are suitable for them.

We propose to expand the camera DR via camera hardware modifications and software computations. We expand the DR by restoring highlights (e.g., bright lights where intensity values are clipped) and also by restoring shades (whose values are buried by noise). Single-image HDRI in the general case is not a tractable problem. However, we observe that we can solve this problem for some kinds of scenes (for example, night scenes impose a different set of constraints on the single-image HDRI problem compared to scenes in bright daylight or scenes containing both light and shade).
With this observation, we proceed to solve a number of the most common HDRI cases, each with a different approach.

In this thesis, the main problem we address is how to reconstruct the intensity of the saturated pixels (whose captured values have been clipped) that SDR cameras fail to capture because of the camera’s limited DR. We develop single-image HDRI methods for three kinds of scenarios:

1) Aperture-filtering: These are methods suitable for scenes having few bright objects (highlights). For this case, we propose to “spread” the information in a bright source to its neighboring pixels in the image. In such scenes, the intensities of highlights are ordinarily clipped, and since that information is lost, the highlights cannot be reconstructed from the captured image alone. We therefore propose to use a filter (attached in front of the lens) that spreads some of the information about the highlights to their neighboring pixels; this information would otherwise have been clipped and lost. From this information, we reconstruct highlights. In other words, we use an invertible optical encoding of the highlights (bright objects): we use a cross-screen filter in front of the camera lens to encode the brightness information in the form of a pre-determined structured slope created by the cross-screen filter. These off-the-shelf photographic filters create star-shaped streaks around bright objects when placed in front of the lens. We demonstrate that each of these streaks gives a projective view of the bright object (highlight) that produced it. We derive a method to separate these streaks from the captured image and obtain the projections from them. Then, from these projections, we reconstruct the bright objects (highlights) using a tomography-style reconstruction. This is discussed in Chapter 4.

2) Sensitivity-multiplexing (sensor-filtering): Here we turn our focus to scenes that have large areas of highlights (bright objects) and shades (dark objects).
The method above is not usable with large highlights; instead, we filter the incident intensity of light at the sensor using sensor electronics. We use a recently introduced off-the-shelf camera sensor technology known as the “exposure-multiplexed mode” [1]. This mode captures alternate (even) rows of pixels at one exposure-index (EI) (sensor sensitivity setting, also known as “ISO speed” [64]) and the odd rows at another. This new imaging mode brings in a new challenge: for a scene with a wide range of intensity levels, only one exposure setting would be able to properly expose the scene detail in the even or odd rows, whereas the other exposure setting used by the other rows would cause those rows to be over-exposed or under-exposed. As a result, exposure-multiplexing effectively reduces the spatial sampling rate in the vertical direction by half. We present structure-aware interpolation techniques specifically tuned for this problem to faithfully reconstruct the full-resolution HDR image despite the lower sampling rate in the captured image. We present our findings in Chapter 5. We develop two methods:

a) The first method uses non-local self-similarity (Section 5.5): we use the sparse gradient prior and the non-local self-similarity image prior to reconstruct an HDR image from a single exposure-multiplexed capture.

b) The second approach uses a novel edge-preserving image prior (Section 5.6): this method further improves our exposure-multiplexed HDRI method by using a novel edge-preserving prior we have developed that we call the “smooth contour prior”. We observe that scene edges are smooth along the length and sharp across.
To enforce this constraint, we take the edge-directed interpolation technique of Li and Orchard [84], and build our prior based on this technique.

We discuss this in Chapter 5.

3) Using cross-color-channel correlation: We also propose methods for expanding the DR of images that have already been captured by SDR cameras. These are images that have already been captured with an SDR camera, and therefore it is no longer possible to add a filter to the aperture or the sensor. For such images, we develop methods that perform computational post-processing only. For highlights, we consider the special case when scene highlights are not white but have a color. We explore how well the clipping due to sensor saturation can be corrected in this case. For white highlights, all three color channels are completely saturated, but for colored highlights, some pixels can be partially saturated: one or two out of the three color channels may be free of sensor saturation. We employ a natural image prior for such cases, known as the color constancy or cross-color-channel correlation: the pixel intensities in a local neighborhood are correlated across color channels. We propose three methods that employ this prior assumption:

a) Under the Bayesian framework, it is possible to reconstruct these partially saturated pixels and recover the clipped intensities (Section 6.1).

b) We then further extend this technique by employing another natural image prior, known as the non-local self-similarity prior: similar image patches tend to appear multiple times in the same image (Section 6.2).

c) Further, we observe that any denoising algorithm can help lower the noise level and therefore improve the DR. We have developed a denoising technique for color images by observing how color image sensors capture images (Section 6.3).

We discuss these methods in Chapter 6.

1.3 Outline of the rest of this thesis

The rest of this thesis is organized as follows.
In Chapter 2, we present the related work in the literature and discuss how it relates to the scope of this thesis. In Chapter 3, we introduce imaging and computational imaging in the context of single-image high dynamic range imaging (HDRI), define necessary terminology, and develop a mathematical and computational framework that serves as a foundation for the methods developed and presented in this thesis.

In the three subsequent chapters (Chapters 4–6), we discuss the three aspects of HDRI in this thesis as outlined in Section 1.2 above. First, we discuss our proposed solution for single-image HDRI of scenes with a small number of bright objects using aperture-filtering: a novel optical encoding approach presented in Chapter 4. We also propose solutions for high dynamic range (HDR) imaging of scenes with large highlight areas, and develop algorithms for exposure-multiplexed single-image HDRI (in Chapter 5). Finally, we explore the cross-color-channel correlation and develop methods suitable for the dynamic range (DR) expansion of color image data. We present this in Chapter 6.

Chapter 2
Literature review

For mostly static scenes, a high dynamic range (HDR) image can be acquired by combining multiple time-sequential exposures taken with a conventional off-the-shelf standard dynamic range (SDR) camera set at a different exposure setting for each of these exposures (Section 2.1). However, camera-shake (in case of hand-held photography) or object movement (in case of photographing a dynamic scene) can cause misalignment between the time-sequential exposures. This spatio-temporal misalignment needs to be corrected (Section 2.1.1). Similar alignment strategies are found in multi-exposure HDR video methods (Section 2.1.2).

Post-capture misalignment correction does not always work, and hence it is preferable to develop methods that use only one exposure to capture HDR images (Section 2.2).
With SDR sensors, we can either capture different exposure settings with one sensor (Section 2.2.1), or use multiple sensors or cameras (Section 2.2.2). HDR sensors have a wider dynamic range (DR) than conventional SDR sensors, but these HDR sensors have their own drawbacks (Section 2.2.3). Another way of achieving single-image high dynamic range imaging (HDRI) is to correct for the clipping due to sensor saturation in post-processing (Section 2.3). Since no data is available in the saturated image areas, some methods hallucinate data and produce inaccurate reconstruction, while others resort to user intervention for a plausible restoration (Section 2.3.1). Other methods focus on correcting the pixels which are only partially clipped, i.e., one or two of the color channels are clipped (Section 2.3.2). Inpainting methods can fill in some missing pixels but they are incompatible with pixels missing due to sensor saturation (Section 2.3.3). Denoising techniques can lower the noise and increase the DR of SDR images (Section 2.3.4).

Some methods optically “multiplex” the brightness information from the clipped pixels into the rest of the image so that the clipped areas can be recovered post-capture via a computational reconstruction of information (Section 2.4).

2.1 Multi-exposure high dynamic range imaging (HDRI)

Standard dynamic range (SDR) image sensors can only capture a dynamic range (DR) of up to 12 f/stops; 12 f/stops means there are 12 bits of noise-free data per image pixel. Since natural scenes require a much higher DR, one solution to obtain a high dynamic range (HDR) image is “exposure-bracketing”: capturing multiple images of the same scene, each with a different exposure setting, and blending these images together (Figure 2.1). The longer exposures would capture details in the shades or dark areas of the scene, but the highlights will be completely washed out because of sensor saturation.
And the shorter exposures would do the opposite and capture the details in the highlights or bright areas, whereas the low-light areas will be lost due to sensor quantization and noise. Many consumer devices, including digital single-lens reflex (SLR) cameras and cellphone cameras, now support this method, most commonly available as the “HDR mode”. A variation of this approach is “burst mode” photography, where the camera exposure settings are kept fixed between captures in favor of rapid firing of the camera [56].

This method was known even before the mid-twentieth century; for example, see the works by Ginosar, Hilsenrath, and Zeevi [46] and Mantiuk et al. [92]. The underlying mathematics was first formalized by Mann and Picard [91]. Debevec and Malik [26] introduced this method to the computer graphics community. They gave a method to simultaneously recover an HDR image and the camera response curve. An improved method was published later by Robertson et al. [113]. These multi-exposure high dynamic range imaging (HDRI) methods and the ones that followed have the same underlying principle of merging multiple exposures, and they differ only in how the weights are determined when combining the unsaturated samples at each image pixel location. While typically these methods give preference to the well-exposed observations (i.e., unsaturated and close to the middle value of the DR of the sensor), Granados et al. [48] use weights based on various sources of noise and derive an optimal exposure sequence. Hasinoff et al. [55] provide a noise-optimal strategy for multi-exposure HDRI given a bound on the total capture time. In theory, this technique can capture scenes with any DR.

Figure 2.1: Multi-exposure high dynamic range (HDR) merging example. The graph below each image shows the log-log response curve (log captured intensity vs. log incident intensity) of that image (see Figure 3.6 for details). Left: We present three different exposures of the same image; the longest exposure is shown at the top, and the shortest is shown at the bottom. Different parts of the scene are captured by different images here. Center: The same images are shown with pixel values scaled to match. These images demonstrate that the images capture the same scene, but depending on the original exposure setting, scene details can be lost. Right: The combined HDR image, after merging the three standard dynamic range (SDR) images. (It may be noted that, due to the dynamic range (DR) limit of conventional displays and printing paper, for presentation purposes we have compressed the dynamic range of the reconstructed HDR image by applying a simple linear tone mapping, which makes the image appear a little darker.)

2.1.1 Deghosting: correcting object misalignment in a multi-exposure image sequence

When blending multiple exposures, camera or scene object motion can result in ghosting due to misalignment. Camera motion can often be modeled as a homography transform, and a global alignment is then enough to avoid ghosting. Ward [142] proposed using median-thresholded images when determining the alignment correction. Later methods, such as that by Tomaszewska and Mantiuk [136], use robust estimation such as random sample consensus (RANSAC) [39] over visual feature points extracted from the exposures, such as scale-invariant feature transform (SIFT) [85] or speeded up robust features (SURF) [13]. Since different exposures might mask out different parts of the image due to over-exposure or under-exposure, care must be taken to account for the missing parts of the image when aligning different exposures.

However, a global transformation is not sufficient when a significant amount of parallax is present in the scene, and also when there is misalignment due to object motion. To compensate for the local motions due to such object motions or parallax, alignment methods rely on visual feature correspondences, image patch correspondences (Gallo et al. [45], Hu et al. [62], Kalantari et al. [68], Sen et al. [122]), or more sophisticated optical flow methods.
However, since local alignment corrections can be wrong, care must be taken when merging the exposures into one HDR image. Some methods, such as the ones by Mangiat and Gibson [89] and Markowski [94], explicitly reject problematic regions from some exposures, or weight each exposure by the probability of an estimated motion before averaging. Other methods perform a joint HDR merging and deghosting. A reference exposure is selected first. The goal is to find the HDR image that matches well with the reference image where the reference image is well-exposed. For the regions where the reference image is over-exposed or under-exposed, local similarity in other exposures is enforced as a constraint on the reconstructed HDR image. Granados et al. [48] use a Markov random field to relate pixels from different exposures in a multi-exposure stack that belong to the same scene object.

2.1.2 HDR video: obtaining multiple exposures for free

With off-the-shelf SDR video cameras, multi-frame HDR acquisition is possible by changing the exposure setting between frames; a method that applies this principle was proposed by Adams et al. [6]. Kang et al. [70] capture a sequence of video frames while rapidly alternating between two exposure settings. Frame alignment is performed using optical flow. Since the frames at different exposure settings are captured at slightly different points in time, they use global and local registration schemes to warp the images before merging color information at dark and bright regions from the frames before or after the current frame.

Since clipping due to saturation destroys information, which further complicates alignment issues, some methods (for example, those by Bürker et al. [17], Guthier et al. [50], and Unger and Gustavson [138]) use high frame-rate video cameras and then downsample the videos in the temporal domain by averaging consecutive frames.
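The temporal downsampling step can be sketched as follows. This is only an illustration, not the implementation used by the cited methods: it assumes linear, unclipped pixel values, frames stored as flat lists of equal length, and a hypothetical helper named `merge_consecutive_frames`.

```python
def merge_consecutive_frames(frames, group_size):
    """Average each run of `group_size` consecutive frames into one frame.

    `frames` is a list of frames; each frame is a flat list of pixel values.
    Trailing frames that do not fill a complete group are dropped.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    usable = len(frames) - len(frames) % group_size
    merged = []
    for start in range(0, usable, group_size):
        group = frames[start:start + group_size]
        # zip(*group) walks the same pixel location across the group.
        merged.append([sum(px) / group_size for px in zip(*group)])
    return merged


# Toy input: 16 two-pixel frames with integer (quantized) values.
frames = [[i, 255] for i in range(16)]
merged = merge_consecutive_frames(frames, 8)
print(merged)
```

Averaging eight integer-valued frames yields values on a grid of 1/8, so the merged frames can represent more intensity levels than any single source frame.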
For example, from a source video acquired at a frame-rate of 200 frames-per-second, an HDR video at a normal frame-rate of 25 frames-per-second can be produced by merging every 8 consecutive frames. Misalignment correction due to object or camera motion is not needed, since a simple average of multiple consecutive frames produces HDR frames with a regular frame-rate. With a higher frame-rate, the exposure time would be shorter, and brighter objects can be captured without clipping due to saturation. However, a shorter per-frame exposure time would require the use of a high exposure-index (EI) (high sensitivity or ISO setting), which also increases the amount of noise in the captured video frames. Portz et al. [109] proposed a random coded sampling for HDR video. They propose to expose each pixel with an exposure time randomly drawn from 1, 2, 4 and 8 frame times. Every time a frame is read out, only the pixels that are at the end of their exposure time are read out while other pixels continue to expose. They obtain a sparse capture of the frame at every frame time since not every pixel is read out. For reconstruction, they apply spatial and temporal locality constraints. This way there is no loss of light since every pixel is being exposed for the whole duration.

2.2 Single-image high dynamic range imaging (HDRI)

Multi-exposure HDRI requires time-sequential captures, and as a result these methods also rely heavily on deghosting. Deghosting cannot be expected to work all the time, and consequently single-image HDRI methods are more attractive.

2.2.1 Exposure-multiplexing: spatially varying exposure

Figure 2.2: Assorted Pixels [102], a technique that spatially varies the exposure setting. Pixels with the same labels above make a complete frame. One image comprises a number of these frames, i.e., multiple frames of the same scene can be captured with different color channels and/or exposure settings.
(Image above extracted from [103].)

The exposure-multiplexing based solutions to the dynamic range (DR) limit problem can be viewed as various space-time trade-offs. On one extreme, the multi-exposure HDRI techniques often lose temporal accuracy (multiple exposures over time causing ghosting artifacts in the final image) in favor of having a full-resolution image sequence. On the other extreme of this space-time trade-off lie spatially varying exposure techniques such as Assorted Pixels by Nayar and Mitsunaga [102] and later by Nayar and Narasimhan [103] (Figure 2.2).

Spatially varying exposure techniques effectively obtain multiple captures, each having a different exposure setting but also a lower spatial resolution. Assorted Pixels introduces a mask on top of the camera sensor’s RGB Bayer pattern [14] (Figure 2.2). This mask has a varying transparency per pixel. Thus, adjacent pixels will be exposed differently. Pixels with the same exposure setting, when grouped together, form a lower resolution image of the scene captured with that exposure setting. A total of 4 different levels of transparency are used, resulting in 4 low-resolution captures taken with 4 different exposure settings at the cost of a 2 × 2 resolution loss. The mask also has to be placed permanently. Wetzstein et al. [143] proposed a Fourier domain method for HDR acquisition, and even though they had a different approach, their solution used similar sensor masks and suffered from the same resolution loss.

Some recent sensors have a built-in capability similar to that of Assorted Pixels. However, instead of blocking light using various levels of neutral density filters, these sensors allow the sensor exposure-index (EI) (also known as ISO speed or light sensitivity) to vary spatially. This sensor capability is commonly known as “exposure-multiplexed mode”, or more commonly “dual ISO mode” [1]. Most of these sensors allow two different EIs to be set on alternate row-pairs.
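To make the row-pair layout concrete, the following is a minimal simulation sketch. It assumes a noise-free linear sensor whose output clips at a full-scale value and models the EI simply as a gain; the function name `capture_dual_iso`, the gains, and the full-scale value are illustrative assumptions, not the interface or behavior of any particular camera.

```python
def capture_dual_iso(scene, gain_low, gain_high, full_scale=255.0):
    """Simulate an exposure-multiplexed capture on alternate row-pairs.

    `scene` is a list of rows of linear scene intensities. Rows 0-1 use
    `gain_low`, rows 2-3 use `gain_high`, rows 4-5 use `gain_low`, and so on.
    Values above `full_scale` are clipped (sensor saturation). Noise is
    deliberately not modeled in this sketch.
    """
    captured, row_gains = [], []
    for r, row in enumerate(scene):
        gain = gain_low if (r // 2) % 2 == 0 else gain_high
        captured.append([min(v * gain, full_scale) for v in row])
        row_gains.append(gain)
    return captured, row_gains


# Toy scene: every row holds one dark pixel (10) and one bright pixel (200).
scene = [[10.0, 200.0] for _ in range(4)]
captured, row_gains = capture_dual_iso(scene, gain_low=1.0, gain_high=4.0)
print(captured)
print(row_gains)
```

In this toy example the bright pixel clips in the high-gain row-pair but is preserved in the low-gain one, while the dark pixel receives a larger signal in the high-gain row-pair; a reconstruction method has to combine the two while interpolating the rows that are unusable at each location.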
This approach is better than Assorted Pixels since the two EIs can be changed independently, which gives a higher degree of flexibility compared to a fixed mask. Hajisharif et al. [52] use local polynomial fitting weighted by noise for reconstructing the full resolution HDR image. We have explored ways to reconstruct the HDR image from such an exposure-multiplexed image using computational image reconstruction techniques such as that of Heide et al. [59]. We propose to further improve on exposure-multiplexed single-image HDRI by developing a novel edge-preserving prior, which we refer to as the smooth contour prior, in Chapter 5.

2.2.2 Multiple sensors

Instead of trading off resolution for DR, some methods use multiple standard dynamic range (SDR) sensors instead of one. Aggarwal and Ahuja [7] provided a mirror-based solution, while Tocci et al. [134] and Kronander et al. [76] presented beam splitter-based solutions to multi-sensor HDR acquisition. All of these methods use three sensors imaging along the same optical axis but receiving a different share of the incoming light, resulting in different exposures. Since they are situated on the same optical axis, there is no parallax. A downside is the dependence on accurate extrinsic calibration of the sensors; the slightest misalignment would cause the resulting image to exhibit blurring and ghosting. Furthermore, since this technique splits incoming light among several sensors, there is a possibility that in low light conditions the overall noise performance will be worse than that of a single-sensor SDR camera. Finally, such reflective components on the optical path increase inter-reflections inside the system, and inter-reflections cause undesirable glare.

Many of these problems can be alleviated by an off-axis solution. The work in [127] proposed an off-axis two-camera HDRI solution. The two cameras are set to two different exposure settings to capture a wider DR than a single camera can capture.
This solution has the advantage of not losing light to a beam splitter. However, this method uses two copies of the same hardware and hence incurs twice the cost. Also, in such a setup, the cameras do not share an optical axis, which causes parallax; a homography correction and deghosting need to be performed when merging the two SDR captures into an HDR image.

2.2.3 Integration curve manipulation

Conventional image sensor pixels act as photon counters. In particular, the sensor is physically or electronically “shuttered” and exposed to the incident light for a preset exposure time. When the sensor is exposed, each pixel time-integrates the intensity of incident light. Since light, like all forms of energy, arrives in discrete quanta, intensity is nothing but the number of incident photons. Incident photons are converted to electronic charge in the pixel. Effectively, each pixel linearly counts incident photons (footnote 2); the output of a pixel is linear with respect to the incident intensity of light. Every pixel has a capacity: after counting a fixed number of photons it cannot count any further because it is saturated.

In contrast, cones in human eyes have a non-linear response to intensity of light. More accurately, according to the Weber-Fechner Law [36], the response is logarithmic. A logarithmic response would allow camera sensors to measure a larger range of intensity values before the pixel saturates.

Such a nonlinear response can only be achieved electronically. Optical components (lens, etc.) operate linearly on non-monochromatic light, and therefore a combination of such optical components would produce a linear transformation on light. Instead, the integration curve of sensor pixels can be modified electronically. There are two broad classes of such sensors:

1) Logarithmic sensor technology utilizes the exponential I-V characteristics of MOSFET (footnote 3) transistors in the subthreshold region [32].
While conventional, “linear” sensors accumulate charge over an exposure period, these logarithmic sensors directly convert photocurrent to electric voltage for readout. This very non-integrating property limits the maximum possible signal-to-noise ratio (SNR). Fixed pattern noise is also high in these sensors [9]. Recent linear-logarithmic sensor technology attempts to combine both linear and logarithmic sensor technology [9, 126]. On these sensors, for low intensities, a pixel acts linearly, but for high intensities, the MOSFET turns on and the sensitivity drops significantly, creating a logarithmic response. However, this sensor does not solve the fundamental low-SNR issues with the logarithmic sensors but rather employs a linear component to hide the problem. These sensors usually read out the linear and logarithmic components at different times, which can cause spatial incoherence similar to the self-reset sensors we discuss next.

Footnote 2: More accurately, sensor pixels can count a portion of the incident photons; the portion depends on the “quantum efficiency”, i.e., the incident-photon-to-converted-electron ratio of the photosensitive material the sensor is made of.

Footnote 3: Metal-oxide-semiconductor field-effect transistor.

2) Another approach is to add “intelligence” to each sensor pixel such that it can reprogram its own sensitivity. Such a pixel can observe its photon counting level; if the pixel saturates before the exposure time expires, it means the pixel is too sensitive given the incident light intensity, and subsequently it reprograms itself with a lower sensitivity and restarts time integration. Alternatively, instead of reprogramming the sensitivity, it is also possible to reprogram the exposure time: every time the sensor saturates, it can reprogram itself to expose for half the time and restart the photon counting for this shorter exposure. This reprogramming can continue until a good observation is made.
These “self-reset” methods [32], however, have undesirable spatial incoherence: images contain incoherent data since different pixels would be exposed at different times.

2.3 Restoration of clipped signals

For band-limited 1D signals, reconstruction algorithms have been proposed for situations where the number of missing samples is low [3] or where a statistical model of an undistorted signal is known [105]. In the case of images, similar approaches would require much stronger image priors than are currently available.

For noisy images, Foi [40] demonstrated that some clipped pixel values in a standard dynamic range (SDR) image can be restored provided the underlying latent values are not much higher than the clipping threshold. Natural images are known to be piecewise smooth. All sensor pixels inside a large highlight with a brightness level just above the clipping threshold would ideally be clipped due to saturation. However, due to noise, some of these pixels will have an intensity lower than the threshold, and therefore will not be clipped. Measurements from these pixels can be propagated to nearby unknown pixels. However, this will only work for pixels with latent values not higher than the clipping threshold plus the noise standard deviation. For latent pixel values that are much higher than the clipping threshold, approaches like this would be unable to restore the clipped values faithfully since there is no information to propagate. In Chapter 4 we propose a method that optically spreads information about bright highlights into neighboring pixels when an SDR image is captured; we then use this additional information in post-processing to reconstruct the clipped highlights.

2.3.1 Single-image SDR to high dynamic range (HDR)

Reconstructing an HDR image from a single SDR image with clipped values is a challenging problem that yields only approximate solutions based on heuristics or manual user intervention. Meylan et al.
[97] separate the SDR image into diffuse and specular parts, and apply a steep scaling of image brightness. They justify the use of their piecewise linear tonemap through a psychovisual experiment; however, they do not give the parameters of the tonemap operator and only mention that the parameters vary from image to image, which means automated single-image dynamic range (DR) expansion is not possible with this method. Piecewise linear tone mapping can introduce unwanted discontinuities in the reconstructed image; instead, Banterle et al. [11] enhance the highlights using a smooth brightness density map. They estimate the density map from the input SDR image by selecting the highlight areas and blurring the selected highlights with a Gaussian filter. This approach can leak background brightness into dark foregrounds since it uses a blurry density map. As an improvement to this technique, Rempel et al. [112] later proposed applying an edge-stopping function on the density map depending on which parts of the input SDR image are not highlights. In both of these methods, the brightness density map is generated from the map of the highlight areas, and therefore the brightness enhancement is directly related to the size of the highlight region, but brightness is clearly not always related to highlight size in real scenes.

None of these methods distinguish between saturation due to very bright highlights and saturation due to reflection from a specular surface that might be just above the clipping limit. Didyk et al. [29] use a classifier to identify what caused a saturated highlight. They enhance an SDR video to an HDR video, which gives them the ability to gather more data on the highlights for classification by tracking these highlights across multiple video frames. They trained their classifier using a training set of 2,000 manually classified highlight regions. They manually designed 20 features and took the 8 best performing ones.
However, misclassification is not unexpected, and such misclassifications produce noticeable artifacts.

Recently, Eilertsen et al. [31] proposed a method for restoring clipped regions using a deep convolutional neural network, which works well for small light sources only.

2.3.2 Restoration of clipped colors

Other research has focused on building on partially available data. In the case of color images, pixels that are clipped in one or two color channels can be estimated from the unclipped channels using natural image prior information on color. Zhang and Brainard [152] use cross-color-channel correlation [65] and model RGB pixel values as a three-dimensional (3D) Gaussian distribution. However, since different image areas have different colors, a single Gaussian distribution is inadequate. Guo et al. [49] recover color and lightness through propagation of information. Dcraw [25], a popular public domain software package for processing RAW image file formats, also has a restoration mode for clipped color channels. Masood et al. [95] and Elboher and Werman [33] restore highlights in the spatial domain using cross-channel correlation, whereas we proposed a gradient domain reconstruction method that outperforms the prior work [60, 119].

2.3.3 Inpainting

Inpainting techniques [15, 124, 130], although designed to fill in missing pixels, are not well suited for the restoration of clipped signals since they tend to interpolate the missing pixels. In single-image high dynamic range imaging (HDRI), on the contrary, the missing pixels are often much brighter than the surrounding pixels that the inpainting algorithms use for interpolation.

2.3.4 Denoising

Denoising is closely related to single-image HDRI since denoising lowers the noise floor and therefore improves DR. Intuitively, denoising brings out the details in the dark and relatively noisier parts of the scene, and thereby improves the overall image fidelity.
Also, lowering noise increases DR by definition.

Blurring with a smooth kernel is the simplest form of denoising. It reduces the noise but unfortunately also reduces the high-frequency contrast. Median filtering due to Huang et al. [63] is also a local-window filtering approach, yet it is much more powerful, which demonstrates the power of nonlinear filtering in denoising. The bilateral filter, first proposed by Tomasi and Manduchi [135], brings in an explicit use of similarity: it works on the principle that noise can be reduced by averaging over similar pixel values. Since the ground truth is unknown, these methods rely on looking for pixel similarities within a local window.

Non-local means (NLM) denoising by Buades et al. [16] improves the performance by utilizing the non-local self-similarity that is often present in natural images. It adds a “non-local” aspect to the search for similarities and also uses patch similarities instead of pixel similarities. Patch similarities capture some information about the detail structure of the underlying image and provide better matches than pixel similarities.

Other types of denoising algorithms [19, 41] assume compressibility of image data. The core principle of compression is that the data can be transformed into another domain (e.g., discrete cosine transform (DCT), wavelets) in which the noise and the data become easily separable. As such, these methods aim at performing a reversible transform so that most of the image energy is compressed into a small number of coefficients, but the noise energy is not.

Block-matching and 3D filtering (BM3D) [21, 24] is a state-of-the-art denoising technique for natural images. It combines the strengths of the approaches above and yields the best results among them. BM3D employs a non-local search for patch-level similarities.
For every pixel, similar patches are gathered into a three-dimensional volume, the volume is transformed using the DCT or some other sparsity-inducing transform, and finally the noise is removed via hard thresholding of the transform coefficients. After inverse transforming the volume data, the center pixel of each clean patch is written back to the original location.

2.4 Multiplexing scene information

It is well known that natural images are highly compressible. When a camera captures an image, the image data in the uncompressed format (also known as the RAW format) is stored in the internal memory.

Compressibility opens a door for overlaying additional scene information into the same image buffer on the camera. For the sake of argument, one could apply a simple lossless compression such as run-length encoding [78] to free up space in the image buffer for other information of interest. However, we do not have to literally compress the image data: we can simply take advantage of natural image statistics and overlay new information in a way that keeps this new information separable.

In theory, it should be possible to overlay any information onto raw, uncompressed image data as long as the overlay remains separable. Any such separable overlay can be removed before processing or viewing the image. This has interesting implications for image processing in general (for example, steganography [66] discusses how information can be concealed inside files such as images). In image acquisition, however, the most interesting applications are ways to optically overlay information about the scene.

The fluttered shutter approach due to Raskar et al. [110] encodes motion by opening and closing the shutter in a binary pattern. The idea is to produce a blur pattern that preserves high frequencies and therefore can be inverted.
Levin et al. [81], on the other hand, achieve motion invariance by moving the camera in a parabolic motion along a line parallel to the motion of the object of interest.

The recovery process is an inverse problem and, in most cases, it is heavily underdetermined. To improve the overall conditioning of the problem, one needs a set-up in which slightly wrong reconstructions amplify the artifacts so that they become easily detectable. Finding the solution then amounts to finding the parameter values that produce the fewest artifacts. Levin et al. [80] multiplex depth by utilizing defocus blur and a coded aperture. The radius of the defocus blur kernel at each part of the image is related to the depth of the scene in that part. They designed a binary-patterned aperture filter that increases the ringing artifacts in a reconstructed image when it is deconvolved with a point-spread function (PSF) of the wrong radius. They approximate the heavy-tailed natural image gradient prior by minimizing the $\ell_{0.8}$ norm of image gradients [80, 106].

Bando and Nishita [10] digitally refocus a single image using color segmentation and estimation of the defocus blur radius. They look for the smallest blur radius that produces ripple artifacts with an oscillation measure below some threshold. However, their method was found to fail for images with clipping due to over-exposure.

For single-image high dynamic range imaging (HDRI), we first attempted to prevent highlights from being clipped by diffusing the energy [137]. A simplistic way of diffusing the energy is to defocus the camera. Defocusing the camera blurs the image; more accurately, the captured image is a convolution of the latent sharp image and the shape of the aperture. Defocus blur therefore acts as a low-pass filter (more precisely, a disk filter), which means that a blurred image is very difficult to sharpen again since the high frequencies are either suppressed or lost.
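The loss of frequencies under a plain blur can be illustrated with a toy one-dimensional example. The sketch below is only an illustration under assumed kernels, not the aperture patterns used in this thesis: a 4-tap box kernel stands in for an open aperture, and the 4-tap binary pattern [1, 1, 0, 1] stands in for a patterned one. The names `min_spectrum`, `box` and `coded` are hypothetical.

```python
import numpy as np

def min_spectrum(kernel, n=256):
    """Smallest DFT magnitude of a kernel zero-padded to length n."""
    return float(np.abs(np.fft.fft(kernel, n)).min())

# Toy kernels (assumptions for illustration only).
box = np.array([1.0, 1.0, 1.0, 1.0])    # plain blur: stand-in for an open aperture
coded = np.array([1.0, 1.0, 0.0, 1.0])  # binary pattern with one blocked cell

box_min = min_spectrum(box)
coded_min = min_spectrum(coded)

# The box kernel has spectral zeros, so those image frequencies are
# destroyed by the blur and cannot be recovered by inverse filtering.
# The patterned kernel keeps every frequency bin away from zero.
print(f"box   min |H| = {box_min:.2e}")
print(f"coded min |H| = {coded_min:.2e}")
```

In this toy setting, the frequency response of the patterned kernel never vanishes, so the blur it produces stays invertible in principle, whereas the box blur is not.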
Instead, we put “invertible” shapes in the aperture. These shapes have high-frequency components that help preserve the high-frequency components of the image itself as the image is optically convolved with the aperture. Reconstructing the single-image HDRI result from the obtained blurry image is then an inverse problem, namely a deconvolution problem. In theory, deconvolution can work perfectly given a blurred image with no data corruption. In reality, however, most deconvolution problems are ill-posed due to various sources of noise, quantization and compression of the image data.

Generally speaking, the aperture filtering above performs an “optical encoding” of information. Aperture coding was first introduced by Ables [4] and Dicke [28] for the field of astrophysics and popularized by later publications such as Fenimore and Cannon [37]. In our approach, the aperture filter takes the intensity information in the very bright areas (highlights) and “overlays” this information onto nearby pixels of the same image. We observed that this optical overlay (or “encoding”) of information about the highlights has to be invertible; i.e., the overlaid information should be easy to separate out. Since the two sources of information (image data and encoded information) are added together in the captured image, the two sources need to have some “orthogonality” that we can use to separate them. In our case, orthogonality is achieved through different statistical properties: the statistics of the encoded or multiplexed information have to be orthogonal to, and separable from, those of the image data. The solution we came up with is to use a cross-screen filter. A cross-screen filter is an off-the-shelf photographic filter that one attaches in front of the lens to create star-shaped light streaks around each bright object. In other words, we encode the highlight information in the form of structured light streaks created using a cross-screen filter (Chapter 4, [118]).
We demonstrate that each of the streaks created next to a highlight gives a one-dimensional projection of the two-dimensional clipped highlight. We can obtain up to 8 projections for each clipped highlight in the image, and from these projections we can reconstruct the unknown clipped highlights using a reconstruction algorithm.

2.5 Summary

The literature reviewed in this chapter is directly related to the general problem of single-image high dynamic range imaging (HDRI) that we address in this thesis. The later chapters also review literature that is relevant specifically to the problems addressed in those chapters.

Chapter 3
Background and mathematical framework

In this chapter, we discuss some physical concepts regarding the camera's image acquisition process and the mathematical constructs (i.e., our single-image high dynamic range imaging (HDRI) framework) that we use throughout this thesis.

We call the physical process of going from the high dynamic range (HDR) of the scene to the standard dynamic range (SDR) of the captured image the forward process or the image formation process. The mathematical model that represents the physical process is called the forward model.

We compare and contrast various aspects of the human visual system (HVS) (Section 3.1.1) with digital imaging, which will be useful in the later discussions. Then we discuss the photographic process (Section 3.1.2), the noise model and other relevant aspects, which leads to the definition of the dynamic range (DR) (Section 3.1.3).

We describe why DR is a physical limit for cameras. Going back from a captured SDR image to a reconstructed HDR image is, therefore, an inverse problem: for a given SDR image we need to reconstruct the unknown HDR image (Section 3.3.2). Reconstructing the HDR image requires reconstructing missing data, which makes this an underdetermined problem. We perform a mathematical derivation of the maximum-a-posteriori (MAP) estimate of the unknown HDR image.
This derivation creates the mathematical framework that underlies all of the methods presented in this thesis. We also provide a brief discussion of the different methods presented in subsequent chapters of this thesis with respect to this framework. Single-image HDRI is an inverse problem; we discuss the general properties of the resulting optimization problem, in particular the mixture of $\ell_1$ and $\ell_2$ priors. This mixture of priors requires advanced optimization techniques; we describe two such techniques: iterative reweighted least squares (IRLS) and primal-dual convex optimization (Section 3.4.5).

3.1 The forward image formation model

Dynamic range (DR) is a measure of image fidelity: it indicates how much of the detail in the original scene is faithfully captured. Every imaging system has a limited DR; this is true for all imaging devices (both cameras and displays) as well as for human eyes. This means that we do not need to achieve an infinite DR. Instead, we need to study the human visual system (HVS) and settle for a DR that is high enough for a human observer. Unfortunately, due to technical limitations, existing imaging technology is not quite there yet: camera hardware alone is unable to capture images with a DR that is high enough. Images captured with today's cameras therefore lack fidelity, which motivates this thesis.

Conventional cameras capture images for human observers. This is why a discussion of the differences and parallels between the HVS and digital imaging is relevant here. The human eye mostly acts like a camera. Just like a digital camera, the human eye contains a lens, an aperture to control how much light to let in, and a pixelated sensing layer at the back of the eye with three different color sensitivities. While there are parallels between the human eye and digital cameras, there are also differences. Unlike digital cameras, human eyes perform more complex computation before sending an “image” to the brain.
Color perception, as well as brightness perception by the HVS, is quite complex because of all the perceptual intelligence built into the HVS. Fortunately, photographs only need to capture enough scene detail for the scene to be reproduced as is, and the human brain can perceive the scene quite vividly from just a two-dimensional image of the scene. For the full experience of a scene, the full range of colors, brightness and contrast needs to be captured. In particular, we want:

1) higher resolution so that the image does not look pixelated,
2) the full range of colors so that true colors of objects are captured well, and
3) highlights (details in the very bright areas that can get over-exposed) and shades (the very dark parts of the scene that can get under-exposed) captured just as well as the midtones (pixels that are properly exposed) in the picture.

This thesis focuses on the last (item 3 above). For a camera, observing details over the full scene brightness range is difficult for a variety of reasons, as we will discuss later in this section.

The range of brightness values a camera can observe is not fixed, hence the term “dynamic” in DR. While the relative size of the range depends on the hardware, the absolute range of brightness values depends on the specific exposure settings that were used to capture an image.

We present a short discussion of all the settings that affect the DR of a camera. Through this discussion, we develop a mathematical model of the digital imaging process. From this image formation model (or simply, the forward model) we derive a formal definition of image DR in the context of digital imaging.

3.1.1 The human visual system (HVS)

In the most general definition, images can be acquired from any range in the electromagnetic spectrum. For example, most medical imaging equipment uses x-ray and other kinds of imaging. In this thesis, we look at consumer cameras that we use to photograph everyday experiences.
With this goal, the cameras are made to mimic the human experience as closely as possible, which is why a short study of the HVS is relevant to this thesis. In particular, we look at which aspects of the HVS today's consumer cameras are good at implementing and which they are not, particularly with respect to the DR.

[Figure 3.1 plot: normalized response (0 to 1.0) versus wavelength (400 to 700 nm) for the S, M and L cones.]
Figure 3.1: Color perception by the human visual system (HVS): response vs. wavelength (see footnote 4). The HVS is sensitive to a part of the electromagnetic spectrum; this part of the spectrum is called visible light. However, the HVS cannot recognize all wavelengths individually; instead, it recognizes only three colors: red, green and blue. The human eye focuses an image of the scene on the retina, the photosensitive part of the HVS; color vision results from a component of the retina called cones. Cones are small light receptors, like camera sensor pixels. Usually, the HVS has three types of cones: “L” cones (for long wavelength) perceive red, “M” cones (for medium wavelength) perceive green, and “S” cones (for short wavelength) perceive blue. Each type of cone responds to a range of wavelengths within the visible part of the electromagnetic spectrum. The peak response for each type of cone is at a certain wavelength; the response gradually falls off on either side of the maximum. It may be noted that the response curves of the various types of cones overlap quite a bit; many wavelengths of light can excite more than one kind of cone, and that is how we see mixed colors. For example, 600 nm light would be perceived as yellow when both “M” (green) and “L” (red) cones are excited. Because of this overlap, the different color channels red-green-blue (RGB) are spatially correlated. We call this cross-color-channel correlation in this thesis, and in Chapter 6 we exploit this image property for single-image DR expansion problems.

[Figure 3.2 plot: CIE 1931 xy chromaticity diagram; x axis 0.0 to 0.8, y axis 0.0 to 0.9, with wavelengths from 460 to 620 nm marked along the boundary.]
Figure 3.2: CIE 1931 colorspace xy chromaticity diagram (see footnote 5). All perceivable colors at full saturation are shown in the diagram. Its name, the horseshoe diagram, comes from the shape of this diagram. The fully saturated colors are organized on a two-dimensional "xy" plane defined by the CIE 1931 colorspace standard. This plot has been generated using an experimentally obtained perceptual distance metric, i.e., the pairwise perceived difference between colors we see. The boundary of this plot is the locus of monochromatic light. Monochromatic light has a single wavelength; the wavelengths are denoted in nanometers next to the curve. These are the colors we see in a rainbow when the white light from the sun is refracted into its constituent monochromatic light. The colors inside the boundary are polychromatic colors; these colors contain more than one wavelength. While each monochromatic color is unique, the polychromatic colors are not: each polychromatic color can result from many different combinations of wavelengths of light. It is an interesting fact that we perceive monochromatic colors at both ends of the visible light spectrum as magenta, as well as any polychromatic color that is some combination of blue and red; this is why visible colors seem to logically form a circle, when they are really the visible segment of the much wider electromagnetic spectrum. Magenta serves as a single “not green” color signal in the HVS.

Footnote 4. Source: https://upload.wikimedia.org/wikipedia/commons/1/1e/Cones_SMJ2_E.svg [accessed: 2018-02-24]
Footnote 5. Source: https://upload.wikimedia.org/wikipedia/commons/3/3b/CIE1931xy_blank.svg [accessed: 2018-02-24]

“Color” is merely a concept in HVS perception; color does not exist in nature. In nature, there is an electromagnetic spectrum of wavelengths. Human eyes contain two kinds of photosensitive cells: the cones (which are responsible for color vision) and the rods (which are responsible for low-light vision).
Color vision is mediated by three types of cones: long (L), medium (M), and short (S). The L cones detect the longer wavelengths, and we identify this color as red. The M cones detect green, and the S cones detect the shortest wavelengths, which we recognize as blue. It may be noted that these are ranges of electromagnetic wavelengths, and not individual wavelengths. The response of each kind of cone per wavelength is shown in Figure 3.1. It may further be noted that even a pure color, i.e., a single wavelength, can excite more than one kind of cone. Since we have only three different types of cones, our color perception boils down to three dimensions instead of perceiving the exact wavelength of light. There is also a significant overlap between the color-response curves. A cross-color-channel correlation prior (or simply “cross-channel correlation”) can leverage this information in a computational optimization setting. In Chapter 6 we employ this image prior.

Color perception is three-dimensional, and one of the three degrees of freedom is the intensity. If we separate out the intensity, we are left with two degrees of freedom that determine the perceived color, called chromaticity. The perceived RGB colorspace can be transformed into one intensity and two chromaticity channels. If the two-dimensional (2D) chromaticity values are arranged on a plane using a perceptual distance metric, we get the horseshoe diagram (Figure 3.2). The horseshoe diagram contains all colors perceivable by the HVS. These colors are bounded by the horseshoe-shaped locus of monochromatic colors. Any point inside the diagram represents a color resulting from a mixture of multiple wavelengths.

Natural objects tend to be locally color-consistent; i.e., it is reasonable to expect that an image of a scene can be composed of parts, with each part having a uniform color. Illumination can vary spatially, but that only varies the intensity, while the chromaticity (i.e., color) remains unchanged.
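The separation into intensity and chromaticity can be made concrete with a small numerical sketch. It assumes a simple normalization, with intensity taken as the sum R + G + B and chromaticity as the pair (r, g) = (R, G) / (R + G + B), and it assumes that the illumination changes in strength only, not in color. This is one common choice used here for illustration, not necessarily the transform behind Figure 3.2; the function name `chromaticity` and all values are hypothetical.

```python
import numpy as np

def chromaticity(rgb):
    """Split linear RGB into intensity (R + G + B) and 2D chromaticity (r, g)."""
    rgb = np.asarray(rgb, dtype=float)
    intensity = rgb.sum(axis=-1)
    rg = rgb[..., :2] / intensity[..., None]
    return intensity, rg

# One surface color seen under two illumination levels (assumed values).
surface = np.array([0.6, 0.3, 0.1])
bright_i, bright_c = chromaticity(4.0 * surface)   # in the sun
shade_i, shade_c = chromaticity(0.25 * surface)    # in the shade

# Scaling the illumination changes the intensity but not the chromaticity.
print("intensity:", bright_i, shade_i)
print("chromaticity:", bright_c, shade_c)
```

Both calls return the same chromaticity pair, while the intensities differ by the ratio of the two illumination levels.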
In other words, in a local neighborhood of an image, the three color channels are expected to correlate spatially except at sparse color boundaries. In Chapter 6 we explore a few methods that leverage this observation.

3.1.2 Digital imaging

In the simplest form of a camera, light from a scene is focused by the lens on the image plane. A digital sensor digitizes the image. Let $f$ be the latent image corresponding to the scene that we want to capture, and $g$ be the actual image captured by the camera. If $\mathcal{M}$ represents the optical processes in the camera, then,

$$ g = \mathcal{M}(f). \tag{3.1} $$

This $\mathcal{M}$ corresponds to the forward model of the imaging system, also known as the image formation model. We now describe the imaging process (Figure 3.3).

3.1.2.1 Exposure controls: aperture, shutter and exposure-index (EI)

Exposure settings are camera parameters that we can change to control how bright a captured image appears. Since the camera sensor has a limited dynamic range (DR), it is crucial to use the right settings to capture as much information as possible. The following three aspects help choose the right settings.

1. Aperture is the size of the opening in the lens; it limits the amount of light that can pass through the lens. A pinhole aperture creates a perfectly sharp image of the entire scene, while a larger aperture makes the image blurry for objects that are not at the focal plane of the lens (Figure 3.3). A large aperture is still desirable because it lets a lot of light into the camera. More light is crucial for reducing noise in the captured image.

2. Every camera has a shutter, either a physical shutter that blocks light when closed and lets light in when open, or an electronic shutter that controls when the camera sensor is accumulating photons and when it is not. When photographing a scene, the camera shutter opens “briefly” to expose the sensor to the incoming light. A faster shutter speed lets in less light; this increases noise but also increases the clipping level.

3. Exposure-index (EI) (commonly known as ISO speed) [64], or sensor sensitivity, determines how sensitive the sensor will be to incoming photons. At a low sensitivity level, more photons will be needed to saturate the sensor.

[Figure 3.3 diagram labels: shutter, lens, aperture, optical axis, image plane, focal plane, sensor, image, out-of-focus regions.]
Figure 3.3: Image acquisition using a lens and camera. Light from the scene goes through the entry pupil and is focused by the lens onto the image sensor. The amount of light that reaches the sensor is determined by: 1) Lens aperture: with a wider aperture, a thicker pencil of light enters the camera. Lens aperture causes blurring due to defocusing. Only objects at the focal plane remain in focus (the blue object above). Objects that are closer than the focal plane (red) appear blurry since rays emanating from these objects get focused behind the sensor. Likewise, objects (green) that are farther than the focal plane appear blurry as well because they get focused in front of the sensor. Light rays from such an object get focused inside the camera and diverge again by the time they hit the sensor. The range of depths that are in focus is called the depth of field (DoF); the wider the aperture, the narrower the DoF. 2) Shutter speed: the slower the shutter speed, the longer the sensor is exposed to the incoming light. A slower shutter (i.e., a longer exposure) causes blurring due to object or camera motion.

[Figure 3.4 diagram labels: incoming light, filter layer, sensor array, resulting pattern.]
Figure 3.4: The Bayer filter enables color imaging by filtering the incident light on each pixel for one color. Left: Color filters are organized in a repeated 2 × 2 BGGR pattern. The exact order of the color filters varies by manufacturer, though; some manufacturers prefer an RGGB pattern instead. Across the different patterns, one thing is common: red and blue are sampled at a lower rate compared to green. This is because the green channel carries most of the “luminance” information (in other words, perceived brightness), and the human visual system (HVS) is more sensitive to luminance than to chrominance. In Chapter 6 we exploit this subsampling to improve on state-of-the-art denoising techniques. Right: Profile view showing the working principle of color filters placed over sensor pixels. Photons with wavelengths from one part of the spectrum are let in, while those of other colors are filtered out (see footnote 6).

3.1.2.2 Analog to digital conversion of optical data

The image sensor captures the image and converts it to digital data. First, the image is anti-aliased by an optical application of a low-pass filter $\rho$. This low-pass filtering is often implemented by creating a small lens array or light collectors on top of the light-sensitive surface of the sensor.

Each sensor pixel essentially acts as a photon counter. The camera response function is the input-to-output mapping of a ray of light. Assuming the camera does not apply an artificial gamma curve or compress the image, in modern cameras this relationship is linear until a sensor pixel saturates. A sensor pixel saturates if it receives overly bright light, and the value of that pixel is then clipped at the maximum value. Assuming the final image has intensity values scaled to the range 0 . . . 1, the clipping value is 1.

[Figure 3.5 plot: response versus wavelength (nm).]
Figure 3.5: Experimentally obtained spectral response of a Canon 40D. Color camera sensors have three kinds of photosensitive pixels: red, green and blue (RGB). These different colors, or “channels”, filter light based on color. For example, the red pixels filter out photons with wavelengths that are not considered red. Which wavelengths are “considered red” is what defines the spectral-response function of those pixels. More accurately, the color filters do not respond to different wavelengths the same way. For example, in this plot, the red pixels have the highest response around 650 nm, but they respond to many wavelengths. The response curve is usually available from the vendor; we wanted to double-check that it matches the vendor data sheet. This is how we measured the response curve. We captured multi-exposure high dynamic range (HDR) images of color checkers (a board with small squares painted with a variety of spectrum-calibrated paints) under wide-band uniform white illumination. We then compared the RGB color values $[R_k, G_k, B_k]^{\top}$ for the $k$th color checker element obtained from the captured image with the manufacturer-provided reference $n$-dimensional multispectral data $c_k \in \mathbb{R}^n$ of the $k$th color patch on the color checker. Then the discretized response curves $r_i$ can be obtained by solving $[c_k']\, r_i = [R_k, G_k, B_k]$.

An analog-to-digital conversion operation $Q$ then quantizes the pixel measurements. Each quantization step needs to be small enough that a human observer cannot distinguish the difference between two consecutive levels (just noticeable difference (JND)). Cameras typically perform quantization with the same step size between any two consecutive quantization levels.

Color image sensors typically capture three primary colors: red, green and blue. Each sensor pixel has one of three color filters on top of it; the color filters typically follow a repetitive structure known as the Bayer pattern [14] (Figure 3.4). The spectral response of these color filters aims to mimic that of the three types of cones in the HVS. For example, the spectral response of a Canon 60D, measured for color calibration purposes using a standard color chart, is presented in Figure 3.5; when we compare this plot with the spectral response of the HVS in Figure 3.1, we find obvious similarities but also striking differences. For example, the green response curve appears much wider than that of the HVS.

3.1.2.3 Noise and noise modeling

We follow [42] for modeling image noise.
In the image acquisition process, there are multiple sources of noise, but these various noise sources can be combined into two major groups:

• Images contain an additive Gaussian noise component, which describes the aggregate effect of electronic readout and heat noise.

• Images also contain signal-dependent noise, mostly photon shot noise. Photon arrival is a Poisson process, and consequently the resulting photon shot noise exhibits a Poisson distribution, which can be approximated as a signal-dependent (multiplicative) Gaussian noise [61].

Footnote 6. Source: https://upload.wikimedia.org/wikipedia/commons/3/37/Bayer_pattern_on_sensor.svg [accessed: 2018-02-24] and https://upload.wikimedia.org/wikipedia/commons/1/1c/Bayer_pattern_on_sensor_profile.svg [accessed: 2018-02-24]

For some latent image $f$, we observe the noisy image $g = f + \eta(f)$, where $\eta(f)$ denotes the signal-dependent additive Gaussian noise,

$$ \eta(f) \sim \mathcal{N}(0, \sigma(f)), \qquad \sigma(f) = \sqrt{a^2 f + b^2}, \tag{3.2} $$

which is the combined image noise (Figure 3.6), and $a \in \mathbb{R}^+$ and $b \in \mathbb{R}^+$ are parameters that depend on the sensor and the exposure settings of the camera. These noise parameters can be measured. For color images, each color channel would have a different pair of values for these parameters.

In the case of SDR cameras, noise is particularly dominant and noticeable in the shades, i.e., the dark image areas (where $f \approx 0$). In addition, since $a \ll 1$, for DR improvement purposes it is often safe to assume that image noise is additive white Gaussian noise (AWGN), i.e., from (3.2) we obtain,

$$ \sigma(f) \approx b, \tag{3.3} $$

and since the parameter $b$ is constant for an exposure setting, it is spatially invariant, and therefore we can write,

$$ \sigma \approx b. \tag{3.4} $$

This simpler model allows an easier way to measure the single noise parameter $\sigma = b$: a white piece of paper under uniform illumination is photographed. Any high-frequency pattern in the captured image can be considered noise. When the noise measurements from all pixels are gathered, they appear to follow some distribution around a zero mean.
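The flat-field measurement just described can be mimicked in simulation. The sketch below assumes the model of (3.2) with $\sigma(f)$ read as a standard deviation, made-up parameter values `a` and `b`, and a perfectly uniform patch with latent value 0.2. The names `noise_sigma` and `capture` are hypothetical, and a sample standard deviation stands in for the Gaussian fit.

```python
import numpy as np

def noise_sigma(f, a, b):
    """Signal-dependent noise standard deviation, sigma(f) = sqrt(a^2 f + b^2)."""
    return np.sqrt(a**2 * f + b**2)

def capture(f, a, b, rng):
    """Noisy observation g = f + eta(f); clipping and quantization are ignored."""
    return f + rng.normal(0.0, 1.0, size=f.shape) * noise_sigma(f, a, b)

rng = np.random.default_rng(0)
a, b = 0.002, 0.01                   # assumed sensor parameters, with a << 1
flat = np.full((512, 512), 0.2)      # uniformly lit patch, latent value 0.2
g = capture(flat, a, b, rng)

residual = g - g.mean()              # what remains after removing the flat signal
sigma_hat = float(residual.std())    # moment estimate in place of a Gaussian fit
sigma_true = float(noise_sigma(0.2, a, b))
print(sigma_hat, sigma_true, b)
```

With these assumed values, the estimate lands close to both $\sigma(0.2)$ and $b$, which is the sense in which the AWGN approximation of (3.3) and (3.4) is harmless when $a$ is small.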
A Gaussian fit to the distribution gives us the parameter we seek: $\sigma = b$ from equation (3.4) above. This process can be repeated once for each color channel and each exposure setting.

[Figure 3.6 plot: log measured intensity versus log input intensity, showing the SDR and HDR response functions, the noise floor, the clipping ceiling, the dynamic range between them, and the shades (noisy), midtones (captured) and highlights (saturated) regions.]
Figure 3.6: Intensity response and DR limit. An ideal HDR camera would have a linear response to the input intensity values. However, conventional cameras are physically limited by their DR. Beyond the clipping ceiling, the SDR response is flat (a camera pixel can receive this maximum intensity, and saturates for any intensity higher than this). At the low end, the DR is bounded by the noise floor (the smallest value a sensor pixel can faithfully receive; lower intensity values have a poor signal-to-noise ratio (SNR), i.e., they are “dominated” by noise). This low end depends on the measurement noise: with a high noise level, the low end moves up, whereas with a low noise level, it moves down. These two quantities (the upper bound and the lower bound of the DR) are two physical limits that limit the performance of every camera sensor. The absolute values of these quantities depend on a number of things, in particular pixel characteristics (such as dimensions and quantum efficiency), so the DR varies from sensor to sensor. A broader discussion of these aspects of the DR limit is beyond the scope of this thesis.

3.1.3 Image dynamic range (DR): an indicator of image fidelity

The DR of an image sensor indicates the “size” of the “range” of intensity values a sensor pixel can faithfully capture in one exposure. More accurately, it is not the size of the range, but a ratio: the DR of an image is formally defined as the ratio of the brightest pixel value to the smallest nonzero pixel value or the just noticeable difference [100].
Since image acquisition techniques inherently incur noise, the smallest pixel value does not always give a reasonable DR measurement. Instead, the smallest pixel value greater than the expected noise may be used as the smallest nonzero pixel value [100].

The term “dynamic range (DR)” was initially used in audio research, where the term dynamic is more pertinent since the range changes dynamically over time. In imaging, however, the qualifier dynamic does not apply in the same manner, since the signal does not vary temporally, but rather spatially. (It may be noted that videos have a time-varying signal, but we discuss DR in the context of a single image.) In a still image, the brightest point may be found right beside the darkest point, and the dynamic range must span the full range of values needed to encode both the brightest point and the darkest point.

When high dynamic range (HDR) is mentioned in the context of imaging, the following aspects are loosely referred to:

a) High contrast ratio: The details of bright areas need to be captured, as well as the details in dark regions. Colors should be vivid: black should appear black (and not grey), and bright objects should not look dim. When viewing an HDR image on an HDR display, a viewer should have the same visual experience as they would have by being at the scene.

b) High bit-depth: Bit-depth (also known as pixel-depth) should be high enough for encoding values with many quantization levels. More accurately, intensity granularity should be finer than the just noticeable difference (the human visual system (HVS) cannot see intensity differences smaller than this value, and therefore capturing further granularity does not add value to a human observer). Without this fine granularity, images would show artifacts that look like discrete intensity steps in a smooth color/intensity gradient.
It may be noted that the HVS has a response curve that is close to logarithmic, and this results in an intensity-dependent just noticeable difference (JND): the higher the intensity of an image area, the less sensitive the HVS is. To exploit this aspect of the HVS, quantization levels at lower intensities should have a finer granularity than those at higher intensity levels [72].

c) High fidelity: To preserve the details in the high-intensity regions of an image there should be little to no clipping, as clipping causes bright patches to appear flat and lacking detail in the captured image. Moreover, in the case of color images, a wrong color can result. For example, consider a bright orange object (say with a 2 : 1 red-to-green ratio in reality) where only the red channel saturates. As a result, the object appears in the image with a yellow color (1 : 1 red-to-green ratio). This unwanted color change is sometimes referred to as color desaturation. Likewise, to preserve the details in the low-intensity regions of an image, there should be little to no clipping due to under-exposure. Regions with low light, such as shaded areas, can be too dark for proper exposure, and such image areas would look noisy. Clipping and noise are artifacts due to limited DR. Therefore an HDR image should not exhibit such limitations.

3.1.3.1 Conventional multi-exposure high dynamic range imaging (HDRI)

Since DR is a ratio, it is better understood on the log-intensity scale. On the log-intensity scale, the DR represents a fixed-width range between the highest and lowest intensities a camera can capture given the exposure settings being used. Varying the exposure settings “slides” this range up and down the log-intensity scale (Figure 3.7). Most natural outdoor scenes require a higher DR than that of a conventional camera.
Under the DR limit, in a single exposure, a camera can capture only a part of the full scene DR. One can select the exposure settings so that as many pixels as possible are captured faithfully (i.e., free of clipping and free of poor signal-to-noise ratio (SNR)). Choosing such optimum exposure settings reduces clipping and noise but cannot completely remove them.

Figure 3.7: DR is a fixed-width window; when exposure settings are changed, this window slides up and down along the log-intensity scale. Three SDR virtual exposures of the same HDR image are shown above.⁷ Virtual exposures are simulated standard dynamic range (SDR) exposures of an HDR image (see Section 3.2 for details). The HDR image is not shown above; only the virtual exposures are shown. The full histogram (number of pixels vs. log-intensity) of the latent HDR image is shown in green on top of all three SDR images. The three virtual exposures simulate three different exposure settings that an SDR camera could take images with. Each SDR image can only capture a part of the full histogram (intensity range); the absolute range of intensities each SDR image is able to show is marked with blue overlays over each histogram.

⁷ Source: http://pfstools.sourceforge.net/hdrhtml.html [accessed: 2018-02-24]

Recognizing the DR limit, most cameras provide a feature called “exposure bracketing” to give the photographer an opportunity to capture more scene details than a single exposure can capture. The idea is as follows: DR is fixed for any exposure; but if one takes multiple photos of the same scene, one can choose a different set of exposure settings for each photograph so that every scene detail is captured by one of the photos. The effective combined DR of these photos (an exposure sequence) is much greater than the camera DR. In other words, these photos together would contain all the information in the scene.
A post-processing step would then combine all these photos to create an HDR image. To avoid ghosting in the final HDR image due to moving objects, the exposure bracketing mode takes 3, 5 or 7 photos in rapid succession. However, exposure bracketing is often unsuitable despite the rapid successive exposures, since multi-exposure HDRI works well only for static scenes. In many real-life situations, the scene contains fast-moving objects, or a tripod cannot be used to stabilize the camera. In both of these cases, exposure bracketing is rendered useless.

3.1.3.2 We propose computational single-image high dynamic range imaging (HDRI)

In this thesis, we propose new methods to capture high dynamic range (HDR) images from a single standard dynamic range (SDR) capture only. We do not require a static scene or a steady camera.

Our key insight is that dynamic range (DR) can be improved by processing the whole image instead of processing pixel by pixel. Individual pixels have a fixed DR; in isolation, a pixel's DR cannot be improved. However, in the methods we propose in this thesis, we process an entire image at one time. We employ various prior models about natural images and improve the image DR. We give a high-level, generalized language for our problem definition in the next chapter.

How DR is defined tells us how to improve it. Paraphrasing the definition set forth at the beginning of this section, the camera DR is a “range” (more accurately, a ratio) limited by the following two quantities (for some given exposure settings):

a) the highest intensity a camera pixel can register without saturating and getting clipped (which we call the clipping ceiling) and

b) the lowest intensity a camera pixel can register with an expected signal-to-noise ratio (SNR) of 1 : 1, i.e., not less than the expected noise (which we call the noise floor) (Figure 3.6).

Improving either end increases the DR.
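The effect of moving either limit can be made concrete with a small calculation. The sketch below assumes, for illustration only, that DR is reported as the base-2 logarithm of the ratio of clipping ceiling to noise floor (bits or f-stops); the function name and the numeric values are hypothetical.

```python
import math


def dynamic_range_stops(clipping_ceiling, noise_floor):
    """DR as log2 of the ratio of clipping ceiling to noise floor."""
    if clipping_ceiling <= 0 or noise_floor <= 0:
        raise ValueError("both limits must be positive")
    return math.log2(clipping_ceiling / noise_floor)


# A sensor clipping at 1 with a noise floor of 1/256.
baseline = dynamic_range_stops(1.0, 1.0 / 256.0)

# Restoring clipped highlights raises the effective ceiling.
raised_ceiling = dynamic_range_stops(4.0, 1.0 / 256.0)

# Denoising lowers the effective noise floor.
lowered_floor = dynamic_range_stops(1.0, 1.0 / 1024.0)
```

In this toy setting, raising the ceiling by a factor of four and lowering the floor by a factor of four each add two stops to the baseline.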
In this thesis, in some of our proposed methods we restore the clipped pixel values; i.e., the clipping ceiling is raised and thus the DR is increased. In the other methods, we perform single-image denoising in the context of high dynamic range imaging (HDRI). Here the noise floor is lowered, and, as a result, the DR increases.

3.1.3.3 Thought experiments: how image dynamic range (DR) indicates image fidelity

It is worth clarifying that the dynamic range (DR) is not exactly the same as the “pixel depth” (the number of discrete intensity levels). The definition of DR is rooted in the concept of peak signal-to-noise ratio (PSNR). DR is a measurement of fidelity; it is similar to the PSNR between a captured (or reconstructed) image and the uncorrupted latent image.

Digital cameras usually produce images where image pixels have intensity values expressed on a linear intensity scale and quantized to some preset pixel depth. In the absence of noise, this pixel depth is (by definition) the same as the ratio of highest to lowest pixel values, i.e., the DR. For any real camera, however, the noise is not negligible, and as a result there is some uncertainty in the accuracy of the measured (noisy) image pixel values. To reflect this uncertainty, DR is defined to be the ratio of the highest pixel value to the standard deviation of noise.

For example, let us assume a noise standard deviation of 2. A measured and noisy pixel intensity of 20 would then correspond to a latent noise-free value that is “close to” 20. The latent noise-free intensity could be 20, but there is a finite probability that the noise-free value is 19 instead. Similarly, a measured value of 1 could originally be 0 or 2. Evidently, the “smallest pixel value” is certainly not the smallest latent pixel value under the noise assumption.
Similarly, the “number of discrete intensity levels” under this noise assumption becomes a fuzzy concept; every measured pixel intensity corresponds to a latent intensity value that lies within a certain range defined by the noise variance. Two pixels with the same measured intensity value might actually represent two different latent intensity values; also, two pixels with two different measured intensity values might represent the same latent intensity value corrupted by noise differently. Clearly, pixel depth is a combination of information and noise, and it does not represent the fidelity (the effective amount of information per pixel) of an image. On the other hand, DR represents fidelity.

As we shall examine more rigorously in the next chapter, since photon arrival is a Poisson process, sensor noise is largely dominated by Poisson noise, and the variance of this type of noise is equal to its mean. Consequently, camera DR is defined as the ratio of the clipping level (the highest pixel value) to the noise level (or more accurately, the noise standard deviation).

First, the thought experiments. Let us assume that some latent image $f$ has pixel intensities taking integer values 1, 2, . . . , 256, with some negligible noise present. The DR of this image is 8 bits. (We ignore the intensity level 0 for clarity of exposition.) Now, we compute some other image $g$ by applying a pixel-by-pixel operation on $f$. For each of the operations listed below, what would the DR be?

1) We clip all values 128, 129, . . . , 256 to 128, i.e., $g = \min(f, 128)$.

2) We replace the least significant bit (LSB) with random noise $\eta \in \{0, 1\}$.

3) We add zero-mean Gaussian noise with standard deviation 1, i.e., $g = f + \eta$ where $\eta \sim \mathcal{N}(0, 1)$.

4) We add zero-mean Gaussian noise with standard deviation 2, i.e., $g = f + 2\eta$ where $\eta \sim \mathcal{N}(0, 1)$.

5) We divide all values by 2 and then convert them to integers.
This is similar to quantizing the original 8 bit image data in $f$ down to 7 bits: $g = \lceil f/2 \rceil$.

6) We multiply each pixel value by 2 and add noise $\eta$ in the LSB, i.e., $\eta \in \{0, 1\}$. Then $g = 2f + \eta$ has 9 bits of intensity value at each pixel. There are 512 different intensity levels in $g$.

7) We convolve the image with a 2 × 2 box filter, i.e., every pixel in $f$ is replaced with the average intensity of its 2 × 2 neighborhood:

\[ g(i, j) = \frac{f(i, j) + f(i+1, j) + f(i, j+1) + f(i+1, j+1)}{4} \tag{3.5} \]

8) As above, but additionally we quantize the resulting image back to 8 bits (i.e., pixel values in $g$ are 1, 2, . . . , 256) in the end.

9) We “linearly downsample” the image: every non-overlapping 2 × 2 block of pixels is averaged to produce one pixel of the downsampled image.

\[ g(i, j) = \frac{f(2i, 2j) + f(2i+1, 2j) + f(2i, 2j+1) + f(2i+1, 2j+1)}{4} \tag{3.6} \]

It may be noted that this is not a per-pixel operation because of the subsampling.

The reader is invited to mentally calculate the resulting DR of $g$ in each case before proceeding, and then to compare with the results provided below:

1) The resulting image $g$ only has 128 different intensity levels: 1, 2, . . . , 128. The new clipping level is 128. All values above 128 are irreversibly lost. The resulting DR is 128 : 1 or 7 bits.

2) The number of intensity levels has not changed; it is still 256. However, the image fidelity has decreased. The smallest noise-free intensity value is now 2. The resulting DR is 256 : 2 or 128 : 1, i.e., 7 bits.

3) Adding noise does reduce fidelity, but in this case, noise does not dominate the smallest pixel value. Despite adding the noise we do not see a significant reduction in the fidelity. The DR of $g$ is still 256 : 1 or 8 bits.

4) In this case, however, noise is dominant, which lowers the DR. The resulting DR of $g$ is 128 : 1 or 7 bits.

5) The resulting image $g$ only has 128 different intensity levels: 1, 2, . . . , 128. Clearly, the new DR is only 7 bits.
We have lost the ability to differentiate between latent values such as 5 and 6; both would be quantized down to the same number: 3. It may be noted that this also shows that quantization acts as a kind of noise, which is why this result is the same as the first one above. In other words, we could write the quantization process simply as adding a noise term, and the effective result will be the same: $g = f + \eta_q$ where $\eta_q$ denotes a noise term. This argument can help generalize the definition of DR: even in the noise-free situation, the “smallest pixel value” is nothing but the standard deviation of quantization noise! In other words, the low end of the DR is always some form of noise, whichever is dominating: quantization noise or the additive Gaussian noise.

6) Even though the number of intensity levels has doubled in $g$ compared to $f$, the information content of the image has not. The fidelity of the image remains unchanged, and so should the DR. In other words, it would be wrong to say that the DR of $g$ is 512 : 1. Instead, if we take the smallest noise-free value, which is 2, we obtain the right answer: 512 : 2 ≡ 256 : 1.

7) The number of intensity levels has quadrupled per pixel. While the bit-depth in $f$ was 8, it is now 10 in $g$. Even though the image does not appear sharp any more, it still contains the full image information (assuming some boundary condition for the convolution operation, for example assuming every pixel outside the image is zero-intensity). We analyse why no information is lost using two approaches:

• Intuitively, each pixel in $g$ is a linear combination of 4 pixels in $f$; therefore to obtain $f$, we can invert the convolution (“deconvolve”) by solving the system of linear equations (3.5), which is well-posed since it has as many equations as unknowns.

• From the signal-processing angle, the convolution kernel $\rho$ in question is a 2 × 2 box filter, which does not have zeros at integral locations in the Fourier domain.
This implies that the application of this filter does not destroy any frequencies, or in other words the full information of $f$ is preserved in $g$. Since the box filter does not have any zeros, inverting the application of the filter is as simple as a pointwise division in the Fourier domain:

\[ g = \rho \otimes f \tag{3.7} \]
\[ \implies f = \mathcal{F}^{-1}\left(\mathcal{F}(g) / \mathcal{F}(\rho)\right) \tag{3.8} \]

where $\mathcal{F}(\cdot)$ denotes the discrete Fourier transform.

Interestingly, the total amount of image information cannot increase (intuitively, since we cannot deduce anything new about the scene), even though technically the DR (i.e., the amount of information per pixel) has increased. This is because neighboring pixels are now correlated.

8) As above, except now we perform quantization in the end, which is a noninvertible process. The original image $f$ cannot be obtained from $g$. The DR of $g$ is 8 bits, which is the same as that of $f$. Since the pixel intensities are now correlated, $g$ contains less information than $f$ even though they have the same number of bits.

This operation is closer to what is colloquially, and less rigorously, meant by “blurring” an image; in contrast, the more technical process of convolution (mentioned in the previous point above) does not include quantization.

9) As in the convolution case, the number of intensity levels has quadrupled per pixel. While the bit-depth in $f$ was 8, it is now 10 in $g$. It may be noted that even though the DR has increased, the total amount of information has decreased, because we obtained one-fourth as many pixels in the process of downsampling. Downsampling is an effective way to increase pixel DR at the cost of image fidelity. As an interesting side note, this is also why the same image appears sharper and less noisy when viewed on a smaller display than on a larger display.

As a disclaimer, for demonstration purposes in this section, we have used a small range of intensity values 1, 2, . . . , 256, whereas in reality, many cameras can now capture up to 14 f-stops, or 14 bits per pixel.

Furthermore, it may be noted that the DR of an image is specified up to a scale. This means that pixel values can be represented as integral values or as real numbers. Regardless of how the pixel values are represented, the DR will always be the same. For example, both the range of pixel values 1, 2, . . . , 256 and the range 0.001, 0.002, . . . , 0.256 have the same DR. In this thesis, without loss of generality, we always assume the captured image pixel values to be in the range (0, 1] to simplify the discussion. In other words, we assume the captured pixel values are non-negative and are bounded by a clipping level of 1.

3.2 Displaying high dynamic range (HDR) images

A discussion on high dynamic range imaging (HDRI) is not complete without mentioning how to display an HDR image. In this thesis, we present all HDR images by showing two different standard dynamic range (SDR) renderings (or, simulated exposures) of the same HDR image. We refer to these two images as simulated long exposure and simulated short exposure. There is a lot of research and engineering effort regarding the display side of HDR imaging, which is beyond the scope of this thesis. However, below we provide a brief discussion on displaying HDR images using two or more SDR images; this discussion is relevant to displaying our reconstructed HDR images.

HDR images (by definition) cannot be directly viewed on an SDR display. To be accurate, the dynamic range (DR) of the display has to at least match, or be better than, the DR of the image to be displayed. Consequently, conventional displays and print media are unable to show HDR images; only SDR images can be shown.

However, a transformation known as tonemapping allows some vivid content in HDR images to be displayed using SDR images. Tonemapping is a nonlinear transformation applied to the pixel intensities of an HDR image.
A simple example is the application of a gamma function to “bring out” the detail in the dark areas of an image and to “compress” detail in the bright areas. Most tonemapping operators are more complex than a gamma function; they try to retain the contrast while “compressing” or scaling the HDR into an SDR for viewing content on an SDR display. This is to retain the aesthetic value (i.e., scene details) of the HDR image without requiring the displays to be compatible. A detailed discussion on tonemapping is beyond the scope of this chapter.

However, tonemapping can accidentally hide errors in the HDRI process. Therefore we use two simulated exposures to present our results so that any artifacts from the application of our method are clearly visible.

As stated above, we present all HDR images by showing two different SDR renderings (or, simulated or virtual exposures) of the same HDR image. We refer to these two images as simulated long exposure and simulated short exposure.

Assuming that the SDR image data is scaled between 0 and 1, the clipping level is 1 as before. (It may be noted that reconstructed HDR images can have intensities greater than 1.) To show some HDR image $f$, we render two images, one for showing details in the shades (dark areas) and the other for the highlights (bright areas):

1) A “simulated (or virtual) long exposure” simulates what happens when a camera shutter is open for a longer period. In this case, the darker areas appear with more details and the bright areas are clipped,

\[ f_{\mathrm{LONG}} \equiv \min(1, f). \tag{3.9} \]

As stated before, $\min(1, \cdot)$ models the clipping due to saturation. Relatively darker areas (shades) in the scene have enough time to expose the sensor properly and appear well-exposed in the captured image.
On the other hand, the bright areas (highlights) in the scene are over-exposed.

This SDR rendering serves two purposes; it shows

• what an SDR camera can observe in a single exposure, and

• whether our reconstruction algorithm is working well on shades (i.e., over the darker scene areas).

2) A “simulated (or virtual) short exposure” simulates what happens when a camera shutter is open for a short period. In this case, the details in the bright areas appear more clearly. A short virtual exposure is achieved by normalizing the image by the maximum pixel intensity value,

\[ f_{\mathrm{SHORT}} \equiv \min\left(1, \frac{f}{\|f\|_\infty}\right), \tag{3.10} \]

where $\|\cdot\|_\infty$ is the infinity norm or maximum norm that simply returns the maximum element in a vector,

\[ \|f\|_\infty \equiv \max_i f_i. \tag{3.11} \]

In this case, the highlights are within the clipping level, but the midtones and shades appear dark and noisy, and lack scene details. This image is used to demonstrate details in the highlights; for example, these images would be used to present the quality of a highlight reconstruction technique.

3.3 Problem statement

In this section, we develop a mathematical framework for the general single-image HDRI problem. We first formalize the general forward imaging process (Section 3.3.1).
Then we argue that it is an underdetermined problem (Section 3.3.2); we opt for a maximum-a-posteriori estimate and derive the corresponding minimization problem under a Gaussian noise assumption (Section 3.3.3).

3.3.1 Forward process: how an image is acquired by an SDR camera

From the discussion on the imaging process and the noise model (Section 3.1.2.3), we can derive an image formation model, also known as the forward model,

\[ g \equiv Q\Big(\min\big(1,\; E_o\left(\rho \otimes E_i(f)\right) + \eta\big)\Big), \tag{3.12} \]

where $g$ is the captured SDR image, $Q$ is the quantization (analog-to-digital conversion), $\min$ models the clipping due to sensor saturation, the clipping level is assumed to be 1, $E_o$ is the sensitivity-multiplexing, $\rho$ is the anti-aliasing filter, $E_i$ is the aperture-filtering, $f$ is the unknown HDR image, and $\eta$ is the noise.

Here, the observed image $g$ is an image with pixel values lying between 0 and the clipping level (without loss of generality, we normalize the captured image so that captured intensities are between 0 and 1, which sets the clipping level at 1). The pixels are organized in $h$ rows and $w$ columns ($g \in [0, 1]^{w \times h}$).

For the sake of convenience in computational optimization, we represent all images in a vectorized form; i.e., $g$ becomes an $n$-vector, $g \in [0, 1]^n$, with the number of pixels $n$ being the product of the width $w$ and the height $h$: $n = wh$. We discuss this vectorization with an example in Section 3.4.1 and in Figure 3.8.

The unknown latent HDR image $f$, on the other hand, has $N$ pixels, each pixel having a non-negative intensity value (since it is the latent image, these numbers are not clipped at the clipping level, which we assume to be 1 in this discussion). In other words,

\[ f \in \Omega \quad \text{with} \quad \Omega \equiv \mathbb{R}_+^N. \tag{3.13} \]

It may be noted that in most cases $N = n$.
However, for certain problems, it may be the case that

\[ N \geq n, \tag{3.14} \]

since the forward process could result in a reduction of the number of pixels (e.g., in the case of the super-resolution (SR) problem, the captured image is assumed to have fewer pixels than the latent image).

The symbol $E_i$ represents some optical encoding (i.e., transformation or filtering) we introduce to a camera (e.g., by attaching a photographic filter in front of the camera lens),

\[ E_i : \Omega \to \Omega. \tag{3.15} \]

An optical filter is introduced at the aperture and it convolves with the latent image, which is why we treat the optical filter and the aperture filter as the same thing. If no optical encoding is applied, $E_i$ is just the identity operation, i.e., $E_i(f) \equiv f$. We assume here that the signal has been anti-aliased in hardware (before its capture) with a low-pass filter $\rho$ (Section 3.1.2.2). The symbol $E_o$ represents some sensor filtering or encoding,

\[ E_o : \Omega \to \Gamma, \tag{3.16} \]

which may be added to the sensor as an optical element (for example, the Bayer pattern [14]) or electronically (for example, exposure-multiplexed imaging, as presented in Chapter 5). This operator modifies the pixels and is therefore a spatial filter.

$Q$ is the quantization operator.
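As a rough illustration, the forward model (3.12) can be simulated for a one-dimensional signal. The sketch below is not the thesis implementation and makes several assumptions: the signal is 1D rather than 2D, the encoders default to the identity, the noise is AWGN with standard deviation `b` (Section 3.1.2.3), the quantizer is uniform with `levels` levels, and a lower clamp at 0 is added so that the result lies in [0, 1]. All names and values are hypothetical.

```python
import random


def convolve_same(signal, kernel):
    """Zero-padded 1D convolution; the output is as long as the input."""
    n, m = len(signal), len(kernel)
    half = m // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(m):
            j = i + half - k
            if 0 <= j < n:
                acc += kernel[k] * signal[j]
        out.append(acc)
    return out


def forward_model(f, rho, b, levels, rng, e_i=lambda x: x, e_o=lambda x: x):
    """Simulate g = Q(min(1, E_o(rho * E_i(f)) + eta)) for a 1D signal f."""
    encoded = e_o(convolve_same(e_i(f), rho))            # optics and sensor encoding
    noisy = [v + rng.gauss(0.0, b) for v in encoded]      # additive Gaussian noise
    clipped = [min(1.0, max(0.0, v)) for v in noisy]      # saturation (plus clamp at 0)
    return [round(v * (levels - 1)) / (levels - 1) for v in clipped]  # quantization


rng = random.Random(1)                        # fixed seed, only for repeatability
f = [0.1, 0.2, 0.5, 2.0, 3.0, 0.4]            # latent HDR values, some above 1
rho = [0.25, 0.5, 0.25]                       # hypothetical anti-aliasing kernel
g = forward_model(f, rho, b=0.01, levels=256, rng=rng)
```

In this example the two bright samples of `f` remain far above the clipping level after filtering, so the corresponding entries of `g` saturate at 1 and their true values are lost.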
$\eta \sim \mathcal{N}(0, b)$ is additive white Gaussian noise (AWGN) (Section 3.1.2.3).

The various aperture/optical filters $E_i$ and sensor filters $E_o$ used in different parts of the thesis are summarized in Table 3.1.

Table 3.1: A summary of aperture and sensor filters used in various methods presented in this thesis

Method | Section | Optical filter, $E_i$ | Sensor filter, $E_o$
aperture-filtering | Section 4.1 | cross-screen filter | none
sensitivity-multiplexing and non-local self-similarity | Section 5.5 | none | exposure-multiplexing
sensitivity-multiplexing for SR | Section 5.6 | none | exposure-multiplexing
sensitivity-multiplexing for single-image HDRI | Section 5.7 | none | exposure-multiplexing
Gradient domain color restoration | Section 6.1 | none | none
Patch-correspondences for single-image DR expansion | Section 6.2 | none | none
Retrieving information lost by image denoising | Section 6.3 | none | none

3.3.2 Inverting the forward process: obtaining the latent high dynamic range (HDR) image from a standard dynamic range (SDR) image

Quantization is in effect another form of noise [42], and therefore, we absorb the effect of quantization into the noise term.

Furthermore, the effect of clipping is a loss of data. In other words, the clipped pixels cannot be expressed as a constraint in the missing-value problem that is single-image high dynamic range imaging (HDRI). In the optimization formulation, we simply ignore the matrix rows corresponding to clipped pixels.

Now we use $\mathcal{M} : \Omega \to \Gamma$ to denote the noise-free forward operation on the unknown ground truth image $f$ that includes the full optical system including anti-aliasing but does not include noise and quantization,

\[ g = \mathcal{M}(f) + \eta. \tag{3.17} \]

While this is an oversimplified forward model, it is sufficient for the discussion that follows in this section. In subsequent chapters, we describe the forward model that is specific to each of those problems.

3.3.3 A Bayesian framework for solving the inverse problem

From the definition of additive white Gaussian noise (AWGN), we get

\[ \Pr(\eta) = \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(\eta - 0)^2}{2b^2}\right]. \tag{3.18} \]

Then we can derive the likelihood term, i.e., the likelihood of $f$ given $g$, $L(f \mid g)$, which is (by definition) the probability of $g$ given $f$, $\Pr(g \mid f)$,

\[ L(f \mid g) = \Pr(g \mid f) = \Pr(g \mid \mathcal{M}(f)) \, \Pr(\mathcal{M}(f) \mid f) \tag{3.19} \]

where

\[ \Pr(g \mid \mathcal{M}(f)) = \Pr(g - \mathcal{M}(f) \mid \mathcal{M}(f)) \tag{3.20} \]
\[ = \Pr(\eta \mid \mathcal{M}(f)), \quad \text{[from (3.17)]} \tag{3.21} \]
\[ = \Pr(\eta), \quad \text{[since $\eta$ is assumed independent of $\mathcal{M}(f)$]} \tag{3.22} \]
\[ = \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(\eta - 0)^2}{2b^2}\right], \quad \text{[from (3.18)]} \tag{3.23} \]
\[ = \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(g - \mathcal{M}(f))^2}{2b^2}\right], \quad \text{[from (3.17)]} \tag{3.24} \]
\[ \propto \exp\left[-\frac{\|g - \mathcal{M}(f)\|^2}{2b^2}\right] \tag{3.25} \]

or,

\[ \Pr(g \mid f) \propto \exp\left[-\frac{\|g - \mathcal{M}(f)\|^2}{2b^2}\right] \Pr(\mathcal{M}(f) \mid f). \tag{3.26} \]

Table 3.2: A summary of natural image priors used in various methods presented in this thesis

Method | Section | Previously known prior | Our proposed prior
aperture-filtering | Section 4.1 | sparse gradients | none
sensitivity-multiplexing and non-local self-similarity | Section 5.5 | non-local self-similarity | none
sensitivity-multiplexing for super-resolution (SR) | Section 5.6 | sparse gradient | smooth contour
sensitivity-multiplexing for single-image HDRI | Section 5.7 | sparse gradient | smooth contour
Gradient domain color restoration | Section 6.1 | cross-color-channel correlation | none
Patch-correspondences for single-image dynamic range (DR) expansion | Section 6.2 | cross-color-channel correlation and sparse gradient | intensity-invariant non-local self-similarity
Retrieving information lost by image denoising | Section 6.3 | cross-color-channel correlation | none

The maximum-likelihood estimator (MLE) provides an estimate,

\[ f_{\mathrm{ML}} = \arg\max_f \Pr(g \mid f). \tag{3.27} \]

However, a better estimate would be the maximum-a-posteriori (MAP) estimate, which we can obtain by applying Bayes' rule at this point. We note that $\Pr(g \mid f)$ is the likelihood of the observed data $g$ given the model $f$, whereas $\Pr(f \mid g)$ is the posterior probability. According to Bayes' rule, the posterior probability of the model given the observation $g$ can be calculated from the likelihood and some prior probability:

\[ \text{posterior} \propto \text{likelihood} \times \text{prior} \tag{3.28} \]
\[ \equiv \Pr(f \mid g) \propto \Pr(g \mid f) \, \Pr(f). \tag{3.29} \]

Now, the maximum-a-posteriori or MAP estimate of $f$ is

\[ \hat{f} = \arg\max_f \Pr(f \mid g) \tag{3.30} \]
\[ = \arg\max_f \Pr(g \mid f) \, \Pr(f), \quad \text{[applying Bayes' rule (3.29)]} \tag{3.31} \]
\[ = \arg\max_f \log\left(\Pr(g \mid f) \, \Pr(f)\right), \quad \text{[since the logarithm is monotonic and probability is non-negative]} \tag{3.32} \]
\[ = \arg\max_f \underbrace{\log \Pr(g \mid \mathcal{M}(f))}_{\text{log likelihood}} + \log \Pr(\mathcal{M}(f) \mid f) + \log \Pr(f) \tag{3.33} \]
\[ = \arg\max_f \log \underbrace{\exp\left[-\frac{\|g - \mathcal{M}(f)\|^2}{2b^2}\right]}_{\text{from (3.25)}} + \log \Pr(\mathcal{M}(f) \mid f) + \log \Pr(f) \tag{3.34} \]
\[ = \arg\min_f \|g - \mathcal{M}(f)\|^2 - \log \Pr(\mathcal{M}(f) \mid f) - \log \Pr(f) \tag{3.35} \]
\[ = \arg\min_f \underbrace{\|g - \mathcal{M}(f)\|^2}_{\text{data-fitting term}} + \underbrace{\lambda_1 \, \mathrm{model\_prior}(\mathcal{M}, f) + \lambda_2 \, \mathrm{image\_prior}(f)}_{\text{regularization terms}}. \tag{3.36} \]

3.4 Our computational optimization framework

As long as the terms to minimize are all $\ell_2$, we can use algorithms for solving linear least squares such as gradient descent or the conjugate gradient method. However, most of our problems incorporate image priors (such as the sparse gradient prior) which are $\ell_p$ for $p \geq 1$. These image priors are still convex, but they do not have a closed-form solution like $\ell_2$ terms. To minimize these $\ell_p$ terms, we need to employ sophisticated optimization techniques. In this section, we discuss the various computational optimization techniques we use in this thesis.

3.4.1 Vectorized representation of images for computation

For clarity of exposition, we use two-dimensional (2D) multichannel images and their vectorized forms interchangeably in this thesis. It should be evident from the context whether we are using an image in the 2D image format or the vectorized form. For example, when a norm is applied, typically it means to vectorize the 2D image first and then to apply the norm.

Linear image operations are typically expressed as operations on the image. When we are computing a gradient such as $\nabla f$ or performing a convolution with some kernel $\rho$, such as $\rho \otimes f$, we are performing 2D image operations.
A smallexample in Figure 3.8 illustrates this process of vectorization.It may be noted that we make sure that this does not violate any linearityof matrix vector operations, and we explicitly mention when such an operationcould violate linearity assumptions. As long as linearity is maintained, any imageoperation such as convolution etc. can be presented in a matrix-vector operationformat, or in a traditional approach, and both operations yield the same result.3.4.2 Computing the data-fitting termThe data-fitting term in (3.36) is a general inverse problem which can be solvedwith computational linear algebra. However, we note that the structure of theforward modelM (Section 3.3.2) largely depends on the forward process.In computational optimization, we vectorize all the images involved (theknown image g and the unknown HDR image f ). We also express the forwardmodel M as a matrix (It may be noted that this is possible because theoptical components that the forward modelM represents only linearly transformsnon-monochromatic light). Then the forward model M(f) is converted to a59ρ0,0ρ0,1ρ0,2ρ1,0ρ1,1ρ1,2ρ2,0ρ2,1ρ2,2f0,0f0,1f0,2f0,3f0.4f1,0f1,1f1,2f1,3f1.4f2,0f2,1f2,2f2,3f2.4f3,0f3,1f3,2f3,3f3,4f4,0f4,1f4,2f4,3f4,4ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2ρ1,1ρ0,0ρ0,1ρ0,2 ρ0,0ρ0,1ρ0,2 ρ0,0ρ0,1ρ0,2 ρ0,0ρ0,1ρ0,2ρ0,1ρ2,0ρ2,1ρ2,2 ρ2,0ρ2,1ρ2,2 ρ2,0ρ2,1ρ2,2 ρ2,0ρ2,1ρ2,2ρ2,1ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2ρ1,1ρ0,0ρ0,1ρ0,2 ρ0,0ρ0,1ρ0,2 ρ0,0ρ0,1ρ0,2 ρ0,0ρ0,1ρ0,2ρ0,1ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2ρ1,1ρ2,0ρ2,1ρ2,2 ρ2,0ρ2,1ρ2,2 ρ2,0ρ2,1ρ2,2 ρ2,0ρ2,1ρ2,2ρ2,1ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2 ρ1,0ρ1,1ρ1,2ρ1,1ρ0,2 ρ0,0ρ0,1ρ1,2 ρ1,0ρ1,1ρ2,2 ρ2,0ρ2,1f0,0f0,1f0,2f0,3f0.4f1,0f1,1f1,2f1,3f1.4f4,0f4,1f4,2f4,3f4,425× 125× 25(a) Image operation example : convolution(b) Matrix-vector representation of the image operationFigure 3.8: Representing an image formation model as a matrix-vectormultiplication. 
(a) A convolution image operation is shown; a 5 × 5 image f is convolved with the 3 × 3 kernel ρ. (b) The same operation is shown in matrix-vector representation. The image is unrolled in row-major order into a 25 × 1 vector, as shown on the right. The matrix M on the left represents convolution; i.e., it has 25 columns (one for each element in the image f) and 25 rows (one for each element of the resulting convolved image). M can be interpreted in two ways. First, it can be interpreted as a collection of column vectors, where the i-th column of M describes a point spread function (how the i-th pixel in f is being "spread" out over the output image). Then the resulting image M·f is a sum over all the "spreads". Conversely, M can also be interpreted as a collection of row vectors, in which case the j-th element of the resulting image M·f is the dot product of the j-th row and f. In other words, the j-th row of M describes the weights for the weighted sum of all pixels in f.

multiplication between the forward model matrix M and the vectorized image f: M·f. While in most cases the forward model matrix M is sparse, it can become dense (depending on the operations involved) and slow to compute.

Even though we treat M as a matrix and the images g and f as vectors in the mathematical derivations, in practice we do not always explicitly perform the matrix-vector product. In other words, every time a matrix-vector product is required by the optimization method (see for example Algorithm 3.2), we just apply the image operations that M represents (3.12).

For example, let us assume M performs a convolution with some kernel ρ, i.e., M·f ≡ M(f) ≡ ρ ⊗ f. Such a matrix M would still be sparse, but forming it explicitly and multiplying it with the image vector is compute intensive for large images. Applying the equivalent image processing operation directly is less compute intensive.
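The equivalence between the explicit matrix M and the direct image operation can be checked numerically. The following is a minimal sketch, not the implementation used in this thesis: the function names `conv2d_same` and `conv_matrix` are hypothetical, and zero padding at the image border is an assumption made for illustration (a different boundary treatment changes only the border entries of M).

```python
import numpy as np

def conv2d_same(f, rho):
    """Direct 2D convolution of image f with an odd-sized kernel rho.

    Assumption: zero padding at the border, output has the size of f.
    """
    kh, kw = rho.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(f, ((ph, ph), (pw, pw)))  # zero padding
    flipped = rho[::-1, ::-1]                 # convolution flips the kernel
    out = np.zeros(f.shape, dtype=float)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

def conv_matrix(shape, rho):
    """Explicit matrix M with M @ f.ravel() == conv2d_same(f, rho).ravel()."""
    n = shape[0] * shape[1]
    M = np.zeros((n, n))
    for i in range(n):
        impulse = np.zeros(n)
        impulse[i] = 1.0
        # Column i of M is the "spread" of a unit impulse at pixel i.
        M[:, i] = conv2d_same(impulse.reshape(shape), rho).ravel()
    return M

rng = np.random.default_rng(0)
f = rng.random((5, 5))      # 5 x 5 image, as in Figure 3.8
rho = rng.random((3, 3))    # 3 x 3 kernel

M = conv_matrix(f.shape, rho)                    # 25 x 25 matrix
explicit = (M @ f.ravel()).reshape(f.shape)      # matrix-vector form
implicit = conv2d_same(f, rho)                   # direct image operation
```

Both routes give the same image, but the explicit route stores n × n entries for an n-pixel image, which is why the implicit form is preferred in an implementation.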
In this convolution example, instead of explicitly computing the convolution matrix M and multiplying it with an image vector f, we use the direct result of convolving the image f with the kernel ρ. Computing ρ ⊗ f directly is faster since we can take advantage of the Fourier domain (see footnote 8).

Similarly, when computing gradients, instead of computing the matrices and performing explicit matrix-vector multiplication, we compute image gradients directly from the image by subtracting pairs of pixel intensities.

In general, a matrix-vector representation is better for a succinct statement of the problem and its derivation. For improved performance, however, our implementation performs matrix-vector computations implicitly: we perform the basic image operations that these matrix-vector terms represent.

Footnote 8: The computational complexity of a direct convolution ρ ⊗ f is O(nm), where n and m are the numbers of pixels in the image f and the kernel ρ respectively. For small kernels this is fast enough, but for large kernels point-wise multiplication of Fourier coefficients, i.e., ρ ⊗ f ≡ F⁻¹(F(ρ) · F(f)), is faster. The computational complexity of obtaining the Fourier transform F(f) using a fast Fourier transform algorithm is O(n log n).

3.4.3 The Rudin-Osher-Fatemi (ROF) model: utilizing edge sparsity of natural images

To demonstrate how we develop a solution to specific instances of the general inverse problem (3.36), we first discuss the ROF model due to Rudin et al.
[121], which is a model proposed for inverse problems, in particular image denoising. The ROF model follows directly from our general inverse problem (3.36) by setting the forward model M to the identity operation, i.e., M(f) ≡ f, and using the sparse gradient prior [80, 106] (also known as the total variation prior in the literature).

The ROF model is the following:

\[
\arg\min_f \underbrace{\|f - g\|^2}_{\text{data-fitting term}} + \underbrace{\lambda\,\|\nabla f\|_{TV}}_{\text{sparse gradient prior}}, \tag{3.37}
\]

where λ ∈ R+ is the regularization coefficient, and ‖·‖TV ≡ ∑ ‖·‖ is the total variation norm: a linear sum of point-wise ℓ2 norms. For example, the total variation norm of a gradient field ∇f is a point-wise sum of gradient magnitudes ‖∇f‖:

\[
\|\nabla f\|_{TV} \equiv \sum_i \|\nabla f_i\| = \sum_i \sqrt{(\nabla_x f_i)^2 + (\nabla_y f_i)^2}. \tag{3.38}
\]

While the original ROF model performs denoising only, it can easily be adapted to other problems: the data-fitting term in the ROF model can be modified to represent any forward model in the form of the general inverse problem (3.36). Furthermore, more image priors can be added to the ROF model (3.37) to make it more robust.

3.4.4 Deriving and applying image priors

Specific problems call for specific priors. While natural image priors, such as the sparse gradient prior, capture the underlying models of natural images, some other image priors result from the specific forward models or the imaging scenario we use. In Table 3.2 we provide a list of novel priors we have proposed in this thesis. While some of these priors are more general purpose, many of the image priors are specific to particular inverse problems: we study the properties of those problems and derive image priors that are specific to each problem.
The table also refers to relevant sections for further reading.

3.4.5 Computational optimization for solving the ROF model: iterative convex solvers for the mixture of ℓ1 and ℓ2 priors

The basic ROF model is simpler than, and therefore can be considered the basis of, all the image formation models we use in subsequent chapters. In this section, we discuss how to use computational optimization for solving the basic ROF model. The models that we propose in subsequent chapters incorporate more complex forward models and image priors, but the approach to formulating the optimization problem remains the same. The example in this section can therefore serve as a foundation for the derivations we perform in subsequent chapters.

Optimization problems with only ℓ2 terms often have a closed-form solution which is easy to compute. What makes the ROF model (3.37) hard to solve is that one of the two terms is ℓ1. The data-fitting term by itself is a simpler ℓ2 optimization problem, which has a closed-form solution. When an ℓ1 norm is present in the minimization objective, a direct, closed-form solution cannot be obtained via mathematical manipulation. We need computational linear algebra to solve this problem numerically.

Even with the presence of the ℓ1 term, the ROF model (3.37) is still a convex problem. This means that at least one solution is guaranteed to exist, and a numerical optimization algorithm is guaranteed to reach a solution.

All numerical solutions to such formulations are iterative, for example starting with some arbitrary guess and using gradient descent. Since the solution space is convex, we are guaranteed to reach the solution for the ROF model. How exactly the iterations progress through the space towards the solution varies. Depending on the strategy chosen, convergence can be fast or slow.
In the following sections, we shall first look at a direct iterative solution to the ROF model, and then we will reformulate the problem to separate the ℓ1 and ℓ2 priors for a more robust approach towards a solution.

3.4.6 Iterative reweighted least squares (IRLS) solution

The iterative reweighted least squares (IRLS) algorithm re-weights the non-ℓ2 term to pose a series of ℓ2 problems; as IRLS iterates over these ℓ2 problems it converges to the mixed-prior problem. As the "reweighted" term in the name implies, it first rewrites an ℓp term as an ℓ2 term divided by an ℓ2−p term: ‖·‖^p ≡ ‖·‖² / ‖·‖^(2−p). As the "iterative" term in the name implies, IRLS solves the problem iteratively. In every iteration, the values in the denominator remain fixed to the values obtained using the estimate from the previous iteration. The unknown variables then live only in the numerator. Since the numerator is an ℓ2 term, each individual iteration becomes an ℓ2 optimization problem which can be solved using any convex optimization technique. When the solution converges, the values of the unknowns stop changing appreciably from iteration to iteration. Algorithm 3.1 lists pseudocode for this method.

Algorithm 3.1 iterative reweighted least squares (IRLS)
Require: λ > 0
1: initialization: f^(0) ← some reasonable initial guess
2: repeat    ▷ "iterative"
3:     f^(k+1) = arg min_f ‖f − g‖² + (λ / |∇f^(k)|) ‖∇f‖²    ▷ "reweighted least squares"
4: until convergence

Even though IRLS looks promising in theory, IRLS does not necessarily converge to the best solution in practice.
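To make the structure of Algorithm 3.1 concrete, the following is a minimal sketch for a 1D signal, where the gradient is a forward difference and each reweighted ℓ2 problem is solved exactly as a linear system. The function name `irls_tv_denoise_1d`, the fixed iteration count, and the small constant `eps` that guards against division by zero are assumptions made for illustration; they are not part of Algorithm 3.1.

```python
import numpy as np

def irls_tv_denoise_1d(g, lam=1.0, iters=30, eps=1e-6):
    """IRLS sketch for min_f ||f - g||^2 + lam * |D f| / (weights from previous f).

    D is the forward-difference operator. In each iteration the weights
    1 / (|D f_k| + eps) are frozen, which leaves a quadratic problem whose
    minimizer solves (I + lam * D^T W D) f = g.
    """
    n = g.size
    D = np.diff(np.eye(n), axis=0)            # (n-1) x n, (D f)[i] = f[i+1] - f[i]
    f = g.astype(float).copy()                # initial guess: the observation
    for _ in range(iters):                    # "iterative"
        w = 1.0 / (np.abs(D @ f) + eps)       # "reweighted": fixed from previous f
        A = np.eye(n) + lam * D.T @ (w[:, None] * D)
        f = np.linalg.solve(A, g)             # "least squares" step
    return f

rng = np.random.default_rng(1)
clean = np.concatenate([np.zeros(25), np.ones(25)])   # a single step edge
noisy = clean + 0.1 * rng.standard_normal(clean.size)
denoised = irls_tv_denoise_1d(noisy, lam=1.0)

def total_variation(x):
    return np.sum(np.abs(np.diff(x)))
```

A fixed number of iterations stands in for the "until convergence" test; a practical implementation would instead monitor the change between successive estimates.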
Even though each iteration is now ℓ2 and therefore guaranteed to converge, the overall problem is not guaranteed to converge. Furthermore, the convergence can be very slow.

Instead of using IRLS, we can reformulate the problem so that the norms are defined over two tightly coupled spaces, namely a saddle point formulation, which can then be solved using a much more efficient primal-dual convex optimization algorithm [18] that we describe next.

3.4.7 Saddle point formulation and solution using primal-dual convex optimization

For the sake of simplicity in exposition, let us rewrite the ROF model (3.37). We note that our underlying space is the real Hilbert space H, with the inner product 〈·, ·〉 and the induced norm ‖·‖; i.e., for all f, g ∈ H,

\begin{align}
\langle f, g \rangle &= \sum_i f_i\, g_i \tag{3.39}\\
\|f\| &= \sqrt{\sum_i f_i^2} \tag{3.40}
\end{align}

We also denote by Γ0 the set of all functions F : H → R ∪ {∞} that are closed, proper and convex. All the functions below belong to this set.

We denote the data-fitting term by G(f), where G ∈ Γ0.

For convex terms that are not ℓ2, e.g., the ℓ1 prior, the rewriting is done in two steps. Let K : Ω → Ω∗ denote a linear operator, or a space transform, that maps from the "primal" or image space Ω to a dual space we denote by Ω∗. For example, in the ROF model the prior term is the total variation (TV) norm term, and the corresponding dual domain is the image gradient space. Next we let F : Ω∗ → R ∪ {∞}, F ∈ Γ0, denote the norm. We get

\[
\arg\min_f\ G(f) + F(K(f)). \tag{3.41}
\]

Let F∗ denote the Fenchel conjugate, or the "dual", of a function F ∈ Γ0,

\[
F^*(w) = \sup_f\ \langle w, f \rangle - F(f). \tag{3.42}
\]

For convex functions, it can be shown that

\[
F(f) = F^{**}(f) = \sup_w\ \langle w, f \rangle - F^*(w). \tag{3.43}
\]

Applying this to the expression above, we get the saddle point problem:

\[
\arg\min_f \max_w\ \langle K f, w \rangle + G(f) - F^*(w). \tag{3.44}
\]

This saddle point formulation effectively splits the unknown into two spaces related by the linear operator K.

From this definition, one can apply the primal-dual convex optimization algorithm due to Chambolle and Pock [18], which we list as Algorithm 3.2.
This algorithm uses proximity operators denoted by prox. We describe the proximity operators and derive a few of the common results in the following section.

Algorithm 3.2 primal-dual convex optimization algorithm
Require: θ ∈ (0, 1], σ > 0, τ > 0, στL² < 1 where L = ‖K‖
1: w^(0) ← 0    ▷ any initialization would do
2: f^(0) ← 0    ▷ any initialization would do
3: f̄^(0) ← f^(0)
4: repeat
5:     w^(t+1) ← prox_{σF∗}(w^(t) + σ K f̄^(t))    ▷ proximity operators are discussed in the following section
6:     f^(t+1) ← prox_{τG}(f^(t) − τ Kᵀ w^(t+1))
7:     f̄^(t+1) ← f^(t+1) + θ (f^(t+1) − f^(t))
8: until convergence

3.4.7.1 Proximity operators

The proximity operator of a function F, denoted by prox_{τF}, essentially performs an implicit gradient descent step of length τ > 0 for the function F,

\begin{align}
\mathrm{prox}_{\tau F}(g) &= (I + \tau\,\partial F)^{-1}(g) \tag{3.45}\\
&= \arg\min_f\ \frac{1}{2\tau}\,\|g - f\|^2 + F(f), \tag{3.46}
\end{align}

where ∂F denotes the subgradients of F. Subgradients are generalizations of gradients: the subgradient of a function at some point is the set of all planes through that point that are less than or equal to the curve at all points. The gradient of a function is well defined only where the function is both continuous and smooth, whereas subgradients are well defined wherever the function is continuous, even if it is not smooth and differentiable. This lets us perform a generalized gradient descent on non-smooth convex surfaces such as the ℓ1 norm.

The primal-dual convex optimization algorithm performs a series of proximity mappings, i.e., implicit gradient descent steps, until convergence. Therefore, for the primal-dual convex optimization algorithm to be effective, the proximity operators should be fast to compute. We discuss a few proximity operators below.

We start by deriving a simple proximity operator for a denoising data-fitting term denoted by G_DEN, which is an ℓ2 norm over a simple difference between the parameter f and some reference g,

\[
G_{DEN}(f) = \frac{1}{2}\,\|g - f\|^2 .
\]
(3.47)

By definition, the proximity operator of this function is

\begin{align}
\mathrm{prox}_{\tau G_{DEN}}(f) &= \arg\min_w\ \frac{1}{2\tau}\,\|f - w\|^2 + G_{DEN}(w) \tag{3.48}\\
&= \arg\min_w\ \frac{1}{2\tau}\,\|f - w\|^2 + \frac{1}{2}\,\|g - w\|^2 \tag{3.49}\\
&= \frac{f + \tau g}{1 + \tau}, \tag{3.50}
\end{align}

which is a simple linear interpolation. When convolution is present, the proximity mapping becomes a little harder to solve. Let us assume the forward model includes a small convolution with some kernel ρ, and therefore the corresponding data-fitting term, denoted by G⊗, is

\[
G_{\otimes}(f) = \frac{1}{2}\,\|g - \rho \otimes f\|^2 . \tag{3.51}
\]

By definition, the proximity operator of this function is

\begin{align}
\mathrm{prox}_{\tau G_{\otimes}}(f) &= \arg\min_w\ \frac{1}{2\tau}\,\|f - w\|^2 + G_{\otimes}(w) \tag{3.52}\\
&= \arg\min_w\ \frac{1}{2\tau}\,\|f - w\|^2 + \frac{1}{2}\,\|g - \rho \otimes w\|^2 \tag{3.53}\\
&= \frac{f + \tau\,\mathcal{F}^{-1}\big(\mathcal{F}(g) / \mathcal{F}(\rho)\big)}{1 + \tau}, \tag{3.54}
\end{align}

which is a linear interpolation with a deconvolved image. This closed form assumes that the Fourier transform of the kernel has no zeros in it. More commonly, however, F(ρ) will not be invertible, in which case the closed form (3.54) is not valid; we solve the minimization problem (3.53) instead.

prox_{σF∗TV}, the proximity operator of the TV term, is point-wise shrinkage [18],

\[
\mathrm{prox}_{\sigma F^*_{TV}}(y_0) = \frac{y_0}{\max(1, |y_0|)} . \tag{3.55}
\]

3.5 Summary

In this chapter, we have laid out the mathematical background of a general framework for solving the single-image HDRI problem we address in this thesis. This chapter sets the stage for the discussion of our contributions presented in later chapters. While the mathematical derivations presented in this chapter are very general, these ideas will be useful for the mathematical derivations in later chapters. Our discussions in those chapters will tie back to relevant sections in this chapter for greater clarity of exposition.

Chapter 4

Filtering at the camera aperture

In this chapter, we propose a method for scenes where the highlights occupy a small number of pixels in the image, such as a night shot with a few bright objects.
When the areas with highlights are small, a standard dynamic range (SDR) camera can capture most of the pixels faithfully, but fails to capture the small number of pixels with bright highlights because of clipping due to sensor saturation.

Natural scenes often contain certain bright features such as light sources and specular reflections. These features are much brighter than the rest of the scene (often on the order of 1000 : 1 brighter), but they occupy very few pixels (2–5%) of the image. We argue that it is wasteful for the entire sensor to have a high dynamic range (HDR) capability just for capturing this small fraction (2–5%) of pixels. Instead, we propose to use an added resource (a photographic or physical filter) in front of the aperture of the SDR camera for capturing the HDR image.

We take a novel view of single-image high dynamic range imaging (HDRI), which is based on a computational photography approach. We propose to optically spread some of the highlight information so that this information is added to the SDR image captured by the camera. The captured SDR image then contains some of the information about the highlights. This spreading is achieved using a type of optical physical filter known as a cross-screen filter or a star filter. Then we computationally invert this forward optical process (i.e., the effects of the optical filter) to reconstruct the latent HDR image.

[Figure 4.1 panels: Ground truth, Input, Reconstruction]
(a) The picture and the inset show the ground truth of a HDR image. The inset shows a highlight zoomed in and dimmed down to bring out the detail (a "short virtual exposure", see Section 3.2).
(b) The SDR image captured by adding a cross-screen filter at the aperture of the camera. The inset shows the highlight that is clipped due to sensor saturation (marked with green).
(c) The picture and the inset show the reconstructed HDR image.
The inset shows that the reconstructed lamp matches the ground truth quite well.

Figure 4.1: Capturing a HDR image with a cross-screen filter.

4.1 Optical encoding of pixel intensities

Camera sensors capture a certain maximum number of photons before they saturate and no longer register additional light, resulting in a clipped value for a bright light (highlight). It is possible to increase the saturation point by increasing the capacity of the sensor electron well; however, this is excessively expensive and reduces sensor resolution.

The human visual system (HVS) has developed a clever mechanism to cope with highly saturated scene regions, such as light sources. Like camera sensors, the photo-receptors in the retina of the human eye are also prone to saturation. However, the HVS is able to infer the true brightness of those saturated regions using the light that is scattered in the ocular fluid and spread over the retina. The glare surrounding bright areas boosts their perceived brightness, giving additional information to the brain that this part of a scene is much brighter than the photo-receptor saturation point [149].

In this chapter we propose to use a similar approach to improve camera dynamic range without resorting to custom sensors, multi-sensor cameras, or time-sequential imaging. Unlike the eye, we are not limited to specific optics. Instead, we choose to modify the optical system in order to increase the information received by a SDR camera. Specifically, we propose a computational photography approach comprised of the following steps:

a) "Encoding" via optical spreading of information. A cross-screen filter spreads some of the light from a bright object in a star-shaped formation. In other words, the information is encoded into these specially shaped light streaks that are optically added to the image. The SDR camera captures an SDR image that contains these added light streaks, but the highlights that create these light streaks are clipped themselves.
The captured SDR image only contains the light streaks created by the spreading of the bright objects (highlights), while the bright objects themselves are clipped.

b) "Decoding" via image analysis and processing. In software, we separate the star-shaped light streaks from the captured SDR image. The separated light streak pattern contains "encoded" information about the clipped highlights. We then use these light streaks to reconstruct the details of the clipped highlights in the captured SDR image. The streaks-removed SDR image and the reconstructed highlights together form the reconstructed HDR image.

[Figure 4.2 panels: (a) An 8-point cross-screen filter; (b) PSF of (a), with the local axes u_i and v_i marked]

Figure 4.2: (a) An 8-point cross-screen filter and its point-spread function (PSF). This is a mostly transparent, physical photographic filter with carefully etched marks on its surface. The etched scratches are responsible for the light streaks. As shown in the zoomed inset, scratch marks are etched along 4 directions. The 4 directions create the 8-point star-shaped light streaks when the filter is mounted in front of the lens. (b) The point-spread function (PSF), i.e., the resulting convolution operator. The image shows the 8-point star-shaped set of light streaks created around the point light source. The light spreading is much weaker than the bright center, which is the point source. The u-v orthogonal vector pair defines the local coordinate system of a light streak. We use these coordinate systems in deriving the reconstruction algorithm (see the discussion on (4.2) and in Section 4.3.2).

We have experimented with a number of techniques that can capture some of the information in highlights (bright objects). Some obvious candidates were regular lens flare and defocus blur to spread out energy from saturated image regions to other pixels. However, to encode enough information about the highlights, the energy spread must be significantly larger than the amount spread by standard lens flare.
Likewise, an approach using defocus blur would have to use a strong defocus with a very large blur radius, on the order of dozens of pixels. For such a large blur, we have found that even the most recent deconvolution algorithms in combination with coded apertures are unable to reconstruct high quality images [137].

In this chapter, we therefore focus on the one optical encoding of clipped highlights (bright objects) that we found most successful: spreading light in a star-shaped light streak pattern, where light is spread in a fixed set of discrete directions. Such patterns are produced by inexpensive physical photographic cross-screen filters (also known as "star filters"), which are mounted in front of a camera lens. The scattering pattern of these filters is most salient for bright scene features, since the cross-screen filters concentrate most of the energy in a Dirac peak rather than in the light streaks. In other words, the light streaks are only visible around bright highlights.

Star filters spread the light along light streak lines in a fixed number of discrete directions. The combined effect appears to have a two-dimensional shape (a p-point star shape), although each of the constituent components (i.e., the light streaks) remains one-dimensional. In this thesis, we use 8-point and 16-point cross-screen filters. (Figure 4.2 shows an 8-point cross-screen filter and its point spread function with 8 streaks. Figure 4.3 shows a plot of one constituent one-dimensional streak line.) Since the total effect is a combination of multiple streak lines, each being a one-dimensional spread, we employ one-dimensional processing to restore the image. The one-dimensional (1D) processing is as follows:

1. We first estimate the amount of light spread from bright image features into each of the discrete directions (the light streaks).

2.
From these light streaks, we obtain projection views of the clipped highlights. Using these projection views, we then reconstruct the clipped pixels using a tomographic reconstruction technique.

4.2 Image formation model

In the following, we outline the image formation process for a camera with a cross-screen filter.

[Figure 4.3 plots: measured 1D PSF slices for 2-point, 6-point, 8-point and 16-point filters; horizontal axis: pixels (−1000 to 1000); vertical axis: intensity (log scale, 10^−6 to 10^−2)]

Figure 4.3: A cross-screen filter creates light streaks that fall off exponentially. Measured PSFs for the 1D slices along light streaks for various cross-screen filters, taken from images like Figure 4.2(b). All of the filters that we have measured exhibit an exponential falloff (plus some small noise). The Dirac peak in the center is about three orders of magnitude stronger than the falloff, which is why the light streak is so much dimmer than the central point in Figure 4.2(b). Exponential approximations for the falloffs are shown as dashed lines. Each filter has its own exponential falloff rate, which we measure from point source HDR images such as Figure 4.2(b). The vertical axes of all these plots have a log scale to help visualize the exponential falloff.

A cross-screen filter is a (mostly) transparent, physical photographic filter with parallel scratch marks or grooves on its surface (Figure 4.2(a)). When mounted in front of a camera lens, the grooves disperse and diffract the light, creating a star-shaped pattern: linear light streaks in a number of discrete directions. These light streaks are very faint, and the star-shaped light streak patterns are usually noticeable only around very bright highlights.

A captured SDR image g can be expressed as the result of first applying a spatially varying PSF H to the latent HDR image f, and then clipping the result to the maximum sensor value allowed (see footnote 9),

\[
g = \min\,(1,\ H \otimes f + \eta) .
\]
(4.1)

Footnote 9: Please note that all input images in this thesis are normalized to 1, and as a result, the maximum sensor value, i.e., the clipping level, is also 1.

[Figure 4.4 panels: PSF H (log scale) = delta function δ with peak α (log scale) + exponential falloff K with peak β (log scale) + residual component ρ (linear scale)]

Figure 4.4: Analysis of the PSF of a cross-screen filter. On the left, a 2-point cross-screen filter kernel is shown on a log intensity vs. linear space plot. The log intensity scale is used to emphasize the exponential falloff (on a log plot an exponential function becomes linear). The kernel can be approximated as a sum of a Dirac delta function (which represents most of the energy of the filter, α), an exponential falloff that models the light streaks (which represents a very small share of the incident light energy, with maximum β), and the residual component (which accounts for a shift-variant, wavelength-dependent response, shown here on a linear-linear scale). The three components are processed separately.

Here, η represents noise. H can be modeled as a combination of the following components (Figure 4.4):

1. δ: a shift-invariant Dirac peak representing the light that does not hit one of the scratches on the cross-screen filter,
2. K: a light streak function which has been empirically found to be both shift- and depth-invariant (see footnote 10), and
3. ρ: a zero-mean residual waviness in the light streak that is not shift-invariant, but is several orders of magnitude weaker in intensity.

Thus,

\[
H = \underbrace{\underbrace{\alpha\,\delta}_{\text{Dirac delta}} + \underbrace{\beta\,K}_{\text{exponential falloff}}}_{\text{shift-invariant}} + \underbrace{\gamma\,\rho}_{\text{residual component}} . \tag{4.2}
\]

Footnote 10: The light streaks are, however, created by focusing the light streak pattern through the camera lens, and hence are subject to radial lens distortion.
In our discussion, we assume that radial distortion has been removed in a preprocessing step by calibrating the camera and lens used.

The light streak pattern function K is itself composed of 1D streaks (Figure 4.4). For a p-point filter, there are p light streaks along p/2 light streak lines,

\[
K = \sum_{i=1}^{p/2} k_i(u_i) \tag{4.3}
\]

with an exponential falloff k_i : R → R, k_i(d) = exp[−m |d|]. Here, u_i and v_i form an orthogonal coordinate system aligned along the i-th light streak direction (see Figure 4.2(b)). The parameters α, β, γ and m can be measured for each cross-screen filter by capturing an (almost) point light source and measuring these statistics. In our experiments, we have observed that these quantities are independent of focal depth and position. For the 8-point cross-screen filter we used, we found that

\[
\alpha \approx 1, \quad \beta \approx 10^{-4} \quad \text{and} \quad \gamma \approx 10^{-7}. \tag{4.4}
\]

The scene-dependent residual function ρ (which empirically has the shape of a wave) is primarily a function of the (unknown) spectral composition of the spread light. This function is shift-variant; like K, it also distributes energy only along the radial light streak lines.

Figure 4.3 shows a log-plot of light streaks for the cross-screen filters we experimented with. These measurements show that an exponential falloff model fits the overall shape of the light streak quite well. In our application, this exponential model is sufficient for light streak estimation with sufficient precision for saturated pixel reconstruction. The high-frequency variations captured in the residual function ρ are, however, important for removing light streaks from the SDR portion of the image.
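The shift-invariant part of this model can be simulated in a few lines. The following is a minimal sketch under stated assumptions: the function names `streak_kernel` and `capture`, the falloff rate `m = 0.05`, the kernel radius, and the rasterization of each streak line by rounding to the nearest pixel are all hypothetical choices for illustration. The residual term γρ and the noise η are omitted, and the convolution is circular (FFT-based), so the image must be at least as large as the kernel.

```python
import numpy as np

def streak_kernel(p, radius, m):
    """Rasterize K = sum_i k_i(u_i) with k_i(d) = exp(-m |d|), as in (4.3).

    The p/2 streak lines are assumed to be evenly spaced in angle.
    """
    size = 2 * radius + 1
    K = np.zeros((size, size))
    for i in range(p // 2):
        theta = np.pi * i / (p // 2)          # direction of the i-th streak line
        line = np.zeros_like(K)
        for d in range(-radius, radius + 1):
            x = int(round(d * np.cos(theta)))
            y = int(round(d * np.sin(theta)))
            # Use the pixel's own distance so repeated hits are idempotent.
            line[y + radius, x + radius] = np.exp(-m * np.hypot(x, y))
        K += line
    return K

def capture(f, K, alpha=1.0, beta=1e-4):
    """g = min(1, alpha * f + beta * K (*) f), with circular convolution."""
    r = K.shape[0] // 2
    Kpad = np.zeros(f.shape)
    Kpad[:K.shape[0], :K.shape[1]] = K
    Kpad = np.roll(Kpad, (-r, -r), axis=(0, 1))   # move kernel center to (0, 0)
    streaks = np.real(np.fft.ifft2(np.fft.fft2(Kpad) * np.fft.fft2(f)))
    return np.minimum(1.0, alpha * f + beta * streaks)

# A dark scene with one highlight 1000 times brighter than the clipping level.
f = np.zeros((64, 64))
f[32, 32] = 1000.0
K = streak_kernel(p=8, radius=20, m=0.05)
g = capture(f, K)
```

In the simulated capture, the highlight itself is clipped to 1, while pixels along the four streak lines carry a faint, exponentially decaying copy of its true intensity; this is the information that the reconstruction later exploits.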
The overall image formation model is then given as

\[
g = \min\big(1,\ (\alpha\,\delta + \beta\,K) \otimes f + \gamma\,r + \eta\big), \tag{4.5}
\]

where g is the captured SDR image, the min operation models clipping due to sensor saturation (the image is normalized so that the clipping level is 1), α δ is the Dirac delta component, β K is the exponential falloff, ⊗ denotes convolution, f is the unknown HDR image, γ r is the wave-like residual component, and η is the noise. The residual component r,

\[
r = \rho \otimes f, \tag{4.6}
\]

is the result of a "convolution" of the latent image f with the shift-variant residual waviness pattern.

In summary, our image formation model consists of a Dirac part and a combination of p/2 1D functions describing both an exponential falloff and a residual waviness. In the following, we therefore consider the light streak removal problem as a set of three independent 1D problems.

4.3 HDR image reconstruction

We now describe our proposed method for reconstructing both the SDR image and the clipped highlight details from a photograph taken with a cross-screen filter.

Considering the convolution with the spatially varying PSF (4.5), we cannot directly solve for the latent image, due to sensor saturation. Instead, we split the problem by separately processing the saturated pixels and the unsaturated pixels in the observed image g.

Let the subscript ·U denote the coordinates of all unsaturated pixels in the captured image, and ·S denote the coordinates of all saturated pixels. With these definitions, we rewrite the unsaturated component of (4.5) as follows:

\[
g_U = \alpha\,(f_U + f_S) + \beta\,K \otimes (f_U + f_S) + \gamma\,r \tag{4.7}
\]

and since fS = 0 at unsaturated pixel locations, we rewrite and group the light streak terms by type,

\[
g_U = \alpha\,f_U + \underbrace{\beta\,K \otimes f_U}_{\text{spreading unsaturated pixels}} + \underbrace{\beta\,K \otimes f_S + \gamma\,r}_{\text{spreading saturated pixels}} . \tag{4.8}
\]

As a result, we can now obtain the latent image by estimating and removing several kinds of light streaks:

• (β K ⊗ fU): light spreading from unsaturated pixels that affects other unsaturated pixels. This type of light streak is fairly weak and does not contain high spatial frequencies (we present this next in Section 4.3.1).
(The residual component due to unsaturated pixels is negligible and is not shown in (4.8).)

• (β K ⊗ fS + γ r): light spreading from saturated pixels that affects unsaturated pixels. It can be estimated and removed through the use of image priors (we present this in Section 4.3.2). The estimated light streaks also provide information about the saturated regions from which the light streaks have emerged, and can therefore be used to reconstruct spatial detail within those saturated regions (we present this in Section 4.3.3).

As mentioned earlier, to perform a two-dimensional (2D) deconvolution, we decompose it into a series of 1D problems (to make the solution possible and robust), followed by a tomographic reconstruction.

4.3.1 Removing light streaks due to unsaturated pixels

Let g′ denote the image that contains the light streaks from the saturated pixels in highlights (i.e., fS) only, but does not contain light streaks from the unsaturated pixels. In this section we ask: can we obtain a mathematical expression for g′?

When light is spread from a point, it loses some intensity. To be exact, the pixel value is reduced by a factor of α compared to the intensity it would have without the spreading. This factor α is evident from the coefficient of the Dirac component of the cross-screen filter PSF; the Dirac component expresses how much light does not spread. This coefficient is α, and that is the portion of light that remains in the original pixel after spreading. Therefore, to obtain the original pixel value after removing light streaks, we also need to scale the value by 1/α.

From the discussion above, we now write the mathematical expression for g′ (the image with only saturation-generated streaks). The correction for spreading, 1/α, is applied to gU (which is fU plus all streaks, from both fU and fS) minus the streaks from fU only:

\[
g' \equiv \frac{1}{\alpha}\,\big(g_U - \beta\,K \otimes f_U\big) \tag{4.9}
\]

or,

\[
\alpha\,g' = \underbrace{g_U}_{\text{captured}} - \underbrace{\beta\,K \otimes f_U}_{\text{streaks from } f_U} .
\]
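(4.10)

If we rewrite (4.8), we see that the above expression for g′ appears on the left hand side of the equation,

\begin{align}
\underbrace{g_U}_{\text{captured}} - \underbrace{\beta\,K \otimes f_U}_{\text{streaks from } f_U} &= \alpha\,f_U + \beta\,K \otimes f_S + \gamma\,r \tag{4.11}\\
\text{i.e.,}\quad \alpha\,g' &\equiv \alpha\,f_U + \beta\,K \otimes f_S + \gamma\,r. \tag{4.12}
\end{align}

Again,

\begin{align}
\beta\,K \otimes g' &= \beta\,K \otimes \frac{1}{\alpha}\,(\alpha\,g') \tag{4.13}\\
&= \beta\,K \otimes \frac{1}{\alpha}\,\big(\alpha\,f_U + \beta\,K \otimes f_S + \gamma\,r\big) \tag{4.14}\\
&= \beta\,K \otimes \Big(f_U + \frac{\beta}{\alpha}\,K \otimes f_S + \frac{\gamma}{\alpha}\,r\Big). \tag{4.15}
\end{align}

However, empirically (e.g., (4.4)) we know α ≫ β and α ≫ γ, and therefore we can ignore the rightmost two terms,

\[
\implies \beta\,K \otimes g' \approx \beta\,K \otimes f_U. \tag{4.16}
\]

Putting (4.16) into the definition of g′ (4.10), we obtain

\[
\alpha\,g'_U \approx g_U - \beta\,K \otimes g'_U. \tag{4.17}
\]

By solving for g′U, we remove the light streaks due to unsaturated pixels,

\[
g' = \sum_{t=0}^{\infty} \Big(-\frac{\beta}{\alpha}\,K \otimes\Big)^{t} g. \tag{4.18}
\]

In practice, this infinite sum converges fast since α ≫ β ⟹ β/α ≪ 1.

4.3.2 Separating the light streaks

The next step is to estimate and remove the light streaks due to saturated pixels. This light streak component will also be used for reconstructing saturated pixel values in Section 4.3.3. Following the general formulation presented in Section 3.4.3, we obtain the latent image f by

\[
\arg\min_f \underbrace{\Big\| f - g' - \underbrace{(\beta\,K \otimes f_S + \gamma\,r)}_{\text{saturated} \to \text{unsaturated}} \Big\|^2}_{\text{data-fitting term}} + \underbrace{\lambda_{TV}\,\|\nabla f\|_{TV} + R}_{\text{image priors}}, \tag{4.19}
\]

[Figure 4.5 plot: cross-section of a light streak between two saturated regions, shown as the sum of two streak components]

Figure 4.5: Analysis of the effect of a cross-screen filter. Cross-section of the light streak due to saturated pixels, forming between a pair of saturated regions L and R. The cross-section is extracted from a single line of pixels along the light streak direction (u-axis in the inset).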
The light streak can be split into two components with exponential slopes in opposite directions.

In (4.19), $R$ gives constraints on the residual component $r$. In Section 4.3.2.1 we derive $R$, and in Section 4.3.2.2 we discuss how to solve (4.19).

4.3.2.1 Gradient properties of the residual light streak, r

In this section, we derive statistical properties of the residual $r$ introduced in (4.6). First, we present the empirically obtained properties of the derivatives of $\hat{\rho}$, and then we relate them to the derivatives of the residual light streaks, $r$.

[Figure 4.6 plot: histogram of $\partial r / \partial v$; x-axis: gradient value, $\partial r / \partial v \in \{-1, 1\}$; y-axis: frequency.]

Figure 4.6: Motivation for modeling the image gradient distribution as a Laplace distribution. The histogram of $\partial r / \partial v$ shows a highly peaked structure, which can be closely approximated with a Laplace distribution. (The central bin has a frequency of 2727, but the entire bar is not shown.)

The wave-like residual component $r$ accounts for wavelength-dependent variation in the PSF. There is no need to model its spectral characteristics exactly in our application, since we only need to estimate the resulting intensity distortions. Removing this component is important for removing light streaks completely, although it is not essential for estimating the saturated pixels.

The residual component is particularly visible in color images. For color single-image HDRI using our method, removing this residual component is crucial.

This residual component is caused by the third component of the cross-screen filter PSF in (4.2): $\rho$ has the same star shape as the shift-invariant part, although the exact energy distribution within the star is spatially varying. Like the spreading component $K$, we express $\rho$ due to a $p$-point cross-screen filter as a combination of $p/2$ 1D components:

\[
\rho = \sum_{i=1}^{p/2} \rho_i
\tag{4.20}
\]

As noted before, $u_i$ and $v_i$ form an orthogonal coordinate system: $u_i$ is along the spreading direction, and $v_i$ is across.
Likewise, the residual light streak $r$ has $p/2$ 1D components,

\[
r = \sum_{i=1}^{p/2} r_i,
\tag{4.21}
\]

where each of these 1D components is a 1D convolution with the bright highlight $f_S$ that caused the spread,

\[
r_i = f_S \otimes \rho_i.
\tag{4.22}
\]

In order to generate the constraints, we make the following empirical observations:

• $r$ is small and has a zero-mean normal distribution. The constraint corresponding to this observation is to minimize $\| r_i \|$.

• The gradient of $r_i$ across the direction of the light streak depends on the highlight $f_S$,

\begin{align}
\frac{\partial}{\partial v_i} r &= \frac{\partial}{\partial v_i} \left( f_S \otimes \hat{\rho}_i \right) \tag{4.23} \\
&\approx \left( \frac{\partial}{\partial v_i} f_S \right) \otimes \hat{\rho}_i. \tag{4.24}
\end{align}

The convolution in this case is along $u_i$, and the derivative is in the orthogonal direction. Therefore, the resulting derivative distribution reflects the sparse gradient prior of natural images. In other words, we assume that these derivatives follow a Laplace distribution, and the corresponding constraint term is $\left| \frac{\partial}{\partial v_i} r \right|$.

• We empirically found the gradients of $r_i$ along the direction of the light streak to be piecewise smooth; we therefore apply the sparsity constraint $\left| \frac{\partial}{\partial u_i} r \right|$.

We combine the statistical properties of the derivatives of the residual light streak $r$ derived above to build the constraint term $R$:

\[
R = \sum_{i=1}^{p/2} \left( \lambda_1 \| r_i \|^2 + \lambda_2 \left| \frac{\partial}{\partial v_i} r_i \right| + \lambda_3 \left| \frac{\partial}{\partial u_i} r_i \right| \right),
\tag{4.25}
\]

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are proportionality constants.

4.3.2.2 Solution via computational optimization

In this section we solve (4.19).

Following the general computational optimization formulation presented in Section 3.4.3, we obtain the latent image $f$ by solving a generalized Rudin-Osher-Fatemi (ROF) model,

\[
\arg\min_{f}\;
\underbrace{\Big\| f - g' - \underbrace{(\beta K \otimes f_S + \gamma r)}_{\text{saturated} \to \text{unsaturated}} \Big\|^2}_{\text{data-fitting term}}
+ \underbrace{\lambda_{TV} \| \nabla f \|_{TV} + R}_{\text{image priors}}.
\tag{4.26}
\]

We can factor this step into a number of 1D problems along directions $u_i$, where $u_i, v_i$ form a coordinate frame aligned with the $i$th light streak (see Figure 4.5).
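
As a concrete reading of the constraint term (4.25) for a single streak direction, the following minimal sketch evaluates the three penalties on a sampled residual. It is an illustration under assumptions rather than the implementation used in this work: the function name `residual_penalty` is hypothetical, the residual is assumed to be stored as a 2D array `r[v, u]` with `u` along the streak and `v` across it, and the derivatives are approximated by forward differences.

```python
import numpy as np

def residual_penalty(r, lam1, lam2, lam3):
    """Penalty of (4.25) for one streak direction (hypothetical helper).

    r is indexed as r[v, u]: u runs along the streak, v across it.
    Derivatives are approximated by forward differences (an assumption).
    """
    magnitude = np.sum(r ** 2)                    # ||r_i||^2
    across = np.sum(np.abs(np.diff(r, axis=0)))   # |d r_i / d v_i|
    along = np.sum(np.abs(np.diff(r, axis=1)))    # |d r_i / d u_i|
    return lam1 * magnitude + lam2 * across + lam3 * along
```

The full term $R$ would sum this quantity over the $p/2$ directions, with each $r_i$ resampled into its own $(u_i, v_i)$ frame.
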
In the following discussion, we consider each light streak direction separately, and thus omit the directional identifier $i$ for notational convenience.

This lets us simplify the optimization problem. To apply the image prior in the light streak estimation, we consider a single continuous segment $\mathcal{M} \equiv \{u_L, \ldots, u_R\}$ of unsaturated pixels along a light streak direction $u$, with some row index $v$. $\mathcal{M}$ is bounded by two sets of saturated pixels, $\mathcal{L}$ on the left and $\mathcal{R}$ on the right, as shown in the inset of Figure 4.5. We now write out the exponential form of the 1D light streaks (from (4.3)), and expand the convolution operator in (4.26):

\begin{align}
(k \otimes f_{Sv})(u, v) &= \underbrace{\ell_L \exp\left[ -m \left| u - u_L \right| \right]}_{\text{left light streak component}} + \underbrace{\ell_R \exp\left[ -m \left| u - u_R \right| \right]}_{\text{right light streak component}} \quad \text{for } u \in \mathcal{M}, \tag{4.27} \\
&\equiv \ell_L k_L + \ell_R k_R, \tag{4.28}
\end{align}

where $k_L$ and $k_R$ are the shifted exponential falloff patterns centered at the left and the right of the segment $\mathcal{M}$, and $\ell_L$ and $\ell_R$ represent the amount of energy present in the light streak from the saturated pixels to the left and to the right of $\mathcal{M}$,

\[
\ell_L = \sum_{t \in \mathcal{L}} \exp\left[ -m \left| t - u_L \right| \right] f(t, v), \quad \text{and} \quad
\ell_R = \sum_{t \in \mathcal{R}} \exp\left[ -m \left| t - u_R \right| \right] f(t, v).
\tag{4.29}
\]

These quantities, which we refer to as line integrals, will be useful for reconstructing detail in the saturated regions in Section 4.3.3. Note that $\ell_L$ and $\ell_R$ have the same value for all unsaturated pixels $u \in \mathcal{M}$, and therefore we use all pixels in $\mathcal{M}$ to robustly estimate these two quantities.

Now we reformulate the light streak estimator in terms of the line integrals $\ell_L$ and $\ell_R$ rather than saturated pixel values. We rewrite our ROF-like model (4.26) using the line integrals (4.27), (4.29) and the residual light streak statistical properties (4.25), which leads to the final form of the optimization problem.
For the sake of clarity, we omit the location parameter $u$ below.

\begin{align}
\arg\min_{\ell_L,\, \ell_R,\, r;\; u \in \mathcal{M}} \quad
& \underbrace{\Big\| f - g' - \underbrace{(\beta \ell_L k_L + \beta \ell_R k_R + \gamma r)}_{\text{saturated} \to \text{unsaturated}} \Big\|^2}_{\text{data-fitting term}} \tag{4.30} \\
& + \underbrace{\underbrace{\lambda_{TV} \| \nabla f \|_{TV}}_{\text{sparse gradient}} + \underbrace{\lambda_1 \| r \|^2 + \lambda_2 \left| \frac{\partial}{\partial v} r \right| + \lambda_3 \left| \frac{\partial}{\partial u} r \right|}_{\text{residual light streak properties (4.25)}}}_{\text{image priors}}. \tag{4.31}
\end{align}

[Figure 4.7 panels: (a) an SDR image captured with the cross-screen filter; (b) only the exponential falloff component $K$ is removed, but the residual component remains; (c) all light streaks removed.]

Figure 4.7: Chromatic issues in the light streaks created by the cross-screen filter. The light streaks resulting from applying a cross-screen filter are not monochromatic; we observe color fringes due to diffraction and dispersion. (a) Although the color artifacts seem to be very faint in captured images, (b) they are strongly enhanced, since removing the achromatic exponential light streak boosts chromatic contrast. (c) The estimation of wavelength-dependent variations removes most of the color artifacts.

The objective (4.30)-(4.31) allows us to efficiently optimize on each segment $\mathcal{M}$ independently. However, to solve for $\ell_L$ and $\ell_R$, the partial derivatives $\frac{\partial \ell_L}{\partial v}$ and $\frac{\partial \ell_R}{\partial v}$ must be found for all segments and then integrated. To solve the minimization problem efficiently, we use several EM iterations. We initially set $\gamma \frac{\partial r}{\partial v} = 0$. Since $\gamma \ll \beta$, this provides a reasonable initial estimate of the exponential light streak component, but enhances color artifacts when this monochromatic light streak is removed. In the E-step, we solve for $\ell_L$ and $\ell_R$, and in the M-step we refine the estimate of $r$. Minimizing (4.30) is sufficient to remove most of the light streaks (Figure 4.7).

Finally, we prepare line integral estimates for the energy contributed by individual, continuous regions of saturated pixels, which will be used in the next section.
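
To illustrate why all pixels of a segment can be pooled to estimate the two line integrals, the sketch below fits $\ell_L$ and $\ell_R$ to a 1D streak signal by ordinary least squares. This is a simplified stand-in for the E-step, not the estimator used in this work: it assumes the streak signal on the segment is already isolated and equals $\beta (\ell_L k_L + \ell_R k_R)$, and it ignores the image priors and the residual $r$. The function names and the parameter values in the usage note are hypothetical.

```python
import numpy as np

def falloff_kernels(n, m):
    """Exponential falloff patterns k_L and k_R on a segment of n pixels."""
    u = np.arange(n)
    k_left = np.exp(-m * np.abs(u - u[0]))    # decays away from u_L
    k_right = np.exp(-m * np.abs(u - u[-1]))  # decays away from u_R
    return k_left, k_right

def fit_line_integrals(streak, m, beta):
    """Least-squares estimate of (l_L, l_R) on one segment.

    Assumes streak ~= beta * (l_L * k_L + l_R * k_R); the priors and the
    residual r of the full model are deliberately left out of this sketch.
    """
    k_left, k_right = falloff_kernels(len(streak), m)
    design = beta * np.column_stack([k_left, k_right])
    coeffs, *_ = np.linalg.lstsq(design, streak, rcond=None)
    return coeffs[0], coeffs[1]
```

For example, a streak synthesized with `falloff_kernels(20, 0.1)`, `beta = 0.01`, `l_L = 3` and `l_R = 5` is recovered by `fit_line_integrals` up to numerical precision, because the two falloff patterns are linearly independent whenever `m > 0` and the segment has at least two pixels.
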
Each value $\ell_L$ and $\ell_R$ could contain contributions from not only one but multiple saturated segments on the left and right of $\mathcal{M}$ (not shown in Figure 4.5). However, isolating the light streaks due to each saturated region is trivial since there are exactly as many line integrals as there are regions $\mathcal{M}$ along a light streak, and therefore we find the contributions for each region with a simple linear system. For convenience, we shift the origin of $(u, v)$ to the leftmost or rightmost pixel of each segment $\mathcal{M}$ to obtain isolated line integrals $\hat{\ell}_L$ and $\hat{\ell}_R$.

[Figure 4.8 panels: (a) light streaks are effectively projections from 8 different angles; (b) projections are expressed as line integrals for reconstructing the clipped highlights.]

Figure 4.8: Reconstruction of clipped highlights from cross-screen filter light streaks. (a) Light streaks along discrete directions give different “projections” of the saturated region. (b) Bilinear sampling along these directions relates line integrals to saturated pixels in (4.32) and (4.33).

4.3.3 HDR reconstruction of saturated pixels

So far, we have decoded the values of the latent image $f$ for the previously unsaturated pixels only; the values of the saturated pixels are still unknown. However, the light streak removal procedure from Section 4.3.2 also yields line integrals along $p$ discrete directions, as shown in Figure 4.8(a). In the final step of the decoding procedure, we use this information to reconstruct the saturated region. To this end, we need to find saturated pixel values that would produce line integrals matching the observations. This requires solving a standard tomographic reconstruction problem [67].

Unlike our light streak estimation, the tomographic reconstruction is inherently a 2D problem. We gather the estimated line integrals along all $p$ directions in a linear system that describes the relationship between line integrals and saturated pixels $f$. We therefore use a one-index representation for all line integrals contributing to a given region: $\hat{\ell}_i$.
This relationship is then expressed as

\[
\hat{\ell}_i = \sum_{j} w_{ij} f_j,
\tag{4.32}
\]

where the weight term $w_{ij}$ for line integral $i$ and an unknown pixel $j$ is the product of the exponential falloff and a bilinear resampling weight $a_{ij}$,

\[
w_{ij} = a_{ij} \exp\left[ -m \left| u_i - u_j \right| \right].
\tag{4.33}
\]

Here, $u_i$ is the reference location used while computing $\hat{\ell}_i$. The absolute value consolidates the different signs for light streak falloffs to the left and right.

We solve this tomography problem using Simultaneous Iterative Reconstruction [67, p. 284]. We start with an initial guess, $f^{(0)} = 0$. Then, in each iteration $t$, the residual error in the current estimate of the line integrals,

\[
\Delta \hat{\ell}_i = \hat{\ell}_i - \sum_{j} w_{ij} f_j^{(t)},
\tag{4.34}
\]

is backprojected over the participating unknown pixels regardless of distance from the reference location, i.e., the energy distribution is proportional to the resampling weight ($a$) only,

\[
f_j^{(t+1)} = f_j^{(t)} - \Delta \hat{\ell}_i \, \frac{a_{ij}}{\sum_{k} a_{ik}}.
\tag{4.35}
\]

Using a uniform distribution for the backprojected residual, independent of any falloff, is a standard procedure in tomography. One should think of this as a (weak) prior on the intensity distribution within the unknown region. We employ a simple two-scale approach which solves the problem for a low-resolution image first. Since we know that the actual values at the saturated pixels are larger than the saturation threshold of the camera, we enforce this simple constraint during backprojection.

4.4 Results: synthetic test cases

Figure 4.9 shows a number of results using input images with synthetic light streaks. In the left column, we show the SDR input image, which has been generated by convolving an HDR image with the filter PSF, and adding noise and quantization. The center two columns show different exposures of our result, while the right column shows a short-exposure rendition of the original HDR image. Figure 4.9(a) shows an image with a number of small specular highlights, which are reconstructed faithfully by our approach.
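
Returning briefly to the backprojection iteration (4.34)-(4.35) of Section 4.3.3, a minimal sketch is given below. Several choices in it are assumptions made for illustration and are not taken from this work: the function name `sirt`, the averaging of the contributions of all line integrals that cross a pixel, the relaxation factor `relax`, and starting from the lower bound `f_min` instead of zero. The sketch adds the backprojected residual, which is the direction that reduces the residual when it is defined as observed minus predicted, as in (4.34).

```python
import numpy as np

def sirt(W, A, ell, f_min=0.0, n_iter=100, relax=1.0):
    """Simultaneous iterative backprojection for ell ~= W @ f (a sketch).

    W: forward weights w_ij, shape (n_line_integrals, n_pixels), cf. (4.33).
    A: resampling weights a_ij, used for backprojection only.
    f_min: lower bound on the unknowns, e.g. the saturation threshold.
    Every row and every column of A is assumed to have a nonzero sum.
    """
    row_sum = A.sum(axis=1)   # sum_k a_ik for each line integral i
    col_sum = A.sum(axis=0)   # total weight of line integrals through pixel j
    f = np.full(W.shape[1], f_min, dtype=float)
    for _ in range(n_iter):
        delta = ell - W @ f   # residual of each line integral, cf. (4.34)
        # Spread each residual over its pixels in proportion to a_ij, then
        # average over all line integrals that cross the same pixel.
        update = (A.T @ (delta / row_sum)) / col_sum
        # Enforce the lower bound after every update.
        f = np.maximum(f + relax * update, f_min)
    return f
```

With no falloff ($m = 0$, so `W` equals `A`) this reduces to a textbook SIRT update. With falloff, `W` could be formed as `A * np.exp(-m * dist)` for a hypothetical distance array `dist`; the sketch makes no convergence claim for that case.
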
The second row tests a difficult case for our light streak estimation approach, since the light streaks are aligned with a strong image edge (the horizon). The sparse gradient prior used for gradient estimation is not valid in this case, resulting in a mis-estimation of the horizontal light streak (circled region in the center-left image), and hence a lower-quality reconstruction of the saturated region (zoomed in on in the center-right image). These artifacts are best seen in the electronic version of the chapter. The example in Figure 4.9(c) shows that one can resolve these artifacts by rotating the cross-screen filter such that the light streaks do not align with any strong image edge.

[Figure 4.9 layout: rows (a), (b), (c); columns: Input, Recovered HDR (long exposure, short exposure), Reference. Dynamic range increase labels shown in the panels: 4.24, 4.70, 4.29.]

Figure 4.9: Our reconstruction results for simulated light streaks from an 8-point cross-screen filter. The first column shows the SDR image given to the reconstruction algorithm as input. We computed this SDR input image by simulating our forward model (i.e., a convolution with the 8-point cross-screen filter PSF followed by quantization) on HDR images. Saturated pixels are marked green. The recovered HDR image is shown in the second and third columns, as a pair of a simulated long exposure and a simulated short exposure (Section 3.2). The long exposure shows the quality of light streak removal and the short exposure shows the reconstruction of the clipped highlights. The rightmost column shows a simulated short exposure of the reference HDR image for comparison with the reconstruction in the third column. Row (a) shows a test case with very bright highlights. Rows (b) and (c) show the same image, but in (c) the filter was rotated by 22.5 degrees to avoid aligning the light streak patterns with the color gradient of the sky, and thus a better light streak estimate was obtained. The third column gives the dynamic range increase at the top-left corner of each image.
See text in Section 4.4 for a full discussion.

[Figure 4.10 layout: rows (a) to (f); columns: Input: single exposure with a cross-screen filter (long exposure (input), short (virtual) exposure), Recovered HDR (long exposure, short exposure; insets: reference HDR). Dynamic range increase labels shown in the panels: 7.15, 5.41, 6.71, 6.81, 5.41, 6.49.]

Figure 4.10: Results of reconstruction from images captured by a camera with an 8-point cross-screen filter. The first column shows the SDR images captured with the filter (input to the proposed algorithm). The second column shows the saturated pixels marked in green, with the entire image dimmed. The third column shows a long virtual exposure of the recovered HDR image (virtual exposures are needed for visualizing HDR images, Section 3.2). Note that most of the light streaks present in the first column are removed. The numbers in the top-left corner indicate the dynamic range (DR) increase in f/stops. The fourth column shows the short exposure of the same reconstructed image. The insets show the reference image captured with the multi-exposure method, and the corresponding regions are marked with a red frame. We were not able to capture multi-exposure reference images for rows (d) and (f) because of the moving objects.

4.5 Results: real test cases

Figure 4.10 shows a number of examples of HDR images, decoded from single images captured as RAW images with a Canon 40D digital SLR (SLR) camera using 8- and 16-point cross-screen filters, and Canon lenses ranging from 50 mm to 100 mm. In this figure, the first two columns represent two exposures of the 12-bit input image, while the right two columns represent two virtual exposures of our reconstructions. Saturated regions are reconstructed, and light streaks produced by the filter are removed. For color images, we run our algorithm separately and independently on each color channel. Radial lens distortion was removed in a pre-processing step.
Insets in the right column show ground-truth comparisons for some of the results, i.e., short-exposure images taken without the filter, using the same camera and lens. Note that the geometric and photometric alignment may not be perfect due to the changes in the acquisition setup.

4.5.1 Separation of light streaks

Accurate estimation and separation of the light streaks created by the cross-screen filter is necessary to correctly reconstruct saturated regions. It is also necessary so that the final output of this method is free of light streaks. Our sparse-gradient prior has been shown to be robust enough to estimate light streaks both for a multitude of small light sources (Figure 4.10a) and for relatively large saturated areas (Figure 4.10c). The main requirement for successful light streak estimation is that saturated regions be both bright and large enough (i.e., have sufficient cumulative energy) to produce light streaks above the camera noise level.

4.5.2 Reconstruction of clipped highlights

Given only 8–16 directional line integrals, tomographic reconstruction is a challenging task. Even so, the results demonstrate that our method estimates the total energy of the saturated regions as well as the approximate values of the saturated pixels. This is in contrast to the previous single-image methods, which could achieve neither of these two goals.

Some SDR-to-HDR enhancement methods [29] pose the task as a classification problem. Since bright highlights are clipped, it is difficult to estimate the clipped intensity values via classification. Our method can identify very bright objects easily, from the existence of the star-shaped light streaks. This renders complex classification methods unnecessary.

Figure 4.10(a) also demonstrates that multi-exposure HDRI can exhibit some artifacts due to alignment issues, particularly at the outline of the light sources.
Our method, being a single-exposure method, does not show any such artifacts.

4.6 Analysis

Our method is not suitable for scenes with large saturated regions, such as the sky. This is because scenes with large saturated regions do not have enough unsaturated pixels from which we can reliably obtain the light streaks. The method can conceptually handle scenes with light sources outside the image frame, but we found that the accuracy of light streak estimation is often not sufficient in such cases.

Finally, our method is also likely to fail if a scene contains color gradients that have the same orientation as the light streaks. In this case the assumption of a zero-mean gradient distribution does not hold. It is usually possible to avoid such problems by rotating the filter out of alignment with image gradients.

4.6.1 Detection limits of light streaks

The cross-screen filter can effectively help capture information about most, but not all, clipped pixels. This is because, for a light streak pattern to be detected, it must be at least a few times stronger than the camera noise level.

[Figure 4.11 plot: x-axis: distance from the saturated region [visual degrees], 0 to 4.5; y-axis: glare-to-noise ratio [dB], −10 to 25; curves labeled 2, 4, 6, 8, 10, 12, 14.]

Figure 4.11: Detectability of light streaks vs. distance from a saturated region. The numbers next to the lines indicate how much brighter (in f/stops) the light source is relative to the sensor clipping level.

Figure 4.11 shows the glare-to-noise ratio in dB for a highlight that has a width of 0.5 visual degrees, is of uniform intensity, and is from $2^2$ to $2^{16}$ times brighter than the sensor clipping level. The glare-to-noise values are given for the pixels that are located $x$ visual degrees from the source of the light streak (x-axis, 100 mm lens). The image region receiving the light streak signal is uniform, and its pixel value is 2 f/stops below the sensor clipping level. The exponential model of the 8-point PSF was used to create the plot.
We used a simple camera noise model that consists of normally distributed static noise with standard deviation $\sigma_s = 0.0002$ (for a maximum sensor value equal to 1) and signal-dependent noise with standard deviation $\sigma_d = 0.013$. The glare-to-noise ratio was computed as

\[
\mathrm{GNR} = 10 \log_{10} \frac{g - f}{\sqrt{\sigma_d^2 \, g + \sigma_s^2}},
\tag{4.36}
\]

where $f$ is the original pixel value without light streaks, and $g$ is the pixel value with light streaks. We chose the noise parameters to approximate the characteristics of our Canon 40D camera (ISO 200, f/5.6, standard post-processing settings), although these can vary with aperture, ISO settings, sensor temperature and other factors. We found the parameters by least-squares fitting of the model to the noise measured in a gray card photographed at varying illumination levels.

[Figure 4.12(a) plot: x-axis: distance from the saturated region [visual degrees], 0 to 4.5; y-axis: noise increase due to glare [dB], 0 to 3.5; curves labeled 2, 4, 6, 8, 10, 12, 14.]

(a) Noise increase due to light streak removal. The higher noise is caused by higher shot noise for the pixels captured with light streaks than for the same pixels captured without light streaks.

[Figure 4.12(b) plot: x-axis: frequency [cycles per visual degree], 0 to 80; y-axis: modulation, 0 to 1; curves: 2-point, 6-point, 8-point, 16-point.]

(b) Modulation transfer functions (MTFs) of cross-screen filters.

Figure 4.12: Analyzing the noise added due to a cross-screen filter. The curves are generated for the same conditions as in Figure 4.11. The exponential models of the PSFs were used to compute the MTFs in order to remove the MTF of the lens system.

Figure 4.11 shows the trade-off between clipping glare pixels and capturing light streaks that are too weak to be detected. A 0.5 visual-degree segment of clipped pixels must be at least 5–6 f/stops (32–64 times) brighter than the clipping level to produce a light streak that is detectable.
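
The glare-to-noise ratio of (4.36) is straightforward to evaluate numerically. The sketch below is an illustration only: the function name `glare_to_noise_db` is hypothetical, the default noise parameters are the fitted values quoted above, and pixel values are assumed to be normalized to a maximum sensor value of 1 with $g > f$.

```python
import numpy as np

def glare_to_noise_db(f, g, sigma_d=0.013, sigma_s=0.0002):
    """Glare-to-noise ratio in dB, following (4.36).

    f: pixel value without light streaks.
    g: pixel value with light streaks (must exceed f).
    Both are relative to a maximum sensor value of 1.
    """
    noise_std = np.sqrt(sigma_d ** 2 * g + sigma_s ** 2)
    return 10.0 * np.log10((g - f) / noise_std)
```

For a background 2 f/stops below the clipping level (`f = 0.25`), a stronger streak yields a larger ratio, e.g. `glare_to_noise_db(0.25, 0.30)` exceeds `glare_to_noise_db(0.25, 0.26)`.
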
A brighter or larger source of light produces a higher glare-to-noise ratio, but if it is much brighter, pixels can get saturated and thus lose the encoded information. This is shown on the plot as clipping of the lines above 20 dB. To avoid saturation, the exposure time needs to be shortened, but this increases noise in an image [128]. Saturation of light streak pixels can also be avoided if a cross-screen filter that produces weaker light streaks (i.e., smaller $\beta$) is used. Such a filter, however, results in a smaller glare-to-noise ratio, making the light streaks difficult to detect and estimate. The best results are achieved if the cross-screen filter is selected to produce just-detectable light streaks with the longest possible exposure time, while avoiding saturation of light streak pixels. The specific values on the plot apply to the specific setting outlined above, but the qualitative analysis applies equally to other cameras and light sources.

4.6.2 Noise analysis

Encoding additional information in unsaturated pixels has one drawback: it increases the noise level. Fortunately the cross-screen filter has a relatively small impact on noise. Since shot noise is proportional to the square root of the signal, pixels affected by light streaks due to saturated pixels ($S \to U$) have higher shot noise than if the same pixels were captured without light streaks. In Figure 4.12(a) this effect is simulated for the same parameters as used in the light streak analysis (Figure 4.11). Since the cross-screen filter spreads light in discrete directions, this noise increase is much smaller than for typical veiling glare in lenses [128], and affects only a small percentage of pixels. The noise is also increased by the deconvolution that we perform when removing glare due to unsaturated pixels. This noise increase is also moderate, about 0.31 dB for all our filters except the 8-point one, which can boost noise by up to 1.17 dB.
These numbers are explained by the MTFs of the cross-screen filters, which have very high values for all frequencies, as shown in Figure 4.12(b). If the deconvolution is performed in the Fourier domain, the frequency components are multiplied by the inverse of the MTF. Since this multiplication boosts the contrast of both image details and noise, the noise increase can be approximated by the inverse of the MTF values.

4.7 Summary

In this chapter, we have proposed a novel method for obtaining high dynamic range (HDR) images of scenes with small highlight areas. Our method can faithfully reconstruct the clipped highlights using an approach that optically encodes information about the highlights by spreading them into light streaks, and then reconstructs the clipped highlights from these light streaks as in tomographic imaging problems. We note that because of the low number of directional glare streaks (8 or 16) compared to the high number of views (64, 128, or more) used in traditional tomographic imaging applications, there are some artifacts in the reconstruction, as expected.

This method can be extended to HDR video in an interesting way: if the filter can be made to rotate in sync with the shutter, then we can gather structural information on highlights from multiple frames, and it will be much denser than just the 8 or 16 views.

This method works well when the image has a small number of highlight pixels but fails when the number is high. We investigate and develop methods appropriate for such cases in Chapter 5 of this thesis.

Chapter 5

Filtering at the camera sensor

In this chapter we address “demanding” scenes that have large brightness variations with large areas of highlights and shades. The existing methods, as well as the method we proposed in Chapter 4, would fail in the case of such large highlights. To overcome this limitation, in this chapter we propose using a recently available sensor technology known as the exposure-multiplexed mode [1] to capture encoded images.
Then we reconstruct the high dynamic range (HDR) image offline with the help of the image prior we develop and present in this chapter.

The rest of the chapter is organized as follows. In Section 5.1 we discuss what sensitivity-multiplexing is, and in Section 5.2 we introduce its application in high dynamic range imaging (HDRI). In Section 5.3 we analyze the forward model, and in Section 5.4 we formulate the inverse problem corresponding to the forward model and propose a general solution to it. We discuss why this exposure-multiplexed HDRI problem is a missing-value problem, and argue that we need two image priors: the sparse gradient prior for sharpening the image, and another image prior that helps interpolate missing values in an edge-preserving or edge-aware fashion. In Section 5.5 we first discuss using a non-local self-similarity prior and present an implementation of the idea using block-matching and 3D filtering (BM3D). In Section 5.6 we present a novel image prior, namely the smooth contour prior, and demonstrate its performance on the super-resolution (SR) problem. Then in Section 5.7 we derive a single-image HDRI method using the proposed smooth contour prior.

5.1 Sensitivity-multiplexing

In this section, we give a general discussion of sensitivity-multiplexing, and in Section 5.2 we present HDRI as an application of sensitivity-multiplexing.

In this chapter, we propose to modify the information presented to the sensor so that more scene details are obtained in the captured single image compared to the single image obtained without this filtering. This per-pixel filtering can be performed optically (Section 5.1.1) or, preferably, electronically (Section 5.1.2).
A post-processing step would then recover the full image information and produce an HDR image.

5.1.1 Optical sensitivity-multiplexing

Optical sensitivity-multiplexing, as the name implies, optically modifies the image incident on the sensor.

A direct way to perform such sensitivity-multiplexing is by using a modified Bayer pattern filter [14]. For example, the method known as “Assorted Pixels” by Nayar and Mitsunaga [102] (Figure 2.2) proposed an arrangement of pixel filters that would capture all three colors and a wider than usual dynamic range (DR). Here is how it works: camera sensor pixels “see” color using color filters (Figure 3.4). For example, green pixels have a green color filter attached on top of them; light within the range of the spectrum we consider “green” can pass through, and the rest of the light is absorbed by the filter. Optical sensitivity-multiplexing requires that these pixel-specific filters also have a transparency component. For example, instead of having just one of the three color filters (red, green, and blue), in the sensitivity-multiplexing method a pixel can have any combination of color and neutral-density (ND) filters: dark green, light green, dark red, light red, and so on. Such a filter fitted on a sensor pixel will let in a specific color, and will also have a preset level of brightness sensitivity.

While optical filtering is useful in forming color images, it has a downside: some incident light is blocked. Blocking light is necessary in color photography, but it is not ideal in other fields of photography, since light conveys information and we want to capture as much information as possible within the allotted time (more pixels, higher DR, larger color gamut, etc.), because capturing more information generally results in more vivid images.

Optical sensitivity-multiplexing also requires a permanent modification to the sensor.
With no way to “turn off” the HDRI feature when a high DR is not needed, such sensors would perform worse than the average image sensor.

5.1.2 Electronic sensitivity-multiplexing

Unlike optical filters, electronic filters do not waste light; they perform per-pixel filtering by varying either the exposure duration or the sensitivity of the pixels.

Varying the exposure duration from one pixel to another has a strong limitation, because pixels next to one another might integrate the observed light over time windows of different lengths. This makes the reconstruction problem much harder in the presence of object motion [59]. Please see the background material in Section 2.2.3 for a more detailed discussion.

Instead of varying the exposure time, we prefer the sensitivity modification approach for electronic sensitivity-multiplexing. In particular, in this chapter we investigate the approach where the exposure index (EI) is multiplexed. We describe the imaging process in the next section.

Furthermore, we recognize the value in using existing, off-the-shelf consumer hardware so that the HDRI feature can be “turned off” at will. This way, the modifications we use are done in software and can be undone easily. Consumers do not need to acquire specialized hardware.

5.2 Overview

The exposure-multiplexed mode, or more commonly the “dual-ISO mode” [1], available in recent camera sensors, is a new capability designed for high dynamic range imaging (HDRI). Sensors with this capability can expose alternate rows of pixels with two different EI settings (also known as “ISO sensitivity” [64]). In other words, we can choose two different exposure settings simultaneously.
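
To make the row interleaving concrete, the sketch below separates a RAW frame into the rows belonging to each of the two EI settings. It is an illustration under assumptions: the helper name `split_row_pairs` is hypothetical, rows are assumed to alternate in pairs (so that each setting covers complete Bayer rows), and which pairs carry the lower or the higher setting depends on the camera configuration.

```python
import numpy as np

def split_row_pairs(raw):
    """Split a RAW frame into the rows of the two interleaved EI settings.

    Assumes rows alternate in pairs: rows 0-1 use the first setting,
    rows 2-3 the second, rows 4-5 the first again, and so on.
    """
    pair_index = np.arange(raw.shape[0]) // 2   # index of each row pair
    first = pair_index % 2 == 0                 # rows of the first setting
    return raw[first], raw[~first]
```

Each returned sub-image has half the rows of the input, which is why directly merging the two exposures loses vertical resolution.
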
We can choose one exposure setting (e.g., for the even row-pairs) such that the highlights do not saturate, and choose the second exposure (e.g., for the odd row-pairs) such that the shades are not noisy or under-exposed (Figure 5.1).

Note that in the context of our discussion in this chapter, sensitivity-multiplexing (which is more specific) and exposure-multiplexing (which is more general, see the glossary entry) are effectively the same thing, and therefore we use these terms interchangeably.

Since both of the exposures are taken at the same instant and for the same duration, no temporal inconsistencies (i.e., ghosting) result from camera or object movements, unlike in conventional multi-exposure HDRI techniques. This makes the exposure-multiplexed approach ideal for capturing dynamic scenes. A small downside of this approach is that the higher EI comes at the cost of higher noise.

The novel edge-preserving image prior we propose in this chapter, which we refer to as the smooth contour prior, helps reconstruct the full-resolution HDR image from a sensitivity-multiplexed capture. The simple method of directly merging subsampled exposures demonstrated in Figure 5.1 results in a resolution loss. Since alternate sets of rows are missing from each exposure, the problem of reconstructing the latent full-resolution HDR image boils down to a formulation very similar to that of the super-resolution (SR) problem. In the following sections, we present the smooth contour prior and demonstrate its application to the SR problem. Then we adapt the prior for exposure-multiplexed HDRI (Section 5.7).

[Figure 5.1 panels: (a) Input dual-ISO RAW image; (b) Low ISO RGB image; (c) High ISO RGB image; (d) Output HDR RGB image obtained from (b) and (c).]

Figure 5.1: The “exposure-multiplexed mode” [1]. (a) An example RAW image captured in the “exposure-multiplexed mode” with a Canon 6D digital SLR (SLR) camera with Magic Lantern firmware extensions [87] installed.
The sensor rows were set in an alternating fashion to low and high exposure-index (EI) settings, i.e., sensor pixel sensitivity settings. Zoom in on the electronic version to view at the full resolution. (b) Extracting only the low-EI rows gives us the “low exposure” RGB image, which has details in the highlights but whose lowlights are very noisy (red arrows). (The noise might not be visible in the print version.) (c) Extracting only the high-EI rows gives us the “high exposure” RGB image, in which details in the shades are better preserved but the highlights are clipped due to sensor saturation. (d) Merging (b) and (c) gives a high dynamic range (HDR) image that contains details both in the highlights and the shades. However, this straightforward method loses sensor resolution, as is evident from the insets. Our goal is to obtain the HDR image with the same resolution as the input RAW image.

5.2.1 Background

For this chapter, we use exposure-multiplexed imaging [1], in which the sensor can acquire more information in a single image than conventional imaging with a uniform EI.

Camera sensors can faithfully capture only a limited range of brightness variation; this range is known as the dynamic range (DR) of a camera. Each pixel of the sensor acts as a photon counter; a bright scene point will send more photons than a pixel can count, effectively saturating the sensor. Measurements from all pixels of the sensor are fed into an analog-to-digital converter which applies an EI (an analog gain similar to film speed, more commonly referred to as the “ISO speed” [64]). The absolute range of values faithfully captured in an exposure depends on the shutter speed, the aperture size and the EI.
Note that the relative brightness ratio between the brightest and the darkest measured pixel values remains constant, and this constant is the DR, formally defined as the ratio between the clipping “ceiling” and the noise “floor”.

High dynamic range (HDR) imaging techniques use these standard dynamic range (SDR) cameras to capture HDR images. The most well-known HDR imaging technique involves capturing multiple images of the same scene, each image captured with different camera sensor exposure settings: shutter speed [27, 98], aperture [54] or EI [56]. This way, bright areas are captured by an image taken with a short exposure setting, while dark areas are captured by another image of the sequence taken with a long exposure setting. When all this captured SDR image data is combined offline, an HDR image is produced. This method generally produces the highest-quality HDR images; however, there are strict limitations that have to be met for the sake of high quality: the camera cannot move, the scene has to be static, and the lighting conditions cannot vary between exposures. If one or more of these constraints is violated, ghosting will result when merging the image sequence into an HDR image. A recent method [56] captures multiple very short exposures in rapid succession (“burst mode”) in order to reduce the misalignment problem, but it fails to capture details in the dark regions effectively. Deghosting HDR images has been extensively explored [62]; however, it still remains largely an open problem. To avoid ghosting, one can use multiple cameras sharing the same optical axis and imaging with different exposure settings to capture the exposure sequence at once. However, this is an expensive solution: it introduces additional optical components on the light path, which add glare to the system, and the setup requires perfect calibration; the slightest relative camera motion can result in a misalignment which cannot be precalibrated.

[Figure 5.2 panels: (a) Scene; (b) Anti-aliased; (c) Bayer-filtered; (d) Observed; (e) Captured image; (f) HDR reconstruction. Diagram labels: low-pass filter, color filter array, exposure-multiplexed analog-to-digital converter, camera, image formation model, proposed reconstruction method]

Figure 5.2: Overview of the exposure-multiplexed high dynamic range imaging (HDRI) method: the image formation model. (a) The scene. Light from the scene is focused onto the image sensor for acquisition. (b) The incident image is optically band-limited using a spatial low-pass filter to avoid aliasing. (c) Each sensor pixel can only measure one of red, green or blue, as set out by the color filter array pasted on top of the image sensor. (d) This band-limited and color-filtered optical information is then captured by the sensor as an array of noisy analog measurements, which is amplified using a per-pixel exposure-index (EI) and converted to digital image data by an analog-to-digital converter. Pairs of rows alternate between a low EI and a high EI. (e) The low-sensitivity pixels can capture the bright scene areas well, but the dark and shaded scene areas lose detail due to noise. The high-sensitivity pixels can capture the relatively darker scene areas better, but are clipped due to saturation in bright parts of the image. (f) Finally, we solve an inverse problem and obtain a maximum-a-posteriori (MAP) estimate of the unknown band-limited latent image (b).

As a middle path, some recent image sensors have introduced an optional feature that allows a user to perform a resolution-DR tradeoff: in exchange for a potential resolution loss, a user can choose to capture a wider DR in a single capture, which can be particularly useful in situations where taking multiple exposures is not an option. These sensors can spatially multiplex EI. An exposure-multiplexed image contains more scene information compared to a uniform-EI capture.
Pixels that are exposed at the same exposure setting observe the same brightness range, while other pixels with a different exposure setting observe a different range of brightness values of the same scene. Effectively, this process allows capturing “multiple” exposures in a single capture on a single sensor, each taken with a different exposure setting. This, however, comes at the cost of resolution loss.

The general idea is: in the same single exposure of an imaging sensor, different groups of pixels use different EIs. While each group individually still has the original SDR, when the data is demultiplexed the combined image effectively has a wider DR. Nayar and Mitsunaga [101] first proposed to lay spatially varying neutral-density (ND) filters for EI multiplexing. However, this is not a desirable solution since
a) once installed, the filters cannot be modified based on scene types, and
b) ND filters absorb a large portion of the light incident on the sensor.

Some recent sensors can multiplex electronic shutter speed [59] or EI [1] between row-pairs, i.e., rows 1, 2, 5, 6, 9, . . . can be set up with one setting, and rows 3, 4, 7, 8, 11, . . . with the other. Varying the shutter speed has the advantage of uniform noise properties throughout the sensor for any one exposure, but it also poses the risk of ghosting [59], as the slower shutter speed will result in a larger blur in case an object moves in the scene. On the other hand, modifying the EI can vary sensor noise properties, but no temporal alignment is required since both sets of pixels share the same exposure begin and end times.

In this chapter, we aim to recover the full-resolution image from such an exposure-multiplexed RAW (i.e., straight-out-of-camera) image. In this image reconstruction problem, depending on the choice of EIs used in the exposure-multiplexed imaging, up to half of the RAW pixels might have no image data (Fig. 5.2).
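The row-pair layout described above can be made concrete with a short sketch. The following NumPy snippet is only an illustration under stated assumptions: the function names `row_pair_masks` and `demultiplex` are hypothetical, rows are 0-indexed (so the 1-based rows 1, 2, 5, 6, ... become 0, 1, 4, 5, ...), and the clipping level is assumed to be normalized to 1. Which row group carries the low EI depends on the camera setup and is not fixed here.

```python
import numpy as np


def row_pair_masks(height):
    """Boolean row masks for the two interleaved row-pair groups.

    0-based rows 0, 1, 4, 5, 8, ... form the first group and
    rows 2, 3, 6, 7, 10, ... form the second group.
    """
    rows = np.arange(height)
    first = (rows // 2) % 2 == 0
    return first, ~first


def demultiplex(raw, clip_level=1.0):
    """Split a row-pair multiplexed RAW frame into its two row groups.

    Returns the two half-height frames and a full-size mask of
    unclipped pixels (clipped samples carry no usable information).
    """
    first, second = row_pair_masks(raw.shape[0])
    usable = raw < clip_level
    return raw[first], raw[second], usable
```

Each returned half-height frame corresponds to one of the “multiple” exposures; the `usable` mask marks the pixels that can still contribute to data fitting.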
The reconstruction of this missing data is an underdetermined inverse problem. This problem has similarities to other inverse imaging problems such as super-resolution (SR), demosaicking and inpainting, in that the input image has missing pixels, but none of these methods would directly fit our exposure-multiplexed imaging problem. For exposure-multiplexed imaging, several methods have looked into recovering the lost resolution: Magic Lantern, an open source community, has developed tools to take advantage of the exposure-multiplexed imaging capability on certain Canon cameras [2], but they use only a simple edge-guided interpolation. Other methods apply local statistical properties for spatial interpolation [53]. We observe the similarities between our problem and the SR problem and adapt [120].

5.3 Forward model: image formation

The forward model shown in Figure 5.2 describes the optical pipeline (see Section 3.3.1 for a general discussion). The camera observes a standard dynamic range (SDR) image we denote by g. Since light is linear, a system of linear equations can represent this transformation. We derive this transformation below.

We assume that the unknown latent HDR image f has three color channels (RGB) and n pixels (i.e., a total of 3n values). The captured image g is a Bayer image (Figure 5.3) with n pixels, each pixel containing single-channel data (one of red, green and blue).

[Figure 5.3 panels: (a) Conventional Bayer pattern; (b) Exposure-index multiplexed, with pixels u and v marked]

Figure 5.3: For color imaging, a color filter array is placed on top of a sensor; for high dynamic range (HDR) color imaging, we modify the basic color filter array. Left: the most common color filter array (CFA) pattern is known as the Bayer pattern [14]. The Bayer pattern repeats a 2 × 2 “RGGB” pattern over the entire sensor. Right: exposure-multiplexed imaging modifies the sensitivity by row-pairs of pixels. We show the high exposure-index (EI) with brighter colors, and the low EI with dimmer colors.
Note that since the Bayer pattern has a basic block size of 2 × 2, the EI is alternated only every two rows. The pixels u and v denote two neighboring green pixels with different sensitivities (EIs). Since natural images are mostly smooth, it is likely that both of these pixels receive the same incident light intensity. We use all such {u, v} pairs to compute a polynomial estimate of the intensity-response curve r using a robust polynomial fitting.

First, for ease of derivation, we vectorize these images (see Section 3.4.1 for details). The vectorized unknown HDR image contains 3n positive real numbers, $f \in \mathbb{R}^{3n}$. Similarly, the captured image has n known values between 0 and 1 (we normalize all captured images so that the clipping level is 1), and therefore $g \in [0, 1]^n$. Then the forward image formation process is a linear map from the unknown 3n real numbers of f to the known n numbers of g.

In sensitivity-multiplexing (or more generally, in exposure-multiplexing), the conventional imaging process remains mostly unchanged. The only modification is how the sensor EI is set: row-pairs alternate between a low EI and a high EI (Figure 5.2). We briefly describe this image formation model to introduce the notation.

a) Antialiasing. A low-pass spatial filter is necessary to remove aliasing artifacts in the captured image. Most image sensors implement this filtering by optically diffusing the incident image just before it hits the sensor. Effectively, we model this filter as a small Gaussian blur ρ.
b) Color filtering. A color sensor has a red-green-blue (RGB) CFA pasted on top. As a result, each pixel can only observe one of the three colors. The most common pattern of these three colors is called the Bayer pattern [14] (Figure 5.3a). We denote the mapping from the RGB image onto the single-channel Bayer image as $M : 3n \to n$, which is a sparse n × 3n matrix with only one 1 per row, i.e., a color channel selector.
c) Noise. Multiple physical and electronic processes cause the noise in image acquisition.
For SDR cameras, we can assume that the noise is additive white Gaussian noise (AWGN), i.e., it is distributed as a zero-mean Gaussian with variance $\sigma^2$: $\eta \sim \mathcal{N}(0, \sigma^2)$.
d) Intensity-response. We denote the mapping from input intensity to output readout level by $r : \mathbb{R}^+ \to [0, 1]$. This response is almost linear. It only differs from the ideal linear response at either end of the observable intensity range: close to 0 and 1.
e) Exposure-index (EI) or sensitivity. The EI γ denotes the factor by which the analog electronic signal from a pixel is boosted before converting it to digital. Sensitivity-multiplexing (or more generally exposure-multiplexing) needs to take into account the basic block structure of the CFA (Figure 5.3b).
f) Sensor saturation. Sensor pixel electronics have a physical capacity that limits the maximum brightness a sensor pixel can observe. Beyond this level the brightness signal is clipped at that maximum. Without loss of generality, we set this highest value to 1. Then sensor clipping is simply an operator min(1, ·).

The forward image formation process is therefore

\[ g = \min\bigl(1,\; r\bigl(\gamma\, M(\rho \otimes f) + \eta\bigr)\bigr), \tag{5.1} \]

where g is the captured SDR image scaled to 0 . . . 1, min(1, ·) models clipping due to saturation with the clipping level set to 1, r is the intensity-response curve, γ is the exposure index, M is the color filter array on the sensor, ρ is the anti-aliasing filter, ⊗ denotes convolution, f is the latent HDR image, and η is the noise. Since saturated pixels provide no information, we ignore these pixels (footnote 11). Rearranging (5.1),

\[ g_1 \equiv \frac{1}{\gamma}\, r^{-1}(g) = M(\rho \otimes f) + \eta, \qquad [g < 1], \tag{5.2} \]

where γ is known, we estimate r in Section 5.3.1, and we solve for f in Section 5.4.

Most terms are known in this equation: g is the known captured image, the per-pixel sensitivity γ is preset by the user before capturing the photo, the color map M is a fixed, known color filter array, and ρ is fixed for every sensor type and can be measured. The only unknowns are the HDR image f, the response curve r and the noise term η.
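To make the roles of the individual terms in (5.1) and (5.2) tangible, the following NumPy sketch simulates a capture and undoes the gain. It is a simplified illustration, not the thesis implementation: it assumes an idealized linear response r(x) = x, a 3-tap separable Gaussian as a stand-in for ρ, an RGGB layout for M, and arbitrary example EI values of 1 and 4. All function names are hypothetical.

```python
import numpy as np


def gaussian_blur3(img, sigma=0.7):
    """Separable 3-tap Gaussian blur on an (H, W, 3) image (stand-in for rho)."""
    k = np.exp(-0.5 * (np.array([-1.0, 0.0, 1.0]) / sigma) ** 2)
    k /= k.sum()
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    rows = k[0] * pad[:-2] + k[1] * pad[1:-1] + k[2] * pad[2:]
    return k[0] * rows[:, :-2] + k[1] * rows[:, 1:-1] + k[2] * rows[:, 2:]


def bayer_select(rgb):
    """Color channel selector M: keep one of R, G, B per pixel (RGGB tiling)."""
    h, w, _ = rgb.shape
    out = np.empty((h, w), dtype=rgb.dtype)
    out[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red
    out[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green
    out[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green
    out[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue
    return out


def gain_map(height, width, low=1.0, high=4.0):
    """Per-pixel EI gamma, alternating between row pairs."""
    rows = np.arange(height)
    per_row = np.where((rows // 2) % 2 == 0, low, high)
    return np.repeat(per_row[:, None], width, axis=1)


def capture(f, gamma, noise_sigma=0.0, rng=None):
    """Simulate (5.1) assuming a linear response r(x) = x."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = gamma * bayer_select(gaussian_blur3(f))
    x = x + noise_sigma * rng.standard_normal(x.shape)
    return np.minimum(1.0, x)  # sensor saturation at clipping level 1


def normalize(g, gamma):
    """Rearrangement (5.2) under the same linear-response assumption.

    Returns g1 = g / gamma and the mask of unclipped pixels (g < 1).
    """
    return g / gamma, g < 1.0
```

With a real sensor the response is not exactly linear, so `normalize` would additionally apply the estimated inverse response before dividing by γ.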
We estimate the response curve r from the single image g in Section 5.3.1 next.

Footnote 11: When we ignore a pixel, we also ignore the corresponding row in the matrix that represents the forward model.

[Figure 5.4 panels: (a) Before correction; (b) After correction]

Figure 5.4: Preprocessing. (a) An unprocessed image. Artifacts due to sensitivity-multiplexing clearly show up in the insets as horizontal lines. More precisely, the response curve r is nonlinear, but the high-EI rows have only been scaled (linearly) via a division by γ. (Note that the low-EI pixels have a low sensitivity, and therefore a low brightness in the captured image compared to the nearby high-EI pixels capturing parts of the same object. The division by γ should bring the two sets of pixels to the same brightness, and the horizontal bars visible in (a) should vanish. Because of the nonlinearity, however, a simple division is not enough, which is why the lines remain visible.) The linearly rescaled rows therefore do not match the rows at the other EI (as predicted by equation 5.3). Insets show the mismatched rows causing strong artifacts. (b) The preprocessed image. Low-EI rows have been (linearly) boosted and then the estimated non-linear response function has been applied. The insets clearly show that the high-EI and the low-EI areas now match.

5.3.1 Estimating the response curve r

It is possible to estimate r by separately calibrating the camera response curves for each sensitivity or exposure-index (EI). Instead of estimating the response curve r for every possible exposure setting, we use a random sample consensus (RANSAC)-like iterative polynomial fitting to calibrate the intensity mapping r from the single input image.

We perform this calibration once for each color channel.
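A minimal sketch of such an iterative reject-and-refit polynomial fit is given below, under stated assumptions. The fifth-order polynomial and the 25% rejection target are the values used in this section; the geometric shrink factor, the iteration cap, the function name `robust_polyfit`, and the choice of which observation plays the role of x and which of y are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np


def robust_polyfit(x, y, degree=5, reject_fraction=0.25, shrink=0.9, max_iter=100):
    """Fit y ~ p(x) while iteratively rejecting outliers.

    Starts from a fit to all points, then repeatedly tightens the allowed
    residual and refits on the accepted points until at least
    `reject_fraction` of the points have been rejected.
    """
    coeffs = np.polyfit(x, y, degree)            # initialization: fit all points
    resid = np.abs(np.polyval(coeffs, x) - y)
    tol = resid.max()                            # wide dispersion: all accepted
    inliers = np.ones(x.shape, dtype=bool)
    for _ in range(max_iter):
        if 1.0 - inliers.mean() >= reject_fraction:
            break
        tol *= shrink                            # reduce the allowed dispersion
        resid = np.abs(np.polyval(coeffs, x) - y)
        accepted = resid <= tol
        if accepted.sum() <= degree + 1:         # too few points left to refit
            break
        inliers = accepted
        coeffs = np.polyfit(x[inliers], y[inliers], degree)
    return coeffs, inliers
```

In the calibration setting, each (x, y) sample would come from one pair of neighboring same-color pixels with different EIs, and the rejected samples are expected to be pairs that straddle a scene edge.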
For each color, we gather all pairs of neighboring pixels with different EI sensitivity and no clipping. For one such pair of nearby pixels with the same color, let the original anti-aliased intensities be u and v (Figure 5.3b), and let the EIs be 1 and γ > 1 respectively. We assume that u does not change much since the response curve is mostly linear. Then, according to the forward model, the observed values are r(u) ≈ u and r(γv), ignoring noise and clipping. However, because the response curve r is not linear,

\[ r(\gamma v)/\gamma \neq r(u). \tag{5.3} \]

An example is shown in Figure 5.4, on the left. We know from the sparse gradients prior [80, 106] that natural images are piecewise smooth, i.e., natural images are for the most part smooth, except at sparse sharp edges in the scene. Therefore, except for a few outliers, for most u, v pairs, the following will hold:

\[ u \approx v. \tag{5.4} \]

We then seek a fifth-order polynomial to model r and solve (5.3) and (5.4). We perform a robust RANSAC-style polynomial fitting to obtain the estimated curve. We initialize the curve by fitting all points. After this initialization, we reject more and more outliers in each iteration: we reduce the allowed dispersion sigma and recompute the curve until a large enough part (25% in our experiments) of the pairs is rejected. This ensures a fit that is robust against outliers, i.e., edges. We denote the scaled captured image as $g_1$ (5.2). This simplifies the forward model in (5.1) to

\[ g_1 = M(\rho \otimes f) + \eta, \qquad [g < 1]. \tag{5.5} \]

[Figure 5.5 panels: (a) Initialization; (b) 2 iterations; (c) Final map; axes u and v]

Figure 5.5: Preprocessing: corrections for the nonlinearity in the response curve using RANSAC polynomial fitting. We initialize the polynomial fitting using all points (a). This means that a wide dispersion is allowed initially, admitting the outliers which we subsequently reject iteratively. In each iteration, we reduce the allowed dispersion, find the accepted points that are within the allowed dispersion, and refit the polynomial with only these accepted points (b).
We iterate until 25% of the points are rejected (c).

This simplification has reduced the problem to a missing data problem: the EI variations have been taken out of the equation. In Section 5.4 next, we formulate a convex optimization problem to solve the simplified forward model (5.5).

5.4 Formulation of the high dynamic range (HDR) reconstruction problem

The simplified forward model has data missing from many pixels due to over-exposure (i.e., clipping in the highlights) or under-exposure (i.e., noise in the shades) of image sensors (Figure 5.2e). The crux of our HDR imaging method is reconstructing the information that is missing; to obtain the full-resolution HDR image, we need to reconstruct the data that is missing in $g_1$.

We first restore the green channel of $g_1$ from the previous section. Because the green channel is more densely sampled by the Bayer filter [14], it is easier to restore using single-channel data. We then use the restored green channel to guide the edge-preserving interpolation in the other two channels using the super-resolution (SR) method we present in Section 5.6.3 (Figure 5.12). Below, we first formulate HDR reconstruction as a global optimization problem and then give our algorithm. Our algorithm relies on our smooth contour prior, which we present in Section 5.6.

The image formation model (5.5) gives the forward model.
In order to estimate the unknown latent HDR image f, we perform a maximum-a-posteriori (MAP) estimation in the same fashion as the SR method described in Section 5.6.3: we minimize the noise $\eta \sim \mathcal{N}(0, \sigma^2)$ (i.e., underdetermined data-fitting) such that the following image priors (for well-posedness) are satisfied:
a) natural image gradients are sparse (the “sparse gradient prior” [121]), and
b) a natural image prior that imposes the following property on the reconstructed image: for some image operator D, ‖f − D(f)‖ should be minimized.

This gives our formulation of the inverse problem,

\[ \arg\min_f \; \underbrace{\frac{1}{2\sigma^2}\,\| g_1 - M(\rho \otimes f) \|^2}_{\text{data-fitting},\ g < 1} + \underbrace{\lambda_{TV}\, \|\nabla f\|_{TV}}_{\text{sparse gradient}} + \underbrace{\lambda_D\, \| f - D(f) \|^2}_{\text{image prior}}, \tag{5.6} \]

where $\|\cdot\|_{TV}$ is the total variation (TV) norm and $\lambda_{TV}$ and $\lambda_D$ are proportionality constants.

For the image prior ‖f − D(f)‖, we explore the following two priors:
• In Section 5.5 we discuss using a non-local self-similarity image prior. This could be implemented via a number of methods. We choose the block-matching and 3D filtering (BM3D) denoiser to implement the non-local self-similarity constraints.
• In Section 5.6 we propose and use a novel image prior we call the smooth contour prior. The prior employs an edge-aware interpolation method known as edge-directed interpolation (EDI). We discuss EDI, propose a fast algorithm to compute EDI, and verify the performance of our smooth contour prior by applying it to the problem of SR. In Section 5.7, we adapt EDI for our high dynamic range imaging (HDRI) problem and show results.

5.5 A non-local self-similarity image prior

We present an image prior based on the state-of-the-art denoising algorithm block-matching and 3D filtering (BM3D) [21, 24].
As outlined above, we define the prior as the $\ell_2$ difference between an image and its BM3D-denoised version.

The maximum-a-posteriori (MAP) estimate of f can be obtained via the minimization of an energy function composed of two parts: an ill-posed data-fitting term corresponding to the forward model, and prior terms for well-posedness. The data-fitting term follows directly from (5.26), while for the prior terms we apply the sparse gradient prior and the BM3D-based prior,

\[ \min_f \; \underbrace{\| g - (\rho \otimes f)|_\Gamma \|^2}_{\text{data-fitting}} + \underbrace{\lambda_{TV}\, \|\nabla f\|_{TV}}_{\text{sparse gradient}} + \underbrace{\frac{\lambda_B}{2}\, \| f - B(f) \|^2}_{\text{BM3D}}, \tag{5.7} \]

where $\lambda_{TV}$ and $\lambda_B$ are regularization weights, $\|\cdot\|_{TV}$ is the total variation (TV) norm, and B : Γ → Ω is the BM3D denoiser [21, 24].

5.5.1 Global optimization for reconstruction

We use the primal-dual convex optimization method [18] to solve the convex optimization problem (5.7). The primal-dual method handles the mixed norms involved efficiently. This iterative method uses general projection operators, known as the proximity operators and denoted by “prox”, to iteratively converge to the optimum solution. In order to derive the primal-dual convex optimization problem, we first rewrite our problem in the primal-dual form. We then develop the corresponding algorithm from [18] and derive the proximity operators employed by this algorithm.

We first rewrite our convex optimization problem (5.7) as

\[ \min_f \; G(f) + F_{TV}(K_{TV} f) + F_B(K_B f), \tag{5.8} \]

where the data-fitting term is captured by G(·) defined as

\[ G(f) \equiv \| g - (\rho \otimes f)|_\Gamma \|^2, \tag{5.9} \]

$K_{TV}$ and $K_B$ are linear transformations from the primal (image) domain to the respective dual domains defined as

\[ K_{TV} f \equiv \lambda_{TV} \nabla f, \tag{5.10} \]
\[ K_B f \equiv f - B(f|_\Gamma), \tag{5.11} \]

and $F_{TV}$ and $F_B$ are functions defined as

\[ F_{TV} \equiv \|\cdot\|_{TV}, \tag{5.12} \]
\[ F_B \equiv \frac{\lambda_B}{2}\, \|\cdot\|^2. \tag{5.13} \]

Although (5.8) is convex, solving it directly is not easy because of the mixed $\ell_1$ and $\ell_2$ norms.
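The proximity operators associated with the functions in (5.9), (5.12) and (5.13) have simple closed forms in common special cases. The NumPy sketch below illustrates them under assumptions that are not taken from the thesis: the TV norm is treated as the isotropic one (a per-pixel Euclidean norm of the gradient vector), the blur ρ in G is replaced by the identity so that the data term decouples per pixel, and all function names are hypothetical.

```python
import numpy as np


def prox_tv_conjugate(y):
    """prox of sigma * F_TV^* for the (assumed isotropic) TV norm.

    y has shape (2, H, W): the dual variable paired with the image gradient.
    The conjugate of a norm is the indicator of the dual unit ball, so the
    prox is a pointwise projection and does not depend on sigma.
    """
    magnitude = np.sqrt((y ** 2).sum(axis=0, keepdims=True))
    return y / np.maximum(1.0, magnitude)


def prox_quadratic_conjugate(y, sigma, lam):
    """prox of sigma * F^* for F = (lam / 2) ||.||^2, whose conjugate is ||.||^2 / (2 lam)."""
    return y / (1.0 + sigma / lam)


def prox_data(f_tilde, g, mask, tau):
    """prox of tau * G for G(f) = ||g - f||^2 restricted to observed pixels.

    Simplifying assumption: the blur rho is the identity. Unobserved pixels
    (mask == False) are left unchanged.
    """
    out = f_tilde.copy()
    out[mask] = (f_tilde[mask] + 2.0 * tau * g[mask]) / (1.0 + 2.0 * tau)
    return out
```

When ρ is a genuine blur, the data-term prox no longer decouples per pixel and requires solving a linear system, for example in the Fourier domain or with a few conjugate gradient steps.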
Instead, we solve the corresponding primal-dual saddle point problem, which is an equivalent problem and has the same optimum solution,

\[ \min_f \; G(f) + \max_{y_{TV}} \; \langle K_{TV} f, y_{TV} \rangle - F^*_{TV}(y_{TV}) + \max_{y_B} \; \langle K_B f, y_B \rangle - F^*_B(y_B), \tag{5.14} \]

where $\cdot^*$ denotes the convex conjugate of a function, and $y_{TV}$ and $y_B$ are slack variables defined over the respective dual domains. In order to solve (5.14), we have developed an algorithm, based on [18], which is presented in Algorithm 5.1.

Algorithm 5.1 Single-image high dynamic range imaging (HDRI) using the BM3D prior
Require: σ > 0, τ > 0 and θ. $\bar f^{(0)} = f^{(0)} = B(g)$, $y_{TV}^{(0)} = 0$, $y_B^{(0)} = 0$.
1: repeat
2:   $y_{TV}^{(k+1)} \leftarrow \mathrm{prox}_{\sigma F^*_{TV}}\bigl(y_{TV}^{(k)} + \sigma K_{TV} \bar f^{(k)}\bigr)$
3:   $y_B^{(k+1)} \leftarrow \mathrm{prox}_{\sigma F^*_B}\bigl(y_B^{(k)} + \sigma K_B \bar f^{(k)}\bigr)$
4:   $f^{(k+1)} \leftarrow \mathrm{prox}_{\tau G}\bigl(f^{(k)} - \tau (K_{TV}^T y_{TV}^{(k+1)} + K_B^T y_B^{(k+1)})\bigr)$
5:   $\bar f^{(k+1)} \leftarrow f^{(k+1)} + \theta (f^{(k+1)} - f^{(k)})$
6: until convergence

In Algorithm 5.1, σ > 0, τ > 0 and θ are parameters of the algorithm that determine the step sizes for the iterates.

5.5.2 Discussions

Since BM3D uses self-similarity, running BM3D in every iteration becomes costly, making this prior infeasible for everyday imagery. This brings us to the smooth contour prior that we propose next.

5.6 A fast image prior: smooth contour prior

Besides the non-local self-similarity prior discussed in Section 5.5, we have developed an image prior function for solving the high dynamic range (HDR) reconstruction problem (5.6). This prior is the smooth contour prior. It primarily relies on edge-directed interpolation (EDI). In this section, we derive the smooth contour prior and demonstrate its performance using super-resolution (SR). Our discussion on HDRI using the smooth contour prior resumes in Section 5.7.

In Section 5.6.1 we provide a rationale behind our choice of the edge-aware anisotropic filter. Then in Section 5.6.2 we provide a review of the EDI algorithm.
We have developed an order-of-magnitude faster algorithm for computing EDI, which makes it possible to build an image prior based on it. Since EDI is an upsampling operator, in Section 5.6.3 we first derive an SR method using this image prior. Our discussion on HDRI then continues in Section 5.7: we propose an extension of this prior for exposure-multiplexed imaging, and propose our solution to the single-image HDRI problem using our proposed novel image prior, the smooth contour prior.

5.6.1 Intuition behind the smooth contour prior

The proposed smooth contour prior term promotes reconstructed image edges that are smooth along the natural contours of edges, and thus it complements the sparse gradient prior. The edge-directed interpolation (EDI) operator E produces smooth contours, i.e., for some high-resolution (HR) estimate $f^*$ with smooth contours, $E(f^*|_\Gamma)$ is expected to produce the same image $f^*$. In other words, the smooth contour prior term penalizes reconstructed edges that are not smooth along their contours. Since, in contrast, the sparse gradient prior improves sharpness across edges, these two priors complement each other. A simple test case in Figure 5.9 demonstrates that the reconstruction quality when both priors are applied is better than when either of them is applied separately.

Our choice of the anisotropic interpolation operator is made based on the following criteria:

1. Local calculations: Since non-local methods employ a search for (patch or other) similarities in the image, the runtime complexity tends to be much higher than that of local methods. Although non-local methods are better for denoising [21] and similar image reconstruction problems [59], local methods produce comparable results for super-resolution (SR) [43]. We therefore choose a local method for speed without sacrificing the quality.
2. Direct estimation of anisotropic interpolation coefficients: Methods that depend on explicit edge detection [140] are prone to discontinuity artifacts in case of edge mis-estimation. Our proposed method can directly estimate the anisotropic interpolation coefficients. In difficult cases (e.g., when our method is not confident about a strong local edge) our method gracefully falls back to bilinear interpolation and produces no noticeable discontinuity.
3. Fast implementation: Anisotropic interpolation methods are generally sliding-window algorithms. This means that the accuracy of such methods (such as the bilateral filter methods [35]) depends on the amount of data available, i.e., the window size; but a larger window usually results in a higher run-time. Our proposed method has a run-time that is independent of the window size, which makes it faster than local methods whose run-time grows with the window size.

Our proposed fast edge-directed interpolation technique, which we refer to as EDI, meets all of these criteria.

5.6.2 Fast edge-directed interpolation (EDI)

In this section, we present a fast EDI operator, the building block of our proposed smooth contour prior. Our proposed method is based on the original method due to Li and Orchard [84], but is more stable and much faster. The improved stability is due to the regularized regression (5.21), and the speedup is from the proposed fast two-pass calculations using summed column tables (5.25).

EDI copies the observed pixel values and estimates the unobserved values from the known neighbors using anisotropic interpolation. EDI derives the anisotropic interpolation coefficients via sliding-window linear least-squares regressions.

The EDI operator E : Γ → Ω takes as input some partial observations g, and produces a 2 × 2 edge-aware anisotropic-upsampled image $\hat f$,

\[ \hat f \equiv E(g). \tag{5.15} \]

[Figure 5.6 diagram: $r_i(r, c) = r_i(r, c-1) - t_i(r+a, c-b-1) + t_i(r-a-1, c-b-1) + t_i(r+a, c+b) - t_i(r-a-1, c+b)$]

Figure 5.6: An illustration of the recurrence that lets us compute the area-sums incrementally and very fast. We calculate $r_i(r, c)$ (the sum of all values inside the red box at (r, c)) from $r_i(r, c-1)$ (the sum of all values inside the red box at (r, c − 1)) by adding and subtracting precomputed partial column sums $t_i$, marked above with the red rectangles.

Since pixel values are already known over Γ, EDI obtains these values directly from the input (Figure 5.7(a)),

\[ \hat f\,\big|_\Gamma = g. \tag{5.16} \]

For the unknown pixels in Ω − Γ, EDI runs the same anisotropic filtering algorithm twice (Figure 5.7). The first stage estimates pixels in $\Omega_1$. In the second stage, the output of the first stage, i.e., the values at pixels in $\Gamma \cup \Omega_1$, rotated by 45°, is fed back to the same algorithm, which produces estimates of pixels in $\Omega_2$. Therefore, without loss of generality, we limit our discussion to one stage.

EDI estimates each unobserved pixel separately. Let $p \equiv (r, c) \in \Omega_1$ denote the current pixel whose value is to be estimated. Let $\partial p$ denote the list of neighbors of p to interpolate from, where $|\partial| = 4$ and $\partial_i p \in \Gamma$, 1 ≤ i ≤ 4. In particular, $\partial_1 p$ denotes the top-left neighbor (the closest known pixel to the top-left of p), $\partial_2 p$ the top-right neighbor, and so on. Then we obtain an estimate of the unobserved pixel $\hat f(p)$ via a weighted interpolation of its neighbors [84],

\[ \hat f(p) = \sum_{i=1}^{|\partial|} \alpha_i(p)\, \hat f(\partial_i(p)), \tag{5.17} \]

where $\alpha(p) \in \mathbb{R}^{|\partial|}$ are the edge-aware anisotropic interpolation coefficients. We determine α(p) by a linear least-squares regression over a window of size (2a + 1) × (2b + 1) centered around p, denoted by W(p),

\[ \alpha(p) = R(p)^{-1} r(p), \tag{5.18} \]

where $R \in \mathbb{R}^{|\partial| \times |\partial|}$ is a 4 × 4 matrix defined as

\[ R_{ij}(p) \equiv \sum_{q \in W(p)} \hat f(\partial_i q)\, \hat f(\partial_j q) \tag{5.19} \]

and $r \in \mathbb{R}^{|\partial|}$ is a 4-vector defined as

\[ r_i(p) \equiv \sum_{q \in W(p)} \hat f(\partial_i q)\, \hat f(q). \tag{5.20} \]

In order to make the regression more stable, we propose to perform regularized regression instead of (5.18),

\[ \hat\alpha(p) = (R(p) + \mu I)^{-1} \bigl(r(p) + \mu\, |\partial|^{-1}\bigr), \tag{5.21} \]

where $|\partial| = 4$, and µ is a regularization parameter.

[Figure 5.7 legend: Γ, pixels with observed values; $\Omega_1$, pixels estimated in EDI stage 1; $\Gamma \cup \Omega_1$, pixels known after EDI stage 1 (rotated by 45°); $\Omega_2$, pixels estimated in EDI stage 2]

Figure 5.7: An illustration of the two stages of EDI. The input low-resolution (LR) image pixels Γ contribute one-fourth of the target high-resolution (HR) image pixel grid. The rest of the pixels are calculated in two stages. First, the pixels in $\Omega_1$ are estimated via an anisotropic interpolation of the diagonal neighbors. Second, half of the pixels are then known ($\Gamma \cup \Omega_1$), from which the remaining pixels $\Omega_2$ are estimated. The second stage is algorithmically identical to the first, rotated by 45°.

Since (5.21) is a small 4 × 4 linear system, the cost of solving it at every pixel is O(N), where N = hw is the number of pixels in the image. The main bottleneck is (5.19) and (5.20): when implemented in a straightforward manner, the complexity is O(N|W|). In contrast, our proposed algorithm has a time complexity of O(N), i.e., independent of the window size, which allows us to use a large window size for obtaining accurate estimates of the interpolation coefficients α.

We observe that (5.19) and (5.20) are sums over partially-overlapping (sliding) windows, and therefore partial sums can speed up the process. Without loss of generality, we show calculations for $r_i(p)$, 1 ≤ i ≤ 4. $R_{ij}$ can be computed in a similar fashion. We use two O(N) passes: first we obtain an intermediate data structure, the summed column table $t_i$, and then we calculate $r_i$ for all pixels in $\Omega_1$.

In the first pass, we precompute an intermediate summed column table $t_i$ of size h × w to hold partial columnwise sums, where every element is the sum of all the (known) products corresponding to that location and above it:

\[ t_i(r, c) = \sum_{1 \le j \le r,\ (j, c) \in \Gamma} \hat f(\partial_i(j, c))\, \hat f(j, c). \tag{5.22} \]

We obtain the O(N) complexity by computing each element of t incrementally as

\[ t_i(r, c) = \begin{cases} 0, & \text{if } r \le 0 \text{ or } c \le 0, \\ t_i(r-1, c), & \text{if } (r, c) \notin \Gamma, \\ t_i(r-1, c) + \hat f(\partial_i(r, c))\, \hat f(r, c), & \text{otherwise.} \end{cases} \tag{5.23} \]

In the second pass, we process pixels in row-major order to obtain $r_i$. To achieve the O(N) overall complexity, we reorganize the terms in (5.20) to obtain each element $r_i(r, c)$ with a constant number of operations: from the previously computed element $r_i(r, c-1)$ and additions and subtractions of partial column sums (Figure 5.6), as

\[ r_i(r, c) \equiv \sum_{j=r-a}^{r+a} \sum_{k=c-b}^{c+b} g(\partial_i(j, k))\, g(j, k) \tag{5.24} \]
\[ = r_i(r, c-1) - t_i(r+a, c-b-1) + t_i(r-a-1, c-b-1) + t_i(r+a, c+b) - t_i(r-a-1, c+b). \tag{5.25} \]

5.6.3 Using edge-directed interpolation (EDI) in the smooth contour prior for super-resolution (SR)

A super-resolution (SR) method takes a low-resolution (LR) image as input and produces a high-resolution (HR) image. This is an underdetermined inverse problem because the input LR image does not contain the full HR image information. The missing information is crucial in making the HR image look sharp. In order to reconstruct the unknown HR image from the input LR image, SR methods must therefore fill in the missing information using prior knowledge.

It is well-known that SR involves three tasks: upsampling, deconvolution, and denoising. The upsampling task uses the input LR image data to form the target HR image. However, the upsampled image appears blurry because the upsampling step does not account for the optical anti-aliasing employed by cameras. This anti-aliasing is implemented via an optical low-pass filter: an optical element such as a diffuser is introduced on the optical path. This filter slightly blurs the signal incident on the camera’s image sensor and suppresses the spatial frequencies above the Nyquist limit. Another reason for blur is that the camera optics may not be perfectly focused at the imaged target.
The deconvolution step accounts for this anti-aliasing blur when restoring the high spatial frequencies in the captured image. Since these high spatial frequencies have been suppressed during capture, the signal-to-noise ratio tends to be poor in the high spatial frequencies, and as a result the deconvolution step ends up enhancing the high-spatial-frequency noise. A denoising task can reduce this noise. While these three tasks can be applied successively to solve the SR problem, a better alternative is a global optimization approach that addresses all three aspects at the same time.

We argue that an edge-aware anisotropic filtering component is desirable for SR. Many existing SR techniques use an isotropic upsampling component, such as a bilinear or bicubic resampler. However, since isotropic upsampling is essentially a convolution with a sampling kernel, such techniques introduce additional blur to the upsampled image on top of the anti-aliasing blur already present. This additional blur makes it harder to deconvolve the image. In contrast, an edge-aware anisotropic filter can reduce the blurring of strong edges in the image, which makes SR a better-posed problem.

The strength of our proposed method lies in its use of a prior that assumes that image edges are smooth along their contours. We call this prior the smooth contour prior. This prior is fast to compute. We employ it in conjunction with the widely used sparse gradient prior, which is applicable to many inverse problems in imaging, including SR. The sparse gradient prior assumes that natural images are piecewise smooth and thus prefers sharp spatial edges. However, it does not explicitly model the edges as smooth (not jaggy) along their contours. We find that our proposed smooth contour prior complements the sparse gradient prior and helps to reconstruct the unknown pixel values by interpolating along the contours of strong spatial edges. This results in a strong combined prior.
As a result, edges reconstructed with our proposed method are both sharp and smooth along their contours.

The proposed smooth contour prior uses an edge-directed interpolation operator as the main building block. While any edge-aware anisotropic filter would do, we propose a new method which improves New Edge Directed Interpolation (NEDI) [84] in terms of speed and stability. This improvement directly translates into a reduction of the time complexity of our proposed smooth contour prior. Since edge-preserving methods estimate scene edge directions explicitly or implicitly at every pixel, these methods use sliding windows, i.e., they process a window around every pixel. Larger windows provide more accurate estimates but generally require more processing. The implementation of our proposed fast edge-directed interpolation (EDI) method (and therefore the evaluation of our proposed smooth contour prior) has a time complexity that is linear in the number of image pixels and is independent of the window size.

We formulate the SR problem as a convex optimization problem, and for the solution we develop an efficient algorithm based on the primal-dual optimization framework of Chambolle and Pock [18]. Primal-dual convex optimization methods are not only efficient, but are also easy to implement and have good convergence properties. We demonstrate the performance of our algorithm on a number of images.
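
The window-size independence claimed for the fast EDI operator rests on the summed column table and the sliding recurrence of (5.22)-(5.25). The following Python sketch illustrates that idea only; it is not the thesis's MEX implementation. It assumes that the per-pixel products $\hat f(i(q))\,\hat f(q)$ have already been computed into a 2-D list `prod`, that `known` marks the pixels in Γ, and that out-of-image rows and columns contribute zero. The names `summed_column_table`, `window_sums`, `prod` and `known` are hypothetical.

```python
def summed_column_table(prod, known):
    """t[r][c] = sum of prod[j][c] over rows j <= r with known[j][c]; cf. (5.22)-(5.23)."""
    h, w = len(prod), len(prod[0])
    t = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            above = t[r - 1][c] if r > 0 else 0
            t[r][c] = above + (prod[r][c] if known[r][c] else 0)
    return t


def window_sums(prod, known, a, b):
    """Sum of known products over a (2a+1) x (2b+1) window around each pixel; cf. (5.24)-(5.25)."""
    h, w = len(prod), len(prod[0])
    t = summed_column_table(prod, known)

    def column(r, c):
        # Sum of column c over rows r-a .. r+a, clipped to the image (two table look-ups).
        if c < 0 or c >= w:
            return 0
        bottom = t[min(r + a, h - 1)][c]
        top = t[r - a - 1][c] if r - a - 1 >= 0 else 0
        return bottom - top

    out = [[0] * w for _ in range(h)]
    for r in range(h):
        acc = 0  # window centred at column -b-1 lies entirely outside the image
        for c in range(-b, w):
            # Slide one column to the right: add the entering column, drop the leaving one.
            acc += column(r, c + b) - column(r, c - b - 1)
            if c >= 0:
                out[r][c] = acc
    return out
```

Each slide of the window costs four table look-ups regardless of `a` and `b`, which is the property that (5.25) exploits; the sketch spends `b` extra warm-up steps per row to avoid special-casing the left border.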
To summarize, our contributions are:

• A smooth contour prior which enforces smoothness along the contours of the image edges and is complementary to the sparse gradient prior.

• A fast edge-directed interpolation operator which applies the smooth contour prior with a time complexity that is linear in the number of image pixels and independent of the window size (Figure 5.8).

• A primal-dual optimization algorithm incorporating the smooth contour prior; this algorithm produces results that are better than the state-of-the-art conventional methods and on par with the very recent methods based on convolutional neural networks.

Figure 5.8: A comparison of results and timings of our SR approach with the sparse gradient prior, the smooth contour prior, and the combination of both priors. The SR reconstruction with the well-known sparse gradient prior looks jaggy and over-sharpened. Our fast edge-directed interpolation (EDI) algorithm can compute an HR image very fast (∼0.2 sec) but shows a few artifacts. Our proposed combination of the two produces results with the highest peak signal-to-noise ratio (PSNR), with a small time overhead added to the sparse gradient prior reconstruction. PSNR in dB and computation times in seconds are shown.

5.6.3.1 Previous work on image super-resolution (SR)

Classical super-resolution (SR) approaches such as the bilinear and bicubic [73] resampling methods reconstruct the unobserved pixels via isotropic interpolation, and as a result these methods produce blurry high-resolution (HR) edges. More recent edge-aware anisotropic filtering approaches perform interpolation along spatial edges so that the strong edges do not appear blurry in the HR image. Wang and Ward [140] used an explicit per-pixel estimation of the angle of the local isophotes, i.e., the equi-intensity contours. In contrast, other techniques such as the edge-directed interpolation techniques [84, 129] make statistical estimations of the dominant directions of spatial edges in the neighborhood.
However, since these anisotropic interpolation methods do not account for the spatial band limit of the low-resolution (LR) image signal, the resulting images appear slightly out of focus.

Edges provide strong visual cues; it is therefore important that SR methods restore the sharpness of edges. Most approaches focus on the edge structure "across" natural image edges (i.e., not "along" their natural contours), and as a result they perform the edge-aware reconstruction only indirectly. Farsiu et al. [35] combined the bilateral filter with the sparse gradient prior. Recently, Venkataraman et al. [139] used the bilateral filter to regularize their multiview SR algorithm. Markov random fields (MRF) are also known to preserve edges in reconstruction [107, 144]. Dai et al. [23] used a soft edge prior for alpha-matte SR. Their method obtains soft edge reconstructions but gives up sharpness to obtain edge smoothness along the alpha matte cut-edges. In very specific cases, such as building facades and other man-made objects showing a repetitive structure, Fernandez-Granda and Candes [38] apply a global transform to axis-align all scene edges so that the sparse gradient prior can be applied without producing jaggies. This method is not applicable to natural scenes with scene edges in random configurations; however, it asserts that the detection of dominant edge directions can help the sparse gradient prior, which is otherwise oblivious to the edge directions. This further justifies our claim that our proposed smooth contour prior works in a complementary fashion to the sparse gradient prior.

Example-based, or more generally learning-based, methods build an implicit prior knowledge base from preprocessed training examples. These methods aim at learning image-patch-based SR rules. Early work such as that by Freeman et al. [44] used nearest-neighbor search for looking up similar LR-HR examples for reconstruction. Later work aimed at leveraging various sources of sparsity in the data.
Methods such as those by Yang et al. [146, 147] and He et al. [58] employ sparse coding and simultaneous LR-HR dictionary learning. More recent related work investigated improved nearest-neighbor strategies, and machine learning (ML) techniques in general [74, 77]. The performance of such ML-based techniques will always depend on the training on the previously seen or learned examples. Recently, Zhu et al. [153] at least partly overcame this limitation by introducing a deformation model that allows patches to be deformed so that the learned dictionary can be more expressive. Since a map from HR patches to LR patches is many-to-one, a successful ML technique still requires local image priors to avoid high-frequency artifacts. SR methods using deep convolutional neural networks [30, 133] are in some cases able to produce results that are a little better than ours, but they require a huge amount of data for training, as opposed to our method, which requires no training and is therefore applicable to problems where training data is hard to obtain.

Figure 5.9: A demonstration of the combined effect of the smooth contour prior and the sparse gradient prior with a Siemens star chart. Enlarged parts of each image are shown in the insets. Bicubic [73] upsampled edges show jaggy artifacts that are missing in the edge-directed interpolation (EDI)-upsampled image. As EDI only performs upsampling and not deconvolution, its resulting edges are blurry, as seen by comparing the orange insets. Reconstruction with the sparse gradient prior restores the sharp edges comparably to the ground truth. However, the bottom-left corner of the green insets demonstrates that the sparse gradient reconstruction fails to restore high spatial frequencies around the center of the star chart, and this is only as good as the bicubic upsampling result slightly sharpened. On the other hand, EDI can infer higher frequencies because the interpolation is edge-aware.
It is evident from the insets that our method combines the strengths of both approaches discussed above.

Recently, non-local self-similarity has been proven to be a powerful natural image prior for denoising [22] and similar image reconstruction problems [59]. A few recent works have leveraged local self-examples as the source of the training data for SR. Glasner et al. [47] utilized cross-scale self-similarities on an image pyramid. Zhang et al. [151] combined non-local means (NLM) and steerable kernels in order to leverage both non-local self-similarity and edge preservation. The method by He et al. [57] can be loosely described as a non-local version of EDI followed by a deconvolution: they perform a Gaussian process regression to mine the structural similarities across 3 × 3 patches. However, for SR, local methods are expected to perform just as well as the more expensive non-local methods: Freedman and Fattal [43] and more recently Yang et al. [148] have shown that self-examples from the exact same location of an image patch but from a different scale of the same image can produce plausible SR reconstructions. An intuition behind this finding is that, while non-local approaches can potentially strengthen the denoising component of an SR solution, an edge-directed upsampling component can reduce noise at the source by avoiding additional blurring due to isotropic resampling. This has motivated the proposed smooth contour prior with the proposed fast EDI.

5.6.3.2 Single-image super-resolution

We propose a new method for super-resolution (SR). Our method takes a low-resolution (LR) image as input and produces a high-resolution (HR) image. In the following discussion, we first describe the forward model: how an LR image is formed from the latent HR image. We invert the forward model in order to find the unobserved HR image from the input LR image. This inverse problem, in the maximum-a-posteriori (MAP) sense, becomes a convex optimization problem.
We solve this convex optimization problem with the primal-dual method [18]. This inverse problem is underdetermined, and we make it well-posed by using two complementary priors: the well-known sparse gradient prior, and a novel image prior that we propose, namely, the smooth contour prior. We discuss the intuition behind this proposed smooth contour prior. At the core of this novel prior is our proposed fast edge-directed interpolation (EDI) algorithm, which we develop later in this section.

Let $f$ denote the unknown latent HR image, defined over the set of pixels whose locations span Ω ≡ [1, h] × [1, w]. Let $g$ denote the (partially observed) LR image, where the observed pixels are centered at image locations Γ ⊂ Ω.

We follow the standard LR image formation model, e.g., by Ng and Bose [104]. We assume that the observed image has been properly anti-aliased before it was captured, i.e., the spatial signal of the observed image has been band-limited via a known low-pass filter $\rho$ so as to reduce aliasing in the captured data. We also assume that the observed image has been corrupted by independent and identically distributed (i.i.d.) additive white Gaussian noise (AWGN) $\eta$,

$$g = (\rho \otimes f)|_\Gamma + \eta, \tag{5.26}$$

where $\otimes$ denotes two-dimensional convolution, and $\cdot|_\Gamma$ denotes the restriction operator that selects the subset Γ.

The forward model (5.26) above describes the relationship between the LR image $g$ and the latent HR image $f$. In order to estimate the latent HR image, we solve (5.26) for $f$. Since the forward model comprises convolution, downsampling, and corruption by noise, the inverse problem involves denoising, upsampling, and deconvolution.

The MAP estimate of $f$ can be obtained via the minimization of an energy function composed of two parts: an ill-posed data-fitting term corresponding to the forward model, and prior terms for well-posedness.
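
To make the forward model (5.26) concrete, the sketch below simulates it in plain Python. It is an illustration under stated assumptions rather than the thesis's code: Γ is taken to be every `step`-th pixel in both directions, the image border is zero-padded, and the names `convolve_same` and `forward_model` are hypothetical.

```python
import random


def convolve_same(img, kernel):
    """Zero-padded 'same'-size 2-D convolution of a list-of-lists image."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + ch - i, c + cw - j  # flipped kernel index = convolution
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * img[rr][cc]
            out[r][c] = acc
    return out


def forward_model(f, rho, step=2, noise_std=0.0, rng=None):
    """Simulate g = (rho conv f)|_Gamma + eta, with Gamma assumed to be a regular lattice."""
    rng = rng or random.Random(0)
    blurred = convolve_same(f, rho)  # band-limit with the low-pass filter rho
    g = []
    for r in range(0, len(f), step):  # restriction to Gamma
        row = []
        for c in range(0, len(f[0]), step):
            noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0  # AWGN eta
            row.append(blurred[r][c] + noise)
        g.append(row)
    return g
```

Inverting this chain (denoise, upsample, deconvolve) is what the energy minimization in this section is designed to do jointly.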
The data-fitting term follows directly from (5.26), while for prior terms we apply the sparse gradient prior and our proposed smooth contour prior,

$$\arg\min_f \; \underbrace{\|g - (\rho \otimes f)|_\Gamma\|^2}_{\text{data-fitting}} + \underbrace{\lambda_{TV} \|\nabla f\|_{TV}}_{\text{sparse gradient}} + \underbrace{\frac{\lambda_E}{2} \|f - E(f|_\Gamma)\|^2}_{\text{smooth contour}}, \tag{5.27}$$

where $\lambda_{TV}$ and $\lambda_E$ are regularization weights, $\|\cdot\|_{TV}$ is the total variation (TV) norm, and $E : \Gamma \to \Omega$ is an edge-aware anisotropic filter.

5.6.3.3 Global optimization for reconstruction

We use the primal-dual method [18] to solve the convex optimization problem (5.27). The primal-dual method is more efficient in optimizing for the various norms involved. This iterative method uses general projection operators, known as the proximity operators and denoted by "prox", to iteratively converge to the optimum solution. In order to derive the primal-dual algorithm, we first rewrite our problem in the primal-dual form. We then develop the corresponding algorithm from [18] and derive the proximity operators employed by this algorithm.

We first rewrite our convex optimization problem (5.27) as

$$\min_f \; G(f) + F_{TV}(K_{TV} f) + F_E(K_E f), \tag{5.28}$$

where the data-fitting term is captured by $G(\cdot)$ defined as

$$G(f) \equiv \|g - (\rho \otimes f)|_\Gamma\|^2, \tag{5.29}$$

$K_{TV}$ and $K_E$ are linear transformations from the primal (image) domain to the respective dual domains defined as

$$K_{TV} f \equiv \lambda_{TV} \nabla f \tag{5.30}$$

$$K_E f \equiv f - E(f|_\Gamma), \tag{5.31}$$

and $F_{TV}$ and $F_E$ are functions defined as

$$F_{TV} \equiv \|\cdot\|_{TV} \tag{5.32}$$

$$F_E \equiv \frac{\lambda_E}{2} \|\cdot\|^2. \tag{5.33}$$

Although (5.28) is convex, solving it directly is not easy because of the mixed $\ell_1$ and $\ell_2$ norms. Instead, we solve the corresponding primal-dual saddle point problem, which is an equivalent problem and has the same optimum solution,

$$\min_f \; G(f) + \max_{y_{TV}} \langle K_{TV} f, y_{TV} \rangle - F^*_{TV}(y_{TV}) + \max_{y_E} \langle K_E f, y_E \rangle - F^*_E(y_E), \tag{5.34}$$

where $\cdot^*$ denotes the convex conjugate of a function, and $y_{TV}$ and $y_E$ are slack variables defined over the respective dual domains.
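
As a worked step, the passage from (5.28) to (5.34) can be sketched with the standard Fenchel conjugate identity for closed convex functions. This is a textbook derivation added for illustration; it assumes that $\|\cdot\|_{TV}$ acts on the dual variable as an $\ell_1$-type norm of the gradient field, with $|y(p)|$ denoting the magnitude at pixel $p$.

$$F(Kf) = \max_y \; \langle Kf, y \rangle - F^*(y), \qquad F^*(y) \equiv \sup_z \; \langle z, y \rangle - F(z).$$

For the quadratic term, the supremum is attained at $z = y / \lambda_E$, which gives

$$F^*_E(y) = \sup_z \; \langle z, y \rangle - \frac{\lambda_E}{2} \|z\|^2 = \frac{1}{2 \lambda_E} \|y\|^2,$$

and for the norm term the conjugate is the indicator function of the unit ball,

$$F^*_{TV}(y) = \begin{cases} 0, & \text{if } |y(p)| \le 1 \text{ for all } p \\ +\infty, & \text{otherwise.} \end{cases}$$

Applying the identity to $F_{TV}(K_{TV} f)$ and $F_E(K_E f)$ separately in (5.28) yields the two inner maximizations of (5.34). Because $F^*_{TV}$ is an indicator function, its proximity operator is a projection onto the unit ball, and because $F^*_E$ is quadratic, its proximity operator is a simple scaling.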
In order to solve (5.34), we have developed an algorithm, based on [18], which is presented in Algorithm 5.2.

Algorithm 5.2 SR using the smooth contour prior
Require: $\sigma > 0$, $\tau > 0$ and $\theta$. $\bar f^{(0)} = f^{(0)} = E(g)$, $y_{TV}^{(0)} = 0$, $y_E^{(0)} = 0$.
1: repeat
2:   $y_{TV}^{(k+1)} \leftarrow \mathrm{prox}_{\sigma F^*_{TV}}\left(y_{TV}^{(k)} + \sigma K_{TV} \bar f^{(k)}\right)$
3:   $y_E^{(k+1)} \leftarrow \mathrm{prox}_{\sigma F^*_E}\left(y_E^{(k)} + \sigma K_E \bar f^{(k)}\right)$
4:   $f^{(k+1)} \leftarrow \mathrm{prox}_{\tau G}\left(f^{(k)} - \tau \left(K_{TV}^T y_{TV}^{(k+1)} + K_E^T y_E^{(k+1)}\right)\right)$
5:   $\bar f^{(k+1)} \leftarrow f^{(k+1)} + \theta \left(f^{(k+1)} - f^{(k)}\right)$
6: until convergence

In Algorithm 5.2, $\sigma > 0$, $\tau > 0$ and $\theta$ are parameters of the algorithm that determine the step sizes for the iterates. Convergence is guaranteed [18] when: i) $0 < \theta < 1$, and ii) $\sigma \tau L^2 < 1$, where $L$ is the operator norm of the combined primal-to-dual linear map, i.e.,

$$L = \left\| \begin{bmatrix} K_{TV} \\ K_E \end{bmatrix} \right\|_{op}$$

(here, $\|\cdot\|_{op}$ denotes the operator norm). The proximity operator (prox) represents a generalized projection [18] onto a feasible set. The three proximity operators used in our algorithm are discussed below:

a) $\mathrm{prox}_{\tau G}$, the proximity operator of the data-fitting term, follows directly from the definition of proximity operators,

$$\mathrm{prox}_{\tau G}(f_0) = \arg\min_f \; \frac{\|f - f_0\|^2}{2\tau} + G(f) \tag{5.35}$$

$$= \arg\min_f \; \frac{\|f - f_0\|^2}{2\tau} + \|g - (\rho \otimes f)|_\Gamma\|^2. \tag{5.36}$$

This is a linear least-squares minimization problem, which we solve using the conjugate gradient method.

b) $\mathrm{prox}_{\sigma F^*_{TV}}$, the proximity operator of the TV term, reduces to pointwise shrinkage [18],

$$\mathrm{prox}_{\sigma F^*_{TV}}(y_0) = \frac{y_0}{\max(1, |y_0|)}. \tag{5.37}$$

c) $\mathrm{prox}_{\sigma F^*_E}$, the proximity operator of the convex conjugate function $F^*_E$, can be derived using Moreau's Identity [18]. Moreau's Identity relates the proximity operator of a convex conjugate function (e.g., $F^*_E$) to the proximity operator of the original function (e.g., $F_E$), and we get $\mathrm{prox}_{\sigma F^*_E}(\cdot)$ in terms of $\mathrm{prox}_{1/\sigma\, F_E}(\cdot)$,

$$\mathrm{prox}_{\sigma F^*_E}(y_0) \equiv y_0 - \sigma\, \mathrm{prox}_{1/\sigma\, F_E}\left(\frac{y_0}{\sigma}\right), \tag{5.38}$$

where the proximity operator of the original function, $\mathrm{prox}_{F_E}$, follows directly from the definition, and we get

$$\mathrm{prox}_{\sigma F^*_E}(y_0) \equiv y_0 - \sigma \left( \arg\min_y \; \frac{\sigma}{2} \left\| y - \frac{y_0}{\sigma} \right\|^2 + \frac{\lambda_E}{2} \|y\|^2 \right) = \frac{\lambda_E}{\sigma + \lambda_E}\, y_0. \tag{5.39}$$

5.6.3.4 Super-resolution (SR) results

Implementation details.
We have used 13 × 13 windows for estimating per-pixel edge-directed interpolation (EDI) weights. This EDI method performs 2 × 2 upsampling. For other SR factors we re-apply our method: for 4 × 4 SR, we apply our algorithm twice; for 3 × 3 SR, we compute the 4 × 4 super-resolved image and bicubic-downsample it by a factor of 3/4. The default parameter values in our implementation are: $\mu = 0.001$, $\sigma = 0.6$, $\theta = 0.9$ and $\lambda_{TV} = \lambda_E = 0.0025$.

[Figure 5.10 panels: (a) 2x2 and (b) 4x4, each showing ground truth, ours, [77]* and [57], with insets (a-1), (a-2), (b-1), (b-2). PSNR/SSIM values as extracted: 35.13/0.9928, 34.97/0.9961, 32.25/0.9858; 30.90/0.9088, 29.83/0.8820, 30.62/0.9063.]

Figure 5.10: A few dyadic super-resolution (SR) experiments with our algorithm. Each row represents one data set. In each set, the first image on the left is the ground truth (GT), the second image is our SR result, and the third and the fourth images are results from baseline methods as cited below the image. The SR factor is listed under the ground truth image. Peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are listed under each image; the best numbers are shown in bold. Images marked with a '*' are courtesy of the respective sources. Insets on the right are presented in the same order as the full-size images. Insets are enlarged by a factor of 4 with point sampling.

We have implemented EDI as a Matlab executable (MEX-file).
It takes about 0.24 seconds to compute a 512 × 512 upsampled image from an input 256 × 256 image. This running time scales linearly with the number of pixels, as expected. We have implemented the primal-dual SR algorithm in Matlab. In most cases, for a 512 × 512 high-resolution (HR) target image, our Matlab implementation takes less than one minute to reach within 0.2 dB of the final solution, and takes less than three minutes to converge to the final solution. The most expensive operation is the computation of $\mathrm{prox}_{\tau G}$, the proximity operator of the data-fitting term. The experiments were run on an Intel Core i7 1.9 GHz machine.

[Figure 5.10 (continued) panels: (c) 4x4 showing ground truth, ours, [148]* and [43]*, insets (c-1), (c-2), values as extracted 31.49/0.9384, 27.49/0.8911, 31.34/0.9389; (d) 4x4 showing ground truth, ours, [30]* and [133]*, insets (d-1), (d-2), values as extracted 33.12/0.9504, 33.28/0.9513, 32.95/0.9442; (e) 2x2 showing ground truth, ours, [147] and [58]*, insets (e-1), (e-2), values as extracted 26.54/0.8446, 26.29/0.8428, 26.42/0.8438.]

Figure 5.10: A few dyadic super-resolution (SR) experiments with our algorithm (continued).

[Figure 5.11 panels: (a) 3x3 and (b) 3x3, showing ground truth, ours, [23]*, [84], [146] and [77]*, with insets (a-1), (a-2), (b-1), (b-2). Values as extracted: 26.07/0.9552, 26.43/0.9512, 26.12/0.9473; 24.17/92.18, 23.93/91.22, 23.28/90.41.]

Figure 5.11: A few nondyadic SR experiments with our algorithm. Nondyadic SR ratios are not natively supported by our method but can be implemented by downsampling from SR with the smallest larger dyadic ratio. This comes with the potential cost of a small performance loss, which is why in (b) above our method falls a little behind the baseline methods despite producing plausible results. (The organization of the images and insets is similar to Figure 5.10.)

Experiments and analysis. In order to validate the performance of our SR method, we have run our algorithm on a number of test images and compared our results with several state-of-the-art methods. Figure 5.10 shows a few results of dyadic SR, with two baseline results per test case. Figure 5.11 shows a few results on nondyadic SR.
The supplemental material may be consulted for full-resolution images and additional results and comparisons.

Our method takes a reconstruction approach to SR, and therefore it does not produce high-frequency artifacts such as those produced by nearest-neighbor search-based methods, e.g., [57] shown in Figure 5.10(a-1).

Methods based on convolutional neural networks such as [30, 133] produce slightly better results, although our result can reproduce details better in some cases, as shown in Figure 5.10(d-1).

Since our method is EDI-based, jaggies are easy to avoid, particularly when a dominant edge is present. This is evident from Figure 5.11(a). SoftCuts [23] has produced noticeable jaggies in Figure 5.11(a-1) and Figure 5.11(a-2), whereas both NEDI [84] and our method produced straight edge contours.

When compared to methods based on self-examples, in Figure 5.10(c), we see that all of [148], [43], and our method have been able to restore the strong edges, e.g., the outline of the face. The strength of the non-local methods is evident from the repetitive texture areas, such as the section of the woolen hat shown in inset (c-2). However, the dependence on self-similarity is a weakness as well; [43] has turned the eyelashes into sharp features on the eyelids in inset (c-1), whereas our method produces a plausible reconstruction free of such sharp artifacts. Our method also reconstructed the details of the eye and in the pupil better in (c-1).

Our method produces results that are slightly better than the state-of-the-art method due to Kwon et al. [77]. We present three comparisons with [77]: two dyadic cases in Figure 5.10, test cases (a) and (b), and a nondyadic case in Figure 5.11(b). For test cases 5.10(a) and 5.10(b), our result has better PSNR and/or SSIM, although it is hard to visually identify much difference from the results presented in [77]. In test case 5.10(e), our method also outperforms SR methods that are based on joint LR-HR dictionary learning [58, 146].
Since many HR image patches can explain a low-resolution (LR) image patch, strong local image priors such as our proposed combination of priors are needed for SR reconstruction.

Limitations. In case of more than one locally dominant edge or no dominant edge at all, e.g., a fine texture such as foliage or fur, our algorithm might not accurately estimate the anisotropic interpolation coefficients for upsampling, as shown in Figure 5.10(c-2). The smooth contour prior falls back to bilinear interpolation to avoid discontinuity artifacts, and our method essentially becomes a sparse gradient-based reconstruction method in these image areas.

An SR factor larger than 2 × 2 is not natively supported by our method, since EDI is designed for upsampling by 2 × 2. Other dyadic factors are also possible via successive application of our method, but nondyadic factors such as 3 × 3 involve a bicubic downsampling step in which our result may lose sharpness. Our 3 × 3 reconstruction in Figure 5.11(b) suffers a small 0.24 dB dip in PSNR, since we obtain the final result by downsampling our dyadic 4 × 4 result.

5.7 Adapting the smooth contour prior for single-image high dynamic range imaging (HDRI)

Now we consider the application of the smooth contour prior in single-image high dynamic range imaging (HDRI). Since super-resolution (SR) and single-image HDRI have certain commonalities, it is possible to adapt the smooth contour prior.

Figure 5.12 shows a high-level description of the proposed method. We start by obtaining an exposure-multiplexed RAW image from a camera. We then perform the full-resolution HDR reconstruction in two logical steps:

[Figure 5.12 panels (a)-(i), with labels: exposure-multiplexed; our restoration; edges; restored Bayer image; super-resolve [120]; full resolution HDR green channel; full resolution red and blue; full res HDR reconstruction.]

Figure 5.12: Flowchart of our proposed exposure-multiplexed HDRI method.
(a) The input exposure-multiplexed image has no data in many pixels due to clipping or noise (marked with black). We first restore the missing pixel values in the green channel (b) using our algorithm and obtain all green pixel values (d). We use the edge structures from this restored green channel to calculate the smooth contour prior's local anisotropic filters and use those to reconstruct the red and blue channels (c). The resulting image (e) contains all red and blue pixel information on the Bayer image. (d) and (e) combined constitute the fully restored Bayer image (f). Going from this image to the full-resolution image is essentially a demosaicking problem, and we use a modification of the SR method from Section 5.6.3 for this purpose. As in the restoration step, we first obtain the full-resolution green channel (g), then we use the edge structure in this channel to guide the reconstruction of the other two channels (h). This gives us the reconstructed full-resolution HDR image (i).

a) We reconstruct the pixels that are missing because they are over-exposed or under-exposed. After this step we have the full-resolution Bayer image.

b) We demosaic the Bayer image [14] to obtain the full-resolution RGB HDR image.

Although (5.6) is convex, solving it directly is not easy because of the mixed $\ell_1$ and $\ell_2$ norms. Instead, we solve the equivalent saddle point form. First, let $G(\cdot)$ denote our data-fitting term,

$$G(f) \equiv \frac{1}{2\sigma^2} \|g_1 - M(\rho \otimes f)\|^2. \tag{5.40}$$

[Figure 5.13 panels: (a) Regular sampling; (b) Exposure-multiplexed sampling.]

Figure 5.13: Augmenting the smooth contour prior for single-image HDRI: choice of support for the smooth contour prior. Only the green channel is shown. Extending this idea to other channels is trivial. (a) Green channel pixels are organized on a diagonal grid, shown in green. In conventional imaging, the sensor has a uniform exposure setting, and the green channel gets sampled evenly both horizontally and vertically.
The support (the size of the local anisotropic filter kernel we estimate) uses a disk shape. The disk-shaped support on a grid turns into a 4-neighborhood on a grid. (b) In exposure-multiplexed imaging, the vertical sampling rate can be as low as half the horizontal sampling rate; the white pixels denote missing data. The anisotropic interpolation kernel we seek in this case needs to be twice as long in the vertical direction, to account for the half sampling rate. This way we are able to gather as much structural information as in the regular sampling case. Taking into account green pixel locations, this amounts to the 6-neighborhood on the right.

$K_{TV}$ and $K_E$ are linear transformations from the primal (image) domain to the respective dual domains defined as

$$K_{TV} f \equiv \lambda_{TV} \nabla f \tag{5.41}$$

$$K_E f \equiv f - E(f), \tag{5.42}$$

and $F_{TV}$ and $F_E$ are functions defined as

$$F_{TV} \equiv \|\cdot\|_{TV} \tag{5.43}$$

$$F_E \equiv \frac{\lambda_E}{2} \|\cdot\|^2. \tag{5.44}$$

Now, rewriting the convex optimization problem (5.6), we get

$$\arg\min_f \; G(f) + F_{TV}(K_{TV} f) + F_E(K_E f), \tag{5.45}$$

which is equivalent to solving the primal-dual form,

$$\arg\min_f \; G(f) + \arg\max_{y_{TV}} \langle K_{TV} f, y_{TV} \rangle - F^*_{TV}(y_{TV}) + \arg\max_{y_E} \langle K_E f, y_E \rangle - F^*_E(y_E), \tag{5.46}$$

where $\cdot^*$ denotes the convex conjugate of a function, and $y_{TV}$ and $y_E$ are slack variables defined over the respective dual domains. The algorithm we propose in this section directly follows from the method we developed in Section 5.6.3 earlier in this chapter. We use the primal-dual algorithm [18] to solve the saddle-point formulation (5.46) of our original inverse problem (5.6).

Algorithm 5.3 Exposure-multiplexed high dynamic range imaging (HDRI) using the smooth contour prior
Require: $\sigma > 0$, $\tau > 0$ and $\theta$.
$\bar f^{(0)} = f^{(0)} = E(g_1)$, $y_{TV}^{(0)} = 0$, $y_E^{(0)} = 0$.
1: repeat
2:   $y_{TV}^{(k+1)} \leftarrow \mathrm{prox}_{\sigma F^*_{TV}}\left(y_{TV}^{(k)} + \sigma K_{TV} \bar f^{(k)}\right)$
3:   $y_E^{(k+1)} \leftarrow \mathrm{prox}_{\sigma F^*_E}\left(y_E^{(k)} + \sigma K_E \bar f^{(k)}\right)$
4:   $f^{(k+1)} \leftarrow \mathrm{prox}_{\tau G}\left(f^{(k)} - \tau \left(K_{TV}^T y_{TV}^{(k+1)} + K_E^T y_E^{(k+1)}\right)\right)$
5:   $\bar f^{(k+1)} \leftarrow f^{(k+1)} + \theta \left(f^{(k+1)} - f^{(k)}\right)$
6: until convergence

The primal-dual convex optimization Algorithm 5.3 starts with some initial values for the unknown image $f$ and the dual-domain slack variables $y$. The algorithm then proceeds with a series of generalized projections alternating between the primal and dual domains. These generalized projections are called "proximity" operators and are denoted by prox. Please consult [18] for a detailed derivation of the method and convergence guarantees. Below we list the proximity operators we use in our algorithm, and derive the one related to the smooth contour prior:

a) The proximity operator of the data-fitting term follows directly from the definition of proximity operators,

$$\mathrm{prox}_{\tau G}(f_0) = \arg\min_f \; \frac{\|f - f_0\|^2}{2\tau} + G(f) \tag{5.47}$$

$$= \arg\min_f \; \frac{\|f - f_0\|^2}{2\tau} + \|g_1 - M(\rho \otimes f)\|^2. \tag{5.48}$$

This is a linear least-squares minimization problem, which we solve using the conjugate gradient method.

b) The proximity operator of the total variation term is pointwise shrinkage [18],

$$\mathrm{prox}_{\sigma F^*_{TV}}(y_0) = \frac{y_0}{\max(1, |y_0|)}. \tag{5.49}$$

c) The proximity operator of the convex conjugate function $F^*_E$ can be derived using Moreau's Identity [18]. Moreau's Identity relates the proximity operator of a convex conjugate function (e.g., $F^*_E$) to the proximity operator of the original function (e.g., $F_E$), and we get $\mathrm{prox}_{\sigma F^*_E}(\cdot)$ in terms of $\mathrm{prox}_{1/\sigma\, F_E}(\cdot)$,

$$\mathrm{prox}_{\sigma F^*_E}(y_0) \equiv y_0 - \sigma\, \mathrm{prox}_{1/\sigma\, F_E}\left(\frac{y_0}{\sigma}\right), \tag{5.50}$$

where the proximity operator of the original function, $\mathrm{prox}_{F_E}$, follows directly from the definition, and we get

$$\mathrm{prox}_{\sigma F^*_E}(y_0) \equiv y_0 - \sigma \left( \arg\min_y \; \frac{\sigma}{2} \left\| y - \frac{y_0}{\sigma} \right\|^2 + \frac{\lambda_E}{2} \|y\|^2 \right) = \frac{\lambda_E}{\sigma + \lambda_E}\, y_0. \tag{5.51}$$

We now present a discussion on our prior.

5.7.1 Modified smooth edge-guided interpolation

In this section, we adapt the super-resolution (SR) method from Section 5.6.3 for our problem.
That SR method can be described as a sliding-window blur kernel estimation followed by a re-application of these kernels in the super-resolved grid. Our image data sampling strategy here differs from the one used in SR, and consequently we modify the kernel estimation process below.

We note that our SR method in Section 5.6.3 uses a 4-neighborhood kernel. Over some sliding window $W$, usually of size 10 × 10 centered around each pixel, it estimates a nontrivial convolution kernel $\rho$ of size ≈ 2 × 2 that $W$ is locally invariant to, i.e.,

$$W \approx \rho \otimes W, \qquad \rho \neq I. \tag{5.52}$$

This means that the kernel $\rho$ describes the local dominant direction of smoothness inside the window $W$. The SR method then uses these estimated kernels for SR. In that case, image pixel data is sampled evenly along both dimensions on a regular grid, and as a result a 2D disk-shaped support logically boils down to the 4-neighbor support (Figure 5.13a).

[Figure 5.14 panels: (a-1), (b-1) exposure-indexed Bayer image; (a-2) SDR image; (b-2) SDR image, brightness reduced; (a-3), (b-3) HDR result, short exposure; (a-4), (b-4) HDR result, long exposure.]

Figure 5.14: Exposure-multiplexed single-image HDRI results. The grey image in the left column shows the input exposure-multiplexed Bayer image. For our HDR reconstruction, a short simulated exposure and a long simulated exposure are shown, to present the quality of our reconstruction in highlights and shades respectively (see Section 3.2 for a discussion on virtual exposures). Some details are shown in the blown-up insets.

For our problem, we use a 6-neighborhood instead. We observe that in the worst case, when the scene dynamic range is too wide, two very different EIs have to be used to capture as much of the scene information as possible. The two vastly different exposure settings will then capture two orthogonal sets of regions in the scene.
In this case up to half of the pixel data would be lost due to clipping or noise (e.g., Figure 5.2e). Let us denote the resulting vertical low-pass filter with Γ. Then the contents of a window with the same size in our case is $\Gamma \otimes W$. Since convolution is commutative and $\Gamma \otimes \Gamma = \Gamma$, we get from (5.52),

$$\Gamma \otimes W \approx (\Gamma \otimes \rho) \otimes (\Gamma \otimes W). \tag{5.53}$$

The resulting vertically low-passed kernel $\Gamma \otimes \rho$ has to have twice the size in the vertical dimension compared to $\rho$, i.e., of size ≈ 4 × 2, and this yields a 6-neighborhood as demonstrated in Figure 5.13b.

It may be noted that this change of kernel size does not change the other parts of the method proposed in Section 5.6.3, in particular, the speed-up.

[Figure 5.14 (continued) panels: (b-1) exposure-indexed Bayer image; (b-2) EDI only, brightness enhanced; (b-3) HDR result, short exposure; (b-4) HDR result, long exposure.]

Figure 5.14: Exposure-multiplexed single-image HDRI results (continued).

5.7.2 Results

We show a few results in Figure 5.14. For each test case, we show two simulated exposures of our reconstructed high dynamic range (HDR) image.

The simulated short exposure images show details in the bright areas. In Fig. 5.14a-3 and Fig. 5.14b-3, our method has fully restored details inside of the bright regions that a standard dynamic range (SDR) camera would fail to capture. For comparison, Fig. 5.14a-2 shows the SDR image in full brightness, while Fig. 5.14b-2 has its brightness matched with the simulated short exposure Fig. 5.14b-3.

The simulated long exposures demonstrate that the dark image areas have been restored relatively well. It may be noted that we chose parameters such that we retain most of the noise in the dark areas. This is because strong denoising can potentially remove detail as well as noise. This is why the dark areas in simulated long exposures in Fig. 5.14a-4 and Fig.
5.14b-4 appear noisy.

In the examples presented, for the exposure-multiplexed capture of the input images we used dual ISO settings {100, 800}, with an effective dynamic range (DR) gain of 8x compared to an SDR image.

5.8 Discussion: non-local self-similarity prior vs. smooth contour prior

The two methods we have explored in this chapter have pros and cons, and are suitable for different situations.

The first method uses the non-local self-similarity prior (Section 5.5). This method is comparatively slow, since it implements the non-local self-similarity aspect by running the denoising algorithm block-matching and 3D filtering (BM3D) [21, 24] once in every iteration of the iterative optimization method. For images as large as 20 megapixels, this method requires hours to converge. However, in many cases, particularly when imaging urban scenes with many straight edges and repetitive structure, this method produces more favorable results.

The second method uses the smooth contour prior (Section 5.7). This prior is very fast to evaluate, and often produces results that are comparable to those of the first method. This method can process 20-megapixel images within a few seconds, which makes it possible to apply the method when interactive rates are required.

5.9 Summary

In this chapter, we have explored one way to use sensitivity-multiplexing techniques for single-image high dynamic range imaging (HDRI). While there are other sensor-based filtering techniques available, most of them require a modified image sensor, and such custom sensors are inaccessible to the masses. Instead, we have focused on the exposure-multiplexed imaging technique, a sensitivity-multiplexing technology that is available on consumer cameras.

The exposure-multiplexed imaging technique relies on post-processing to obtain the full dynamic range (DR) of a single-image capture. 
The sensitivity-multiplexing “modifies” the DR information in the captured image, and post-processing restores this information to reconstruct the high dynamic range (HDR) image.

We have formulated the exposure-multiplexed single-image HDRI problem as an inverse image reconstruction problem. We observe that the inverse problem is underdetermined, and we employ natural image priors to make the problem well-posed. Then we solve this problem using a fast solver, namely the primal-dual convex optimization method.

We have also developed an image prior that is designed specifically for our problem. Our proposed prior, called the smooth contour prior, is fast to compute, works in conjunction with the popular sparse gradient prior, and produces results comparable or superior to those obtained using state-of-the-art techniques. This is a very strong prior, and we use it in Algorithm 5.2 for super-resolution (SR) and in Algorithm 5.3 for single-image HDRI.

In this chapter, we have demonstrated that our proposed sensitivity-multiplexing methods can be a part of any consumer camera with no hardware modification. Our methods require very little computational overhead, which we expect to be comparable to the complexity of the advanced denoising and demosaicking algorithms that are part of any consumer camera system today. By using the methods we propose in this chapter, consumer cameras can perform single-image HDRI more effectively and can produce images with a higher DR.

One extension of this method would be to perform a fast exposure-index (EI) selection. This can be achieved by modifying the EI set on the image sensor. As an image is read row by row for digitization, we can do the following: just after we have read one whole row, we can estimate the average brightness of that row, and based on this estimate we can select the best possible EI for digitizing the next two rows of pixels, and so on. 
However, both these approaches ignore a crucial property of natural images, which is that the spatial distribution of intensity is correlated across different color channels.

In the next chapter, we propose further methods that accept single color images as input and produce the corresponding HDR image.

Chapter 6
Cross-color-channel postprocessing for single-image dynamic range (DR) expansion

In a natural scene, illumination conditions can vary greatly. A faithful recording of scene details with such a wide intensity variation requires a high dynamic range (HDR) image. The vast majority of the digital images that have been captured have a standard dynamic range (SDR). For these images it is not possible to go back and upgrade the image acquisition hardware or to improve the image acquisition process. The only way to improve the fidelity, i.e., the dynamic range (DR), of these images is to expand the DR by post-processing them.

HDR images are not trivial to obtain because the DR of conventional cameras is lower than the DR required. As a result of this limitation, bright scene areas can over-expose a camera image sensor, leading to clipping of image pixel values and loss of scene details, while dark scene areas become underexposed and noisy, and lose details due to a poor signal-to-noise ratio.

Conventional cameras capture images with an SDR, that is, they fail to capture all the details in a scene that has high intensity contrast. For example, when an outdoor scene with bright sunlight and dark shades is captured by an SDR camera, one of two things happens. A longer exposure setting captures darker areas of a scene well, but bright parts of the scene appear washed out (“clipped”) due to oversaturation of the camera sensor. 
Alternatively, a shorter exposure setting favors the bright areas of the scene, but the dark or shaded parts of the scene will lack detail and appear black and noisy due to under-exposure.

Conventional multi-exposure high dynamic range imaging (HDRI) techniques combine multiple SDR images exposed with different camera and lens settings. However, situations such as using a hand-held camera or capturing a changing scene require a difficult spatiotemporal realignment of the SDR images before they are combined. Furthermore, multi-exposure techniques cannot be applied to the vast collection of SDR images we already have. To address these problems, we propose different methods that produce an HDR image from a single SDR exposure.

HDR imaging comes to the rescue; however, such a technique generally involves multiple exposures by an SDR camera [27, 55, 56, 90, 98]. For static scenes, the SDR images are averaged pixel by pixel. For dynamic scenes or camera motion, however, complex spatiotemporal alignment of the multi-exposure sequence is required. Rigid transformation methods can only handle camera shake, and only when there is no parallax. For full camera-object motion, optical flow [71] or patch-based methods are required [51, 62, 69, 123]. Instead of using multiple images, in this chapter we expand the DR of a single SDR image, i.e., the image acquisition process is not altered.

Based on how well they have been exposed, pixel data can be classified as highlights, midtones and shades. The highlights, or bright areas in an image, often have desaturated color due to clipping in one or more channels, i.e., partial saturation. Lowlights, or under-exposed areas, are dark and noisy. The midtones are well-exposed areas that have data in all three color channels. Given a single SDR photograph, we aim to recover as much additional information as possible. That is, our restoration of clipped highlights raises the clipping level and our denoising of lowlights lowers the noise level. 
This raising of the clipping “ceiling” and lowering of the noise “floor” expands the DR of the input SDR image.

In this chapter, we consider the problem of expanding the DR of single color images, i.e., images that have three color channels: red, green and blue. We propose three methods in this chapter, and all of these methods have a common underlying theme. We employ the image prior known as cross-color-channel correlation (Section 3.1.1). It states that the different color channels of a captured image tend to be spatially correlated with each other. In other words, natural images commonly consist of contiguous regions, each of which contains pixels of the same color.

The cross-color-channel correlation is also known as color constancy, or the color-line prior. It can be thought of as a generalization of the sparse gradient prior [80, 106]. The sparse gradient prior states that natural images tend to contain sparse sharp edges and smooth areas in between those sharp edges. For color images, the generalization of the sparse gradient prior is that edges represent sharp boundaries between scene parts with “smooth color” (i.e., the color is either uniform inside each scene part or varies only smoothly). Aside from the pixels that lie on these scene edges, the pixels of smooth patches should be locally correlated across channels: since edges are sparse, most of the image has to exhibit the cross-color-channel correlation property in a piecewise fashion.

In this chapter, we apply this observation to single-image DR expansion. We formulate solutions to improve the DR of existing SDR images. We propose three approaches specific to three cases of SDR images.

• We first consider a direct application of the cross-color-channel correlation property of natural images. In particular, we look at color desaturation: when some but not all of the color channels are saturated around a bright colored object in the scene. 
We use properly exposed channels to reconstruct the clipped channels and increase the DR as a result. We perform this information transfer in the gradient domain for a robust reconstruction of the desaturated area (Section 6.1).

• Secondly, we observe that for SDR images the color constancy approach can break down in the presence of texture. To mitigate this, we utilize another property that natural images exhibit, the non-local self-similarity property. This property assumes that small image patches tend to reappear multiple times in the same image. We use this repetition of image patches to identify good matches for the partially saturated image patches. By transferring information from a well-exposed image patch to a poorly exposed one, we can improve the DR of the resulting image (Section 6.2).

• Finally, we explore the effect of undersampling of chromatic channels in image sensors. We argue that, since high-frequency scene data is difficult to separate from high-frequency noise, image denoisers will remove some scene data when denoising the image. We propose to augment natural image denoisers so that we can add back this removed data. We perform an additional step in the denoiser where we analyze the noise that was separated out from an input image. Using cross-color-channel correlation, we identify the part of this noise that is most likely data, and add it back to the denoised image (Section 6.3).

6.1 Gradient domain color and clipping correction

In this section, we explore what can be done given a standard dynamic range (SDR) image with pixels clipped due to saturation. We observe that in the special case of color highlights, partial information about a highlight pixel might be available if one or two out of the three color channels are saturated and at least one is not saturated. 
In this section, we present a method that utilizes this partial information to restore the partially clipped color highlights.

Sensor clipping destroys the hue of colored highlight regions by misrepresenting the relative magnitude of the color channels. This becomes particularly noticeable in regions with brightly colored light sources or specular reflections. We present a simple yet effective color restoration algorithm in gradient space for recovering the hue in such image regions. First, we estimate a smooth distribution of the hue of the affected region from information at its boundary. We then combine this hue estimate with gradient information from the channels unaffected by clipping to restore the clipped color channels.

Figure 6.1: Color and clipping restoration in the gradient domain. Discoloration artifacts due to sensor clipping are evident in the input image: notice the color shift in the neon sign and the loss of detail in the curtain region. Our restoration process first estimates the hue of partially clipped pixels. Next, we combine the hue with the gradient field of the input image in order to compute the restored gradients. The weights indicate the level of confidence in each captured value in the red, green and blue channels; we have low confidence in values close to zero or one. An integration then gives the final result. Our enhancements to the neon sign and the top-right corner of the curtain, and the corresponding restoration in the gradient domain, are clearly evident. 
Images have been tone-mapped in a color-preserving way to help visualize the restored colors.

6.1.1 Restoration of Saturated Color

Colored light sources are ubiquitous in modern environments, with examples ranging from sodium street lights to neon signs, warning and exit signage, as well as colored LEDs used both as indicator lights and for architectural lighting. Photographing scenes with such colored lights is challenging, because the light sources themselves are often orders of magnitude brighter than reference white surfaces in the scene. Due to the SDR of image sensors, color channels are then clipped independently of each other, based on the color of the light.

[Figure 6.2 block labels: Input; mask; Clipping boundary; Hue estimation; Weights; Cross-channel detail transfer; Gradient smoothing; Restored image]
Figure 6.2: Flowchart of our method. We compute a gradient field $\nabla g$ given an input $g$. We estimate a smooth hue distribution $\rho$ over the clipped region from the observed hue at its boundary. Guided by $\rho$, we then estimate the unknown latent gradient field $\vec{f}$ with a weighted ($w$) combination of gradients from unclipped channels. It may be noted that we can only restore the original $\vec{f}$ at pixels where at least one channel remains unclipped. In regions with all channels clipped, we optionally smooth the gradients to avoid abrupt changes known as Mach bands [86]. All steps, including the final restoration of $f$, can be cast into simple Poisson equations.

In the case of colored light sources, the clipping alters not only the intensity of the affected image region but also its hue, which can affect the mood of the scene or make its rendition unrealistic. While standard multi-exposure high dynamic range imaging (HDRI) techniques such as the work by Debevec and Malik [26] can restore both the intensity and the hue, we aim at restoring just the correct hue of the clipped regions from a single photograph. 
Our goal is therefore not a new high dynamic range (HDR) capture technique or a heuristic for boosting the dynamic range (DR), but instead a simple yet effective method for restoring the colors of clipped regions, thereby generating a version of a traditional SDR image with improved color rendition.

Our method is based on the observation that, in many cases, colored lights may only result in clipping some of the color channels while leaving others unaffected. We can use this property to restore the hue and brightness of the clipped channels for certain types of scenes.

As an example, consider the neon sign depicted in Figure 6.1. From the reflection in the building facade we can infer that the sign itself emits red light, yet the neon tubes are depicted as yellow due to selective clipping of the red and green channels in the photograph. Our algorithm manages to reconstruct the correct color of the neon sign and to restore washed-out details in partially clipped regions such as the curtains (Figure 6.1).

Our method is based on a gradient domain approach. For each image region with at least one clipped color channel, we first estimate smooth hue distributions using data from pixels just outside the clipped region. We combine these hue estimates with gradient information from the color channels unaffected by clipping in order to estimate the gradients of the clipped color channels. We then restore the image by solving a Poisson problem. In an optional pre-processing step, we can also fill in a smooth gradient field in regions where all three color channels are clipped. Doing so avoids discontinuities in the gradient field and the resulting Mach bands at the transition from partially to fully clipped image regions.

Gradient domain image processing has become a powerful technique for image manipulation, starting with the work by Elder and Goldberg [34] on contour domain image editing and continuing with general formulations for image manipulation in the gradient domain [108]. Levin et al. 
[82] proposed a user-guided colorization of photographs using gradients, under the assumption that drastic color changes in natural images are usually correlated with strong edges in the grayscale input image. While our method falls within the scope of general gradient domain processing, to the best of our knowledge it is the first automatic method to use this tool for restoring clipped highlights.

6.1.2 Method

Our method for gradient reconstruction is based on three steps (see Figure 6.2): an (optional) pre-processing step that smoothly fills in gradients in image regions where all color channels are clipped (Section 6.1.2.4), smooth hue estimation for clipped pixels from information just outside the clipping region (Section 6.1.2.2), and finally, detail transfer from unclipped to clipped channels (Section 6.1.2.3). In partially clipped areas where at least one channel of the input image remains unsaturated, this approach recovers both the hue and the texture; in fully clipped regions we recover the hue only.

All three steps are performed in gradient space and can be reduced to simple gradient manipulations and a sequence of independent Poisson solutions. While this is a very simple algorithm, it has the advantage of being easy to implement, and we demonstrate that it is highly effective in producing high quality hue restorations.

6.1.2.1 Image formation model

Let $g$ be the captured SDR image and $f$ the unknown latent HDR image. Let $f_k(p)$ be the color channel $k \in \{R, G, B\}$ of the (unknown) latent image $f$ at position $p \in \mathbb{R}^2$. We seek to restore the latent image $f$, whose color channels are unaffected by clipping and correspond to the native color channels of the camera, i.e., the channels given directly by its color filters. If we assume a camera with an SDR of $(0, 1]$, the image captured by this camera is given by

\[ g \triangleq \min(1, f + \eta), \tag{6.1} \]

where $\eta$ represents a noise term.

We now define $\Omega_k = \{ p \in \mathbb{R}^2 : g_k(p) = 1 \}$ as the set of image positions $p$ where channel $k$ is clipped (Figure 6.3). 
Image regions with at least one clipped channel are denoted by $\Omega_\cup$ (these are “partially clipped” pixels), and regions with all channels clipped are denoted by $\Omega_\cap$:

\[ \Omega_\cup \equiv \bigcup_k \Omega_k \quad \text{and} \quad \Omega_\cap \equiv \bigcap_k \Omega_k. \tag{6.2} \]

Finally, we define $\partial\Omega_k$, $\partial\Omega_\cup$ and $\partial\Omega_\cap$ to be the boundaries around the corresponding sets.

Figure 6.3: Partial clipping resulting in color desaturation: different channels are clipped at different rates depending on the hue. Clipped regions of the red, green and blue color channels are denoted by $\Omega_R$, $\Omega_G$ and $\Omega_B$ respectively. $\Omega_\cup$ denotes pixels with any channel clipped (i.e., the union of the aforementioned three regions), while $\Omega_\cap$ denotes pixels with all channels clipped (i.e., their intersection). $\partial\Omega_\cup$, etc. denote the corresponding region boundaries. It may be noted that we have partial data in $\Omega_\cup \setminus \Omega_\cap$ and no data in $\Omega_\cap$; hence our algorithm reconstructs scene details of $f$ only in $\Omega_\cup \setminus \Omega_\cap$, and restores color in $\Omega_\cap$.

The fundamental assumption we make in our work is that the hue varies smoothly over $\Omega_\cup$ and can be estimated from the pixels on its boundary $\partial\Omega_\cup$. This assumption is valid for highlights generated by a single colored light source such as an LED or a neon sign, similar to the image in Figure 6.1. It is, however, violated for scenes such as sunsets, where the hue of the sky may not be independent of the luminance. In such scenes, estimating the hue based only on measurements that are dim enough to fall below the clipping threshold mis-estimates the hue and will not result in plausible reconstructions, as we demonstrate in Section 6.1.3.

6.1.2.2 Hue interpolation

First, we generate a smooth hue¹² estimate for all regions $\Omega_\cup$ containing at least one clipped channel. As mentioned above, we assume that the hue of this region can be interpolated from the (known) hue on its boundary.

In our implementation, the hue $\rho$ is represented as a multichannel image, with the same number of channels and the same color space as $g$ and $f$. 
We perform the interpolation by solving a Laplace equation over $\Omega_\cup$ with a Dirichlet boundary condition on $\partial\Omega_\cup$:

\[ \nabla^2 \rho = 0 \ \text{over } \Omega_\cup \quad \text{with} \quad \rho|_{\partial\Omega_\cup} = g|_{\partial\Omega_\cup}. \tag{6.3} \]

This is a standard Poisson problem that can be solved very efficiently. Although one could use more sophisticated inpainting techniques to produce more detailed hue maps, we found that our smooth hue estimates work well for a large range of images and are in fact more robust than, for example, the edge-stopping interpolation used by Masood et al. [95] (see the discussion in Section 6.1.3, Figure 6.7).

¹² For clarity of exposition, we use the term “hue”; to be exact, we are still processing 3-channel un-normalized pixel values.

Boundary cleanup. Image noise and sampling artifacts from single-chip cameras with color filter arrays (CFAs) such as Bayer patterns [14] can produce high-frequency hue variations on the boundary, which lead to distracting artifacts when they serve as the basis for hue estimation. In order to suppress these high-frequency variations, we apply a one-dimensional (1D) bilateral filter to $g|_{\partial\Omega_\cup}$ along the boundary.

The bilateral filter [135] is like a conditional Gaussian filter; the filter weight depends on the product of the similarity in value (“range”) and the similarity in space. Both of these differences are assumed to be Gaussian distributed with zero mean and variances of $\sigma^2_{\text{range}}$ and $\sigma^2_{\text{space}}$ respectively. Then, for some 1D signal $v : \mathbb{R} \to [0, 1]$:

\[ \text{Bilateral}(v(t)) = \frac{\sum_s v(s)\, w(t, s)}{\sum_s w(t, s)}, \tag{6.4} \]

where

\[ w(t, s) \equiv \Pr(s - t)\, \Pr(v(s) - v(t)) \tag{6.5} \]
\[ = \exp\left(-\frac{(s - t)^2}{2\sigma^2_{\text{space}}}\right) \exp\left(-\frac{(v(s) - v(t))^2}{2\sigma^2_{\text{range}}}\right). \tag{6.6} \]

In our experiments, we use a spatial (domain) sigma of $\sigma_{\text{space}} = 5$ pixels and a range sigma of $\sigma_{\text{range}} = 0.25$ (the clipping value is assumed to be 1). 
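
As a rough illustration of (6.4)-(6.6), the following pure-Python sketch filters a 1D signal sampled along a boundary. The function name `bilateral_1d`, the truncation of the spatial window at three standard deviations, and the treatment of the boundary samples as an open (non-periodic) sequence are assumptions made for this sketch; they are not taken from the implementation used for the results in this chapter.

```python
import math


def bilateral_1d(v, sigma_space=5.0, sigma_range=0.25, radius=None):
    """Bilateral-filter a 1D sequence v whose values are assumed to lie in [0, 1].

    Each output sample is a normalized weighted mean of its neighbors, with
    weight = spatial Gaussian * range Gaussian, as in (6.4)-(6.6).
    """
    if radius is None:
        # Assumption: truncate the spatial Gaussian at three sigma.
        radius = int(math.ceil(3 * sigma_space))
    n = len(v)
    out = []
    for t in range(n):
        num = 0.0
        den = 0.0
        # Assumption: open sequence, so the window is cut at the two ends.
        for s in range(max(0, t - radius), min(n, t + radius + 1)):
            w_space = math.exp(-((s - t) ** 2) / (2 * sigma_space ** 2))
            w_range = math.exp(-((v[s] - v[t]) ** 2) / (2 * sigma_range ** 2))
            num += w_space * w_range * v[s]
            den += w_space * w_range
        out.append(num / den)  # den >= 1 because the s == t term has weight 1
    return out


# Small ripples are smoothed, while a full-range step stays sharp.
ripple = [0.5 + 0.05 * (-1) ** i for i in range(40)]
step = [0.0] * 10 + [1.0] * 10
smoothed_ripple = bilateral_1d(ripple)
filtered_step = bilateral_1d(step)
```

With the default parameters, two samples that differ by the full range of 1 receive a range weight of exp(-8), which is why the step in the second example is left essentially untouched while the low-amplitude ripple in the first is averaged out.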
Where possible, we further suggest using simple linear interpolation for demosaicking the boundary pixels $\partial\Omega_\cup$, while more sophisticated methods can be used elsewhere in the image.

6.1.2.3 Cross-channel detail transfer and color restoration

In a second step, we combine the estimated hue with information from unclipped channels, where available, to estimate the pixel values of $f$. We first discuss the case of image regions where all channels $g_j$ except for $g_k = f_k$ are clipped. In this case, from the known values of channel $f_k$ and the estimated hue $\rho$ from (6.3), we obtain an estimate of the pixel values of the clipped channels:

\[ f_j = \frac{\rho_j}{\rho_k} f_k = \frac{\rho_j}{\rho_k} g_k. \tag{6.7} \]

Gradient domain formulation. The spatial reconstruction mentioned above works well when only one channel has unclipped image data. In regions where two channels provide valid data, the competing information must be reconciled with the hue estimates in a spatially smooth fashion (Figure 6.4). To this end, we first re-cast the problem as a gradient domain reconstruction.

Let $\nabla g$ be the gradient vector field of the captured image, and let $\vec{f} \approx \nabla f$ be the estimated gradient vector field of the latent image. The gradient domain version of (6.7) can be obtained by computing the gradient of both sides and assuming a locally constant hue $\rho$:

\[ \vec{f}_j = \frac{\rho_j}{\rho_k} \nabla g_k. \tag{6.8} \]

Given a gradient estimate, we recover each channel $f_k$ by solving a Poisson equation over the clipped region $\Omega_k$ of that channel, with a Dirichlet boundary condition on $\partial\Omega_k$:

\[ \nabla^2 f_k = \nabla \cdot \vec{f}_k \ \text{over } \Omega_k \quad \text{with} \quad f_k|_{\partial\Omega_k} = g_k|_{\partial\Omega_k}. \tag{6.9} \]

In fully clipped regions $\Omega_\cap$, where no scene detail is present in the captured image $g$, $\vec{f}$ will be (mis-)estimated as 0, and consequently the reconstructed image will be smooth but will contain the estimated hue. The transition from valid gradient data to smooth image regions may cause Mach bands. In Section 6.1.2.4 we describe a method for filling in smooth gradients before the detail transfer step to avoid this problem.

Multiple reference channels. 
If two or more channels remain unclipped, we have multiple, possibly conflicting sources of gradient information. In this case, we use a weighted combination of reference gradients,

\[ \vec{f}_j = \frac{\sum_{k \neq j} w_k \cdot \rho_j / \rho_k \cdot \nabla g_k}{\sum_{k \neq j} w_k}. \tag{6.10} \]

Since $\vec{f}$ is a combination of multiple gradient fields, it might not be integrable even though $\nabla g$ is. The Poisson system projects this estimated gradient field onto a feasible space. In order to choose an appropriate weighting function $w$ in (6.10) above, we observe that:

a) Weights should be proportional to the reliability of the gradients. In images exhibiting photon shot noise, smaller pixel values should have lower weights.

b) In order to avoid discontinuity artifacts like Mach bands at the border between regions with different numbers of clipped channels, the weighting function should have a smooth profile overall, including close-to-zero slopes near values 0 and 1.

[Figure 6.4 panels: Input (cropped); Saturation mask; Our result without gradient fill-in; Our result with gradient fill-in; Spatial restoration; [Masood et al. 2009]; [Dcraw]; [Zhang and Brainard]]
Figure 6.4: Advantages of our gradient domain method over spatial approaches. We show a particular case cropped out from Figure 6.8(f). In this case, the spatial approaches fail. In comparison, our gradient-based approach faithfully restores the color.

In consideration of these factors, we choose a piecewise cubic weighting function with an off-center peak $m \in [0, 1]$:

\[ w_k(p) = \begin{cases} 3\left(\dfrac{g_k(p)}{m}\right)^3 - 2\left(\dfrac{g_k(p)}{m}\right)^2 + \epsilon & \text{if } g_k(p) \le m, \\[2ex] 3\left(\dfrac{1 - g_k(p)}{1 - m}\right)^3 - 2\left(\dfrac{1 - g_k(p)}{1 - m}\right)^2 + \epsilon & \text{otherwise.} \end{cases} \tag{6.11} \]

$\epsilon$ is used to avoid zero weighting, which can cause division by zero. In our implementation, $m = 0.65$ and $\epsilon = 10^{-3}$.

Figures 6.4 and 6.7 show comparisons of our gradient-based method with a spatial reconstruction using the same channel weights, as well as several spatial methods. 
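
To make the weighted combination (6.10) concrete, the sketch below evaluates it at a single pixel. It is only an illustration: the function name `combined_gradient`, the use of dictionaries keyed by channel name, and the numeric values are hypothetical, and the weights are passed in as plain numbers rather than computed from (6.11).

```python
def combined_gradient(j, hue, grads, weights):
    """Estimate the gradient of clipped channel j at one pixel, following (6.10).

    hue:     dict channel -> estimated hue value rho_k at this pixel
    grads:   dict channel -> (gx, gy) gradient of the captured channel g_k
    weights: dict channel -> confidence weight w_k (assumed to be positive)
    """
    num_x = 0.0
    num_y = 0.0
    den = 0.0
    for k in hue:
        if k == j:
            continue  # the sum in (6.10) runs over the other channels only
        scale = weights[k] * hue[j] / hue[k]
        num_x += scale * grads[k][0]
        num_y += scale * grads[k][1]
        den += weights[k]
    return (num_x / den, num_y / den)


# Hypothetical pixel: red is clipped, green and blue are reference channels.
hue = {"R": 0.8, "G": 0.4, "B": 0.2}
grads = {"R": (0.0, 0.0), "G": (0.1, 0.2), "B": (0.1, 0.2)}
weights = {"R": 0.001, "G": 3.0, "B": 1.0}
red_gradient = combined_gradient("R", hue, grads, weights)
```

In this example the green gradient is scaled by 0.8/0.4 = 2 and the blue gradient by 0.8/0.2 = 4, and the two scaled gradients are averaged with weights 3 and 1. When the two reference channels agree after scaling, the result is the same for any choice of weights; the weights only matter when the references conflict.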
It may be noted that the methods that perform reconstructions in the spatial domain suffer from discontinuous changes in hue, while our gradient-based approach provides a smooth reconstruction.

6.1.2.4 Gradient smoothing for fully clipped regions

As mentioned above, gradient fields in fully clipped regions $\Omega_\cap$ are “flat-top” (i.e., zero) because all sensor values are clipped to the clipping threshold (Figure 6.5). Derivative discontinuities in the gradient field at the boundaries $\partial\Omega_\cap$ of these areas may become visible in the reconstruction results as Mach bands (Figure 6.4, row 1, column 3). To suppress such artifacts, we use an optional pre-processing stage, in which we generate gradients for one of the channels over $\Omega_\cap$. We only apply this method if there is one channel $k$ whose clipping region is completely contained within the clipping regions of the other channels, i.e., $\Omega_\cap = \Omega_k$.

[Figure 6.5 labels: $f_k$, $f^*_k$, $\hat{f}_k$, $\hat{f}^*_k$, $\hat{g}_k$, $\hat{g}^*_k$; panels: linear image; log image; log gradients]
Figure 6.5: Gaussian infilling. The input signal (blue) is clipped at the clipping threshold (green), resulting in a discontinuous gradient field and a flat-top structure in the fully clipped region $\Omega_\cap$ shown above. Log-space gradient interpolation (red) results in Gaussian infilling of clipped regions.

This smooth gradient infilling can again be cast as a set of two sequential Poisson problems with Dirichlet boundary conditions, this time in log space (quantities with a hat are computed on log images):

\[ \nabla^2 \log\vec{f}_k = 0 \ \text{over } \Omega_k \quad \text{with} \quad \log\vec{f}_k|_{\partial\Omega_k} = \nabla \log g_k|_{\partial\Omega_k}, \tag{6.12} \]
\[ \nabla^2 \log f_k = \nabla \cdot \log\vec{f}_k \ \text{over } \Omega_k \quad \text{with} \quad \log f_k|_{\partial\Omega_k} = \log g_k|_{\partial\Omega_k}, \tag{6.13} \]

where $\log\vec{f}_k$ denotes the estimated gradient field of the log image $\log f_k$.

Figure 6.6: Examples of color and clipping restoration with our method. In each pair, Left: the input image. There are white or discolored pixels that have full or partial saturation. These pixels have lost color and detail, and need to be restored. Right: our result. The discolorations due to partial and full saturation have been restored with vivid colors (compare the insets). 
Our images have a higher DR than the input; consequently we needed to tonemap our reconstructions with color preservation (see text), so that the restored scene detail and color are clearly visible.

The linear space channel $k$ can then be recovered as $f_k = \exp(\log f_k)$. The motivation for solving this problem in log space is that it corresponds to a generalization of fitting a Gaussian to the gradients on the boundary of $\Omega_\cap$, as can be seen by analyzing a 1D example (Figure 6.5). Given a clipped input signal (blue) in the linear domain (Figure 6.5, left), we first take the log of this signal and then solve for a gradient (red in Figure 6.5, right), which will vary linearly over the clipping region. We integrate this gradient field by setting up and solving a second Poisson problem, and we obtain a log image channel in which the intensity varies quadratically over the clipped region. In linear space, this corresponds to a Gaussian extrapolation. In two-dimensional (2D) images, true Gaussian extrapolations are obtained for circularly shaped regions in which the boundary gradients are rotationally symmetric. Other configurations result in asymmetric reconstructions, which are, however, still smooth everywhere. With gradients defined continuously over the image domain, the reconstruction smoothly restores colors in clipped regions (Figure 6.4, top right).

[Figure 6.7 panels: Input; Our result; DCRAW; Spatial reconstruction; [Masood et al. 2009]; [Zhang and Brainard 2004]]
Figure 6.7: Comparison with other methods. Our method faithfully restores the neon sign (green box) and the curtain (blue box), which are clipped in the input image.

6.1.2.5 Discretization

Our derivation so far has been based on continuous images and gradients. To work with digital images, we discretize the resulting systems in a straightforward fashion, using 4-neighborhoods $\mathcal{N}(x, y)$ (i.e., for every pixel $(x, y)$, four nearby pixels are considered neighbors: the ones directly above $(x, y - 1)$, below $(x, y + 1)$, to the left $(x - 1, y)$, and to the right $(x + 1, y)$). 
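
A minimal sketch of such a 4-neighborhood discretization, applied to the Laplace problem (6.3), is given below. It uses plain Gauss-Seidel sweeps on nested Python lists purely for illustration; the function name `laplace_fill`, the fixed iteration count, and the handling of masked pixels at the image border are assumptions of this sketch, and it makes no attempt to match the speed of the multigrid solver mentioned in Section 6.1.3.

```python
def laplace_fill(img, mask, iters=500):
    """Fill the masked pixels of a 2D image by solving the discrete Laplace
    equation with 4-neighborhoods.

    img:  list of rows of floats (values under the mask are ignored)
    mask: list of rows of booleans, True where the value is unknown
    Unmasked pixels act as Dirichlet boundary data and are never changed.
    """
    h = len(img)
    w = len(img[0])
    out = [row[:] for row in img]
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                neighbors = [(x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)]
                # Assumption: neighbors outside the image are simply skipped.
                vals = [out[ny][nx] for nx, ny in neighbors
                        if nx in range(w) and ny in range(h)]
                # Discrete Laplace equation: a pixel equals the mean of its neighbors.
                out[y][x] = sum(vals) / len(vals)
    return out


# One unknown pixel surrounded by known values.
center = laplace_fill([[0.0, 1.0, 0.0],
                       [2.0, 9.0, 4.0],
                       [0.0, 3.0, 0.0]],
                      [[False, False, False],
                       [False, True, False],
                       [False, False, False]])

# Three unknown pixels between two known end values (a 1D slice).
ramp = laplace_fill([[0.0, 9.0, 9.0, 9.0, 1.0]],
                    [[False, True, True, True, False]])
```

In the first example the unknown pixel becomes the mean of its four known neighbors; in the second the unknown pixels converge to a linear ramp between the two end values, which is the 1D solution of the Laplace equation.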
The boundaries are defined as unclipped pixels with at least one clipped pixel in their neighborhood.

For gradient estimation we use divided differences over the neighborhoods $\mathcal{N}(x, y)$. The blending weights are first computed independently for each pixel (6.11), but since they are applied to gradients estimated over a neighborhood, we then filter the weights over the same neighborhood using a minimum filter.

6.1.3 Results and analysis

We have run our algorithm both on images we captured in RAW mode with different models of Canon single-lens reflex (SLR) cameras, and on images obtained from other sources. For the RAW images, we use linear interpolation for demosaicking along the boundaries of the clipped regions and Dcraw [25] for the rest of the image. Images obtained from outside sources are first approximately linearized by applying the inverse of the sRGB gamma curve. Our implementation uses a multigrid Poisson solver for all subproblems and takes about one minute to process a 10-megapixel image on an Intel Core 2 Duo machine running at 3 GHz.

Figures 6.4 and 6.7 show comparisons of our results with [25] and [95], using the respective authors’ implementations, and comparisons with [152], using a third-party implementation. Figure 6.4 shows a cropped region of Figure 6.8(f), depicting flashing police lights. We can see that the spatial methods all generate artifacts at the boundaries between regions with different numbers of clipped channels. Our gradient-based approach avoids these artifacts.

The neon signs in Figure 6.7 appear yellowish white, although from the reflection on the windows it is evident that the neon signs should be red; it may also be noted that the upper right corner of the curtain is completely flat due to clipping. 
Dcraw [25] fails to correct either of these discoloration artifacts. The approach by Zhang and Brainard [152] reconstructs the curtain well with its global model but fails to reconstruct the neon sign due to the absence of a localized model, which implies that local control is important for such restoration. A spatial reconstruction (6.7) restores the neon sign well but shows discontinuity artifacts in the curtain. In this example, the quality of the result by Masood et al. [95] is comparable to ours for both regions.

Figures 6.6 and 6.8 contain examples of a variety of scenes including day and night shots, man-made light sources, a sunset scene, and a human face. Since our color restoration produces pixel values outside the three-dimensional (3D) gamut of the original image, we choose two different visualizations to illustrate the results. The first is a split-image representation for two different virtual exposures (Section 3.2), which is commonly used to visualize HDR images (e.g. [112]). The second is a tone-mapped version of the output using the photographic operator of Reinhard et al. [111] with the color correction from Mantiuk et al. [93]. We emphasize that we consider these representations only as visualizations for print

Figure 6.8: Our method faithfully restores the color and clipping of the clipped highlight image regions. In each group: Top-left quadrant: the input. Top-right quadrant: our result. Both of the images in the top two quadrants are shown with alternating “virtual long exposure” (these are the bright bands, to show details in the shades) and “virtual short exposure” (these are the dark bands, to bring out the restored details in the highlights). 
Virtual short exposure is necessary to present for the DRexpansion; otherwise all the restored details would are clipped when weshow the image (for example, as expected, the virtual long exposures, i.e.,the bright bands, look identically clipped in the input on the left and theoutput on the right while the dark bands look different). (More detaileddiscussion on why we need virtual exposures to visualize HDR images inSection 3.2). Bottom-left quadrant: our result image again, but with acolor-preserving tonemapping operator applied on it (that compresses theDR so that the colors look more vivid while details while contrast is alsopreserved.) Lower-right quadrant: zoomed in select regions from theother three quadrants.165purposes; the full restored color image could also be presented on alternativedevices with a larger 3D gamut, could be explored interactively with simulatedDR viewers, or could simply serve as the input for further manual processing withtools such as Adobe Photoshop.Our method restores scene details washed out due to clipping, includingdetails in the curtain in Figure 6.1, in the water droplets in Figure 6.8(c), in thesunny background in Figure 6.8(d), and on the petals of the skunk cabbage inFigure 6.6(b). Figure 6.6(a) shows that the method works well even when unclippedregions with different hues touch.One downside of transferring data from unclipped channels is that noise isenhanced when the unclipped channel is very dark. Sunset scenes like Figure 6.8(h)often have strong red and green components close to the sun but a very small bluecomponent. Since static sensor noise and quantization dominate at small luminancevalues, when we amplify the unclipped channel, the noise is amplified as well.However, this problem can be alleviated by applying a noise removal step beforetransferring the gradients.In Figure 6.9, we demonstrate a failure case for all existing methods, includingour own. 
In this example, intensity and hue of the latent image are correlated so that the correct hue of the clipped regions is not observed anywhere, and the hue estimation fails. With a mis-estimated hue, the correlation between gradients in the different channels is inconsistently estimated, which results in discontinuities in all methods. However, as the results show, the unclipped channels do provide a lot of information about the cloud structure, and we believe that as future work one could derive subject-specific algorithms to handle such scenes.

6.1.4 Discussions

We have presented a novel gradient-space algorithm to correct discoloration artifacts due to clipping. We have demonstrated that our algorithm generates smooth and artifact-free results in many real life situations. We have presented comparisons with recent work and demonstrated the advantages of our gradient-based approach. Since all parts of our algorithm can be cast as simple Poisson problems, the algorithm can be easily implemented and incorporated in modern image processing.

Figure 6.8: Our algorithm applied to a variety of images (continued)

Our current method assumes that the hue in a region is independent of its intensity, implying that clipped pixels have the same hues as unclipped ones. As we have shown, this is not the case for scenes such as sunsets, where hue and intensity are correlated in a way that cannot be learned from the same image since the same clipping threshold is applied everywhere in the image. However, we believe it should be possible to learn this relationship from other images showing similar scenes. In this way, a collection of similar short exposure images of sunsets could be used to fix the colors in our image without altering the cloud structure in it.

Figure 6.9: A failure case. Panels: Input; Estimated hue; Spatial restoration; Our result; [Masood et al. 2009]; [Zhang and Brainard 2004]. Correlation between hue and intensity in the latent image means that the correct hue for the clipped region is not observed anywhere in the image and thus cannot be recovered. The mis-estimation of hue also results in discontinuities between different clipping regions (see text).

6.2 Using intensity-invariant patch-correspondences for single-image dynamic range (DR) expansion

In this section, we propose another new method for expanding the dynamic range (DR) of a single standard dynamic range (SDR) image. This method exploits the non-local self-similarity property of natural images and scene lighting variations. Our method first uses a proposed robust distance metric to match ill-exposed (i.e., over-exposed or under-exposed) patches of the input image with well-exposed patches from the same image; it then uses details from these matched patches to restore the ill-exposed patches. We model this restoration problem as a global optimization where the energy function penalizes the difference between the matching patches. Since this system is underdetermined, we use natural image statistics for well-posedness. Our results demonstrate improvement over the state-of-the-art.

This method is a novel application of the natural image self-similarity prior in an illumination-invariant way. Given one input image, when the non-local self-similarity prior holds, we find illumination-invariant patch correspondences, i.e., image patches that have similar scene content but vary in terms of intensity. Then by collaborative reconstruction of these patches, we restore scene details in all these patches. It can also be used to construct high dynamic range (HDR) images from pre-existing SDR images.

It may be noted that, when all color channels get clipped, it is very difficult to reconstruct the clipped region without a capture-time optical modulation (e.g., the method presented in Chapter 4). Computation-based methods that only address partial clipping make a local hue constancy assumption [119, 145].
However, since local hue constancy breaks down in the presence of color texture, these methods fail to recover clipped highlights with texture. Our proposed patch-based method is able to restore such clipped highlights. Furthermore, in contrast to hallucination-based DR expansion methods [112] which stretch the intensity values to fill a wider contrast ratio, we restore real scene detail in the process.

In this section we exploit the non-local self-similarity property of natural images to expand the DR of a single SDR image. It is well understood [16, 88] that natural images exhibit a high number of repetitions in similar image patches within one image; that is, for some image patch, there are other image patches within the same image that are similar. This natural image prior is known as non-local self-similarity and it has been extensively used in denoising methods [21]. In this section, we further observe that similar patches can also appear under different lighting conditions within the same image. Particularly in daytime images, where there is a high degree of intensity variation and strong shadowing, similar parts of similar objects can be in light or in shadow. As a result, a certain patch could be clipped (due to over-exposure) or noisy (due to under-exposure) while a patch similar to it would be well-exposed. We use the well-exposed patches to restore other similar patches that are ill-exposed or not as well-exposed. We also use another property related to color channels in images: the color channels in most objects in a natural scene do not get over-saturated or under-exposed in the same way. That is, when one or two channels in a certain image patch suffer from clipping or noise, the other channel(s) would still contain some detail. We exploit this behavior when searching for a similar patch that is well-exposed. We reconstruct image patches that suffer from partial clipping (due to over-exposure) or noise (due to under-exposure) using the non-local self-similarity prior that is oblivious to intensity variations.

Figure 6.10: An overview of our method. Panel titles: (a) Input; (b) Search; (c) Replacement; (e) Output. (a) The original scene contains a wide intensity variation; an SDR image of the scene has the following artifacts: The dark areas or shades have a poor signal-to-noise ratio (SNR) and appear noisy when enhanced (a simulated long exposure inside the cyan circle brings out the noise). The bright area or highlights is clipped due to sensor saturation (a simulated short exposure is shown inside of the black circle). The bright red pixels here appear yellow because of partial clipping of the color channels: the red channel is clipped but the blue and green channels are not. The simulated long and short exposures help demonstrate DR effects and artifacts on images properly and help focus on what is often difficult to show, such as the highlights and the shades in an image (Section 3.2). (b) To restore an ill-exposed “target” image patch (marked with yellow), we need a matching “source” patch that is well-exposed. Since the well-exposed patches will have a different intensity, we use a robust distance function we present in this section to compare image patches in an intensity-invariant fashion. A few misses are shown in red, and a hit is shown in green. We use PatchMatch [12], a stochastic search technique, to efficiently search for matching patches in our method. (c) The pixel values of the well-exposed source image patch are then scaled to match that of the target patch. (d) The output of our method is shown. Clipped highlights and noisy lowlights have been fixed.

6.2.1 Method

The dynamic range (DR) of an image is limited due to sensor clipping and noise (Section 3.3.1).
We reconstruct poorly-exposed parts of a standard dynamic range (SDR) image using content from the patches in well-exposed image areas. First, we use a hierarchical randomized search method [12] to obtain the well-exposed matching patches from the same image (Section 6.2.1.2). Once these matches are obtained, the reconstruction of the final image is formulated as a global optimization problem (Section 6.2.1.4). Natural image statistics are used to reach a feasible solution.

As having exact matches in one image is unlikely, we derive a robust distance function that allows sub-pixel structural mismatch (Section 6.2.1.3). Our proposed distance function is simple enough to be used within an optimization framework, as opposed to sophisticated distance functions. Furthermore, although obtaining poor matches is possible, these matches are weeded out in the global optimization reconstruction step.

6.2.1.1 Image formation model

When the optical sensor of an SDR camera is exposed, noise due to various physical processes corrupts its measurement. Furthermore, each sensor pixel has a maximum capacity and it can get saturated when a higher intensity beam hits it. We denote the unknown high dynamic range (HDR) image as $f$, and the corresponding clipped and noisy SDR observation as $g$. Then, simplifying the forward model from Section 3.1, we obtain

$$g = \min(1, f + \eta), \tag{6.14}$$

where 1 is the clipping level (i.e., we normalize the input image $g$ so that the pixel intensity values lie between 0 and 1), and $\eta$ is the noise, which in turn has two components (Section 3.1.2.3):

a) photon shot noise and other Poisson-distributed or multiplicative Gaussian noise with a variance $\sigma_M^2$, and
b) a combination of quantization noise, leakage current, thermal noise, fixed pattern noise, and other additive Gaussian noise with a combined variance $\sigma_A^2$,

$$\eta \sim \mathcal{N}(0, \sigma_\eta^2), \tag{6.15}$$

where

$$\sigma_\eta^2 = \sigma_M^2 f^2 + \sigma_A^2, \tag{6.16}$$

and $\sigma_M$ and $\sigma_A$ are camera parameters which we can measure for a given camera and exposure settings.

Figure 6.11: Noise reduction using the proposed method. Panel titles: (a) Input image; (b) noise plot (log noise vs. log pixel intensity, with curves for additive Gaussian noise, Poisson noise, total noise, and the theoretical optimum); (c-1), (c-2); (d-1), (d-2), (d-3). (a) An input image $g$. The red box highlights a dark and noisy patch, and a matching well-exposed patch is highlighted with green. (b) Different noise components are shown on a log-log plot of noise sigma vs. pixel intensity. Additive Gaussian noise with the variance $\sigma_A^2$ dominates in the low intensity end while the multiplicative Poisson noise with the variance $\sigma_M^2$ dominates over the additive noise in the higher end of the intensity range. The resulting total noise is shown. (c-1) An ill-exposed “target” patch $g_a$ at location $a$ highlighted with red. This image patch has been digitally enhanced so that the embedded noise is visible. (c-2) The vertical red bars show the range of pixel values in $g_a$. The part of the total noise plot within this value range is highlighted with red. (d-1) A well-exposed “source” image patch $g_b$ at location $b$ highlighted with green. (d-2) The vertical green bars show the range of pixel values in $g_b$. The part of the total noise plot within this value range is highlighted with green. (d-3) Our output: when the source patch $g_b$ is intensity-scaled to match the target patch $g_a$, the resulting noise is much lower compared to the original shown in (c-2). The cyan line (the locus of possible improvements) shows our theoretical optimum noise reduction (Section 6.2.2.2).

Most denoisers process the well-exposed areas very well but fail at ill-exposed areas. Our proposed method restores ill-exposed patches by searching for one well-exposed match from within the same image.
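As a rough illustration of the forward model (6.14) and the noise model (6.15)-(6.16), the following sketch simulates a clipped, noisy SDR observation and evaluates the per-pixel SNR. It is only a sketch under stated assumptions: the function names and the values chosen for `SIGMA_M` and `SIGMA_A` are hypothetical and are not measured camera parameters.

```python
import math
import random


def noise_sigma(f, sigma_m, sigma_a):
    """Noise standard deviation at latent intensity f, following (6.16)."""
    return math.sqrt(sigma_m ** 2 * f ** 2 + sigma_a ** 2)


def simulate_sdr(f_values, sigma_m, sigma_a, rng):
    """Simulate g = min(1, f + eta) from (6.14) for a list of latent intensities."""
    observed = []
    for f in f_values:
        eta = rng.gauss(0.0, noise_sigma(f, sigma_m, sigma_a))
        observed.append(min(1.0, f + eta))  # sensor clipping at level 1
    return observed


def snr(f, sigma_m, sigma_a):
    """Signal-to-noise ratio at latent intensity f under the same noise model."""
    return f / noise_sigma(f, sigma_m, sigma_a)


# Hypothetical camera parameters, chosen only for illustration.
SIGMA_M, SIGMA_A = 0.02, 0.005

rng = random.Random(0)
latent = [0.01, 0.1, 0.5, 2.0]  # the last value exceeds the clipping level
observed = simulate_sdr(latent, SIGMA_M, SIGMA_A, rng)
snr_values = [snr(f, SIGMA_M, SIGMA_A) for f in latent]
```

Under this model the SNR grows monotonically with intensity, which is why a well-exposed source patch carries less relative noise than a dark target patch.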
Since well-exposed patches have much better signal-to-noise ratio (SNR) than ill-exposed patches, a single brightness-invariant matching yields improved results compared to traditional denoising by matching multiple equally exposed patches.

6.2.1.2 Stochastic search

The first step of our method is obtaining matching patches from within the same image. The time complexity of an exhaustive search for matches is quadratic in the number of pixels. However, such matching tends to exhibit a spatial locality, i.e., neighboring patches tend to match with patches that themselves are neighbors to each other. PatchMatch [12], a stochastic search technique, exploits this observation.

PatchMatch, at a very high level, works in the following manner: it iterates over random matching assignments, always keeping track of the best per-patch matching found so far. It alternates between one stochastic search step and one propagation step.

• In the stochastic search step, PatchMatch first makes a random assignment to every patch. Many of these assignments yield poor matches but a few of these random assignments yield good matches.
• In the propagation step, PatchMatch takes advantage of spatial locality of matching by propagating good matches to the neighboring patches.

6.2.1.3 Adding intensity-invariance to non-local self-similarity

Intensity-invariant patch distance is essentially the difference between the corresponding intrinsic image patches. However, obtaining intrinsic images is an underconstrained inverse problem [96]. In particular, finding the intrinsic image straightforwardly requires a division by an intensity estimate and is ill-posed because of poor SNR in dark pixels. Instead we derive a robust intensity-invariant distance function $D$ using image noise properties. This robust distance function enables us

1. to accommodate intensity variations, and model the resulting modification of per-pixel noise,
2. to admit patches with only subpixel misalignment as a good match, and
3. to exploit spatial locality of matching.

When multiple good matches exist, small perturbations in noise can lead to preferring one match over the other. Our robust distance function can ignore small perturbations and the resulting matching exhibits a much stronger locality.

Let $g_a \in \mathbb{R}^{m \times m \times 3}$ be an $m \times m$ ill-exposed “target” SDR RGB image patch located at some image location $a$. The corresponding unobserved (i.e., unknown) HDR RGB image patch we seek to reconstruct is denoted by $f_a \in \mathbb{R}^{m \times m \times 3}$. Now, let $g_b$ at location $b$ be a candidate “source” patch (Fig. 6.11). We obtain the intensity-invariant difference by matching the mean intensity of the source $g_b$ to that of the target $g_a$,

$$\epsilon \equiv g_a - \frac{E[g_a]}{E[g_b]}\, g_b, \tag{6.17}$$

where $E[\cdot]$ denotes the mean operator. However, we really need to estimate the intensity-invariant difference, $\omega$, of the corresponding original image patches,

$$\omega \equiv f_a - \frac{E[f_a]}{E[f_b]}\, f_b. \tag{6.18}$$

This unknown $\omega$ is the distance function we derive next. Below we model its statistical properties, connect it with the computed known quantity $\epsilon$, and then derive a maximum-a-posteriori (MAP) estimate of $\omega$.

Figure 6.12: Results. Column labels: Input; Our output; comparison (BM3D [21] or color constancy [119]). The left column shows the input SDR images, in the middle column we present our results, and in the right column we present comparisons with state-of-the-art. We present three cases: (a) a relatively easy denoising scenario when there is not much texture present, (b) a partially clipped highlight with texture present, and (c) a difficult case with high-frequency details that makes it difficult to match patches exactly. The insets have been enhanced for ease of comparison.

We model $\omega$ as the sparse difference along scene edges present in image patches to allow sub-pixel misalignments.
It is well-known that natural images tend to be piecewise linear and scene edges tend to be sparse [83], and as a result, scene edges (and therefore $\omega$) can be modeled to be distributed as a zero-mean Laplace distribution, i.e., $\omega \sim \mathcal{L}(0, \sigma_\omega^2)$.

To obtain an estimate of the unknown $\omega$ from the measured $\epsilon$, we first observe that from the forward model (6.14),

$$g_a = f_a + \eta_a, \tag{6.19}$$

where the noise $\eta_a$ has zero mean and variance $\sigma_a^2$,

$$\sigma_a^2 \approx \sigma_M^2 E[f_a]^2 + \sigma_A^2, \tag{6.20}$$

where $E[\cdot]$ is the mean operator. Since $\eta$ has zero mean,

$$E[\eta_a] \approx 0, \tag{6.21}$$

$$E[g_a] = E[f_a] + E[\eta_a] \approx E[f_a]. \tag{6.22}$$

When we combine it with (6.17) and (6.18), we obtain,

$$\epsilon \approx \omega + \eta_a - \frac{E[g_a]}{E[g_b]}\, \eta_b \equiv \omega + \eta_{ab}, \tag{6.23}$$

where $\eta_{ab} \sim \mathcal{N}(0, \sigma_{ab}^2)$ is zero-mean Gaussian noise with variance,

$$\sigma_{ab}^2 \approx \sigma_M^2 \left( f_a^2 + \frac{E[g_a]^2}{E[g_b]^2}\, f_b^2 \right) + \sigma_A^2 \left( 1 + \frac{E[g_a]^2}{E[g_b]^2} \right). \tag{6.24}$$

Now, we estimate the unknown $\omega$ given the measured $\epsilon$ in the MAP sense,

$$\omega_{MAP} = \arg\max_\omega \Pr(\epsilon \mid \omega) \Pr(\omega) \tag{6.25}$$

$$= \arg\max_\omega \Pr(\epsilon - \omega \mid \omega) \Pr(\omega) \tag{6.26}$$

From (6.23),

$$= \arg\max_\omega \Pr(\eta_{ab} \mid \omega) \Pr(\omega) \tag{6.27}$$

Since $\eta_{ab}$ and $\omega$ are independent, the condition vanishes,

$$= \arg\max_\omega \Pr(\eta_{ab}) \Pr(\omega) \tag{6.28}$$

$$= \arg\min_\omega \frac{\|\eta_{ab}\|^2}{2\sigma_{ab}^2} + \frac{|\omega|}{2\sigma_\omega^2}, \tag{6.29}$$

then replacing $\eta_{ab} = \epsilon - \omega$ from (6.23), and setting $t_{ab} \equiv \sigma_{ab}^2 / (2\sigma_\omega^2)$,

$$= \arg\min_\omega \frac{1}{2} \|\omega - \epsilon\|^2 + t_{ab}\, |\omega|. \tag{6.30}$$

This is the least absolute shrinkage and selection operator (Lasso), and the closed-form solution to this optimization problem is soft-thresholding,

$$\omega_{MAP} = \operatorname{sgn}(\epsilon)\, (|\epsilon| - t_{ab})_+, \tag{6.31}$$

where $(\cdot)_+ \equiv \max(0, \cdot)$ gives the point-wise positive part. Therefore, calculating $\omega_{MAP}$ boils down to a simple soft-thresholding operation.

Finally, the distance function is an absolute sum of the per-pixel estimated difference $\omega_{MAP}$ elements, from (6.31),

$$D(g_a, g_b) = \left| (|\epsilon| - t_{ab})_+ \right|. \tag{6.32}$$

6.2.1.4 Reconstruction with global optimization

We obtain the MAP estimate of the unknown HDR image $f$ via the minimization of an energy function.
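Before turning to the individual energy terms, the distance function (6.32) from Section 6.2.1.3, which drives the patch search, can be sketched as follows. This is a minimal illustration and not the thesis implementation: patches are flattened to plain lists of intensities, the patch means stand in for the unknown per-pixel latent values in (6.24) (a simplification justified only loosely by (6.22)), the scaling ratio is the one used in (6.17), and all function names and parameter values are hypothetical.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding as in (6.31): sgn(v) * max(|v| - t, 0)."""
    magnitude = max(abs(value) - threshold, 0.0)
    return magnitude if value >= 0 else -magnitude


def intensity_invariant_distance(g_a, g_b, sigma_m, sigma_a, sigma_omega):
    """Sketch of D(g_a, g_b) from (6.32) for two flattened patches."""
    mean_a = sum(g_a) / len(g_a)
    mean_b = sum(g_b) / len(g_b)
    ratio = mean_a / mean_b  # scales the source to the target mean, as in (6.17)

    # Scaled difference between target and intensity-matched source, (6.17).
    eps = [a - ratio * b for a, b in zip(g_a, g_b)]

    # Assumption: patch means replace the per-pixel latent values of (6.24).
    var_ab = (sigma_m ** 2 * (mean_a ** 2 + ratio ** 2 * mean_b ** 2)
              + sigma_a ** 2 * (1.0 + ratio ** 2))
    t_ab = var_ab / (2.0 * sigma_omega ** 2)

    # Absolute sum of the soft-thresholded per-pixel differences.
    return sum(abs(soft_threshold(e, t_ab)) for e in eps)


# Hypothetical parameters and toy patches, for illustration only.
SIGMA_M, SIGMA_A, SIGMA_OMEGA = 0.01, 0.001, 0.1
source = [0.2, 0.4, 0.6, 0.8]
target = [0.25 * v for v in source]      # same content, four times darker
unrelated = [0.8, 0.2, 0.8, 0.2]         # different structure

d_match = intensity_invariant_distance(target, source, SIGMA_M, SIGMA_A, SIGMA_OMEGA)
d_miss = intensity_invariant_distance(target, unrelated, SIGMA_M, SIGMA_A, SIGMA_OMEGA)
```

In this toy setting a darker copy of the source patch gives a zero distance, while a structurally different patch gives a positive one; the threshold absorbs differences that are small relative to the expected noise.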
The data-fitting term ensures that the reconstructed image is as close to conforming to the forward model (6.14) as possible, while minimizing the total variation (TV) (i.e., a sparse gradient prior [80, 106]) makes the image piecewise smooth to satisfy the natural image statistics prior. Although the data-fitting term itself is underdetermined, inclusion of the sparse gradient prior term makes the system well-posed. Furthermore, the non-local self-similarity term makes every ill-exposed patch contain the same content as its matching well-exposed patch. The goal here is to balance between the observed value and the matching patches, and when neither is reliable to fall back to a piecewise smooth reconstruction via the sparse gradient prior term,

$$\arg\min_f \; \underbrace{w\, \|g - \min(1, f)\|^2}_{\text{data-fitting}} + \underbrace{\lambda_{TV}\, \|\nabla f\|_{TV}}_{\text{sparse gradient prior}} + \underbrace{\lambda_S \sum_a q(a) \left\| f_a - \frac{E[g_a]}{E[g_{S(a)}]}\, g_{S(a)} \right\|^2}_{\text{non-local self-similarity}}, \tag{6.33}$$

where $\lambda_S$ and $\lambda_{TV}$ are regularization weights, $w \equiv \sigma_\eta / g$ is the per-pixel level of confidence in observed pixel values, $S : \mathbb{R}^2 \to \mathbb{R}^2$, $S(a) \equiv \arg\min_b D(g_a, g_b)$, is the operator from Section 6.2.1.2 that returns the location of the best matching patch found, $q(a) \equiv 1 / \left( D(g_a, g_{S(a)}) + t_{a,S(a)} \right)$ is the per-patch match quality, and $t_{a,S(a)}$ is the threshold (6.31).

The minimization problem (6.33) is convex but contains a mixture of $\ell_2$ and total-variation (which is $\ell_1$) terms, which cannot be solved directly. We use a straightforward modification of the primal-dual formulation (Section 3.4.7). We use the first order primal-dual convex optimization algorithm [18] to solve this formulation. We start by reformulating the convex problem into its primal-dual form which is a saddle point system.
The optimization algorithm iteratively converges to a solution by repeatedly projecting intermediate solutions back onto the feasible set as it alternates between the primal and dual spaces.

6.2.2 Results and analysis

Figure 6.12 demonstrates the performance of our method both in partially clipped highlights and in dark noisy areas. Figure 6.12(a) shows that our method performs better than the state-of-the-art block-matching and 3D filtering (BM3D) [21, 24] because we find matching patches from the well-exposed midtones. This demonstrates that matching a well-exposed patch improves signal-to-noise ratio (SNR) more than BM3D’s matching of multiple ill-exposed patches. Figure 6.12(c) shows a case where both our method and BM3D perform poorly because of the high frequency randomized texture on the ground; both methods suffer from the lack of exact matches. While BM3D’s output appears blurry, our result retains sharp details because of the total-variation minimization in our reconstruction. Figure 6.12(b) demonstrates that our method effectively reconstructs partially clipped highlights when a color texture is present. Pixel-based methods make a color constancy assumption (Section 6.1) which breaks when color texture is present. In contrast, our method is able to reconstruct these highlights faithfully.

We have presented a new patch-based method for dynamic range (DR) expansion of single images. We have derived a robust distance function that enables our proposed method to search for matching patches in an intensity-invariant way. These well-exposed matches are incorporated in a global optimization formulation for reconstructing the ill-exposed patches. Reconstructing the ill-exposed patches expands the image DR. We have demonstrated on a variety of test cases that our method performs better than the state-of-the-art.

6.2.2.1 Limitations

Our method assumes a single light source.
In cases when the scene has light sources with different colors, each colored light has to be processed separately. We discuss one particular example here: daylight scenes have at least two light sources, the sun and the bright blue sky (due to the blue-scattering property of air). As a result, shaded areas not only differ in intensity but also have a blue tint. In such cases, we use human intervention for figuring out the blue tint. A pair of matching locations is selected by a human user, one well-exposed and one in shade, that should appear as having the same color but where the shaded one instead has a blue tint. We take this color information into account when calculating the intensity-invariant distance.

6.2.2.2 Theoretical bounds

In this section we investigate and present the expected performance of our algorithm. Since our proposed method relies on finding matching patches in the same image, the performance largely depends on how good the matches are and whether or not matches exist to begin with. We derive the best case performance here, which can act as the theoretical upper bound on the performance of our algorithm.

In order to intuitively understand the behavior of our method, we further analyze the toy example presented in Figure 6.10. This figure presents both the input to and the output of our algorithm, with the per-pixel estimated brightness normalized to 1. From this presentation, we observe that our method reduces noise primarily in the dark areas, whereas the noise in the well-exposed areas remains largely unchanged. Since our algorithm reduces noise in the low intensity area, the dynamic range of the reconstructed image is higher than that of the input image.

Now we quantify the best case noise reduction. Let $g_a$ be an ill-exposed target image patch and $g_b$ be a good matching “source” patch. We then reconstruct the target patch by replacing it with the intensity-corrected version of the source patch, i.e., $\frac{E[g_a]}{E[g_b]}\, g_b$.
Assuming the latent image patches are identical,

$$f_a = f_b, \tag{6.34}$$

we are effectively reducing the noise level in the target patch from $\sigma_a$ down to a new noise level denoted by $\sigma'_a$,

$$\sigma'^2_a = \frac{E[g_a]^2}{E[g_b]^2}\, \sigma_b^2 \tag{6.35}$$

$$= \frac{E[g_a]^2}{E[g_b]^2} \left( \sigma_M^2 E[f_b]^2 + \sigma_A^2 \right), \quad \text{from (6.20)} \tag{6.36}$$

$$\approx \sigma_M^2 E[g_a]^2 + \frac{E[g_a]^2}{E[g_b]^2}\, \sigma_A^2, \quad \text{from (6.22)} \tag{6.37}$$

$$= \sigma_a^2 - \left( 1 - \frac{E[g_a]^2}{E[g_b]^2} \right) \sigma_A^2, \quad \text{from (6.20)} \tag{6.38}$$

$$< \sigma_a^2 \tag{6.39}$$

since the source $g_b$ is of a higher intensity compared to the target patch $g_a$.

Furthermore, the theoretical optimum lower bound of $\sigma'^2_a$ is obtained by putting $E[g_b] = 1$ in (6.37),

$$\sigma'^2_a > \left( \sigma_M^2 + \sigma_A^2 \right) E[g_a]^2. \tag{6.40}$$

Therefore at the theoretical optimum reconstruction, the residual noise will have a multiplicative property. This shows that we reduce the additive Gaussian noise component, effectively lowering the noise floor and improving DR from a single image (Figure 6.11).

Figure 6.13: An analysis of the toy example from Figure 6.10. (a) The intensity-normalized “intrinsic image” corresponding to the input from Figure 6.10. This image only contains color, not intensity, and this brings out the noise (top and left of the image) and the partial saturation (the dark smudge near the center). The visible noise is due to a poor signal-to-noise ratio (SNR). On the other hand, the partially clipped highlight appears dark after the brightness normalization (since these pixels have been clipped due to saturation in the input image). (b) Our reconstruction, intensity-normalized. All the artifacts resulting from ill-exposure in the left image have been restored: the clipping has been restored and the noise has been removed, which in turn expands the DR.

6.3 Retrieving information lost by image denoising

Reducing noise increases dynamic range (DR). This is by definition since DR is defined as the ratio between the clipping level and the noise level.
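As a numerical illustration of this definition together with the bound derived in Section 6.2.2.2, the sketch below evaluates the target noise variance (6.20), the reconstructed noise variance (6.37), and the optimum (6.40), and converts the noise reduction into a DR gain. The function names, the parameter values, and the choice to report the gain as 20 log10 of the noise-level ratio are assumptions made for illustration only.

```python
import math


def target_noise_var(mean_a, sigma_m, sigma_a):
    """Noise variance of the ill-exposed target patch, following (6.20)."""
    return sigma_m ** 2 * mean_a ** 2 + sigma_a ** 2


def reconstructed_noise_var(mean_a, mean_b, sigma_m, sigma_a):
    """Noise variance after replacing the target by the scaled source, (6.37)."""
    ratio = mean_a / mean_b
    return sigma_m ** 2 * mean_a ** 2 + ratio ** 2 * sigma_a ** 2


def optimum_noise_var(mean_a, sigma_m, sigma_a):
    """Best-case variance from (6.40), reached when the source mean is 1."""
    return (sigma_m ** 2 + sigma_a ** 2) * mean_a ** 2


def dynamic_range_gain_db(var_before, var_after):
    """DR gain in dB, with DR taken as clipping level (1) over noise level."""
    return 20.0 * math.log10(math.sqrt(var_before) / math.sqrt(var_after))


# Hypothetical camera parameters and patch means, for illustration only.
SIGMA_M, SIGMA_A = 0.02, 0.005
MEAN_TARGET, MEAN_SOURCE = 0.02, 0.6

var_before = target_noise_var(MEAN_TARGET, SIGMA_M, SIGMA_A)
var_after = reconstructed_noise_var(MEAN_TARGET, MEAN_SOURCE, SIGMA_M, SIGMA_A)
var_optimum = optimum_noise_var(MEAN_TARGET, SIGMA_M, SIGMA_A)
gain_db = dynamic_range_gain_db(var_before, var_after)
```

With a dark target and a bright source, the additive term is scaled down by the squared mean ratio, so the reconstructed variance falls between the optimum and the original target variance.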
Intuitively, lower noise implies retrieving new information. Increased information directly translates to increased DR.

Removing noise from images usually results in smoothing of edges and areas with discontinuities. Such nonsmooth areas however play a significant role in the perception of image quality. This section studies the restoration of these regions during denoising. We exploit the fact that the discontinuities in the pixel chromaticity in these regions are less abrupt than those in the pixel luminance. We derive a Bayesian method that estimates the parts of the latent image data in the nonsmooth areas that a denoiser erroneously removes. We demonstrate that adding back this recovered part of the latent image data improves the denoising performance.

It is often difficult to separate the noise from noisy data. Since noise corrupts input data, denoising methods have to depend on prior knowledge about the image content and about the noise, in order to recover as much as possible of the original data (the latent ground-truth (GT) image). However, this does not necessarily lead to the true latent image. In particular, when the underlying image prior is ineffective, the noise estimate might include parts of the true latent image data. The denoising method would then potentially introduce blur to the image as it separates out a noise estimate from the noisy image. That is, it may add to or remove parts of the true latent image data as it removes the noise estimate. This error occurs mostly around edges and in high frequency textured areas in an image. In this section, we propose the use of pixel chromaticity to restore such edges and discontinuities.

Conventional color sensors capture color images by applying a color filter array (CFA) having luminance-sensitive (green) and chromaticity-sensitive (blue and red) elements.
The luminance elements dominate the array and usually occur at every other element position to provide a high frequency sampling pattern while the chromaticity pattern has a lower sampling frequency [14]. Therefore, noise is more apparent in the luminance than in the chromaticity. Consequently, in the process of removing noise, existing denoising algorithms remove noise from luminance data but leave the chromaticity data mostly unmodified. From this observation, we derive our method for improving denoising.

We present a Bayesian argument and experimentally validate our claim by applying our proposed algorithm to improve on the state-of-the-art denoising method known as block-matching and 3D filtering (BM3D) [21, 24].

6.3.1 Previous work

Image denoising has been extensively studied, and many natural image denoising methods have been proposed.

The simplest form of denoising is to apply a blurring kernel. This, however, reduces the image’s high frequency contrast. The median filter [63] is a non-linear filter that results in images with very sharp edges, but small texture details get washed out, giving the resulting images a cartoon-like effect. Median filtering however demonstrates that nonlinear filtering is necessary for restoring sharp features in image data. The bilateral filter [135] brings in an explicit use of pixel similarity: it filters together only the nearby pixels with a similar pixel value. The bilateral filter can also be considered to be the simplest of the non-local techniques. Non-local means (NLM) denoising [16] extends the bilateral filtering approach by using patch-similarities instead of pixel-similarities. The underlying image prior here is referred to as the non-local self-similarity prior: it assumes that the same image patch often repeats in an image.
For every image patch, NLM finds similar patches from within the same image, and then denoises the patch by averaging them.

Some denoising methods [19, 41] utilize compressibility of natural images via transform-domain spectrum manipulation. The underlying prior assumption here is that the image data is compressible while noise is not. As such, with the right transform (e.g., discrete cosine transform (DCT), wavelets), most of the image energy can be compressed into a small number of transform-domain coefficients. Noise will not be compressed, and therefore most of the noise can be removed by hard-thresholding these transform-domain coefficients.

The state-of-the-art denoising method block-matching and 3D filtering (BM3D) [21, 24] is a patch-based collaborative non-local filter that combines the best of all approaches above. As a result, BM3D can denoise natural images quite effectively. However, even this denoiser can blur sharp features while denoising. We observe that its underlying non-local self-similarity image prior is ineffective in restoring non-repeating sharp image features. In such cases, parts of the latent image data get intermixed with what BM3D considers to be noise, and this image data is thus erroneously removed in the denoising process. This results in an undesirable blurry reconstruction of noisy non-repeating sharp image features and high frequency textured areas.

6.3.2 Method

Since our method processes each pixel separately, we derive our method below in terms of individual pixels. A summary of our method is presented in Figure 6.14.

Figure 6.14: An overview of our proposed method. We simulate a noisy input image $g$ by adding white Gaussian noise $\eta$ to the ground truth image $f$. By running a denoiser on this noisy input, we obtain the denoised image $\phi$ and the RGB noise estimate $\epsilon$. Then for every pixel, we project this estimated noise $\epsilon$ onto the pixel chromaticity $\phi/|\phi|$ to obtain the potentially lost latent data $\perp_\phi \epsilon$ (erroneously lost by the denoiser in the process of denoising). The weights image map $(\lambda|\Delta\phi|^2)/(1 + \lambda|\Delta\phi|^2)$ shows the nonsmooth pixels where the underlying prior is ineffective, and also shows the weight used at every pixel to recover the latent data to add back. Finally, we obtain our result $f^*$ by adding the recovered latent data to the denoised image $\phi$. On each image, the top-left peak signal-to-noise ratio (PSNR) value in dB has been calculated over the whole image, while the one below has been calculated over only the pixels that we modify with our method, i.e., the nonsmooth areas. The image that shows the improvement map validates our method: the pixels where our method has further reduced the noise compared to $\phi$ are marked green.

6.3.2.1 Image formation model

Let $f \in \mathbb{R}^3$ be the RGB 3-element vector value of a pixel in the unknown latent image, and $g \in \mathbb{R}^3$ be the corresponding noisy observed 3-element RGB vector pixel value subject to i.i.d. zero-mean additive white Gaussian noise, $\eta \in \mathbb{R}^3$, i.e.,

$$g - f = \eta \sim \mathcal{N}(0, \Sigma_\eta), \tag{6.41}$$

where $\mathcal{N}$ is a Gaussian distribution, $\Sigma_\eta = \sigma_\eta^2 I \in \mathbb{R}^{3 \times 3}$ is the covariance matrix and $\sigma_\eta \in \mathbb{R}$ is the standard deviation of noise which can be measured and is hence assumed known.

Figure 6.15: The relationships between $f$, $\phi$ and $g$. The unknown true 3-element vector value of an RGB pixel $f$ gets corrupted by noise $\eta$, and the value $g$ is observed. When we run a denoiser on $g$, we get a noise estimate $\epsilon$ and the denoised image $\phi$. Since there is no perfect denoiser, there is a difference between the ground truth $f$ and the denoised image $\phi$, denoted by $\omega$. In this section, we derive statistical properties of this denoising error $\omega$ for estimating the latent data lost.

Let $\phi \in \mathbb{R}^3$ be the 3-element (RGB) vector value of the denoised pixel corresponding to $f$.
We define two types of error, $\epsilon$ and $\omega$:

$$g - \phi = \epsilon, \tag{6.42}$$

where $\epsilon \in \mathbb{R}^3$ denotes the known noise estimate, and

$$f - \phi = \omega \sim \mathcal{N}(0, \Sigma_\omega), \tag{6.43}$$

where $\omega \in \mathbb{R}^3$ denotes the unknown error, i.e., the difference between the ground truth $f$ and the denoised image $\phi$. We assume $\omega$ to be distributed as a zero-mean Gaussian.

In order to retrieve the latent image data that was lost to a denoiser, we need to estimate the denoising error $\omega$ ($= f - \phi$). Since $\omega$ cannot be observed directly, we estimate its statistical properties instead. We derive $\Sigma_\omega$ below:

1) The error covariance $\Sigma_\omega \in \mathbb{R}^{3 \times 3}$ is directly proportional to the noise variance $\sigma_\eta^2$. That is, higher noise will lead to higher error,

$$\Sigma_\omega \propto \sigma_\eta^2. \tag{6.44}$$

2) We expect that the error is high only in the nonsmooth areas. We use the Laplace operator $\Delta \equiv \nabla \cdot \nabla$ to measure the degree of local nonsmoothness of the image. Since the latent data $f$ is not known, the true smoothness is also unknown. Instead, we use an approximate measurement of smoothness calculated from the denoised image $\phi$,

$$\Sigma_\omega \propto |\Delta f|^2 \approx |\Delta \phi|^2. \tag{6.45}$$

3) We expect that in these nonsmooth areas, pixel chromaticity is much better restored by a denoiser than pixel luminance is. That is, we expect the latent pixel value $f$ to have a chromaticity similar to that of the denoised value $\phi$ (for example, the normalized chromaticity of a 3-element RGB vector $\phi$ is the 3-element unit vector $\phi/|\phi|$). Therefore, the three vectors, namely the latent pixel value $f$, the denoised value $\phi$, and the denoising error $\omega$ ($= f - \phi$), are expected to be parallel but differ in magnitude.
In other words, we can assume that the covariance matrix $\Sigma_\omega$ is approximately of rank 1, with the pixel chromaticity $\phi/|\phi|$ being the dominant eigenvector. Since $\phi/|\phi| \in \mathbb{R}^3$ is a 3-element unit vector, the covariance matrix $\Sigma_\omega$ becomes a projection operator,

$$\Sigma_\omega \propto \frac{\phi\,\phi^T}{|\phi|^2} \equiv \perp_\phi, \tag{6.46}$$

where $\perp_\phi : \mathbb{R}^3 \to \mathbb{R}^3$ is the projection operator that, given some 3-element RGB pixel value ($\in \mathbb{R}^3$), calculates its component along the pixel chromaticity $\phi/|\phi|$.

Figure 6.16: Results. Each set shows three images. Left: the ground truth from which the noisy test images were generated. Middle: the nonsmooth areas where we anticipate that the underlying prior of block-matching and 3D filtering (BM3D) would fail. Our method recovers intrinsic data lost by BM3D at these pixel locations. Right: the map showing where our method has improved the noise estimates: green represents improvement, red the opposite. The three numbers on the right are average results of 10 runs with randomly generated noise. They show the peak signal-to-noise ratio (PSNR) increase over the whole image, the PSNR increase over the pixels that were modified by our method, and the % improvement of the noise estimate at these pixels.

By combining (6.44), (6.45) and (6.46) we obtain,

$$\Sigma_\omega = \lambda\,\sigma_\eta^2\,|\Delta\phi|^2 \perp_\phi, \tag{6.47}$$

where $\lambda$ is a proportionality constant.

6.3.2.2 The estimation of the lost latent data

In this section, we address the following question: given the noisy observation $g$ and the denoised image $\phi$, what is the best estimate of the unknown true value $f$? In the following, we derive the maximum-a-posteriori (MAP) estimator $f^*$,

$$f^* = \arg\max_f \Pr(f \mid \phi, g) \tag{6.48}$$
$$\phantom{f^*} = \arg\max_f \underbrace{\Pr(\phi \mid f, g)}_{\text{likelihood}}\;\underbrace{\Pr(f \mid g)}_{\text{prior}}, \tag{6.49}$$

where the likelihood term can be derived as,

$$\Pr(\phi \mid f, g) = \Pr(f - \phi \mid f, g) \tag{6.50}$$
$$= \Pr(\omega \mid f, g) \quad \text{[from (6.43)]} \tag{6.51}$$
$$= \exp\left(-\tfrac{1}{2}\,\Sigma_\omega^{-1}\,\|\omega - 0\|^2\right) \quad \text{[from (6.43)]} \tag{6.52}$$
$$= \exp\left(-\tfrac{1}{2}\,\Sigma_\omega^{-1}\,\|f - \phi\|^2\right) \quad \text{[from (6.43)]}, \tag{6.53}$$

and the prior term as,

$$\Pr(f \mid g) = \Pr(f - g \mid g) \tag{6.54}$$
$$= \Pr(-\eta \mid g) \quad \text{[from (6.41)]} \tag{6.55}$$
$$= \exp\left(-\tfrac{1}{2}\,\Sigma_\eta^{-1}\,\|\eta - 0\|^2\right) \quad \text{[from (6.41)]} \tag{6.56}$$
$$= \exp\left(-\tfrac{1}{2}\,\Sigma_\eta^{-1}\,\|f - g\|^2\right) \quad \text{[from (6.41)]}. \tag{6.57}$$

By combining the likelihood expression (6.53) and the prior expression (6.57) into the MAP expression (6.49), and taking the logarithm, we obtain,

$$f^* = \arg\min_f\; \Sigma_\omega^{-1}\,\|f - \phi\|^2 + \Sigma_\eta^{-1}\,\|f - g\|^2. \tag{6.58}$$

Now we solve (6.58) for $f$ by setting the gradient to zero, and apply our observations from the previous section summarized in (6.47),

$$0 = f^* - \phi + \Sigma_\omega \Sigma_\eta^{-1} (f^* - g) \tag{6.59}$$
$$= f^* - \phi + \Sigma_\omega \Sigma_\eta^{-1} (f^* - \phi - \epsilon) \tag{6.60}$$
$$= f^* - \phi + \lambda\,|\Delta\phi|^2 \perp_\phi (f^* - \phi - \epsilon), \quad \text{[from (6.47)]} \tag{6.61}$$

where (6.60) uses $g = \phi + \epsilon$ from (6.42), and $\perp_\phi$ is the operator for projecting vectors onto the pixel chromaticity (6.46). We assume that a denoiser almost accurately restores the chromaticity of the denoised image pixel $\phi$, i.e., we assume that $\phi/|\phi| \approx f^*/|f^*|$, and therefore it follows that the projection operator has no effect on the MAP estimator: $\perp_\phi f^* \approx f^*$. By utilizing this approximation, from (6.61),

$$0 = f^* - \phi + \lambda\,|\Delta\phi|^2\,(f^* - \phi - \perp_\phi \epsilon). \tag{6.62}$$

Solving (6.62) for $f^*$, we obtain our algorithm,

$$f^* = \phi + \frac{\lambda|\Delta\phi|^2}{1 + \lambda|\Delta\phi|^2} \perp_\phi \epsilon. \tag{6.63}$$

This shows that the Bayesian argument reduces to a simple algorithm: we add back the potentially lost latent data $\perp_\phi \epsilon$, weighted by a function of smoothness, $\lambda|\Delta\phi|^2 / (1 + \lambda|\Delta\phi|^2)$.

Figure 6.16: Our method applied to test images (continued)

6.3.2.3 Pseudocode

All operations in Algorithm 6.1 are per-pixel operations. Each pixel is assumed to be a 3-element (i.e., RGB) vector.

Algorithm 6.1 Improved denoising using a statistical model of color noise
Require: observed image $g \in \mathbb{R}^3$, output from a denoiser $\phi \in \mathbb{R}^3$, and regularization parameter $\lambda \in \mathbb{R}$.
1: $\epsilon \leftarrow g - \phi$ ▷ This is what the denoiser considers as noise.
2: $\rho \leftarrow \phi/|\phi|$ ▷ We obtain the color of each pixel using the denoised image and
3: $\epsilon_\perp \leftarrow (\epsilon \cdot \rho)\,\rho$ ▷ the component of the noise along this vector.
4: $L \leftarrow \Delta(\phi)$ ▷ We compute the per-pixel spatial nonsmoothness (Laplacian) of the denoised image, as in (6.45), and then
5: $f^* \leftarrow \phi + \frac{\lambda|L|^2}{1 + \lambda|L|^2}\,\epsilon_\perp$ ▷
we add back some of the projected noise to the image pixels with high nonsmoothness; this is the output of our algorithm.

6.3.3 Results and discussion

In this section, we test the validity of our method and demonstrate that it results in better denoised image edges (Figure 6.16). In particular, we experiment with the state-of-the-art denoising algorithm block-matching and 3D filtering (BM3D) [21, 24]. We generate test images by adding white Gaussian noise to known ground truth images. We run BM3D with the known noise level $\sigma_\eta$ and the default settings. Our proposed method modifies nonsmooth image areas only; the weight images show the pixels that we modify. We report the peak signal-to-noise ratio (PSNR) improvement compared to BM3D over the entire image, and also over the pixels that we update. These numbers show that, for all the test cases presented here, our denoising results are on average about 0.1 dB better than those of BM3D over the pixels that we modify.

We use $\lambda = 1$ for all of our experiments. It is possible to vary $\lambda$ from image to image in order to obtain the best improvement in terms of PSNR.

The expected improvement from our method varies with the level of noise present in the input image. A smaller noise variance $\sigma_\eta^2$ means that a denoiser can reconstruct a near-perfect denoised image from the noisy input, which implies that there is little need for improvement (6.44). As the noise level increases, our method can contribute more to the improvement. However, very high noise can impede our method: at higher noise levels, BM3D, as well as other denoisers, starts to smear noisy sharp edges in the denoised image, which we use in the approximate computation of image smoothness in (6.45). As a result, at high noise levels, BM3D leaves fewer nonsmooth image areas for our method to improve. This is demonstrated in Figure 6.17, which summarizes the results of our experiments with a variety of noise levels.

Figure 6.17: Proposed improvement in denoising vs. noise level. The average performance of our method (red curve) initially increases with more noise $\sigma_\eta$, up to a peak average improvement of the noise estimate of around 8%, but then it starts to decrease, as we predict in the text. We ran our experiments for a number of noise levels. For each noise level, we ran our method on 10 randomly generated test cases. The %-improvements of the noise estimate using our method over BM3D for each fixed noise level $\sigma_\eta$ are shown on a blue vertical line. The ground truth image we used to generate this plot is shown in the inset.

This study proposes the use of the low-frequency nature of observed pixel chromaticity to restore sharp edges in denoised images. Image denoising methods can erroneously blur some edges and high-frequency areas of a noisy image in the process of denoising. In this section, we have presented a Bayesian approach to improve existing denoising results. Our proposed method utilizes the low-frequency nature of observed pixel chromaticity and retrieves image data lost by the denoiser. We illustrate the effectiveness of our method by applying it to improve the results produced by the state-of-the-art denoising algorithm BM3D.

6.4 Summary

In this chapter, we consider the problem of expanding the dynamic range (DR) of single images with RGB color information. Conventional multi-exposure high dynamic range imaging (HDRI) techniques combine multiple standard dynamic range (SDR) images exposed with different camera and lens settings. These techniques, however, cannot be applied to extend the DR of the vast collection of SDR images that we already have. It is well known that the different color channels in a captured RGB image tend to be correlated (Section 3.1.1). This property of natural images is called cross-color-channel correlation.
In this chapter, we utilize this prior knowledge in three cases of single-image HDRI.

In Section 6.1, our method is based on the observation that, in many cases, colored lights may clip only some of the color channels while leaving the others unaffected. This novel gradient-space algorithm repairs discoloration artifacts due to clipping. We have demonstrated that our algorithm generates smooth and artifact-free results in many real-life situations. We have presented comparisons with recent work and demonstrated the advantages of our gradient-based approach. Since all parts of our algorithm can be cast as simple Poisson problems, the algorithm can be easily implemented and incorporated in modern image processing software.

In Section 6.2, we propose another new method for expanding the DR of a single SDR image by exploiting the non-local self-similarity property of natural images and scene lighting variations. Our method first uses a proposed robust distance metric to match ill-exposed (i.e., over- or under-exposed) patches of the input image with well-exposed patches from the same image, and then uses details from these matched patches to restore the ill-exposed patches. Our results demonstrate improvement over the state-of-the-art.

In Section 6.3, we utilize the cross-color-channel correlation in a different way: we use it to analyze noise and propose an improved method for denoising. Removing noise from images can result in smoothing of edges and of areas with discontinuities. Too much smoothing, however, results in poor image quality. We derive a Bayesian method that estimates the parts of the latent image data in the nonsmooth areas that a denoiser removes. We demonstrate that adding back this recovered part of the latent image data improves the denoising performance.
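As an illustration of the per-pixel correction summarized above, the following is a minimal sketch of the update in (6.63) and Algorithm 6.1, written in plain Python. It is not the implementation used for the experiments. The function names `laplacian` and `add_back_lost_detail`, the 5-point Laplacian stencil with replicated borders, and the guard for all-zero pixels are assumptions made for this example; following (6.63), the Laplacian is taken on the denoised image φ.

```python
import math

def laplacian(img):
    """5-point discrete Laplacian per channel (assumed stencil, replicated borders)."""
    h, w = len(img), len(img[0])
    out = [[[0.0] * 3 for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            for c in range(3):
                out[y][x][c] = (up[c] + down[c] + left[c] + right[c]
                                - 4.0 * img[y][x][c])
    return out

def add_back_lost_detail(g, phi, lam=1.0):
    """Sketch of (6.63): f* = phi + w * proj_phi(eps), with w = lam|L|^2 / (1 + lam|L|^2).

    g and phi are H x W x 3 nested lists (noisy input and denoised image).
    """
    lap = laplacian(phi)
    h, w = len(phi), len(phi[0])
    result = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            p = phi[y][x]
            eps = [g[y][x][c] - p[c] for c in range(3)]    # noise estimate, (6.42)
            norm = math.sqrt(sum(v * v for v in p))
            if norm == 0.0:                                # assumed guard: no chromaticity
                result[y][x] = list(p)
                continue
            rho = [v / norm for v in p]                    # chromaticity phi / |phi|
            along = sum(e * r for e, r in zip(eps, rho))   # component of eps along rho
            l2 = sum(v * v for v in lap[y][x])             # |Laplacian of phi|^2
            weight = lam * l2 / (1.0 + lam * l2)           # always in [0, 1)
            result[y][x] = [p[c] + weight * along * rho[c] for c in range(3)]
    return result
```

In smooth regions the Laplacian vanishes, so the weight is zero and the denoised pixel is returned unchanged; only nonsmooth pixels receive a correction, and that correction is always parallel to the pixel chromaticity.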
We demonstrate the effectiveness of our method by applying it to improve the results produced by the state-of-the-art denoiser block-matching and 3D filtering (BM3D).

Chapter 7
Conclusions

Dynamic range (DR), the range of brightness values that camera pixels can measure in one exposure, is a hard physical limit that prevents cameras from capturing images with full fidelity. High dynamic range (HDR) photography can capture images with high fidelity, but conventional high dynamic range imaging (HDRI) methods are expensive for many real-world situations. In this thesis, we analyze this problem and develop a few practical and inexpensive methods for single-image HDRI using only off-the-shelf photography hardware.

The DR of a natural scene is usually too high for conventional off-the-shelf standard dynamic range (SDR) cameras to capture, even with optimal exposure settings. As a result, a camera fails to capture details in image areas that are too bright or too dark. In an SDR image, the intensity values of some bright objects (highlights) that are above the maximum exposure capacity are “clipped” (effectively, deleted) due to sensor over-exposure, while objects that are too dark (shades) appear dark and noisy in the image. An HDR image, on the other hand, would contain full scene details with high fidelity. However, as we discuss in Chapter 2, the existing HDRI methods are either expensive or impractical in real-world situations, particularly because real-world scenes are often very dynamic, are full of a variety of bright colors, and have varied illumination.

In Chapter 3, we present a framework for obtaining HDR images using an SDR camera and a single image. We believe that a general solution to the single-image HDRI problem is very hard, if not impossible, to obtain; so we divide and conquer: we develop a framework that lets us develop individual methods, each addressing a particular case of the general problem.
Each of these special imaging cases is a relaxed version of the general problem, and is therefore more tractable than the general problem. Together, these cases provide almost comprehensive coverage of all possible HDRI cases as far as natural imaging is concerned. More specifically, this framework allows the selection and use of state-of-the-art computational optimization techniques to solve single-image HDRI problems.

In Chapters 4–6, we present and develop single-image HDRI methods to obtain HDR images in common imaging cases. These methods are developed based on the framework discussed above. The methods that we have presented in this thesis focus on the three most common cases in which HDRI is necessary (a brief discussion of this can be found in Chapter 1):
• Firstly, in Chapter 4, we make use of an off-the-shelf optical photographic filter and develop a method using our framework to obtain HDR images of night or dark indoor scenes with a single exposure.
• Secondly, in Chapter 5, we use a new, off-the-shelf image sensor feature (namely, exposure-multiplexing) and our framework to obtain HDR images of daytime scenes or, more generally, scenes with strong illumination variations.
• Finally, in Chapter 6, we propose methods for extending the DR of existing SDR photographs, i.e., for cases in which we cannot apply any camera modification.
We discuss these methods in brief in Section 7.2.

The novel methods that we have proposed expand the DR capabilities of today’s cameras. In the process of developing these methods, we have also developed a few novel image priors, such as the smooth contour prior in Chapter 5 and the intensity-invariant non-local self-similarity prior in Chapter 6. These priors are by design applicable to single-image HDRI as well as to other imaging problems (e.g., we demonstrate the application of the smooth contour prior to super-resolution (SR) in Chapter 5).
We discuss our contributions in Section 7.3.

7.1 Our high dynamic range imaging (HDRI) framework

A general solution for single-image HDRI problems is very hard, and may be impossible, to obtain. Hence, we divide and conquer: we develop a framework that lets us develop separate solutions for a few special cases of natural scenes. Each of these cases is more tractable than the general problem, and the methods we developed for these cases provide almost comprehensive coverage.

In Chapter 3, we develop this mathematical framework (Section 3.4) for formalizing and inverting the general forward imaging process of single-image HDRI. More specifically, this framework
• describes a generalized high dynamic range (HDR) image reconstruction problem (in other words, an “inverse problem”),
• leverages natural image priors suitable for the various cases of HDRI, and
• allows the use of state-of-the-art convex computational optimization techniques such as primal-dual convex optimization.
In this thesis, we focus on the three most common cases in which HDRI is necessary (discussed next in Section 7.2).

The framework makes it easy to develop HDRI methods. To develop a solution for a type or case of single-image HDRI using this framework, we need to derive the framework “components” for that case. Conceptually, our framework has two major components:
1) a “data-fitting term”, corresponding to the forward model (image formation model), which makes sure that the final reconstructed HDR image is consistent with the observed standard dynamic range (SDR) image, and
2) “image prior terms”, corresponding to what we know about how a natural scene should look, which ensure that the reconstructed HDR image is “feasible”, i.e., that it does not contain unrealistic, undesirable, visible artifacts (in essence, reconstruction errors).

Obtaining the unknown latent HDR image given an observed SDR image is by definition an inverse problem.
We are trying to “invert” the physical SDR imaging process, which runs in the forward direction as far as SDR image formation is concerned and is therefore aptly called the “forward process”. We opt for computing a maximum-a-posteriori (MAP) estimate of the unknown latent HDR image and derive the corresponding image reconstruction problem under a Gaussian noise assumption. All numerical solutions to such formulations are iterative:
• the solution is initialized with some arbitrary guess, and then
• some form of a gradient descent algorithm is used to iteratively refine the solution until convergence.
In our framework, the optimization problem is convex, so any local optimum found by these iterations is also a global optimum.

To demonstrate how to use this framework (Section 3.4.3), we chose to work with the well-known Rudin-Osher-Fatemi (ROF) model for denoising. The ROF model is a simple inverse problem designed for denoising; it makes a Gaussian noise assumption and leverages the sparse gradient prior [80, 106] in the problem formulation. It is a good example because it is a simple model that we can obtain by simplifying our framework (i.e., the generalized HDRI inverse problem). Put another way, one could consider the ROF model as the “basis” of the image formation models we use. Of course, the various reconstruction problems that we propose in the subsequent chapters (also discussed next in Section 7.2) incorporate more complex forward models and image priors.

We also demonstrate how to reformulate the ROF problem (and by extension, our single-image HDRI problems, e.g., as we do in Chapter 5) as a saddle point problem and then solve this equivalent problem using a much more efficient primal-dual convex optimization algorithm.

The methods we have developed in Chapters 4–6 also demonstrate that this framework can leverage a variety of image priors and even a combination of priors (as we demonstrate with the combined smooth contour prior and sparse gradient prior in Chapter 5).
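To make the ROF example and its saddle-point reformulation concrete, the following is a minimal sketch of a primal-dual iteration in the style of Chambolle and Pock, applied to a one-dimensional signal for brevity. It is an illustration only and is not the solver developed in this thesis. It assumes the ROF energy in the common form ½‖u − g‖² + λ‖Du‖₁, where D is a forward difference; the function names `rof_denoise_1d`, `total_variation` and `rof_energy`, the step sizes, and the iteration count are choices made for this example.

```python
def total_variation(u):
    """Sum of absolute forward differences (the sparse gradient prior term)."""
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))

def rof_energy(u, g, lam):
    """Assumed ROF energy: 0.5 * ||u - g||^2 + lam * TV(u)."""
    data = 0.5 * sum((a - b) ** 2 for a, b in zip(u, g))
    return data + lam * total_variation(u)

def rof_denoise_1d(g, lam=0.2, iterations=500, tau=0.45, sigma=0.45):
    """Primal-dual iteration for the 1D ROF saddle-point problem.

    The step sizes satisfy tau * sigma * ||D||^2 < 1, since ||D||^2 <= 4 in 1D.
    """
    n = len(g)
    u = list(g)          # primal variable (the denoised signal)
    u_bar = list(g)      # extrapolated primal variable
    p = [0.0] * (n - 1)  # dual variable, one entry per forward difference
    for _ in range(iterations):
        # Dual ascent step followed by projection onto [-lam, lam].
        for i in range(n - 1):
            v = p[i] + sigma * (u_bar[i + 1] - u_bar[i])
            p[i] = max(-lam, min(lam, v))
        # Primal descent step: proximal map of the quadratic data-fitting term.
        u_new = []
        for j in range(n):
            # (D^T p)_j = p[j-1] - p[j], with zero padding at both ends.
            dt_p = (p[j - 1] if j > 0 else 0.0) - (p[j] if j < n - 1 else 0.0)
            u_new.append((u[j] - tau * dt_p + tau * g[j]) / (1.0 + tau))
        # Over-relaxation of the primal variable.
        u_bar = [2.0 * a - b for a, b in zip(u_new, u)]
        u = u_new
    return u
```

For instance, calling `rof_denoise_1d` on a noisy step signal should return a signal with lower total variation and lower ROF energy than the input. The data-fitting term and the prior term appear here as the proximal step and the projection step, respectively.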
Our framework is not limited to the image priors that we use in this thesis. As new and more sophisticated image priors are invented, they can be plugged into our proposed framework to obtain new and improved HDRI methods.

Mathematical derivations similar to our framework are available in the literature. Therefore, we cannot claim novelty for the entire framework, but rather for specific parts or aspects of it:
• Our forward model is a novel image formation model that we developed specifically for HDRI.
• We are the first to propose aperture-filtering for HDRI. We demonstrate a solution in Chapter 4. Our framework is designed to incorporate other forms of aperture-filtering as well.
• We propose to use general sensor-filtering for HDRI. We develop a solution using exposure-multiplexing in Chapter 5; exposure-multiplexing is a specific form of sensitivity-multiplexing. Our framework is designed to incorporate other forms of sensor-filtering as well.

7.2 High dynamic range imaging (HDRI) methods developed: results and conclusions

This thesis presents some novel methods suitable for single-image high dynamic range imaging (HDRI). Our methods employ state-of-the-art computational optimization techniques and robust natural image priors for image reconstruction. We have shown that our methods outperform state-of-the-art methods.

A solution to the general single-image HDRI problem is hard, if not impossible, to develop. Instead of solving the general problem directly, we investigate the three most common cases in HDRI of natural images.
Each of these cases is a relaxation of the general single-image HDRI problem.

7.2.1 Methods utilizing aperture-filtering for single-image high dynamic range imaging (HDRI)

These methods apply to scenes in which highlights occupy a small number of pixels, such as images of night-time or dark indoor scenes.

We make use of an off-the-shelf optical photographic filter, installed at the lens aperture, and develop a method that relies on the framework of Chapter 3 to obtain high dynamic range (HDR) images with a single exposure only. During post-processing, we automatically detect the spread-out brightness created by this off-the-shelf optical filter and use this spreading information to reconstruct the clipped highlights.

The general idea in Chapter 4 is to use this filter at the aperture, i.e., aperture-filtering. We choose to use a photographic filter known as the cross-screen filter, or simply the “star filter.” This filter creates structured light streaks in the image wherever there are highlights. The star-shaped light streaks contain structural information about those clipped highlights.

The results demonstrate that our method can estimate the spreading of the light streaks caused by the attached cross-screen filter and reconstruct the clipped highlights from the light streaks.

7.2.2 Methods utilizing sensitivity-multiplexing for single-image high dynamic range imaging (HDRI)

The goal of Chapter 5 is to image daytime scenes or scenes with strong illumination variations, where highlights occupy a large part of the scene. The solution proposed in Chapter 4 would not work in this case, since the assumption of small saturated highlights breaks down in bright daylight.
We propose to apply a new, off-the-shelf image sensor feature (namely, exposure-multiplexing [1]); we apply our framework to develop a method to obtain high dynamic range (HDR) images from standard dynamic range (SDR) captures taken with such a sensor.

In the process of developing this high dynamic range imaging (HDRI) method, we also propose a novel edge-preserving image prior that we refer to as the smooth contour prior (Section 5.6). It promotes reconstructed image edges that are smooth along their natural contours, and thus it complements the sparse gradient prior and helps reconstruct the unknown pixel values by interpolating along the contours of strong spatial edges. The resulting “combined” prior is very strong (see Figure 5.8 and Figure 5.9); as we demonstrate using a simple super-resolution (SR) experiment, edges reconstructed with our proposed prior are both sharp and smooth along their contours.
The proposed smooth contour prior uses an edge-aware anisotropic filter as the primary building block, and we choose to use edge-directed interpolation (EDI) as the edge-aware anisotropic filter. We also improve the original method [84] in speed and stability (Section 5.6.2).

7.2.3 Methods utilizing cross-color-channel correlation for single-image dynamic range (DR) expansion

Color information loss resulting from clipping one or two of the three (RGB) color channels can affect the mood of the scene or make its rendition unrealistic. We develop three methods that restore partially clipped color highlights and reduce color noise using the partial unclipped information. Our algorithms manage to reconstruct the correct color, restore washed-out details, and reduce color noise in partially clipped regions. Restoring the clipped regions reconstructs the highlights, and reducing noise reconstructs the shades; the result of these reconstructions is a higher dynamic range (DR). For each method, we obtain a maximum-a-posteriori (MAP) estimate of the unknown high dynamic range (HDR) image by analyzing and inverting the forward imaging process.

In Section 6.1, we have developed a novel gradient-space algorithm to repair discoloration artifacts due to clipping, and we have demonstrated that it generates smooth and artifact-free results in many real-life situations. The algorithm is easy to implement and to incorporate in modern image processing software, and the results demonstrate the advantages of our gradient-based approach.

In Section 6.2, we propose a new method for expanding the DR of a single standard dynamic range (SDR) image by exploiting the non-local self-similarity property of natural images and scene lighting variations.
This method produces an HDR image from a single SDR exposure using a novel application of the natural image non-local self-similarity prior in an intensity-invariant way: it reconstructs poorly exposed parts of an SDR image using content from patches in well-exposed image areas. We extend the traditional non-local self-similarity image prior by using our proposed robust distance function and obtain a novel image prior that we refer to as intensity-invariant non-local self-similarity. The results obtained with our method demonstrate that it performs better than the state-of-the-art block-matching and 3D filtering (BM3D).

In Section 6.3, we have also taken a Bayesian approach to improve existing denoising methods. This proposed improvement utilizes the low-frequency nature of the observed pixel chromaticity and retrieves the image data lost by a denoiser. We illustrate the effectiveness of this method by applying it to improve images denoised by the state-of-the-art denoising algorithm BM3D [21, 24].

7.3 Contributions

In this thesis, we analyze three special cases of natural scenes and develop a few practical and inexpensive methods for single-image high dynamic range imaging (HDRI) using off-the-shelf photography hardware and post-processing methods (full discussion in Section 7.2). We propose solutions for increasing the dynamic range (DR) of a single image captured by a camera with a standard dynamic range (SDR). The novel methods we have proposed expand the DR capabilities of today’s cameras. Our methods employ state-of-the-art computational optimization techniques and robust natural image priors to produce the best possible image reconstruction. Our methods outperform state-of-the-art methods.

We develop a mathematical framework for formalizing and inverting the general forward imaging process (full discussion in Section 7.1 above).
We opt for a maximum-a-posteriori (MAP) estimate and derive the corresponding minimization problem under a Gaussian noise assumption. We develop a framework that lets us address a few special cases of natural scenes. The framework uses a state-of-the-art computational optimization technique and leverages natural image priors suitable for the various cases of HDRI. This framework is not limited to known image priors: we develop two other image priors that we use in this thesis. As new and more sophisticated image priors are invented, they can be plugged into our proposed framework to obtain new and improved HDRI methods.

In addition to the HDRI framework discussed above, the following contributions are made and demonstrated in this thesis:

1) In Chapter 4, we make use of an off-the-shelf optical photographic filter, installed at the lens aperture, and develop a method using our framework to obtain a high dynamic range (HDR) image from an SDR image (captured using a single exposure). During post-processing, we detect the spread-out brightness and use this information to reconstruct the highlights that had been clipped by the SDR camera.

2) We investigate, in Chapter 5, HDRI for scenes with strong illumination variations and highlights (bright areas) occupying large parts of the scenes. We use a new, off-the-shelf image sensor with an exposure-multiplexing [1] (i.e., sensitivity-multiplexing) feature and use our framework to obtain HDR images from a single exposure of the sensor. This feature allows us to capture the image’s odd rows with one exposure and the even rows with a different exposure. In post-processing, we reconstruct an HDR image.

3) We have developed a fast method for performing an anisotropic filter known as edge-directed interpolation (EDI) [84]. Our algorithm is an order of magnitude faster than the traditional EDI (Section 5.6.2).

4) Based on the fast EDI algorithm above, we have proposed a fast-to-compute image prior, namely the smooth contour prior.
While any other anisotropic filter can be fitted into this proposed image prior, we prefer to use the EDI filter because of its speed. We have also observed that the smooth contour prior works in a complementary fashion with the sparse gradient prior. We have demonstrated that this novel smooth contour prior can be applied to problems such as super-resolution (SR) and single-image HDRI. We believe that this image prior has potential applications in other imaging problems as well, such as denoising and inpainting.

5) In Section 5.6.3, we also propose a new method for super-resolution (SR). We have demonstrated that our method outperforms state-of-the-art SR methods.

6) In Section 6.1, we investigate the case when the clipped parts (highlights) of a captured image, clipped due to sensor over-exposure, are not white but have an (RGB) color. In such cases, we restore the missing image details in a clipped color channel by analyzing the scene information available in the other color channels of the captured image (those that are not clipped). That is, we utilize the available “partial” pixel intensity information to restore the partially clipped color highlights, where the clipping can affect the mood of the scene or make its rendition unrealistic. Our algorithm reconstructs the correct color and restores the washed-out details in partially clipped regions. Our results demonstrate that our algorithm generates smooth and artifact-free results.

7) In Section 6.2, we propose a method that produces an HDR image from a single SDR exposure using a novel application of the natural image non-local self-similarity prior in an intensity-invariant way: it reconstructs poorly exposed parts of an SDR image using content from patches in well-exposed image areas.
The results obtained with our method demonstrate that it performs better than the state-of-the-art block-matching and 3D filtering (BM3D).

8) In Section 6.2, based on the non-local self-similarity image prior and using a robust distance function, we propose a novel image prior, intensity-invariant non-local self-similarity. We have used it for HDRI and denoising, but we believe that this image prior has applications in other imaging problems. For example, inpainting can use this robust image prior. Since this prior is intensity-invariant, the chances of finding a matching patch are higher than with the conventional non-local self-similarity prior, which only searches for matching patches that are equally bright. Consequently, our prior should help to reach convergence faster than the conventional non-local self-similarity prior.

9) In Section 6.2, we develop a robust distance function that is intensity-invariant. This distance metric is robust against minor mismatches; in other words, it can be used to find similar image patches even under different lighting conditions. We have used this distance function for denoising and HDRI. This robust distance function potentially has further applications in imaging (such as image or video compression).

10) In Section 6.3, we develop a Bayesian method to improve existing denoising methods. Our proposed method utilizes the low-frequency nature of observed pixel chromaticity to retrieve image data lost by a denoiser. We have illustrated the effectiveness of our method by applying it to images denoised by the state-of-the-art denoising algorithm BM3D [21, 24]. This method applies to other denoising techniques as well.

7.4 Limitations

While in this thesis we strive to push the boundary of the dynamic range (DR) limit, we also need to reduce reconstruction artifacts in order to make this technology desirable for the masses.
Our proposed methods fall under the umbrella of computational imaging, and like many computational imaging techniques, they make trade-offs. In the pursuit of a higher DR, our methods pay a cost elsewhere. For example, as indicated in Chapter 4, our method reconstructs the highlights at the cost of signal-to-noise ratio (SNR) in the image areas into which the introduced optical (photographic) filter spreads light from bright objects. Furthermore, the examples presented in the same chapter show increased residual glare in some parts of the reconstructed high dynamic range (HDR) images.

7.5 Future work

7.5.1 Auxiliary high dynamic range (HDR) camera

Consumers do not want to see reconstruction artifacts in their images. A quick solution to remove artifacts would be to apply our method in parallel with a regular capture. In other words, two cameras can be used simultaneously: one would take a usual picture, while the second would implement one or more of our single-image high dynamic range imaging (HDRI) algorithms. Finally, the two images can be fused in a way that reduces reconstruction artifacts.

Such multi-camera systems could be installed so that all cameras share the same optical path by using a beam splitter. However, since beam splitters can cause unwanted scattering inside the optical system, a more elegant solution may be to use separate optical paths (i.e., off-axis imaging). There is further scope for research into methods for analyzing the multi-image formation model.
Our framework can be extended to accommodate multiple cameras and off-axis imaging for such artifact-free HDRI.

7.5.2 Filtering somewhere in between the aperture and sensor

Both of our proposed filters are planar filters: the sensor-filtering we proposed in Chapter 5 happens on the sensor, and is therefore planar by definition since sensors are planar, and the cross-screen filter we use in Chapter 4 is also a planar filter. Planar filters are easier to deal with in mathematical derivations. In particular, the two filtering locations we chose are easy to model:

• sensor-filtering is simply point-multiplication, and
• aperture-filtering is simply a convolution.

However, it is technically possible to put a filter somewhere in between. It is also possible to use a non-planar filter.

Such a non-planar “filter” could act as a light guide. It would warp the incoming light from the scene so that interesting parts of the scene (such as people’s faces) are imaged by more pixels and are therefore acquired with higher fidelity, thus increasing the effective dynamic range (DR). This is a broad space; foveated imaging is a good example of one such imaging method. A foveated image acquisition system would require custom optics; for instance, Thiele et al. [132] use a 3D-printed compound microlens system to capture foveated images.

7.5.3 Aperture-filtering and deep learning

Recently, Eilertsen et al. [31] proposed a single-image clipping restoration method using a deep convolutional neural network. Their approach is to train a deep network with a large amount of clipping data. Unfortunately, this reconstruction problem is still heavily underdetermined, and probably impossible to solve due to the lack of information in the captured image. In other words, the clipped pixel data is completely missing; even if a deep network can help reconstruct missing pixels, it would be doing nothing more than hallucinating.
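As a rough illustration of the two planar filter models named in Section 7.5.2, and of why a clipped pixel alone carries no information about the true highlight, consider the following minimal one-dimensional sketch. It is not the implementation used in this thesis: the function names, the 3-tap point spread function, the alternating gain mask, the toy scene, and the saturation level of 1.0 are all assumptions chosen for illustration.

```python
import numpy as np

FULL_WELL = 1.0  # assumed saturation level of the toy sensor


def sensor_filter_capture(scene, mask):
    """Sensor filtering: per-pixel (point-wise) multiplication, then clipping."""
    return np.clip(mask * scene, 0.0, FULL_WELL)


def aperture_filter_capture(scene, psf):
    """Aperture filtering: convolution with a point spread function, then clipping."""
    return np.clip(np.convolve(scene, psf, mode="same"), 0.0, FULL_WELL)


def make_scene(highlight):
    """Toy scene: dim background with a single bright pixel in the middle."""
    scene = np.full(7, 0.1)
    scene[3] = highlight
    return scene


# Hypothetical filters, chosen only for illustration.
no_filter = np.array([1.0])                # identity PSF: ordinary SDR capture
psf = np.array([0.02, 0.96, 0.02])         # spreads 4% of the light to the neighbors
mask = np.array([1.0, 0.25] * 3 + [1.0])   # alternating per-pixel gains

# Ordinary capture: two different highlights give the same clipped image.
plain_a = aperture_filter_capture(make_scene(8.0), no_filter)
plain_b = aperture_filter_capture(make_scene(16.0), no_filter)

# Aperture-filtered capture: the unclipped neighbors differ with the highlight.
coded_a = aperture_filter_capture(make_scene(8.0), psf)
coded_b = aperture_filter_capture(make_scene(16.0), psf)

# Sensor-filtered capture: an attenuated pixel stays below saturation.
sensed = sensor_filter_capture(make_scene(3.0), mask)

print(np.array_equal(plain_a, plain_b))  # clipped captures are indistinguishable
print(coded_a[2], coded_b[2])            # neighbor pixel encodes the highlight
print(sensed[3] / mask[3])               # undo the known gain at the highlight
```

In this toy setting, the ordinary captures of the two scenes are identical, whereas the aperture-filtered captures differ in the unclipped pixels next to the highlight. That difference is the kind of information a reconstruction method, or a trained network, could exploit.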
Instead of having the deep network work on regular standard dynamic range (SDR) images, we could apply aperture-filtering to make the reconstruction problem somewhat better posed. We believe that, once the network is trained, SDR images captured with aperture-filtering would yield a better reconstruction than images captured without it.

Bibliography

[1] a1ex. Dynamic range improvement for some Canon DSLRs by alternating ISO during sensor readout. http://acoutts.com/a1ex/dual_iso.pdf, July 2013. Accessed: 2018-02-24.
[2] a1ex. Dual ISO - massive dynamic range improvement, Magic Lantern forum. http://www.magiclantern.fm/forum/?topic=7139.0, 2013. Accessed: 2017-04-01.
[3] Abel, J. and Smith III, J. Restoring a clipped signal. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, 1991, pages 1745–1748, 1991.
[4] Ables, J. Fourier transform photography: a new method for x-ray astronomy. Publications of the Astronomical Society of Australia, 1(4):172–173, 1968.
[5] Ackerman, E. Fatal Tesla self-driving car crash reminds us that robots aren’t perfect. IEEE-Spectrum, 1, 2016.
[6] Adams, A., Jacobs, D. E., Dolson, J., Tico, M., Pulli, K., Talvala, E.-V., Ajdin, B., Vaquero, D., Lensch, H., Horowitz, M., et al. The Frankencamera: an experimental platform for computational photography. ACM Trans. Graph., 29(4):29, 2010.
[7] Aggarwal, M. and Ahuja, N. Split aperture imaging for high dynamic range. Int. J. of Computer Vision, 58(1):7–17, 2004.
[8] Anscombe, F. J. The transformation of Poisson, binomial and negative-binomial data. Biometrika, 35(3/4):246–254, 1948.
[9] Bae, M., Jo, S.-H., Choi, B.-S., Lee, H., Choi, P., and Shin, J.-K. Wide dynamic range linear-logarithmic CMOS image sensor using photogate and cascode MOSFET. Electronics Letters, 52(3):198–200, 2016.
[10] Bando, Y. and Nishita, T. Towards digital refocusing from a single photograph. In Proceedings of Pacific Graphics, pages 363–372, 2007.
[11] Banterle, F., Ledda, P., Debattista, K., and Chalmers, A. Inverse tone mapping. In Proc. GRAPHITE ’06, pages 349–356, 2006.
[12] Barnes, C., Shechtman, E., Finkelstein, A., and Goldman, D. PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Trans. Comput. Graph., 28(3):24, 2009.
[13] Bay, H., Tuytelaars, T., and Van Gool, L. SURF: Speeded up robust features. In European conference on computer vision, pages 404–417. Springer, 2006.
[14] Bayer, B. Color imaging array, July 1976. US Patent 3,971,065.
[15] Bertalmio, M., Sapiro, G., Caselles, V., and Ballester, C. Image inpainting. In ACM SIGGRAPH’00, pages 417–424, 2000.
[16] Buades, A., Coll, B., and Morel, J.-M. A non-local algorithm for image denoising. In Proc. IEEE Comput. Vision and Pattern Recognition, volume 2, pages 60–65, 2005.
[17] Bürker, M., Rößing, C., and Lensch, H. Exposure control for HDR video. In SPIE Photonics Europe, pages 913805–913805, 2014.
[18] Chambolle, A. and Pock, T. A first-order primal-dual algorithm for convex problems with applications to imaging. Math. Imaging and Vision, 40(1):120–145, 2011.
[19] Chang, S. G., Yu, B., and Vetterli, M. Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process., 9(9):1532–1546, 2000.
[20] Cole, M. A tedious explanation of the f/stop. http://www.uscoles.com/fstop.htm, 2018. Accessed: 2018-02-24.
[21] Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. Image denoising with block-matching and 3D filtering. In Electronic Imaging, pages 606414–606414. Int. Soc. for Optics and Photonics, 2006.
[22] Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process., 16(8):2080–2095, 2007.
[23] Dai, S., Han, M., Xu, W., Wu, Y., Gong, Y., and Katsaggelos, A. K. Softcuts: a soft edge smoothness prior for color image super-resolution. IEEE Trans. Image Process., 18(5):969–981, 2009.
[24] Danielyan, A., Katkovnik, V., and Egiazarian, K. BM3D frames and variational image deblurring. IEEE Trans. Image Process., 21(4):1715–1728, 2012.
[25] Dcraw. Decoding software for raw digital photos from cameras. http://www.cybercom.net/~dcoffin/dcraw/. Accessed: 2018-02-24.
[26] Debevec, P. E. and Malik, J. Recovering high dynamic range radiance maps from photographs. In ACM SIGGRAPH’97, pages 369–378, 1997.
[27] Debevec, P. and Malik, J. Recovering high dynamic range images. In Proceedings of the SPIE: Image Sensors, volume 3965, pages 392–401, 1997.
[28] Dicke, R. Scatter-hole cameras for x-rays and gamma rays. The Astrophysical Journal, 153:L101, 1968.
[29] Didyk, P., Mantiuk, R., Hein, M., and Seidel, H. Enhancement of bright video features for HDR displays. Computer Graphics Forum, 27(4):1265–1274, 2008.
[30] Dong, C., Loy, C. C., He, K., and Tang, X. Learning a deep convolutional network for image super-resolution. In Computer Vision–ECCV 2014, pages 184–199. Springer, 2014.
[31] Eilertsen, G., Kronander, J., Denes, G., Mantiuk, R. K., and Unger, J. HDR image reconstruction from a single exposure using deep CNNs. ACM Transactions on Graphics (TOG), 36(6):178, 2017.
[32] El Gamal, A. High dynamic range image sensors. In Tutorial at International Solid-State Circuits Conference, 2002.
[33] Elboher, E. and Werman, M. Recovering color and details of clipped image regions. In Proc. CGVCVIP, 2010.
[34] Elder, J. and Goldberg, R. Image editing in the contour domain. IEEE Trans. PAMI, 23(3):291–296, 2001.
[35] Farsiu, S., Robinson, M. D., Elad, M., and Milanfar, P. Fast and robust multiframe super resolution. IEEE Trans.
Image Process., 13(10):1327–1344, 2004.
[36] Fechner, G. T. Elemente der Psychophysik: Zweiter Theil. Breitkopf und Härtel, 1860.
[37] Fenimore, E. E. and Cannon, T. M. Coded aperture imaging with uniformly redundant arrays. Applied optics, 17(3):337–347, 1978.
[38] Fernandez-Granda, C. and Candes, E. J. Super-resolution via transform-invariant group-sparse regularization. In Proc. IEEE Int. Conf. on Comput. Vision, pages 3336–3343. IEEE, 2013.
[39] Fischler, M. A. and Bolles, R. C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. In Readings in computer vision, pages 726–740. Elsevier, 1987.
[40] Foi, A. Clipped noisy images: Heteroskedastic modeling and practical denoising. Signal Processing, 89(12):2609–2629, 2009.
[41] Foi, A., Katkovnik, V., and Egiazarian, K. Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. IEEE Trans. Image Process., 16(5):1395–1411, 2007.
[42] Foi, A., Trimeche, M., Katkovnik, V., and Egiazarian, K. Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data. Image Processing, IEEE Transactions on, 17(10):1737–1754, 2008.
[43] Freedman, G. and Fattal, R. Image and video upscaling from local self-examples. ACM Trans. Graph., 28(3):1–10, 2010. ISSN 0730-0301. doi: http://doi.acm.org/10.1145/1531326.1531328.
[44] Freeman, W. T., Jones, T. R., and Pasztor, E. C. Example-based super-resolution. IEEE Comput. Graph. Appl., 22(2):56–65, 2002.
[45] Gallo, O., Gelfand, N., Chen, W., Tico, M., and Pulli, K. Artifact-free high dynamic range imaging. In Proc. ICCP, 2009.
[46] Ginosar, R., Hilsenrath, O., and Zeevi, Y. Wide dynamic range camera. 1992.
[47] Glasner, D., Bagon, S., and Irani, M. Super-resolution from a single image. In Proc. IEEE Int. Conf. on Comput. Vision, pages 349–356.
IEEE, 2009.
[48] Granados, M., Ajdin, B., Wand, M., Theobalt, C., Seidel, H.-P., and Lensch, H. Optimal HDR reconstruction with linear digital cameras. In Proc. CVPR, pages 215–222, 2010.
[49] Guo, D., Cheng, Y., Zhuo, S., and Sim, T. Correcting over-exposure in photographs. In Proc. CVPR, pages 515–521. IEEE, 2010.
[50] Guthier, B., Kopf, S., and Effelsberg, W. A real-time system for capturing HDR videos. In Proc. of the 20th ACM Intl. Conf. on Multimedia, pages 1473–1476, 2012.
[51] HaCohen, Y., Shechtman, E., Goldman, D. B., and Lischinski, D. Non-rigid dense correspondence with applications for image enhancement. ACM Trans. Comput. Graph., 30(4):70, 2011.
[52] Hajisharif, S., Kronander, J., and Unger, J. HDR reconstruction for alternating gain (ISO) sensor readout. In Proc. Eurographics Short Papers, pages 25–28. The Eurographics Association, 2014.
[53] Hajisharif, S., Kronander, J., and Unger, J. Adaptive dualISO HDR reconstruction. EURASIP Journal on Image and Video Processing, 2015(1):41, 2015.
[54] Hasinoff, S. W. and Kutulakos, K. N. A layer-based restoration framework for variable-aperture photography. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007.
[55] Hasinoff, S. W., Durand, F., and Freeman, W. T. Noise-optimal capture for high dynamic range photography. In Proc. IEEE Comput. Vision and Pattern Recognition, pages 553–560, 2010.
[56] Hasinoff, S. W., Sharlet, D., Geiss, R., Adams, A., Barron, J. T., Kainz, F., Chen, J., and Levoy, M. Burst photography for high dynamic range and low-light imaging on mobile cameras. ACM Trans. on Graph., 35(6):192, 2016.
[57] He, K., Sun, J., and Tang, X. Single image haze removal using dark channel prior. IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(12):2341–2353, 2011.
[58] He, L., Qi, H., and Zaretzki, R.
Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution. In Proc. IEEE Conf. on Comput. Vision and Pattern Recognition, pages 345–352. IEEE, 2013.
[59] Heide, F., Steinberger, M., Tsai, Y.-T., Rouf, M., Pajak, D., Reddy, D., Gallo, O., Liu, J., Heidrich, W., Egiazarian, K., et al. FlexISP: A flexible camera image processing framework. ACM Trans. on Graphics, 33(6), 2014.
[60] Heidrich, W. and Rouf, M. Color highlight reconstruction, May 3 2012. US Patent App. 13/463,775.
[61] Horowitz, P. and Hill, W. The art of electronics. Cambridge University Press, 1989.
[62] Hu, J., Gallo, O., Pulli, K., and Sun, X. HDR deghosting: How to deal with saturation? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1163–1170, 2013.
[63] Huang, T., Yang, G., and Tang, G. A fast two-dimensional median filtering algorithm. IEEE Trans. Acoust., Speech, Signal Process., 27(1):13–18, 1979.
[64] ISO 12232:2006. Photography – Digital still cameras – Determination of exposure index, ISO speed ratings, standard output sensitivity, and recommended exposure index. Standard, International Organization for Standardization, Geneva, CH, April 2006.
[65] Joshi, N., Zitnick, C., Szeliski, R., and Kriegman, D. Image deblurring and denoising using color priors. In Proc. Conf. on Comput. Vision and Pattern Recognition, 2009.
[66] Kahn, D. The history of steganography. In International Workshop on Information Hiding, pages 1–5. Springer, 1996.
[67] Kak, A. and Slaney, M. Principles of computerized tomographic imaging. SIAM, 2001.
[68] Kalantari, N. K., Shechtman, E., Barnes, C., Darabi, S., Goldman, D. B., and Sen, P. Patch-based high dynamic range video. ACM Trans. Graph., 32(6):202:1–202:8, November 2013.
[69] Kalantari, N. K., Shechtman, E., Barnes, C., Darabi, S., Goldman, D. B., and Sen, P.
Patch-based high dynamic range video. ACM Trans. Comput. Graph., 32(6):202–1, 2013.
[70] Kang, S., Uyttendaele, M., Winder, S., and Szeliski, R. High dynamic range video. ACM Transactions on Graphics, 22(3):319–325, 2003.
[71] Kang, S. B., Uyttendaele, M., Winder, S., and Szeliski, R. High dynamic range video. In ACM Trans. Comput. Graph., volume 22, pages 319–325, 2003.
[72] Kang, S., Do, H., Cho, B., Chien, S., and Tae, H. Improvement of low gray-level linearity using perceived luminance of human visual system in PDP-TV. IEEE Trans. Consumer Electronics, 51(1):204–209, 2005.
[73] Keys, R. Cubic convolution interpolation for digital image processing. IEEE transactions on acoustics, speech, and signal processing, 29(6):1153–1160, 1981.
[74] Kim, K. I. and Kwon, Y. Example-based learning for single-image super-resolution. In Pattern Recognition, pages 456–465. Springer, 2008.
[75] Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
[76] Kronander, J., Gustavson, S., Bonnet, G., and Unger, J. Unified HDR reconstruction from raw CFA data. In Computational Photography (ICCP), 2013 IEEE International Conference on, pages 1–9, 2013.
[77] Kwon, Y., Kim, K. I., Tompkin, J., Kim, J. H., and Theobalt, C. Efficient learning of image super-resolution and compression artifact removal with semi-local Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell., December 2014.
[78] Laemmel, A. Coding processes for band-width reduction in picture transmission. In Proceedings of the Institute of Radio Engineers, volume 39, pages 293–293, 1951.
[79] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[80] Levin, A., Fergus, R., Durand, F., and Freeman, W.
Image and depth from a conventional camera with a coded aperture. ACM Trans. Graph., 26(3):70, 2007.
[81] Levin, A., Sand, P., Cho, T., Durand, F., and Freeman, W. Motion-invariant photography. ACM Trans. Graph., 27(3):71, 2008.
[82] Levin, A., Lischinski, D., and Weiss, Y. Colorization using optimization. In SIGGRAPH ’04, pages 689–694, 2004.
[83] Levin, A., Fergus, R., Durand, F., and Freeman, W. T. Image and depth from a conventional camera with a coded aperture. ACM Trans. Comput. Graph., 26(3):70, 2007.
[84] Li, X. and Orchard, M. T. New edge-directed interpolation. IEEE Trans. Image Process., 10(10):1521–1527, 2001.
[85] Lowe, D. G. Object recognition from local scale-invariant features. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on, volume 2, pages 1150–1157. IEEE, 1999.
[86] Mach, E. On the physiological effect of spatially distributed light stimuli. Mach Bands: Quantitative Studies on Neural Networks in the Retina, pages 299–306, 1868.
[87] Magic Lantern. Open-source software extensions to official Canon firmware. http://www.magiclantern.fm, 2018. Accessed: 2018-02-24.
[88] Mairal, J., Bach, F., Ponce, J., Sapiro, G., and Zisserman, A. Non-local sparse models for image restoration. In Proc. IEEE Comput. Vision and Pattern Recognition, pages 2272–2279, 2009.
[89] Mangiat, S. and Gibson, J. Spatially adaptive filtering for registration artifact removal in HDR video. In IEEE Int. Conf. on Image Process., pages 1317–1320, 2011.
[90] Mann, S. and Picard, R. W. On being ’undigital’ with digital cameras: Extending dynamic range by combining differently exposed pictures, 7 pages. In IS&T’s 48th Annual Conference, pages 422–428. Society for Imaging Science and Technology, 1995.
[91] Mann, S. and Picard, R.
On being ’undigital’ with digital cameras: extending dynamic range by combining differently exposed pictures. Perceptual Computing Section, Media Laboratory, Massachusetts Institute of Technology, 1995.
[92] Mantiuk, R., Myszkowski, K., and Seidel, H.-P. A perceptual framework for contrast processing of high dynamic range images. ACM Transactions on Applied Perception, 3(3):286–308, 2006.
[93] Mantiuk, R., Mantiuk, R., Tomaszewska, A., and Heidrich, W. Color correction for tone mapping. In Computer Graphics Forum, volume 28, pages 193–202, 2009.
[94] Markowski, M. Ghost removal in HDRI acquisition. In 13th Central European Seminar on Computer Graphics, 2009.
[95] Masood, S. Z., Zhu, J., and Tappen, M. F. Automatic correction of saturated regions in photographs using cross-channel correlation. Comput. Graph. Forum, 28(7), 2009. ISSN 1467-8659.
[96] Meka, A., Zollhöfer, M., Richardt, C., and Theobalt, C. Live intrinsic video. In Proc. ACM Conf. Comput. Graph. and Interactive Techniques, 2016.
[97] Meylan, L., Daly, S., and Susstrunk, S. The reproduction of specular highlights on high dynamic range displays. In Proc. Color Imaging Conference, 2006.
[98] Mitsunaga, T. and Nayar, S. K. Radiometric self calibration. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 1. IEEE, 1999.
[99] Močkus, J. On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, pages 400–404. Springer, 1975.
[100] Myszkowski, K., Mantiuk, R., and Krawczyk, G. High Dynamic Range Video. Morgan & Claypool Publishers, 2008.
[101] Nayar, S. K. and Mitsunaga, T. High dynamic range imaging: Spatially varying pixel exposures. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 1, pages 472–479. IEEE, 2000.
[102] Nayar, S. and Mitsunaga, T. High dynamic range imaging: Spatially varying pixel exposures. In Proc. Conf. on Comput. Vision and Pattern Recognition, pages 472–479, 2000.
[103] Nayar, S. and Narasimhan, S. Assorted pixels: Multi-sampled imaging with structural models. In ACM SIGGRAPH’05 Courses, 2005.
[104] Ng, M. K. and Bose, N. K. Mathematical analysis of super-resolution methodology. IEEE Signal Process. Mag., 20(3):62–74, 2003.
[105] Olofsson, T. Deconvolution and model-based restoration of clipped ultrasonic signals. IEEE Trans. Instrumentation and Measurement, 54(3):1235–1240, 2005.
[106] Olshausen, B. and Field, D. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
[107] Pan, R. and Reeves, S. J. Efficient Huber-Markov edge-preserving image restoration. IEEE Trans. Image Process., 15(12):3728–3735, 2006.
[108] Perez, P., Gangnet, M., and Blake, A. Poisson image editing. ACM Transactions on Graphics, 22(3):313–318, 2003.
[109] Portz, T., Zhang, L., and Jiang, H. Random coded sampling for high-speed HDR video. In Int. Conf. on Computational Photography, pages 1–8, 2013.
[110] Raskar, R., Agrawal, A., and Tumblin, J. Coded exposure photography: Motion deblurring using fluttered shutter. ACM Trans. Graph., 25(3):795–804, 2006.
[111] Reinhard, E., Stark, M., Shirley, P., and Ferwerda, J. Photographic tone reproduction for digital images. ACM Transactions on Graphics, 21(3):267–276, 2002.
[112] Rempel, A., Trentacoste, M., Seetzen, H., Young, H., Heidrich, W., Whitehead, L., and Ward, G. Ldr2Hdr: on-the-fly reverse tone mapping of legacy video and photographs. ACM Trans. Graph., 26(3):39, 2007.
[113] Robertson, M., Borman, S., and Stevenson, R. Dynamic range improvement through multiple exposures. In Proc. Int. Conf. on Image Process., pages 159–163, 1999.
[114] Rouf, M. Single exposure high dynamic range imaging with a conventional camera using cross-screen filters. Master’s thesis, University of British Columbia, 2009. URL https://open.library.ubc.ca/cIRcle/collections/24/items/1.0051847.
Accessed: 2018-02-24.
[115] Rouf, M. and Ward, R. K. Retrieving information lost by image denoising. In Signal and Information Processing (GlobalSIP), 2015 IEEE Global Conference on, pages 1066–1070. IEEE, 2015.
[116] Rouf, M. and Ward, R. K. Dynamic range expansion of single images using intensity-invariant patch correspondences. In Proc. Int. Conf. on Sig. and Image Processing. IEEE, 2017.
[117] Rouf, M. and Ward, R. K. High dynamic range imaging with a single exposure-multiplexed image using smooth contour prior. In Proc. Electronic Imaging (Image Processing: Algorithms and Systems). IS&T, 2018.
[118] Rouf, M., Mantiuk, R., Heidrich, W., Trentacoste, M., and Lau, C. Glare encoding of high dynamic range images. In Proc. Conf. on Comput. Vision and Pattern Recognition, pages 289–296. IEEE, 2011.
[119] Rouf, M., Lau, C., and Heidrich, W. Gradient domain color restoration of clipped highlights. In Proc. Int. Workshop on Projector-Camera Systems, pages 7–14, 2012.
[120] Rouf, M., Reddy, D., Pulli, K., and Ward, R. Fast edge-directed single-image super-resolution. In Proc. IS&T Intl. Symposium on Electronic Imaging 2016, in press, 2016.
[121] Rudin, L. I., Osher, S., and Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.
[122] Sen, P., Kalantari, N. K., Yaesoubi, M., Darabi, S., Goldman, D. B., and Shechtman, E. Robust patch-based HDR reconstruction of dynamic scenes. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2012), 31(6):203:1–203:11, November 2012.
[123] Sen, P., Kalantari, N. K., Yaesoubi, M., Darabi, S., Goldman, D. B., and Shechtman, E. Robust patch-based HDR reconstruction of dynamic scenes. ACM Trans. Comput. Graph., 31(6):203, 2012.
[124] Shen, J. Inpainting and the fundamental problem of image processing. SIAM News, 36(2), 2003.
[125] Smith, A. R.
A pixel is not a little square, a pixel is not a little square, a pixel is not a little square! Microsoft Computer Graphics, Technical Memo, 6, 1995.
[126] Storm, G., Henderson, R., Hurwitz, J., Renshaw, D., Findlater, K., and Purcell, M. Extended dynamic range from a combined linear-logarithmic CMOS image sensor. IEEE Journal of Solid-State Circuits, 41(9):2095–2106, 2006.
[127] Sun, N., Mansour, H., and Ward, R. HDR image construction from multi-exposed stereo LDR images. In Proc. IEEE Conf. on Image Process., pages 2973–2976. IEEE, 2010.
[128] Talvala, E., Adams, A., Horowitz, M., and Levoy, M. Veiling glare in high dynamic range imaging. ACM Trans. Graph., 26(3):37, 2007.
[129] Tam, W.-S., Kok, C.-W., and Siu, W.-C. Modified edge-directed interpolation for images. Electronic imaging, 19(1):013011–013011, 2010.
[130] Tan, P., Lin, S., Quan, L., and Shum, H. Highlight removal by illumination-constrained inpainting. In Proc. International Conference on Computer Vision, page 164, 2003.
[131] Team, T. T. A tragic loss. https://www.tesla.com/blog/tragic-loss, June 2016. Accessed: 2018-02-24.
[132] Thiele, S., Arzenbacher, K., Gissibl, T., Giessen, H., and Herkommer, A. M. 3D-printed eagle eye: Compound microlens system for foveated imaging. Science advances, 3(2):e1602655, 2017.
[133] Timofte, R., De Smet, V., and Van Gool, L. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Computer Vision–ACCV 2014, pages 111–126. Springer, 2014.
[134] Tocci, M. D., Kiser, C., Tocci, N., and Sen, P. A versatile HDR video production system. ACM Trans. Graph., 30(4):41:1–41:10, July 2011.
[135] Tomasi, C. and Manduchi, R. Bilateral filtering for gray and color images. In Computer Vision, 1998. Sixth International Conference on, pages 839–846. IEEE, 1998.
[136] Tomaszewska, A. and Mantiuk, R. Image registration for multi-exposure high dynamic range image acquisition. 2007.
[137] Trentacoste, M., Lau, C., Rouf, M., Mantiuk, R., and Heidrich, W. Defocus techniques for camera dynamic range expansion. In IS&T/SPIE Electronic Imaging, pages 75370H–75370H, 2010.
[138] Unger, J. and Gustavson, S. High-dynamic-range video for photometric measurement of illumination. In Electronic Imaging 2007, pages 65010E–65010E, 2007.
[139] Venkataraman, K., Lelescu, D., Duparré, J., McMahon, A., Molina, G., Chatterjee, P., Mullis, R., and Nayar, S. PiCam: an ultra-thin high performance monolithic camera array. ACM Trans. on Graph., 32(6):166, 2013.
[140] Wang, Q. and Ward, R. K. A new orientation-adaptive interpolation method. IEEE Trans. Image Process., 16(4):889–900, 2007.
[141] Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
[142] Ward, G. Fast, robust image registration for compositing high dynamic range photographs from hand-held exposures. Journal of graphics tools, 8(2):17–30, 2003.
[143] Wetzstein, G., Ihrke, I., and Heidrich, W. Sensor Saturation in Fourier Multiplexed Imaging. In Proc. Conf. on Comput. Vision and Pattern Recognition, Jun 2010.
[144] Wu, W., Liu, Z., He, X., and Gueaieb, W. Single-image super-resolution based on Markov random field and contourlet transform. Electronic Imaging, 20(2):023005–023005, 2011.
[145] Xu, D., Doutre, C., and Nasiopoulos, P. Correction of clipped pixels in color images. IEEE Trans. Vis. Comput. Graphics, 17(3):333–344, 2011.
[146] Yang, J., Wright, J., Huang, T. S., and Ma, Y. Image super-resolution via sparse representation. Image Processing, IEEE Transactions on, 19(11):2861–2873, 2010.
[147] Yang, J., Wang, Z., Lin, Z., Shu, X., and Huang, T. Bilevel sparse coding for coupled feature spaces. In Proc. IEEE Conf. on Comput. Vision and Pattern Recognition, pages 2360–2367. IEEE, 2012.
[148] Yang, J., Lin, Z., and Cohen, S. Fast image super-resolution based on in-place example regression. In Proc. IEEE Conf. in Comput. Vision and Pattern Recognition, pages 1059–1066. IEEE, 2013.
[149] Yoshida, A., Ihrke, M., Mantiuk, R., and Seidel, H. Brightness of the glare illusion. In Proc. APGV, pages 83–90, 2008.
[150] Yule, J. A. Photographic unsharp masking method, December 7 1948. US Patent 2,455,849.
[151] Zhang, K., Gao, X., Tao, D., and Li, X. Single image super-resolution with non-local means and steering kernel regression. IEEE Trans. Image Process., 21(11):4544–4556, 2012.
[152] Zhang, X. and Brainard, D. Estimation of saturated pixel values in digital color imaging. JOSA A, 21(12):2301–2310, 2004.
[153] Zhu, Y., Zhang, Y., and Yuille, A. L. Single image super-resolution using deformable patches. In Proc. IEEE Conf. on Comput. Vision and Pattern Recognition, 2014.
Title | Computational single-image high dynamic range imaging |
Creator | Rouf, Mushfiqur |
Publisher | University of British Columbia |
Date Issued | 2018 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2018-06-25 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0368721 |
URI | http://hdl.handle.net/2429/66350 |
Degree | Doctor of Philosophy - PhD |
Program | Computer Science |
Affiliation | Science, Faculty of; Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2018-09 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |