Learning Domain-Adaptive Landmark Detection-Based Self-Supervised Video Synchronization for Remote Sensing Panorama

UBC Faculty Research and Publications

Learning Domain-Adaptive Landmark Detection-Based Self-Supervised Video Synchronization for Remote Sensing Panorama Mei, Ling; He, Yizhuo; Fishani, Farnoosh Javadi; Yu, Yaowen; Zhang, Lijun; Rhodin, Helge

Abstract

The synchronization of videos is an essential pre-processing step for multi-view reconstruction such as the image mosaic by UAV remote sensing; it is often solved with hardware solutions in motion capture studios. However, traditional synchronization setups rely on manual interventions or software solutions and only fit for a particular domain of motions. In this paper, we propose a self-supervised video synchronization algorithm that attains high accuracy in diverse scenarios without cumbersome manual intervention. At the core is a motion-based video synchronization algorithm that infers temporal offsets from the trajectories of moving objects in the videos. It is complemented by a self-supervised scene decomposition algorithm that detects common parts and their motion tracks in two or more videos, without requiring any manual positional supervision. We evaluate our approach on three different datasets, including the motion of humans, animals, and simulated objects, and use it to build the view panorama of the remote sensing field. All experiments demonstrate that the proposed location-based synchronization is more effective compared to the state-of-the-art methods, and our self-supervised inference approaches the accuracy of supervised solutions, while being much easier to adapt to a new target domain.

Item Metadata

Title	Learning Domain-Adaptive Landmark Detection-Based Self-Supervised Video Synchronization for Remote Sensing Panorama
Creator	Mei, Ling; He, Yizhuo; Fishani, Farnoosh Javadi; Yu, Yaowen; Zhang, Lijun; Rhodin, Helge
Publisher	Multidisciplinary Digital Publishing Institute
Date Issued	2023-02-09
Description	The synchronization of videos is an essential pre-processing step for multi-view reconstruction such as the image mosaic by UAV remote sensing; it is often solved with hardware solutions in motion capture studios. However, traditional synchronization setups rely on manual interventions or software solutions and only fit for a particular domain of motions. In this paper, we propose a self-supervised video synchronization algorithm that attains high accuracy in diverse scenarios without cumbersome manual intervention. At the core is a motion-based video synchronization algorithm that infers temporal offsets from the trajectories of moving objects in the videos. It is complemented by a self-supervised scene decomposition algorithm that detects common parts and their motion tracks in two or more videos, without requiring any manual positional supervision. We evaluate our approach on three different datasets, including the motion of humans, animals, and simulated objects, and use it to build the view panorama of the remote sensing field. All experiments demonstrate that the proposed location-based synchronization is more effective compared to the state-of-the-art methods, and our self-supervised inference approaches the accuracy of supervised solutions, while being much easier to adapt to a new target domain.
Subject	video synchronization; remote sensing; image mosaic; self-supervised learning; style transfer
Genre	Article
Type	Text
Language	eng
Date Available	2025-06-27
Provider	Vancouver : University of British Columbia Library
Rights	CC BY 4.0
DOI	10.14288/1.0449231
URI	http://hdl.handle.net/2429/91468
Affiliation	Science, Faculty of; Non UBC; Computer Science, Department of
Citation	Remote Sensing 15 (4): 953 (2023)
Publisher DOI	10.3390/rs15040953
Peer Review Status	Reviewed
Scholarly Level	Faculty; Researcher
Rights URI	https://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Faculty Research and Publications