UBC Theses and Dissertations


Exploring code clones in software development: a study of PyTorch on GitHub and Stack Overflow

Alam, Md Jumar

Abstract

Code cloning, the practice of duplicating identical or highly similar source code fragments within or across projects, is prevalent in software development. The practice is not exclusive to traditional software development; it extends to deep learning frameworks. Developers often clone code both from files within their own repositories and from distant repositories across the open-source ecosystem, and platforms such as GitHub and Stack Overflow provide fertile ground for such practices. This thesis examines code cloning in the context of the PyTorch framework. The research addresses the distribution of PyTorch code clones, the relationship between code cloning practices and user and repository metadata, and the phases of deep learning development in which code cloning primarily occurs. The findings reveal that function cloning is more prevalent than block cloning among GitHub-GitHub clones, with Type I and Type II clones being more common. For GitHub-Stack Overflow and Stack Overflow-Stack Overflow clones, however, Type III clones are more prevalent, indicating that users often modify code cloned from different platforms. The research also finds that user contributions, follower count, following count, organizational membership, and repository popularity do not strongly influence code cloning practices. The clone analysis further shows that the data processing, model construction, and model evaluation phases of deep learning development contain more clones than other phases. Comparisons show that most cloning in the preliminary preparation and data collection phases occurs across different users, whereas clones in the data processing, model construction, and model evaluation phases are mostly made by the same users within their own repositories. Across all phases, Type III clones are the most prominent for both categories of users. These findings could guide future research, such as analyzing the usage of PyTorch APIs during code cloning.
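For readers unfamiliar with the clone types named in the abstract, the following is a minimal illustrative sketch, not the detection method used in the thesis (the abstract does not name its tooling). It assumes the standard taxonomy from the clone-detection literature: Type I clones differ only in layout and comments, Type II clones also differ in identifiers and literals, and Type III clones additionally have statements added, removed, or changed. The function names, the example snippets, and the 0.7 similarity threshold are all hypothetical choices made for illustration.

```python
import difflib
import io
import keyword
import tokenize

# Token kinds that carry only layout or comments (ignored for all clone types).
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(src, abstract=False):
    """Return the token strings of src, optionally abstracting names and literals."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in SKIP:
            continue
        if abstract and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")    # any identifier
        elif abstract and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")   # any literal
        else:
            out.append(tok.string)
    return out


def clone_type(a, b, threshold=0.7):
    """Classify a pair of fragments; the threshold is an assumed cut-off."""
    if tokens(a) == tokens(b):
        return "Type I"
    ta, tb = tokens(a, abstract=True), tokens(b, abstract=True)
    if ta == tb:
        return "Type II"
    if difflib.SequenceMatcher(None, ta, tb).ratio() >= threshold:
        return "Type III"
    return "not a clone"


# Hypothetical PyTorch-style fragments.
original = (
    "def to_device(batch, device):\n"
    "    # move tensors\n"
    "    return [t.to(device) for t in batch]\n"
)
reformatted = (
    "def to_device(batch, device):\n"
    "    return [t.to(device)   for t in batch]\n"
)
renamed = (
    "def move(items, dev):\n"
    "    return [x.to(dev) for x in items]\n"
)
modified = (
    "def move(items, dev):\n"
    "    items = list(items)\n"
    "    return [x.to(dev) for x in items]\n"
)
unrelated = "import torch\nx = torch.zeros(3)\n"

for other in (reformatted, renamed, modified, unrelated):
    print(clone_type(original, other))
```

Token-level comparison keeps the sketch short; production clone detectors typically work on richer representations and tune their similarity thresholds empirically.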

Rights

Attribution 4.0 International