UBC Theses and Dissertations

Advancing life cycle assessment of bio-based chemicals through artificial intelligence-driven data extraction

Tang, Zirui

Abstract

The transition from fossil-based chemicals to bio-based alternatives is widely regarded as a key strategy for reducing greenhouse gas (GHG) emissions in the chemical sector. However, the environmental performance of biochemicals, as quantified by life cycle assessment (LCA) studies, remains uncertain due to heterogeneous biorefinery configurations and inconsistent modeling choices across studies. In addition, life cycle inventory (LCI) and life cycle impact assessment (LCIA) results are often fragmented and inconsistently reported in the scientific literature, which hinders systematic data synthesis and limits the reuse of existing evidence. This thesis addresses these challenges by integrating quantitative evidence synthesis with artificial intelligence (AI)-enabled data extraction to enhance the reliability, comparability, and accessibility of LCA data for biochemicals. First, a systematic literature review was conducted on 65 LCA studies covering 17 priority biochemicals across C1-C6 value chains. Reported global warming potential (GWP) results were systematically harmonized to reduce the heterogeneity arising from differing modeling assumptions and thereby ensure a fair comparison across studies. The analysis indicates that most biochemicals exhibit a lower GWP than their fossil-based counterparts, primarily owing to biogenic carbon sequestration and reduced emissions in manufacturing stages. A meta-regression analysis further identified key drivers of variability in reported GWPs, revealing that feedstock type, LCA modeling assumptions (e.g., allocation methods), and carbon sequestration significantly affect results. Second, to address the labor-intensive and inefficient nature of manual LCA data extraction, this research developed an AI-assisted framework that integrates large language models (LLMs) with a knowledge graph (KG) to automatically extract and manage LCA domain data from unstructured PDF literature.
The framework achieves high semantic accuracy, with F1-scores ranging from 73.54% to 93.34% in extracting three key types of LCA information: LCI data, LCIA results, and modeling assumptions. Furthermore, by integrating similarity-based retrieval with graph-based reasoning, the framework enhances query performance within the graph database, improving the F1-score from a baseline of 56.98% to 75.18%. Overall, this thesis advances understanding of the environmental performance of biochemicals and strengthens LCA research by introducing an intelligent data infrastructure that supports scalable, data-driven LCA analysis for chemical systems and beyond.
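The extraction F1-scores reported above combine precision (the share of extracted items that are correct) and recall (the share of reference items that were found) into their harmonic mean. The following is a minimal sketch of how such a score can be computed, assuming that extracted and reference items are compared as normalized strings. The function name `extraction_f1` and the sample inventory entries are hypothetical, and the semantic matching used in the thesis may be looser than the exact matching assumed here.

```python
from collections import Counter


def extraction_f1(predicted, gold):
    """Micro precision, recall and F1 for a list of extracted items.

    Assumption (not from the thesis): items match only if they are identical
    after lowercasing and collapsing whitespace.
    """
    def norm(item):
        return " ".join(str(item).lower().split())

    pred = Counter(norm(p) for p in predicted)
    true = Counter(norm(g) for g in gold)
    # Multiset intersection: an item counts as a true positive at most as
    # many times as it appears in both the prediction and the reference.
    tp = sum((pred & true).values())
    precision = tp / sum(pred.values()) if pred else 0.0
    recall = tp / sum(true.values()) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical LCI entries extracted from a paper vs. a manually curated reference.
predicted = ["Electricity, 1.2 kWh", "Glucose,  0.8 kg", "Steam, 3 MJ"]
gold = ["electricity, 1.2 kWh", "glucose, 0.8 kg", "water, 5 kg"]
p, r, f1 = extraction_f1(predicted, gold)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.6667 0.6667 0.6667
```

Using a multiset rather than a set means that duplicated extractions are penalized instead of being silently merged, which matters when the same flow appears more than once in an inventory table.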

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International