Open Collections
UBC Faculty Research and Publications
Human Interpretation and Exploitation of Self-attention Patterns in Transformers: A Case Study in Extractive Summarization
Li, Raymond; Xiao, Wen; Wang, Lanjun; Carenini, Giuseppe
Abstract
The transformer multi-head self-attention mechanism has been thoroughly investigated in recent years. On the one hand, researchers are interested in understanding why and how transformers work. On the other hand, they propose new attention augmentation methods to make transformers more accurate, efficient, and interpretable. In this paper, we combine these two lines of research in a human-in-the-loop pipeline: we first find important task-specific attention patterns, and then apply those patterns not only to the original model but also to smaller models, as a human-guided knowledge distillation process. We demonstrate the benefits of this pipeline in a case study on the extractive summarization task. After identifying three meaningful attention patterns in the popular BERTSum model, our experiments show that injecting such patterns improves the performance, and arguably the interpretability, of both the original and the smaller model.
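The pattern-injection idea in the abstract can be made concrete with a small sketch. The snippet below is an illustrative assumption, not the paper's implementation: it blends a single head's learned attention distribution with a fixed, human-identified pattern (here, attend-to-previous-token). The blend weight `alpha`, the choice of pattern, and the `PatternInjectedAttention` module are all hypothetical names introduced for this example.

```python
# A minimal sketch (assumed mechanism, not the paper's code) of injecting a
# fixed attention pattern into one self-attention head.
import torch
import torch.nn as nn
import torch.nn.functional as F


def previous_token_pattern(seq_len: int) -> torch.Tensor:
    """Fixed pattern: each position attends to the previous token.
    Returns a (seq_len, seq_len) row-stochastic matrix."""
    pattern = torch.diag(torch.ones(seq_len - 1), diagonal=-1)
    pattern[0, 0] = 1.0  # position 0 has no previous token; attend to itself
    return pattern


class PatternInjectedAttention(nn.Module):
    """Single-head self-attention whose learned attention distribution is
    blended with a fixed, human-identified pattern (hypothetical design)."""

    def __init__(self, d_model: int, alpha: float = 0.5):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.alpha = alpha  # assumed blend weight between learned and fixed attention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len, d_model = x.size(1), x.size(2)
        scores = self.q(x) @ self.k(x).transpose(1, 2) / d_model ** 0.5
        learned = F.softmax(scores, dim=-1)            # (batch, seq, seq)
        fixed = previous_token_pattern(seq_len).to(x)  # (seq, seq), broadcast over batch
        attn = (1 - self.alpha) * learned + self.alpha * fixed
        return attn @ self.v(x)


# Usage: run a toy batch through the pattern-injected head.
x = torch.randn(2, 8, 16)
out = PatternInjectedAttention(d_model=16)(x)
print(out.shape)  # torch.Size([2, 8, 16])
```

In the paper's setting, the fixed patterns would come from the human-in-the-loop analysis of BERTSum's heads; the convex blending shown here is just one plausible way such a pattern could be injected into either the original or a smaller distilled model.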
Item Metadata

| Field | Value |
| --- | --- |
| Title | Human Interpretation and Exploitation of Self-attention Patterns in Transformers: A Case Study in Extractive Summarization |
| Creator | Li, Raymond; Xiao, Wen; Wang, Lanjun; Carenini, Giuseppe |
| Date Issued | 2021-12-10 |
| Description | Same as the abstract above. |
| Genre | |
| Type | |
| Language | eng |
| Date Available | 2022-01-19 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution 4.0 International |
| DOI | 10.14288/1.0406311 |
| URI | |
| Affiliation | |
| Peer Review Status | Reviewed |
| Scholarly Level | Faculty; Researcher; Postdoctoral |
| Rights URI | |
| Aggregated Source Repository | DSpace |