UBC Undergraduate Research

Multi-Resolution Vision Transformer for Subtype Classification in Ovarian Cancer Whole-Slide Histopathology Images

Xu, Tony; Farahani, Hossein; Bashashati, Ali

Abstract

Ovarian cancer accounts for a significant number of cancer deaths in women. Diagnosing ovarian cancer subtypes has also been shown to be difficult, with poor agreement between pathologists. Advances in deep learning algorithms have made it possible to automatically analyse whole-slide histopathology images (WSIs) by dividing the image into smaller patches. However, efficiently selecting representative patches in a WSI and aggregating the resulting patch-level information into a WSI-level classification are nontrivial tasks. We propose the Multi-Resolution Vision Transformer (MR-ViT) framework to extract and summarize multi-resolution patch features into a WSI-level subtype prediction. This framework uses a convolutional neural network (CNN) with an added attention layer to predict tumor presence and guide the downstream selection of higher-resolution patches. The result of this selection process is a set of patches at several whole-slide magnifications, along with their corresponding location, magnification, and tumor classification. We use a Transformer encoder, together with a smaller CNN and several embedding networks, to embed and summarize the outputs of this CNN-based patch selection. From the proposed MR-ViT, we obtain both long-range structural features and fine-grained detail to generate the final WSI-level subtype classification. We obtained a final test accuracy of 68.26 and an F1 score of 62.41, and have yet to converge on the overall optimal parameters. Nonetheless, we gained numerous insights into the performance and potential of the MR-ViT framework for future development.
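To make the aggregation step concrete, the following Python (PyTorch) sketch illustrates the general idea described in the abstract: selected patch features are combined with learned location, magnification, and tumor-classification embeddings and summarized by a Transformer encoder into a slide-level subtype prediction. This is not the authors' implementation; all layer names, dimensions, and the use of a class token are illustrative assumptions.

import torch
import torch.nn as nn

class MRViTSketch(nn.Module):
    """Illustrative sketch of a multi-resolution patch aggregator (assumed design)."""
    def __init__(self, feat_dim=512, embed_dim=256, n_mags=3, n_subtypes=5,
                 n_heads=8, n_layers=4, grid_size=64):
        super().__init__()
        # Project CNN-derived patch features into the token space.
        self.patch_proj = nn.Linear(feat_dim, embed_dim)
        # Embeddings for each selected patch's metadata from the selection stage.
        self.loc_embed = nn.Embedding(grid_size * grid_size, embed_dim)  # coarse (x, y) grid index
        self.mag_embed = nn.Embedding(n_mags, embed_dim)                 # magnification level
        self.tumor_embed = nn.Embedding(2, embed_dim)                    # tumor / non-tumor flag
        # Learnable class token that summarizes the whole slide.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(embed_dim, n_subtypes)

    def forward(self, patch_feats, loc_idx, mag_idx, tumor_idx):
        # patch_feats: (B, N, feat_dim); loc_idx/mag_idx/tumor_idx: (B, N) integer indices
        tokens = (self.patch_proj(patch_feats)
                  + self.loc_embed(loc_idx)
                  + self.mag_embed(mag_idx)
                  + self.tumor_embed(tumor_idx))
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        encoded = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(encoded[:, 0])  # WSI-level subtype logits

# Example: 2 slides, 100 selected patches per slide (synthetic inputs)
model = MRViTSketch()
logits = model(torch.randn(2, 100, 512),
               torch.randint(0, 64 * 64, (2, 100)),
               torch.randint(0, 3, (2, 100)),
               torch.randint(0, 2, (2, 100)))

In this sketch the metadata embeddings are simply summed with the patch tokens, which is one common way to let the encoder relate patches across locations and magnifications; the actual MR-ViT may combine them differently.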

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International