1. Introduction
Prostate cancer (PCa) is the second most prevalent cancer among males [1]. The number of diagnoses is estimated to increase to ~1.7 million worldwide by 2030 [2]. Accurate prostate lesion assessment, particularly for distinguishing clinically significant PCa (csPCa; Gleason Score (GS) ≥7) [3] from indolent non-csPCa, can greatly facilitate tailored treatments [4]. The broad range of PCa’s pathological behavior makes the assessment challenging [5]. Current clinical assessment relies on a prostate-specific antigen (PSA) blood test; if the result is positive, a transrectal ultrasound (TRUS) biopsy follows. However, PSA in conjunction with blind TRUS biopsy has a high false-negative rate (~20%), inducing unnecessary biopsies [6]. It is also highly prone to under-detection of csPCa or over-detection of non-csPCa [7].
Multiparametric magnetic resonance imaging (mpMRI) has become a gold standard for PCa diagnosis, even prior to any biopsy [8], should one be necessary. It typically involves T2-weighted (T2) imaging, a high b-value diffusion-weighted imaging (hDWI) sequence, and its derivative apparent diffusion coefficient (ADC) maps [4, 9]. Although magnetic resonance imaging (MRI) acquisition and interpretation have been standardized under the guidance of the Prostate Imaging Reporting and Data System version 2.1 (PI-RADS v2.1) [10], image interpretation is still time-consuming for readers [5], and significant inter-reader variations inevitably persist [11]. To this end, numerous learning-based methods have been proposed to facilitate efficient, accurate, and reliable prostate lesion assessment. In 2017, the international PROSTATEx Challenge [12] was organized, in which twenty-one teams proposed models with areas under the receiver operating characteristic (ROC) curve (AUC) ranging from 0.80 to 0.87 [13]. Unlike traditional methods relying on handcrafted input features [14], all of them employed convolutional neural networks (CNNs) [15] to extract complex semantic features automatically, demonstrating significant advantages over traditional methods in combined prostate lesion detection and classification (PLDC).
To enhance network training, prostate magnetic resonance (MR) images have to be pre-cropped manually to retain the prostate region, which originally occupies just a small portion of the entire image. A few recent studies, e.g. [2, 16], proposed automated PLDC frameworks to save the labor of repeated prostate segmentation. CNNs were also utilized to segment the target region, identifying the prostate profile. Despite notable progress, these studies still assumed that the training and testing datasets, i.e., the source and target domains, share the same data distribution. This assumption is too idealized [17]: in normal practice, prostate MR images from a single cohort are inevitably scarce, or are typically not publicly available [14]. Most likely, images have to be collected and aggregated from multiple cohorts to maintain a sample size sufficient for robust model training. Inevitably, these multi-site images exhibit apparent discrepancies in scanning protocols, in-plane resolutions, fields of view (FoV), etc. [17-18]. These inherent inter-site discrepancies cause “domain shift” when models trained in the source domain are applied in the target domain. Domain shift significantly degrades overall model performance, biasing the PLDC results.
Several paradigms have been proposed to resolve domain shift. An intuitive solution is to directly mix the heterogeneous images from multiple cohorts so that the training data are adequate. However, the model’s prediction capability is not necessarily improved; on the contrary, it can be limited by overfitting when distribution heterogeneity is significant [18-19]. Another common practice is to pre-train the model in the source domain, then fine-tune it in the target domain. This generally requires sufficient labeled data from the target domain to tune the large number of network parameters, which remains labor-intensive. Domain adaptation (DA) has emerged as a more promising method, allowing effective knowledge transfer [17, 20] from the label-rich source domain to the target domain. Recently, unsupervised DA (UDA) methods have drawn increasing attention because they require no target labels for training [21]. They can generally be categorized into image translation and feature alignment approaches. In the former, models align image appearance [17, 22] by translating images from one domain to another using generative models, such as generative adversarial networks (GANs) [23]. Difficulties mainly arise in whole-slide image translation, and in image synthesis when the images are insufficiently similar. Besides, these models usually focus on low-level feature extraction, suffering from inconspicuous lesion texture and characteristics [24]. In contrast, the latter, feature alignment-based models, could be more effective in resolving domain shift by extracting domain-invariant features, either by minimizing the correlation distance between domains [25] or by matching feature distributions through adversarial learning [26]. Yet very few of them are dedicated to prostate lesion detection and/or classification, particularly using mpMRI.
Therefore, an effective UDA model for fully automated mpMRI-based PLDC is highly desirable prior to invasive biopsy, should one be necessary.
In this work, we develop the Coarse Mask-guided Deep Domain Adaptation Network (CMD²A-Net) for both coarse prostate lesion detection and lesion malignancy classification. We also extend the proposed network into an open-source system. This executable end-to-end system takes mpMRI sequences as input and outputs coarse lesion contours as well as lesion malignancy. The system can also be downloaded online. Our contributions can be summarized as follows:
  1. Development of a deep-learning-based system for fully automated prostate lesion assessment. Our end-to-end system is dedicated to PLDC on multi-cohort mpMRI without the need for prior manual processing of mpMRI sequences.
  2. Design of a UDA model (i.e., CMD²A-Net) capable of leveraging cross-site representation transfer to realize accurate PLDC without requiring target labels. Weakly-supervised coarse lesion segmentation modules are incorporated to extract informative lesion features, thus facilitating feature alignment between domains.
  3. Experimental evaluation of CMD²A-Net on one public dataset (i.e., PROSTATEx [12]) and three local cohort datasets, including lesion assessments with various mpMRI sequence inputs, comparisons with state-of-the-art models, and an ablation study. The capability of transferring knowledge from PROSTATEx to our small-scale local cohort datasets is demonstrated against the state-of-the-art models.
Related Work
CNNs have been proven effective and are widely applied to mpMRI-based PCa classification with promising performance. Wang and Wang [13a] explored optimal mpMRI sequence combinations as the CNN’s input, and their model achieved an AUC of 0.95, which was reported to outperform all models in the PROSTATEx Challenge. Going beyond PCa classification alone, Kiraly et al. [27] developed a model with an encoder-decoder architecture to detect prostate lesions and simultaneously classify lesion malignancy. However, these studies required manually cropped prostate regions, which are time-consuming and expensive to obtain [22a, 28].
End-to-end PLDC frameworks have also been investigated, with the aim of avoiding manual prostate segmentation. Yang et al. [2] incorporated a CNN for automatic segmentation ahead of PLDC. However, the insufficient prostate image features extracted by the shallow (i.e., five-layer) network could considerably degrade the overall segmentation accuracy. Later, Wang et al. [29] proposed a deeper prostate segmentation model capable of detecting more complex features. Apart from improving segmentation performance, fusing spatial features using 3D CNNs is another means to enhance PCa classification accuracy. Mehta et al. [30] employed a patient-level 3D model for binary classification using volumetric mpMRI, achieving AUCs of 0.79 and 0.86 on their local cohort dataset and PROSTATEx, respectively. However, only single-cohort datasets were used to evaluate the model. Domain shift would occur when it is directly applied to an unseen cohort [17-18]. In the very few studies (e.g., Mehta et al. [30]) provided with mpMRI sequences from multiple cohorts, the heterogeneous images were simply combined directly, yielding samples sufficient for model training but inevitably ignoring data source heterogeneity. Such models would be prone to severe domain shift, with predictions biased by particular cohorts.
Very recently, many research attempts have investigated DA approaches to alleviate inter-site distributional variability, among which UDA methods have demonstrated their advantages in exploiting unlabeled target samples [20]. Such UDA methods can be categorized into two groups: (1) image translation and (2) feature alignment approaches. The former performs image appearance alignment [17, 22]; the resultant models translate images across domains using GAN-based networks [23]. However, texture similarity between the synthesized target image and the source image is crucial for the PLDC problem. The DA process would fail with insufficient texture similarity, particularly in the generated lesion area [22c]. Besides, lesions could be missed during translation because transferability varies among image regions, thus worsening the DA process [31]. Moreover, the GAN models could distort the appearance of non-lesion regions, further causing unreliable lesion assessment results [24].
With feature alignment approaches, domain-invariant features are extracted to reduce domain shift [26]. A common way is to minimize a distribution discrepancy (e.g., the distance between second-order correlations [25]) between domains using a Siamese network architecture. Adversarial learning [26a] can also align features by making the cross-domain features indistinguishable to a domain classifier. For instance, Wang et al. [14] developed a GAN-based method to learn domain-invariant features on mammographic images acquired for breast cancer screening. However, these models were usually trained with entire images, treating all voxels equally [26b, 28]. Previous works [24, 26b] revealed that not all image regions facilitate knowledge transfer across domains. Coarsely aligning features over whole images would introduce irrelevant knowledge, resulting in ineffective DA. We hypothesize that background regions on mpMRI sequences, such as regions outside the prostate gland, would not contribute well to DA in our PLDC problem. To our knowledge, only a few works have reported PCa classification using multi-site ultrasound images [32], histopathology images [33], or T2 image slices only [13b].
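To make the second-order correlation distance mentioned above concrete, the following is a minimal sketch of a CORAL-style alignment loss: the squared Frobenius distance between the source and target feature covariance matrices, scaled by 1/(4d²) for d-dimensional features. It is an illustration only, not the loss used in CMD²A-Net. Whether [25] adopts exactly this normalization is an assumption, and the function names `covariance` and `coral_loss` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(features: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased covariance matrix of an (n_samples x d) feature batch."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two samples to estimate covariance")
    d = len(features[0])
    # Per-dimension mean over the batch.
    means = [sum(row[j] for row in features) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in features:
        centered = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov


def coral_loss(source: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]]) -> float:
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2).

    Assumed CORAL-style formulation; illustrative, not the paper's own loss.
    """
    d = len(source[0])
    if len(target[0]) != d:
        raise ValueError("source and target features must share dimension d")
    cov_s, cov_t = covariance(source), covariance(target)
    squared_distance = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return squared_distance / (4 * d * d)


if __name__ == "__main__":
    # Toy features: the two dimensions are positively correlated in the
    # source batch and negatively correlated in the target batch.
    source_features = [[0.0, 0.0], [2.0, 2.0]]
    target_features = [[0.0, 0.0], [2.0, -2.0]]
    print(coral_loss(source_features, source_features))  # identical batches
    print(coral_loss(source_features, target_features))  # mismatched batches
```

In a Siamese setup, such a term would typically be added to the supervised source-domain loss so that the shared feature extractor is penalized whenever source and target feature statistics diverge. The loss is zero when the two covariance matrices coincide and grows as they drift apart.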
2. Results and Discussion
2.1. Datasets
Table 1. Characteristics of the five MRI datasets for prostate segmentation and PLDC.