Is Medical Chest X-ray Data Anonymous?

With the rise and ever-increasing potential of deep learning techniques in recent years, publicly available medical data sets became a key factor to enable reproducible development of diagnostic algorithms in the medical domain. Medical data contains sensitive patient-related information and is therefore usually anonymized by removing patient identifiers, e.g., patient names before publication. To the best of our knowledge, we are the first to show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data. We demonstrate this using the publicly available large-scale ChestX-ray14 dataset, a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients. Our verification system is able to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55%. We further highlight that the proposed system is able to reveal the same person even ten and more years after the initial scan. When pursuing a retrieval approach, we observe an mAP@R of 0.9748 and a precision@1 of 0.9963. Based on this high identification rate, a potential attacker may leak patient-related information and additionally cross-reference images to obtain more information. Thus, there is a great risk of sensitive content falling into unauthorized hands or being disseminated against the will of the concerned patients. Especially during the COVID-19 pandemic, numerous chest X-ray datasets have been published to advance research. Therefore, such data may be vulnerable to potential attacks by deep learning-based re-identification algorithms.


随着近年来深度学习技术的发展,公开的医疗数据集称为诊断算法能够成功的关键因素之一。医疗数据包含敏感的个人信息,因此这些信息常常会被移除,例如病人的姓名。据我们所知,我们是第一个展示一个预训练的深度学习模型可以从X光数据中恢复病人的个人信息的研究小组。我们使用公认的Chest-ray14 数据集进行测试,这个数据集拥有112120前侧X光数据,由30805独立病人采集。我们的系统用可以有效识别两张X光图像是否来自同一个人,甚至两张图像的生成时间相差多年。基于这样的高识别率,一个潜在的攻击者可以泄露这些个人信息,并通过交叉对比获得更多的信息。因此,敏感信息的泄漏横在面临很高的风险。特别是对于COVID-19疫情,多个胸部X光数据集已经被公开。所以,这些数据的隐私应该被考虑进行有效保护。

COVID-19 Prognosis via Self-Supervised Representation Learning and Multi-Image Prediction

The rapid spread of COVID-19 cases in recent months has strained hospital resources, making rapid and accurate triage of patients presenting to emergency departments a necessity. Machine learning techniques using clinical data such as chest X-rays have been used to predict which patients are most at risk of deterioration. We consider the task of predicting two types of patient deterioration based on chest X-rays: adverse event deterioration (i.e., transfer to the intensive care unit, intubation, or mortality) and increased oxygen requirements beyond 6 L per day. Due to the relative scarcity of COVID-19 patient data, existing solutions leverage supervised pretraining on related non-COVID images, but this is limited by the differences between the pretraining data and the target COVID-19 patient data. In this paper, we use self-supervised learning based on the momentum contrast (MoCo) method in the pretraining phase to learn more general image representations to use for downstream tasks. We present three results. The first is deterioration prediction from a single image, where our model achieves an area under receiver operating characteristic curve (AUC) of 0.742 for predicting an adverse event within 96 hours (compared to 0.703 with supervised pretraining) and an AUC of 0.765 for predicting oxygen requirements greater than 6 L a day at 24 hours (compared to 0.749 with supervised pretraining). We then propose a new transformer-based architecture that can process sequences of multiple images for prediction and show that this model can achieve an improved AUC of 0.786 for predicting an adverse event at 96 hours and an AUC of 0.848 for predicting mortalities at 96 hours. A small pilot clinical study suggested that the prediction accuracy of our model is comparable to that of experienced radiologists analyzing the same information.



COVID TV-UNet: Segmenting COVID-19 Chest CT Images Using Connectivity Imposed U-Net

The novel corona-virus disease (COVID-19) pandemic has caused a major outbreak in more than 200 countries around the world, leading to a severe impact on the health
and life of many people globally. As of mid-July 2020, more than 12 million people were infected, and more than 570,000 death were reported. Computed Tomography (CT) images can be used as an alternative to the time-consuming RT-PCR test, to detect COVID-19. In this work we propose a segmentation framework to detect chest regions in CT images, which are infected by COVID-19. We use an architecture similar to U-Net
model, and train it to detect ground glass regions, on pixel level.
As the infected regions tend to form a connected component (rather than randomly distributed pixels), we add a suitable regularization term to the loss function, to promote connectivity of the segmentation map for COVID-19 pixels. 2D-anisotropic to talvariation is used for this purpose, and therefore the proposed model is called “TV-UNet”. Through experimental results on a relatively large-scale CT segmentation dataset of around 900 images, we show that adding this new regularization term leads
to 2% gain on overall segmentation performance compared to the U-Net model. Our experimental analysis, ranging from visual evaluation of the predicted segmentation results to quantitative assessment of segmentation performance (precision, recall, Dice
score, and mIoU) demonstrated great ability to identify COVID19 associated regions of the lungs, achieving a mIoU rate of over 99%, and a Dice score of around 86%.



MiniSeg: An Extremely Minimum Network for Efficient COVID-19 Segmentation

The rapid spread of the new pandemic, coronavirus disease 2019 (COVID-19), has seriously threatened global health. The gold standard for COVID-19 diagnosis is the tried-and true polymerase chain reaction (PCR), but PCR is a laborious, time-consuming and complicated manual process that is in short supply. Deep learning based computer-aided screening, e.g., infection segmentation, is thus viewed as an alternative due to its great successes in medical imaging. However, the publicly available COVID-19 training data are limited, which would easily cause overfitting of traditional deep learning methods that are usually data-hungry with millions of parameters. On the other hand, fast training/testing and low computational cost are also important for quick deployment and development of computer-aided COVID-19 screening systems, but traditional deep learning methods, especially for image segmentation, are usually computationally intensive. To address the above problems, we propose MiniSeg, a lightweight deep learning model for efficient COVID-19 segmentation. Compared with traditional segmentation methods, MiniSeg has several significant strengths:
i) it only has 472K parameters and is thus not easy to overfit;
ii) it has high computational efficiency and is thus convenient for practical deployment; iii) it can be fast retrained by other users using their private COVID-19 data for further improving performance. In addition, we build a comprehensive COVID-19 segmentation benchmark for comparing MiniSeg with traditional methods. Code and models will be released to promote the research and practical deployment for computer-aided COVID19 screening.


  • 为了解决以上问题,我们提出了MiniSeg,一种轻量级深度学习模型用于有效的COVID-19分割。性比于传统的分割方法,MiniSeg有几个显著的优势:
    • i)它仅仅有472K个参数并且不容易过拟合
    • ii)它高度计算有效并且因此非常方便与实际的部署
    • iii)其他用户可以用他们自己的私人COVID-19数据快速的重新训练这个方法,以进一步提升性能。此外,我们构建了一个可理解的COVID-19分割基准,用于比较MiniSeg和其他传统的方法。