Tag Archives: COVID-19

Is Medical Chest X-ray Data Anonymous?

With the rise and ever-increasing potential of deep learning techniques in recent years, publicly available medical data sets became a key factor to enable reproducible development of diagnostic algorithms in the medical domain. Medical data contains sensitive patient-related information and is therefore usually anonymized by removing patient identifiers, e.g., patient names before publication. To the best of our knowledge, we are the first to show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data. We demonstrate this using the publicly available large-scale ChestX-ray14 dataset, a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients. Our verification system is able to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55%. We further highlight that the proposed system is able to reveal the same person even ten and more years after the initial scan. When pursuing a retrieval approach, we observe an mAP@R of 0.9748 and a precision@1 of 0.9963. Based on this high identification rate, a potential attacker may leak patient-related information and additionally cross-reference images to obtain more information. Thus, there is a great risk of sensitive content falling into unauthorized hands or being disseminated against the will of the concerned patients. Especially during the COVID-19 pandemic, numerous chest X-ray datasets have been published to advance research. Therefore, such data may be vulnerable to potential attacks by deep learning-based re-identification algorithms.

https://arxiv.org/abs/2103.08562

With the development of deep learning techniques in recent years, publicly available medical datasets have become one of the key factors behind successful diagnostic algorithms. Medical data contains sensitive personal information, so such identifiers, e.g., patient names, are usually removed. To the best of our knowledge, we are the first to show that a well-trained deep learning model can recover a patient's identity from chest X-ray data. We test this on the widely used ChestX-ray14 dataset, which contains 112,120 frontal-view chest X-ray images collected from 30,805 unique patients. Our system can reliably determine whether two chest X-ray images come from the same person, even when the two images were taken many years apart. Given this high identification rate, a potential attacker could leak this personal information and obtain more by cross-referencing images. Sensitive information is therefore at high risk of being leaked. In particular, many chest X-ray datasets have been released during the COVID-19 pandemic, so the privacy of such data should be actively protected.
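The retrieval results above (mAP@R of 0.9748, precision@1 of 0.9963) come from ranking all other scans by similarity to a query scan and checking whether the top hits belong to the same patient. A minimal sketch of precision@1, assuming each image is summarized by an embedding vector and cosine similarity is the ranking score (both assumptions are illustrative, not the authors' exact pipeline):

```python
import numpy as np

def precision_at_1(embeddings, patient_ids):
    """For each query image, check whether its nearest neighbour
    (by cosine similarity, excluding itself) belongs to the same patient."""
    # L2-normalise so the dot product equals cosine similarity
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ e.T
    np.fill_diagonal(sims, -np.inf)  # exclude the query itself
    nearest = np.argmax(sims, axis=1)
    return float(np.mean(patient_ids[nearest] == patient_ids))

# Toy example: two patients, two images each, embeddings clustered by patient
emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
ids = np.array([0, 0, 1, 1])
print(precision_at_1(emb, ids))  # 1.0 on this toy data
```

A precision@1 near 1.0 on real data, as reported in the abstract, means the single most similar scan almost always comes from the same patient, which is exactly what makes re-identification attacks practical.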

COVID-19 Prognosis via Self-Supervised Representation Learning and Multi-Image Prediction

The rapid spread of COVID-19 cases in recent months has strained hospital resources, making rapid and accurate triage of patients presenting to emergency departments a necessity. Machine learning techniques using clinical data such as chest X-rays have been used to predict which patients are most at risk of deterioration. We consider the task of predicting two types of patient deterioration based on chest X-rays: adverse event deterioration (i.e., transfer to the intensive care unit, intubation, or mortality) and increased oxygen requirements beyond 6 L per day. Due to the relative scarcity of COVID-19 patient data, existing solutions leverage supervised pretraining on related non-COVID images, but this is limited by the differences between the pretraining data and the target COVID-19 patient data. In this paper, we use self-supervised learning based on the momentum contrast (MoCo) method in the pretraining phase to learn more general image representations to use for downstream tasks. We present three results. The first is deterioration prediction from a single image, where our model achieves an area under receiver operating characteristic curve (AUC) of 0.742 for predicting an adverse event within 96 hours (compared to 0.703 with supervised pretraining) and an AUC of 0.765 for predicting oxygen requirements greater than 6 L per day at 24 hours (compared to 0.749 with supervised pretraining). We then propose a new transformer-based architecture that can process sequences of multiple images for prediction and show that this model can achieve an improved AUC of 0.786 for predicting an adverse event at 96 hours and an AUC of 0.848 for predicting mortality at 96 hours. A small pilot clinical study suggested that the prediction accuracy of our model is comparable to that of experienced radiologists analyzing the same information.

https://arxiv.org/abs/2101.04909

The rapid spread of COVID-19 has strained medical resources, making accurate and rapid triage of patients necessary. Machine learning methods applied to clinical data such as chest X-rays have been widely adopted. We use chest X-rays to predict two types of patient deterioration: adverse events (transfer to the intensive care unit, intubation, or death) and an increase in oxygen requirements beyond 6 L per day. Because COVID-19 patient data is still relatively scarce, existing methods typically rely on supervised pretraining on non-COVID-19 cases. In this paper, we use self-supervised learning based on momentum contrast (MoCo) in the pretraining phase to learn more general image representations for downstream tasks. We present three results. The first is deterioration prediction from a single image: our model achieves an AUC of 0.742 for predicting an adverse event within the next 96 hours, and an AUC of 0.765 for predicting oxygen requirements exceeding 6 L per day within 24 hours. We then propose a new Transformer-based model for processing sequences of images, which raises the AUC for predicting mortality at 96 hours to 0.848. Results from a small pilot study suggest that our model can match the diagnostic performance of experienced radiologists given the same information.
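MoCo's pretraining objective is a contrastive (InfoNCE) loss: the embedding of a query view of an image should match the embedding of a positive key (another view of the same image) better than any key in a queue of negatives. A minimal numpy sketch of the loss itself, with the momentum encoder and queue bookkeeping omitted (shapes and the temperature value here are illustrative, not the paper's exact settings):

```python
import numpy as np

def info_nce_loss(q, k_pos, k_neg, temperature=0.07):
    """InfoNCE loss for one query.
    q: (d,) query embedding; k_pos: (d,) positive key;
    k_neg: (n, d) queue of negative keys. All L2-normalised."""
    logits = np.concatenate([[q @ k_pos], k_neg @ q]) / temperature
    logits -= logits.max()  # numerical stability
    # cross-entropy with the positive key at index 0
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]

rng = np.random.default_rng(0)
d, n = 128, 1024
def unit(v):
    return v / np.linalg.norm(v)

q = unit(rng.normal(size=d))
negs = np.stack([unit(rng.normal(size=d)) for _ in range(n)])
loss_aligned = info_nce_loss(q, q, negs)                         # positive matches query
loss_random = info_nce_loss(q, unit(rng.normal(size=d)), negs)   # positive is unrelated
print(loss_aligned < loss_random)  # matching the positive gives a lower loss
```

Minimising this loss pulls two views of the same X-ray together in embedding space while pushing other X-rays away, which is what yields representations general enough for the downstream deterioration-prediction tasks.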

COVID TV-UNet: Segmenting COVID-19 Chest CT Images Using Connectivity Imposed U-Net

The novel coronavirus disease (COVID-19) pandemic has caused a major outbreak in more than 200 countries around the world, leading to a severe impact on the health and life of many people globally. As of mid-July 2020, more than 12 million people were infected, and more than 570,000 deaths were reported. Computed Tomography (CT) images can be used as an alternative to the time-consuming RT-PCR test, to detect COVID-19. In this work we propose a segmentation framework to detect chest regions in CT images which are infected by COVID-19. We use an architecture similar to the U-Net model, and train it to detect ground-glass regions at the pixel level. As the infected regions tend to form a connected component (rather than randomly distributed pixels), we add a suitable regularization term to the loss function to promote connectivity of the segmentation map for COVID-19 pixels. 2D anisotropic total variation is used for this purpose, and therefore the proposed model is called "TV-UNet". Through experimental results on a relatively large-scale CT segmentation dataset of around 900 images, we show that adding this new regularization term leads to a 2% gain in overall segmentation performance compared to the U-Net model. Our experimental analysis, ranging from visual evaluation of the predicted segmentation results to quantitative assessment of segmentation performance (precision, recall, Dice score, and mIoU), demonstrates a strong ability to identify COVID-19-associated regions of the lungs, achieving an mIoU of over 99% and a Dice score of around 86%.

This paper proposes a segmentation framework for detecting chest regions infected by COVID-19. We use an architecture similar to the U-Net model and train it to detect ground-glass regions at the pixel level. Since infected regions tend to form a connected component (rather than randomly distributed pixels), we add a suitable regularization term to the loss function to improve the connectivity of the COVID-19 segmentation map. Experiments on a relatively large-scale CT segmentation dataset of around 900 images show that adding the new regularization term yields a 2% improvement over the U-Net model. Our experimental analysis, covering both visual evaluation of the predicted segmentations and quantitative assessment of segmentation performance, demonstrates a strong ability to identify COVID-19-related regions.

Paper: https://arxiv.org/pdf/2007.12303.pdf
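The connectivity prior is encoded as a 2D anisotropic total-variation penalty added to the segmentation loss: the sum of absolute differences between vertically and horizontally adjacent values of the predicted probability map, which is small for a connected blob and large for the same number of scattered pixels. A minimal numpy sketch of the regularizer (how it is weighted against the main segmentation loss follows the paper; the toy maps below are illustrative):

```python
import numpy as np

def anisotropic_tv(p):
    """2D anisotropic total variation of a probability map p of shape (H, W):
    sum of |vertical differences| plus sum of |horizontal differences|."""
    dv = np.abs(np.diff(p, axis=0)).sum()  # differences between adjacent rows
    dh = np.abs(np.diff(p, axis=1)).sum()  # differences between adjacent columns
    return dv + dh

# The same four foreground pixels: connected as a 2x2 blob vs. scattered
connected = np.zeros((6, 6))
connected[2:4, 2:4] = 1.0
scattered = np.zeros((6, 6))
scattered[[1, 1, 4, 4], [1, 4, 1, 4]] = 1.0

print(anisotropic_tv(connected), anisotropic_tv(scattered))  # → 8.0 16.0
```

For a binary map, the penalty equals the boundary length of the foreground, so minimising it pushes the network toward compact, connected infection masks rather than isolated pixels.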

MiniSeg: An Extremely Minimum Network for Efficient COVID-19 Segmentation

The rapid spread of the new pandemic, coronavirus disease 2019 (COVID-19), has seriously threatened global health. The gold standard for COVID-19 diagnosis is the tried-and-true polymerase chain reaction (PCR) test, but PCR is a laborious, time-consuming, and complicated manual process that is in short supply. Deep learning based computer-aided screening, e.g., infection segmentation, is thus viewed as an alternative due to its great successes in medical imaging. However, the publicly available COVID-19 training data are limited, which can easily cause overfitting of traditional deep learning methods that are usually data-hungry, with millions of parameters. On the other hand, fast training/testing and low computational cost are also important for the quick deployment and development of computer-aided COVID-19 screening systems, but traditional deep learning methods, especially for image segmentation, are usually computationally intensive. To address the above problems, we propose MiniSeg, a lightweight deep learning model for efficient COVID-19 segmentation. Compared with traditional segmentation methods, MiniSeg has several significant strengths: i) it only has 472K parameters and is thus not easy to overfit; ii) it has high computational efficiency and is thus convenient for practical deployment; iii) it can be quickly retrained by other users on their private COVID-19 data to further improve performance. In addition, we build a comprehensive COVID-19 segmentation benchmark for comparing MiniSeg with traditional methods. Code and models will be released to promote research and the practical deployment of computer-aided COVID-19 screening.

Because publicly available COVID-19 datasets are limited, traditional deep learning methods, which require large amounts of data, can easily overfit. On the other hand, fast training/testing and low computational cost are also important for quick deployment and development, but traditional deep learning methods, especially for image segmentation, are usually computationally intensive.

  • To address these problems, we propose MiniSeg, a lightweight deep learning model for efficient COVID-19 segmentation. Compared with traditional segmentation methods, MiniSeg has several significant advantages:
    • i) it has only 472K parameters and is therefore not prone to overfitting;
    • ii) it is computationally efficient and therefore convenient for practical deployment;
    • iii) other users can quickly retrain it on their own private COVID-19 data to further improve performance. In addition, we build a comprehensive COVID-19 segmentation benchmark for comparing MiniSeg with traditional methods.

Paper: https://arxiv.org/pdf/2004.09750.pdf
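The excerpt does not describe MiniSeg's blocks, but a standard route to a 472K-parameter segmentation network is replacing full convolutions with factorised ones. As an illustrative parameter-count comparison (not MiniSeg's actual architecture), a standard 3x3 convolution versus a depthwise-separable one:

```python
def conv_params(c_in, c_out, k=3):
    """Parameter count of a standard k x k convolution, ignoring bias."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k=3):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1x1 pointwise convolution, ignoring bias."""
    return c_in * k * k + c_in * c_out

c_in, c_out = 128, 128
print(conv_params(c_in, c_out), separable_conv_params(c_in, c_out))
# 147456 vs 17536: roughly an 8.4x reduction at these channel widths
```

Stacking such factorised blocks is how lightweight models keep total parameters in the hundreds of thousands rather than millions, which both curbs overfitting on small COVID-19 datasets and keeps inference cheap.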