Generative adversarial networks (GANs) are able to generate high-resolution, photo-realistic images of objects that “do not exist.” These synthetic images are rather difficult to detect as fake. However, the manner in which these generative models are trained hints at a potential for information leakage from the supplied training data, especially in the context of synthetic faces. This paper presents experiments suggesting that identity information in face images can flow from the training corpus into synthetic samples without any adversarial actions when building or using the existing model. This raises privacy-related questions, but also stimulates discussions of (a) the face manifold’s characteristics in the feature space and (b) how to create generative models that do not inadvertently reveal identity information of real subjects whose images were used for training. Using five different face matchers (face_recognition, FaceNet, ArcFace, SphereFace and Neurotechnology MegaMatcher) and the StyleGAN2 synthesis model, we show that this identity leakage exists for some, but not all, of these methods. So, can we say that these synthetically generated faces truly do not exist? Databases of real and synthetically generated faces are made available with this paper to allow full replicability of the results discussed in this work.
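The leakage test described above amounts to nearest-neighbor identity matching between training faces and synthetic samples in a face-embedding space. The sketch below is a minimal illustration of that idea using cosine similarity; the function name, the 128-dimensional embedding layout, and the similarity threshold are assumptions for illustration, not the paper's actual protocol or any matcher's real decision rule:

```python
import numpy as np

def identity_match_rate(train_emb, synth_emb, threshold=0.6):
    """Fraction of synthetic faces whose nearest training embedding
    (by cosine similarity) exceeds the match threshold.

    train_emb, synth_emb: 2-D arrays, one embedding vector per row.
    """
    # Normalize rows to unit length so a dot product equals cosine similarity.
    t = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    s = synth_emb / np.linalg.norm(synth_emb, axis=1, keepdims=True)
    sims = s @ t.T                 # (n_synth, n_train) cosine-similarity matrix
    best = sims.max(axis=1)        # best-matching training face per synthetic face
    return float((best >= threshold).mean())

# Toy usage: one synthetic embedding is a copy of a training embedding,
# simulating an identity that leaked into the generated set.
rng = np.random.default_rng(0)
train = rng.normal(size=(10, 128))
synth = np.vstack([train[0], rng.normal(size=(3, 128))])
rate = identity_match_rate(train, synth, threshold=0.99)
```

In practice each matcher supplies its own embedding function and operating threshold (chosen for a target false-match rate), so the leakage verdict can differ across matchers, which is consistent with the paper's finding that leakage appears for some methods but not others.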