VoterFraud2020: a Multi-modal Dataset of Election Fraud Claims on Twitter

The wide spread of unfounded election fraud claims surrounding the 2020 U.S. election has undermined trust in the election, culminating in violence inside the U.S. Capitol. Under these circumstances, it is critical to understand discussions surrounding these claims on Twitter, a major platform on which the claims spread. To this end, we collected and release the VoterFraud2020 dataset, a multi-modal dataset with 7.6M tweets and 25.6M retweets from 2.6M users related to voter fraud claims. To make this data immediately useful for a wide range of researchers, we further enhance the data with cluster labels computed from the retweet graph, user suspension status, and perceptual hashes of tweeted images. We also include in the dataset aggregated information for all external links and YouTube videos that appear in the tweets. Preliminary analyses of the data show that Twitter’s ban actions mostly affected a specific community of voter fraud claim promoters, and expose the most common URLs, images and YouTube videos shared in the data.

https://arxiv.org/pdf/2101.08210.pdf

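During the 2020 U.S. election, widely circulated and unfounded election fraud claims undermined trust in the election and ultimately led to the violence at the Capitol. Studying how these claims were discussed on Twitter has therefore become important. We collected and curated the VoterFraud2020 dataset, a multi-modal dataset of 7.6M tweets and 25.6M retweets from 2.6M users, all related to voter fraud claims. To make the data easier for researchers to work with, we enriched it with cluster labels computed from the retweet graph, user suspension status, and perceptual hashes of tweeted images. We also aggregated the external links and YouTube videos that appear in the tweets. Preliminary analysis of the data indicates that Twitter's ban actions mostly affected a specific community of voter fraud claim promoters.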
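The abstract mentions perceptual hashes of tweeted images but does not say here which hashing algorithm was used. As a rough illustration of the general idea only, the sketch below implements an average hash, one of the simplest perceptual hashes, in plain Python on toy grayscale grids. The function names and the sample "images" are hypothetical and are not taken from the dataset or the paper.

```python
def average_hash(pixels):
    """Return an average-hash bit string for a 2D grid of grayscale values.

    Each bit is 1 if the pixel is brighter than the grid's mean, else 0.
    This is an illustrative perceptual hash, not necessarily the one
    used to build VoterFraud2020.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Toy 4x4 "images" (hypothetical data): a dark-left / bright-right pattern,
# a uniformly brightened copy of it, and a bright-top / dark-bottom pattern.
original = [[10, 10, 200, 200] for _ in range(4)]
brighter = [[min(255, value + 20) for value in row] for row in original]
different = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]

hash_original = average_hash(original)
hash_brighter = average_hash(brighter)
hash_different = average_hash(different)

# A small Hamming distance suggests the images are near-duplicates.
print(hamming_distance(hash_original, hash_brighter))   # 0
print(hamming_distance(hash_original, hash_different))  # 8
```

Because the hash depends only on which pixels are above the mean, the uniformly brightened copy hashes identically to the original, while the structurally different image lands far away. Comparing hashes in this way is what lets near-duplicate images be grouped without comparing pixels directly.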
