GloVe: Global Vectors for Word Representation


Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.

(One of the classic papers in NLP.) Recent natural language processing methods have succeeded in learning word embeddings that capture semantic and syntactic structure well, but the origin of these regularities has remained unclear. We analyze which model properties are needed for such regularities to emerge in word vectors. Our result is a global log-bilinear regression model that combines the advantages of global matrix factorization and local context window methods: it learns statistical information by training only on the nonzero elements of a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows. The model produces word vectors with meaningful substructure, achieves 75% on a recent word analogy task, and outperforms competing methods.
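The training objective described above can be sketched as a weighted least-squares loss over only the nonzero co-occurrence counts. The snippet below is a minimal illustration, not the paper's implementation; the `x_max` and `alpha` defaults follow the values reported in the GloVe paper, and the function names are ours.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    # f(x) caps the influence of very frequent co-occurrences
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(cooc, w, w_tilde, b, b_tilde):
    """Weighted least-squares GloVe objective, summed over the
    NONZERO entries of a sparse co-occurrence dict {(i, j): count}."""
    total = 0.0
    for (i, j), x_ij in cooc.items():
        dot = sum(a * c for a, c in zip(w[i], w_tilde[j]))
        err = dot + b[i] + b_tilde[j] - math.log(x_ij)
        total += glove_weight(x_ij) * err * err
    return total
```

Because the sum runs only over the nonzero entries, training cost scales with the number of observed co-occurrences rather than with the full |V|×|V| matrix.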

Efficient estimation of word representations in vector space


We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.


(This is Word2Vec, a foundational paper in NLP.) We propose two methods for computing continuous vector representations of words from large-scale datasets. We measure the quality of these word vectors with a word similarity task, and our results include comparisons with the previously best-performing methods. The proposed methods achieve large gains in accuracy at much lower computational cost: for example, learning high-quality word vectors from 1.6 billion words takes less than a day. Moreover, these vectors achieve state-of-the-art performance on the word similarity task.
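One of the two architectures this paper proposes is the skip-gram model, in which each word predicts the words within a small window around it. A minimal sketch of how such training pairs are generated (function name and window default are illustrative, not from the paper):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) pairs as in the skip-gram model:
    each word predicts the words within `window` positions of it."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

# e.g. skipgram_pairs(["a", "b", "c"], window=1)
# → [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
```

These pairs are then fed to a shallow network whose learned input weights become the word vectors; the low cost relative to deeper neural language models is what makes the billion-word scale feasible.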

A Survey of the State of Explainable AI for Natural Language Processing

Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area.



An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale


While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches. When pre-trained on large amounts of data and transferred to multiple recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc), Vision Transformer attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.


Although the Transformer has become the standard for NLP tasks, its applications in computer vision remain limited. In vision, attention is either used together with convolutional networks or substituted for specific components of them while keeping the overall network structure unchanged. In this paper we show an architecture that uses no CNNs, only a pure Transformer, and performs image classification directly on a sequence of image patches.
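The patch sequence in the title can be produced by a simple reshape: cutting an image into non-overlapping 16×16 patches and flattening each one yields the token sequence a ViT consumes. A minimal numpy sketch (function name and shapes are illustrative):

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping
    patches: the input token sequence for a Vision Transformer."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    rows, cols = h // patch, w // patch
    out = img.reshape(rows, patch, cols, patch, c)
    out = out.transpose(0, 2, 1, 3, 4)  # group the two patch axes last
    return out.reshape(rows * cols, patch * patch * c)
```

For a 224×224×3 image this gives 196 tokens of dimension 768, which are then linearly projected and processed exactly like word embeddings in NLP.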

Generative adversarial text to image synthesis


Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal. However, in recent years generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, deep convolutional generative adversarial networks (GANs) have begun to generate highly compelling images of specific categories, such as faces, album covers, and room interiors. In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. We demonstrate the capability of our model to generate plausible images of birds and flowers from detailed text descriptions.



Brain2Char: A Deep Architecture for Decoding Text from Brain Recordings

Decoding language representations directly from the brain can enable new Brain-Computer Interfaces (BCI) for high bandwidth human-human and human-machine communication. Clinically, such technologies can restore communication in people with neurological conditions affecting their ability to speak. In this study, we propose a novel deep network architecture, Brain2Char, for directly decoding text (specifically character sequences) from direct brain recordings (called Electrocorticography, ECoG). The Brain2Char framework combines state-of-the-art deep learning modules: 3D Inception layers for multiband spatiotemporal feature extraction from neural data, bidirectional recurrent layers, and dilated convolution layers followed by a language-model-weighted beam search to decode character sequences, optimizing a connectionist temporal classification (CTC) loss. Additionally, given the highly non-linear transformations that underlie the conversion of cortical function to character sequences, we perform regularizations on the network's latent representations motivated by insights into cortical encoding of speech production and artifactual aspects specific to ECoG data acquisition. To do this, we impose auxiliary losses on latent representations for articulatory movements, speech acoustics and session-specific non-linearities. In 3 participants tested here, Brain2Char achieves 10.6%, 8.5% and 7.0% Word Error Rates (WER) respectively on vocabulary sizes ranging from 1200 to 1900 words. Brain2Char also performs well when 2 participants silently mimed sentences. These results set a new state-of-the-art on decoding text from brain and demonstrate the potential of Brain2Char as a high-performance communication BCI.


Decoding language representations directly from the brain could provide high-bandwidth human-to-human and human-to-machine brain-computer interfaces. Clinically, such technology could restore the ability to communicate for people with neurological impairments. In this paper, we propose a deep network, Brain2Char, that decodes character sequences directly from brain recordings (electrocorticography, ECoG). The Brain2Char model comprises 3D Inception layers (multiband spatiotemporal feature extraction), dilated convolution layers (sequence decoding), and a connectionist temporal classification (CTC) loss.
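The CTC loss mentioned above scores character sequences over all frame-level alignments; at decode time a predicted path is collapsed by merging repeated symbols and dropping blanks. A minimal sketch of that collapse step (illustrative only; the paper uses a language-model-weighted beam search, not this greedy rule):

```python
def ctc_collapse(path, blank="-"):
    """Collapse a frame-level CTC path: merge repeated symbols,
    then drop blanks, e.g. 'hh-e-ll-lo' -> 'hello'."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)
```

The blank symbol is what lets CTC represent genuinely repeated characters: 'll' must be separated by a blank in the path, as in 'l-l', or it collapses to a single 'l'.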

CogniVal: A framework for cognitive word embedding evaluation


An interesting method of evaluating word representations is by how much they reflect the semantic representations in the human brain. However, most, if not all, previous works only focus on small datasets and a single modality. In this paper, we present the first multimodal framework for evaluating English word representations based on cognitive lexical semantics. Six types of word embeddings are evaluated by fitting them to 15 datasets of eye-tracking, EEG and fMRI signals recorded during language processing. To achieve a global score over all evaluation hypotheses, we apply statistical significance testing that accounts for the multiple comparisons problem. The framework is easily extensible and can incorporate other intrinsic and extrinsic evaluation methods. We find strong correlations in the results between cognitive datasets, across recording modalities, and with performance on extrinsic NLP tasks.
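Aggregating a global score over many evaluation hypotheses requires correcting for multiple comparisons, as the abstract notes. A minimal sketch using the Bonferroni correction (the paper's exact procedure may differ; this is the simplest such correction):

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i iff p_i < alpha / m, the Bonferroni
    correction for m simultaneous comparisons."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]
```

With three hypotheses the per-test threshold drops from 0.05 to about 0.0167, so a p-value of 0.03 that would pass uncorrected is rejected after correction.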



Inducing brain-relevant bias in natural language processing models

Progress in natural language processing (NLP) models that estimate representations of word sequences has recently been leveraged to improve the understanding of language processing in the brain. However, these models have not been specifically designed to capture the way the brain represents language meaning. We hypothesize that fine-tuning these models to predict recordings of brain activity of people reading text will lead to representations that encode more brain-activity-relevant language information. We demonstrate that a version of BERT, a recently introduced and powerful language model, can improve the prediction of brain activity after fine-tuning. We show that the relationship between language and brain activity learned by BERT during this fine-tuning transfers across multiple participants. We also show that, for some participants, the fine-tuned representations learned from both magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) are better for predicting fMRI than the representations learned from fMRI alone, indicating that the learned representations capture brain-activity-relevant information that is not simply an artifact of the modality. While changes to language representations help the model predict brain activity, they also do not harm the model’s ability to perform downstream NLP tasks. Our findings are notable for research on language understanding in the brain.
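The evaluation above rests on predicting brain recordings from model representations; the standard encoding-model baseline for this is a regularized linear map from features to voxel activity. A minimal numpy sketch of closed-form ridge regression (function name, shapes, and the regularization value are illustrative, not the paper's exact setup):

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression W = (X^T X + lam*I)^-1 X^T Y,
    mapping language-model features X (n_samples, n_feats) to
    brain recordings Y (n_samples, n_voxels)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
```

Prediction quality is then typically measured as the per-voxel correlation between `X_test @ W` and held-out recordings; the paper's contribution is that fine-tuning the features themselves, not just fitting this map, improves that prediction.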



Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains

In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models. ReStA is a variant of the popular representational similarity analysis (RSA) in cognitive neuroscience. While RSA can be used to compare representations in models, model components, and human brains, ReStA compares instances of the same model while systematically varying a single model parameter. Using ReStA, we study four recent and successful neural language models, and evaluate how sensitive their internal representations are to the amount of prior context. Using RSA, we perform a systematic study of how similar the representational spaces in the first and second (or higher) layers of these models are to each other and to patterns of activation in the human brain. Our results reveal surprisingly strong differences between language models, and give insights into where the deep linguistic processing that integrates information over multiple sentences is happening in these models. The combination of ReStA and RSA on models and brains allows us to start addressing the important question of what kind of linguistic processes we can hope to observe in fMRI brain imaging data. In particular, our results suggest that the data on story reading from Wehbe et al. (2014) contains a signal of shallow linguistic processing, but show no evidence of the more interesting deep linguistic processing.


In this paper, we propose representational stability analysis (ReStA), an intuitive method for analyzing natural language processing models. ReStA is a variant of the widely used representational similarity analysis (RSA) from cognitive neuroscience. Whereas RSA compares representations across models, model components, and human brains, ReStA compares instances of the same model while systematically varying a single model parameter. Using ReStA, we analyze recent successful natural language processing models and evaluate how sensitive their internal representations are to the amount of prior context. In addition, we use RSA to systematically study how similar the representations in the first and higher layers of these language models are to each other and to patterns of activation in the human brain. Our results reveal surprisingly large differences between language models, and give insight into where deep linguistic processing, which integrates information across multiple sentences, happens in these models. These experiments let us begin to answer which kinds of linguistic processing correspond to the phenomena we observe in fMRI brain imaging data. Our results suggest that the data of Wehbe et al. (2014) contain a signal of shallow linguistic processing, but no evidence of deep linguistic processing.
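At the core of both RSA and ReStA is comparing representational dissimilarity matrices (RDMs): one matrix of pairwise distances per representational space, correlated against another. A minimal numpy sketch (function names are ours; this uses Pearson correlation of RDM upper triangles, whereas published RSA work often uses Spearman):

```python
import numpy as np

def rdm(reps):
    """Representational dissimilarity matrix: 1 - Pearson
    correlation between every pair of item representations."""
    return 1.0 - np.corrcoef(reps)

def rsa_score(reps_a, reps_b):
    """Compare two representational spaces by correlating the
    upper triangles of their RDMs (the core of RSA)."""
    iu = np.triu_indices(reps_a.shape[0], k=1)
    va, vb = rdm(reps_a)[iu], rdm(reps_b)[iu]
    return np.corrcoef(va, vb)[0, 1]
```

Because the comparison happens between RDMs rather than between raw vectors, the two spaces can have different dimensionalities, which is what allows model layers to be compared directly against fMRI voxel patterns.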

Describing Textures using Natural Language


Textures in natural images can be characterized by color, shape, periodicity of elements within them, and other attributes that can be described using natural language. In this paper, we study the problem of describing visual attributes of texture on a novel dataset containing rich descriptions of textures, and conduct a systematic study of current generative and discriminative models for grounding language to images on this dataset. We find that while these models capture some properties of texture, they fail to capture several compositional properties, such as the colors of dots. We provide critical analysis of existing models by generating synthetic but realistic textures with different descriptions. Our dataset also allows us to train interpretable models and generate language-based explanations of what discriminative features are learned by deep networks for fine-grained categorization where texture plays a key role. We present visualizations of several fine-grained domains and show that texture attributes learned on our dataset offer improvements over expert-designed attributes on the Caltech-UCSD Birds dataset.