GloVe: Global Vectors for Word Representation

Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.

(One of the classic papers in NLP.) Recent NLP methods have succeeded in learning word embeddings that capture semantic and syntactic structure well, but the origin of these regularities has remained unclear. The authors analyze which model properties are needed for such regularities to be encoded in word vectors. The result is a global log-bilinear regression model that inherits the advantages of both global matrix factorization and local context window methods: it learns statistical information from the nonzero elements of the word-word co-occurrence matrix, rather than from the entire sparse matrix or from individual context windows. The model produces word vectors with meaningful substructure, achieving 75% on a recent word analogy task and outperforming competing methods.
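
The abstract only hints at the training procedure; the full paper specifies a weighted least-squares objective fit to the nonzero co-occurrence counts. The sketch below is a minimal, illustrative NumPy version of that kind of loop. The function name `train_glove`, the hyperparameter values, and the use of plain SGD (the paper itself uses AdaGrad) are assumptions made for illustration, not the authors' reference implementation.

```python
import numpy as np

def train_glove(cooc, vocab_size, dim=50, x_max=100.0, alpha=0.75,
                lr=0.05, epochs=25, seed=0):
    """Illustrative GloVe-style training on a sparse co-occurrence dict.

    cooc: dict mapping (word_index, context_index) -> co-occurrence count,
          containing only the nonzero entries of the matrix.
    """
    rng = np.random.default_rng(seed)
    W  = rng.uniform(-0.5, 0.5, (vocab_size, dim)) / dim   # word vectors
    Wc = rng.uniform(-0.5, 0.5, (vocab_size, dim)) / dim   # context vectors
    b  = np.zeros(vocab_size)                               # word biases
    bc = np.zeros(vocab_size)                               # context biases

    for _ in range(epochs):
        for (i, j), x_ij in cooc.items():
            # Weighting f(x) caps the influence of very frequent pairs.
            f = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0
            # Weighted least-squares error against the log co-occurrence count.
            diff = W[i] @ Wc[j] + b[i] + bc[j] - np.log(x_ij)
            grad = f * diff
            # Plain SGD updates (the paper uses AdaGrad); copy before updating
            # so each vector's gradient uses the other's pre-update value.
            wi, wj = W[i].copy(), Wc[j].copy()
            W[i]  -= lr * grad * wj
            Wc[j] -= lr * grad * wi
            b[i]  -= lr * grad
            bc[j] -= lr * grad

    # The paper reports using the sum of word and context vectors as the
    # final embedding.
    return W + Wc
```

As a toy usage example, `train_glove({(0, 1): 3.0, (1, 2): 5.0}, vocab_size=3)` would return a 3×50 embedding matrix; a real run would build `cooc` from corpus statistics within a fixed context window.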
