LEGAL-BERT: The Muppets straight out of Law School

Ilias Chalkidis on Twitter: "Our paper "LEGAL-BERT: The Muppets straight out of Law School" with @ManosFergas, @NeuRulller, @nikaletras and @ionandrou, has been accepted in Findings of #EMNLP2020. Arxiv pre-print available at:

BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.

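BERT has achieved impressive results on many NLP tasks, but how best to adapt it to specialised domains has received little study. The paper focuses on the legal domain: the authors explore several ways of applying BERT to downstream legal tasks and evaluate them on multiple datasets. They find that the pre-training and fine-tuning guidelines from earlier work do not always generalize well to the legal domain. They therefore systematically study the available strategies for applying BERT in a specialised domain: (a) use the original BERT out of the box; (b) continue pre-training BERT on domain-specific corpora; (c) pre-train BERT from scratch on domain-specific corpora. They also propose a broader hyper-parameter search space for fine-tuning on downstream tasks, and they release LEGAL-BERT, a family of BERT models intended to support legal NLP research, computational law, and legal technology applications.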
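To make the idea of a "broader hyper-parameter search space" concrete, here is a minimal sketch of a grid search in plain Python. The abstract does not list the grids, so the specific values, the names `NARROW_GRID` and `BROAD_GRID`, and the helpers `expand_grid` and `select_best` are all illustrative assumptions, not the authors' settings or code. The point is only that widening each range multiplies the number of fine-tuning runs to compare.

```python
from itertools import product

# Hypothetical search spaces. The values are placeholders chosen for
# illustration; they are NOT taken from the abstract above.
NARROW_GRID = {
    "learning_rate": [2e-5, 3e-5, 4e-5, 5e-5],
    "batch_size": [16, 32],
    "dropout": [0.1],
}

BROAD_GRID = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 4e-5, 5e-5],
    "batch_size": [4, 8, 16, 32],
    "dropout": [0.1, 0.2],
}


def expand_grid(grid):
    """Return every combination of the grid's values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def select_best(configs, evaluate):
    """Return the config with the lowest score.

    `evaluate` is a caller-supplied function (hypothetical here) that would
    fine-tune a model with the given config and return its validation loss.
    """
    return min(configs, key=evaluate)


narrow_configs = expand_grid(NARROW_GRID)
broad_configs = expand_grid(BROAD_GRID)
print(len(narrow_configs), len(broad_configs))
```

In practice `evaluate` would wrap a full fine-tuning run, so the cost of the broader grid grows with the number of configurations. That cost is the trade-off behind choosing how wide to search.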

