In this notebook, we will train a number of named entity recognition (NER) models using different training schedules and training/validation datasets. Then, we will select the best model using our test dataset.
To train our models, we will use two different schedules and four different types of datasets from two different methods for finding named entities. In total, we will train 16 different NER models.
We will fine-tune [a pretrained `xlm-roberta-base` model][1] with the following two schedules for our masked language modeling (MLM) and named entity recognition (NER) objectives:
- First with MLM for at most 5 epochs and then with NER for at most 5 epochs.
- Using both MLM and NER in parallel for at most 10 epochs.
Nejstarší listina vůbec naším národním jazykem psaná je smlouva mezi Petrem Neumburgerem a panem Bočkem z Kunštátu, sepsaná v Poděbradech 17. prosince 1370