Commit 69c56bf1 authored by Vít Novotný

Add evaluation results for all but one model to `03_train_ner_models.ipynb`

parent cd193fa8
%% Cell type:markdown id:089e0573-56f2-4827-a2d5-4b88c8e24e43 tags:
# Train NER models
In this notebook, we will train a number of named entity recognition (NER) models using different training schedules and training/validation datasets. Then, we will select the best model using our test dataset.
%% Cell type:markdown id:9d9fc44f-6c13-47c5-969c-8d26448d2c2d tags:
## Preliminaries
We will begin with a bit of boilerplate: logging information about our computational environment and setting it up.
%% Cell type:code id:e9047d58-9d3d-4123-a1ed-60a0724295dc tags:
``` python
! hostname
```
%% Output
apollo.fi.muni.cz
%% Cell type:code id:e30f3d27-4c1c-4edf-a0be-f0febde2139b tags:
``` python
! python -V
```
%% Output
Python 3.8.10
%% Cell type:markdown id:e1f13f57-c900-45ed-8698-3668771d7098 tags:
Install the current version of the package and its dependencies.
%% Cell type:code id:f990803b-d7b3-4240-9b7c-16b9865a2c5d tags:
``` python
%%capture
! pip install .
```
%% Cell type:markdown id:0587a39e-c5dd-4e52-807a-13aecbdeb5bd tags:
Make sure that `numpy` does not parallelize by limiting the BLAS and OpenMP thread pools to a single thread. These variables must be set before `numpy` is first imported.
%% Cell type:code id:0444735d-2dd0-40b0-b9a3-bbb03416f65c tags:
``` python
import os
```
%% Cell type:code id:3c06551f-1a14-4ec6-82c8-31c8314639bd tags:
``` python
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
```
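%% Cell type:markdown tags:
As a quick sanity check, we can print the thread-limit variables to confirm that they are in place before `numpy` is first imported:
%% Cell type:code tags:
``` python
# These limits are only honoured if they are set before numpy (and its BLAS
# backends) are first imported, so we check them right away.
for name in ('OMP_NUM_THREADS', 'OPENBLAS_NUM_THREADS', 'MKL_NUM_THREADS',
             'VECLIB_MAXIMUM_THREADS', 'NUMEXPR_NUM_THREADS'):
    print(name, '=', os.environ[name])
```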
%% Cell type:markdown id:87b8905f-6989-415f-b271-9c9feb56e560 tags:
List the available GPUs and pick the one that we will use.
%% Cell type:code id:222cafea-1124-40f3-8156-ac6ee364e83b tags:
``` python
! nvidia-smi -L
```
%% Output
GPU 0: NVIDIA A40 (UUID: GPU-177e5a84-366f-6464-1bbb-908f2dd979cc)
GPU 1: Tesla T4 (UUID: GPU-cf4e7061-619f-5b3b-a217-410f6d506d62)
GPU 2: Tesla T4 (UUID: GPU-00386b4a-741a-aac4-b833-b678a811936f)
GPU 3: Tesla T4 (UUID: GPU-10531c8c-13c3-8e82-302b-91a5615701d6)
GPU 4: Tesla T4 (UUID: GPU-82eac985-cf18-1379-cbcc-e8d71246e28c)
GPU 5: Tesla T4 (UUID: GPU-552f5db8-cec9-3733-3394-17c1ecbc8b85)
GPU 6: Tesla T4 (UUID: GPU-7d2ad51d-6c12-c878-1a30-a21a7fe9c7bd)
GPU 7: Tesla T4 (UUID: GPU-81bd2022-c6f6-4a67-d3f3-f461591e20ab)
GPU 8: Tesla T4 (UUID: GPU-4f6616fb-96e0-adbd-6ee5-7b6146de8ece)
GPU 9: Tesla T4 (UUID: GPU-197d3f17-6807-d6d8-a31c-f54ef78bcb2d)
GPU 10: Tesla T4 (UUID: GPU-e36ec7af-fa51-2498-6bb9-1f2e57bed4c5)
GPU 11: NVIDIA A100 80GB PCIe (UUID: GPU-2d25d82d-c487-73b0-9341-82e74253106e)
GPU 12: Tesla T4 (UUID: GPU-4195d034-0e80-bd51-3c68-3069d48177db)
GPU 13: Tesla T4 (UUID: GPU-030e587b-ae70-3854-4a86-b888f04de428)
GPU 14: Tesla T4 (UUID: GPU-c450823e-5524-7032-228b-140b3187d733)
GPU 15: Tesla T4 (UUID: GPU-8b6ef8ec-186a-2e88-d308-569892e57eeb)
GPU 16: Tesla T4 (UUID: GPU-7edb1e91-a5cb-40a4-b470-e1548a76e6d9)
%% Cell type:code id:4b62818d-125a-46f8-80da-ecdc1bead095 tags:
``` python
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "14"
```
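%% Cell type:markdown tags:
Assuming PyTorch is installed as one of the package's dependencies, we can confirm that only the GPU we picked above is visible:
%% Cell type:code tags:
``` python
# Import torch only after CUDA_VISIBLE_DEVICES has been set, so that the
# selected GPU is the only visible CUDA device.
import torch

print(torch.cuda.is_available())
print(torch.cuda.device_count())  # should report a single visible device
```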
%% Cell type:markdown id:425fbbe4-b88a-420e-9897-ec861cdf111c tags:
Set up logging to display informational messages.
%% Cell type:code id:e271a6d3-4757-4020-a928-20c6406bd26d tags:
``` python
import logging
import sys
```
%% Cell type:code id:cf472e29-276b-4202-a975-d63f1b9c28aa tags:
``` python
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(message)s')
```
%% Cell type:markdown id:6c8a96f9-c6f1-453a-a93c-d133c0d437f6 tags:
## Train models
To train our models, we will use two different training schedules and four different types of datasets produced by each of two different methods for finding named entities. In total, we will train 16 different NER models.
%% Cell type:markdown id:ec4926c7-5477-46c5-9d27-01b64179bcb4 tags:
We will fine-tune [a pretrained `xlm-roberta-base` model][1] with the following two schedules for our masked language modeling (MLM) and named entity recognition (NER) objectives:
- First with MLM for at most 5 epochs and then with NER for at most 5 epochs.
- Using both MLM and NER in parallel for at most 10 epochs.
[1]: https://huggingface.co/xlm-roberta-base
%% Cell type:code id:f23106ca-bf03-4c27-9548-927aa01d89a4 tags:
``` python
schedule_names = ['fine-tuning', 'parallel']
```
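%% Cell type:markdown tags:
For reference, the two schedules can be summarized in a small dictionary. This summary is purely illustrative; the schedules themselves are implemented by the `NerModel` class used below.
%% Cell type:code tags:
``` python
# Illustrative summary of the two schedules described above; keys match schedule_names.
schedule_descriptions = {
    'fine-tuning': 'MLM for at most 5 epochs, then NER for at most 5 epochs',
    'parallel': 'MLM and NER jointly for at most 10 epochs',
}
assert set(schedule_descriptions) == set(schedule_names)
```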
%% Cell type:markdown id:dd602ab5-83a9-4924-9a37-cf3b521cdd92 tags:
We will use datasets produced from the results of two search methods:
- Fuzzy regexes
- Manatee
%% Cell type:code id:36a4c3a2-56e3-439c-bd6c-34f68fb9fb4c tags:
``` python
search_methods = ['manatee', 'fuzzy-regex']
```
%% Cell type:markdown id:7e638751-ea86-481a-bb41-7306b0c56445 tags:
For both Manatee and fuzzy regexes, we will use four datasets that differ in size and in annotation quality:
- Using all results from all documents.
- Using all results from documents that have been marked as relevant by expert annotators.
- Using results from sentences that do not cross document boundaries, taken from all documents.
- Using results from sentences that do not cross document boundaries, taken from documents that have been marked as relevant by expert annotators.
%% Cell type:code id:15232a72-22de-488d-8446-77d655d80a66 tags:
``` python
cross_page_boundaries_values = ['non-crossing', 'all']
only_relevant_values = ['only-relevant', 'all']
```
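%% Cell type:markdown tags:
Altogether, the two schedules, two search methods, and the two-by-two dataset variants give the 16 models mentioned above, which we can quickly verify:
%% Cell type:code tags:
``` python
from itertools import product

# Each combination of schedule, search method, and dataset variant yields one model:
# 2 schedules * 2 search methods * 2 boundary settings * 2 relevance settings = 16.
configurations = list(product(
    schedule_names, search_methods, cross_page_boundaries_values, only_relevant_values))
print(len(configurations))
```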
%% Cell type:markdown id:e1aca5ae-a2a7-481f-9b92-3f0e975bbac3 tags:
We will train all our models in turn:
%% Cell type:code id:dd1ab471-88d2-4025-aa5f-9fc94a29217f tags:
``` python
from itertools import product
```
%% Cell type:code id:00e13fff-2a8b-4d5a-a014-65dbc0055b4d tags:
``` python
from ahisto_named_entity_search.recognition import NerModel
```
%% Cell type:code id:0b8cb413-c70b-48f8-b6c7-d43a944ee60c tags:
``` python
models = []
for schedule_name, only_relevant, search_method, cross_page_boundaries in product(
        schedule_names, only_relevant_values, search_methods, cross_page_boundaries_values):
    if schedule_name == 'fine-tuning' and search_method == 'fuzzy-regex' and cross_page_boundaries == 'non-crossing' and only_relevant == 'all':
        continue  # TODO: remove me after the fuzzy-regex-non-crossing-all has finished training
    model_basename = f'model_ner_{search_method}_{cross_page_boundaries}_{only_relevant}_{schedule_name}'
    model_checkpoint_basename = f'{model_basename}_checkpoints'
    sentence_basename = f'dataset_mlm_{cross_page_boundaries}_{only_relevant}'
    training_sentence_basename = f'{sentence_basename}_training'
    validation_sentence_basename = f'{sentence_basename}_validation'
    tagged_sentence_basename = f'dataset_ner_{search_method}_{cross_page_boundaries}_{only_relevant}'
    training_tagged_sentence_basename = f'{tagged_sentence_basename}_training'
    validation_tagged_sentence_basename = f'{tagged_sentence_basename}_validation'
    try:
        model = NerModel.load(model_basename)
        model.model  # Try actually loading the NER model
    except EnvironmentError:
        project_name = f'AHISTO NER: {search_method}, {cross_page_boundaries}, {only_relevant}'
        os.environ['COMET_PROJECT_NAME'] = project_name
        model = NerModel.train_and_save(model_checkpoint_basename, model_basename,
                                        training_sentence_basename, validation_sentence_basename,
                                        training_tagged_sentence_basename,
                                        validation_tagged_sentence_basename, schedule_name)
    models.append(model)
```
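%% Cell type:markdown tags:
As a quick check, we can list the models that were loaded or trained above; printing a `NerModel` shows the path where it is stored.
%% Cell type:code tags:
``` python
# List the models that were loaded from disk or trained in the previous cell.
for model in models:
    print(model)
```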
%% Cell type:markdown id:c9a695c8-4585-4af5-9a99-544a3e340cd3 tags:
## Evaluate NER models
To evaluate our models, we will use our smallest (and therefore highest-quality) test dataset.
%% Cell type:code id:984f0f20-72f4-4293-b5ad-2065c4eb6806 tags:
``` python
testing_tagged_sentence_basename = 'dataset_ner_manatee_non-crossing_only-relevant_training'
```
%% Cell type:markdown id:310a1802-2a50-4dfe-b9ec-60e2dcd5e0ff tags:
For each model, we will compute the mean F-score on our test dataset.
%% Cell type:code id:e4ff7558-61ed-4208-95b3-2e42a7b6533f tags:
``` python
mean_f_scores = {
    model: model.test(testing_tagged_sentence_basename)
    for model in models
}
```
%% Cell type:markdown id:b9b1bfe4-9a0b-43f8-8cbf-9c101f03202e tags:
Finally, we will display the evaluation results in a table.
%% Cell type:code id:38efa732-8afd-4798-809a-ca828a8b960c tags:
``` python
from pathlib import Path
```
%% Cell type:code id:1c9d3bab-53de-4c39-8e4d-cd2978f00925 tags:
``` python
from IPython.display import display
import pandas as pd
from pandas import DataFrame
```
%% Cell type:code id:f47f62ca-1164-45b5-94b9-ff082787e8a9 tags:
``` python
rows = [Path(str(model)).parent.name for model in models]
columns = ['Mean F-score']
data = [[mean_f_scores[model]] for model in models]
mean_f_scores_df = DataFrame(data, columns=columns, index=rows)
```
%% Cell type:code id:053ff5dd-775c-431d-a988-18bf3c4f4f6d tags:
``` python
with pd.option_context('display.float_format', lambda mean_f_score: f'{100.0 * mean_f_score:.5f}%'):
    display(mean_f_scores_df.sort_values(by=['Mean F-score'], ascending=False))
```
%% Output
NerModel: /nlp/projekty/ahisto/public_html/named-entity-search/results/model_ner_manatee_non-crossing_only-relevant_fine-tuning/TokenClassification
%% Cell type:code id:e3db5477-76e6-4316-ab8b-43014333cb8a tags:
``` python
# Pick the model with the highest mean F-score.
best_model, _ = max(mean_f_scores.items(), key=lambda item: item[1])
```
%% Cell type:code id:f7975e45-ba27-45b4-9b61-1a9c119d434d tags:
``` python
# Report the mean F-score of the first model and display the best model overall.
f_score = mean_f_scores[models[0]]
print(f'Mean F-score: {f_score * 100.0:.2f}%')
best_model
```
%% Output
Mean F-score: 34.26%
NerModel: /nlp/projekty/ahisto/public_html/named-entity-search/results/model_ner_manatee_non-crossing_only-relevant_parallel/TokenClassification