Commit 9273b3ea authored by Vít Novotný

Add ARQMath task 1 and 2 relevance judgements

parent dcc5ddb0
@@ -14,3 +14,11 @@ include scripts/votes-qrels-train.V1.0.tsv
 include scripts/votes-qrels-small-validation.V1.0.tsv
 include scripts/votes-qrels-validation.V1.0.tsv
 include scripts/votes-qrels-test.V1.0.tsv
+include scripts/qrel_task1-test.tsv
+include scripts/qrel_task1-train.tsv
+include scripts/qrel_task1.tsv
+include scripts/qrel_task1-validation.tsv
+include scripts/qrel_task2-test.tsv
+include scripts/qrel_task2-train.tsv
+include scripts/qrel_task2.tsv
+include scripts/qrel_task2-validation.tsv
@@ -3,16 +3,18 @@
 This repository evaluates the performance of your information retrieval system
 on a number of *tasks*:

-- `task1` – [ARQMath Task1][arqmath-task1] validation dataset,
+- `task1-example` – [ARQMath Task1][arqmath-task1] example dataset,
 - `task1-votes` – [ARQMath Task1][arqmath-task1] Math StackExchange [user votes][],
-- `ntcir-11-math-2-main` – [NTCIR-11 Math-2 Task Main Subtask][ntcir-11-math-2], and
-- `ntcir-12-mathir-arxiv-main` – [NTCIR-12 MathIR Task ArXiv Main Subtask][ntcir-12-mathir].
+- `task1` – [ARQMath Task1][arqmath-task1] final dataset,
+- `task2` – [ARQMath Task2][arqmath-task2] final dataset,
+- `ntcir-11-math-2-main` – [NTCIR-11 Math-2 Task Main Subtask][ntcir-11-math-2],
+- `ntcir-12-mathir-arxiv-main` – [NTCIR-12 MathIR Task ArXiv Main Subtask][ntcir-12-mathir], and
 - `ntcir-12-mathir-math-wiki-formula` – [NTCIR-12 MathIR Task MathWikiFormula Subtask][ntcir-12-mathir].

 The main tasks are:

-- `task1-votes` – Use this task to evaluate your ARQMath task 1 system.
-- `ntcir-12-mathir-math-wiki-formula` – Use this task to evaluate your ARQMath task 2 system.
+- `task1` – Use this task to evaluate your ARQMath task 1 system.
+- `task2` – Use this task to evaluate your ARQMath task 2 system.

 #### Subsets

 Each task comes with three *subsets*:
@@ -26,7 +28,11 @@ Each task comes with three *subsets*:
   used at the end to compare the systems that performed best on the
   validation set.

-### Usage
+The `task1` and `task2` tasks also come with an `all` subset, which contains
+all the relevance judgements. Use it to evaluate a system that has not been
+trained on any subset of the `task1` and `task2` tasks.
+
+### Examples

 #### Using the `train` set to train your supervised system
 ``` sh
@@ -34,7 +40,7 @@ $ pip install --force-reinstall git+https://gitlab.fi.muni.cz/xstefan3/arqmath-e
 $ python
 >>> from arqmath_eval import get_topics, get_judged_documents, get_ndcg
 >>>
->>> task = 'task1-votes'
+>>> task = 'task1'
 >>> subset = 'train'
 >>> results = {}
 >>> for topic in get_topics(task=task, subset=subset):
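The hunk above only shows the opening lines of the README's `train`-subset example. For orientation, a self-contained sketch of the whole loop might look as follows; the exact keyword signature of `get_ndcg` and the `score` function are assumptions for illustration, not taken from this diff.

``` python
# A minimal sketch of the train-subset evaluation loop, assuming the
# arqmath_eval API referenced in the README (get_topics, get_judged_documents,
# get_ndcg); `score` is a hypothetical stand-in for your retrieval system.
from arqmath_eval import get_judged_documents, get_ndcg, get_topics


def score(topic, document):
    """Hypothetical placeholder: return your system's similarity score."""
    return 0.0


task = 'task1'
subset = 'train'
results = {}
for topic in get_topics(task=task, subset=subset):
    results[topic] = {}
    for document in get_judged_documents(task=task, subset=subset, topic=topic):
        results[topic][document] = score(topic, document)

ndcg = get_ndcg(results, task=task, subset=subset)  # keyword form is an assumption
print(ndcg)
```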
@@ -62,7 +68,7 @@ $ pip install --force-reinstall git+https://gitlab.fi.muni.cz/xstefan3/arqmath-e
 $ python
 >>> from arqmath_eval import get_topics, get_judged_documents
 >>>
->>> task = 'task1-votes'
+>>> task = 'task1'
 >>> subset = 'validation'
 >>> results = {}
 >>> for topic in get_topics(task=task, subset=subset):
@@ -90,6 +96,7 @@ $ git push # publish your new result and the upd
 ```

 [arqmath-task1]: https://www.cs.rit.edu/~dprl/ARQMath/Task1-answers.html (Task 1: Find Answers)
+[arqmath-task2]: https://www.cs.rit.edu/~dprl/ARQMath/task2-formulas.html (Task 2: Formula Search)
 [get_judged_documents]: https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L61
 [get_ndcg]: https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L94
 [get_random_ndcg]: https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L129
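The `all` subset added for `task1` and `task2` in this commit can be queried through the same interface. A short sketch, assuming the function signatures shown in the README examples above; the `score` function is again a hypothetical placeholder.

``` python
# Sketch: scoring every judged document of the `all` subset of task2, which the
# updated README describes as containing all relevance judgements for the task.
from arqmath_eval import get_judged_documents, get_ndcg, get_topics


def score(topic, document):
    """Hypothetical similarity function standing in for your system."""
    return 0.0


task = 'task2'
subset = 'all'
results = {
    topic: {
        document: score(topic, document)
        for document in get_judged_documents(task=task, subset=subset, topic=topic)
    }
    for topic in get_topics(task=task, subset=subset)
}
print(get_ndcg(results, task=task, subset=subset))
```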
@@ -19,7 +19,9 @@ underscores (`_`) replaced with a comma and a space for improved readability.
 '''.strip()
 RELEVANCE_JUDGEMENTS = {
     'train': {
-        'task1': 'qrel.V1.0-train.tsv',
+        'task1': 'qrel_task1-train.tsv',
+        'task2': 'qrel_task2-train.tsv',
+        'task1-example': 'qrel.V1.0-train.tsv',
         'task1-votes': 'votes-qrels-train.V1.0.tsv',
         'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-train.dat',
         'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-train.dat',
@@ -29,20 +31,26 @@ RELEVANCE_JUDGEMENTS = {
         'task1-votes': 'votes-qrels-small-validation.V1.0.tsv',
     },
     'validation': {
-        'task1': 'qrel.V1.0-validation.tsv',
+        'task1': 'qrel_task1-validation.tsv',
+        'task2': 'qrel_task2-validation.tsv',
+        'task1-example': 'qrel.V1.0-validation.tsv',
         'task1-votes': 'votes-qrels-validation.V1.0.tsv',
         'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-validation.dat',
         'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-validation.dat',
         'ntcir-12-mathir-math-wiki-formula': 'NTCIR12_MathWikiFrm-qrels_agg-validation.dat',
     },
     'test': {
-        'task1': 'qrel.V1.0-test.tsv',
+        'task1': 'qrel_task1-test.tsv',
+        'task2': 'qrel_task2-test.tsv',
+        'task1-example': 'qrel.V1.0-test.tsv',
         'task1-votes': 'votes-qrels-test.V1.0.tsv',
         'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-test.dat',
         'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-test.dat',
         'ntcir-12-mathir-math-wiki-formula': 'NTCIR12_MathWikiFrm-qrels_agg-test.dat',
     },
     'all': {
+        'task1': 'qrel_task1.tsv',
+        'task2': 'qrel_task2.tsv',
         'task1-votes.V1.2': 'votes-qrels.V1.2.tsv',
         'task2-topics-formula_ids.V.1.1': 'topics-formula_ids-qrels.V1.1.tsv',
     }
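The new dictionary entries above only register filenames. As a rough illustration of how such a file could be consumed, here is a sketch that assumes the qrels TSVs use the standard TREC column order (topic, iteration, document, relevance); both that layout and the `arqmath_eval.common` import path are assumptions, not statements from this diff.

``` python
# Sketch: parsing a registered qrels TSV into {topic: {document: relevance}},
# assuming TREC-style columns (topic, iteration, document, relevance).
import csv

# Import path assumed from setup.py's package_dir mapping of scripts/ to arqmath_eval.
from arqmath_eval.common import RELEVANCE_JUDGEMENTS


def read_qrels(path):
    """Read a TREC-style qrels TSV file into a nested dictionary."""
    judgements = {}
    with open(path, newline='') as tsv_file:
        for topic, _iteration, document, relevance in csv.reader(tsv_file, delimiter='\t'):
            judgements.setdefault(topic, {})[document] = int(relevance)
    return judgements


# Filename registered above for the ARQMath Task 1 training judgements:
filename = RELEVANCE_JUDGEMENTS['train']['task1']  # 'qrel_task1-train.tsv'
```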
@@ -22,6 +22,8 @@ def evaluate_worker(result_filename):
 if __name__ == '__main__':
     for task in TASKS:
+        if not os.path.exists(task):
+            continue
         random_ndcg = get_random_ndcg(task, 'validation')
         users = glob(os.path.join(task, '*', ''))
         task_results = [(random_ndcg, 'random', 'xrando42')]
Eight further diffs (the added qrels TSV data files) are collapsed.
@@ -5,7 +5,7 @@ from setuptools import setup
 setup(
     name='arqmath_eval',
-    version='0.0.12',
+    version='0.0.13',
     description='Evaluation of ARQMath systems',
     packages=['arqmath_eval'],
     package_dir={'arqmath_eval': 'scripts'},
@@ -34,6 +34,14 @@ setup(
         'votes-qrels-test.V1.0.tsv',
         'votes-qrels.V1.2.tsv',
         'topics-formula_ids-qrels.V1.1.tsv',
+        'qrel_task1-test.tsv',
+        'qrel_task1-train.tsv',
+        'qrel_task1.tsv',
+        'qrel_task1-validation.tsv',
+        'qrel_task2-test.tsv',
+        'qrel_task2-train.tsv',
+        'qrel_task2.tsv',
+        'qrel_task2-validation.tsv',
         ],
     },
     include_package_data=True,
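The `setup()` call above ships the new qrels TSVs as `arqmath_eval` package data. A minimal sketch of locating one of them after installation, assuming Python 3.9+ `importlib.resources` and that the data files end up at the package root (as `package_dir={'arqmath_eval': 'scripts'}` suggests):

``` python
# Sketch: resolving an installed qrels file via importlib.resources (Python 3.9+).
from importlib.resources import files

qrels_resource = files('arqmath_eval') / 'qrel_task1-train.tsv'
with qrels_resource.open('r') as qrels_file:
    first_judgement = next(qrels_file)  # one tab-separated judgement line
```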
-This table contains the best result for every user on the *task1* task.
+This table contains the best result for every user on the *task1-example* task.

 | nDCG | Result name | User |
 |:-----|:------------|------|