Commit 9273b3ea authored by Vít Novotný's avatar Vít Novotný

Add ARQMath task 1 and 2 relevance judgements

parent dcc5ddb0
@@ -14,3 +14,11 @@ include scripts/votes-qrels-train.V1.0.tsv
include scripts/votes-qrels-small-validation.V1.0.tsv
include scripts/votes-qrels-validation.V1.0.tsv
include scripts/votes-qrels-test.V1.0.tsv
include scripts/qrel_task1-test.tsv
include scripts/qrel_task1-train.tsv
include scripts/qrel_task1.tsv
include scripts/qrel_task1-validation.tsv
include scripts/qrel_task2-test.tsv
include scripts/qrel_task2-train.tsv
include scripts/qrel_task2.tsv
include scripts/qrel_task2-validation.tsv
@@ -3,16 +3,18 @@
This repository evaluates the performance of your information retrieval system
on a number of *tasks*:
- `task1`[ARQMath Task1][arqmath-task1] validation dataset,
- `task1-example`[ARQMath Task1][arqmath-task1] example dataset,
- `task1-votes`[ARQMath Task1][arqmath-task1] Math StackExchange [user votes][],
- `ntcir-11-math-2-main`[NTCIR-11 Math-2 Task Main Subtask][ntcir-11-math-2], and
- `ntcir-12-mathir-arxiv-main`[NTCIR-12 MathIR Task ArXiv Main Subtask][ntcir-12-mathir].
- `task1`[ARQMath Task1][arqmath-task1] final dataset,
- `ntcir-11-math-2-main`[NTCIR-11 Math-2 Task Main Subtask][ntcir-11-math-2],
- `ntcir-12-mathir-arxiv-main`[NTCIR-12 MathIR Task ArXiv Main Subtask][ntcir-12-mathir], and
- `ntcir-12-mathir-math-wiki-formula`[NTCIR-12 MathIR Task MathWikiFormula Subtask][ntcir-12-mathir].
- `task2`[ARQMath Task2][arqmath-task2] final dataset,
The main tasks are:
- `task1-votes` – Use this task to evaluate your ARQMath task 1 system.
- `ntcir-12-mathir-math-wiki-formula` – Use this task to evaluate your ARQMath task 2 system.
- `task1` – Use this task to evaluate your ARQMath task 1 system.
- `task2` – Use this task to evaluate your ARQMath task 2 system.
#### Subsets
Each task comes with three *subsets*:
@@ -26,7 +28,11 @@ Each task comes with three *subsets*:
used at the end to compare the systems that performed best on the
validation set.
### Usage
The `task1` and `task2` tasks also come with an `all` subset, which contains
all relevance judgements. Use it to evaluate a system that has not been
trained on any subset of the `task1` and `task2` tasks.
### Examples
#### Using the `train` set to train your supervised system
``` sh
@@ -34,7 +40,7 @@ $ pip install --force-reinstall git+https://gitlab.fi.muni.cz/xstefan3/arqmath-e
$ python
>>> from arqmath_eval import get_topics, get_judged_documents, get_ndcg
>>>
>>> task = 'task1-votes'
>>> task = 'task1'
>>> subset = 'train'
>>> results = {}
>>> for topic in get_topics(task=task, subset=subset):
@@ -62,7 +68,7 @@ $ pip install --force-reinstall git+https://gitlab.fi.muni.cz/xstefan3/arqmath-e
$ python
>>> from arqmath_eval import get_topics, get_judged_documents
>>>
>>> task = 'task1-votes'
>>> task = 'task1'
>>> subset = 'validation'
>>> results = {}
>>> for topic in get_topics(task=task, subset=subset):
@@ -90,6 +96,7 @@ $ git push # publish your new result and the upd
```
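The examples above report scores through `get_ndcg`. As a rough illustration of what that score measures, here is a minimal standalone sketch of nDCG; it shows the idea only and is not the package's exact implementation.

```python
import math

def ndcg(ranked_relevances, ideal_relevances):
    """Normalized discounted cumulative gain with a log2 rank discount.

    Illustrative sketch only; not the formula arqmath_eval uses internally.
    """
    def dcg(relevances):
        # Discount each relevance grade by the log of its 1-based rank + 1.
        return sum(rel / math.log2(rank + 2)
                   for rank, rel in enumerate(relevances))

    ideal = dcg(sorted(ideal_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 0], [3, 2, 1]))  # ≈ 0.895
```

A perfect ranking scores 1.0; misplacing relevant documents lowers the score toward 0.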
[arqmath-task1]: https://www.cs.rit.edu/~dprl/ARQMath/Task1-answers.html (Task 1: Find Answers)
[arqmath-task2]: https://www.cs.rit.edu/~dprl/ARQMath/task2-formulas.html (Task 2: Formula Search)
[get_judged_documents]: https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L61
[get_ndcg]: https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L94
[get_random_ndcg]: https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L129
@@ -19,7 +19,9 @@ underscores (`_`) replaced with a comma and a space for improved readability.
'''.strip()
RELEVANCE_JUDGEMENTS = {
'train': {
'task1': 'qrel.V1.0-train.tsv',
'task1': 'qrel_task1-train.tsv',
'task2': 'qrel_task2-train.tsv',
'task1-example': 'qrel.V1.0-train.tsv',
'task1-votes': 'votes-qrels-train.V1.0.tsv',
'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-train.dat',
'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-train.dat',
@@ -29,20 +31,26 @@ RELEVANCE_JUDGEMENTS = {
'task1-votes': 'votes-qrels-small-validation.V1.0.tsv',
},
'validation': {
'task1': 'qrel.V1.0-validation.tsv',
'task1': 'qrel_task1-validation.tsv',
'task2': 'qrel_task2-validation.tsv',
'task1-example': 'qrel.V1.0-validation.tsv',
'task1-votes': 'votes-qrels-validation.V1.0.tsv',
'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-validation.dat',
'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-validation.dat',
'ntcir-12-mathir-math-wiki-formula': 'NTCIR12_MathWikiFrm-qrels_agg-validation.dat',
},
'test': {
'task1': 'qrel.V1.0-test.tsv',
'task1': 'qrel_task1-test.tsv',
'task2': 'qrel_task2-test.tsv',
'task1-example': 'qrel.V1.0-test.tsv',
'task1-votes': 'votes-qrels-test.V1.0.tsv',
'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-test.dat',
'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-test.dat',
'ntcir-12-mathir-math-wiki-formula': 'NTCIR12_MathWikiFrm-qrels_agg-test.dat',
},
'all': {
'task1': 'qrel_task1.tsv',
'task2': 'qrel_task2.tsv',
'task1-votes.V1.2': 'votes-qrels.V1.2.tsv',
'task2-topics-formula_ids.V.1.1': 'topics-formula_ids-qrels.V1.1.tsv',
}
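The `RELEVANCE_JUDGEMENTS` mapping above associates each subset and task with a qrels filename. As an illustration of how such a mapping resolves, here is a sketch over a trimmed copy; the helper name `qrels_filename` is hypothetical, not part of the package.

```python
# Trimmed copy of the RELEVANCE_JUDGEMENTS mapping shown in the diff above.
RELEVANCE_JUDGEMENTS = {
    'train': {'task1': 'qrel_task1-train.tsv', 'task2': 'qrel_task2-train.tsv'},
    'all': {'task1': 'qrel_task1.tsv', 'task2': 'qrel_task2.tsv'},
}

def qrels_filename(task, subset):
    """Resolve the relevance-judgement filename for a task/subset pair.

    Hypothetical helper for illustration; the package presumably indexes
    the same nested mapping directly.
    """
    try:
        return RELEVANCE_JUDGEMENTS[subset][task]
    except KeyError:
        raise ValueError('unknown task/subset: {}/{}'.format(task, subset))

print(qrels_filename('task1', 'all'))  # qrel_task1.tsv
```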
@@ -22,6 +22,8 @@ def evaluate_worker(result_filename):
if __name__ == '__main__':
for task in TASKS:
if not os.path.exists(task):
continue
random_ndcg = get_random_ndcg(task, 'validation')
users = glob(os.path.join(task, '*', ''))
task_results = [(random_ndcg, 'random', 'xrando42')]
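The loop above collects `(nDCG, result name, user)` triples per task, seeded with a random baseline. A sketch of how such triples could be rendered into the Markdown leaderboard tables kept in this repository might look as follows; the function name is hypothetical and this is not the repository's exact code.

```python
def format_leaderboard(task_results):
    """Render (ndcg, result_name, user) triples as a Markdown table,
    best score first. Illustrative only, not the repository's exact code.
    """
    lines = ['| nDCG | Result name | User |',
             '|:-----|:------------|------|']
    # Tuples sort by their first element, so this orders by nDCG descending.
    for ndcg, result_name, user in sorted(task_results, reverse=True):
        lines.append('| {:.4f} | {} | {} |'.format(ndcg, result_name, user))
    return '\n'.join(lines)

print(format_leaderboard([(0.5620, 'random', 'xrando42'),
                          (0.7312, 'my-best-run', 'alice')]))
```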
This diff is collapsed.
@@ -5,7 +5,7 @@ from setuptools import setup
setup(
name='arqmath_eval',
version='0.0.12',
version='0.0.13',
description='Evaluation of ARQMath systems',
packages=['arqmath_eval'],
package_dir={'arqmath_eval': 'scripts'},
@@ -34,6 +34,14 @@ setup(
'votes-qrels-test.V1.0.tsv',
'votes-qrels.V1.2.tsv',
'topics-formula_ids-qrels.V1.1.tsv',
'qrel_task1-test.tsv',
'qrel_task1-train.tsv',
'qrel_task1.tsv',
'qrel_task1-validation.tsv',
'qrel_task2-test.tsv',
'qrel_task2-train.tsv',
'qrel_task2.tsv',
'qrel_task2-validation.tsv',
],
},
include_package_data=True,
This table contains the best result for every user on the *task1* task.
This table contains the best result for every user on the *task1-example* task.
| nDCG | Result name | User |
|:-----|:------------|------|