Commit 9273b3ea authored by Vít Novotný's avatar Vít Novotný

Add ARQMath task 1 and 2 relevance judgements

parent dcc5ddb0
+8 −0
@@ -14,3 +14,11 @@ include scripts/votes-qrels-train.V1.0.tsv
include scripts/votes-qrels-small-validation.V1.0.tsv
include scripts/votes-qrels-validation.V1.0.tsv
include scripts/votes-qrels-test.V1.0.tsv
include scripts/qrel_task1-test.tsv
include scripts/qrel_task1-train.tsv
include scripts/qrel_task1.tsv
include scripts/qrel_task1-validation.tsv
include scripts/qrel_task2-test.tsv
include scripts/qrel_task2-train.tsv
include scripts/qrel_task2.tsv
include scripts/qrel_task2-validation.tsv
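
The `qrel_task*.tsv` files packaged above are relevance judgements, presumably in the usual TREC qrels layout (topic ID, iteration, document ID, relevance grade, tab-separated). A minimal sketch of reading such a file, assuming that four-column layout; `read_qrels` is an illustrative helper, not part of this package:

```python
import csv
from collections import defaultdict

def read_qrels(path):
    """Read a TREC-style qrels TSV into {topic: {document: relevance}}.

    Assumes four tab-separated columns: topic, iteration, document, relevance.
    """
    judgements = defaultdict(dict)
    with open(path, newline='') as f:
        for topic, _iteration, document, relevance in csv.reader(f, delimiter='\t'):
            judgements[topic][document] = int(relevance)
    return dict(judgements)
```
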
+15 −8
@@ -3,16 +3,18 @@
This repository evaluates the performance of your information retrieval system
on a number of *tasks*:

- `task1` – [ARQMath Task1][arqmath-task1] validation dataset,
- `task1-example` – [ARQMath Task1][arqmath-task1] example dataset,
- `task1-votes` – [ARQMath Task1][arqmath-task1] Math StackExchange [user votes][],
- `ntcir-11-math-2-main` – [NTCIR-11 Math-2 Task Main Subtask][ntcir-11-math-2], and
- `ntcir-12-mathir-arxiv-main` – [NTCIR-12 MathIR Task ArXiv Main Subtask][ntcir-12-mathir].
- `task1` – [ARQMath Task1][arqmath-task1] final dataset,
- `ntcir-11-math-2-main` – [NTCIR-11 Math-2 Task Main Subtask][ntcir-11-math-2],
- `ntcir-12-mathir-arxiv-main` – [NTCIR-12 MathIR Task ArXiv Main Subtask][ntcir-12-mathir], and
- `ntcir-12-mathir-math-wiki-formula` – [NTCIR-12 MathIR Task MathWikiFormula Subtask][ntcir-12-mathir].
- `task2` – [ARQMath Task2][arqmath-task2] final dataset,

The main tasks are:

- `task1-votes` – Use this task to evaluate your ARQMath task 1 system.
- `ntcir-12-mathir-math-wiki-formula` – Use this task to evaluate your ARQMath task 2 system.
- `task1` – Use this task to evaluate your ARQMath task 1 system.
- `task2` – Use this task to evaluate your ARQMath task 2 system.

#### Subsets
Each task comes with three *subsets*:
@@ -26,7 +28,11 @@ Each task comes with three *subsets*:
  used at the end to compare the systems that performed best on the
  validation set.

### Usage
The `task1` and `task2` tasks also come with an `all` subset, which contains
all relevance judgements. Use it to evaluate a system that has not been
trained on any subset of the `task1` and `task2` tasks.
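
The comparison between systems rests on NDCG; the package's own [get_ndcg] computes it for you, but as a rough illustration of the metric itself (this is not the package's implementation, and `ranked_documents` / `judgements` are hypothetical inputs):

```python
from math import log2

def ndcg(ranked_documents, judgements, k=10):
    """Normalized discounted cumulative gain at rank k.

    ranked_documents: documents in the order the system returned them.
    judgements: {document: graded relevance}; unjudged documents count as 0.
    """
    def dcg(docs):
        return sum(judgements.get(doc, 0) / log2(rank + 2)
                   for rank, doc in enumerate(docs[:k]))
    # The ideal ranking orders the judged documents by decreasing relevance.
    ideal = sorted(judgements, key=judgements.get, reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(ranked_documents) / ideal_dcg if ideal_dcg else 0.0
```

A perfect ranking scores 1.0; placing relevant documents lower lowers the score.
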

### Examples
#### Using the `train` set to train your supervised system

``` sh
@@ -34,7 +40,7 @@ $ pip install --force-reinstall git+https://gitlab.fi.muni.cz/xstefan3/arqmath-e
$ python
>>> from arqmath_eval import get_topics, get_judged_documents, get_ndcg
>>>
>>> task = 'task1-votes'
>>> task = 'task1'
>>> subset = 'train'
>>> results = {}
>>> for topic in get_topics(task=task, subset=subset):
@@ -62,7 +68,7 @@ $ pip install --force-reinstall git+https://gitlab.fi.muni.cz/xstefan3/arqmath-e
$ python
>>> from arqmath_eval import get_topics, get_judged_documents
>>>
>>> task = 'task1-votes'
>>> task = 'task1'
>>> subset = 'validation'
>>> results = {}
>>> for topic in get_topics(task=task, subset=subset):
@@ -90,6 +96,7 @@ $ git push # publish your new result and the upd
```

 [arqmath-task1]:              https://www.cs.rit.edu/~dprl/ARQMath/Task1-answers.html (Task 1: Find Answers)
 [arqmath-task2]:              https://www.cs.rit.edu/~dprl/ARQMath/task2-formulas.html (Task 2: Formula Search)
 [get_judged_documents]:       https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L61
 [get_ndcg]:                   https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L94
 [get_random_ndcg]:            https://gitlab.fi.muni.cz/xstefan3/arqmath-eval/-/blob/master/scripts/common.py#L129
+11 −3
@@ -19,7 +19,9 @@ underscores (`_`) replaced with a comma and a space for improved readability.
'''.strip()
RELEVANCE_JUDGEMENTS = {
    'train': {
        'task1': 'qrel.V1.0-train.tsv',
        'task1': 'qrel_task1-train.tsv',
        'task2': 'qrel_task2-train.tsv',
        'task1-example': 'qrel.V1.0-train.tsv',
        'task1-votes': 'votes-qrels-train.V1.0.tsv',
        'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-train.dat',
        'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-train.dat',
@@ -29,20 +31,26 @@ RELEVANCE_JUDGEMENTS = {
        'task1-votes': 'votes-qrels-small-validation.V1.0.tsv',
    },
    'validation': {
        'task1': 'qrel.V1.0-validation.tsv',
        'task1': 'qrel_task1-validation.tsv',
        'task2': 'qrel_task2-validation.tsv',
        'task1-example': 'qrel.V1.0-validation.tsv',
        'task1-votes': 'votes-qrels-validation.V1.0.tsv',
        'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-validation.dat',
        'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-validation.dat',
        'ntcir-12-mathir-math-wiki-formula': 'NTCIR12_MathWikiFrm-qrels_agg-validation.dat',
    },
    'test': {
        'task1': 'qrel.V1.0-test.tsv',
        'task1': 'qrel_task1-test.tsv',
        'task2': 'qrel_task2-test.tsv',
        'task1-example': 'qrel.V1.0-test.tsv',
        'task1-votes': 'votes-qrels-test.V1.0.tsv',
        'ntcir-11-math-2-main': 'NTCIR11_Math-qrels-test.dat',
        'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg-test.dat',
        'ntcir-12-mathir-math-wiki-formula': 'NTCIR12_MathWikiFrm-qrels_agg-test.dat',
    },
    'all': {
        'task1': 'qrel_task1.tsv',
        'task2': 'qrel_task2.tsv',
        'task1-votes.V1.2': 'votes-qrels.V1.2.tsv',
        'task2-topics-formula_ids.V.1.1': 'topics-formula_ids-qrels.V1.1.tsv',
    }
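
The nested mapping above resolves a subset and a task to its qrels file. A small sketch of that lookup against an abridged copy of the table (filenames as in this commit; `qrels_filename` is an illustrative helper, not the package's API):

```python
# Abridged copy of the RELEVANCE_JUDGEMENTS table from this commit,
# restricted to the newly added task1/task2 entries.
RELEVANCE_JUDGEMENTS = {
    'train': {'task1': 'qrel_task1-train.tsv', 'task2': 'qrel_task2-train.tsv'},
    'validation': {'task1': 'qrel_task1-validation.tsv', 'task2': 'qrel_task2-validation.tsv'},
    'test': {'task1': 'qrel_task1-test.tsv', 'task2': 'qrel_task2-test.tsv'},
    'all': {'task1': 'qrel_task1.tsv', 'task2': 'qrel_task2.tsv'},
}

def qrels_filename(task, subset):
    """Return the qrels filename for a task/subset, or raise a clear error."""
    try:
        return RELEVANCE_JUDGEMENTS[subset][task]
    except KeyError:
        raise ValueError(f'no relevance judgements for task={task!r}, subset={subset!r}')
```
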
+2 −0
@@ -22,6 +22,8 @@ def evaluate_worker(result_filename):

if __name__ == '__main__':
    for task in TASKS:
        if not os.path.exists(task):
            continue
        random_ndcg = get_random_ndcg(task, 'validation')
        users = glob(os.path.join(task, '*', ''))
        task_results = [(random_ndcg, 'random', 'xrando42')]
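
The guard added here simply skips tasks whose result directory does not exist yet. The same pattern in isolation; `TASKS` and the directory layout are stand-ins for the script's own:

```python
import os
from glob import glob

TASKS = ['task1', 'task2']  # stand-in for the script's TASKS constant

def tasks_with_results(tasks, root='.'):
    """Yield (task, user directories) for tasks whose directory exists under root."""
    for task in tasks:
        path = os.path.join(root, task)
        if not os.path.exists(path):
            continue  # same guard the commit adds: skip absent task directories
        # A trailing separator in the pattern matches only directories.
        yield task, glob(os.path.join(path, '*', ''))
```
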
+4567 −0

File added.

