Commit 09948c29 authored by Vít Novotný's avatar Vít Novotný

Add support for NTCIR-11 Math-2 Main and NTCIR-12 MathIR ArXiv Main

parent 4280a34e
+5 −1
@@ -2,7 +2,9 @@
This repository evaluates the performance of your information retrieval system
on a number of *tasks*:

-- task1/ – [ARQMath Task1: Find Answers][arqmath-task1]
+- task1/ – [ARQMath Task1][arqmath-task1] validation dataset,
+- ntcir-11-math-2-main/ – [NTCIR-11 Math-2 Task Main Subtask][ntcir-11-math-2], and
+- ntcir-12-mathir-arxiv-main/ – [NTCIR-12 MathIR Task ArXiv Main Subtask][ntcir-12-mathir].

Place your results in [the trec\_eval format][treceval-format] into your
dedicated directory *task/user*. To evaluate and publish your results,
@@ -18,3 +20,5 @@ $ git push # publish your new result and the updated lea

 [arqmath-task1]:   https://www.cs.rit.edu/~dprl/ARQMath/Task1-answers.html (Task 1: Find Answers)
 [treceval-format]: https://stackoverflow.com/a/8175382/657401 (How to evaluate a search/retrieval engine using trec_eval?)
+[ntcir-11-math-2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.686.444&rep=rep1&type=pdf (NTCIR-11 Math-2 Task Overview)
+[ntcir-12-mathir]: https://www.cs.rit.edu/~rlaz/files/ntcir12-mathir.pdf (NTCIR-12 MathIR Task Overview)
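For reference, the trec\_eval run format that the README links to is a six-column whitespace-separated file: query id, the literal `Q0`, document id, rank, score, and a run tag. A minimal sketch that builds one such line; all concrete values are hypothetical examples:

```python
# One line of a trec_eval-style run file: query id, the literal 'Q0',
# document id, rank, similarity score, and a run tag.  The identifiers
# below are hypothetical, not taken from any real task.
fields = ('A.1', 'Q0', 'doc-123', 1, 12.34, 'my-run')
line = '{} {} {} {} {} {}'.format(*fields)
print(line)  # A.1 Q0 doc-123 1 12.34 my-run
```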
+5 −4
@@ -8,9 +8,10 @@ import numpy as np
from pytrec_eval import RelevanceEvaluator, parse_qrel, parse_run


-TASKS = ['task1']
 RELEVANCE_JUDGEMENTS = {
     'task1': 'qrel.V0.1.tsv',
+    'ntcir-11-math-2-main': 'NTCIR11_Math-qrels.dat',
+    'ntcir-12-mathir-arxiv-main': 'NTCIR12_Math-qrels_agg.dat',
 }
TASK_README_HEAD = r'''
This table contains the best result for every user.
@@ -29,8 +30,8 @@ underscores (`_`) replaced with a comma and a space for improved readability.


if __name__ == '__main__':
-    for task in TASKS:
-        with open(os.path.join(task, RELEVANCE_JUDGEMENTS[task]), 'rt') as f:
+    for task, relevance_judgements in RELEVANCE_JUDGEMENTS.items():
+        with open(os.path.join(task, relevance_judgements), 'rt') as f:
            parsed_relevance_judgements = parse_qrel(f)
        evaluator = RelevanceEvaluator(parsed_relevance_judgements, {'ndcg'})
        task_results = []
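The evaluator above scores each run with nDCG. As a rough illustration of what that measure does, here is a minimal pure-Python sketch; it uses the common textbook log-discount form, which may differ in detail from pytrec_eval's implementation:

```python
import math

def ndcg(ranked_rels, judged_rels):
    """nDCG for one query: DCG of the system's ranking divided by the
    DCG of the ideal (descending-relevance) ordering of the judged set."""
    def dcg(rels):
        # Discounted cumulative gain with a log2(rank + 1) discount.
        return sum(rel / math.log2(rank + 1)
                   for rank, rel in enumerate(rels, start=1))
    ideal = dcg(sorted(judged_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

# Relevance grades of the documents a hypothetical system returned, in
# rank order, versus all judged grades for the query:
print(round(ndcg([2, 0, 1], [2, 1, 0]), 4))  # 0.9502
```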
+2500 −0

File added.


+8 −0
This table contains the best result for every user.

| nDCG | User | Result name |
|:-----|------|:------------|
| 0.5499 | xstefan3 | example, key1=value1, key2=value2, etc |
| 0.5499 | xnovot32 | example, key1=value1, key2=value2, etc |
| 0.5499 | xluptak4 | example, key1=value1, key2=value2, etc |
| 0.5499 | ayetiran | example, key1=value1, key2=value2, etc |
+7 −0
This table contains all results for user *ayetiran* in descending order of task
performance.  Result names are based on the filenames of the results with
underscores (`_`) replaced with a comma and a space for improved readability.

| nDCG | Result name |
|------|:------------|
| 0.5499 | example, key1=value1, key2=value2, etc |
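The underscore-to-comma naming convention described above can be sketched as follows; the filename and the extension-stripping step are assumptions for illustration, not taken from the repository's script:

```python
# Hypothetical result filename; the README only states that underscores
# become a comma and a space, so stripping the extension is an assumption.
filename = 'example_key1=value1_key2=value2_etc.tsv'
result_name = filename.rsplit('.', 1)[0].replace('_', ', ')
print(result_name)  # example, key1=value1, key2=value2, etc
```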