**MANIFEST.in** (+5 −2)

```diff
 include scripts/NTCIR11_Math-qrels-train.dat
+include scripts/NTCIR11_Math-qrels-validation.dat
 include scripts/NTCIR11_Math-qrels-test.dat
 include scripts/NTCIR12_Math-qrels_agg-train.dat
+include scripts/NTCIR12_Math-qrels_agg-validation.dat
 include scripts/NTCIR12_Math-qrels_agg-test.dat
 include scripts/NTCIR12_MathWikiFrm-qrels_agg-train.dat
+include scripts/NTCIR12_MathWikiFrm-qrels_agg-validation.dat
 include scripts/NTCIR12_MathWikiFrm-qrels_agg-test.dat
 include scripts/qrel.V1.0-train.tsv
+include scripts/qrel.V1.0-validation.tsv
 include scripts/qrel.V1.0-test.tsv
 include scripts/votes-qrels-train.V1.0.tsv
-include scripts/votes-qrels-train-train.V1.0.tsv
-include scripts/votes-qrels-train-validation.V1.0.tsv
+include scripts/votes-qrels-validation.V1.0.tsv
 include scripts/votes-qrels-test.V1.0.tsv
```
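Each qrels file packaged above holds trec\_eval-style relevance judgements: four columns per line with a topic ID, an iteration field, a document ID, and a relevance grade. A minimal sketch of loading one of the tab-separated `.tsv` files into the nested-dict shape that `pytrec_eval` works with; the `read_qrels` helper is illustrative only, not part of the package, and the exact column layout of the `.dat` files may differ:

```python
def read_qrels(path):
    """Parse a tab-separated qrels file into {topic: {document: relevance}}."""
    qrels = {}
    with open(path, 'rt') as f:
        for line in f:
            topic, _iteration, document, relevance = line.rstrip('\n').split('\t')
            # trec_eval expects integer relevance grades.
            qrels.setdefault(topic, {})[document] = int(relevance)
    return qrels

qrels = read_qrels('scripts/votes-qrels-validation.V1.0.tsv')
```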
**README.md** (+53 −29)

````diff
@@ -9,40 +9,45 @@ on a number of *tasks*:
 - `ntcir-12-mathir-arxiv-main` – [NTCIR-12 MathIR Task ArXiv Main
   Subtask][ntcir-12-mathir].
 - `ntcir-12-mathir-math-wiki-formula` – [NTCIR-12 MathIR Task
   MathWikiFormula Subtask][ntcir-12-mathir].
 
+The main tasks are:
+
+- `task1-votes` – Use this task to evaluate your ARQMath task 1 system.
+- `ntcir-12-mathir-math-wiki-formula` – Use this task to evaluate your
+  ARQMath task 2 system.
+
 #### Subsets
 
-Each task comes with a number of *subsets*:
+Each task comes with three *subsets*:
 
-- `train` – the training set, which you should use for parameter
-  optimization before publishing the results for the best parameters of
-  your system,
-- `test` – the test set, which you should use *only for your best system*
-  after parameter optimization on the training set,
-- `train-train` – a subset of the training set for the `task1-votes` task,
-  which you can use for training if you also require a validation subset
-  (e.g. for early stopping), and
-- `train-validation` – a subset of the training set for the `task1-votes`
-  task, which you can use for training if you also require a validation
-  subset (e.g. for early stopping).
+- `train` – The training set, which you can use for supervised training of
+  your system.
+- `validation` – The validation set, which you can use to compare the
+  performance of your system with different parameters. The validation set
+  is used to compute the leaderboards in this repository.
+- `test` – The test set, which you currently should not use at all. It will
+  be used at the end to compare the systems that performed best on the
+  validation set.
 
 ### Usage
 
-#### Evaluating your model with various parameters
+#### Using the `train` set to train your supervised system
 
-Place your results in [the trec\_eval format][treceval-format] into the
-`results.csv` file. To evaluate your results e.g. on the `train` subset of
-the `task1-votes` task, execute the following commands:
-
 ``` sh
 $ pip install git+https://gitlab.fi.muni.cz/xstefan3/arqmath-eval@master
 $ python
->>> from arqmath_eval import get_ndcg
->>> from pytrec_eval import parse_run
+>>> from arqmath_eval import get_topics, get_judged_documents, get_ndcg
 >>>
+>>> task = 'task1-votes'
+>>> subset = 'train'
+>>> results = {}
+>>> for topic in get_topics(task=task, subset=subset):
+...     results[topic] = {}
+...     for document in get_judged_documents(task=task, subset=subset, topic=topic):
+...         similarity_score = compute_similarity_score(topic, document)
+...         results[topic][document] = similarity_score
+>>>
->>> with open('results.csv', 'rt') as f:
->>>     results = parse_run(f)
 >>> get_ndcg(results, task='task1-votes', subset='train')
 0.5876
 ```
 
-Beside `get_ndcg`, the Python interface of the package also provides the
-following functions:
+Here is the documentation of the available evaluation functions:
 
 - [`get_topics(task, subset=None)`][get_topics],
 - [`get_judged_documents(task, subset=None, topic=None)`][get_judged_documents],
@@ -50,19 +55,38 @@ following functions:
 - [`get_ndcg(parsed_run, task, subset)`][get_ndcg], and
 - [`get_random_normalized_ndcg(parsed_run, task, subset)`][get_random_ndcg].
````
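The `compute_similarity_score(topic, document)` call in the example above (and in the leaderboard example below) is a placeholder for your own system: any function that maps a topic and a judged document to a real-valued score will do. A minimal sketch, assuming a toy Jaccard token-overlap similarity; the `get_text` helper is hypothetical and stands for however you resolve identifiers to raw text:

```python
def get_text(identifier):
    """Hypothetical helper: look up the raw text of a topic or document.

    A real system would read the ARQMath (or NTCIR) collection here.
    """
    raise NotImplementedError

def compute_similarity_score(topic, document):
    """Toy scoring function: Jaccard overlap of whitespace-separated tokens.

    Replace this with your actual retrieval model; only the signature matters.
    """
    topic_tokens = set(get_text(topic).lower().split())
    document_tokens = set(get_text(document).lower().split())
    if not topic_tokens or not document_tokens:
        return 0.0
    return len(topic_tokens & document_tokens) / len(topic_tokens | document_tokens)
```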
The README.md diff continues with the leaderboard section:

````diff
-#### Placing your results to the leaderboard
+#### Using the `validation` set to compare various parameters of your system
 
-Place your results in [the trec\_eval format][treceval-format] into your
-dedicated directory *task/user*, e.g. `task1-votes/xnovot32` for the user
-@xnovot32 and the `task1-votes` task. To evaluate your results on the
-`test` set of the `task1-votes` task and publish the results into the
-leaderboard, execute the following commands:
 
 ``` sh
-$ git add task1-votes/xnovot32/result.tsv  # track your new result with Git
-$ python -m scripts.evaluate  # run the evaluation
+$ pip install git+https://gitlab.fi.muni.cz/xstefan3/arqmath-eval@master
+$ python
+>>> from arqmath_eval import get_topics, get_judged_documents, get_ndcg
+>>>
+>>> task = 'task1-votes'
+>>> subset = 'validation'
+>>> results = {}
+>>> for topic in get_topics(task=task, subset=subset):
+...     results[topic] = {}
+...     for document in get_judged_documents(task=task, subset=subset, topic=topic):
+...         similarity_score = compute_similarity_score(topic, document)
+...         results[topic][document] = similarity_score
+>>>
+>>> user = 'xnovot32'
+>>> description = 'parameter1=value_parameter2=value'
+>>> filename = '{}/{}/{}.tsv'.format(task, user, description)
+>>> with open(filename, 'wt') as f:
+...     for topic, documents in results.items():
+...         top_documents = sorted(documents.items(), key=lambda x: x[1], reverse=True)[:1000]
+...         for rank, (document, similarity_score) in enumerate(top_documents):
+...             line = '{}\txxx\t{}\t{}\t{}\txxx'.format(topic, document, rank + 1, similarity_score)
+...             print(line, file=f)
+$ git add task1-votes/xnovot32/parameter1=value_parameter2=value.tsv  # track your new result with Git
+$ python -m arqmath_eval.evaluate  # run the evaluation
 $ git add -u  # add the updated leaderboard to Git
 $ git push  # publish your new result and the updated leaderboard
 ```
 
 [arqmath-task1]: https://www.cs.rit.edu/~dprl/ARQMath/Task1-answers.html (Task 1: Find Answers)
````
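The lines written in the example above follow the six-column trec\_eval run format (topic, iteration, document, rank, score, run tag), with `xxx` filling the unused iteration and run-tag fields. As a sanity check, a sketch of re-reading the file through `pytrec_eval.parse_run`, the same parser the earlier workflow used; the filename is the one produced in the example:

```python
from pytrec_eval import parse_run
from arqmath_eval import get_ndcg

# Re-read the run file written above and score it on the validation subset;
# the reported nDCG should agree with get_ndcg(results, ...) on the same run.
with open('task1-votes/xnovot32/parameter1=value_parameter2=value.tsv', 'rt') as f:
    parsed_run = parse_run(f)
print(get_ndcg(parsed_run, task='task1-votes', subset='validation'))
```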
**ntcir-11-math-2-main/README.md** (deleted, 100644 → 0, +0 −8)

```diff
-This table contains the best result for every user.
-
-| nDCG | User | Result name |
-|:-----|------|:------------|
-| 0.6413 | xstefan3 | example, key1=value1, key2=value2, etc |
-| 0.6413 | xnovot32 | example, key1=value1, key2=value2, etc |
-| 0.6413 | xluptak4 | example, key1=value1, key2=value2, etc |
-| 0.6413 | ayetiran | example, key1=value1, key2=value2, etc |
```
**ntcir-11-math-2-main/ayetiran/README.md** (+0 −1)

```diff
@@ -4,4 +4,3 @@ underscores (`_`) replaced with a comma and a space for improved readability.
 
 | nDCG | Result name |
 |------|:------------|
-| 0.6413 | example, key1=value1, key2=value2, etc |
```
**ntcir-11-math-2-main/ayetiran/example_key1=value1_key2=value2_etc.tsv** (deleted, 100644 → 0, +0 −10000)

File deleted (10,000-line preview omitted).