Commit 3001afd4 authored by Vít Novotný

Add prefix results for doc2vec

parent b01fbaf7
The [Formula2Vec system][scm-at-arqmath] recognizes the following parameters:
- Dataset:
    - arxmliv, 08, 2019, no-problem – the no\_problem subset (150,701 documents) of [the arXMLiv 08.2019 dataset][arxmliv-08-2019]
- phrases – how many times [collocation detection][] and bigram merging are iteratively applied to the corpus:
    - 0 – the text and math tokens in the corpus are unchanged,
    - N – [collocation detection][] and bigram merging are iteratively applied to both text and math tokens in the corpus N times
......
......@@ -4,7 +4,8 @@ underscores (`_`) replaced with a comma and a space for improved readability.
| nDCG | Result name |
|------|:------------|
| 0.7580 | prefix, phrases=2, alpha=0.05, dm=1, dm-concat=1, epochs=5, hs=1, min-alpha=0, min-count=5, vector-size=400, window=4, workers=64 |
| 0.7604 | prefix, phrases=2, alpha=0.1, dm=0, dm-concat=1, epochs=5, hs=0, min-alpha=0, min-count=5, negative=12, vector-size=300, window=8, workers=64 |
| 0.7579 | prefix, phrases=2, alpha=0.05, dm=1, dm-concat=1, epochs=5, hs=1, min-alpha=0, min-count=5, vector-size=400, window=4, workers=64 |
| *0.7578* | *random* |
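
Since the result names in the table are comma-separated lists of flags and `key=value` pairs, they can be split back into individual parameters. The following is a minimal sketch; the function name `parse_result_name` and the choice to treat a bare part such as `prefix` as a boolean flag are assumptions for illustration, not part of the system.

```python
def parse_result_name(name: str) -> dict:
    """Split a readable result name into flags and key=value parameters."""
    params = {}
    for part in name.split(", "):
        key, sep, value = part.partition("=")
        # Assumption: a part without "=" (e.g. "prefix") is a boolean flag.
        params[key] = value if sep else True
    return params


# Result name copied from the table above.
row = (
    "prefix, phrases=2, alpha=0.1, dm=0, dm-concat=1, epochs=5, hs=0, "
    "min-alpha=0, min-count=5, negative=12, vector-size=300, window=8, workers=64"
)
parsed = parse_result_name(row)
```

Values are kept as strings, so a caller that needs numbers has to convert them explicitly, for example `int(parsed["phrases"])`.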
## Legend
......@@ -12,7 +13,6 @@ underscores (`_`) replaced with a comma and a space for improved readability.
The [Formula2Vec system][scm-at-arqmath] recognizes the following parameters:
- Dataset:
    - arxmliv, 08, 2019, no-problem – the no\_problem subset (150,701 documents) of [the arXMLiv 08.2019 dataset][arxmliv-08-2019]
- phrases – how many times [collocation detection][] and bigram merging are iteratively applied to the corpus:
    - 0 – the text and math tokens in the corpus are unchanged,
    - N – [collocation detection][] and bigram merging are iteratively applied to both text and math tokens in the corpus N times
......
The [SCM system][scm-at-arqmath] recognizes the following parameters:
- Dataset:
    - arxmliv, 08, 2019, no-problem – the no\_problem subset (150,701 documents) of [the arXMLiv 08.2019 dataset][arxmliv-08-2019]
- phrases – how many times [collocation detection][] and bigram merging are iteratively applied to the corpus:
    - 0 – the text and math tokens in the corpus are unchanged,
    - N – [collocation detection][] and bigram merging are iteratively applied to both text and math tokens in the corpus N times
......
......@@ -41,7 +41,6 @@ underscores (`_`) replaced with a comma and a space for improved readability.
The [SCM system][scm-at-arqmath] recognizes the following parameters:
- Dataset:
    - arxmliv, 08, 2019, no-problem – the no\_problem subset (150,701 documents) of [the arXMLiv 08.2019 dataset][arxmliv-08-2019]
- phrases – how many times [collocation detection][] and bigram merging are iteratively applied to the corpus:
    - 0 – the text and math tokens in the corpus are unchanged,
    - N – [collocation detection][] and bigram merging are iteratively applied to both text and math tokens in the corpus N times
......
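
The iterative meaning of the phrases parameter can be illustrated with a small sketch. It swaps in a plain frequency threshold for the collocation score, so it is not the detection method the systems actually use; the names `merge_bigrams`, `apply_phrases`, and `min_count`, and the `_` joiner, are hypothetical.

```python
from collections import Counter


def merge_bigrams(sentences, min_count=2):
    """One pass: join adjacent token pairs seen at least min_count times."""
    counts = Counter(
        pair for sent in sentences for pair in zip(sent, sent[1:])
    )
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and counts[(sent[i], sent[i + 1])] >= min_count:
                # Merge the pair into a single token and skip both halves.
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def apply_phrases(sentences, phrases=0, min_count=2):
    """Apply the merging pass `phrases` times; 0 leaves the corpus unchanged."""
    for _ in range(phrases):
        sentences = merge_bigrams(sentences, min_count)
    return sentences


corpus = [["new", "york", "city"], ["new", "york", "city"], ["new", "car"]]
once = apply_phrases(corpus, phrases=1)
twice = apply_phrases(corpus, phrases=2)
```

With this toy corpus, one pass yields `new_york` followed by `city`, and a second pass merges those into `new_york_city`, which is why larger values of phrases can produce longer multi-token units.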