Michal Štefánik / ARQMath-eval
Commit 1610ef0e, authored May 31, 2020 by stefanik12
Merge branch 'master' of https://gitlab.fi.muni.cz/xstefan3/arqmath-eval
Parents: 9276cc9f, 6ae73714
Pipeline #61644 failed with stage
Changes: 3, Pipelines: 1
task1-votes/ayetiran/LEGEND.md
0 → 100644
The [Formula2Vec system][scm-at-arqmath] recognizes the following parameters:

- Dataset:
    - arxmliv, 08, 2019, no-problem – the no\_problem subset (150,701 documents) of [the arXMLiv 08.2019 dataset][arxmliv-08-2019]
- phrases – how many times [collocation detection][] and bigram merging are iteratively applied to the corpus:
    - 0 – the text and math tokens in the corpus are unchanged,
    - N – [collocation detection][] and bigram merging are iteratively applied to both text and math tokens in the corpus N times
- Math representation:
    - opt – paths in operator tree
    - slt – paths in syntax layout tree
    - infix – nodes in operator tree in infix notation
    - prefix – nodes in operator tree in prefix notation
    - latex – untokenized LaTeX formulae
    - nomath – no math formulae
- Doc2Vec:
    - alpha – initial learning rate
    - min-alpha – minimum learning rate
    - dm – whether the distributed memory architecture is used instead of the distributed bag of words
    - dm-concat – whether the concatenation of context vectors is used instead of sum/average
    - hs – whether hierarchical softmax is used instead of softmax
    - min-count – the minimum term frequency
    - vector-size – vector dimensions
    - window – window size
    - workers – the number of threads used for [hogwild][]
    - epochs – the number of epochs

 [arxmliv-08-2019]: https://sigmathling.kwarc.info/resources/arxmliv-dataset-082019/
 [collocation detection]: https://radimrehurek.com/gensim/models/phrases.html
 [hogwild]: https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent
 [scm-at-arqmath]: https://gitlab.fi.muni.cz/xnovot32/scm-at-arqmath (Soft Cosine Measure at ARQMath)
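
The Doc2Vec keys in the legend are hyphenated (`min-alpha`, `vector-size`), while Python keyword arguments use underscores. The following is a minimal sketch of that renaming, assuming the values are eventually forwarded to a Doc2Vec implementation such as gensim's, which accepts `alpha`, `min_alpha`, `dm`, `dm_concat`, `hs`, `min_count`, `vector_size`, `window`, `workers`, and `epochs`. The helper name `to_doc2vec_kwargs` and the example dictionary are hypothetical and are not taken from the Formula2Vec code.

```python
# Hypothetical helper: the name and the forwarding idea are assumptions,
# not part of the Formula2Vec system described in the legend.
DOC2VEC_KEYS = (
    "alpha", "min-alpha", "dm", "dm-concat", "hs",
    "min-count", "vector-size", "window", "workers", "epochs",
)


def to_doc2vec_kwargs(params):
    """Keep only the Doc2Vec keys from the legend and rename
    hyphenated keys (e.g. 'min-alpha') to underscore form."""
    return {
        key.replace("-", "_"): value
        for key, value in params.items()
        if key in DOC2VEC_KEYS
    }


# Parameter values copied from the result name listed in this commit.
params = {
    "phrases": 2, "alpha": 0.05, "dm": 1, "dm-concat": 1, "epochs": 5,
    "hs": 1, "min-alpha": 0, "min-count": 5, "vector-size": 400,
    "window": 4, "workers": 64,
}
kwargs = to_doc2vec_kwargs(params)
```

The `phrases` key is dropped on purpose: according to the legend it controls corpus preprocessing, not the Doc2Vec model itself.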
task1-votes/ayetiran/README.md
@@ -4,4 +4,38 @@ underscores (`_`) replaced with a comma and a space for improved readability.
| nDCG | Result name |
|------|:------------|
| 0.7580 | prefix, phrases=2, alpha=0.05, dm=1, dm-concat=1, epochs=5, hs=1, min-alpha=0, min-count=5, vector-size=400, window=4, workers=64 |
| *0.7578* | *random* |

## Legend
The [Formula2Vec system][scm-at-arqmath] recognizes the following parameters:

- Dataset:
    - arxmliv, 08, 2019, no-problem – the no\_problem subset (150,701 documents) of [the arXMLiv 08.2019 dataset][arxmliv-08-2019]
- phrases – how many times [collocation detection][] and bigram merging are iteratively applied to the corpus:
    - 0 – the text and math tokens in the corpus are unchanged,
    - N – [collocation detection][] and bigram merging are iteratively applied to both text and math tokens in the corpus N times
- Math representation:
    - opt – paths in operator tree
    - slt – paths in syntax layout tree
    - infix – nodes in operator tree in infix notation
    - prefix – nodes in operator tree in prefix notation
    - latex – untokenized LaTeX formulae
    - nomath – no math formulae
- Doc2Vec:
    - alpha – initial learning rate
    - min-alpha – minimum learning rate
    - dm – whether the distributed memory architecture is used instead of the distributed bag of words
    - dm-concat – whether the concatenation of context vectors is used instead of sum/average
    - hs – whether hierarchical softmax is used instead of softmax
    - min-count – the minimum term frequency
    - vector-size – vector dimensions
    - window – window size
    - workers – the number of threads used for [hogwild][]
    - epochs – the number of epochs

 [arxmliv-08-2019]: https://sigmathling.kwarc.info/resources/arxmliv-dataset-082019/
 [collocation detection]: https://radimrehurek.com/gensim/models/phrases.html
 [hogwild]: https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent
 [scm-at-arqmath]: https://gitlab.fi.muni.cz/xnovot32/scm-at-arqmath (Soft Cosine Measure at ARQMath)
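
Result names in the README table are derived from the `.tsv` file names by replacing underscores with a comma and a space. As a rough illustration, the sketch below rebuilds a result name from the file stem added in this commit and splits it into the math representation and the `key=value` parameters described in the legend. The function name `parse_result_name` and the int/float conversion rule are assumptions made for this example only.

```python
def parse_result_name(name):
    """Split a result name such as 'prefix, phrases=2, alpha=0.05'
    into bare labels (e.g. the math representation) and parameters."""
    parts = [part.strip() for part in name.split(",")]
    labels = [part for part in parts if "=" not in part]
    params = {}
    for part in parts:
        if "=" in part:
            key, value = part.split("=", 1)
            # Assumption: values with a decimal point are floats,
            # all other values are integers.
            params[key] = float(value) if "." in value else int(value)
    return labels, params


# File stem of the .tsv file added in this commit (extension removed).
file_stem = (
    "prefix_phrases=2_alpha=0.05_dm=1_dm-concat=1_epochs=5_hs=1_"
    "min-alpha=0_min-count=5_vector-size=400_window=4_workers=64"
)

# Underscores become ", " as described in the README.
result_name = file_stem.replace("_", ", ")
labels, params = parse_result_name(result_name)
```

Here `labels` holds the math representation (`prefix`) and `params` maps each remaining key from the legend to its numeric value.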
task1-votes/ayetiran/prefix_phrases=2_alpha=0.05_dm=1_dm-concat=1_epochs=5_hs=1_min-alpha=0_min-count=5_vector-size=400_window=4_workers=64.tsv
0 → 100644
This diff is collapsed.