Michal Štefánik / ARQMath-eval
Commit 08cbfc81, authored May 15, 2020 by stefanik12
xstefan3 readme merge
Parents: 9240fcd6, ae012ba8
Pipeline #60909 failed
Changes: 9
task1-votes/xnovot32/LEGEND.md
-The system recognizes the following parameters:
+The [SCM system][scm-at-arqmath] recognizes the following parameters:

 - Dataset:
   - arxmliv, 08, 2019, no-problem – the no\_problem subset (150,701 documents) of [the arXMLiv 08.2019 dataset][arxmliv-08-2019]
-  - phrases – whether phrases are modeled
+  - phrases – how many times [collocation detection][] and bigram merging are iteratively applied to the corpus:
+    - 0 – the text and math tokens in the corpus are unchanged,
+    - N – [collocation detection][] and bigram merging are iteratively applied to both text and math tokens in the corpus N times
 - Math representation:
   - opt – paths in operator tree
   - slt – paths in syntax layout tree
...
...
@@ -16,13 +18,13 @@ The system recognizes the following parameters:
   - iter – the number of epochs
   - min-alpha – minimum learning rate
   - min-n, max-n – the range of modeled subword sizes
-  - min-count – minimum term frequency
+  - min-count – the minimum term frequency
   - negative – the number of negative samples
   - sample – sampling threshold
   - sg – the skipgram model
   - size – vector dimensions
   - window – window size
-  - workers – the number of threads used in HogWild
+  - workers – the number of threads used for [hogwild][]
 - Soft Cosine Measure:
   - dominant – whether the term similarity matrix will be strongly diagonally dominant
   - nonzero-limit – the maximum number of non-zero elements outside the diagonal in a single column of the term similarity matrix
...
...
@@ -31,4 +33,7 @@ The system recognizes the following parameters:
   - threshold – parameter *t* in the [term similarity matrix formula][]

 [arxmliv-08-2019]: https://sigmathling.kwarc.info/resources/arxmliv-dataset-082019/
+[collocation detection]: https://radimrehurek.com/gensim/models/phrases.html
+[hogwild]: https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent
+[scm-at-arqmath]: https://gitlab.fi.muni.cz/xnovot32/scm-at-arqmath (Soft Cosine Measure at ARQMath)
 [term similarity matrix formula]: https://arxiv.org/pdf/2003.05019.pdf#page=4
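
The new `phrases` entry describes collocation detection and bigram merging applied to the corpus N times. The sketch below illustrates only that iteration in plain Python. It does not use gensim, and the merge rule (join any adjacent pair seen at least `min_count` times), the helper names, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def merge_bigrams_once(sentences, min_count=2, delimiter="_"):
    """One pass: join adjacent token pairs seen at least `min_count` times.

    Assumption: a plain frequency cut-off stands in for a real
    collocation score; gensim's Phrases uses its own scoring.
    """
    pair_counts = Counter(
        pair for sentence in sentences for pair in zip(sentence, sentence[1:])
    )
    merged = []
    for sentence in sentences:
        out, i = [], 0
        while i < len(sentence):
            pair = tuple(sentence[i:i + 2])
            if len(pair) == 2 and pair_counts[pair] >= min_count:
                out.append(delimiter.join(pair))  # merge the bigram
                i += 2
            else:
                out.append(sentence[i])  # keep the token unchanged
                i += 1
        merged.append(out)
    return merged


def apply_phrases(sentences, n):
    """Apply detection and merging n times; n = 0 leaves the corpus as is."""
    for _ in range(n):
        sentences = merge_bigrams_once(sentences)
    return sentences


# Hypothetical toy corpus, not taken from arXMLiv.
corpus = [
    ["soft", "cosine", "measure", "works"],
    ["the", "soft", "cosine", "measure"],
]
for n in range(3):
    print(n, apply_phrases(corpus, n))
```

With this toy corpus, one pass can only produce bigrams, while a second pass can join an already merged bigram with a neighbouring token, which is why the number of iterations matters.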
task1-votes/xnovot32/README.md
...
...
@@ -4,21 +4,24 @@ underscores (`_`) replaced with a comma and a space for improved readability.
 | nDCG | Result name |
 |------|:------------|
-| 0.7613 | arxmliv, infix, 08, 2019, no-problem, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
-| 0.7612 | arxmliv, prefix, 08, 2019, no-problem, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
-| 0.7607 | arxmliv, slt, 08, 2019, no-problem, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
-| 0.7606 | arxmliv, opt, 08, 2019, no-problem, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
-| 0.7602 | arxmliv, latex, 08, 2019, no-problem, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
-| 0.7600 | arxmliv, nomath, 08, 2019, no-problem, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
+| 0.7614 | infix, phrases=1, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
+| 0.7613 | infix, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
+| 0.7612 | prefix, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
+| 0.7607 | slt, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
+| 0.7606 | opt, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
+| 0.7602 | latex, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
+| 0.7600 | nomath, phrases=0, alpha=0.05, bucket=2000000, iter=5, max-n=6, min-alpha=0, min-count=5, min-n=3, negative=5, sample=0.0001, sg=1, size=300, window=5, workers=64, dominant=True, nonzero-limit=100, symmetric=True, exponent=4.0, threshold=-1.0 |
 | *0.7578* | *random* |
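
Each result name in the table is a comma-separated list of bare labels followed by `key=value` parameters. Assuming that format holds for every row, a name can be split as in the sketch below. `parse_result_name` is a hypothetical helper, not part of the repository, and the example name is a shortened version of a table row.

```python
def parse_result_name(name):
    """Split a result name into bare labels and key=value parameters.

    Assumption: fields are separated by commas and parameters always
    contain an equals sign, as in the table rows.
    """
    fields = [field.strip() for field in name.split(",") if field.strip()]
    labels = [field for field in fields if "=" not in field]
    params = dict(field.split("=", 1) for field in fields if "=" in field)
    return labels, params


# Shortened example; a full row carries many more parameters.
labels, params = parse_result_name(
    "infix, phrases=1, alpha=0.05, nonzero-limit=100, threshold=-1.0"
)
print(labels)  # bare labels such as the math representation
print(params)  # parameter values are kept as strings
```

Values stay as strings so that entries such as `True` or `-1.0` are not converted silently; a caller can cast them where needed.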
 ## Legend

-The system recognizes the following parameters:
+The [SCM system][scm-at-arqmath] recognizes the following parameters:

 - Dataset:
   - arxmliv, 08, 2019, no-problem – the no\_problem subset (150,701 documents) of [the arXMLiv 08.2019 dataset][arxmliv-08-2019]
-  - phrases – whether phrases are modeled
+  - phrases – how many times [collocation detection][] and bigram merging are iteratively applied to the corpus:
+    - 0 – the text and math tokens in the corpus are unchanged,
+    - N – [collocation detection][] and bigram merging are iteratively applied to both text and math tokens in the corpus N times
 - Math representation:
   - opt – paths in operator tree
   - slt – paths in syntax layout tree
...
...
@@ -32,13 +35,13 @@ The system recognizes the following parameters:
   - iter – the number of epochs
   - min-alpha – minimum learning rate
   - min-n, max-n – the range of modeled subword sizes
-  - min-count – minimum term frequency
+  - min-count – the minimum term frequency
   - negative – the number of negative samples
   - sample – sampling threshold
   - sg – the skipgram model
   - size – vector dimensions
   - window – window size
-  - workers – the number of threads used in HogWild
+  - workers – the number of threads used for [hogwild][]
 - Soft Cosine Measure:
   - dominant – whether the term similarity matrix will be strongly diagonally dominant
   - nonzero-limit – the maximum number of non-zero elements outside the diagonal in a single column of the term similarity matrix
...
...
@@ -47,4 +50,7 @@ The system recognizes the following parameters:
   - threshold – parameter *t* in the [term similarity matrix formula][]

 [arxmliv-08-2019]: https://sigmathling.kwarc.info/resources/arxmliv-dataset-082019/
+[collocation detection]: https://radimrehurek.com/gensim/models/phrases.html
+[hogwild]: https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent
+[scm-at-arqmath]: https://gitlab.fi.muni.cz/xnovot32/scm-at-arqmath (Soft Cosine Measure at ARQMath)
 [term similarity matrix formula]: https://arxiv.org/pdf/2003.05019.pdf#page=4
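
The `threshold` entry refers to parameter *t* of the term similarity matrix formula, and the result names also carry an `exponent` value. The sketch below assumes the formula has the form s_ij = max(t, cos(v_i, v_j))^o, which is common in the Soft Cosine Measure literature. The linked paper remains the authority on the exact definition, and the function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def term_similarity(u, v, threshold=-1.0, exponent=4.0):
    """Assumed form: clip the cosine from below at `threshold`,
    then raise the result to `exponent`."""
    return max(threshold, cosine(u, v)) ** exponent


# Hypothetical two-dimensional term vectors.
print(term_similarity([1.0, 0.0], [1.0, 1.0]))                 # no clipping
print(term_similarity([1.0, 0.0], [1.0, 1.0], threshold=0.8))  # cosine clipped up to 0.8
```

Under this assumed form, `threshold=-1.0` as used in the table clips nothing, because a cosine is never below -1.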
task1-votes/xnovot32/arxmliv_infix_08_2019_no-problem_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
→ task1-votes/xnovot32/infix_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
File moved
task1-votes/xnovot32/infix_phrases=1_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
0 → 100644
task1-votes/xnovot32/arxmliv_latex_08_2019_no-problem_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
→ task1-votes/xnovot32/latex_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
File moved
task1-votes/xnovot32/arxmliv_nomath_08_2019_no-problem_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
→ task1-votes/xnovot32/nomath_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
File moved
task1-votes/xnovot32/arxmliv_opt_08_2019_no-problem_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
→ task1-votes/xnovot32/opt_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
File moved
task1-votes/xnovot32/arxmliv_prefix_08_2019_no-problem_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
→ task1-votes/xnovot32/prefix_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
File moved
task1-votes/xnovot32/arxmliv_slt_08_2019_no-problem_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
→ task1-votes/xnovot32/slt_phrases=0_alpha=0.05_bucket=2000000_iter=5_max-n=6_min-alpha=0_min-count=5_min-n=3_negative=5_sample=0.0001_sg=1_size=300_window=5_workers=64_dominant=True_nonzero-limit=100_symmetric=True_exponent=4.0_threshold=-1.0.tsv
File moved
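
The file moves in this commit follow one pattern: the `arxmliv_` prefix and the `_08_2019_no-problem` segment are dropped from each file name. A sketch of that mapping follows, using a hypothetical helper name and a shortened example file name rather than a full one from the commit.

```python
def shorten_result_filename(old_name):
    """Drop the dataset prefix and the dataset-version segment.

    Assumption: both parts appear at most once, as in the moved files.
    """
    prefix = "arxmliv_"
    segment = "_08_2019_no-problem"
    name = old_name
    if name.startswith(prefix):
        name = name[len(prefix):]
    return name.replace(segment, "", 1)


# Shortened example; the real names carry the full parameter list.
old = "arxmliv_slt_08_2019_no-problem_phrases=0_alpha=0.05.tsv"
print(shorten_result_filename(old))
```

Names that already use the short form pass through unchanged, so the helper can be applied to a directory more than once without further effect.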