Experimental evaluation of DiVinE Tools
The overall goal of DiVinE is to efficiently utilize available
computational resoururces to complete its verification task or complete it
sooner. The efficiency of selected DiVinE Tools is reported in the following
several case studies.
DiVinE Cluster on DAS-3
In this experiment we evaluated performance of algorithms MAP and OWCTY
as implemented in DiVinE Cluster. Our primary aim was to show that the
tool can scale up to 256 CPU cores. Our performance evaluation was done on
the Distributed ASCI Supercomputer (DAS-3), a wide-area distributed system
for Computer Science research in the Netherlands. The casestudy was done
by Kees Verstoep from VU University, Amsterdam.
Distributed ASCI Supercomputer (DAS-3)
DAS-3 consists of five clusters distributed over four sites. It uses
Myri-10G networking technology from Myricom both as an internal high-speed
interconnect as well as an interface to remote DAS-3 clusters. DAS-3 is
largely homogeneous: every cluster uses dual-CPU AMD Opteron nodes, but
with different clock speeds and/or number of CPU cores. For the purpose
of this experiment we employ DAS-3 cluster at VU University, which has the
largest number of compute nodes and cores (85 nodes with a dual-cpu,
dual-core 2.4 GHz AMD Opterons).
Intitial experiments on DAS-3 were performed with DiVinE Cluster
version 0.7.2 that contained several performance flaws due to which an
efficient usage of the tool was limited to clusters with small number of
nodes (less than 20). However, we succeed to optimize the tool to
efficently employ much larger number of cores that were available on DAS-3
system. The optimized implementation reffered below corresponds to the
DiVinE Cluster version 0.8 (will be released shortly).
Experiments on Scalability and Efficiency
First, we show graphs depicting the scalability of unoptimized and
optimized implementations. For the purpose of this experiment, we run
DiVinE Cluster on anderson.8.prop.dve model from BEEM database. Each line in the
graphs below shows scalability for the given number of nodes when using 1
core per node, 2 cores per node, and 4 cores per node.
MAP Scalability |
|
OWCTY Scalability |
|
|
|
To get an estimate on the efficiency of MAP and OWCTY, we ran
single-core versions on a special DAS-3/VU node equipped with 16GB memory
(the regular compute nodes have 4GB, which is insufficient to store the
state space there). The efficiencies of optimized implementations are
quite good even for 256 cores. In particular, MAP on a single core took
1052.5 sec; the runtimes on 64, 128, and 256 cores are 23.0, 13.1, and 8.9
sec, giving efficiencies of 72%, 63% and 46%, respectively. OWCTY on a
single core took 631.7 sec; the runtime on 64, 128, and 256 cores are
13.3, 7.4 and 5.0 sec, giving efficiencies of 74%, 67% and 49%,
respectively. The results are shown in the two following tables.
MAP Efficiency |
|
OWCTY Efficiency |
Cores | Runtime (sec) | Efficiency |
1 | 1052.5 | 100% |
64 | 23.0 | 72% |
128 | 13.1 | 63% |
256 | 8.9 | 46% |
|
|
Cores | Runtime (sec) | Efficiency |
1 | 631.7 | 100% |
64 | 13.3 | 74% |
128 | 7.4 | 67% |
256 | 5.0 | 49% |
|
Measuring Network Throughput
The following graphs illustrate the effect of the improvements on the
applications' throughput as a function of their runtime. In this case a
larger instance of the Anderson problem was used to obtain a higher
resolution graph.
MAP Throughput |
|
OWCTY Throughput |
|
|
|
DiVinE Multi-Core Performance Evaluation
The following tables reports on selected runtimes for Reachability and
LTL model checking verification tasks performed using DiVinE Multi-Core
version 1.1. You may also be interested in the table with all
experiments.
Reported runtimes are in seconds (rounded). All models come from the
BEEM database. The
meassurements were performed for two different architectures:
manwe: | 8x Dual Core AMD Opteron(tm) Processor 885
|
pheme: | 2x Intel Xeon CPU 5130 @ 2.00GHz
|
The value X denotes that the
corresponding experiment didn't complete (memory overflow).
The value
-- denotes that the corresponding
experiment was not performed.
The !
next to time means that a counterexample has been found before exploring
full state space of the model.
Reachability (generation of all reachable states)
Model | | divine @
manwe_64bit
1 core | 2 cores | 4 cores | 8 cores | 16
cores |
| | divine @
pheme_64bit
1 core | 2 cores | 3 cores | 4 cores |
|
| | | | |
| | | | |
| | 2:49:29 X | 25:28 X | 16:28 X | 11:44 X | 13:18 X |
| | 34:52 X | 5:14 X | 3:17 X | 2:48 X |
|
| | | | |
| | | | |
| | | | |
| | 30:39 | 24:29 | 18:22 | 19:17 | 18:46 |
| | 24:17 X | 18:40 X | 14:20 X | 14:08 X |
|
| | | | |
| | 2:15:04 X | 1:52:44 X | 30:47 X | 12:53 X | 5:19 X |
| | 34:43 X | 20:05 X | 7:21 X | 6:51 X |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | 2:12:03 X | 32:43 X | 15:50 X | 0 X | 5:11 X |
| | 24:23 X | 8:05 X | 4:59 X | 4:06 X |
|
| | | | |
| | 1:04:03 | 30:03 | 13:49 | 7:20 | 3:33 |
| | |
| | 15:06:33 X | 3:30:19 X | 1:12:51 X | 31:56 X | 13:04 X |
| | 2:22:41 X | 34:44 X | 19:48 X | 14:44 X |
|
| | | | |
| | | | |
| | 13:55 | 12:22 | 12:24 | 12:24 | 13:42 |
| | 10:09 X | 10:16 X | 10:09 X | 10:15 X |
|
| | | | |
| | 1:53:48 | 43:39 X | 32:26 | 17:33 | 9:20 |
| | 40:07 X | 6:18 X | 3:41 X | 3:34 X |
|
| | | | |
LTL Model checking
Model | | divine @ manwe_64bit
1 core | 2 cores | 4
cores | 8 cores | 16 cores |
| | divine @ pheme_64bit
1 core | 2 cores | 3
cores | 4 cores |
|
| | | | |
| | | | |
| | 49:21 | 38:16 | 29:56 | 30:45 | 32:04 |
| | 16:35 X | 10:44 X | 7:05 X | 7:15 X |
|
elevator2.3.prop4 | (valid) |
| | | | |
elevator2.3a.prop4 | (valid) |
| | | | |
| | | | |
| | 1:26:43 X | 20:32 X | 10:03 X | 6:04 X | 0 X |
| | 21:17 X | 5:13 X | 3:16 X | 2:40 X |
|
| | 1:32:48 X | 20:01 X | 9:46 X | 6:00 X | 3:09 X |
| | 22:49 X | 5:10 X | 3:07 X | 2:36 X |
|
leader_election.5.prop2 | (valid) |
| | | | |
leader_election.6.prop2 | (valid) |
| | | | |
leader_filters.6.prop2 | (valid) |
| | 1:06:16 | 37:05 | 21:42 | 15:54 | 11:06 |
| | 12:35 X | 5:03 X | 3:16 X | 2:33 X |
|
leader_filters.7.prop2 | (valid) |
| | | | |
| | | | |
| | 1:38:18 | 52:58 | 32:12 | 20:20 | 13:35 |
| | 26:06 X | 4:27 X | 3:09 X | 2:45 X |
|
| | 1:51:46 | 58:48 | 36:18 | 22:40 | 15:41 |
| | 26:22 X | 7:49 X | 5:55 X | 4:19 X |
|
public_subscribe.5.prop1 | (n/a) |
| | 1:20:41 X | 22:36 X | 10:17 X | 5:13 X | 2:41 X |
| | 20:30 X | 5:41 X | 3:29 X | 2:37 X |
|
| | | | |
| | | | |
szymanski.4.prop4 | (valid) |
| | | | |
anderson.3.prop2 | (invalid) |
| | | | |
| | | | |
| | | | |
| | | | |
elevator.5.prop2 | (invalid) |
| | 1:28:30 | 1:12:42 | 2:48
! | 1:08
! | 4
! |
| | 17:07 X | 7:29 X | 6:22 X | 2:23 ! |
|
iprotocol.4.prop4 | (invalid) |
| | | | |
lamport.6.prop2 |
(invalid) |
| | | | |
| | 1:57:49 | 24:56 ! | 39:44 | 26:21 | 19:18 |
| | 25:22 X | 7:05 X | 5:33 X | 4:20 X |
|
peterson.5.prop2 | (invalid) |
| | 1:43:02 | 14:20 ! | 36:16 | 23:49 | 17:14 |
| | 27:27 X | 4:23 X | 2:21 ! | 1:40 ! |
|
| | | | |
szymanski.5.prop2 | (invalid) |
| | 1:00:52 | 7:34 ! | 1:20 ! | 8 ! | 3 ! |
| | |
| | | | |