General Information about the LH2 Part of the BW-Grid System
Information about usage can be found on the
wiki pages of the LH2.
The LH2 part of the BW-Grid Cluster has the following properties:
- 70 compute nodes, each node with two sockets (560 cores in total)
- InfiniBand interconnect
- 2.8 GHz Xeon Harpertown processors with 4 cores each
- 16 GB memory per compute node, 1333 MHz FSB
- 96-port InfiniBand switch, non-blocking
- Frontend node with two Harpertown processors, 24 GB memory, 1333 MHz FSB
- All built into three racks with blade centres by IBM
- Global disk: 100 TB
- Operating system: Scientific Linux 5.0 on the Intel-based nodes
- Operating system: Fedora Core 8 on the Cell-based nodes
- Batch system: Torque/Maui
- Compilers: Intel, GCC, Java
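As a quick sanity check, the node and core counts in the list are consistent with each other; the figures below are taken directly from the list above:

```python
# All figures from the hardware list above:
nodes = 70                 # compute nodes
sockets_per_node = 2       # two Harpertown processors per node
cores_per_processor = 4    # quad-core Xeon

total_cores = nodes * sockets_per_node * cores_per_processor
print(total_cores)  # 560, matching the stated total core count
```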
For general information on the BW-Grid, see the HLRS wiki pages: https://wickie.hlrs.de/dgrid/index.php/Main_Page
Figure 1 shows a sketch of the BW-Grid at the HLRS (the system is
1.86 m high). The three rightmost racks belong to the LH2
(frontend with blade centres).
Figure 2 shows an overview sketch of the system.
Figure 3 shows the rear view of a single rack.
Figure 4 shows the front view of a blade centre. Each blade centre
holds 14 blades, each blade has two processors, and each processor has 4 cores.
Figure 5 shows the speedup of the cluster with increasing number of
cores and increasing problem size (MUFTE-UG 2p2cni CO2-Storage example).
Note that the scaling behaviour also depends on the software and the problem (here MUFTE-UG, CO2-Storage).
One can see that the theoretical speedup, i.e. on x cores the job runs x
times faster than a single-core job, differs from the real
speedup: on 64 cores the job runs (only) 53 times faster (for
0.246 million unknowns). For a fixed number of cores, the speedup
increases with the problem size. The number of nodes is the
number of cores divided by 8 (all cores are used on each allocated
node). For the theory on speedup calculations see Wikipedia.
Figure 6 shows the efficiency of the parallel computation.
For the theory on efficiency calculations see Wikipedia.
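Parallel efficiency is the speedup divided by the number of cores, E(p) = S(p) / p; with the figures quoted above (real speedup 53 on 64 cores):

```python
def efficiency(s, cores):
    """Parallel efficiency E(p) = S(p) / p."""
    return s / cores

print(efficiency(53, 64))  # 0.828125, i.e. roughly 83 % of ideal
```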
Figure 7 shows the total CPU time used.
The above statistics are based on these numbers. Note that the problem
with 0.63 million unknowns needs at least 4 cores to run (due to
memory constraints). Therefore the speedup and the efficiency for a job
running on 4 cores with 0.63 million unknowns were assumed to be equal
to the speedup and the efficiency for the problem size of 0.246 million unknowns.
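The baseline substitution described above can be made explicit: since no single-core run of the large problem fits in memory, its 4-core speedup is borrowed from the smaller problem, which yields an estimated serial time T(1) and hence speedups for higher core counts. A sketch under that assumption, with hypothetical runtimes and a hypothetical measured 4-core speedup:

```python
def estimated_serial_time(t4_large, s4_small):
    """Estimate T(1) of the large problem by assuming its 4-core
    speedup equals the one measured for the smaller problem."""
    return t4_large * s4_small

def speedup_large(t4_large, tp_large, s4_small):
    """Speedup of the large problem relative to the estimated T(1)."""
    return estimated_serial_time(t4_large, s4_small) / tp_large

# Hypothetical values: 4-core speedup of 3.8 measured on the small
# problem, 4-core runtime 2000 s and 64-core runtime 180 s on the
# large problem:
print(speedup_large(2000.0, 180.0, 3.8))  # estimated 64-core speedup
```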
For further questions please contact Michelle Hartnick or Andreas Kopp.