A Simple Memory Benchmark
Introduction
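I wrote a very simple memory benchmark. It is run for a range of block sizes, from 64 bytes to 8 MB, doubling each time. For each size it repeatedly reads or writes one block of memory sequentially until a total of 256 MB has been read or written, and then reports the throughput. A block that fits in a cache runs at cache speed, while a larger block exposes the next cache level or main memory. The results should therefore show the difference in memory performance between cache hits and cache misses.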
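Below are the results for some machines I have access to. callisto is my PowerPC; callistoE and callistoS are that machine fitted with EDO and SDRAM memory, respectively. All figures are in MB/s. Each row is a block size in bytes, and each column is one of the machines described under Machines.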
Memory Read Performance (MB/s)
Block size (bytes) | mercator | stevin | sp05 | callistoE | callistoS | piefje | sp17 | bartm
---|---|---|---|---|---|---|---|---
64 | 350.685 | 287.640 | 104.065 | 609.524 | 609.524 | 433.898 | 218.803 | 640.000
128 | 345.946 | 263.918 | 110.823 | 609.524 | 609.524 | 501.961 | 222.609 | 731.429
256 | 400.000 | 290.909 | 110.823 | 640.000 | 640.000 | 522.449 | 237.037 | 775.758
512 | 419.672 | 290.909 | 117.972 | 673.684 | 673.684 | 522.449 | 246.154 | 825.806
1K | 433.898 | 308.434 | 116.895 | 691.892 | 711.111 | 544.681 | 234.862 | 800.000
2K | 441.379 | 308.434 | 117.431 | 711.111 | 711.111 | 544.681 | 253.465 | 825.806
4K | 449.123 | 316.049 | 118.519 | 711.111 | 711.111 | 556.522 | 243.810 | 825.806
8K | 419.672 | 320.000 | 119.626 | 711.111 | 711.111 | 556.522 | 237.037 | 825.806
16K | 161.006 | 324.051 | 117.972 | 731.429 | 711.111 | 533.333 | 220.690 | 825.806
32K | 162.025 | 174.150 | 118.519 | 200.000 | 711.111 | 201.575 | 224.561 | 800.000
64K | 146.286 | 180.282 | 117.431 | 130.612 | 139.891 | 192.481 | 269.474 | 253.465
128K | 142.222 | 172.973 | 89.510 | 134.737 | 134.031 | 188.235 | 237.037 | 253.465
256K | 117.972 | 179.021 | 89.199 | 113.274 | 123.671 | 175.342 | 211.570 | 250.980
512K | 109.871 | 168.421 | 89.510 | 100.787 | 113.778 | 163.057 | 201.575 | 250.980
1M | 94.465 | 110.823 | 89.199 | 90.780 | 106.224 | 146.286 | 211.570 | 142.222
2M | 89.199 | 109.871 | 94.465 | 87.372 | 101.587 | 143.017 | 204.800 | 141.436
4M | 88.276 | 109.402 | 93.773 | 87.372 | 101.186 | 140.659 | 200.000 | 142.222
8M | 88.889 | 108.936 | 91.103 | 87.075 | 101.587 | 139.891 | 206.452 | 142.222
Memory Write Performance (MB/s)
Block size (bytes) | mercator | stevin | sp05 | callistoE | callistoS | piefje | sp17 | bartm
---|---|---|---|---|---|---|---|---
64 | 82.848 | 426.667 | 253.465 | 522.449 | 533.333 | 83.388 | 465.455 | 581.818
128 | 82.315 | 433.898 | 297.674 | 568.889 | 568.889 | 83.934 | 595.349 | 691.892
256 | 82.848 | 465.455 | 332.468 | 609.524 | 595.349 | 85.050 | 640.000 | 731.429
512 | 84.211 | 512.000 | 350.685 | 624.390 | 640.000 | 84.211 | 691.892 | 775.758
1K | 83.934 | 512.000 | 355.556 | 640.000 | 640.000 | 84.211 | 731.429 | 752.941
2K | 82.581 | 522.449 | 360.563 | 656.410 | 656.410 | 84.211 | 731.429 | 775.758
4K | 83.934 | 522.449 | 365.714 | 656.410 | 640.000 | 85.050 | 691.892 | 775.758
8K | 83.660 | 544.681 | 365.714 | 656.410 | 656.410 | 83.388 | 656.410 | 800.000
16K | 83.117 | 556.522 | 365.714 | 656.410 | 673.684 | 84.211 | 711.111 | 800.000
32K | 83.117 | 533.333 | 365.714 | 148.837 | 656.410 | 85.050 | 711.111 | 731.429
64K | 81.529 | 544.681 | 360.563 | 93.431 | 100.787 | 84.211 | 711.111 | 131.282
128K | 82.848 | 533.333 | 119.626 | 97.338 | 94.815 | 83.388 | 691.892 | 131.282
256K | 79.012 | 501.961 | 119.626 | 78.769 | 85.619 | 84.211 | 400.000 | 131.282
512K | 82.848 | 406.349 | 120.188 | 69.377 | 78.287 | 84.211 | 371.014 | 129.949
1M | 82.581 | 148.837 | 120.188 | 61.836 | 72.113 | 84.488 | 400.000 | 69.755
2M | 82.848 | 142.222 | 119.626 | 59.535 | 68.633 | 84.488 | 400.000 | 69.565
4M | 82.848 | 146.286 | 116.895 | 59.259 | 68.817 | 83.388 | 406.349 | 69.565
8M | 82.581 | 145.455 | 114.286 | 59.953 | 69.003 | 84.211 | 328.205 | 70.330
Machines
Machine | CPU | CPU clock | System / chipset | Bus clock | L1 cache | L2 cache | Memory
---|---|---|---|---|---|---|---
mercator | Pentium | 133 MHz | i430FX | 66 MHz | 2x8K | 512K | 64M EDO
stevin | UltraSPARC | 167 MHz | Ultra-2 | 83 MHz | 2x16K | 1M | 256M
sp05 | POWER2 | 66 MHz | SP2 thin | 66 MHz | 64K | - | 256M
callistoE | 604e | 200 MHz | VLSI GG2 | 66 MHz | 2x32K | 512K | 32M EDO
callistoS | 604e | 200 MHz | VLSI GG2 | 66 MHz | 2x32K | 512K | 32M SD
piefje | Pentium MMX | 166 MHz | i430TX | 66 MHz | 2x16K | 512K | 64M EDO
sp17 | POWER2 | 135 MHz | SP2 wide | ?? | 128K | - | 512M
bartm | K6 | 233 MHz | i430TX | 66 MHz | 2x32K | 512K | 64M SD
Program Source
The C source, MemSpeed.c, should be compiled with:

    gcc -O3 -fomit-frame-pointer MemSpeed.c -o MemSpeed
This page is maintained by Geert Uytterhoeven.
Last updated: 2006-01-25 21:24:49