Test of BLOCK_SIZE Impact on CUDA Matrix Multiplication Performance
========================================
Matrix      Block   Time (ms)     GFLOPS
----------------------------------------
256x256     4x4         0.115     292.57
256x256     8x8         0.040     836.85
256x256     16x16       0.029    1151.02
256x256     32x32       0.026    1315.65
----------------------------------------
512x512     4x4         0.831     323.00
512x512     8x8         0.264    1018.65
512x512     16x16       0.190    1416.04
512x512     32x32       0.174    1542.02
----------------------------------------
1024x1024   4x4         6.541     328.33
1024x1024   8x8         2.021    1062.62
1024x1024   16x16       1.393    1541.24
1024x1024   32x32       1.353    1586.69
----------------------------------------
2048x2048   4x4        54.011     318.08
2048x2048   8x8        16.104    1066.82
2048x2048   16x16      11.355    1512.97
2048x2048   32x32      10.978    1565.00
----------------------------------------
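
The throughput column looks consistent with the usual operation count of 2*N^3 floating-point operations for an N x N matrix multiply, divided by the elapsed time. The output itself does not state the formula, so this is an inference from the numbers. A minimal Python sketch that recomputes a few rows under that assumption follows; the helper name gflops is hypothetical, and small differences are expected because the printed times are rounded to three decimals.

```python
def gflops(n: int, time_ms: float) -> float:
    """Throughput in GFLOP/s, assuming 2 * n^3 floating-point operations."""
    flop = 2.0 * n ** 3                  # one multiply and one add per inner-loop step
    seconds = time_ms * 1e-3             # the table reports milliseconds
    return flop / seconds / 1e9


# (matrix size N, block size, time in ms, reported throughput) copied from the table
rows = [
    (1024, "4x4", 6.541, 328.33),
    (1024, "32x32", 1.353, 1586.69),
    (2048, "4x4", 54.011, 318.08),
    (2048, "32x32", 10.978, 1565.00),
]

for n, block, time_ms, reported in rows:
    recomputed = gflops(n, time_ms)
    print(f"{n}x{n} {block:>5}  recomputed={recomputed:8.2f}  reported={reported:8.2f}")
```

The larger matrices are the better check here, since the sub-0.1 ms timings in the 256x256 group lose more precision to rounding.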