Effect of BLOCK_SIZE on CUDA Matrix Multiplication Performance
========================================
Matrix      Block     Time(ms)    GFLOPS
----------------------------------------
256x256     4x4          0.116    289.26
256x256     8x8          0.040    838.19
256x256     16x16        0.029   1170.29
256x256     32x32        0.026   1292.94
----------------------------------------
512x512     4x4          0.831    323.04
512x512     8x8          0.265   1014.10
512x512     16x16        0.189   1423.49
512x512     32x32        0.178   1506.57
----------------------------------------
1024x1024   4x4          6.539    328.40
1024x1024   8x8          2.022   1061.88
1024x1024   16x16        1.397   1536.94
1024x1024   32x32        1.364   1574.44
----------------------------------------
2048x2048   4x4         54.023    318.01
2048x2048   8x8         16.080   1068.38
2048x2048   16x16       11.454   1499.84
2048x2048   32x32       11.019   1559.16
----------------------------------------
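
The throughput column appears consistent with counting 2*N^3 floating-point operations for an N x N multiply and dividing by the measured time. The sketch below reproduces that arithmetic as a sanity check. The helper names `gflops` and `speedup` are hypothetical and not taken from the benchmark's source, and the 2*N^3 operation count is an assumption inferred from the numbers. Small differences from the table presumably come from the timings being printed to three decimals.

```python
def gflops(n: int, time_ms: float) -> float:
    """Throughput in GFLOPS for an n x n matrix multiply.

    Assumes 2 * n**3 floating-point operations (one multiply and one add
    per inner-product term); this count is inferred, not confirmed by the
    benchmark source.
    """
    flops = 2.0 * n ** 3
    seconds = time_ms * 1e-3
    return flops / seconds / 1e9


def speedup(baseline_ms: float, time_ms: float) -> float:
    """How many times faster time_ms is than baseline_ms."""
    return baseline_ms / time_ms


if __name__ == "__main__":
    # Values copied from the 256x256 and 2048x2048 rows of the table.
    print(f"256x256,   4x4  : {gflops(256, 0.116):.2f} GFLOPS")
    print(f"2048x2048, 32x32: {gflops(2048, 11.019):.2f} GFLOPS")
    print(f"2048x2048, 32x32 vs 4x4: {speedup(54.023, 11.019):.2f}x")
```

On the 2048x2048 rows this gives a speedup of about 4.9x for 32x32 blocks over 4x4 blocks, while the step from 16x16 to 32x32 is only a few percent.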