=== CPU (OpenMP), varying thread counts ===
CPU matrix multiplication performance test (OpenMP multithreading)
=================================================================
Matrix       Threads   Time(ms)     GFLOPS   Speedup
-----------------------------------------------------------------
256x256      8         86.012       0.39     1.14
256x256      64        78.420       0.43     1.25
256x256      256       76.496       0.44     1.28
-----------------------------------------------------------------
512x512      8         747.483      0.36     1.00
512x512      64        743.606      0.36     1.01
512x512      256       748.649      0.36     1.00
-----------------------------------------------------------------
1024x1024    8         6033.205     0.36     1.00
1024x1024    64        6049.318     0.35     1.00
1024x1024    256       6051.757     0.35     1.00
-----------------------------------------------------------------
2048x2048    8         51065.609    0.34     1.00
2048x2048    64        50995.406    0.34     1.00
2048x2048    256       51083.363    0.34     1.00
-----------------------------------------------------------------
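
The GFLOPS column can be cross-checked from the Time(ms) column. The sketch below assumes the conventional count of 2·N³ floating-point operations for an N x N matrix multiply; the log does not state which count the benchmark used, but the reported figures are consistent with it. The function name `gflops` is illustrative only.

```python
def gflops(n: int, time_ms: float) -> float:
    """GFLOPS for an n x n matrix multiply, assuming 2*n^3 floating-point operations."""
    flop = 2.0 * n ** 3
    return flop / (time_ms * 1e-3) / 1e9

# Rows copied from the CPU table: (n, threads, time in ms, reported GFLOPS)
rows = [
    (256, 8, 86.012, 0.39),
    (512, 8, 747.483, 0.36),
    (1024, 8, 6033.205, 0.36),
    (2048, 8, 51065.609, 0.34),
]

for n, threads, time_ms, reported in rows:
    print(f"{n}x{n} threads={threads}: "
          f"{gflops(n, time_ms):.2f} GFLOPS (reported {reported:.2f})")
```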

ASCII chart: CPU performance analysis
=================================================================
1. Speedup trend across thread counts
Matrix       Threads=8   Threads=64   Threads=256

2. Performance trend across matrix sizes
Threads      256x256   512x512   1024x1024   2048x2048

Note: for full charts, generating them with Python (matplotlib) is recommended.
Suggested charts:
- Line chart: speedup vs. matrix size for each thread count
- Bar chart: GFLOPS comparison across configurations
- Heatmap: performance distribution over thread count × matrix size
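
As one possible way to produce the suggested line chart, here is a minimal matplotlib sketch. The speedup values are copied from the CPU table; the function names and the output file name `cpu_speedup.png` are hypothetical, and the benchmark itself does not ship this script.

```python
# Speedup values copied from the CPU table: (matrix size N, threads, speedup)
ROWS = [
    (256, 8, 1.14), (256, 64, 1.25), (256, 256, 1.28),
    (512, 8, 1.00), (512, 64, 1.01), (512, 256, 1.00),
    (1024, 8, 1.00), (1024, 64, 1.00), (1024, 256, 1.00),
    (2048, 8, 1.00), (2048, 64, 1.00), (2048, 256, 1.00),
]

def speedup_series(rows):
    """Group (size, speedup) points by thread count, ordered by matrix size."""
    series = {}
    for size, threads, speedup in sorted(rows):
        series.setdefault(threads, []).append((size, speedup))
    return series

def plot_speedup(rows, out_path="cpu_speedup.png"):  # hypothetical file name
    """Draw one line per thread count: speedup vs. matrix size."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for threads, points in speedup_series(rows).items():
        sizes = [size for size, _ in points]
        speedups = [value for _, value in points]
        ax.plot(sizes, speedups, marker="o", label=f"{threads} threads")
    ax.set_xscale("log", base=2)  # matrix sizes double at each step
    ax.set_xlabel("Matrix size N (N x N)")
    ax.set_ylabel("Speedup")
    ax.set_title("CPU (OpenMP) speedup vs. matrix size")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

Calling `plot_speedup(ROWS)` writes the chart to disk. The bar chart and heatmap could reuse the same grouping step with the GFLOPS column instead of the speedup column.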
=== CUDA Kernel1 (baseline version) ===
CUDA Kernel1 matrix multiplication performance test results
=================================
Matrix Size   Time(s)    Time(ms)   GFLOPS
---------------------------------
512x512       0.000316   0.316      849.49
1024x1024     0.002374   2.374      904.75
2048x2048     0.019190   19.190     895.23
4096x4096     0.152897   152.897    898.90
=================================
=== CUDA Kernel2 (shared memory optimization) ===
CUDA Kernel2 (shared memory optimization) matrix multiplication performance test results
=================================
Matrix Size   Time(s)    Time(ms)   GFLOPS
---------------------------------
512x512       0.000827   0.827      324.65
1024x1024     0.006484   6.484      331.22
2048x2048     0.053599   53.599     320.52
4096x4096     0.433242   433.242    317.23
=================================
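
To compare the two CUDA result tables directly, the sketch below computes the ratio of Kernel2 time to Kernel1 time for each matrix size, using the Time(ms) values copied from the logs. The dictionary and function names are illustrative. In these measurements the ratio comes out between roughly 2.6 and 2.8, meaning the shared memory version ran slower than the baseline here; the log does not explain why.

```python
# Time(ms) values copied from the two CUDA tables, keyed by matrix size N
KERNEL1_MS = {512: 0.316, 1024: 2.374, 2048: 19.190, 4096: 152.897}
KERNEL2_MS = {512: 0.827, 1024: 6.484, 2048: 53.599, 4096: 433.242}

def time_ratios(baseline_ms, candidate_ms):
    """Return candidate/baseline time ratio per matrix size (values > 1 mean slower)."""
    return {n: candidate_ms[n] / baseline_ms[n] for n in sorted(baseline_ms)}

for n, ratio in time_ratios(KERNEL1_MS, KERNEL2_MS).items():
    print(f"{n}x{n}: Kernel2 / Kernel1 time ratio = {ratio:.2f}")
```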