yly/hpc-lab-code


yly 27b49b7237 save dev files

2026-01-21 18:02:30 +08:00


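Lab 2.3 Compiling and Running OpenMP Programs in a Parallel Environment

Objectives

Understand the basic features, structure, and syntax of OpenMP
Understand the architecture, characteristics, and components of OpenMP
Learn the basic use of OpenMP for multithreaded programming on multicore architectures

Environment

Operating system: Linux
Compiler: GCC with OpenMP support
Build tool: xmake

Experiment 1: Hello World (example)

Source code

File: src/openmp_hello_world.c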

#include <stdio.h>
#include <omp.h>

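int main(void) {
    /* Every thread in the team executes this block once. */
    #pragma omp parallel
    {
        printf("Hello World\n");
        /* i is declared inside the region, so each thread has its own copy.
           Declaring it before the region would make it shared and cause a
           data race on the loop counter. */
        for (int i = 0; i < 4; i++) {
            printf("Iter:%d\n", i);
        }
        printf("GoodBye World\n");
    }

    return 0;
}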

Build and run

xmake build openmp_hello_world
xmake run openmp_hello_world

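Output

The program creates a team of threads (by default typically one per logical CPU; the OMP_NUM_THREADS environment variable overrides this), and every thread executes the whole parallel region. The output therefore contains one "Hello World" line and one "GoodBye World" line per thread, interleaved in an order that can change from run to run, which shows that the region really runs in parallel.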
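To see which thread printed which line, a thread can query its own number and the team size with the standard OpenMP routines omp_get_thread_num() and omp_get_num_threads(). The sketch below is illustrative only: the function name hello_threads is hypothetical, and the #ifdef _OPENMP guards are an added convenience so the file also compiles (running as a single thread) without -fopenmp.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs one parallel region in which every thread
   prints its ID, counts how many threads entered the region, and reports
   the team size through *team_size. */
int hello_threads(int *team_size) {
    int entries = 0;   /* shared counter, updated atomically */
    int team = 1;      /* shared; written by exactly one thread */

    #pragma omp parallel
    {
        int id = 0, n = 1;   /* declared inside the region: private */
#ifdef _OPENMP
        id = omp_get_thread_num();
        n = omp_get_num_threads();
#endif
        printf("Hello World from thread %d of %d\n", id, n);

        #pragma omp atomic
        entries++;

        #pragma omp single
        team = n;            /* one thread records the team size */
    }   /* implicit barrier: all updates are complete here */

    *team_size = team;
    return entries;
}
```

Calling it with OMP_NUM_THREADS=4 should print four lines with thread numbers 0 to 3 in some order and return 4.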

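Experiment 2: Computing π with the midpoint integration rule

Serial version

File: src/pi.c

Parallel version

File: src/pi_par.c

Key parallelization techniques:

#pragma omp parallel private(x) reduction(+:sum) creates the parallel region
#pragma omp for divides the loop iterations among the threads
private(x) gives each thread its own copy of x
reduction(+:sum) gives each thread its own partial sum and adds the partial sums together when the region ends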
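Because src/pi.c and src/pi_par.c are not reproduced here, the following is only a sketch of the technique listed above. It assumes the common formulation π = ∫₀¹ 4/(1+x²) dx, evaluated at the midpoint of each of num_steps equal subintervals; the function name pi_midpoint is hypothetical.

```c
/* Hypothetical sketch of the midpoint rule:
   pi = integral from 0 to 1 of 4 / (1 + x^2) dx
      ~ step * sum over i of 4 / (1 + x_i^2),  x_i = (i + 0.5) * step. */
double pi_midpoint(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;   /* reduction variable: per-thread partial sums */
    double x;           /* private: each thread needs its own midpoint */

    #pragma omp parallel private(x) reduction(+:sum)
    {
        /* Split the iterations among the threads of the region. */
        #pragma omp for
        for (long i = 0; i < num_steps; i++) {
            x = (i + 0.5) * step;          /* midpoint of the i-th interval */
            sum += 4.0 / (1.0 + x * x);
        }
    }   /* the partial sums are added into sum here */

    return step * sum;
}
```

Declaring x inside the loop body would make the private clause unnecessary; the sketch keeps the clause to mirror the list.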

Performance comparison

Threads	π	Time (s)	Speedup
1 (serial)	3.141592653590	1.554281	1.00x
2	3.141592653590	0.831361	1.87x
4	3.141592653590	0.448621	3.46x
8	3.141592653590	0.241111	6.45x

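Analysis

The parallel result matches the serial result in all 12 printed decimal places
Execution time falls steadily as the number of threads increases
With 8 threads the speedup is 6.45x, a parallel efficiency of about 81% (6.45 / 8), reasonably close to the ideal
The algorithm is compute-bound and its iterations are independent, so it parallelizes well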
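The speedup column is S_p = T_1 / T_p, where T_1 is the serial time and T_p the time with p threads; dividing by p gives the parallel efficiency E_p. Applied to the measured times in the table:

```latex
\begin{aligned}
S_p &= \frac{T_1}{T_p}, & E_p &= \frac{S_p}{p} \\
S_2 &= \frac{1.554281}{0.831361} \approx 1.87, & E_2 &\approx 0.93 \\
S_4 &= \frac{1.554281}{0.448621} \approx 3.46, & E_4 &\approx 0.87 \\
S_8 &= \frac{1.554281}{0.241111} \approx 6.45, & E_8 &\approx 0.81
\end{aligned}
```

Efficiency drops gradually as threads are added but stays above 80% at 8 threads.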

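Experiment 3: Computing π with the Monte Carlo method

Serial version

File: src/pimonte_serial.c

Parallel version

File: src/pimonte_par.c

Key parallelization techniques:

#pragma omp parallel private(i, j, x, y, r) reduction(+:dUnderCurve) creates the parallel region
rand_r(&seed) replaces rand() for thread safety: rand() updates hidden global state, whereas rand_r() only updates the seed passed to it
Each thread uses a different seed: seed = omp_get_thread_num() + 1
The array r is declared private, so each thread has its own copy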
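The source of src/pimonte_par.c is not shown, so the sketch below only illustrates the listed ideas under simplifying assumptions: each sample is one point (x, y) drawn uniformly from the unit square, the fraction of points inside the quarter circle x² + y² ≤ 1 estimates π/4, and the hit counter stands in for dUnderCurve. The function name pi_monte_carlo is hypothetical; rand_r() is the POSIX routine named above.

```c
#define _POSIX_C_SOURCE 200809L   /* expose rand_r() in <stdlib.h> */
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical sketch: estimates pi from `samples` random points in the
   unit square; the share inside the quarter circle approaches pi / 4. */
double pi_monte_carlo(long samples) {
    long under_curve = 0;   /* plays the role of dUnderCurve */

    #pragma omp parallel reduction(+:under_curve)
    {
        /* Per-thread seed, as in seed = omp_get_thread_num() + 1 above.
           rand_r() only touches this private seed, so threads never
           share generator state. */
        unsigned int seed = 1;
#ifdef _OPENMP
        seed = (unsigned int)omp_get_thread_num() + 1;
#endif
        #pragma omp for schedule(static)
        for (long i = 0; i < samples; i++) {
            double x = (double)rand_r(&seed) / RAND_MAX;
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)
                under_curve++;
        }
    }

    return 4.0 * (double)under_curve / (double)samples;
}
```

schedule(static) is spelled out so that, for a fixed thread count, each thread always handles the same iterations with the same seed, which makes a run reproducible.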

Performance comparison

Threads	π	Time (s)	Speedup
1 (serial)	3.141636540	8.347886	1.00x
2	3.141610420	1.662027	5.02x
4	3.141572660	0.858852	9.72x
8	3.141683140	0.464995	17.95x

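Analysis

The measured speedups of the Monte Carlo version are very large
With 8 threads the measured speedup is almost 18x, above the ideal (linear) value of 8
Likely cause: the serial version uses rand() and the parallel version uses rand_r(), so the 1-thread baseline does not run the same code as the parallel runs. In glibc, rand() goes through a lock-protected global generator, while rand_r() is a short generator working on a caller-owned seed, so each sample is probably cheaper in the parallel version even before the work is divided. The super-linear figures therefore mostly reflect the cheaper generator; running pimonte_par with OMP_NUM_THREADS=1 would give a baseline that isolates the effect of threading
The π estimate varies slightly because Monte Carlo is a randomized method; changing the thread count also changes the seeds, and therefore the sampled points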
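A baseline that runs exactly the same code as the parallel version separates the effect of threading from any difference between the serial and parallel programs. The sketch below times one rand_r() kernel with 1 thread and with p threads, using the OpenMP routines omp_set_num_threads() and omp_get_wtime(). The function names are hypothetical, and the clock_gettime() fallback is only there so the file still compiles without -fopenmp.

```c
#define _POSIX_C_SOURCE 200809L   /* rand_r() and clock_gettime() */
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Hypothetical kernel: counts random points inside the quarter circle.
   The same code runs for every thread count. */
static long count_hits(long samples) {
    long hits = 0;
    #pragma omp parallel reduction(+:hits)
    {
        unsigned int seed = 1;
#ifdef _OPENMP
        seed += (unsigned int)omp_get_thread_num();
#endif
        #pragma omp for schedule(static)
        for (long i = 0; i < samples; i++) {
            double x = (double)rand_r(&seed) / RAND_MAX;
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }
    }
    return hits;
}

/* Returns T(1) / T(p) measured on the identical kernel. Note that the
   thread-count setting stays at p after the call. */
double same_kernel_speedup(int p, long samples) {
    double t0, t1, tp;
#ifdef _OPENMP
    omp_set_num_threads(1);
#endif
    t0 = now_seconds();
    (void)count_hits(samples);
    t1 = now_seconds() - t0;
#ifdef _OPENMP
    omp_set_num_threads(p);
#else
    (void)p;   /* no threads to configure without OpenMP */
#endif
    t0 = now_seconds();
    (void)count_hits(samples);
    tp = now_seconds() - t0;
    return t1 / tp;
}
```

The p-thread measurement may include one-time thread start-up cost, so a warm-up call or several repetitions would give steadier numbers.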

Summary of OpenMP parallelization techniques

1. Creating a parallel region

#pragma omp parallel
{
    // code block
}

2. Parallelizing a for loop

#pragma omp parallel for
for(int i=0; i<N; i++) {
    // loop body
}
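As a small concrete instance of the combined directive, the hypothetical saxpy function below scales one array and adds it to another. Each iteration writes a different y[i], so no synchronization is needed, and the loop variable declared in the for statement is private to each thread automatically.

```c
/* Hypothetical example: y = a * x + y, element by element.
   Iterations are independent, so parallel for can split them
   across threads without any locking. */
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```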

3. Declaring variable scope

#pragma omp parallel private(var1, var2) shared(var3) reduction(+:sum)
{
    // code block
}

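private: each thread gets its own copy, which is uninitialized on entry to the region
shared: all threads access the same variable
reduction: each thread works on a private copy initialized to the operator's identity (0 for +), and the copies are combined with the variable's original value when the region ends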
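The three clauses can be seen together on one loop. In the hypothetical scoping_demo below, tmp is private and is assigned before use in every iteration, the squares array is shared but each thread writes different elements, and total is a reduction whose starting value of 10 is kept and added to the combined partial sums.

```c
/* Hypothetical example of private, shared, and reduction on one loop.
   Fills squares[i] = i * i and returns 10 + the sum of all squares. */
long scoping_demo(long *squares, int n) {
    long total = 10;   /* original value survives the reduction */
    long tmp;          /* private copies start uninitialized */

    #pragma omp parallel for private(tmp) shared(squares) reduction(+:total)
    for (int i = 0; i < n; i++) {
        tmp = (long)i * i;   /* written before it is read */
        squares[i] = tmp;    /* disjoint elements: no race */
        total += tmp;        /* per-thread partial sum */
    }
    return total;
}
```

For n = 10 the squares sum to 285, so the function returns 295 regardless of the thread count.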

4. Protecting a critical section

#pragma omp critical
{
    // code that needs mutually exclusive access
}
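A critical section is useful when several shared variables must change together, such as a maximum and its index. In the hypothetical sketch below, each thread scans its share of the loop with private locals and enters the critical section only once to merge its result. Ties resolve to the smallest index, so the answer does not depend on thread timing.

```c
/* Hypothetical example: maximum of a[0..n-1] and its index (n >= 1). */
void max_with_index(const double *a, int n, double *max_out, int *idx_out) {
    double best = a[0];
    int best_idx = 0;

    #pragma omp parallel
    {
        /* Private per-thread candidates. */
        double local_best = a[0];
        int local_idx = 0;

        #pragma omp for
        for (int i = 1; i < n; i++) {
            if (a[i] > local_best) {
                local_best = a[i];
                local_idx = i;
            }
        }

        /* One thread at a time updates both shared variables. */
        #pragma omp critical
        {
            if (local_best > best ||
                (local_best == best && local_idx < best_idx)) {
                best = local_best;
                best_idx = local_idx;
            }
        }
    }

    *max_out = best;
    *idx_out = best_idx;
}
```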

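Lessons learned

OpenMP simplifies parallel programming: compiler directives are enough to parallelize code, with no explicit thread creation
Variable scoping matters: using private and shared correctly avoids data races
Reductions are convenient: they combine per-thread partial results, such as sums, automatically
Thread safety needs attention: for example, rand() should be replaced with rand_r()
The performance gains are significant: compute-bound tasks with independent iterations can reach close to linear speedup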

Build and run commands

Build all programs

cd /home/yly/dev/hpc-lab-code/lab2/omp
xmake

Run a single program

# Hello World
xmake run openmp_hello_world

# PI, serial
xmake run pi

# PI, parallel (set the number of threads)
export OMP_NUM_THREADS=4
xmake run pi_par

# Monte Carlo, serial
xmake run pimonte_serial

# Monte Carlo, parallel (set the number of threads)
export OMP_NUM_THREADS=4
xmake run pimonte_par

File structure

lab2/omp/
├── src/
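│   ├── openmp_hello_world.c    # Experiment 1: Hello World
│   ├── pi.c                     # Experiment 2: PI, serial (midpoint rule)
│   ├── pi_par.c                 # Experiment 2: PI, parallel (midpoint rule)
│   ├── pimonte_serial.c         # Experiment 3: PI, serial (Monte Carlo)
│   └── pimonte_par.c            # Experiment 3: PI, parallel (Monte Carlo)
├── xmake.lua                    # build configuration
└── 实验报告.md                   # this document (lab report)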
