# MPI+OpenMP Hybrid Parallel Matrix Multiplication Experiments
## Overview
This document summarizes the experimental analysis of MPI+OpenMP hybrid parallel matrix multiplication performance.
## Generated Files
### Analysis Scripts
- `analyze_mpi_openmp.py` - Python script for data analysis and visualization
### Figures (All labels in English)
1. **experiment1_analysis.png** - Experiment 1: Varying MPI Processes (OpenMP threads=1)
   - Execution Time vs MPI Processes
   - Speedup vs MPI Processes
   - Parallel Efficiency vs MPI Processes
   - Parallel Efficiency Heatmap
2. **experiment2_analysis.png** - Experiment 2: Varying Both MPI and OpenMP
   - Efficiency Comparison (Total Processes=16)
   - Best Configuration Efficiency vs Matrix Size
   - MPI Process Impact on Efficiency
   - Speedup Comparison for Different Configurations
3. **experiment3_analysis.png** - Experiment 3: Optimization Results
   - Execution Time Comparison (Before/After)
   - Efficiency Comparison (Before/After)
   - Optimization Effect for Different Matrix Sizes
   - Best Configuration Efficiency Comparison
### Data Files
- `experiment_results.csv` - Complete experimental data
- `serial_results.csv` - Serial baseline performance
### Reports (in Chinese)
- `MPI_OpenMP实验分析报告.md` - Detailed analysis report
- `实验总结.md` - Summary of key findings
## Key Findings
### Experiment 1: MPI Process Scaling
- **Optimal configuration**: 6 MPI processes
- **Efficiency**: 75%-89% for 1-6 processes (speedup and efficiency as defined below)
- **Performance bottleneck**: Communication overhead increases significantly beyond 6 processes
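
Speedup and parallel efficiency in these figures are presumably the standard definitions, with $T_1$ the serial baseline time and $T_p$ the time on $p$ processes (or processes × threads), efficiency being reported as a percentage:

$$S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}$$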
### Experiment 2: MPI+OpenMP Configuration
- **Optimal configuration**: 4×4 (4 MPI processes × 4 OpenMP threads)
- **Superlinear speedup**: Achieved for large matrices (4096×4096) with 107% efficiency; superlinear scaling of this kind typically reflects better cache reuse once each rank's working set shrinks
- **Key insight**: Balancing MPI parallelism across processes with OpenMP parallelism inside each process is crucial (see the sketch after this list)
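
A minimal sketch of the hybrid structure being tuned here, assuming a C kernel with a row-block decomposition: each MPI rank owns a contiguous block of rows of A and C, and OpenMP threads split that block. The names and the decomposition itself are illustrative assumptions, not the lab's actual source.

```c
/* Illustrative hybrid skeleton (assumed structure, not the lab's actual source). */
#include <mpi.h>
#include <stdlib.h>

/* Multiply the local local_rows x n block of A by the full n x n matrix B. */
static void local_matmul(const double *A, const double *B, double *C,
                         int local_rows, int n)
{
    #pragma omp parallel for schedule(static)   /* OpenMP splits the local rows */
    for (int i = 0; i < local_rows; i++)
        for (int k = 0; k < n; k++) {
            double a = A[(size_t)i * n + k];
            for (int j = 0; j < n; j++)
                C[(size_t)i * n + j] += a * B[(size_t)k * n + j];
        }
}

int main(int argc, char **argv)
{
    int provided, rank, nprocs;
    /* FUNNELED is enough: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int n = (argc > 1) ? atoi(argv[1]) : 1024;  /* matrix size */
    int local_rows = n / nprocs;                /* assume nprocs divides n */

    double *A = calloc((size_t)local_rows * n, sizeof *A);
    double *B = calloc((size_t)n * n, sizeof *B);
    double *C = calloc((size_t)local_rows * n, sizeof *C);

    /* ... rank 0 would fill the matrices, scatter row blocks of A and
     *     broadcast B, then time the multiply with MPI_Wtime() ... */
    local_matmul(A, B, C, local_rows, n);
    /* ... gather the C blocks back to rank 0 and report the timing ... */

    free(A); free(B); free(C);
    MPI_Finalize();
    return 0;
}
```

Requesting only `MPI_THREAD_FUNNELED` keeps MPI's thread-safety overhead low; it is sufficient whenever every MPI call is made outside the OpenMP parallel regions, which is the usual pattern for this kind of hybrid code.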
### Experiment 3: Optimization Results
- **Performance improvement**: 1.1-2.3x speedup over the pre-optimization version
- **Optimization techniques** (sketched after this list):
  - Loop tiling (64×64 blocks)
  - Loop unrolling
  - Memory access optimization
- **Best result**: the 4×4 configuration reaches 107% efficiency for the 4096×4096 matrix
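
A rough sketch of how the listed techniques could look in the per-rank kernel, assuming the same C row-block layout as in the earlier sketch. The 64×64 block size comes from the report; the rest (names, loop order, unroll factor of 4) is an illustrative guess rather than the lab's actual code.

```c
#include <stddef.h>

#define BS 64  /* tile size from the report: 64x64 blocks */

/* Tiled + unrolled variant of the per-rank kernel sketched earlier (illustrative). */
static void tiled_local_matmul(const double *A, const double *B, double *C,
                               int local_rows, int n)
{
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < local_rows; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS) {
                /* Each (ii, kk, jj) step works on one BS x BS tile,
                 * so the touched pieces of A, B and C stay in cache. */
                int imax = ii + BS < local_rows ? ii + BS : local_rows;
                int kmax = kk + BS < n ? kk + BS : n;
                int jmax = jj + BS < n ? jj + BS : n;
                for (int i = ii; i < imax; i++)
                    for (int k = kk; k < kmax; k++) {
                        double a = A[(size_t)i * n + k];
                        int j = jj;
                        /* Unroll the innermost loop by 4 to cut loop overhead. */
                        for (; j + 3 < jmax; j += 4) {
                            C[(size_t)i * n + j]     += a * B[(size_t)k * n + j];
                            C[(size_t)i * n + j + 1] += a * B[(size_t)k * n + j + 1];
                            C[(size_t)i * n + j + 2] += a * B[(size_t)k * n + j + 2];
                            C[(size_t)i * n + j + 3] += a * B[(size_t)k * n + j + 3];
                        }
                        for (; j < jmax; j++)
                            C[(size_t)i * n + j] += a * B[(size_t)k * n + j];
                    }
            }
}
```

In this sketch, tiling keeps one 64×64 working set of A, B and C in cache at a time, and the i-k-j loop order keeps the innermost accesses to B and C contiguous, which is where the memory-access improvement comes from.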
## Recommendations
### Configuration Selection
Configurations are written as MPI processes × OpenMP threads.
- **Small matrices (<1024)**: 2×2 or 4×2 configuration
- **Medium matrices (1024-2048)**: 4×4 configuration
- **Large matrices (>2048)**: 4×4 or 8×2 configuration
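
For example, under Open MPI- or MPICH-style launchers a 4×4 run of a hypothetical `./matmul` binary would be started roughly as `OMP_NUM_THREADS=4 mpirun -np 4 ./matmul 2048`; the binary name and argument are placeholders, and the exact flags (process binding in particular) depend on the MPI implementation and batch system.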
### Avoid
- 1×N configurations (too few MPI processes)
- N×1 configurations (too few OpenMP threads)
- Excessive total parallelism (MPI processes × OpenMP threads > 48)
## Running the Analysis
```bash
cd /home/yly/dev/hpc-lab-code/work
python3 analyze_mpi_openmp.py
```
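The script presumably reads `experiment_results.csv` and `serial_results.csv` and regenerates the three `experiment*_analysis.png` figures listed above.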
## Requirements
- Python 3.x
- pandas
- matplotlib
- numpy
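
These packages can typically be installed with `pip install pandas matplotlib numpy`.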
## Notes
- All figures have been regenerated with English labels
- Font: DejaVu Sans (covers all characters used in the figure labels)
- Resolution: 300 DPI for publication quality