MPI+OpenMP Hybrid Parallel Matrix Multiplication Experiments

Overview

This document summarizes the experimental analysis of MPI+OpenMP hybrid parallel matrix multiplication performance.

Generated Files

Analysis Scripts

  • analyze_mpi_openmp.py - Python script for data analysis and visualization

Figures (All labels in English)

  1. experiment1_analysis.png - Experiment 1: Varying MPI Processes (OpenMP threads=1)

    • Execution Time vs MPI Processes
    • Speedup vs MPI Processes
    • Parallel Efficiency vs MPI Processes
    • Parallel Efficiency Heatmap
  2. experiment2_analysis.png - Experiment 2: Varying Both MPI Processes and OpenMP Threads

    • Efficiency Comparison (Total Processes=16)
    • Best Configuration Efficiency vs Matrix Size
    • MPI Process Impact on Efficiency
    • Speedup Comparison for Different Configurations
  3. experiment3_analysis.png - Experiment 3: Optimization Results

    • Execution Time Comparison (Before/After)
    • Efficiency Comparison (Before/After)
    • Optimization Effect for Different Matrix Sizes
    • Best Configuration Efficiency Comparison

Data Files

  • experiment_results.csv - Complete experimental data
  • serial_results.csv - Serial baseline performance

Reports (in Chinese)

  • MPI_OpenMP实验分析报告.md - Detailed analysis report
  • 实验总结.md - Summary of key findings

Key Findings

Experiment 1: MPI Process Scaling

  • Optimal configuration: 6 MPI processes
  • Efficiency: 75%-89% for 1-6 processes
  • Performance bottleneck: Communication overhead increases significantly beyond 6 processes
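
For reference, the speedup and efficiency values reported in these experiments presumably follow the standard definitions, with T_serial the serial baseline time (from serial_results.csv) and T_p the parallel time using p workers in total:

S(p) = \frac{T_{\text{serial}}}{T_p}, \qquad E(p) = \frac{S(p)}{p} = \frac{T_{\text{serial}}}{p\,T_p}

An efficiency above 100% (E(p) > 1) corresponds to superlinear speedup, as seen in Experiment 2 below.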

Experiment 2: MPI+OpenMP Configuration

  • Optimal configuration: 4×4 (4 MPI processes × 4 OpenMP threads)
  • Superlinear speedup: Achieved for large matrices (4096×4096) with 107% efficiency, most likely because the smaller per-process working sets make better use of cache
  • Key insight: Balancing process-level (MPI) and thread-level (OpenMP) parallelism is crucial; a minimal structural sketch follows below
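
The sketch below illustrates the hybrid structure these configurations vary: MPI distributes row blocks of the matrices across processes, and OpenMP threads share each process's loop nest. It is a minimal sketch under assumed details (row-block decomposition, replicated B, illustrative file and binary names), not the project's actual source.

/*
 * Minimal MPI+OpenMP matrix multiplication sketch (illustrative only).
 * Each rank owns a contiguous block of rows of A and C; B is replicated.
 *
 * Assumed build/launch (names are illustrative):
 *   mpicc -O3 -fopenmp hybrid_matmul.c -o hybrid_matmul
 *   OMP_NUM_THREADS=4 mpirun -np 4 ./hybrid_matmul 2048
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int provided, rank, nprocs;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int n = (argc > 1) ? atoi(argv[1]) : 1024;  /* matrix dimension */
    int rows = n / nprocs;                      /* rows per rank (assumes n % nprocs == 0) */

    double *A = malloc((size_t)rows * n * sizeof *A);  /* local row block of A */
    double *B = malloc((size_t)n * n * sizeof *B);     /* full B, replicated */
    double *C = calloc((size_t)rows * n, sizeof *C);   /* local row block of C */

    /* Dummy data; a real run would scatter A and broadcast B instead. */
    for (size_t i = 0; i < (size_t)rows * n; i++) A[i] = 1.0;
    for (size_t i = 0; i < (size_t)n * n; i++)    B[i] = 1.0;

    double t0 = MPI_Wtime();

    /* MPI splits rows across ranks; OpenMP parallelizes the local row loop. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < rows; i++)
        for (int k = 0; k < n; k++) {
            double aik = A[(size_t)i * n + k];
            for (int j = 0; j < n; j++)
                C[(size_t)i * n + j] += aik * B[(size_t)k * n + j];
        }

    double local = MPI_Wtime() - t0, tmax;
    MPI_Reduce(&local, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("n=%d  MPI ranks=%d  OpenMP threads=%d  time=%.3f s\n",
               n, nprocs, omp_get_max_threads(), tmax);

    free(A); free(B); free(C);
    MPI_Finalize();
    return 0;
}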

Experiment 3: Optimization Results

  • Performance improvement: 1.1-2.3x speedup over the unoptimized version
  • Optimization techniques (sketched in the code after this list):
    • Loop tiling (64×64 blocks)
    • Loop unrolling
    • Memory access optimization
  • Best result: 4×4 configuration achieves 107% efficiency for 4096×4096 matrix
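
The kernel sketch below shows roughly what the listed techniques look like in code: the 64×64 tile size and 4-way unrolling mirror the bullets above, but the function name, signature, and other details are illustrative assumptions rather than the project's actual implementation.

/*
 * Sketch of a loop-tiled kernel with 4-way inner-loop unrolling (illustrative
 * only). A, B, C are n-by-n row-major matrices; n is assumed to be a
 * multiple of TILE.
 */
#include <stddef.h>

#define TILE 64  /* 64x64 blocks, matching the optimization list above */

void matmul_tiled(int n, const double *A, const double *B, double *C) {
    /* Each (ii, jj) tile of C is updated by exactly one thread: no write races. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE)
            for (int kk = 0; kk < n; kk += TILE)
                /* Work on one TILE x TILE block so the tiles stay in cache. */
                for (int i = ii; i < ii + TILE; i++)
                    for (int k = kk; k < kk + TILE; k++) {
                        double aik = A[(size_t)i * n + k];
                        const double *Brow = &B[(size_t)k * n + jj];
                        double       *Crow = &C[(size_t)i * n + jj];
                        /* 4-way unrolled j loop over contiguous, stride-1 data. */
                        for (int j = 0; j < TILE; j += 4) {
                            Crow[j]     += aik * Brow[j];
                            Crow[j + 1] += aik * Brow[j + 1];
                            Crow[j + 2] += aik * Brow[j + 2];
                            Crow[j + 3] += aik * Brow[j + 3];
                        }
                    }
}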

Recommendations

Configuration Selection

  • Small matrices (<1024): 2×2 or 4×2 configuration
  • Medium matrices (1024-2048): 4×4 configuration
  • Large matrices (>2048): 4×4 or 8×2 configuration

Avoid

  • 1×N configurations (too few MPI processes)
  • N×1 configurations (too few OpenMP threads)
  • Excessive total parallelism (MPI processes × OpenMP threads > 48)

Running the Analysis

cd /home/yly/dev/hpc-lab-code/work
python3 analyze_mpi_openmp.py

Requirements

  • Python 3.x
  • pandas
  • matplotlib
  • numpy

Notes

  • All figures have been regenerated with English labels
  • Font: DejaVu Sans (covers all characters used in the labels)
  • Resolution: 300 DPI for publication quality