Comparative Performance Analysis of Strassen's Matrix Multiplication Across Programming Languages

Project Abstract

This project serves as a comprehensive benchmarking study to investigate the intricate relationship between programming language choice, algorithm implementation strategy, and computational performance in the context of Strassen's Matrix Multiplication. Utilizing a phased approach, we first evaluate "pure" language implementations (C++, Python, R, Matlab) to isolate inherent runtime and data structure overheads. Subsequently, we introduce highly optimized numerical libraries (NumPy for Python, Eigen for C++) to assess their transformative impact on performance. The study culminates in a data-driven analysis of execution times, speedup factors, and a detailed performance ranking, providing critical insights into optimizing numerical computations for Data Science and Machine Learning applications.

1. Introduction: The Research Question

The efficient execution of linear algebra operations, particularly matrix multiplication, is fundamental to a vast array of computational tasks in data science, machine learning, scientific computing, and computer graphics. The choice of programming language and implementation strategy can profoundly impact the performance of these operations. This research project aims to answer the question: "Does the choice of a programming language, and the strategy for algorithm implementation (from-scratch vs. library-optimized), significantly influence the performance of Strassen's Matrix Multiplication?"

By systematically comparing diverse implementations, this study seeks to quantify these performance differences, elucidate their underlying causes, and provide actionable insights for practitioners.

2. The Algorithm: Strassen's Matrix Multiplication

Strassen's algorithm is a classic divide-and-conquer algorithm for matrix multiplication. It improves upon the conventional naive algorithm, which has a time complexity of $O(N^3)$ for multiplying two $N \times N$ matrices. Strassen's algorithm recursively divides matrices into smaller sub-matrices and strategically reduces the number of recursive matrix multiplications from eight to seven (at the expense of more matrix additions and subtractions). This leads to a superior asymptotic time complexity of $O(N^{\log_2 7})$, which approximates $O(N^{2.807})$. This theoretical advantage suggests that for sufficiently large matrices, Strassen's algorithm should outperform the naive approach.

3. Research Methodology

This benchmarking exercise is structured into two main phases, with consistent experimental parameters for reliable comparison.

3.1. Phase 1: Pure Language Implementations (Benchmarking Fundamentals)

Objective: To evaluate the raw computational efficiency of each programming language's native data structures and core operations, without reliance on external specialized numerical libraries. This phase aims to highlight the inherent overheads and optimizations within each language's runtime environment.

Languages and Matrix Representations:

C++: Matrices are represented using std::vector<std::vector<double>>. All matrix arithmetic (addition, subtraction, splitting, joining) is implemented manually with explicit nested loops. No external numerical libraries (e.g., Eigen, Armadillo) are used.
Python: Matrices are represented using list of lists. All matrix arithmetic is implemented manually with explicit nested loops. No external numerical libraries (e.g., NumPy) are used.
R: Matrices utilize R's native matrix type. Matrix arithmetic leverages R's built-in operators (+, -, * for scalar multiplication, %*% for naive matrix multiplication), which are internally implemented in highly optimized C/Fortran routines (often utilizing BLAS/LAPACK libraries). No external R packages (beyond base R) are explicitly used.
Matlab: Matrices utilize Matlab's native double array type. Matrix arithmetic leverages Matlab's built-in operators (+, -, *), which are also internally implemented in highly optimized C/Fortran routines (typically utilizing BLAS/LAPACK libraries). No external toolboxes are explicitly used.

Benchmarking Protocol:

Matrix Sizes (N): [2, 4, 8, 16, 32, 64]. All dimensions are powers of 2 to simplify Strassen's recursion by avoiding padding requirements.
Iterations: 10 runs for each matrix size.
Input Data: Random floating-point numbers between 0.0 and 10.0 are generated for each matrix in every iteration to ensure statistical robustness and mitigate effects of specific data patterns.
Timing: High-resolution timers specific to each language are employed (std::chrono in C++, time.perf_counter() in Python, system.time() in R, tic/toc in Matlab).
Metrics: For each matrix size, the Total Time (s) for all 10 iterations and the Average Time (nanoseconds) per single iteration (rounded to the nearest whole number) are reported.
Extended Test: A single run for a 128x128 matrix is performed at the end of each script to observe performance at a slightly larger scale.

3.2. Phase 2: Library-Optimized Implementations (Benchmarking Real-World Performance)

Objective: To quantify the performance gains achieved by integrating highly optimized, battle-tested numerical libraries into the Strassen's algorithm, reflecting real-world data science practices. This phase also allows for a direct comparison of Strassen's (library-optimized) against standard (naive) matrix multiplication provided by these libraries.

Libraries and Matrix Representations:

Python (Optimized): Matrices are numpy.ndarray objects. All matrix operations (addition, subtraction, slicing, blocking, naive multiplication) are performed using NumPy's optimized functions and operators (+, -, slicing, np.block, np.dot or @).
C++ (Optimized): Matrices are Eigen::MatrixXd objects. All matrix operations are performed using Eigen's optimized functions and operator overloads (+, -, .block(), *).
R & Matlab: For these languages, their "unoptimized" implementations from Phase 1 inherently leverage optimized C/Fortran core routines, making them functionally equivalent to their "library-optimized" versions for basic matrix operations. Therefore, their Phase 1 results are carried forward for comparison in this "optimized" context.

Benchmarking Protocol:

Matrix Sizes (N): [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192] (with C++/Python Strassen capped at N=64 due to recursive overhead).
Iterations: 10 runs per matrix size.
Input Data: Random floating-point numbers (0.0-10.0).
Metrics: Average time per iteration (seconds) for both Naive Matrix Multiplication (library-optimized) and Strassen's Matrix Multiplication (library-optimized).
Extended Test: Single runs for larger matrix sizes are included where practical.

4. Experimental Setup

The benchmarks were executed on a personal computing environment. Specific hardware and software details:

Operating System: Windows 11
Processor: Intel Core i5-8250U
RAM: 16GB
C++ Compiler: MSVC via Visual Studio
Python Version: 3.10.11 with NumPy 3.10.8
R Version: 4.5.2
Matlab Version: R2025b
Eigen Library Version: [e.g., 3.4.0]

5. Repository Structure

. ├── cpp/ │ ├── Strassens_algorithm.cpp # C++ Pure Language Strassen's │ └── Matmult_Eigen.cpp # C++ Eigen-optimized Naive & Strassen's ├── python/ │ ├── Strassens_Algorithm_Matmult_Python.py # Pure Python list of lists Strassen's │ └── Matmult_Numpy.py # Python NumPy-optimized Naive & Strassen's ├── r/ │ └── Strassens_MatMult.R # R native matrix Strassen's ├── matlab/ │ └── Strassens_Matmult_Matlab.m # Matlab native array Strassen's ├── results/ # Directory for saved plots (performance_unoptimized.png, etc.) │ └── ... └── README.md # This file code Code

6. Results

6.1. Combined Average Execution Times (seconds per iteration)

The following table summarizes the average execution time (in seconds) for a single iteration across all implemented strategies and matrix sizes. Note: 'N/A' indicates that the benchmark was not performed or was capped due to prohibitive runtime for larger sizes.

Matrix Size (N)	R (Strassen Unopt)	Python (Strassen Unopt)	Matlab (Strassen Unopt)	C++ (Strassen Unopt)	Python (NumPy Naive)	Python (NumPy Strassen)	C++ (Eigen Naive)	C++ (Eigen Strassen)
2	0.017000	0.002622	0.000594	0.000245	0.000074	0.000618	0.000012	0.000151
4	0.004000	0.014576	0.000447	0.002110	0.000033	0.005483	0.000013	0.000593
8	0.002000	0.069940	0.000769	0.018181	0.000047	0.036853	0.000083	0.003745
16	0.041000	0.529643	0.003499	0.098053	0.000062	0.239787	0.000268	0.032549
32	0.296000	3.772028	0.016803	0.562704	0.000052	1.559668	0.001911	0.228921
64	2.051000	26.513940	0.114371	4.878092	0.000089	9.369346	0.012637	1.434813
128	14.500000	188.065743	0.816692	27.322600	0.000051	N/A	0.106479	N/A
256	N/A	N/A	N/A	N/A	0.000110	N/A	0.889357	N/A
512	N/A	N/A	N/A	N/A	0.000483	N/A	6.543643	N/A
1024	N/A	N/A	N/A	N/A	0.004134	N/A	57.168115	N/A
2048	N/A	N/A	N/A	N/A	0.028420	N/A	N/A	N/A
4096	N/A	N/A	N/A	N/A	0.166775	N/A	N/A	N/A
8192	N/A	N/A	N/A	N/A	1.311303	N/A	N/A	N/A

6.2. Visualizations

The performance data is visualized to highlight key trends and comparisons, available as plot images

Detailed Ranking at N=64 (Fastest to Slowest): Python (NumPy Naive) 0.000089 C++ (Eigen Naive) 0.012637 Matlab (Strassen Unoptimized) 0.114371 C++ (Eigen Strassen) 1.434813 R (Strassen Unoptimized) 2.051000 Python (NumPy Strassen) 9.369346 C++ (Strassen Unoptimized) 4.878092 Python (Strassen Unoptimized) 26.513940

Speedup Factors (Relative to the Slowest: Python Unoptimized Strassen at N=64): Python (NumPy Naive) 297909x C++ (Eigen Naive) 2098x Matlab (Strassen Unoptimized) 231x R (Strassen Unoptimized) 12x C++ (Eigen Strassen) 5x Python (NumPy Strassen) 2x C++ (Strassen Unoptimized) 1x Python (Strassen Unoptimized) 1x

7. Discussion and Interpretation of Results

The empirical results from this benchmarking exercise provide compelling evidence regarding the impact of language choice and implementation strategy on the performance of numerical algorithms.

The Transformative Power of Optimized Numerical Libraries: The most striking finding is the extraordinary performance gain achieved by integrating specialized numerical libraries (NumPy, Eigen). Python with NumPy, and C++ with Eigen, demonstrated speedups of thousands to hundreds of thousands of times (or even millions for NumPy Naive) compared to their respective pure-language counterparts. This underscores that for any serious numerical computation in Data Science and Machine Learning, leveraging these highly optimized (often C/Fortran-backed) libraries is an absolute necessity.
The Nuance of "Native" Performance: Our Phase 1 results revealed a surprising initial hierarchy: Matlab > R > C++ > Python.
- Matlab and R's leading performance in Phase 1 (without explicit external libraries like NumPy/Eigen) is attributed to their fundamental design. Their core matrix types and built-in operators are already heavily optimized and backed by high-performance C/Fortran BLAS/LAPACK routines. Their JIT compilers further enhance this. This means their "native" approach is their optimized approach for matrix arithmetic.
- C++ (std::vector<std::vector<double>>) was surprisingly slower than Matlab and R. While C++ is a compiled language, this implementation suffered from memory fragmentation (non-contiguous std::vectors) and the overhead of numerous dynamic allocations/deallocations for intermediate matrices during recursion. The manual loop-based arithmetic could not compete with the highly tuned BLAS/LAPACK calls implicit in R/Matlab's core.
- Pure Python (list of lists) was, as expected, the slowest by several orders of magnitude, due to its interpreted nature, high object overhead for individual floats, and fragmented memory.
Strassen's Practical Crossover Point and Overhead: Despite its theoretical asymptotic advantage ($O(N^{2.807})$ vs. $O(N^3)$), our recursive Strassen's implementations (even with NumPy/Eigen) generally did not outperform, or were only marginally faster (e.g., Python NumPy Strassen vs. Python NumPy Naive at N=64, C++ Eigen Strassen vs. C++ Eigen Naive at N=64), than the highly optimized naive matrix multiplication provided directly by NumPy and Eigen. This is a common phenomenon:
- Optimized libraries (like NumPy and Eigen) often use extremely sophisticated naive algorithms (e.g., cache-aware blocking, instruction-level parallelism, direct BLAS calls) that dramatically reduce the constant factor of the $O(N^3)$ complexity.
- The recursive overhead (function calls, memory management for temporary matrices) of our Strassen's implementation, even when using library primitives, can outweigh its asymptotic benefits for typical matrix sizes (up to N=1024 or even higher). True practical crossover points for Strassen's often occur at much larger N (e.g., >2000, >4000) or require specialized, heavily tuned Strassen-like algorithms implemented deep within the libraries themselves.

8. Conclusion

This benchmarking exercise unequivocally demonstrates that the choice of programming language and, more critically, the underlying implementation strategy for numerical operations, profoundly influences algorithm performance. For matrix multiplication, relying on low-level, highly optimized numerical libraries (like NumPy and Eigen) or languages with C/Fortran-backed core primitives (like R and Matlab) is paramount for achieving high performance. Pure language implementations, especially in Python and C++ using standard containers, introduce significant overheads that render them impractical for large-scale numerical computations. This project provides a robust empirical foundation for understanding these trade-offs, which is indispensable for designing efficient computational solutions in data science, machine learning, and scientific computing.

9. Future Work and Next Steps

This foundational analysis opens several avenues for continued research:

AlphaTensor Integration (Phase 3): Implement and benchmark a published AlphaTensor-derived matrix multiplication decomposition for small base cases, utilizing the optimized libraries. This would provide a comparison against both Strassen's and naive library-optimized methods.
Advanced Strassen's Optimizations: Investigate and implement techniques to reduce Strassen's recursive overhead (e.g., dynamic crossover points to switch to naive for small sub-problems, in-place operations, custom memory pools).
Linear Algebra Practice: Expand the project to include benchmarking of other fundamental linear algebra operations (e.g., matrix-vector products, transpose, determinants, inverses, eigenvalues using iterative methods) across the optimized language environments to deepen understanding of their computational aspects relevant to AI/ML.
Hardware Considerations: Explore the impact of different hardware architectures (e.g., GPUs) and parallelization strategies on performance.

10. How to Reproduce

To reproduce the benchmark results and plots:

Clone the repository:

git clone https://github.com/your-username/Strassens-Performance.git
cd Strassens-Performance

Ensure necessary tools are installed:
- C++ compiler
- Python 3.x with pandas, numpy, matplotlib
- R environment
- Matlab environment
- Eigen Library: Download Eigen
Run Benchmarks:
- C++ Pure: cd cpp && g++ Strassens_algorithm.cpp -o strassens_cpp -O3 -std=c++17 && ./strassens_cpp
- C++ Eigen: cd cpp && g++ Matmult_Eigen.cpp -o strassens_eigen_cpp -O3 -std=c++17 -I/path/to/eigen/parent/directory && ./strassens_eigen_cpp
- Python Pure: cd python && python Strassens_Algorithm_Matmult_Python.py
- Python NumPy: cd python && python Matmult_Numpy.py
- R: cd r && Rscript Strassens_MatMult.R
- Matlab: Open Matlab, navigate to matlab/ directory, and run Strassens_Matmult_Matlab in the command window.
Generate Plots:
- The plotting code is provided as a Jupyter Notebook visualizing-implementations.ipynb.
- Open this notebook in Jupyter Lab/Notebook and run all cells. The plots will be displayed and saved to the results/ directory.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
Execution_Time.png		Execution_Time.png
Implementation_Detailed.png		Implementation_Detailed.png
Implementation_Strassen.py		Implementation_Strassen.py
Implementation_Strassen.pyproj		Implementation_Strassen.pyproj
Language_Pure_plot.png		Language_Pure_plot.png
Matmult_Eigen.cpp		Matmult_Eigen.cpp
Matmult_Numpy.py		Matmult_Numpy.py
Matmult_Numpy.pyproj		Matmult_Numpy.pyproj
Optimized_2_numpy_py_output.png		Optimized_2_numpy_py_output.png
README.md		README.md
Ranking.png		Ranking.png
Strassen_Output_cpp.png		Strassen_Output_cpp.png
Strassen_Output_m.png		Strassen_Output_m.png
Strassen_Output_py.png		Strassen_Output_py.png
Strassen_Output_r.png		Strassen_Output_r.png
Strassens_Algorithm_Matmult_Python.py		Strassens_Algorithm_Matmult_Python.py
Strassens_Algorithm_Matmult_Python.pyproj		Strassens_Algorithm_Matmult_Python.pyproj
Strassens_Matmult.R		Strassens_Matmult.R
Strassens_Matmult_Matlab (1).m		Strassens_Matmult_Matlab (1).m
Strassens_algorithm.cpp		Strassens_algorithm.cpp
improvement.png		improvement.png
improvement_c++.png		improvement_c++.png
optimized_eigen_cpp_output.png		optimized_eigen_cpp_output.png
output_performance_impl.png		output_performance_impl.png
visualizing-implementations.ipynb		visualizing-implementations.ipynb
visualizing-implementations.ipynb.txt		visualizing-implementations.ipynb.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Comparative Performance Analysis of Strassen's Matrix Multiplication Across Programming Languages

Project Abstract

1. Introduction: The Research Question

2. The Algorithm: Strassen's Matrix Multiplication

3. Research Methodology

3.1. Phase 1: Pure Language Implementations (Benchmarking Fundamentals)

3.2. Phase 2: Library-Optimized Implementations (Benchmarking Real-World Performance)

4. Experimental Setup

5. Repository Structure

6. Results

6.1. Combined Average Execution Times (seconds per iteration)

6.2. Visualizations

7. Discussion and Interpretation of Results

8. Conclusion

9. Future Work and Next Steps

10. How to Reproduce

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Comparative Performance Analysis of Strassen's Matrix Multiplication Across Programming Languages

Project Abstract

1. Introduction: The Research Question

2. The Algorithm: Strassen's Matrix Multiplication

3. Research Methodology

3.1. Phase 1: Pure Language Implementations (Benchmarking Fundamentals)

3.2. Phase 2: Library-Optimized Implementations (Benchmarking Real-World Performance)

4. Experimental Setup

5. Repository Structure

6. Results

6.1. Combined Average Execution Times (seconds per iteration)

6.2. Visualizations

7. Discussion and Interpretation of Results

8. Conclusion

9. Future Work and Next Steps

10. How to Reproduce

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages