This project explores efficient matrix multiplication on FPGA hardware, focusing on optimizing computation speed using serial, parallel, and pipelined processing methods. Communication between the PC and FPGA is implemented through UART or Ethernet.
- U.L. Mohamed Abshar Shihab
- Imad ud Din
- Kanwar M. Umar
- Establish UART communication between the PC and FPGA.
- Implement and test matrix multiplication for sizes from 2x2 up to 10x10 on the FPGA using Verilog.
- Optimize matrix multiplication using pipelining for increased computational speed.
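As a rough illustration of these objectives, the sketch below gives a software reference for the multiplication and an idealized cycle model for pipelining. It is not taken from the repository's Verilog. The function names, the `stages` parameter, and the assumption of one multiply-accumulate (MAC) issued per clock are all hypothetical, so the cycle counts are a textbook model rather than measurements of this design.

```python
def matmul_reference(a, b):
    """Multiply two matrices given as lists of rows.

    Returns (product, mac_count), where mac_count is the number of
    multiply-accumulate operations performed (n * m * k in total).
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    product = [[0] * m for _ in range(n)]
    macs = 0
    for i in range(n):
        for j in range(m):
            acc = 0
            for t in range(k):
                acc += a[i][t] * b[t][j]  # one MAC per inner iteration
                macs += 1
            product[i][j] = acc
    return product, macs


def ideal_cycles(operations, stages, pipelined):
    """Idealized cycle count for a datapath with `stages` stages (assumed model).

    Unpipelined: every operation occupies all stages before the next starts.
    Pipelined: after the pipeline fills, one operation completes per cycle.
    """
    if operations == 0:
        return 0
    if pipelined:
        return stages + operations - 1
    return stages * operations


if __name__ == "__main__":
    product, macs = matmul_reference([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, macs)
    # A 10x10 product needs 10**3 MACs; compare both models for 3 stages.
    print(ideal_cycles(1000, 3, pipelined=False), ideal_cycles(1000, 3, pipelined=True))
```

Under this model, the benefit of pipelining grows with matrix size, because the fill latency (`stages - 1` cycles) is paid once while the per-operation cost drops to a single cycle.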
The-fast-matrix-multiplication-on-fpga/
├── 2x2 Matrix_Multiplication/
├── Uart_Testing/
├── complete-matrix-multiplication/
│   ├── 16-Bit-Transmitter/
│   │   ├── Outputs/
│   │   │   ├── 16bit-2x2.png
│   │   │   └── 16bit-3x3.png
│   │   ├── Python Code/
│   │   │   └── 16bit.py
│   │   ├── BaudRateGenerator.v
│   │   ├── Uart8Receiver.v
│   │   ├── Uart8Transmitter.v
│   │   ├── Uart8_Matrix.v
│   │   ├── UartStates.vh
│   │   └── ucf.ucf
│   ├── Matrix-Multiplication-with-pipeline/
│   │   ├── Outputs/
│   │   │   ├── 3x3-p.png
│   │   │   ├── 4x4-p.png
│   │   │   ├── 5x5-p.png
│   │   │   └── 10x10-p.png
│   │   ├── Python Code/
│   │   │   └── upto_Ten_Time.py
│   │   ├── BaudRateGenerator.v
│   │   ├── Uart8Receiver.v
│   │   ├── Uart8Transmitter.v
│   │   ├── Uart8_pip.v
│   │   ├── UartStates.vh
│   │   └── ucf.ucf
│   └── Matrix_Multiplication/
│       ├── Outputs/
│       │   ├── 3x3.png
│       │   ├── 4x4.png
│       │   ├── 5x5.png
│       │   └── 10x10.png
│       ├── Python Code/
│       │   └── upto_Ten_Time.py
│       ├── BaudRateGenerator.v
│       ├── Uart8Receiver.v
│       ├── Uart8Transmitter.v
│       ├── Uart8_Matrix.v
│       ├── UartStates.vh
│       └── ucf.ucf
└── README.md
- Clone the repository:
  `git clone https://github.com/Abshar-Shihab/The-fast-matrix-multiplication-on-fpga.git`
- Navigate to the appropriate folder for the desired module.
- Follow the Python script in each folder to test the corresponding functionality.
  - Use `16bit.py` for 16-bit transmitter tests.
  - Use `upto_Ten_Time.py` for matrix multiplication tests.
- Analyze the output images in the `Outputs` folder to validate the results.
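For readers who want to write their own host-side check, here is a minimal sketch of how matrix data might be framed for an 8-bit UART link. It does not reproduce `16bit.py` or `upto_Ten_Time.py`. The wire format is assumed: one byte per input element, row-major order, and results returned as 16-bit big-endian words. The helper names `pack_matrix` and `unpack_results` are hypothetical.

```python
import struct


def pack_matrix(matrix):
    """Flatten a matrix row by row into one byte per element.

    Assumed format: each element fits in 8 bits (0..255), matching an
    8-bit UART frame. bytes() raises ValueError for out-of-range values.
    """
    return bytes(value for row in matrix for value in row)


def unpack_results(payload, rows, cols):
    """Decode rows*cols unsigned 16-bit words (assumed big-endian) into a matrix."""
    count = rows * cols
    if len(payload) != 2 * count:
        raise ValueError(f"expected {2 * count} bytes, got {len(payload)}")
    values = struct.unpack(f">{count}H", payload)
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


if __name__ == "__main__":
    tx = pack_matrix([[1, 2], [3, 4]]) + pack_matrix([[5, 6], [7, 8]])
    print(tx.hex())
    # Stand-in for bytes read back from the board (no hardware needed here).
    rx = struct.pack(">4H", 19, 22, 43, 50)
    print(unpack_results(rx, 2, 2))
```

With real hardware, the packed bytes would be written to the serial port and the reply read back, for example with a library such as pyserial. The decoded matrix can then be compared against a software product before looking at the images in `Outputs`.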
- Explore larger matrix sizes and improve hardware utilization.
- Compare FPGA performance against CPU and GPU implementations.
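One possible starting point for the CPU side of that comparison is sketched below, using only the Python standard library. The function names and the choice of random 8-bit inputs are assumptions, and pure-Python timings are only a coarse baseline; an optimized numeric library would give a stronger CPU reference.

```python
import random
import timeit


def cpu_matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))  # transpose b once so columns can be iterated directly
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def time_cpu_matmul(n, repeats=5, seed=0):
    """Return the best wall-clock time in seconds for one n x n product."""
    rng = random.Random(seed)
    a = [[rng.randrange(256) for _ in range(n)] for _ in range(n)]
    b = [[rng.randrange(256) for _ in range(n)] for _ in range(n)]
    timings = timeit.repeat(lambda: cpu_matmul(a, b), number=1, repeat=repeats)
    return min(timings)  # the minimum is least affected by system noise


if __name__ == "__main__":
    for n in range(2, 11):
        print(f"{n}x{n}: {time_cpu_matmul(n) * 1e6:.1f} us")
```

A fair comparison would also need to separate compute time from UART transfer time on the FPGA side, since the link can dominate for matrices this small.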