A CUDA-based 3D topology optimization solver using a multigrid-preconditioned conjugate gradient method.
- Matrix-free stencil operators for memory-efficient computation
- Multigrid V-cycle preconditioner for faster convergence
- Conjugate gradient solver with diagonal preconditioning
- Multi-GPU support for large-scale problems
- Double precision computation for numerical stability
- VTK output for visualization in ParaView
- NVIDIA GPU with CUDA Compute Capability 8.9+ (tested on RTX 4090)
- CUDA Toolkit 12.2+
- NVCC compiler
- Make
```
make clean
make
```
This will generate the executable `top3d_cuda`.
```
./top3d_cuda <nelx> <nely> <nelz> <num_mg_levels>
```
Parameters:
- `nelx`, `nely`, `nelz`: number of elements in the X, Y, and Z directions
- `num_mg_levels`: number of multigrid levels (typically 4)
Example:
```
# 16x8x8 problem with 4 multigrid levels
./top3d_cuda 16 8 8 4

# 64x32x32 problem
./top3d_cuda 64 32 32 4
```
The solver generates VTK files for visualization:
- `result_<nelx>_<nely>_<nelz>.vtk` - Optimized density field
Open the VTK file in ParaView to visualize the optimized structure.
- Method: SIMP (Solid Isotropic Material with Penalization)
- Optimizer: Optimality Criteria (OC) method
- Volume fraction: 12%
- Penalty factor: 3.0
- Filter radius: 1.5
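The OC update scales each density by a factor derived from its sensitivity and bisects the volume Lagrange multiplier until the volume constraint is met. A simplified host-side sketch follows (the real update runs in `optimization_kernels.cu`; the move limit `move = 0.2` and damping exponent `eta = 0.5` are the usual textbook choices, not values read from this solver):

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

// Optimality Criteria update: scale each density by (-dc/(lambda*dv))^eta,
// clamp to a move limit and to [0, 1], and bisect lambda until the volume
// constraint sum(rho)/n == volfrac is satisfied.
std::vector<double> ocUpdate(const std::vector<double>& rho,
                             const std::vector<double>& dc,   // compliance sensitivities (<= 0)
                             const std::vector<double>& dv,   // volume sensitivities (> 0)
                             double volfrac, double move = 0.2, double eta = 0.5) {
    std::vector<double> rhoNew(rho.size());
    double l1 = 1e-9, l2 = 1e9;
    while ((l2 - l1) / (l1 + l2) > 1e-6) {
        double lmid = 0.5 * (l1 + l2), vol = 0.0;
        for (size_t e = 0; e < rho.size(); ++e) {
            double scale = std::pow(-dc[e] / (lmid * dv[e]), eta);
            double r = std::clamp(rho[e] * scale, rho[e] - move, rho[e] + move);
            rhoNew[e] = std::clamp(r, 0.0, 1.0);
            vol += rhoNew[e];
        }
        // Too much material: the multiplier was too small.
        if (vol > volfrac * rho.size()) l1 = lmid; else l2 = lmid;
    }
    return rhoNew;
}
```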
- Method: Preconditioned Conjugate Gradient (PCG)
- Preconditioner: Diagonal (Jacobi)
- Convergence tolerance: 1e-5
- Max iterations: 500
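The PCG loop with a diagonal (Jacobi) preconditioner reduces to the standard recurrence with `z = r / diag`. A dense host-side sketch under those settings (the actual solver applies the matrix-free stencil on the GPU; `applyA` and the variable names here are illustrative):

```cpp
#include <vector>
#include <cmath>
#include <functional>

// Jacobi-preconditioned CG: solves A x = b given a matrix-free operator
// applyA and the diagonal of A. Returns the number of iterations used.
int pcgJacobi(const std::function<void(const std::vector<double>&, std::vector<double>&)>& applyA,
              const std::vector<double>& diag, const std::vector<double>& b,
              std::vector<double>& x, double tol = 1e-5, int maxIter = 500) {
    const size_t n = b.size();
    std::vector<double> r(n), z(n), p(n), Ap(n);
    applyA(x, Ap);
    double rz = 0.0, bnorm = 0.0;
    for (size_t i = 0; i < n; ++i) {
        r[i] = b[i] - Ap[i];
        z[i] = r[i] / diag[i];          // apply M^{-1} = D^{-1}
        rz += r[i] * z[i];
        bnorm += b[i] * b[i];
    }
    bnorm = std::sqrt(bnorm);
    p = z;
    for (int it = 0; it < maxIter; ++it) {
        applyA(p, Ap);
        double pAp = 0.0;
        for (size_t i = 0; i < n; ++i) pAp += p[i] * Ap[i];
        const double alpha = rz / pAp;
        double rnorm = 0.0;
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            rnorm += r[i] * r[i];
        }
        if (std::sqrt(rnorm) < tol * bnorm) return it + 1;   // relative residual test
        double rzNew = 0.0;
        for (size_t i = 0; i < n; ++i) { z[i] = r[i] / diag[i]; rzNew += r[i] * z[i]; }
        const double beta = rzNew / rz;
        rz = rzNew;
        for (size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    return maxIter;
}
```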
- Levels: 4 (configurable)
- Smoother: Damped Jacobi (ω = 0.6)
- Coarse grid: 200 Jacobi iterations
- Restriction: Full-weighting
- Prolongation: Trilinear interpolation
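The damped Jacobi smoother used at each level (and, run 200 times, as the coarse-grid solver) is a single fused update, x ← x + ω D⁻¹(b − Ax). A host-side sketch with ω = 0.6, where `applyA` again stands in for the matrix-free stencil:

```cpp
#include <vector>
#include <functional>

// One damped Jacobi sweep: x <- x + omega * D^{-1} (b - A x).
// omega = 0.6 targets the high-frequency error modes that the
// coarser multigrid levels cannot represent.
void dampedJacobiSweep(const std::function<void(const std::vector<double>&, std::vector<double>&)>& applyA,
                       const std::vector<double>& diag, const std::vector<double>& b,
                       std::vector<double>& x, double omega = 0.6) {
    std::vector<double> Ax(x.size());
    applyA(x, Ax);
    for (size_t i = 0; i < x.size(); ++i)
        x[i] += omega * (b[i] - Ax[i]) / diag[i];
}
```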
- Elements: 1-indexed, range [1, nelx] × [1, nely] × [1, nelz]
- Nodes: 1-indexed, range [1, nelx+1] × [1, nely+1] × [1, nelz+1]
- Element index: `i * (wrapy-1) * (wrapz-1) + k * (wrapy-1) + j`
- Node index: `i * wrapy * wrapz + k * wrapy + j`
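These formulas can be written as small helpers. `wrapy` and `wrapz` are the padded grid extents in Y and Z; they are treated here as plain parameters, since their exact relation to `nely`/`nelz` depends on the solver's padding scheme:

```cpp
#include <cstdint>

// Element index in the padded grid: (wrapy-1) x (wrapz-1) elements
// per X-slice. i, j, k are 1-indexed, per the conventions above.
inline uint64_t elementIndex(uint64_t i, uint64_t j, uint64_t k,
                             uint64_t wrapy, uint64_t wrapz) {
    return i * (wrapy - 1) * (wrapz - 1) + k * (wrapy - 1) + j;
}

// Node index in the padded grid: wrapy x wrapz nodes per X-slice.
inline uint64_t nodeIndex(uint64_t i, uint64_t j, uint64_t k,
                          uint64_t wrapy, uint64_t wrapz) {
    return i * wrapy * wrapz + k * wrapy + j;
}
```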
- Fixed: Left face (i=1), all DOFs = 0
- Load: Right face edge (i=nelx, k=1, j=1 to nely+1), force in -Z direction
- 8-node hexahedral elements with 24 DOFs per element
- 27-point stencil for each node (8 neighboring elements)
- Material interpolation: E(ρ) = Emin + ρ³(E0 - Emin)
This CUDA implementation is based on the OpenMP multi-GPU version but with some key differences:
| Feature | OpenMP Version | CUDA Version |
|---|---|---|
| Coarse grid solver | CHOLMOD (direct) | Jacobi (iterative) |
| CG iterations | 5-20 per step | 200-230 per step |
| Precision | Mixed (float/double) | Double precision |
| Final compliance (16x8x8) | ~8,235 | ~6,447 |
- Coarse grid solver: The OpenMP version uses the CHOLMOD sparse direct solver on the coarsest multigrid level, while the CUDA version runs 200 Jacobi iterations. This leads to:
  - Different convergence behavior
  - Different optimization paths
  - Different final structures
- Numerical precision: The CUDA version uses double precision for all computations to improve stability.
- Results: Both versions produce valid optimized structures, but they may differ due to the different coarse-grid solvers and the resulting optimization paths.
```
TopOpt-CUDA/
├── src/                         # Source files
│   ├── main.cu                  # Main entry point
│   ├── solver.cu                # Solver initialization
│   ├── cg_solver.cu             # CG solver with multigrid
│   ├── optimization.cu          # Topology optimization loop
│   ├── stencil_kernels.cu       # Matrix-free stencil operators
│   ├── multigrid_kernels.cu     # Restriction/prolongation
│   ├── optimization_kernels.cu  # Compliance/sensitivity
│   ├── vector_ops.cu            # Vector operations
│   ├── vtk_output.cu            # VTK file generation
│   ├── multi_gpu.cu             # Multi-GPU support
│   ├── halo_exchange.cu         # GPU communication
│   ├── distributed_cg.cu        # Distributed CG solver
│   └── distributed_stencil.cu   # Distributed stencil
├── include/                     # Header files
│   ├── definitions.h            # Type definitions
│   ├── solver.h                 # Solver data structures
│   ├── cuda_kernels.cuh         # Kernel declarations
│   └── multi_gpu.h              # Multi-GPU declarations
├── docs/                        # Documentation
├── Makefile                     # Build configuration
└── README.md                    # This file
```
Tested on dual NVIDIA RTX 4090 GPUs:
| Problem Size | Elements | DOFs | Time/Iteration | Memory |
|---|---|---|---|---|
| 16×8×8 | 1,024 | 11,286 | ~0.5s | <1 GB |
| 64×32×32 | 65,536 | 295,470 | ~2s | ~2 GB |
| 128×64×64 | 524,288 | 2,146,689 | ~15s | ~10 GB |
- Implement cuSOLVER direct solver for coarse grid to match OpenMP version performance
- Optimize memory access patterns for better GPU utilization
- Add support for different boundary conditions and load cases
- Implement continuation method for better convergence
- Add support for stress constraints and multiple materials
- Andreassen, E., Clausen, A., Schevenels, M., Lazarov, B. S., & Sigmund, O. (2011). Efficient topology optimization in MATLAB using 88 lines of code. Structural and Multidisciplinary Optimization, 43(1), 1-16.
- Aage, N., Andreassen, E., & Lazarov, B. S. (2015). Topology optimization using PETSc: An easy-to-use, fully parallel, open source topology optimization framework. Structural and Multidisciplinary Optimization, 51(3), 565-572.
- Liu, H., Zong, H., Shi, T., & Xia, Q. (2020). M-VCUT level set method for optimizing cellular structures. Computer Methods in Applied Mechanics and Engineering, 367, 113154.
This project is for research and educational purposes.
- Based on the OpenMP multi-GPU topology optimization framework
- Developed with assistance from Claude (Anthropic)
For questions or issues, please open an issue on GitHub.