This package provides a complete, working example of deploying a trained Akida neural network model as a standalone C++ application on a Linux host. It bridges the gap between Python-based model training and production C++ deployment without requiring Python at runtime.
- Complete C++ inference pipeline (`src/inference.cpp`) with detailed comments explaining each API call
- Linux system implementation (`src/system_linux.cpp`) for required runtime hooks
- Model conversion script (`scripts/convert_model.py`) to transform `.fbz` models to C++ binaries
- Test input generator (`scripts/generate_test_input.py`) for validation against Python results
- Ready-to-build CMake configuration (`src/CMakeLists.txt`)
You have:
- ✅ A trained Keras model converted to Akida format (`.fbz` file)
- ✅ Validated the model works in Python using the Akida SDK
- ✅ An AKD1000 or AKD1500 device connected to a Linux host via PCIe
You want:
- 🎯 A standalone C++ application that runs inference without Python
- 🎯 Production-ready code you can integrate into your existing C++ codebase
- 🎯 Full control over the inference pipeline (batching, latency measurement, pre/post-processing)
Hardware:
- AKD1000 or AKD1500 connected via PCIe to a Linux host (Ubuntu/Debian recommended)
- PCIe driver installed and device visible
Software:
- CMake 3.16 or later
- C++17 compatible compiler (GCC 7+, Clang 5+)
- Python 3.7+ with the `akida` package (only needed for model conversion, not runtime)
- Trained Akida model file (`.fbz` format)
Verification Steps:
# 1. Check PCIe driver is loaded
lsmod | grep akida
# Should show: akida_dw_edma
# 2. Verify device is visible
akida devices
# Should list at least one device
# 3. Check permissions
ls -l /dev/akida*
# You should have read/write access (or run as root)

The Akida Engine is the C++ library that interfaces with the hardware. Deploy it using the Python SDK:
# Create a working directory
mkdir akida_deployment && cd akida_deployment
# Deploy the engine with host examples
akida engine deploy --dest-path . --with-host-examples

This creates an engine/ directory containing:
- `api/` - C++ API headers
- `host/` - Host-specific drivers (PCIe)
- `infra/` - Infrastructure interfaces
- `cmake/` - Build system modules
- `test/` - Example programs
- `libakida.a` - Static library
Verify the deployment:
# Build and run the included example
cd engine/test/simple_conv_v2
mkdir build
cmake . -B build
make -C build
./build/simple_conv_v2
# Should output: "Success!"
cd ../../..  # Return to akida_deployment/

Copy this customer_deployment_package/ directory into engine/test/:
# Assuming you received this package as a zip file
unzip customer_deployment_package.zip
cp -r customer_deployment_package engine/test/

Why this location? The CMake configuration uses relative paths to find the engine's build modules. Placing the package at engine/test/customer_deployment_package/ ensures these paths resolve correctly.
Your directory structure should now look like:
akida_deployment/
└── engine/
├── api/
├── host/
├── infra/
├── cmake/
├── libakida.a
└── test/
├── simple_conv_v2/
└── customer_deployment_package/ ← This package
├── README.md
├── src/
└── scripts/
Use the provided Python script to convert your trained .fbz model to C++ source files:
cd engine/test/customer_deployment_package/scripts
# Convert your model
python convert_model.py --model-path /path/to/your/model.fbz
# This generates:
# ../src/program.h - Header declaring the model binary
# ../src/program.cpp  - Source containing the model data

What happens during conversion:
- Loads your `.fbz` model using the Akida Python SDK
- Discovers the connected Akida device
- Maps the model to hardware (generates the binary program)
- Exports the program as C++ arrays (`program[]` and `program_len`)
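For reference, the generated header simply exposes the two symbols named above. The exact declarations may differ in the file convert_model.py writes for you, so treat the following as an illustrative sketch, not the literal generated output:

// Illustrative sketch of the generated ../src/program.h - the file actually
// produced by convert_model.py is authoritative.
#pragma once
#include <cstddef>
#include <cstdint>

extern const uint8_t program[];   // model binary mapped to the device
extern const size_t program_len;  // size of the binary in bytes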
Expected output:
===========================================================
Akida Model to C++ Converter
===========================================================
Using device: NSoC/2.0
Loading model: /path/to/model.fbz
Mapping model to hardware...
Model mapped successfully
Generating C++ files in: ../src/
✓ Created: ../src/program.h
✓ Created: ../src/program.cpp
Program size: 524288 bytes
===========================================================
Conversion Summary
===========================================================
Model file: /path/to/model.fbz
Input shape: (32, 32, 3)
Output shape: (1, 1, 10)
...
cd .. # Return to customer_deployment_package/
# Create build directory and compile
mkdir build
cmake src/ -B build
make -C build

Expected output:
-- Configuring Akida Inference Demo
-- Executable: akida_inference
-- Engine root: /path/to/engine
-- CMake module path: /path/to/engine/cmake
...
[ 25%] Building CXX object CMakeFiles/akida_inference.dir/main.cpp.o
[ 50%] Building CXX object CMakeFiles/akida_inference.dir/inference.cpp.o
[ 75%] Building CXX object CMakeFiles/akida_inference.dir/system_linux.cpp.o
[100%] Building CXX object CMakeFiles/akida_inference.dir/program.cpp.o
[100%] Linking CXX executable akida_inference
./build/akida_inference

Expected output:
======================================
Akida C++ Inference Demo
======================================
Discovering Akida devices...
Found 1 Akida device(s)
Using device: NSoC/2.0
Starting inference pipeline...
=== Step 1: Creating Hardware Device ===
Device created successfully
Device description: NSoC/2.0
=== Step 2: Programming Device ===
Program size: 524288 bytes
Device programmed successfully
Input shape: [32, 32, 3]
Output shape: [1, 1, 10]
Input format: Dense
Activation enabled: Yes
=== Step 3: Setting Batch Size ===
Batch size set to 1
=== Step 4: Preparing Input Tensor ===
Input data prepared: 3072 elements
=== Step 5: Convert to Sparse format if required ===
Model accepts dense input - no conversion needed
=== Step 6: Enqueueing Input ===
Input enqueued successfully
=== Step 7: Fetching Results ===
Results fetched successfully
=== Step 8: Dequantizing Output ===
Output dequantized to float
=== Step 9: Results ===
Output shape: [1, 1, 10]
Output dtype: float32
First 10 output values:
[0]: 0.123456
[1]: 0.234567
...
[9]: 0.987654
Predicted class: 7 (score: 0.987654)
=== Inference Complete ===
======================================
Inference completed successfully!
======================================
| File | Purpose |
|---|---|
| `main.cpp` | Entry point - discovers Akida devices via PCIe and calls `run_inference()` |
| `inference.h` | Header declaring the `run_inference()` function |
| `inference.cpp` | Core file - implements the complete inference pipeline with detailed comments |
| `system_linux.cpp` | Linux implementation of required runtime hooks (`msleep`, `time_ms`, `panic`, `kick_watchdog`) |
| `program.h` | Generated by `convert_model.py` - declares the model binary |
| `program.cpp` | Generated by `convert_model.py` - contains the model binary data |
| `CMakeLists.txt` | Build configuration |
The core pipeline follows these steps:
std::unique_ptr<akida::HardwareDevice> device = akida::HardwareDevice::create(driver);

Wraps the low-level driver with the high-level API.

akida::ProgramInfo program_info = device->program(program, program_len);

Loads the neural network model onto the chip. The ProgramInfo returned contains critical metadata about input/output shapes, data types, and memory requirements.

device->set_batch_size(1, false);

Configures how many samples are processed per inference call. Most deployment scenarios use batch size 1.
auto dense_input = akida::Dense::create_view(
input_data.data(),
akida::Shape{height, width, channels},
akida::DType::uint8
);

Wraps your input data in an akida::Tensor. Note that create_view() does not copy data - it creates a lightweight wrapper.
if (!program_info.input_is_dense()) {
auto sparse_input = akida::conversion::to_sparse(*dense_input);
input_tensor = std::move(sparse_input);
}

Some models require sparse input format. Check program_info.input_is_dense() and convert if necessary.

device->enqueue(*input_tensor);

Sends the input to the device and triggers inference. This is non-blocking.

std::unique_ptr<akida::Tensor> output = device->fetch();

Blocks until inference completes and returns the output tensor.
if (program_info.activation_enabled()) {
final_output = device->dequantize(*dense_output);
}

If the model uses quantized activations, the output is in int32 format and must be dequantized to float.

const float* data = final_output->data<float>();

Access the raw output data as a typed pointer.
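Putting these fragments together, a condensed sketch of the pipeline might look like the following. This is a sketch only: the akida include paths are omitted, error handling and the sparse-output path are skipped, and a template parameter is used where the exact driver type is not shown above. src/inference.cpp contains the complete, commented implementation.

// Condensed sketch of the pipeline - see src/inference.cpp for the full version.
// Akida headers (from the deployed engine/api directory) are omitted here.
#include <cstdint>
#include <memory>
#include <vector>
#include "program.h"  // program[] and program_len generated by convert_model.py

// 'Driver' stands for whatever akida::get_drivers() returns (see below).
template <typename Driver>
void run_inference_sketch(Driver& driver, std::vector<uint8_t>& input_data,
                          uint32_t height, uint32_t width, uint32_t channels) {
  // 1. Wrap the low-level driver with the high-level device API
  std::unique_ptr<akida::HardwareDevice> device = akida::HardwareDevice::create(driver);

  // 2. Program the model binary and keep its metadata
  akida::ProgramInfo program_info = device->program(program, program_len);

  // 3. One sample per inference call
  device->set_batch_size(1, false);

  // 4. Wrap the input buffer (create_view() does not copy)
  auto dense_input = akida::Dense::create_view(
      input_data.data(),
      akida::Shape{height, width, channels},
      akida::DType::uint8);

  // 5-6. Convert to sparse only if required, then enqueue (non-blocking)
  if (!program_info.input_is_dense()) {
    auto sparse_input = akida::conversion::to_sparse(*dense_input);
    device->enqueue(*sparse_input);
  } else {
    device->enqueue(*dense_input);
  }

  // 7. Fetch blocks until the result is ready
  std::unique_ptr<akida::Tensor> output = device->fetch();

  // 8. Dequantize when the model uses quantized activations
  // (assumes the fetched output is dense; see inference.cpp for other cases)
  if (program_info.activation_enabled()) {
    auto final_output = device->dequantize(*output);
    const float* data = final_output->data<float>();
    (void)data;  // 9. Post-process (softmax, argmax, ...) here
  }
}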
The Akida Engine requires the runtime to implement 4 functions declared in infra/system.h:
- `msleep(uint32_t ms)` - Sleep for milliseconds (uses POSIX `usleep`)
- `time_ms()` - Get current time in milliseconds (uses `clock_gettime`)
- `kick_watchdog()` - Service hardware watchdog (no-op on Linux host)
- `panic(const char* fmt, ...)` - Fatal error handler (prints to stderr and aborts)
These implementations work for any Linux host. For embedded targets (RTOS), you would implement these using your platform's APIs.
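As an illustration, a minimal Linux implementation of these hooks might look like the sketch below. The return types and linkage are assumptions; the declarations in infra/system.h and the shipped src/system_linux.cpp are authoritative.

// Minimal sketch of the Linux runtime hooks (see src/system_linux.cpp for the real one).
// Signatures are assumptions - they must match infra/system.h exactly.
#include <cstdarg>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <unistd.h>

void msleep(uint32_t ms) {
  // POSIX usleep takes microseconds
  usleep(static_cast<useconds_t>(ms) * 1000);
}

uint32_t time_ms() {
  // Monotonic clock converted to milliseconds
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<uint32_t>(ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}

void kick_watchdog() {
  // No hardware watchdog to service on a Linux host
}

void panic(const char* fmt, ...) {
  // Print the fatal error to stderr and abort the process
  va_list args;
  va_start(args, fmt);
  vfprintf(stderr, fmt, args);
  va_end(args);
  fputc('\n', stderr);
  abort();
}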
auto drivers = akida::get_drivers();

This function scans for Akida devices connected via PCIe. It returns a vector of HardwareDriver smart pointers.
Note: akida::get_drivers() is specific to the host PCIe deployment. For embedded systems, you would instead create chip-specific drivers:
- `akida::BareMetalDriver` for AKD1000
- `akida::Akd1500SpiDriver` for AKD1500 over SPI
To validate that your C++ pipeline produces the same results as Python, use the test input generator:
cd scripts/
# Generate C++ input files from a numpy array
python generate_test_input.py \
--input-file /path/to/test_input.npy \
--input-shape 32,32,3 \
--input-type uint8 \
--model-path /path/to/model.fbz \
--output-dir ../src/

This creates:
- `test_input.h` / `test_input.cpp` - C++ arrays with your test data
- `expected_output.npy` - Python inference results for comparison
Modify inference.cpp to use the test input:
#include "test_input.h" // Add this
// In run_inference(), replace the test pattern with:
std::vector<uint8_t> input_data(test_input, test_input + test_input_len);

Rebuild and run. Compare the C++ output values with expected_output.npy.
Expected differences:
- Small floating-point differences (< 1e-5) are normal due to rounding
- Larger differences may indicate issues with input preprocessing or format conversion
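If you also export the expected values into a C++ array (for example by adapting generate_test_input.py), a simple element-wise check might look like the sketch below. The `expected_output` and `expected_output_len` names are hypothetical, not symbols generated by the package.

// Hypothetical comparison helper - expected_output / expected_output_len are
// illustrative names for a C++ copy of the Python reference results.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>

bool outputs_match(const float* actual, const float* expected_output,
                   size_t expected_output_len, float tolerance = 1e-5f) {
  float max_diff = 0.0f;
  for (size_t i = 0; i < expected_output_len; ++i) {
    max_diff = std::max(max_diff, std::fabs(actual[i] - expected_output[i]));
  }
  printf("Max absolute difference vs Python: %g\n", max_diff);
  return max_diff < tolerance;  // < 1e-5 is expected for float outputs
}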
The current inference.cpp uses a simple test pattern. Replace it with your actual data source:
Example: Load from image file (requires OpenCV)
#include <opencv2/opencv.hpp>
// In run_inference():
cv::Mat img = cv::imread("input.jpg");
cv::resize(img, img, cv::Size(32, 32));
cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
std::vector<uint8_t> input_data(img.data, img.data + (32 * 32 * 3));

Example: Real-time sensor data
// Read from sensor API
SensorReading reading = sensor.read();
std::vector<uint8_t> input_data(reading.data, reading.data + reading.size);

Common pre-processing steps for image models:
// Normalize to [0, 1]
std::vector<float> normalized(input_size);
for (int i = 0; i < input_size; i++) {
normalized[i] = input_data[i] / 255.0f;
}
// Create tensor with float data
auto dense_input = akida::Dense::create_view(
normalized.data(),
shape,
akida::DType::float32
);

Softmax (convert logits to probabilities):
std::vector<float> probabilities(output_size);
float sum = 0.0f;
for (int i = 0; i < output_size; i++) {
probabilities[i] = std::exp(data[i]);
sum += probabilities[i];
}
for (int i = 0; i < output_size; i++) {
probabilities[i] /= sum;
}
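The threshold check below uses a `predicted_class` index. A minimal way to obtain it (a plain argmax over the probabilities from the softmax example above, not an Akida API call) is:

// Pick the class with the highest probability (plain argmax over the output).
int predicted_class = 0;
for (int i = 1; i < output_size; i++) {
  if (probabilities[i] > probabilities[predicted_class]) {
    predicted_class = i;
  }
}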
Threshold-based decisions:

const float CONFIDENCE_THRESHOLD = 0.8f;
if (probabilities[predicted_class] > CONFIDENCE_THRESHOLD) {
printf("Confident prediction: class %d\n", predicted_class);
} else {
printf("Low confidence - rejecting prediction\n");
}

For simpler use cases where you don't need fine-grained control, use predict():
// Combines enqueue() + fetch() in one call
auto output = device->predict(*input_tensor);
if (!output) {
// Handle error
}

This is more convenient but gives less control (e.g., you can't measure enqueue/fetch separately).
// Enable hardware clock counter
device->toggle_clock_counter(true);
// Run inference
device->enqueue(*input_tensor);
auto output = device->fetch();
// Read cycle count
uint64_t cycles = device->read_clock_counter();
device->toggle_clock_counter(false);
// Convert to time (assumes 375 MHz clock for AKD1000)
double latency_ms = (cycles / 375000.0);
printf("Inference latency: %.2f ms\n", latency_ms);For throughput-oriented applications, process multiple samples per batch:
For throughput-oriented applications, process multiple samples per batch:

// Set batch size
device->set_batch_size(8, false);
// Prepare batched input (8 samples concatenated)
std::vector<uint8_t> batch_data(8 * input_size);
// ... fill with 8 samples ...
auto batch_input = akida::Dense::create_view(
batch_data.data(),
akida::Shape{8 * height, width, channels}, // Batch in first dimension
akida::DType::uint8
);
// Process batch
device->enqueue(*batch_input);
auto batch_output = device->fetch();
// Extract individual results
// Output shape: [8 * out_height, out_width, out_channels]Error:
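To split the batched output back into per-sample results, index into the output buffer one sample at a time. The sketch below assumes the layout shown in the comment above; `output_data`, `out_height`, `out_width`, and `out_channels` are hypothetical names standing in for the dequantized float results and the model's output dimensions.

// Assumed layout: samples stacked along the first dimension, each sample
// holding out_height * out_width * out_channels values.
const size_t sample_size = out_height * out_width * out_channels;
for (int n = 0; n < 8; n++) {
  const float* sample = output_data + n * sample_size;
  // sample points at the n-th result; post-process it like the
  // single-sample examples above (softmax, argmax, thresholding, ...)
  printf("Sample %d, first output value: %f\n", n, sample[0]);
}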
ERROR: No Akida devices found!
Solutions:
- Check PCIe driver:
  lsmod | grep akida
  # Should show: akida_dw_edma
  # If not loaded:
  sudo modprobe akida_dw_edma
- Verify device visibility:
  akida devices
  # Should list at least one device
  # If empty, check dmesg:
  dmesg | grep akida
- Check permissions:
  ls -l /dev/akida*
  # Should have rw permissions
  # If not, either run as root or add your user to the akida group:
  sudo usermod -a -G akida $USER
  # Then log out and back in
- Hardware connection:
  - Verify the PCIe card is seated properly
  - Check lspci | grep -i brain to see if the card is detected
Error:
ERROR: Failed to create hardware device
Device is busy (locked by another process)
Solution: Only one process can access the device at a time. Close any other applications using the device:
# Check for processes using /dev/akida*
lsof /dev/akida*
# Kill them if safe
kill <pid>

Error:
CMake Error: Could not find akida-model module
Solution:
The CMake module path is incorrect. Verify the package is placed at engine/test/customer_deployment_package/. The CMakeLists.txt expects this exact location.
If you must place it elsewhere, edit src/CMakeLists.txt line 21:
set(CMAKE_MODULE_PATH
"${CMAKE_CURRENT_LIST_DIR}/<relative_path_to_engine>/cmake"
${CMAKE_MODULE_PATH}
)

Error:
fatal error: api/akida/hardware_device.h: No such file or directory
Solution: The include path is incorrect. Verify that:
- The engine was deployed with `akida engine deploy`
- You're building from the correct directory
- The `CMakeLists.txt` include path (line 63) correctly points to the engine root
Error:
PANIC: Scratch memory too small
Solution: The model requires more scratch memory than available on the device. This typically means:
- The model is too large for the device
- The batch size is too large
Try:
- Reducing batch size: `device->set_batch_size(1, false)`
- Using a smaller model
- Using a device with more memory
Error:
ERROR: Failed to program device - invalid ProgramInfo
Solution: The model binary is incompatible with the device. This can happen if:
- The model was converted for a different chip (AKD1000 vs AKD1500)
- The `program.cpp` file is corrupted
- The model file was not properly converted
Re-run convert_model.py and ensure it completes without errors.
Issue: C++ inference results differ significantly from Python.
Common causes:
- Input format mismatch:
  - Check that input data type matches (uint8 vs float)
  - Verify preprocessing is identical (normalization, scaling)
  - Ensure input shape is correct (HWC vs CHW)
- Sparse conversion:
  - If `input_is_dense()` returns `false`, ensure you're converting to sparse
  - Check that the conversion parameters match Python
- Dequantization:
  - Verify the `activation_enabled()` check is correct
  - Ensure you're calling `dequantize()` when needed
- Test input data:
  - Use `generate_test_input.py` to create test data from the exact same numpy array used in Python
  - Compare outputs element-by-element
Acceptable differences:
- Floating-point differences < 1e-5 are normal due to rounding
- Quantized models may have larger differences (< 1e-3) but should predict the same class
- Replace test input with your data source (file, camera, sensor)
- Add necessary preprocessing (normalization, resizing, color conversion)
- Add postprocessing (softmax, argmax, thresholding)
- Implement error handling and logging for production
- Add latency measurement and performance monitoring
- Test with diverse inputs to validate correctness
- Optimize batch size for your throughput requirements
- Add unit tests for the inference pipeline
- Document model-specific requirements (input format, class labels, etc.)
- Minimize data copies:
  - Use `create_view()` instead of `create()` when possible
  - Avoid unnecessary format conversions
- Batch processing:
  - For throughput-critical applications, use larger batch sizes
  - Profile to find the optimal batch size for your latency requirements
- Pipelining:
  - While one batch is being processed, prepare the next batch (see the sketch below)
  - Use multiple devices in parallel if available
- Asynchronous I/O:
  - Don't block on input/output while inference is running
  - Use separate threads for data acquisition and inference
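As a concrete illustration of the pipelining idea, the single-threaded sketch below overlaps host-side input preparation with on-device compute, relying on the non-blocking enqueue() / blocking fetch() behaviour described earlier. `num_batches`, `inputs`, and `prepare_batch()` are hypothetical placeholders for your own data handling.

// While batch n runs on the device, the host prepares batch n+1.
inputs[0] = prepare_batch(0);
for (int n = 0; n < num_batches; n++) {
  device->enqueue(*inputs[n]);              // non-blocking: device starts working
  if (n + 1 < num_batches) {
    inputs[n + 1] = prepare_batch(n + 1);   // overlaps with device compute
  }
  auto output = device->fetch();            // blocks until batch n completes
  // ... post-process output ...
}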
For deeper understanding of the Akida C++ API, consult the full documentation:
- Getting Started Guide - Overview of the inference workflow
- Hardware Device API - Detailed API reference for `HardwareDevice`
- Tensor Operations - Working with Dense and Sparse tensors
- Input Conversion - Format conversion utilities
- Program Info - Understanding model metadata
- Device Drivers - Implementing custom drivers for embedded systems
- Build System - Advanced CMake configuration
These documents are included in the engine deployment at engine/docs/.
This package is provided as a reference implementation to help you get started. The code is heavily commented to explain the API usage and design decisions.
For issues specific to:
- This package: Review the code comments and troubleshooting section
- Akida Engine API: Consult the full documentation in `engine/docs/`
- Model accuracy: Verify against Python inference using `generate_test_input.py`
- Performance: See the optimization section above
This package is provided as-is for use with BrainChip Akida hardware. The Akida Engine library is proprietary software licensed by BrainChip.
Document Version: 1.0
Last Updated: 2024
Compatible with: Akida Engine 2.x, AKD1000, AKD1500