Two-Stream CNN was proposed in the paper "Skeleton-based Action Recognition with Convolutional Neural Networks" for skeleton-based action recognition. It maps a skeleton sequence to an image, with the joint coordinates (x, y, z) becoming the image channels (R, G, B). The authors also designed a Skeleton Transformer module that automatically rearranges and selects the important skeleton joints.
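
The coordinate-to-channel mapping can be sketched in a few lines of NumPy. This is only an illustration, not the code from this repository: the function name `skeleton_to_image`, the `(frames, joints, 3)` layout, and the min-max scaling over the whole sequence are all assumptions.

```python
import numpy as np

def skeleton_to_image(sequence):
    """Map a skeleton sequence of shape (frames, joints, 3) to a uint8 image.

    The x, y, z coordinates become the R, G, B channels. Min-max scaling
    over the whole sequence is an assumption made for this sketch.
    """
    seq = np.asarray(sequence, dtype=np.float64)
    lo, hi = seq.min(), seq.max()
    # Avoid division by zero for a constant sequence.
    scale = hi - lo if hi > lo else 1.0
    image = np.round(255.0 * (seq - lo) / scale)
    return image.astype(np.uint8)

# Hypothetical input: 30 frames, 25 joints, random coordinates.
rng = np.random.default_rng(0)
demo_sequence = rng.normal(size=(30, 25, 3))
demo_image = skeleton_to_image(demo_sequence)
print(demo_image.shape, demo_image.dtype)
```

Each row of the resulting image is one frame and each column is one joint, so a 2D convolution sees temporal and joint neighbourhoods at once.
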
- Python3
- Keras
- h5py
- matplotlib
- numpy
The network consists of four modules: Skeleton Transformer, ConvNet, Feature Fusion, and Classification. The two streams take the raw joint coordinates (x, y, z) and the frame-to-frame differences of those coordinates as input, respectively.
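
As a rough sketch of how the two stream inputs relate, the snippet below derives the frame-difference stream from the raw stream. The function name `make_two_stream_inputs` is hypothetical, and padding the first frame with zeros so that both streams keep the same length is an assumption; `function/data_generator.py` may handle this differently.

```python
import numpy as np

def make_two_stream_inputs(sequence):
    """Return (raw, diff) arrays, both of shape (frames, joints, 3).

    raw  : the joint coordinates as given.
    diff : the difference between consecutive frames, with a zero first
           frame (an assumption of this sketch) to keep the lengths equal.
    """
    raw = np.asarray(sequence, dtype=np.float32)
    diff = np.zeros_like(raw)
    diff[1:] = raw[1:] - raw[:-1]
    return raw, diff

# Hypothetical input: 4 frames, 2 joints.
demo_seq = np.arange(4 * 2 * 3, dtype=np.float32).reshape(4, 2, 3)
demo_raw, demo_diff = make_two_stream_inputs(demo_seq)
print(demo_raw.shape, demo_diff.shape)
```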

- `function/data_generator.py`: generates the input NumPy arrays for the two streams
- `layers/transformer`: the Skeleton Transformer layer, implemented in Keras
- `network/`: a folder containing four files, each using a different feature fusion method
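
The idea behind the Skeleton Transformer can be illustrated as a linear recombination of joints: a weight matrix of shape `(N, M)` maps `N` input joints to `M` output joints. The NumPy sketch below is an assumption-laden illustration, not the Keras layer in `layers/transformer`; in the real layer the matrix would be a trainable weight, whereas here it is fixed by hand to show how such a matrix can select and reorder joints.

```python
import numpy as np

def skeleton_transform(sequence, weights):
    """Apply a joint-mixing matrix to a skeleton sequence.

    sequence : array of shape (frames, N, 3)
    weights  : array of shape (N, M); trainable in a real layer,
               fixed here for illustration.
    Returns an array of shape (frames, M, 3).
    """
    seq = np.asarray(sequence, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    # Each output joint m is a weighted sum of the input joints n.
    return np.einsum("tnc,nm->tmc", seq, w)

# Hypothetical example: keep joints 3 and 0 (in that order) out of 4 joints.
demo_joints = np.arange(2 * 4 * 3, dtype=np.float64).reshape(2, 4, 3)
selector = np.zeros((4, 2))
selector[3, 0] = 1.0
selector[0, 1] = 1.0
demo_selected = skeleton_transform(demo_joints, selector)
print(demo_selected.shape)
```

With a learned, dense matrix, each output joint becomes a weighted mixture of all input joints instead of a plain copy of one joint.
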
| Model | Accuracy (cross-subject, CS) |
|---|---|
| Baseline | 83.2% |
| My model | 80.7% |
Introducing an attention mechanism into the Skeleton Transformer module raises the accuracy to 82.1%.
If you have any questions, please feel free to contact me.
Duohan Liang (duohanl@outlook.com)