A simple, limited scope model server for JIT compiled PyTorch models.
The key idea is that one should be able to spin up a simple HTTP service that should just work, and be able to handle inbound requests to JIT compiled models. We want to do a few key things to make this work well:
- Model caching - store deserialized
torch::jit::Modulepointers in an (modification-threadsafe) LRU cache so we don't need to pay deserialization cost every roundtrip - JSON marshaling for tensor types.
- Speed!
The build is suboptimal - it would be great to make the CMake config better.
Clone the repo (and submodule) with:
git clone --recursive https://github.com/lukedeo/torch-servingWe expect you to have libtorch unpacked somewhere (available here), and CMake available (as well as a C++11 compliant compiler).
This is also achievable using a python installed version of torch. We provide a script to locate the path to libtorch in your python installation.
Suspected Compatible PyTorch versions: torch<=1.6.0>=1.4.0
Run:
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
# OR
cmake -DCMAKE_PREFIX_PATH=$(../scripts/libtorch-root) ..
makeThe executable will be apps/torch-serving. Go ahead and move that up a directory:
mv apps/torch-serving .. && cd ..Run the tests (in the parent directory) with:
build/tests/test-torch-servingNow, ensure you have Python & torch==1.4.0 installed, and run
python example/create_model.pywhich creates a dumb big model with interesting inputs and outputs (i.e., List[torch.Tensor], etc.) and runs a JIT trace, saving to model-example.pt.
From the repo directory (after following the build), just run ./torch-serving.
In another terminal, run a request through (maybe pipe through jq if you've got that installed)!
curl -X POST \
--data @example/post-data.json \
localhost:8888/serve?servable_identifier=model-example.ptwhich should output:
{
"type": "generic_dict",
"value": {
"out": [
[
{
"data_type": "float32",
"shape": [1, 3],
"type": "tensor",
"value": [19.27, 5.28, 3.72]
},
{
"data_type": "float32",
"shape": [1, 3],
"type": "tensor",
"value": [60, 33, 33]
}
],
{
"type": "string",
"value": "Hello!"
},
{
"data_type": "int64",
"type": "scalar",
"value": 30
}
]
}
}Note that we represent tensors unraveled and specify a shape, where you can do tensor.tensor(unraveled_tensor).reshape(shape).
- CI (Someone feel like setting up GH Actions?)
- Documentation
- Better build
- Support
type: imagefrom JSON (base64 encoding) - Wire up the cache invalidation probability to the API.