Ideally, the ML framework used to run a model should just be an implementation detail. By decoupling your inference code from specific frameworks, you can easily keep up with the cutting edge.
How much overhead does Carton have?
Most of Carton is implemented in optimized async Rust code. Preliminary benchmarks with small inputs show an overhead of less than 100 microseconds (0.0001 seconds) per inference call.
We're continuing to optimize, including making better use of shared memory. This should bring overhead for models with large inputs down to similar levels.
What platforms does Carton support?
Currently, Carton supports the following platforms:
x86_64 Linux and macOS
aarch64 Linux (e.g. Linux on AWS Graviton)
aarch64 macOS (e.g. M1 and M2 Apple Silicon chips)
WebAssembly (metadata access only for now, but WebGPU runners are coming soon)
What is "a carton"?
A carton is the output of the packing step. It is a zip file that contains your original model and some metadata. Packing does not modify the original model, which avoids error-prone conversion steps.
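Because a carton is just a zip archive, you can inspect one with standard tooling. The sketch below builds a toy archive with the same overall shape (the untouched model file plus a metadata entry) and lists its contents with Python's `zipfile` module. The entry names and metadata format here are illustrative assumptions, not Carton's actual internal layout.

```python
import os
import tempfile
import zipfile

# Build a toy archive shaped like a carton: the original model bytes,
# stored as-is, plus a metadata file. Entry names are illustrative only.
tmpdir = tempfile.mkdtemp()
carton_path = os.path.join(tmpdir, "model.carton")

with zipfile.ZipFile(carton_path, "w") as zf:
    zf.writestr("model/model.pt", b"original model bytes")
    zf.writestr("carton.toml", 'model_name = "example"\n')

# Any zip tool can list what's inside -- the model entry is intact,
# byte-for-byte, alongside the metadata.
with zipfile.ZipFile(carton_path) as zf:
    names = zf.namelist()
    model_bytes = zf.read("model/model.pt")

print(names)
```

The key property this illustrates is that the model inside the archive is the same artifact you packed, so no conversion or re-export step sits between you and the original framework's serialized model.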
Why use Carton instead of ONNX?
ONNX converts models, while Carton wraps them. Under the hood, Carton uses the underlying framework (e.g. PyTorch) to actually execute a model. This matters because it makes it easy to use custom ops, TensorRT, etc. without changes. For some sophisticated models, conversion steps (e.g. to ONNX) can be problematic and require validation. By removing these conversion steps, Carton enables faster experimentation, deployment, and iteration.
With that said, we plan to support ONNX models within Carton. This lets you use ONNX if you choose, and it enables some interesting use cases (like running models in-browser with WASM).