Loading a model

Once you have a packed model, you can pass its file path or URL to the load method.

py
import asyncio
import cartonml as carton
import numpy as np

async def main():
    # Note: this might take a while the first time you use Carton.
    # Make sure to enable logging as described in the quickstart.
    model = await carton.load("https://carton.pub/google-research/bert-base-uncased")

    out = await model.infer({
        "input": np.array(["Today is a good [MASK]."])
    })
    print(out)
    # {
    #     'scores': array([[12.977381]]),
    #     'tokens': array([['day']], dtype='<U3')
    # }

asyncio.run(main())

Carton loads the model (caching it locally if necessary).
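
Loading from a local file works the same way; the path below is only a placeholder for wherever your packed model lives.

py
# Hypothetical local path to a packed model
model = await carton.load("/path/to/model.carton")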

If you need a packed model, take a look at the packing docs or explore the community model registry.

Load an unpacked model

Carton also supports loading an unpacked model via the load_unpacked method. This is conceptually the same as pack followed by load, but is implemented more efficiently internally. It supports all the options that load and pack support.

See the quickstart guide for an example.
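
For reference, a minimal sketch of a load_unpacked call is shown below. The path, runner name, and version are placeholders; see the packing docs for the options your framework expects.

py
# Hypothetical unpacked TorchScript model; pack-time options are passed directly.
model = await carton.load_unpacked(
    "/path/to/model.pt",
    runner_name = "torchscript",
    required_framework_version = "=2.0",
)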

Options

There are a few options you can pass in when loading a model, but none of them are required.

visible_device

Type: string

The device that is visible to this model.

Allowed values: "cpu" or a GPU index as a string (e.g. "0", "1").

The default is GPU 0 (or CPU if no GPUs are available).

Note: making a device visible does not guarantee that the model will use it; it is up to the model to actually use the device (e.g. by moving itself to the GPU if it sees one available).

Note: if a GPU index is specified but no GPUs are available, Carton will print a warning and attempt to fall back to CPU.

py
await carton.load(
    # ...
    visible_device = "0",
)
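
Assuming the literal string "cpu" is accepted (consistent with the CPU fallback noted above), pinning the model to CPU would look like this:

py
await carton.load(
    # ...
    # Keep the model on CPU even if GPUs are available
    visible_device = "cpu",
)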

override_runner_opts

Type: (see below)

Options to pass to the runner. These are runner-specific (e.g. PyTorch, TensorFlow, etc.).

Overrides are merged with the options set when packing the model.

These are sometimes used to configure thread-pool sizes, etc.

For allowed values, see the packing docs for each framework.

py
await carton.load(
    # ...
    override_runner_opts = {
        # For example, if we know this is a torchscript model and we want to set
        # threading configuration for running this model.
        "num_interop_threads": 4,
        "num_threads": 1,
    },
)

override_required_framework_version

Type: string

This is a semver version range that specifies the version of the framework that the model requires.

See https://docs.rs/semver/1.0.16/semver/enum.Op.html and https://docs.rs/semver/1.0.16/semver/struct.VersionReq.html for more details on version ranges.

This is useful if a model is restricted to a specific framework version range and you want to override that restriction.

Note: this is not guaranteed to work if the underlying model isn't compatible with the version range you specify.

py
await carton.load(
    # ...
    # If we know this is a python model and we want to force it to
    # run with a `3.10.x` version of python.
    override_required_framework_version = "=3.10",
)