CPU Inference
When GPU resources are limited, some model layers can be offloaded to the CPU; if no GPU is available at all, the model runs entirely on the CPU.
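The split between GPU and CPU layers can be sketched as follows. This is a minimal, hypothetical helper (the function name, parameters, and VRAM-per-layer estimate are illustrative, not part of any real deployment API): as many layers as fit in free VRAM stay on the GPU, and the remainder are offloaded to the CPU.

```python
def split_layers(total_layers: int, free_vram_mb: int, vram_per_layer_mb: int):
    """Hypothetical sketch: decide how many layers run on the GPU.

    Returns (gpu_layers, cpu_layers). With no free VRAM, all layers
    fall back to the CPU (full CPU inference).
    """
    if free_vram_mb <= 0 or vram_per_layer_mb <= 0:
        # No usable GPU: everything runs on the CPU.
        return 0, total_layers
    # Keep as many layers on the GPU as the free VRAM allows.
    gpu_layers = min(total_layers, free_vram_mb // vram_per_layer_mb)
    return gpu_layers, total_layers - gpu_layers


# Example: a 32-layer model, 8 GB free VRAM, ~500 MB per layer
# puts 16 layers on the GPU and offloads 16 to the CPU.
print(split_layers(32, 8000, 500))
```

In practice the deployment backend makes this decision automatically; the sketch only illustrates why a partially offloaded model reports a specific CPU layer count after deployment.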
To deploy the model with CPU offloading, enable the Allow CPU Offloading option during deployment.
Once the deployment is complete, you can see how many layers have been offloaded to the CPU.
Next, you can test the model's inference performance in the Playground.