
YOLO-NAS-DeepSparse.cpp

YOLO-NAS is a state-of-the-art object detector by Deci AI. This project implements the YOLO-NAS object detector in C++ with a DeepSparse backend to speed up inference. DeepSparse is an inference runtime by Neural Magic that can greatly improve inference performance on CPUs by leveraging sparsity.

Features

  • Supports both image and video inference.
  • Faster inference speed on CPUs.

Getting Started

The following instructions demonstrate how to build this project on a Linux system. Windows is currently not supported by the DeepSparse library.

Prerequisites

  • CMake v3.8+ - found at https://cmake.org/

  • GCC/G++ compiler - found at https://gcc.gnu.org/

  • Python 3.8+ - used to install the deepsparse library, which is required for the build (see the install example after this list). Available at https://www.python.org/downloads/.

  • OpenCV v4.0+ - available at https://opencv.org/releases/.

  • DeepSparse v1.6.0+ - available at https://github.com/neuralmagic/deepsparse.
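
DeepSparse is distributed as a Python package. A minimal setup sketch, assuming pip and a Python 3.8+ environment (the build locates the DeepSparse library from the installed package):

    # Install the DeepSparse runtime into the active Python environment
    pip install "deepsparse>=1.6.0"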

Building the project

  1. Set the OpenCV_DIR environment variable to point to your ../../opencv/build directory (if not already set), as shown below.
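
    For example, in Bash (the path is illustrative; adjust it to your OpenCV layout):

    # Let CMake's find_package(OpenCV) locate the OpenCV build tree
    export OpenCV_DIR=/path/to/opencv/build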

  2. Run the following build commands (Linux, Bash):

    cd <yolo-nas-deepsparse-cpp-directory>
    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    cd build
    make
  3. The compiled executable will be in the root of the build directory.

Inference

  1. Export the model to ONNX using the super_gradients Python package:

    from super_gradients.training import models
    
    # Load YOLO-NAS-S with pretrained COCO weights
    model = models.get("yolo_nas_s", pretrained_weights="coco")
    model.eval()
    # Prepare the model for export with a fixed 1x3x640x640 input
    model.prep_model_for_conversion(input_size=(1, 3, 640, 640))
    models.convert_to_onnx(
        model=model,
        prep_model_for_conversion_kwargs={"input_size": (1, 3, 640, 640)},
        out_path="yolo_nas_s.onnx",
    )
  2. To run inference, execute the following command:

    yolo-nas-deepsparse-cpp --model <ONNX_MODEL_PATH> [-i <IMAGE_PATH> | -v <VIDEO_PATH>] [--imgsz IMAGE_SIZE] [--gpu] [--iou-thresh IOU_THRESHOLD] [--score-thresh CONFIDENCE_THRESHOLD]
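
For example, to run on a single image with the model exported above (the file names are illustrative):

    # Detect objects in image.jpg using the exported YOLO-NAS-S model
    ./yolo-nas-deepsparse-cpp --model yolo_nas_s.onnx -i image.jpg --imgsz 640 --score-thresh 0.5 --iou-thresh 0.45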

Benchmarks

The following benchmarks were run on Google Colab using an Intel® Xeon® Processor E5-2699 v4 @ 2.20GHz with 2 vCPUs.

Backend                    Latency (ms/frame)  FPS   Implementation
PyTorch                    867.02              1.15  Native (model.predict() in super_gradients)
ONNX C++ (via OpenCV DNN)  962.27              1.04  Hyuto
ONNX Python                626.37              1.59  Hyuto
OpenVINO C++               628.04              1.59  Y-T-G
DeepSparse C++             565.75              1.83  Y-T-G

Authors

Acknowledgements

Thanks to @Hyuto for his work on the ONNX implementation of YOLO-NAS in C++, which was used in this project.

License

This project is licensed under the MIT License - see the LICENSE file for details. The DeepSparse Community edition is for evaluation, research, and non-production use only; see the DeepSparse Community License for more details.