Step-by-Step Guide for PIDNet Deployment¶

PIDNet is a well-known model for performing semantic segmentation.

This guide offers a comprehensive guide on how to use our model tools to compile and test the newly downloaded PIDNet model.

The article is divided into two sections:

Preparing and verifying the AI model for NPU (NEF) on a Linux PC
Running the AI NPU model (NEF) on KNEO Pi

Preparing and Verifying the AI Model for NPU (NEF) on a Linux PC¶

Step 1: Set Up the Environment and Data¶

Step 1-1: Set up the toolchain environment

First, we need to download the latest toolchain Docker image which includes all the necessary tools for the process.
```
docker pull kneron/toolchain:latest
```
Start the Docker container with a local folder mounted inside the container.
```
docker run --rm -it -v /your/folder/path/for/docker_mount:/data1 kneron/toolchain:latest
```
Install the necessary Python packages for PIDNet
```
pip install opencv-python-headless
pip install torchvision --index-url https://download.pytorch.org/whl/cpu
```
Step 1-2: Clone the model

Navigate to the mounted folder and clone the public PyTorch-based PIDNet model from GitHub https://github.com/XuJiacong/PIDNet.git using the following command:
```
cd /data1 && git clone https://github.com/XuJiacong/PIDNet.git
```
Check the model's documentation https://github.com/XuJiacong/PIDNet?tab=readme-ov-file#models to obtain the download link for the PIDNet-S pretrained pt model (PIDNet_S_Cityscapes_test.pt). Once downloaded, save the model to /data1 (or /your/folder/path/for/docker_mount)
Step 1-3: Prepare the images for model quantization

We need to prepare some images in the mounted folder. Example input images can be found at http://doc.kneron.com/docs/toolchain/res/test_image10.zip.

Here’s how you can obtain it:
```
cd /data1
wget http://doc.kneron.com/docs/toolchain/res/test_image10.zip
unzip test_image10.zip
```
Now we have images in folder test_image10/ at /data1.

Important

These images are provided as examples. For improved quantization accuracy, users may need to select their own data.

We also require additional images for accuracy testing. However, due to the complexity of the documentation, we are using only one image (PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png) in the toolchain Docker for testing purposes.

Step 2: Import KTC and required lib in python shell¶

Now, we will go through the entire toolchain flow using the KTC (Kneron Toolchain) Python API in a Python shell.

run `python" to open to python shell:

Figure 1. python shell

import KTC and others necessary libs

import ktc
import os
import onnx
from PIL import Image
import numpy as np

import sys
sys.path.append('./PIDNet/tools')
from custom import load_pretrained
import models

import torch

Step 3: Convert and optimize the pretrain model¶

We need to first convert the pretrained .pt model to an ONNX model.

model = models.pidnet.get_pred_model('pidnet-s', 19)
model = load_pretrained(model, './PIDNet_S_Cityscapes_test.pt')
model.eval()

dummy_input_1 = torch.randn(1, 3, 480, 640, device="cpu")
save_path = "./{}.onnx".format('kneopi_pidnet_s_480x640')     
torch.onnx.export(
    model, 
    (dummy_input_1), 
    save_path, 
    verbose=False, 
    keep_initializers_as_inputs=True, 
    opset_version=11)

The exported ONNX model will be saved as /data1/kneopi_pidnet_s_480x640.onnx (or /your/folder/path/for/docker_mount/kneopi_pidnet_s_480x640.onnx). You can view the model architecture using Netron.

In addition to the conversion, we also need to optimize the model to ensure it is compatible and efficient for our hardware.

ONNX_PATH = "kneopi_pidnet_s_480x640.onnx" 
m = onnx.load(ONNX_PATH)
md_opt = ktc.onnx_optimizer.onnx2onnx_flow(m)

We now have the optimized ONNX model stored in the variable, m. You can save this ONNX model (m) to disk for further inspection, such as using Netron or onnxruntime.

onnx.save(md_opt,'kneopi_pidnet_s_480x640.opt.onnx')

Here we save it to /data1/kneopi_pidnet_s_480x640.opt.onnx for further verification in Step 5.

Step 4: IP Evaluation¶

To ensure the ONNX model functions as expected, it is important to evaluate its performance and check for any unsupported operators or CPU nodes within the model.

km = ktc.ModelConfig(11111, "0001", "730", onnx_model=md_opt)

# npu(only) performance simulation
eval_result = km.evaluate()
print("\nNpu performance evaluation result:\n" + str(eval_result))

You can find the estimated FPS (NPU only) and a detailed report in the kneron_flow/model_fx_report.json and kneron_flow/model_fx_report.html files.

cat kneron_flow/model_fx_report.json

{
    "docker_version": "kneron/toolchain_debug:shared_v_20241208",
    "comments": "",
    "kdp730/input bitwidth": "int8",
    "kdp730/output bitwidth": "int8",
    "kdp730/cpu bitwidth": "int8",
    "kdp730/datapath bitwidth": "int8",
    "kdp730/weight bitwidth": "int8",
    "kdp730/ip_eval/fps": "47.0866",
    "kdp730/ip_eval/ITC(ms)": "21.2375 ms",
    "kdp730/ip_eval/C(GOPs)": "6.93943e+09",
    "kdp730/ip_eval/RDMA bandwidth GB/s": "8 GB/s",
    "kdp730/ip_eval/WDMA bandwidth GB/s": "8 GB/s",
    "kdp730/ip_eval/GETW bandwidth GB/s": "4.5 GB/s",
    "kdp730/ip_eval/RV(mb)": 43.6495,
    "kdp730/ip_eval/WV(mb)": 39.3821,
    "kdp730/ip_eval/cpu_node": "AveragePool: /spp/scale1/scale1.0/AveragePool, /spp/scale2/scale2.0/AveragePool, /spp/scale3/scale3.0/AveragePool",
    "kdp730/kne": "models_730.kne",
    "kdp730/nef": "models_730.nef",
    "kdp730/bie": "input.kdp730.scaled.release.bie",
    "kdp730/onnx": "input.kdp730.graph_opt.onnx",
    "gen fx model report": "model_fx_report.html",
    "gen fx model json": "model_fx_report.json"
}

the estimated FPS is 47.0866, the report is for NPU only

Step 5: Check ONNX model and Pre&Post process are good¶

If we obtain the correct segmentation result from the ONNX model, along with the proper pre and post-processing, everything should be functioning correctly.

First, we need to review the pre and post-processing methods. You can find the relevant information in the following code: https://github.com/XuJiacong/PIDNet/blob/main/tools/custom.py.

Here is the extracted pre-processing method:

import sys
sys.path.append('./PIDNet/tools')
from custom import input_transform

def preprocess(img, witdth, heigh):

    img = cv2.resize(img,(witdth,heigh))
    sv_img = np.zeros_like(img).astype(np.uint8)
    img = input_transform(img)
    np_data = img.transpose((2, 0, 1)).copy()
    np_data = np.expand_dims(np_data, axis=[0]).astype(np.float32)

    return np_data

and post-processing method:

import torch
import torch.nn.functional as F

def postprocess(inf_results, ori_image_shape):

    pred = torch.from_numpy(inf_results)
    raw_shape_info = ori_image_shape
    pred = F.interpolate(pred, size=(raw_shape_info[0],raw_shape_info[1]), 
                            mode='bilinear', align_corners=True)
    pred = torch.argmax(pred, dim=1).squeeze(0).cpu().numpy()

    return pred

To verify that the exported ONNX model and the extracted pre/post-processing are correct, we use ONNX Runtime to ensure everything is functioning properly.

## onnx model check using onnxruntime
input_image = cv2.imread('/data1/PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png', cv2.IMREAD_COLOR)
ort_session = ort.InferenceSession("/data1/kneopi_pidnet_s_480x640.opt.onnx")

# resize and normalize input data
model_input_w, model_input_h = 640, 480
in_data = preprocess(input_image, model_input_w, model_input_h)

# onnx inference using onnxruntime
out_data = ort_session.run(None, {'input.1': in_data})

# onnx output data processing
pred = postprocess(out_data[0], input_image.shape)

# visualize segmentation result to img
sv_img = np.zeros_like(input_image).astype(np.uint8)
for i, color in enumerate(color_map):
    for j in range(3):
        sv_img[:,:,j][pred==i] = color_map[i][j]

cv2.imwrite("/data1/kneopi_onnxruntime_inf.png", sv_img)

you can see the result image on kneopi_onnxruntime_inf.png.

Now, we can check the ONNX inference result with ktc api 'ktc.kneron_inference'.

## onnx model check

input_image = cv2.imread('/data1/PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png', cv2.IMREAD_COLOR)

# resize and normalize input data
model_input_w, model_input_h = 640, 480
in_data = preprocess(input_image, model_input_w, model_input_h)

# onnx inference 
out_data = ktc.kneron_inference([in_data], onnx_file="/data1/kneopi_pidnet_s_480x640.opt.onnx", input_names=["input.1"])

# onnx output data processing
pred = postprocess(out_data[0], input_image.shape)

# visualize segmentation result to img
sv_img = np.zeros_like(input_image).astype(np.uint8)
for i, color in enumerate(color_map):
    for j in range(3):
        sv_img[:,:,j][pred==i] = color_map[i][j]

cv2.imwrite("/data1/kneopi_tc_onnx_inf.png", sv_img)

You can view the result image in the file kneopi_tc_onnx_inf.png.

Step 6: Quantization¶

We identified the preprocessing method in Step 5.

Perform the same steps on our quantization data and add it to a list.

# load and normalize all image data from folder
q_imgs_path = "./test_image10/"
images_list = glob.glob(q_imgs_path+'*'+ '.jpg')

normalized_img_list = []
for img_path in images_list:

    img_name = img_path.split("/")[-1]
    img = cv2.imread(os.path.join(q_imgs_path, img_name), cv2.IMREAD_COLOR)

    model_input_w, model_input_h = 640, 480
    img = preprocess(img, model_input_w, model_input_h)

    normalized_img_list.append(img)

then do quantization:

# fix point analysis
bie_model_path = km.analysis({"input.1": normalized_img_list})
print("\nFix point analysis done. Save bie model to '" + str(bie_model_path) + "'")

The bie model will be generated and saved at /data1/kneron_flow/input.kdp730.scaled.release.bie.

Step 7: Verify BIE Model Accuracy¶

After quantization, a slight drop in model accuracy is expected. It’s important to check if the accuracy is still sufficient for use.

The Toolchain API ktc.kneron_inference can assist with this check. Its usage is similar to Step 5, with the only difference being that the second parameter is changed from the ONNX file to the BIE file.

## bie model check

input_image = cv2.imread('/data1/PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png', cv2.IMREAD_COLOR)

# resize and normalize input data
model_input_w, model_input_h = 640, 480
in_data = preprocess(input_image, model_input_w, model_input_h)

# bie inference 
out_data = ktc.kneron_inference([in_data], bie_file=bie_model_path, input_names=["input.1"], platform=730)

# bie output data processing
pred = postprocess(out_data[0], input_image.shape)

# visualize segmentation result to img
sv_img = np.zeros_like(input_image).astype(np.uint8)
for i, color in enumerate(color_map):
    for j in range(3):
        sv_img[:,:,j][pred==i] = color_map[i][j]

cv2.imwrite("/data1/kneopi_tc_bie_inf.png", sv_img)

You can view the result image in kneopi_tc_bie_inf.png.

It is slightly different from Step 4, but it looks good enough.

Note

We are currently using only one image as an example. It’s a good idea to use more data to check the accuracy.

Step 8: Compile¶

The final step is to compile the BIE model into a NEF model.

# compile
nef_model_path = ktc.compile([km])
print("\nCompile done. Save Nef file to '" + str(nef_model_path) + "'")

You can find the .nef file under /data1/kneron_flow/models_730.nef. The 'models_730.nef' is the final compiled model.

To learn the usage of generated NEF model on KL730, please check example section: KL730End2EndTutorialPidnet

(optional) Step 9. Check NEF model¶

The Toolchain API ktc.inference supports performing NEF model inference. Its usage is similar to Step 5 and Step 7, but with one difference:

The second parameter in ktc.kneron_inference should be the nef_file. The code appears as follows:

# nef model check

input_image = cv2.imread('/data1/PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png', cv2.IMREAD_COLOR)

# resize and normalize input data
model_input_w, model_input_h = 640, 480
in_data = preprocess(input_image, model_input_w, model_input_h)

# nef inference 
out_data = ktc.kneron_inference([in_data], nef_file=nef_model_path, input_names=["input.1"], platform=730)

# nef output data processing
pred = postprocess(out_data[0], input_image.shape)

# visualize segmentation result to img
sv_img = np.zeros_like(input_image).astype(np.uint8)
for i, color in enumerate(color_map):
    for j in range(3):
        sv_img[:,:,j][pred==i] = color_map[i][j]

cv2.imwrite("/data1/kneopi_tc_nef_inf.png", sv_img)

You can view the result image in kneopi_tc_nef_inf.png. The NEF model results should match exactly with those of the BIE model.

Appendix - The entire process¶

The entire model conversion process from ONNX to NEF (Step 2 - 9) can be written into a single Python script.

Python script for the entire process

import ktc
from PIL import Image
import cv2
import os
import numpy as np
import glob
import onnx
import torch
import torch.nn.functional as F

import onnxruntime as ort

import sys
sys.path.append('./PIDNet/tools')
from custom import color_map, mean, std, input_transform

def preprocess(img, witdth, heigh):

    img = cv2.resize(img,(witdth,heigh))
    sv_img = np.zeros_like(img).astype(np.uint8)
    img = input_transform(img)
    np_data = img.transpose((2, 0, 1)).copy()
    np_data = np.expand_dims(np_data, axis=[0]).astype(np.float32)

    return np_data

def postprocess(inf_results, ori_image_shape):

    pred = torch.from_numpy(inf_results)
    raw_shape_info = ori_image_shape
    pred = F.interpolate(pred, size=(raw_shape_info[0],raw_shape_info[1]), 
                            mode='bilinear', align_corners=True)
    pred = torch.argmax(pred, dim=1).squeeze(0).cpu().numpy()

    return pred


######################
###### onnx to nef
######################

q_imgs_path = "./test_image10/"
images_list = glob.glob(q_imgs_path+'*'+ '.jpg')

normalized_img_list = []
for img_path in images_list:

    img_name = img_path.split("/")[-1]
    img = cv2.imread(os.path.join(q_imgs_path, img_name), cv2.IMREAD_COLOR)

    model_input_w, model_input_h = 640, 480
    img = preprocess(img, model_input_w, model_input_h)

    normalized_img_list.append(img)


ONNX_PATH = "kneopi_pidnet_s_480x640.onnx" 
m = onnx.load(ONNX_PATH)

# optimized and normalize onnx data format for kneron toolchain
md_opt = ktc.onnx_optimizer.onnx2onnx_flow(m)

# save opt onnx model 
onnx.save(md_opt,'kneopi_pidnet_s_480x640.opt.onnx')

km = ktc.ModelConfig(11111, "0001", "730", onnx_model=md_opt)

# npu(only) performance simulation
eval_result = km.evaluate()
print("\nNpu performance evaluation result:\n" + str(eval_result))


# fix point analysis
bie_model_path = km.analysis({"input.1": normalized_img_list})
print("\nFix point analysis done. Save bie model to '" + str(bie_model_path) + "'")


# compile
nef_model_path = ktc.compile([km])
print("\nCompile done. Save Nef to " + nef_model_path)



## onnx model check using onnxruntime
input_image = cv2.imread('/data1/PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png', cv2.IMREAD_COLOR)
ort_session = ort.InferenceSession('/data1/kneopi_pidnet_s_480x640.opt.onnx')

# resize and normalize input data
model_input_w, model_input_h = 640, 480
in_data = preprocess(input_image, model_input_w, model_input_h)

# onnx inference using onnxruntime
out_data = ort_session.run(None, {'input.1': in_data})

# onnx output data processing
pred = postprocess(out_data[0], input_image.shape)

# visualize segmentation result to img
sv_img = np.zeros_like(input_image).astype(np.uint8)
for i, color in enumerate(color_map):
    for j in range(3):
        sv_img[:,:,j][pred==i] = color_map[i][j]

cv2.imwrite("/data1/kneopi_onnxruntime_inf.png", sv_img)


######################
###### onnx model check
######################

input_image = cv2.imread('/data1/PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png', cv2.IMREAD_COLOR)

# resize and normalize input data
model_input_w, model_input_h = 640, 480
in_data = preprocess(input_image, model_input_w, model_input_h)

# onnx inference 
out_data = ktc.kneron_inference([in_data], onnx_file="/data1/kneopi_pidnet_s_480x640.opt.onnx", input_names=["input.1"])

# onnx output data processing
pred = postprocess(out_data[0], input_image.shape)

# visualize segmentation result to img
sv_img = np.zeros_like(input_image).astype(np.uint8)
for i, color in enumerate(color_map):
    for j in range(3):
        sv_img[:,:,j][pred==i] = color_map[i][j]

cv2.imwrite("/data1/kneopi_tc_onnx_inf.png", sv_img)

######################
###### bie model check
######################

input_image = cv2.imread('/data1/PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png', cv2.IMREAD_COLOR)

# resize and normalize input data
model_input_w, model_input_h = 640, 480
in_data = preprocess(input_image, model_input_w, model_input_h)

# bie inference 
out_data = ktc.kneron_inference([in_data], bie_file=bie_model_path, input_names=["input.1"], platform=730)

# bie output data processing
pred = postprocess(out_data[0], input_image.shape)

# visualize segmentation result to img
sv_img = np.zeros_like(input_image).astype(np.uint8)
for i, color in enumerate(color_map):
    for j in range(3):
        sv_img[:,:,j][pred==i] = color_map[i][j]

cv2.imwrite("/data1/kneopi_tc_bie_inf.png", sv_img)

######################
###### nef model check
######################

input_image = cv2.imread('/data1/PIDNet/samples/frankfurt_000000_003025_leftImg8bit.png', cv2.IMREAD_COLOR)

# resize and normalize input data
model_input_w, model_input_h = 640, 480
in_data = preprocess(input_image, model_input_w, model_input_h)

# nef inference 
out_data = ktc.kneron_inference([in_data], nef_file=nef_model_path, input_names=["input.1"], platform=730)

# nef output data processing
pred = postprocess(out_data[0], input_image.shape)

# visualize segmentation result to img
sv_img = np.zeros_like(input_image).astype(np.uint8)
for i, color in enumerate(color_map):
    for j in range(3):
        sv_img[:,:,j][pred==i] = color_map[i][j]

cv2.imwrite("/data1/kneopi_tc_nef_inf.png", sv_img)

Run AI model on KNEO Pi¶

Step 1: Log into your KNEO Pi, check your IP address¶

Command reference

# find the IP address
ifconfig

# to restart nnm USB loopback mode if you have disabled it.
# must login as root
systemctl disable nnm-usb-companion
systemctl enable nnm-usb-loopback

# reboot
reboot

Step 2: Install the necessary packages on the KNEO Pi.¶

Step 2-1: download KneronPLUS python whl from Kneron Developer Support Page https://www.kneron.com/tw/support/developers first

Kneron Developer Support page     -> 
KNEO Pi section                   -> 
Kneron PLUS Core Python Wheel     -> 
KneronPLUS-3.1.1-py3-none-any.whl

Step 2-2: Install packages

cd /home
python3 -m venv venv_pidnet_test
source venv_pidnet_test/bin/activate
python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python3 -m pip install opencv-python-headless
scp KneronPLUS-3.1.1-py3-none-any.whl alarm@<kneopi_ip_address>:/tmp
python3 -m pip install /tmp/KneronPLUS-3.1.1py3-none-any.whl
pacman -S libusb

<kneopi_ip_address> is the IP of your KNEO Pi

Warning

KneronPLUS might have version compatible issue, please check your PLUS package version and KNEO Pi image version first. KneronPLUS-3.1.1-py3-none-any.whl is compatible with KNEO Pi image version 1.1.3.

How to use Python?

Refer to: Use Python

Step 3 Prepare the code and the NEF model.¶

Step 3-1: Download firmware, kp_firmware.tar, from KNEO Pi Example github page https://github.com/kneron/kneopi-examples/blob/main/ai_application/plus_python/res/firmware/KL730/kp_firmware.tar
Step 3-2: Transfer the official firmware and the generated NEF model from the PC to the KNEO Pi (PC operation)
```
scp models_730_GENERATED_FROM_PREVIOUS_SECTION.nef alarm@<kneopi_ip_address>:/tmp
scp kp_firmware.tar alarm@<kneopi_ip_adderss>:/tmp
```
<kneopi_ip_address> is the IP of your KNEO Pi

Step 3-3: Prepare the data and sample code

mv /tmp/models_730_GENERATED_FROM_PREVIOUS_SECTION.nef /home/.
mv /tmp/kp_firmware.tar /home/.

cd /home
git clone https://github.com/kneron/kneopi-examples.git

cd kneopi-examples

Step 4 Run the sample code¶

python plus_python/KL730End2EndTutorialPidnet.py  \
        -fw /home/kp_firmware.tar   \
        -m /home/models_730_GENERATED_FROM_PREVIOUS_SECTION.nef  \
        -img plus_python/res/images/bus.jpg

Warning

The code has been modified from Step 9 in the previous section. For an easier demonstration, we have extracted the post-processing from the original PIDNet repository.