Running AI on Qualcomm Phones Series: Pose Recognition

呼延冰枫 · 2025-9-26 10:56:49

(Original author: @CSDN_伊利丹~怒风)
Environment Setup

Phone

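Test phone: Redmi K60 Pro
Processor: Snapdragon 8 Gen 2 (second-generation Snapdragon 8 Mobile Platform)
RAM: 8.0 GB LPDDR5X-8400, 67.0 GB/s
Cameras: 16 MP front; 50 MP + 8 MP + 2 MP rear
AI compute: NPU 48 TOPS (INT8); GPU 1536 ALUs × 2 × 680 MHz = 2.089 TFLOPS
Tip: Any phone will do; the faster the phone, the faster the demo runs.
Software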

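App: AidLux 2.0
System environment: Ubuntu 20.04.3 LTS
Tip: Code runs more smoothly once you are logged in to AidLux. While the code is running, keep the AidLux app in the foreground so the system does not reclaim its process, and keep the screen on: the phone usually goes to sleep some time after the screen turns off. To keep the app running in the background, grant it the necessary permissions.
Algorithm Demo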

Demo Code Overview

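This code is a real-time pose estimation application that cascades two models: one detects the human body, and the other locates keypoints on the detected body. The code, with detailed comments, is listed below.
Key Features of the Code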

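Dual-model cascade: two models work together; the first detects the human body and the second locates detailed keypoints on the detected body.
Adaptive camera selection: the code automatically looks for a USB camera and prefers it; if none is found, it falls back to the phone's built-in camera.
Optimized image processing:
- Preprocessing includes resizing, padding, and normalization
- The original aspect ratio is preserved to avoid distortion
- Horizontal flipping is supported, so the display behaves like a mirror, as users expect
High-performance inference:
- Models are run with the aidlite framework
- The pose detection model runs on the CPU
- The keypoint model runs on the GPU
- Multi-threading support improves throughput
Accurate pose keypoints:
- Detects 22 body keypoints (upper-body model)
- Connects the keypoints to form a complete pose skeleton
- Filters keypoints with a confidence threshold so that only reliable detections are kept
Flexible ROI extraction:
- Extracts the region of interest (ROI) dynamically from the detection result
- Rotation invariance: the ROI is extracted accurately even when the body is tilted
- The ROI size adapts automatically to people at different distances
Clear visualization:
- Draws bounding boxes around detected people
- Draws keypoints and connecting lines as an intuitive pose skeleton
- Colors and sizes are customizable, making different poses easy to tell apart
Robust error handling:
- Retries automatically if the camera fails to open
- Detects model loading and inference errors
- Handles exceptions gracefully so the program keeps running stably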
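The camera-selection feature relies on the `root_dir = "/sys/class/video4linux/"` path that appears in the code, but the selection logic itself is not part of the code shown in this post. The following is only a minimal sketch under an assumption: on Linux each `/sys/class/video4linux/videoN` entry is a symlink into `/sys/devices/...`, and for a USB camera the resolved path passes through a USB bus node such as `usb1`. The function name `find_usb_camera` is hypothetical.

```python
import os
import re


def find_usb_camera(root_dir="/sys/class/video4linux/"):
    """Return N for the first /dev/videoN whose sysfs device sits on a USB bus, else None.

    Hypothetical helper. It assumes each videoN entry under root_dir is a symlink
    into /sys/devices/..., and that the resolved path of a USB camera contains a
    path component starting with "usb" (e.g. .../usb1/1-1/...).
    """
    if not os.path.isdir(root_dir):
        return None
    indices = []
    for name in os.listdir(root_dir):
        match = re.fullmatch(r"video(\d+)", name)
        if match:
            indices.append(int(match.group(1)))
    # Sort numerically so that video2 is checked before video10
    for idx in sorted(indices):
        real_path = os.path.realpath(os.path.join(root_dir, f"video{idx}"))
        if any(part.startswith("usb") for part in real_path.split(os.sep)):
            return idx
    return None
```

If it returns `None`, the caller would fall back to the phone's built-in camera; which device index the demo uses for that on AidLux is not shown in the visible code.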

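The application can be used in many scenarios, such as fitness coaching, motion analysis, and human-computer interaction: by detecting and tracking body keypoints, it can analyze posture in real time and provide feedback.
Analysis of the Models in the Demo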

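The code uses two models, run with the AidLite framework, for human pose detection and keypoint recognition:
1. Pose detection model (pose_detection.tflite)

Purpose: detects the approximate position and pose of people in the input image.
Input: a 128×128 RGB image.
Output: predictions containing boxes and keypoints (896 candidate boxes, each with 12 coordinate values: 4 box values plus 4 keypoints × (x, y)).
Features:
- Lightweight design suited to real-time processing.
- Uses anchor boxes to improve detection accuracy.
- Outputs the rough position of the body and a few keypoints (such as eyes, ears, and shoulders).
- Runs on the CPU, balancing performance and accuracy.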
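The 896 candidate boxes come from an SSD-style anchor grid. The anchor generation code is not part of the code shown in this post, so the sketch below is an assumption: it uses the configuration MediaPipe used for its BlazeFace-style 128×128 detectors (strides 8, 16, 16, 16; two anchors per layer; fixed anchor size 1.0). A 16×16 grid at stride 8 plus an 8×8 grid shared by the three stride-16 layers gives 16·16·2 + 8·8·6 = 512 + 384 = 896 anchors. Each anchor is stored as (x_center, y_center, w, h) in normalized units, which matches how `_decode_boxes` reads `anchors[:, 0]` to `anchors[:, 3]`. The function name `generate_anchors` is illustrative.

```python
import math


def generate_anchors(input_size=128, strides=(8, 16, 16, 16), anchors_per_layer=2):
    """Build SSD-style anchors as (x_center, y_center, w, h) in normalized units.

    Assumed configuration (MediaPipe's BlazeFace-style 128x128 detector):
    consecutive layers with the same stride share one feature map, each layer
    contributes anchors_per_layer anchors per cell, and every anchor has the
    fixed size 1.0 x 1.0.
    """
    anchors = []
    layer = 0
    while layer < len(strides):
        stride = strides[layer]
        per_cell = 0
        # Merge consecutive layers that share the same stride
        while layer < len(strides) and strides[layer] == stride:
            per_cell += anchors_per_layer
            layer += 1
        grid = math.ceil(input_size / stride)
        for y in range(grid):
            for x in range(grid):
                center = ((x + 0.5) / grid, (y + 0.5) / grid)
                anchors.extend([(center[0], center[1], 1.0, 1.0)] * per_cell)
    return anchors


anchors = generate_anchors()
print(len(anchors))  # 896
```

Wrapping the result in `np.array(...)` would give an (896, 4) array in the layout `_decode_boxes` indexes.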

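2. Upper-body pose landmark model (pose_landmark_upper_body.tflite)

Purpose: within the detected body region, precisely locates 22 upper-body keypoints.
Input: a 256×256 RGB image (the ROI).
Output: coordinates for 31 keypoints (each with x, y, z, and visibility).
Features:
- Accurately locates joints such as shoulders, elbows, and wrists.
- Runs on the GPU to speed up inference for this more complex model.
- Supports pose estimation from multiple viewing angles and under occlusion.
- Outputs a confidence for each keypoint, used to filter out unreliable detections.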
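The landmark model is described as emitting 31 points, each with x, y, z and a visibility value, and low-confidence points are filtered out. The post-processing code is not part of the code shown in this post, so this is a minimal sketch assuming a flat output laid out point by point as x, y, z, visibility, with visibility already a probability in [0, 1]. The helper name `filter_landmarks` and the 0.5 threshold are hypothetical.

```python
def filter_landmarks(flat_output, num_points=31, threshold=0.5):
    """Split a flat landmark output into (x, y, z, visibility) groups and keep confident points.

    Assumptions (not taken from the demo code): values are laid out point by point
    as x, y, z, visibility, and visibility is already a probability in [0, 1]
    (apply a sigmoid first if the model emits logits).
    Returns a list of (index, x, y, z) tuples.
    """
    expected = num_points * 4
    if len(flat_output) < expected:
        raise ValueError(f"expected at least {expected} values, got {len(flat_output)}")
    kept = []
    for i in range(num_points):
        x, y, z, visibility = flat_output[4 * i:4 * i + 4]
        if visibility >= threshold:
            kept.append((i, x, y, z))
    return kept


# Example: only point 3 is visible enough
output = [0.0] * 124
output[12:16] = [0.1, 0.2, 0.3, 0.9]
print(filter_landmarks(output))  # [(3, 0.1, 0.2, 0.3)]
```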

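Model Workflow

Pose detection: the first model quickly locates the person.
ROI extraction: the region of interest (ROI) is cropped and rotated based on the detection result.
Keypoint recognition: the ROI is fed to the second model to obtain fine-grained upper-body keypoints.
Coordinate mapping: the normalized keypoint coordinates are mapped back to the original image space.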
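ROI extraction and coordinate mapping are inverses of each other: the ROI is a rotated square in the original frame, and landmarks predicted in normalized ROI coordinates are mapped back through the same rotation. The ROI code is not part of the code shown in this post, so the following is a sketch modeled on the keypoint-alignment approach used in BlazePose-style pipelines, not the demo's exact code. Assumptions: two detector keypoints serve as a center point and a scale/orientation point (which indices play these roles is not shown here), the ROI side is twice their distance enlarged by a hypothetical factor `enlarge=1.5`, and "upright" means the scale point lies straight above the center (image y grows downward). All function names are illustrative.

```python
import math


def compute_roi(center, scale_point, enlarge=1.5):
    """Return (xc, yc, side, theta) for a rotated square ROI in image pixels.

    center, scale_point: (x, y) of two detector keypoints (which keypoints play
    these roles is an assumption). side is twice their distance times enlarge;
    theta is 0 when scale_point lies straight above center (image y grows downward).
    """
    xc, yc = center
    xs, ys = scale_point
    side = 2.0 * math.hypot(xs - xc, ys - yc) * enlarge
    theta = math.atan2(yc - ys, xc - xs) - math.pi / 2
    return xc, yc, side, theta


def roi_to_image(u, v, roi):
    """Map a point in normalized ROI coordinates (u, v in 0..1) back to image pixels."""
    xc, yc, side, theta = roi
    dx, dy = (u - 0.5) * side, (v - 0.5) * side
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return xc + dx * cos_t - dy * sin_t, yc + dx * sin_t + dy * cos_t


def roi_corners(roi):
    """ROI corners in image pixels: top-left, top-right, bottom-right, bottom-left."""
    return [roi_to_image(u, v, roi) for u, v in ((0, 0), (1, 0), (1, 1), (0, 1))]


roi = compute_roi(center=(100, 100), scale_point=(100, 50))
print(roi)               # (100, 100, 150.0, 0.0)
print(roi_corners(roi))  # [(25.0, 25.0), (175.0, 25.0), (175.0, 175.0), (25.0, 175.0)]
```

The four corners could drive an OpenCV-style affine warp that cuts the 256×256 crop out of the frame; whether aidcv exposes the same warp functions is not shown in the visible code. Running a landmark's normalized (u, v) through `roi_to_image` then gives its position in the original frame.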

This cascaded design balances efficiency and accuracy, which makes it well suited to real-time video streams.
Demo Code

```python
import math
import numpy as np
from scipy.special import expit
import time
from time import sleep
import aidlite
import os
import subprocess
import aidcv as cv2

# Camera device path (sysfs directory that lists V4L2 video devices)
root_dir = "/sys/class/video4linux/"
```
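```python
def resize_pad(img):
    """
    Resize and pad an image so that it fits the detector input.

    The image is scaled to fit inside 256x256 while keeping its aspect ratio,
    the shorter side is padded with zeros, and a 128x128 copy is also produced,
    which is the input size of the pose detection model.

    Returns:
        img1: the padded 256x256 image
        img2: the 128x128 image
        scale: scale factor between the original image and the 256x256 image
        pad: padding added, expressed in original-image pixels as (y, x)
    """
    size0 = img.shape
    if size0[0] >= size0[1]:
        # Portrait or square: height becomes 256, pad the width
        h1 = 256
        w1 = 256 * size0[1] // size0[0]
        padh = 0
        padw = 256 - w1
        scale = size0[1] / w1
    else:
        # Landscape: width becomes 256, pad the height
        h1 = 256 * size0[0] // size0[1]
        w1 = 256
        padh = 256 - h1
        padw = 0
        scale = size0[0] / h1
    # Split the padding between both sides (an odd extra pixel goes to the bottom/right)
    padh1 = padh // 2
    padh2 = padh // 2 + padh % 2
    padw1 = padw // 2
    padw2 = padw // 2 + padw % 2
    img1 = cv2.resize(img, (w1, h1))
    img1 = np.pad(img1, ((padh1, padh2), (padw1, padw2), (0, 0)), 'constant', constant_values=(0, 0))
    pad = (int(padh1 * scale), int(padw1 * scale))
    img2 = cv2.resize(img1, (128, 128))
    return img1, img2, scale, pad
```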

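```python
def denormalize_detections(detections, scale, pad):
    """
    Map normalized detection coordinates back to original-image coordinates.

    The detector works on a padded, rescaled copy of the frame (see resize_pad),
    so this function undoes the scaling and padding.

    Inputs:
        detections: n x m array, where n is the number of detected objects and
            m = 4 + 2*k: the first 4 values are the box (ymin, xmin, ymax, xmax)
            and k is the number of extra (x, y) keypoints output by the detector.
            A trailing score column, if present, is left unchanged.
        scale: scale factor used when resizing the image
        pad: padding in the y and x dimensions
    """
    detections[:, 0] = detections[:, 0] * scale * 256 - pad[0]
    detections[:, 1] = detections[:, 1] * scale * 256 - pad[1]
    detections[:, 2] = detections[:, 2] * scale * 256 - pad[0]
    detections[:, 3] = detections[:, 3] * scale * 256 - pad[1]
    # Keypoints are (x, y) pairs starting at column 4. If an odd number of columns
    # follows the box, the last one is the score and must not be rescaled.
    kp_end = detections.shape[1] - (detections.shape[1] - 4) % 2
    detections[:, 4:kp_end:2] = detections[:, 4:kp_end:2] * scale * 256 - pad[1]
    detections[:, 5:kp_end:2] = detections[:, 5:kp_end:2] * scale * 256 - pad[0]
    return detections
```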
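A worked example helps check how `resize_pad` and `denormalize_detections` fit together. The sketch below recomputes only the size and padding arithmetic of `resize_pad` in plain Python (no image needed; the helper names are illustrative) for an assumed 480×640 (height × width) camera frame, then maps normalized points back to pixels with the same formula `denormalize_detections` uses.

```python
def letterbox_params(height, width, target=256):
    """Recompute the scale and pad values that resize_pad returns (no image needed)."""
    if height >= width:
        h1, w1 = target, target * width // height
        scale = width / w1
    else:
        h1, w1 = target * height // width, target
        scale = height / h1
    padh, padw = target - h1, target - w1
    # Same rounding as resize_pad: leading padding, converted to original pixels
    pad = (int((padh // 2) * scale), int((padw // 2) * scale))
    return scale, pad


def to_pixels(y_norm, x_norm, scale, pad, target=256):
    """Apply the mapping denormalize_detections uses to a (y, x) coordinate pair."""
    return y_norm * scale * target - pad[0], x_norm * scale * target - pad[1]


scale, pad = letterbox_params(480, 640)
print(scale, pad)                         # 2.5 (80, 0)
print(to_pixels(0.125, 0.0, scale, pad))  # (0.0, 0.0)
print(to_pixels(0.875, 1.0, scale, pad))  # (480.0, 640.0)
```

Normalized y values below 0.125 or above 0.875 fall in the black padding band (32 of the 256 rows on each side). Because the 128×128 detector input is a uniform resize of the padded 256×256 image, normalized coordinates are the same at both sizes, which is why the mapping always multiplies by 256.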

```python
def _decode_boxes(raw_boxes, anchors):
    """
    Convert raw predictions into actual coordinates.

    Uses the anchors to turn the model's regression outputs into box
    coordinates and 4 keypoints, processing the whole batch at once.
    Anchors are stored as (x_center, y_center, w, h) in normalized units;
    the raw offsets are divided by 128, the detector's input size.
    """
    boxes = np.zeros_like(raw_boxes)
    x_center = raw_boxes[..., 0] / 128.0 * anchors[:, 2] + anchors[:, 0]
    y_center = raw_boxes[..., 1] / 128.0 * anchors[:, 3] + anchors[:, 1]
    w = raw_boxes[..., 2] / 128.0 * anchors[:, 2]
    h = raw_boxes[..., 3] / 128.0 * anchors[:, 3]
    boxes[..., 0] = y_center - h / 2.  # ymin
    boxes[..., 1] = x_center - w / 2.  # xmin
    boxes[..., 2] = y_center + h / 2.  # ymax
    boxes[..., 3] = x_center + w / 2.  # xmax
    # The pose detector predicts 4 keypoints, each as an (x, y) offset from the anchor center
    for k in range(4):
        offset = 4 + k * 2
        keypoint_x = raw_boxes[..., offset] / 128.0 * anchors[:, 2] + anchors[:, 0]
        keypoint_y = raw_boxes[..., offset + 1] / 128.0 * anchors[:, 3] + anchors[:, 1]
        boxes[..., offset] = keypoint_x
        boxes[..., offset + 1] = keypoint_y
    return boxes
```

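```python
def _tensors_to_detections(raw_box_tensor, raw_score_tensor, anchors):
    """
    Convert the network outputs into detections.

    The network outputs a tensor of shape (b, 896, 12) with box regression
    predictions (4 box values plus 4 keypoints x 2) and a tensor of shape
    (b, 896, 1) with classification confidences. This function turns these
    two "raw" tensors into detections and returns an array of shape
    (num_detections, 13): 12 coordinates plus the score.
    """
    detection_boxes = _decode_boxes(raw_box_tensor, anchors)
    # Clip the logits so the sigmoid stays numerically well behaved
    thresh = 100.0
    raw_score_tensor = np.clip(raw_score_tensor, -thresh, thresh)
    # Drop the trailing class dimension (there is only one class) so the mask
    # matches the leading dimensions of detection_boxes
    detection_scores = expit(raw_score_tensor).reshape(detection_boxes.shape[:-1])
    # Keep only boxes with high enough confidence. Boolean masking flattens the
    # batch dimension, so this assumes a batch of one image.
    mask = detection_scores >= 0.75
    boxes = detection_boxes[mask]
    scores = detection_scores[mask]
    scores = scores[..., np.newaxis]
    return np.hstack((boxes, scores))
```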
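`_tensors_to_detections` applies the sigmoid (`expit`) to every raw score before comparing it with 0.75. Because the sigmoid is strictly increasing, the same filter can be expressed on the raw logits: σ(x) ≥ 0.75 exactly when x ≥ ln(0.75 / 0.25) = ln 3 ≈ 1.0986. The small check below (plain Python, function names illustrative) confirms the equivalence; precomputing the logit threshold is a common optimization, not something the demo is shown to do.

```python
import math


def sigmoid(x):
    """Logistic function; scalar equivalent of scipy.special.expit.

    math.exp overflows for x below about -709, one more reason the demo clips logits to +/-100.
    """
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of the sigmoid, valid for 0 < p < 1."""
    return math.log(p / (1.0 - p))


score_threshold = 0.75
logit_threshold = logit(score_threshold)  # ln(3), about 1.0986

raw_scores = [-2.0, 0.5, 1.0, 1.2, 4.0]
by_probability = [s for s in raw_scores if sigmoid(s) >= score_threshold]
by_logit = [s for s in raw_scores if s >= logit_threshold]
print(by_probability == by_logit, by_logit)  # True [1.2, 4.0]
```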

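```python
def py_cpu_nms(dets, thresh):
    """
    Non-maximum suppression in plain Python/NumPy on the CPU.

    Filters out overlapping boxes and keeps the ones with the highest confidence.
    Each row of dets is one detection: (ymin, xmin, ymax, xmax) in columns 0-3
    and the score in column 12. Returns the kept rows as a list.
    """
    y1 = dets[:, 0]
    x1 = dets[:, 1]
    y2 = dets[:, 2]
    x2 = dets[:, 3]
    scores = dets[:, 12]
    # The +1 follows the inclusive-pixel convention and assumes pixel coordinates
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    # Indices sorted by confidence, highest first
    order = scores.argsort()[::-1]
    # keep stores the boxes that survive suppression
    keep = []
    while order.size > 0:
        # order[0] is the highest-scoring remaining box; it is always kept
        i = order[0]
        keep.append(dets[i])
        # Intersection of box i with all remaining boxes (vectorized)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # IoU (intersection over union)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        # Indices of the boxes whose IoU with box i is at most the threshold
        inds = np.where(ovr <= thresh)[0]
        # inds index into order[1:], so shift by 1 to index into order
        order = order[inds + 1]
    return keep
```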

Model Location


Results

Source: self-submitted by a 程序园 user.

钨哄魁 · 2025-10-29 00:48:30

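Nice. It would be even better if the software in it were updated more often.

捡嫌 · 2025-11-13 05:25:26

Thanks for sharing, and thanks for the effort.

东门清心 · 2025-11-19 17:39:51

Thoughtful discussion helps us all improve!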
