AI Vision in Practice 1: Real-Time Face Detection

Author: 轻口味

2023-04-20
Beijing

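1. Background

In computer vision, the most common AI applications are face detection, face recognition, liveness detection, human body and behavior analysis, image recognition, and image enhancement. These are all fairly mature technologies, and both commercial PaaS platforms and open-source models are plentiful. AI development generally involves the following steps:

Feature analysis
Data collection
Data labeling
Model training
Model inference

Inference can run in the cloud or on the client, and each has its own use cases. For example, face detection is usually placed on the client and face recognition in the cloud. This series focuses on the engineering practice of model inference for vision tasks.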

2. Project Overview

We take models provided by Google's open-source project mediapipe, deploy them on the client, and run inference there. mediapipe provides the following capabilities:

Face Detection
Face Mesh
Iris
Hands
Pose
Holistic
Hair Segmentation
Object Detection
Box Tracking
Instant Motion Tracking
Objectron
KNIFT
...

mediapipe ships example apps that can be built with bazel build -c opt --config=android_arm64 mediapipe/examples/android/src/java/com/google/mediapipe/apps/handtrackinggpu:handtrackinggpu and run directly. For the mobile framework, however, we build on the open-source project https://github.com/terryky/android_tflite, which uses the Android NDK to run TensorFlow Lite and measure the performance of its GPU Delegate. The project is built on NativeActivity and renders both the captured camera frames and the performance statistics. In this article we get a real-time face detection model running.

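3. Understanding NativeActivity

NativeActivity is the base class Android provides for writing an app purely in C/C++. Even a pure C++ Android app still needs a Java shell, and the Android framework already ships one written in Java. Our C++ native library can run because this intermediate class calls into it through JNI; that class is NativeActivity. Its core job is to invoke the callbacks in our native library when specific events occur. For example, the familiar lifecycle method NativeActivity.onStart calls the native method onStartNative, which forwards the event to the native library:

    protected void onStart() {
        super.onStart();
        onStartNative(mNativeHandle);
    }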

On the native side, Android provides two headers:

native_activity.h
android_native_app_glue.h

android_native_app_glue.h wraps native_activity.h, so we only need to implement void android_main(struct android_app* state).

For more details on NativeActivity, see the official Android documentation: GameActivity | Android Developers.

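4. Running the Model

The model we chose is https://storage.googleapis.com/mediapipe-assets/face_detection_short_range.tflite. Running a model generally takes the following steps:

Load the model
Convert the camera preview texture to RGBA
Feed the image data to the inference engine
Parse and render the results

4.1 Loading the Model

First we read the model file into memory. The model file lives under the assets directory of the Android project, and we load it into the buffer std::vector<uint8_t> m_tflite_model_buf: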

    bool
    asset_read_file (AAssetManager *assetMgr, char *fname, std::vector<uint8_t> &buf)
    {
        AAsset *assetDescriptor = AAssetManager_open (assetMgr, fname, AASSET_MODE_BUFFER);
        if (assetDescriptor == NULL)
        {
            return false;
        }

        size_t fileLength = AAsset_getLength (assetDescriptor);

        buf.resize (fileLength);
        int64_t readSize = AAsset_read (assetDescriptor, buf.data(), buf.size());

        AAsset_close (assetDescriptor);

        /* succeed only if the whole file was read; AAsset_read returns a negative value on error */
        return (readSize >= 0) && ((size_t)readSize == buf.size());
    }

    asset_read_file (m_app->activity->assetManager,
                     (char *)BLAZEFACE_MODEL_PATH, m_tflite_model_buf);
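To exercise the same read-into-a-vector pattern off-device, a desktop stand-in can be sketched with the standard library. This is only an illustration under that assumption: read_file_to_buffer is a hypothetical helper, not part of the Android asset API or of the android_tflite project.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical desktop counterpart of asset_read_file: reads a whole file
// into buf and returns true only if every byte was read.
bool read_file_to_buffer(const std::string &path, std::vector<uint8_t> &buf)
{
    // Open at the end of the file so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return false;

    const std::streamsize length = in.tellg();
    if (length < 0)
        return false;

    buf.resize(static_cast<std::size_t>(length));
    if (length == 0)
        return true;            // empty file: nothing to read

    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char *>(buf.data()), length);
    return in.gcount() == length;
}
```

As in the Android version, the caller owns the buffer and learns from the return value whether the read was complete.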

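TFLite provides FlatBufferModel::BuildFromBuffer to load a model; it returns a std::unique_ptr to a tflite::FlatBufferModel. BuildFromBuffer does not copy the data, so the buffer must stay alive for as long as the model is in use:

    std::unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromBuffer (model_buf, model_size);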

Once the model is loaded, we use it to create the inference engine's interpreter, tflite::Interpreter. TFLite provides the InterpreterBuilder helper to construct a tflite::Interpreter:

    class InterpreterBuilder {
     public:
      InterpreterBuilder(const FlatBufferModel& model,
                         const OpResolver& op_resolver);

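We pass in the model and an OpResolver. OpResolver is an abstract interface that returns the TFLite registration for a given op code or custom op name. This is the mechanism by which the operators referenced in the flatbuffer model are mapped to executable function pointers (TfLiteRegistrations). InterpreterBuilder overloads the function call operator:

    TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter);
    TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter,
                            int num_threads);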

With the InterpreterBuilder constructed, we create the tflite::Interpreter:

    std::unique_ptr<tflite::FlatBufferModel> model;
    std::unique_ptr<tflite::Interpreter>     interpreter;
    tflite::ops::builtin::BuiltinOpResolver  resolver;
    tflite::InterpreterBuilder (*model, resolver)(&interpreter);

InterpreterBuilder has two overloads of the call operator, and the second takes a thread count. We can also set the thread count manually through SetNumThreads on tflite::Interpreter:

    int num_threads = std::thread::hardware_concurrency();
    char *env_tflite_num_threads = getenv ("FORCE_TFLITE_NUM_THREADS");
    if (env_tflite_num_threads)
    {
        /* the environment variable overrides the detected core count */
        num_threads = atoi (env_tflite_num_threads);
        DBG_LOGI ("@@@@@@ FORCE_TFLITE_NUM_THREADS=%d\n", num_threads);
    }
    DBG_LOG ("@@@@@@ TFLITE_NUM_THREADS=%d\n", num_threads);
    interpreter->SetNumThreads(num_threads);
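One detail worth guarding against: std::thread::hardware_concurrency() may return 0 when the core count is unknown, and atoi gives no indication of a malformed value. A more defensive way to pick the thread count might look like the sketch below. choose_num_threads and the kMaxThreads cap are hypothetical names and values introduced for illustration, not part of TFLite or of the original project.

```cpp
#include <cstdlib>
#include <thread>

// Arbitrary upper bound for the override (an assumption for this sketch).
const long kMaxThreads = 64;

// Picks a thread count: a valid positive integer in the environment
// variable wins; otherwise the hardware concurrency is used, falling back
// to 1 when the runtime cannot report it.
int choose_num_threads(const char *env_name)
{
    const unsigned int hw = std::thread::hardware_concurrency();
    int num_threads = (hw > 0) ? static_cast<int>(hw) : 1;

    const char *env_value = std::getenv(env_name);
    if (env_value != nullptr)
    {
        char *end = nullptr;
        const long parsed = std::strtol(env_value, &end, 10);
        // Accept the override only if the whole string is a number in range.
        if (end != env_value && *end == '\0' && parsed > 0 && parsed <= kMaxThreads)
            num_threads = static_cast<int>(parsed);
    }
    return num_threads;
}
```

The result could then be passed to interpreter->SetNumThreads in place of the raw value.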

Next, allocate memory for the tensors:

    // Update allocations for all tensors. This will redim dependent tensors
    // using the input tensor dimensionality as given. This is relatively
    // expensive. This *must be* called after the interpreter has been created
    // and before running inference (and accessing tensor buffers), and *must be*
    // called again if (and only if) an input tensor is resized. Returns status of
    // success or failure.
    TfLiteStatus AllocateTensors();

Next, query the interpreter for the model's configuration, mainly its input and output tensors:

    int
    tflite_get_tensor_by_name (std::unique_ptr<tflite::Interpreter> &interpreter, int io, const char *name, tflite_tensor_t *ptensor)
    {
        memset (ptensor, 0, sizeof (*ptensor));

        int tensor_idx;
        int io_idx = -1;
        int num_tensor = (io == 0) ? interpreter->inputs ().size() :
                                     interpreter->outputs().size();

        /* search the inputs (io == 0) or outputs (io == 1) for a tensor with the given name */
        for (int i = 0; i < num_tensor; i ++)
        {
            tensor_idx = (io == 0) ? interpreter->inputs ()[i] :
                                     interpreter->outputs()[i];

            const char *tensor_name = interpreter->tensor(tensor_idx)->name;
            if (strcmp (tensor_name, name) == 0)
            {
                io_idx = i;
                break;
            }
        }

        if (io_idx < 0)
        {
            DBG_LOGE ("can't find tensor: \"%s\"\n", name);
            return -1;
        }

        /* get a typed pointer to the tensor's data buffer */
        void *ptr = NULL;
        TfLiteTensor *tensor = interpreter->tensor(tensor_idx);
        switch (tensor->type)
        {
        case kTfLiteUInt8:
            ptr = (io == 0) ? interpreter->typed_input_tensor <uint8_t>(io_idx) :
                              interpreter->typed_output_tensor<uint8_t>(io_idx);
            break;
        case kTfLiteFloat32:
            ptr = (io == 0) ? interpreter->typed_input_tensor <float>(io_idx) :
                              interpreter->typed_output_tensor<float>(io_idx);
            break;
        case kTfLiteInt64:
            ptr = (io == 0) ? interpreter->typed_input_tensor <int64_t>(io_idx) :
                              interpreter->typed_output_tensor<int64_t>(io_idx);
            break;
        default:
            DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
            return -1;
        }

        ptensor->idx    = tensor_idx;
        ptensor->io     = io;
        ptensor->io_idx = io_idx;
        ptensor->type   = tensor->type;
        ptensor->ptr    = ptr;
        ptensor->quant_scale = tensor->params.scale;
        ptensor->quant_zerop = tensor->params.zero_point;

        /* record up to four dimensions of the tensor shape */
        for (int i = 0; (i < 4) && (i < tensor->dims->size); i ++)
        {
            ptensor->dims[i] = tensor->dims->data[i];
        }

        return 0;
    }

    static tflite_tensor_t      s_detect_tensor_input;
    static tflite_tensor_t      s_detect_tensor_scores;
    static tflite_tensor_t      s_detect_tensor_bboxes;

    /* s_detect_interpreter is assumed here to be a std::unique_ptr<tflite::Interpreter> */
    tflite_get_tensor_by_name (s_detect_interpreter, 0, "input",          &s_detect_tensor_input);
    tflite_get_tensor_by_name (s_detect_interpreter, 1, "regressors",     &s_detect_tensor_bboxes);
    tflite_get_tensor_by_name (s_detect_interpreter, 1, "classificators", &s_detect_tensor_scores);
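The quant_scale and quant_zerop fields recorded above only matter for quantized tensors. For a uint8 tensor, TFLite's affine quantization maps a stored value back to a real number as real = scale * (quantized - zero_point). A minimal sketch of that conversion follows; dequantize_u8 is a hypothetical helper, and the float tensors handled elsewhere in this article would not need it.

```cpp
#include <cstdint>

// Converts one quantized uint8 tensor element to a real value using the
// affine scheme: real = scale * (quantized - zero_point).
float dequantize_u8(uint8_t q, float scale, int zero_point)
{
    return scale * static_cast<float>(static_cast<int>(q) - zero_point);
}
```

With a tflite_tensor_t in hand, the call would take the form dequantize_u8(value, t.quant_scale, t.quant_zerop).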

From the input tensor's dimensions we can read the image width and height the model expects:

    int det_input_w = s_detect_tensor_input.dims[2];
    int det_input_h = s_detect_tensor_input.dims[1];

4.2 Converting the Camera Preview Texture to RGBA

The model can only consume the camera frame as RGBA pixel data, so we read the texture back into a memory buffer:

    unsigned char *buf_ui8 = NULL;
    static unsigned char *pui8 = NULL;

    /* allocate the RGBA readback buffer once and reuse it for every frame */
    if (pui8 == NULL)
        pui8 = (unsigned char *)malloc(w * h * 4);

    buf_ui8 = pui8;

    draw_2d_texture_ex (srctex, 0, win_h - h, w, h, RENDER2D_FLIP_V);

    glPixelStorei (GL_PACK_ALIGNMENT, 4);
    glReadPixels (0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buf_ui8);

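The camera texture must first be drawn into the framebuffer; the OpenGL function glReadPixels then reads the pixels back into the memory buffer.

Note: glReadPixels is an expensive operation.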
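glReadPixels returns rows starting from the bottom of the framebuffer, which is presumably why the texture is drawn with RENDER2D_FLIP_V before the readback. If the flip were not done on the GPU, an equivalent CPU-side step could look like the sketch below. flip_rgba_rows is a hypothetical helper, and it assumes a tightly packed RGBA buffer of w * h * 4 bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flips a tightly packed RGBA image vertically in place by swapping
// the top and bottom rows and working towards the middle.
void flip_rgba_rows(std::vector<uint8_t> &pixels, int w, int h)
{
    const std::size_t stride = static_cast<std::size_t>(w) * 4; // bytes per row
    uint8_t *base = pixels.data();
    for (int top = 0, bottom = h - 1; top < bottom; ++top, --bottom)
    {
        uint8_t *top_row    = base + top * stride;
        uint8_t *bottom_row = base + bottom * stride;
        std::swap_ranges(top_row, top_row + stride, bottom_row);
    }
}
```

Doing the flip during the draw, as the article's code does, avoids this extra pass over the buffer on the CPU.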

4.3 Feeding Image Data to the Inference Engine

First, use the input tensor s_detect_tensor_input obtained above to get the input buffer allocated by the engine:

    void *
    get_blazeface_input_buf (int *w, int *h)
    {
        *w = s_detect_tensor_input.dims[2];
        *h = s_detect_tensor_input.dims[1];
        return s_detect_tensor_input.ptr;
    }

Convert the image data obtained above to float, normalizing each channel from [0, 255] to roughly [-1, 1], and write it into the input tensor:

    float mean = 128.0f;
    float std  = 128.0f;
    for (y = 0; y < h; y ++)
    {
        for (x = 0; x < w; x ++)
        {
            int r = *buf_ui8 ++;
            int g = *buf_ui8 ++;
            int b = *buf_ui8 ++;
            buf_ui8 ++;          /* skip alpha */
            *buf_fp32 ++ = (float)(r - mean) / std;
            *buf_fp32 ++ = (float)(g - mean) / std;
            *buf_fp32 ++ = (float)(b - mean) / std;
        }
    }
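The same normalization can be written as a small standalone function, which makes the mapping easy to check: 0 becomes -1.0, 128 becomes 0.0, and 255 becomes 127/128. This is a sketch only. rgba_to_normalized_rgb is a hypothetical helper that returns a new vector, whereas the article's code writes straight into the tensor buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts tightly packed RGBA8 pixels to interleaved RGB floats,
// mapping each channel value v to (v - 128) / 128 and dropping alpha.
std::vector<float> rgba_to_normalized_rgb(const std::vector<uint8_t> &rgba, int w, int h)
{
    const float mean  = 128.0f;
    const float scale = 1.0f / 128.0f;

    const std::size_t num_pixels = static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
    std::vector<float> rgb;
    rgb.reserve(num_pixels * 3);

    const uint8_t *src = rgba.data();
    for (std::size_t i = 0; i < num_pixels; ++i)
    {
        rgb.push_back((src[0] - mean) * scale);  // R
        rgb.push_back((src[1] - mean) * scale);  // G
        rgb.push_back((src[2] - mean) * scale);  // B
        src += 4;                                // skip alpha
    }
    return rgb;
}
```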

4.4 Parsing and Rendering the Results

Next, call the interpreter's Invoke() method to run inference:

    if (interpreter->Invoke() != kTfLiteOk)
    {
        DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
        return -1;
    }

Then parse the detection results. Each anchor has a raw score that is passed through a sigmoid; for anchors whose score exceeds the threshold, the box and key points are decoded relative to the anchor center and normalized by the input size:

    static int
    decode_bounds (std::list<face_t> &face_list, float score_thresh, int input_img_w, int input_img_h)
    {
        face_t face_item;
        float  *scores_ptr = (float *)s_detect_tensor_scores.ptr;

        int i = 0;
        for (auto itr = s_anchors.begin(); itr != s_anchors.end(); i ++, itr ++)
        {
            fvec2 anchor = *itr;
            float score0 = scores_ptr[i];
            float score = 1.0f / (1.0f + exp(-score0));

            if (score > score_thresh)
            {
                float *p = get_bbox_ptr (i);

                /* boundary box */
                float sx = p[0];
                float sy = p[1];
                float w  = p[2];
                float h  = p[3];

                float cx = sx + anchor.x;
                float cy = sy + anchor.y;

                cx /= (float)input_img_w;
                cy /= (float)input_img_h;
                w  /= (float)input_img_w;
                h  /= (float)input_img_h;

                fvec2 topleft, btmright;
                topleft.x  = cx - w * 0.5f;
                topleft.y  = cy - h * 0.5f;
                btmright.x = cx + w * 0.5f;
                btmright.y = cy + h * 0.5f;

                face_item.score    = score;
                face_item.topleft  = topleft;
                face_item.btmright = btmright;

                /* landmark positions (6 keys) */
                for (int j = 0; j < kFaceKeyNum; j ++)
                {
                    float lx = p[4 + (2 * j) + 0];
                    float ly = p[4 + (2 * j) + 1];
                    lx += anchor.x;
                    ly += anchor.y;
                    lx /= (float)input_img_w;
                    ly /= (float)input_img_h;

                    face_item.keys[j].x = lx;
                    face_item.keys[j].y = ly;
                }

                face_list.push_back (face_item);
            }
        }
        return 0;
    }
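decode_bounds relies on an anchor list, s_anchors, whose construction is not shown. As a rough sketch of how such a list could be built, the code below assumes the layout commonly used for BlazeFace-style short-range detectors: two feature-map strides, 8 and 16, with 2 and 6 anchors per grid cell, each anchor centered on its cell in input-pixel units. These values, the fvec2 definition, and the name create_anchors are assumptions for illustration and should be checked against the actual model.

```cpp
#include <vector>

struct fvec2 { float x; float y; };  // assumed to match the article's fvec2

// Builds anchor centers (in input pixels) for every grid cell of each
// feature-map layer. Each cell contributes several identical centers,
// one per anchor predicted at that cell.
std::vector<fvec2> create_anchors(int input_w, int input_h)
{
    const int strides[2]          = {8, 16};  // assumed layer strides
    const int anchors_per_cell[2] = {2, 6};   // assumed anchors per cell

    std::vector<fvec2> anchors;
    for (int layer = 0; layer < 2; ++layer)
    {
        const int stride    = strides[layer];
        const int grid_cols = (input_w + stride - 1) / stride;
        const int grid_rows = (input_h + stride - 1) / stride;

        for (int gy = 0; gy < grid_rows; ++gy)
        {
            for (int gx = 0; gx < grid_cols; ++gx)
            {
                const fvec2 center = { stride * (gx + 0.5f), stride * (gy + 0.5f) };
                for (int n = 0; n < anchors_per_cell[layer]; ++n)
                    anchors.push_back(center);
            }
        }
    }
    return anchors;
}
```

Under these assumptions a 128 x 128 input yields 16 * 16 * 2 + 8 * 8 * 6 = 896 anchors, and the order of the list must match the order of the rows in the score and box tensors.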

face_t holds the score, the top-left and bottom-right corners of the bounding box, and the key points of each detection:

    typedef struct _face_t
    {
        float score;
        fvec2 topleft;
        fvec2 btmright;
        fvec2 keys[kFaceKeyNum];
    } face_t;

With these coordinates we can draw a box around each detected face.
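Because the decoded corners are normalized to [0, 1], they have to be scaled to the drawing area before a box can be rendered. The sketch below shows one way to do that, clamping boxes that extend past the edge. pixel_rect and to_pixel_rect are hypothetical names, and fvec2 is redefined here only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cmath>

struct fvec2      { float x; float y; };            // assumed to match the article's fvec2
struct pixel_rect { int x; int y; int w; int h; };  // hypothetical helper type

// Maps normalized [0, 1] box corners to a pixel rectangle inside a
// win_w x win_h drawing area, clamping corners that fall outside it.
pixel_rect to_pixel_rect(fvec2 topleft, fvec2 btmright, int win_w, int win_h)
{
    auto clamp01 = [](float v) { return std::min(1.0f, std::max(0.0f, v)); };

    const int x0 = static_cast<int>(std::lround(clamp01(topleft.x)  * win_w));
    const int y0 = static_cast<int>(std::lround(clamp01(topleft.y)  * win_h));
    const int x1 = static_cast<int>(std::lround(clamp01(btmright.x) * win_w));
    const int y1 = static_cast<int>(std::lround(clamp01(btmright.y) * win_h));

    return pixel_rect{ x0, y0, x1 - x0, y1 - y0 };
}
```

The resulting rectangle can be handed to whatever 2D drawing routine the renderer provides.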

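5. Summary

This article introduced the common steps of AI development and the most common AI vision applications. By implementing face detection, we walked through the TensorFlow Lite interfaces for loading a model, feeding input data, running inference, and reading the results.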


Original link: http://xie.infoq.cn/article/e7b83607aa5bee2b2781ceb2f. Please contact the author before republishing.
