[Audio & Video] A Hands-On Guide to Building a Really Practical Real-Time Audio/Video Tool

轻口味


Published: April 11, 2021

A real-time online "hand-over-hand" teaching tool built on the Agora video call SDK

Today we will build a cool two-person "hand-over-hand" video teaching tool on top of Agora's video call SDK. Here is what it looks like:

Demo

Video demo:

The embedded video doesn't seem to work, so here is the link to the demo video instead: https://www.bilibili.com/video/BV1YA411L7Hz/

In a call, user A is the student and user B is the teacher:

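Student A points the phone camera at the content in question, such as a book page or an instrument display
Teacher B points the camera at a white background, such as a white wall or a sheet of white paper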

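At this point both A and B see only A's camera picture. When B puts a finger in front of the white wall, inside the view of B's camera, both A and B see B's finger pointing at A's camera picture. With this video call tool, B can point at the content in A's picture and teach A "hand-over-hand".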

Pretty cool, right? How does it work? Read on.

How it works

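In a two-person video call, each side sends its own picture to the other. A conventional video call renders your own picture and the other party's picture separately on screen; the classic example is the picture-in-picture layout of a WeChat video call. Here we instead composite both pictures into a single layer: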

User A's picture is rendered in full
User B's picture, with its white areas removed, is rendered on top of A's picture

We implement this with an OpenGL fragment shader:

private static final String RGB_RGB_FRAGMENT_SHADER_STRING =
      "precision mediump float;\n"
    + "varying vec2 interp_tc;\n"
    + "\n"
    + "uniform sampler2D rgb1_tex;\n"
    + "uniform sampler2D rgb2_tex;\n"
    + "\n"
    + "void main() {\n"
    + "  vec4 col1 = texture2D(rgb1_tex, interp_tc);\n"
    + "  vec4 col2 = texture2D(rgb2_tex, interp_tc);\n"
    + "  gl_FragColor = vec4(col1.r * col2.r,\n"
    + "                      col1.g * col2.g,\n"
    + "                      col1.b * col2.b,\n"
    + "                      col1.a * col2.a);\n"
    + "}\n";


GLSL (OpenGL Shading Language) has a syntax very similar to C. Rather than giving a full introduction to OpenGL, here is a quick overview of GLSL, just enough to follow the shader above.

GLSL has two kinds of shaders:

Vertex shader: intuitively, it handles coordinate-related computation
Fragment shader: intuitively, it handles color-related computation

GLSL storage qualifiers:

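(none, the default): a local read/write variable, or a function parameter
const: a compile-time constant, or a read-only function parameter
attribute: per-vertex data passed from the application to the vertex shader
uniform: a value passed from the application to the shaders that stays constant while a primitive is being processed
varying: interpolated data passed from the vertex shader to the fragment shader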
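To see these qualifiers side by side, here is a minimal pass-through vertex shader of the kind that could produce the interp_tc varying read by the fragment shader above, written as a Java string in the same style. This is a sketch: the article does not show its vertex shader, so the names in_pos, in_tc and texMatrix are assumptions. texMatrix would be where a texture transform, such as the matrix argument that arrives with each texture frame, is applied.

```java
public final class PassThroughVertexShader {
    // Hypothetical pass-through vertex shader; in_pos, in_tc and texMatrix are assumed names.
    public static final String SOURCE =
              "attribute vec4 in_pos;\n"   // attribute: per-vertex position supplied by the app
            + "attribute vec4 in_tc;\n"    // attribute: per-vertex texture coordinate
            + "uniform mat4 texMatrix;\n"  // uniform: same value for every vertex in a draw call
            + "varying vec2 interp_tc;\n"  // varying: interpolated, then read by the fragment shader
            + "\n"
            + "void main() {\n"
            + "  gl_Position = in_pos;\n"
            + "  interp_tc = (texMatrix * in_tc).xy;\n"
            + "}\n";

    public static void main(String[] args) {
        System.out.print(SOURCE);
    }
}
```

The shader does no color work at all; it only forwards coordinates, which is why every interesting decision in this tool lives in the fragment shader.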

Inputs to a fragment shader:

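Texture coordinates generated for each fragment by interpolation during rasterization.
Variables declared uniform, which carry data the fragment shader needs.
Uniforms of type sampler2D, which pass texture objects; what is actually passed is the index of the corresponding texture unit.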

Common output of a fragment shader:

gl_FragColor: the output color of each fragment.

Typical work done in a fragment shader:

Computing colors
Fetching texels
Writing color values to pixels

With that background, let's go back to our shader:

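interp_tc is the interpolated texture coordinate passed in from a generic vertex shader
rgb1_tex and rgb2_tex are the textures of the two pictures, supplied from outside the shader
texture2D samples a texture at the given coordinate and returns the color as four RGBA components
gl_FragColor is the output color of each fragment; we combine the colors of the two pictures and write the result to this built-in variable
In RGBA (with components normalized to [0, 1]), all ones is white and all zeros is black. The shader multiplies the two pictures component by component. B's camera captures an essentially all-white picture whose components are close to 1, and multiplying A's components by them leaves A's values unchanged, so both sides initially see only A's picture. When a non-white object (a finger) enters B's picture, its darker pixels scale down A's colors at those positions, and the two pictures appear overlaid.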
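To check the arithmetic in the last point, here is the shader's per-channel multiply emulated on the CPU in plain Java. It is only an illustration: the pixel values are made up, and in the app this computation runs on the GPU for every fragment.

```java
import java.util.Arrays;

public final class MultiplyBlend {
    // CPU version of the fragment shader body:
    // gl_FragColor = vec4(col1.r * col2.r, col1.g * col2.g, col1.b * col2.b, col1.a * col2.a)
    // Each channel is a normalized float in [0, 1], as texture2D returns it.
    public static float[] blend(float[] col1, float[] col2) {
        float[] out = new float[4];
        for (int i = 0; i < 4; i++) {
            out[i] = col1[i] * col2[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] bookPixel = {0.8f, 0.6f, 0.4f, 1.0f}; // made-up pixel from A's camera
        float[] white     = {1.0f, 1.0f, 1.0f, 1.0f}; // ideal white wall from B's camera
        float[] finger    = {0.5f, 0.4f, 0.3f, 1.0f}; // made-up darker pixel where B's finger is

        // Over white, A's pixel comes through unchanged.
        System.out.println(Arrays.toString(blend(bookPixel, white)));
        // Over the finger, every color channel gets darker, so the finger shows on A's picture.
        System.out.println(Arrays.toString(blend(bookPixel, finger)));
    }
}
```

In practice a camera rarely captures a perfect 1.0 white, so a slightly grey wall darkens A's picture a little everywhere. That is a property of multiply blending itself, which is why a bright, evenly lit background works best.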

Now that the core idea is clear, how do we build the whole tool? Read on.

Implementation steps

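We use Agora's video call SDK for the two-person video communication. Agora gives developers 10,000 free minutes per month, which is plenty for trying out our tool. Let's build it step by step.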

1. Register an Agora account

Register an account on the Agora website, create a project, and get its App ID. Then download Agora's one-to-one video call demo and get it running.

2. The Agora video call API

The video call API mainly exposes the RtcEngine interface. Following the demo, we set up real-time communication:

Create the engine: mRtcEngine = RtcEngine.create(getBaseContext(), getString(R.string.agora_app_id), mRtcEventHandler);
Configure video parameters: mRtcEngine.setVideoEncoderConfiguration(new VideoEncoderConfiguration(VideoEncoderConfiguration.VD_640x360, VideoEncoderConfiguration.FRAME_RATE.FRAME_RATE_FPS_15, VideoEncoderConfiguration.STANDARD_BITRATE, VideoEncoderConfiguration.ORIENTATION_MODE.ORIENTATION_MODE_FIXED_PORTRAIT));
Set the view that renders the remote picture: mRtcEngine.setupRemoteVideo(new VideoCanvas(view, VideoCanvas.RENDER_MODE_HIDDEN, uid)); where view is a SurfaceView
Set up the local preview: mRtcEngine.setupLocalVideo(new VideoCanvas(view, VideoCanvas.RENDER_MODE_HIDDEN, 0)); where view is a SurfaceView
Join the channel: mRtcEngine.joinChannel(token, "demoChannel1", "Extra Optional Data", 0);

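At this point we have a working two-person call with a picture-in-picture layout. Next we turn it into overlay rendering. The setup above uses the SDK's default capture and rendering; to overlay the two pictures we need the custom capture and custom rendering APIs that the Agora SDK provides: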

Set a custom video source

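Method: setVideoSource(IVideoSource source)
Description: during real-time communication, the Agora SDK starts the default video input device, the built-in camera, to capture video. If you need a custom video input device, first define a custom video source with the IVideoSource class, then call this method to add it to the SDK. IVideoSource's onInitialize passes back an IVideoFrameConsumer, and captured video is uploaded by repeatedly calling methods on that IVideoFrameConsumer.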

Set a custom local video renderer

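Method: setLocalVideoRenderer(IVideoSink render)
Description: this method sets the local video renderer. During real-time communication, the Agora SDK normally starts a default video renderer. When you need a custom rendering device, the app can first define a custom renderer with IVideoSink and then call this method to add it to the SDK. This method can be called either before or after joining a channel.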

Set a custom remote video renderer

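Method: setRemoteVideoRenderer(int uid, IVideoSink render)
Description: this method sets the remote video renderer. During real-time communication, the Agora SDK normally starts a default video renderer. When you need a custom rendering device, the app can first define a custom renderer with IVideoSink and then call this method to add it to the SDK. This method can be called either before or after joining a channel; if it is called before joining, you must keep track of the remote user's uid yourself. IVideoSink is the interface through which we receive the remote picture: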

import java.nio.ByteBuffer;

public interface IVideoSink extends IVideoFrameConsumer {
  boolean onInitialize(); // initialization callback
  boolean onStart();      // start callback
  void onStop();          // stop callback
  void onDispose();       // callback for releasing the renderer

  /**
   * Returns the EGLContext handle. When the media engine needs to create an EGLContext,
   * it first asks the custom renderer whether it has already created one.
   * If the custom renderer has created and manages its own EGL environment, this method
   * returns the native handle of that EGLContext, which is shared with the media engine.
   * If the custom renderer has not created an EGLContext, it returns 0.
   */
  long getEGLContextHandle();

  int getBufferType();  // returns the buffer type
  int getPixelFormat(); // returns the pixel format

  // Receives a video frame as a ByteBuffer.
  void consumeByteBufferFrame(ByteBuffer buffer, int format, int width, int height, int rotation, long timestamp);

  // Receives a video frame as a byte array.
  void consumeByteArrayFrame(byte[] data, int format, int width, int height, int rotation, long timestamp);

  // Receives a video frame as a texture.
  void consumeTextureFrame(int textureId, int format, int width, int height, int rotation, long timestamp, float[] matrix);
}


3. Preparing the OpenGL context

Note that the custom rendering interface IVideoSink above has a getEGLContextHandle method to implement. What is an EGLContext? Let's take a quick look at EGL.

EGL overview

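EGL is an intermediate interface layer between OpenGL rendering and the native window system (a Window on Windows, a SurfaceView on Android, and so on). It exists to hide the differences between window systems on different platforms. The EGL API is a separate API, independent of the OpenGL ES versions; its main jobs are creating contexts for OpenGL commands, creating the Surfaces to draw into, configuring framebuffer attributes, and swapping buffers to present the rendered result.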

EGL provides mechanisms to:

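Communicate with the device's native windows
Query the available types and configurations of drawing surfaces
Create drawing surfaces
Synchronize rendering between OpenGL ES 3.0 and other rendering APIs
Manage rendering resources such as texture maps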

Typical sequence:

Get a display connection: EGLDisplay eglGetDisplay(EGLNativeDisplayType displayId);
Initialize the connection: EGLBoolean eglInitialize(EGLDisplay display, EGLint *majorVersion, EGLint *minorVersion);
Choose a configuration: EGLBoolean eglChooseConfig(EGLDisplay display, const EGLint *attribList, EGLConfig *configs, EGLint maxReturnConfigs, EGLint *numConfigs);
Create the render surface: EGLSurface eglCreateWindowSurface(EGLDisplay display, EGLConfig config, EGLNativeWindowType window, const EGLint *attribList);
Create the rendering context: EGLContext eglCreateContext(EGLDisplay display, EGLConfig config, EGLContext shareContext, const EGLint *attribList);
Bind the context to the surfaces: EGLBoolean eglMakeCurrent(EGLDisplay display, EGLSurface draw, EGLSurface read, EGLContext context);

Maintaining the EGLContext handle

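During custom rendering, the Agora SDK binds video data to textures on its decoding thread, while our rendering thread draws with the textures the SDK hands back, so the same textures are used from different threads. Sharing textures between threads requires EGLContexts that share a common parent EGLContext. So when the GLSurfaceView that serves as our canvas calls back onSurfaceCreated, we grab the GLSurfaceView's EGLContext and use it as the root EGLContext: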

public void onSurfaceCreated(GL10 paramGL10, EGLConfig paramEGLConfig) {
    _log.i("VideoRendererHelper.onSurfaceCreated");
    // Store the render thread's EGL context so other threads can share it.
    synchronized (MixVideoHelper.class) {
        if (EglBase14.isEGL14Supported()) {
            // EGL 1.4: android.opengl.EGL14
            eglContext = new EglBase14.Context(EGL14.eglGetCurrentContext());
        } else {
            // EGL 1.0 fallback: EGLContext here is javax.microedition.khronos.egl.EGLContext
            eglContext = new EglBase10.Context(((EGL10) EGLContext.getEGL()).eglGetCurrentContext());
        }
        _log.i("VideoRendererHelper EGL Context: " + eglContext);
    }
}


This EGLContext is later shared among the camera capture thread, the decoding thread, and the rendering thread.

About EglBase

For working with EGL resources, the Agora SDK already wraps the EGL APIs in EglBase, so we can simply reuse it:

```

public abstract class EglBase {
  public static final Object lock = new Object();
  public static final int EGL_OPENGL_ES2_BIT = 4;
  public static final int EGL_RECORDABLE_ANDROID = 12610;
  // Config attribute lists are {key, value, ..., EGL_NONE} using raw EGL token values:
  // 12324/12323/12322/12321 = EGL_RED/GREEN/BLUE/ALPHA_SIZE, 12352 = EGL_RENDERABLE_TYPE,
  // 12339 = EGL_SURFACE_TYPE (value 1 = EGL_PBUFFER_BIT), 12610 = EGL_RECORDABLE_ANDROID, 12344 = EGL_NONE.
  public static final int[] CONFIG_PLAIN = new int[]{12324, 8, 12323, 8, 12322, 8, 12352, 4, 12344};
  public static final int[] CONFIG_RGBA = new int[]{12324, 8, 12323, 8, 12322, 8, 12321, 8, 12352, 4, 12344};
  public static final int[] CONFIG_PIXEL_BUFFER = new int[]{12324, 8, 12323, 8, 12322, 8, 12352, 4, 12339, 1, 12344};
  public static final int[] CONFIG_PIXEL_RGBA_BUFFER = new int[]{12324, 8, 12323, 8, 12322, 8, 12321, 8, 12352, 4, 12339, 1, 12344};
  public static final int[] CONFIG_RECORDABLE = new int[]{12324, 8, 12323, 8, 12322, 8, 12352, 4, 12610, 1, 12344};
  public EglBase() {  }
  public static EglBase create(EglBase.Context sharedContext, int[] configAttributes) {    return (EglBase)(!EglBase14.isEGL14Supported() || sharedContext != null && !(sharedContext instanceof io.agora.rtc.gl.EglBase14.Context) ? new EglBase10((io.agora.rtc.gl.EglBase10.Context)sharedContext, configAttributes) : new EglBase14((io.agora.rtc.gl.EglBase14.Context)sharedContext, configAttributes));  }
  public static EglBase create() {    return create((EglBase.Context)null, CONFIG_PLAIN);  }
  public static EglBase create(EglBase.Context sharedContext) {    return create(sharedContext, CONFIG_PLAIN);  }
  public static EglBase createEgl10(int[] configAttributes) {    return new EglBase10((io.agora.rtc.gl.EglBase10.Context)null, configAttributes);  }
  public static EglBase createEgl10(EGLContext sharedContext, int[] configAttributes) {    return new EglBase10(new io.agora.rtc.gl.EglBase10.Context(sharedContext), configAttributes);  }
  public static EglBase createEgl14(int[] configAttributes) {    return new EglBase14((io.agora.rtc.gl.EglBase14.Context)null, configAttributes);  }
  public static EglBase createEgl14(android.opengl.EGLContext sharedContext, int[] configAttributes) {    return new EglBase14(new io.agora.rtc.gl.EglBase14.Context(sharedContext), configAttributes);  }
  public abstract void createSurface(Surface surface);
  public abstract void createSurface(SurfaceTexture surfaceTexture);
  public abstract void createDummyPbufferSurface();
  public abstract void createPbufferSurface(int width, int height);
  public abstract EglBase.Context getEglBaseContext();
  public abstract boolean hasSurface();
  public abstract int surfaceWidth();
  public abstract int surfaceHeight();
  public abstract void releaseSurface();
  public abstract void release();
  public abstract void makeCurrent();
  public abstract void detachCurrent();
  public abstract void swapBuffers();
  public abstract void swapBuffers(long presentationTimeStampNs);
  public interface Context {
    long getNativeEglContext();
  }
}
```
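The bare numbers in the CONFIG_* arrays of EglBase are standard EGL tokens laid out as key/value pairs terminated by EGL_NONE. The small decoder below is our own illustrative helper, not part of the Agora SDK; it only knows the tokens that appear in those arrays and prints any other key in hex.

```java
import java.util.Map;
import java.util.StringJoiner;

public final class EglConfigDecoder {
    // Standard EGL token values (hex, as in the EGL headers) used by EglBase's config arrays.
    private static final Map<Integer, String> NAMES = Map.of(
            0x3024, "EGL_RED_SIZE",          // 12324
            0x3023, "EGL_GREEN_SIZE",        // 12323
            0x3022, "EGL_BLUE_SIZE",         // 12322
            0x3021, "EGL_ALPHA_SIZE",        // 12321
            0x3040, "EGL_RENDERABLE_TYPE",   // 12352
            0x3033, "EGL_SURFACE_TYPE",      // 12339
            0x3142, "EGL_RECORDABLE_ANDROID" // 12610
    );
    private static final int EGL_NONE = 0x3038; // 12344, terminates the list

    // Turns {key, value, key, value, ..., EGL_NONE} into "KEY=value" pairs.
    public static String decode(int[] attribs) {
        StringJoiner joiner = new StringJoiner(", ");
        for (int i = 0; i + 1 < attribs.length && attribs[i] != EGL_NONE; i += 2) {
            String name = NAMES.getOrDefault(attribs[i], "0x" + Integer.toHexString(attribs[i]));
            joiner.add(name + "=" + attribs[i + 1]);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        int[] configRgba = {12324, 8, 12323, 8, 12322, 8, 12321, 8, 12352, 4, 12344};
        System.out.println(decode(configRgba));
        // prints: EGL_RED_SIZE=8, EGL_GREEN_SIZE=8, EGL_BLUE_SIZE=8, EGL_ALPHA_SIZE=8, EGL_RENDERABLE_TYPE=4
    }
}
```

Read this way, CONFIG_RGBA asks for 8 bits per channel including alpha and an OpenGL ES 2.0 renderable config (4 is EGL_OPENGL_ES2_BIT), while CONFIG_PIXEL_BUFFER additionally requests a pbuffer-capable surface type.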

4. Creating the VideoSource

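The simplest form of custom capture is capturing from the camera; for more powerful features we could also capture the screen or content drawn on a canvas. So we abstract a base VideoSource class here to make later extension easier: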

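abstract public class VideoSource implements IVideoSource {
  // Declarations inferred from how these fields are used; they were not shown in the original excerpt.
  protected WeakReference<IVideoFrameConsumer> mConsumer;
  protected HandlerThread cameraThread;
  protected Handler cameraHandler;

  @Override
  public boolean onInitialize(IVideoFrameConsumer consumer) {
    _log.i("onInitialize:" + this);
    // Hold the SDK's frame consumer weakly; captured frames are pushed to it later.
    this.mConsumer = new WeakReference<>(consumer);
    return true;
  }

  @Override
  public boolean onStart() {
    _log.i("onStart:" + this);
    // Run the camera on its own HandlerThread so frame callbacks stay off the UI thread.
    if (cameraThread == null) {
      cameraThread = new HandlerThread(CameraVideoSource.class.getName());
      cameraThread.start();
    }
    if (cameraHandler == null) {
      cameraHandler = new Handler(cameraThread.getLooper());
    }
    // videoSize is defined elsewhere in the project.
    startCamera(videoSize.width, videoSize.height);
    return true;
  }

  // Implemented by subclasses such as CameraVideoSource (signature inferred from the call above).
  protected abstract void startCamera(int width, int height);
}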


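Both capture and rendering go through OpenGL, and sharing textures between OpenGL threads requires a shared OpenGL context. For custom capture we create a camera thread, open the camera on that thread, and set the camera's preview texture with camera.setPreviewTexture(getSurfaceTexture()). The SurfaceTextureHelper behind this texture is created on its own thread from the shared context: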

public static SurfaceTextureHelper create(final String threadName, final EglBase.Context sharedContext, final Object lock) {
    // Dedicated thread that owns the SurfaceTexture and its texture.
    HandlerThread thread = new HandlerThread(threadName);
    thread.start();
    final Handler handler = new Handler(thread.getLooper());
    // Construct the helper on that thread and block until it is done, passing sharedContext
    // so the helper's GL work happens in a context that can share textures.
    return (SurfaceTextureHelper) ThreadUtils.invokeAtFrontUninterruptibly(handler, new Callable<SurfaceTextureHelper>() {
        public SurfaceTextureHelper call() {
            try {
                return new SurfaceTextureHelper(sharedContext, handler, lock);
            } catch (RuntimeException e) {
                Log.e("SurfaceTextureHelper", threadName + " create failure", e);
                return null;
            }
        }
    });
}


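Camera preview frames are continuously rendered into the texture wrapped by the SurfaceTexture. Each new frame triggers the onFrameAvailable callback registered with setOnFrameAvailableListener, which ultimately calls IVideoFrameConsumer's consumeTextureFrame method.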
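The push path just described (camera thread, then onFrameAvailable, then consumeTextureFrame on a weakly held consumer) can be modeled in plain Java without Android. This is a sketch under stated assumptions: FrameConsumer and PushSource are hypothetical stand-ins for IVideoFrameConsumer and a VideoSource subclass, a single-thread executor plays the role of the camera HandlerThread, and integer IDs stand in for real textures.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class PushSourceDemo {
    // Hypothetical stand-in for IVideoFrameConsumer.consumeTextureFrame.
    interface FrameConsumer {
        void consumeTextureFrame(int textureId, int width, int height, long timestampNs);
    }

    // Hypothetical stand-in for a VideoSource subclass.
    static final class PushSource {
        // Plays the role of the camera HandlerThread: one thread, tasks run in order.
        private final ExecutorService cameraThread = Executors.newSingleThreadExecutor();
        private volatile WeakReference<FrameConsumer> consumer = new WeakReference<>(null);

        // Like IVideoSource.onInitialize: keep the consumer weakly.
        void onInitialize(FrameConsumer c) {
            consumer = new WeakReference<>(c);
        }

        // Plays the role of onFrameAvailable: one call per new camera frame.
        void onFrameAvailable(int textureId, int width, int height) {
            cameraThread.execute(() -> {
                FrameConsumer c = consumer.get();
                if (c != null) { // the consumer may already have been released
                    c.consumeTextureFrame(textureId, width, height, System.nanoTime());
                }
            });
        }

        void stop() {
            cameraThread.shutdown();
            try {
                cameraThread.awaitTermination(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Pushes three fake frames and returns the texture IDs the consumer saw.
    public static List<Integer> runDemo() {
        List<Integer> received = new CopyOnWriteArrayList<>();
        FrameConsumer sink = (textureId, w, h, ts) -> received.add(textureId);
        PushSource source = new PushSource();
        source.onInitialize(sink);
        for (int tex = 1; tex <= 3; tex++) {
            source.onFrameAvailable(tex, 640, 360);
        }
        source.stop();
        Reference.reachabilityFence(sink); // keep the weakly held sink alive until here
        return received;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [1, 2, 3]
    }
}
```

Holding the consumer through a WeakReference, as the VideoSource above does, means the source never keeps the SDK's consumer alive on its own; the null check is what makes a late frame after teardown harmless.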

5. Creating a custom renderer

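Create YuvImageRenderer, which implements the IVideoSink interface;
We work with textures, so the main method to implement is consumeTextureFrame
Hold on to the current texture and notify the screen rendering thread to render
The screen rendering thread shades the local picture texture and the remote picture texture with the shader above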

public static class YuvImageRenderer implements IVideoSink {

    /**
     * Agora SDK callback: a new frame arrives as a texture.
     */
    @Override
    public void consumeTextureFrame(int texId, int format, int width, int height, int rotation,
        long ts, float[] matrix) {
      //_log.d("ID: " + id + ". consumeTextureFrame:" + texId);
      // Put the frame into the pendingFrame slot.
      renderFrame(new MixVideoHelper.MyI420Frame(width, height, rotation, texId, matrix));
    }

    /**
     * Puts the frame into pendingFrame and calls GLSurfaceView.requestRender()
     * so that the render thread consumes it.
     */
    public synchronized void renderFrame(MyI420Frame frame) {
      if (surface == null) {
        // This object has been released.
        renderFrameDone(frame);
        return;
      }
      if (renderFrameThread == null) {
        renderFrameThread = Thread.currentThread();
      }
      if (!seenFrame && rendererEvents != null) {
        _log.i("ID: " + id + ". Reporting first rendered frame.");
        rendererEvents.onFirstFrameRendered();
      }
      framesReceived++;
      synchronized (pendingFrameLock) {
        // Check input frame parameters.
        if (frame.yuvFrame) {
          if (frame.yuvStrides[0] < frame.width || frame.yuvStrides[1] < frame.width / 2
              || frame.yuvStrides[2] < frame.width / 2) {
            _log.i("Incorrect strides " + frame.yuvStrides[0] + ", " + frame.yuvStrides[1]
                + ", " + frame.yuvStrides[2]);
            renderFrameDone(frame);
            return;
          }
        }
        if (pendingFrame != null) {
          // Skip rendering of this frame if previous frame was not rendered yet.
          framesDropped++;
          renderFrameDone(frame);
          seenFrame = true;
          return;
        }
        pendingFrame = frame;
      }
      setSize(frame.width, frame.height, frame.rotationDegree);
      seenFrame = true;

      // Request rendering.
      surface.requestRender();
    }

    // Fields and the other IVideoSink methods are omitted from this excerpt.
}
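At its core, renderFrame is a one-slot mailbox between the SDK's thread and the GLSurfaceView render thread: if the previous frame has not been drawn yet, the new one is dropped. Below is a minimal plain-Java sketch of just that policy; PendingFrameSlot is a hypothetical name, and the real renderer also counts frames, validates strides and returns dropped frames with renderFrameDone.

```java
public final class PendingFrameSlot<T> {
    private final Object lock = new Object();
    private T pending;    // at most one frame waiting for the render thread
    private long dropped; // frames skipped because the slot was already full

    // Producer side (SDK thread): returns false if the frame was dropped.
    public boolean offer(T frame) {
        synchronized (lock) {
            if (pending != null) {
                dropped++;
                return false;
            }
            pending = frame;
            return true;
        }
    }

    // Consumer side (render thread, e.g. when drawing a frame): takes the frame, or null.
    public T take() {
        synchronized (lock) {
            T frame = pending;
            pending = null;
            return frame;
        }
    }

    public long droppedCount() {
        synchronized (lock) {
            return dropped;
        }
    }

    public static void main(String[] args) {
        PendingFrameSlot<String> slot = new PendingFrameSlot<>();
        slot.offer("frame1");
        slot.offer("frame2");                     // dropped: frame1 has not been rendered yet
        System.out.println(slot.take());         // frame1
        System.out.println(slot.take());         // null
        System.out.println(slot.droppedCount()); // 1
    }
}
```

An alternative policy is to replace the queued frame so the newest one always wins, which lowers latency at the cost of releasing frames that were already queued; the renderer above keeps the older frame and drops the incoming one.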


Summary

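Today we built a "hand-over-hand" teaching tool that overlays the two pictures during a call, based on the Agora video call SDK. Along the way we covered: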

EGL basics
OpenGL GLSL basics
Agora's custom video capture and custom rendering APIs

Future posts will cover screen-content sharing and more. Hope you enjoy it!


Original link: http://xie.infoq.cn/article/f801c15daabfa831d85f89649. Please contact the author before reprinting.
