cstdio source code study and analysis 10 - formatted input/output function fprintf --- macro definitions / helper function analysis 06

Author: 桑榆

2022-10-18
Guangdong

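Formatted input/output functions in cstdio

vfprintf, the implementation behind fprintf, contains a large number of macro definitions and helper functions. Let's go through their source code one by one.

Function logic analysis --- vfprintf

2. Macro definitions / helper function analysis

(19) printf_positional --- handling positional format specifiers

① Function parameter analysis

FILE *s: the file stream object
const CHAR_T *format: the format argument passed in
int readonly_format: whether the format argument lives in read-only memory; 1 means read-only, -1 means writable, 0 means unknown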

  /* 1 if format is in read-only memory, -1 if it is in writable memory,
     0 if unknown.  */
  int readonly_format = 0;


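va_list ap: the current variadic argument list, positioned at the next argument to be parsed
va_list *ap_savep: pointer to the saved original argument list, positioned at the first variadic argument (the caller passes &ap_save)
int done: the number of characters output so far
int nspecs_done: the number of conversion specifiers already handled, that is, how many placeholders have been printed
const UCHAR_T *lead_str_end: the end of the leading part of the format string that needs no parsing; it points either to the first '%' in format or to '\0'
CHAR_T *work_buffer: a local buffer that holds the result being generated, WORK_BUFFER_SIZE CHAR_T elements in size. It is usually paired with CHAR_T *workend, which points one past the last element of work_buffer, similar to end() in the STL.
int save_errno: the errno value saved before this operation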

  /* For the %m format we may need the current `errno' value.  */
  int save_errno = errno;


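const char *grouping: a string describing the size of each digit group, that is, how many digits are printed together; many countries print numbers in groups of three. See the type definitions in https://xie.infoq.cn/article/83adb9a58cfdb38c337d50e9a for reference.

grouping --- specifies the number of digits that form each group; for non-monetary quantities these groups are separated by the thousands separator. For example, with thousands_sep = "," and the number 1000000: grouping = "\3" gives "1,000,000"; grouping = "\1\2\3" gives "1,000,00,0"; grouping = "\3\1" gives "1,0,0,0,000". The first element applies to the rightmost group, and the last element repeats for all groups farther left.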

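THOUSANDS_SEP_T thousands_sep: the thousands separator, that is, the separator placed between digit groups (between every three digits when groups are three digits wide)

thousands_sep --- the separator used between groups of digits to the left of the decimal point for non-monetary quantities.

unsigned int mode_flags: the mode flags passed into vfprintf that control printing behavior (for example PRINTF_FORTIFY, which appears in the code further down)
Return value: done, the number of characters output so far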
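
The grouping and thousands_sep values described above come from the locale's numeric data. As a minimal sketch (not part of vfprintf; the helper name dump_numeric_grouping is made up for illustration), the standard localeconv() call exposes the same two fields, and walking the grouping string shows that each element is a number rather than a printable digit:

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: print the numeric grouping data of the current
   locale.  Returns the number of group widths found before the rule ends.  */
static int dump_numeric_grouping(void)
{
    const struct lconv *lc = localeconv();
    assert(lc != NULL);

    printf("thousands_sep = \"%s\"\n", lc->thousands_sep);

    int count = 0;
    for (const char *g = lc->grouping; *g != '\0'; ++g) {
        if (*g == CHAR_MAX) {
            printf("group %d: CHAR_MAX (no further grouping)\n", count);
            break;
        }
        /* Each element is a small integer stored in a char.  */
        printf("group %d: %d digits\n", count, *g);
        ++count;
    }
    return count;
}
```

In the default "C" locale both fields are empty strings, so the function reports no groups; after a call such as setlocale(LC_NUMERIC, "") the output depends on which locales are installed on the machine.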

  /* Hand off processing for positional parameters.  */
do_positional:
  done = printf_positional (s, format, readonly_format, ap, &ap_save,
                            done, nspecs_done, lead_str_end, work_buffer,
                            save_errno, grouping, thousands_sep, mode_flags);

/* Handle positional format specifiers.  */
static int
printf_positional (FILE *s, const CHAR_T *format, int readonly_format,
                   va_list ap, va_list *ap_savep, int done, int nspecs_done,
                   const UCHAR_T *lead_str_end,
                   CHAR_T *work_buffer, int save_errno,
                   const char *grouping, THOUSANDS_SEP_T thousands_sep,
                   unsigned int mode_flags)
{


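② Overall flow analysis

Allocate specsbuf, which holds the parsed information for each conversion specifier
Allocate argsbuf, which caches information about the variadic arguments
Declare some local variables, such as nspecs and nargs
Initialize grouping and thousands_sep from the locale for later number printing
Walk through every conversion specifier in the format string, setting up specs, specs_limit and nargs
Correct the value of nargs
Initialize the data in argsbuf.data based on the information above; the fill value depends on mode_flags
Walk through all specifiers and fill in each argument's type and size
Once the types are known, fill in the values; the arguments must be read in order at this point
Walk through every specifier and print it
Finally, handle all_done and release the buffers

The details in between need closer analysis, and we will cover them in a dedicated article.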

    static int printf_positional (...)
    {
        // Allocate specsbuf, which holds the parsed information for each conversion specifier
        ...
        // Allocate argsbuf, which caches information about the variadic arguments
        ...
        // Declare some local variables, such as nspecs and nargs
        ...
        // Initialize grouping and thousands_sep from the locale for later number printing
        if (grouping == (const char *) -1)
        {
        ...
        }
        // Walk through every conversion specifier in format, setting up specs, specs_limit and nargs
        for (const UCHAR_T *f = lead_str_end; *f != L_('\0');
             f = specs[nspecs++].next_fmt)
        {
            ...
            specs = specsbuf.data;
            specs_limit = specsbuf.length / sizeof (specs[0]);
            ...
            nargs += __parse_one_specmb (f, nargs, &specs[nspecs], &max_ref_arg);
        }
        // Correct the value of nargs
        nargs = MAX (nargs, max_ref_arg);
        // Initialize the data in argsbuf.data; the fill value depends on mode_flags
        {
            ...
            memset (args_type, (mode_flags & PRINTF_FORTIFY) != 0 ? '\xff' : '\0',
                    nargs * sizeof (*args_type));
        }
        // Walk through all specifiers and fill in each argument's type and size
        /* Fill in the types of all the arguments.  */
        for (cnt = 0; cnt < nspecs; ++cnt)
        {
            ...
            args_type[specs[cnt].data_arg] = specs[cnt].data_arg_type;
            args_size[specs[cnt].data_arg] = specs[cnt].size;
            ...
        }
        // Once the types are known, fill in the values; the arguments must be read in order
        /* Now we know all the types and the order.  Fill in the argument
           values.  */
        for (cnt = 0; cnt < nargs; ++cnt)
          switch (args_type[cnt])
            {
              ...
              args_value[cnt].pa_double = va_arg (*ap_savep, double);
              ...
            }
        // Walk through every specifier and process it
        /* Now walk through all format specifiers and process them.  */
        for (; (size_t) nspecs_done < nspecs; ++nspecs_done)
        {
            ...
        }
        // Finally, handle all_done and release the buffers
    all_done:
        scratch_buffer_free (&argsbuf);
        scratch_buffer_free (&specsbuf);
        return done;
    }
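
To make the purpose of this function concrete, here is a small caller-side sketch of the positional specifiers it handles. It assumes a C library that supports the POSIX %n$ syntax, as glibc does; the wrapper name format_swapped is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: format two strings in swapped order.
   %2$s prints the second variadic argument, %1$s prints the first.  */
static int format_swapped(char *buf, size_t n,
                          const char *first, const char *second)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%2$s %1$s", first, second);
}
```

Calling format_swapped(buf, sizeof buf, "world", "hello") should leave "hello world" in buf. The format refers to argument 2 before argument 1, while va_arg can only consume arguments in order, which is why the flow above first records the type of every argument and only then reads all the values in sequence.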


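(20) group_number --- grouping the digits of a number

The usage example makes the call pattern clear: after number.word is converted into the string named string, group_number is called with the start of work_buffer as front_ptr, string as w, and workend as rear_ptr, and the return value is assigned back to string.

What it actually does is turn the string "1000000" into "1,000,000", assuming groups of three digits and ',' as the separator.

① Function parameter analysis

CHAR_T *front_ptr: the start of the free part of the buffer, which extends up to w
CHAR_T *w: the start of the digit string, which extends up to rear_ptr
CHAR_T *rear_ptr: one past the last element of the whole work_buffer
const char *grouping: the grouping information

grouping --- specifies the number of digits that form each group; for non-monetary quantities these groups are separated by the thousands separator. For example, with thousands_sep = "," and the number 1000000: grouping = "\3" gives "1,000,000"; grouping = "\1\2\3" gives "1,000,00,0"; grouping = "\3\1" gives "1,0,0,0,000".

THOUSANDS_SEP_T thousands_sep: the group separator

thousands_sep --- the separator used between groups of digits to the left of the decimal point for non-monetary quantities.

Return value: the start of the grouped string, which extends up to rear_ptr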

          /* Put the number in WORK.  */
          string = _itoa_word (number.word, workend, base,
                               spec == L_('X'));
          if (group && grouping)
            string = group_number (work_buffer, string, workend,
                                   grouping, thousands_sep);

/* Group the digits from W to REAR_PTR according to the grouping rules
   of the current locale.  The interpretation of GROUPING is as in
   `struct lconv' from <locale.h>.  The grouped number extends from
   the returned pointer until REAR_PTR.  FRONT_PTR to W is used as a
   scratch area.  */
static CHAR_T *
group_number (CHAR_T *front_ptr, CHAR_T *w, CHAR_T *rear_ptr,
              const char *grouping, THOUSANDS_SEP_T thousands_sep)
{
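
The pointer layout is easier to follow with the conversion step in view. The sketch below is a simplified stand-in for the internal _itoa_word, not glibc's code; the name utoa_backward and the plain char buffer are assumptions for illustration. It writes digits backward from the end of the buffer, so the text ends exactly at the end pointer and everything in front of the returned pointer stays free as scratch space.

```c
#include <assert.h>

/* Hypothetical stand-in for _itoa_word: write VALUE in BASE backward,
   ending just before BUFEND.  Returns a pointer to the first digit, so the
   text occupies [returned pointer, BUFEND).  No terminator is written.  */
static char *utoa_backward(unsigned long value, char *bufend,
                           unsigned base, int upper_case)
{
    assert(base >= 2 && base <= 16);
    const char *digits = upper_case ? "0123456789ABCDEF"
                                    : "0123456789abcdef";
    char *p = bufend;
    do {
        *--p = digits[value % base];   /* least significant digit first */
        value /= base;
    } while (value != 0);
    return p;
}
```

With a 16-element buffer, converting 1000000 in base 10 uses the last 7 elements and leaves the first 9 free, which corresponds to the region between front_ptr and w that group_number uses as its scratch area.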


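② Function logic analysis

Initialize local variables and check for exceptional cases

tlen is initialized to the length of thousands_sep. This applies only to the narrow (multibyte) build, where the separator is a string; in the wide-character build (COMPILE_WPRINTF defined) the separator is always a single wide character, so tlen is not needed.
If the first element of grouping is abnormal, that is, equal to CHAR_MAX, zero, or negative, there is no need to look any further: w is returned directly and no grouping is done.
len is assigned the width of the first group, that is, the first element of grouping.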

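A careful reader may wonder whether this should be the second character, since earlier we wrote grouping = "\3". It should not: \3 is an octal escape, so the first element of the string is a single char whose numeric value is 3. We are reading the Glibc implementation, so let's look at its own description:
    // glibc/locale/locale.h
    /* Structure giving information about numeric and monetary notation. */
    struct lconv
    {
      ...
      /* Each element is the number of digits in each group;
         elements with higher indices are farther left.
         An element with value CHAR_MAX means that no further grouping is done.
         An element with value 0 means that the previous element is used
         for all groups farther left. */
      char *grouping;
      ...
    };
So grouping is not a human-readable quoted string: each element is a small integer stored in a char. A rule of 3, 2, 1 is the string "\3\2\1", not the digit characters "321", and the code reads the group width directly through *grouping.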

  /* Length of the current group.  */
  int len;
#ifndef COMPILE_WPRINTF
  /* Length of the separator (in wide mode, the separator is always a
     single wide character).  */
  int tlen = strlen (thousands_sep);
#endif

  /* We treat all negative values like CHAR_MAX.  */

  if (*grouping == CHAR_MAX || *grouping <= 0)
    /* No grouping should be done.  */
    return w;

  len = *grouping++;


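Copy first so that no data gets overwritten

The source digits in [w, rear_ptr) are copied to [front_ptr, s). The result we return is built at the end of the buffer and will overwrite the source digits, so they have to be moved out of the way first. Then w is set to rear_ptr, ready for the backward walk.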

  /* Copy existing string so that nothing gets overwritten.  */
  memmove (front_ptr, w, (rear_ptr - w) * sizeof (CHAR_T));
  CHAR_T *s = front_ptr + (rear_ptr - w);

  w = rear_ptr;


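Walking through the digits

When the loop starts, w points at rear_ptr and s points one past the last character of the copied digit string.

*--w = *--s decrements both pointers first and then copies one character.

After each copy, len is decremented. When it reaches 0 and digits remain, we first write the group separator thousands_sep and then update len.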

  /* Process all characters in the string.  */
  while (s > front_ptr)
    {
      *--w = *--s;

      if (--len == 0 && s > front_ptr)
        {
          /* A new group begins.  */
          ...
        }
    }

  return w;
}


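Inserting the group separator

Case 1 (wide characters, COMPILE_WPRINTF defined):

If w != s, the output has not yet caught up with the digits we are still reading backward, so the separator, a single wide character, is assigned directly.

Otherwise writing it would overwrite unread digits: work_buffer is not large enough. The code jumps to copy_rest and copies the remaining characters [front_ptr, s) into [new w, old w), which uses up exactly the rest of the buffer.

Case 2 (narrow characters, COMPILE_WPRINTF not defined):

The logic is basically the same. Here the separator is a multibyte string rather than a single character, so it cannot be assigned directly: the code requires tlen < w - s and then copies its tlen bytes one by one, from the last byte to the first.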

      /* A new group begins.  */
#ifdef COMPILE_WPRINTF
      if (w != s)
        *--w = thousands_sep;
      else
        /* Not enough room for the separator.  */
        goto copy_rest;
#else
      int cnt = tlen;
      if (tlen < w - s)
        do
          *--w = thousands_sep[--cnt];
        while (cnt > 0);
      else
        /* Not enough room for the separator.  */
        goto copy_rest;
#endif

        copy_rest:
          /* No further grouping to be done.  Copy the rest of the
             number.  */
          w -= s - front_ptr;
          memmove (w, front_ptr, (s - front_ptr) * sizeof (CHAR_T));
          break;


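Updating the group information

If the next element of grouping is CHAR_MAX or negative, go to copy_rest: copy the remaining characters and do no further grouping.

If the next element of grouping is not '\0', as with grouping = "\3\2\1", assign it to len and advance the grouping pointer.

Otherwise the next element is '\0', as with grouping = "\3", so len is set to grouping[-1], the previous element, and that width repeats for all remaining groups.

grouping[-1] is equivalent to *(grouping + (-1)), that is, *(grouping - 1).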

      if (*grouping == CHAR_MAX
#if CHAR_MIN < 0
           || *grouping < 0
#endif
           )
        {
        copy_rest:
          /* No further grouping to be done.  Copy the rest of the
             number.  */
          w -= s - front_ptr;
          memmove (w, front_ptr, (s - front_ptr) * sizeof (CHAR_T));
          break;
        }
      else if (*grouping != '\0')
        len = *grouping++;
      else
        /* The previous grouping repeats ad infinitum.  */
        len = grouping[-1];
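
Putting the steps together, the following is a simplified sketch of the same in-place technique. It is not glibc's code: it assumes plain char digits and a single-character separator, and the name group_digits is made up for illustration. A stop flag replaces the goto copy_rest jump.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical simplified version of group_number: group the digits in
   [w, rear_ptr), using [front_ptr, w) as scratch space.  Returns the start
   of the grouped text, which ends at rear_ptr.  */
static char *group_digits(char *front_ptr, char *w, char *rear_ptr,
                          const char *grouping, char sep)
{
    assert(front_ptr <= w && w <= rear_ptr);
    if (*grouping == CHAR_MAX || *grouping <= 0)
        return w;                          /* no grouping requested */
    int len = *grouping++;

    /* Move the digits to the front so the tail can be rebuilt.  */
    memmove(front_ptr, w, (size_t)(rear_ptr - w));
    char *s = front_ptr + (rear_ptr - w);
    w = rear_ptr;

    while (s > front_ptr) {
        *--w = *--s;                       /* copy one digit, right to left */
        if (--len == 0 && s > front_ptr) {
            int stop = (w == s);           /* no room for a separator */
            if (!stop) {
                *--w = sep;
                if (*grouping == CHAR_MAX || *grouping < 0)
                    stop = 1;              /* rule ends: no further grouping */
                else if (*grouping != '\0')
                    len = *grouping++;     /* next explicit group width */
                else
                    len = grouping[-1];    /* last width repeats */
            }
            if (stop) {                    /* same work as copy_rest */
                w -= s - front_ptr;
                memmove(w, front_ptr, (size_t)(s - front_ptr));
                break;
            }
        }
    }
    return w;
}
```

For a 32-byte buffer with "1000000" stored in its last 7 bytes, the rule "\3" with ',' should give "1,000,000", and "\1\2\3" and "\3\1" should give the other two results listed earlier. If the buffer has no free space in front of the digits, the function falls back to returning the digits ungrouped.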



Original link: http://xie.infoq.cn/article/f1d9116feb612ef14445416da. Please contact the author before reposting this article.
