写点什么

CnosDB:深入了解时序数据处理函数

作者:CnosDB
  • 2023-12-24
    内蒙古
  • 本文字数:9240 字

    阅读完需:约 30 分钟

CnosDB:深入了解时序数据处理函数

CnosDB 是一个专注于时序数据处理的数据库,旨在解决时序数据存储与分析问题,为用户提供高效的时序数据管理与查询便利。为了实现这一目标,CnosDB 实现了一系列专用函数,快来和 CC 一起来看看吧!


CnosDB:深入了解时序数据处理函数

stats_agg

用途

stats_agg 函数适用于需要对时序数据进行统计分析的场景,例如计算相关系数和协方差。

并且还可以分别计算每个维度的常见统计数据,例如平均值和标准差。

stats_agg 提供了与 sum,count,corr,covar_pop,stddev 和 stddev_pop 等聚合函数相同的功能,

适用于一条 SQL 中,包含多个分析函数的场景。

函数用法

stats_agg(y, x)
复制代码

参数类型:

  • y: double 类型

  • x: double 类型

返回结果:

{   n: bigint,   -- count   sx: double,  -- sum(x)- sum(x)  sx2: double, -- sum((x-sx/n)^2) (sum of squares)  sx3: double, -- sum((x-sx/n)^3)  sx4: double, -- sum((x-sx/n)^4)  sy: double,  -- sum(y)  sy2: double, -- sum((y-sy/n)^2) (sum of squares)  sy3: double, -- sum((y-sy/n)^3)  sy4: double, -- sum((y-sy/n)^4)  sxy: double, -- sum((x-sx/n)*(y-sy/n)) (sum of products) }
复制代码

示例

想象一下,你有一张表记录了某个测试项目的执行情况,其中 x 表示时间,y 表示对应的测试数据。你想要了解这些测试数据的分布规律,而 stats_agg 就是你的得力助手。

create table if not exists test_stats(x bigint, y bigint);insert into test_stats(time, x, y) values(1, 1, 1),(2, 1, 2),(3, 1, 3),(4, 1, 4),(5, 1, 5),(6, 2, 1),(7, 2, 2),(8, 2, 3),(9, 2, 4),(10, 2, 5);
复制代码


select stats_agg(y, x) from test_stats;
复制代码

 

在这个例子中,stats_agg 会告诉你这些测试数据的数量(n)、时间的总和(sx)、时间的平方和(sx2),以及测试数据的总和(sy)、测试数据的平方和(sy2)等等。

+------------------------------------------------------------------------------------------------------------------------------------------------------------+| stats_agg(test_stats.y,test_stats.x)                                                                                                                       |+------------------------------------------------------------------------------------------------------------------------------------------------------------+| {n: 10, sx: 15.0, sx2: 2.5, sx3: -2.7755575615628914e-16, sx4: 0.6249999999999999, sy: 30.0, sy2: 20.0, sy3: -1.7763568394002505e-15, sy4: 68.0, sxy: 0.0} |+------------------------------------------------------------------------------------------------------------------------------------------------------------+
复制代码

注意事项

输入参数都不为 NULL 时才纳入聚合

gauge_agg

用途

适用于监控系统、物联网应用和金融领域,提供时间段内的数据快照。

函数用法

gauge_agg(time, value)
复制代码

参数类型:

  • time: Timestamp 类型

  • value: DOUBLE 类型

返回结果:

{   first: { ts: Timestamp, value: Double },  second: { ts: Timestamp, value: Double },  penultimate: { ts: Timestamp, value: Double },  last: { ts: Timestamp, value: Double },  num_elements: Bigint Unsingned}
复制代码

示例

这个 SQL 查询使用了 gauge_agg 函数,目的是对空气质量数据按月进行聚合,提供每个月内的压力数据快照。

SELECT gauge_agg(time, pressure) FROM air GROUP BY date_trunc('month', time);
复制代码

这条查询的目的是从名为 "air" 的数据表中选取时间 (time) 和压力 (pressure) 列的数据,并按照每月的时间戳 (date_trunc('month', time)) 进行分组。最后,对每个月的数据应用 gauge_agg 函数。

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| gauge_agg(air.time,air.pressure)                                                                                                                                                                                |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| {first: {ts: 2023-03-01T00:00:00, val: 54.0}, second: {ts: 2023-03-01T00:00:00, val: 59.0}, penultimate: {ts: 2023-03-14T16:00:00, val: 55.0}, last: {ts: 2023-03-14T16:00:00, val: 80.0}, num_elements: 13122} || {first: {ts: 2023-01-14T16:00:00, val: 63.0}, second: {ts: 2023-01-14T16:00:00, val: 68.0}, penultimate: {ts: 2023-01-31T23:57:00, val: 77.0}, last: {ts: 2023-01-31T23:57:00, val: 54.0}, num_elements: 16640} || {first: {ts: 2023-02-01T00:00:00, val: 54.0}, second: {ts: 2023-02-01T00:00:00, val: 60.0}, penultimate: {ts: 2023-02-28T23:57:00, val: 74.0}, last: {ts: 2023-02-28T23:57:00, val: 59.0}, num_elements: 26880} |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
复制代码

上面的数据,每行结果表示一个月的数据聚合结果。

first 到 last 分别表示每个月内的第一、第二、倒数第二和最后一条数据的时间戳和压力数值。

num_elements 表示该月的数据点数量。

compact_state_agg/state_agg

用途

用于跟踪状态的持续时间,适用于设备健康监控、事件统计等。

函数用法

compact_state_agg(time, state)state_agg(time, state)
复制代码

参数类型:

  • time: Timestamp 类型

  • state: 任意类型

返回结果:

{   state_duration: [    { state: Any, interval: Duration },    ...  ],  state_periods: [    {      state: Any,      periods: [        { start_time: Timestamp, end_time: Timestamp },        ...      ]     },    ...  ]}
复制代码

示例

在一个名为 states 的表中插入了一些包含时间戳和状态信息的数据。然后,通过使用两个不同的 SQL 查询来调用 compact_state_agg 和 state_agg 函数。

CREATE TABLE IF NOT EXISTS states(state STRING);
INSERT INTO states VALUES('2020-01-01 10:00:00', 'starting'),('2020-01-01 10:30:00', 'running'),('2020-01-03 16:00:00', 'error'),('2020-01-03 18:30:00', 'starting'),('2020-01-03 19:30:00', 'running'),('2020-01-05 12:00:00', 'stopping');
复制代码

compact_state_agg 示例

SELECT compact_state_agg(time, state) FROM states;
复制代码

返回的结果是一个 JSON 格式的对象,其中包含了不同状态的总持续时间 (state_duration)。在这个示例中,它显示了每个状态的总持续时间,但没有提供每个状态的具体切换周期。

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| compact_state_agg(states.time,states.state)                                                                                                                                                                                                                                                                                                                                          |+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| {state_duration: [{state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}, {state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}], state_periods: []} |+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
复制代码

state_agg 示例

SELECT state_agg(time, state) FROM states;
复制代码

返回的结果同样是一个 JSON 格式的对象,其中包含了每个状态的总持续时间 (state_duration) 和每个状态的切换周期 (state_periods)。在这个示例中,state_periods 列表显示了每个状态的具体切换周期,包括开始时间和结束时间。

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| state_agg(states.time,states.state)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| {state_duration: [{state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}], state_periods: [{state: starting, periods: [{start_time: 2020-01-01T10:00:00, end_time: 2020-01-01T10:30:00}, {start_time: 2020-01-03T18:30:00, end_time: 2020-01-03T19:30:00}]}, {state: error, periods: [{start_time: 2020-01-03T16:00:00, end_time: 2020-01-03T18:30:00}]}, {state: running, periods: [{start_time: 2020-01-01T10:30:00, end_time: 2020-01-03T16:00:00}, {start_time: 2020-01-03T19:30:00, end_time: 2020-01-05T12:00:00}]}]} |+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
复制代码

注意事项

  • state_agg 会记录每个状态的每个持续时间段,当状态基数较大或持续时间段较分散的情况下,会使用较大内存,谨慎使用。

  • compact_state_agg 返回的数据中不包含 state_periods 字段。会将每个状态的所有持续时间段汇总成总时间,占用内存相对较少,两个函数可以根据具体场景来抉择使用。

所以,这两个函数的区别在于 compact_state_agg 更关注总体持续时间,而 state_agg 提供了更详细的状态切换周期信息。

candlestick_agg

用途

适用于金融领域,生成开盘价、收盘价、最高价和最低价的数据。

函数用法

candlestick_agg(time, price, volume)
复制代码

参数类型:

  • time: Timestamp 类型

  • price: Double  类型

  • volume: Double 类型

返回类型:

{   open: { ts: Timestamp, val: Double },  close: { ts: Timestamp, val: Double },  high: { ts: Timestamp, val: Double },  low: { ts: Timestamp, val: Double },  volume: { vol: Double, vwap: Double },}
复制代码

示例

那么现在创建一个示例数据集。

CREATE TABLE IF NOT EXISTS tick(price bigint ,volume bigint);INSERT tick(time, price, volume)VALUES    ('1999-12-31 00:00:00.000', 111, 444),    ('1999-12-31 00:00:00.005', 222, 444),    ('1999-12-31 00:00:00.010', 333, 222),    ('1999-12-31 00:00:10.015', 444, 111),    ('1999-12-31 00:00:10.020', 222, 555),    ('1999-12-31 00:10:00.025', 333, 555),    ('1999-12-31 00:10:00.030', 444, 333),    ('1999-12-31 01:00:00.035', 555, 222);
复制代码

下列语句查询了关于时间序列数据的蜡烛图形式的统计信息,包括开盘价、收盘价、最低价。

SELECT candlestick_agg(time, price, volume) FROM tick;
复制代码

返回结果:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| candlestick_agg(tick.time,tick.price,tick.volume)                                                                                                                                                                                   |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| {open: {ts: 1999-12-31T00:00:00, val: 111.0}, close: {ts: 1999-12-31T01:00:00.035, val: 555.0}, low: {ts: 1999-12-31T00:00:00, val: 111.0}, high: {ts: 1999-12-31T01:00:00.035, val: 555.0}, volume: {vol: 2886.0, vwap: 850149.0}} |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
复制代码

asap_smooth

用途

ASAP_SMOOTH 是一种降采样函数,适用于可视化和数据探索、预测和模型训练等场景。

函数用法

asap_smooth(time, value, resolution order by time)
复制代码

参数类型:

  • time: Timestamp 类型

  • value: Double  类型

  • resolution: Bigint 类型

返回结果:

{   time: [ Timestamp, ... ],  value: [ Double, ... ],  resolution: Int Unsigned,}
复制代码

示例

使用 asap_smooth 函数,对空气质量数据按月进行聚合,并在压力 (pressure) 数据上应用 ASAP (As Soon As Possible) 平滑,以得到平滑的时间序列数据。

SELECT asap_smooth(time, pressure, 10) FROM air GROUP BY date_trunc('month', time);
复制代码

返回结果:

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| asap_smooth(air.time,air.pressure,Int64(10))                                                                                                                                                                                                                                                                                                                                                                                                   |+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| {time: [2023-01-14T16:00:00, 2023-01-16T14:13:00, 2023-01-18T12:26:00, 2023-01-20T10:39:00, 2023-01-22T08:52:00, 2023-01-24T07:05:00, 2023-01-26T05:18:00, 2023-01-28T03:31:00, 2023-01-30T01:44:00, 2023-01-31T23:57:00], value: [64.79507211538461, 65.31009615384616, 65.25841346153847, 64.8485576923077, 65.09495192307692, 65.02524038461539, 64.8389423076923, 65.2421875, 65.02103365384616, 65.1141826923077], resolution: 10}        || {time: [2023-02-01T00:00:00, 2023-02-04T02:39:40, 2023-02-07T05:19:20, 2023-02-10T07:59:00, 2023-02-13T10:38:40, 2023-02-16T13:18:20, 2023-02-19T15:58:00, 2023-02-22T18:37:40, 2023-02-25T21:17:20, 2023-02-28T23:57:00], value: [65.20982142857143, 64.90625, 64.94828869047619, 64.97916666666667, 64.88504464285714, 64.8203125, 64.64434523809524, 64.88802083333333, 65.0, 64.76004464285714], resolution: 10}                           || {time: [2023-03-01T00:00:00, 2023-03-02T12:26:40, 2023-03-04T00:53:20, 2023-03-05T13:20:00, 2023-03-07T01:46:40, 2023-03-08T14:13:20, 2023-03-10T02:40:00, 2023-03-11T15:06:40, 2023-03-13T03:33:20, 2023-03-14T16:00:00], value: [65.29115853658537, 64.58307926829268, 64.7530487804878, 64.76753048780488, 65.14405487804878, 65.4298780487805, 65.1920731707317, 65.10365853658537, 64.86356707317073, 64.83841463414635], resolution: 10} |+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
复制代码

注意事项

  • 此函数对时间顺序敏感,在使用时需要添加 `order by time`。

  • 返回值中的 time 和 value 为数组类型。

  • 此函数在执行时会一次性处理所有数据,所以尽可能避免读入大量数据导致过多的内存开销。

 

CnosDB,这可不是普通的数据库,是你数据冒险的好伙伴,助你在数据的海洋中航行无忧!

 

发布于: 刚刚阅读数: 4
用户头像

CnosDB

关注

还未添加个人签名 2022-04-18 加入

打造高性能、高压缩比、高可用的分布式云原生时间序列数据库,引领世界迈向万物智联 欢迎关注 https://www.cnosdb.com

评论

发布
暂无评论
CnosDB:深入了解时序数据处理函数_开源_CnosDB_InfoQ写作社区