PostgreSQL 插件之 TimescaleDB
TimescaleDB
TimescaleDB 是一个开源的 PostgreSQL 的时序数据库扩展。基于 PostgreSQL 的强大能力,TimescaleDB 能够提供跨时间和空间的自动分区以及完整的 SQL 支持。
TimescaleDB 既像传统的关系型数据库一样支持全量 SQL,又像 NoSQL 型数据库一样可扩展。它具有如下的优势:
简单易用
PostgreSQL 原生支持的所有的 SQL 接口(包括二级索引,基于非时间的聚合,子查询,JOIN,窗口函数)
无需更改任何连接和使用 PostgreSQL 的客户端和工具
面向时间的功能、API 函数以及优化
强大的数据保存策略的支持
扩展性强
透明的时间/空间分区功能,可进行向上和向外扩展
高数据写入率(包括批处理提交,内存索引,事务支持,对数据回填的支持)
单个节点上大小合适的块(二维数据分区)可确保即使在大数据量下也能快速提取
跨块和服务器的并行操作
可靠性高
基于可靠的 PostgreSQL
受益于 PostgreSQL 20 多年研究(包括流复制,备份)的基础
灵活的管理选项(与现有的 PostgreSQL 生态系统和工具兼容)
更多关于 TimescaleDB 的信息,请参考官方文档
系统要求
PostgreSQL 9.6+
使用说明
创建插件
CREATE EXTENSION IF NOT EXISTS timescaledb;
删除插件
DROP EXTENSION IF EXISTS timescaledb;
使用示例 1
1、创建普通测试表
postgres=# CREATE TABLE conditions (
postgres(# time TIMESTAMPTZ NOT NULL,
postgres(# location TEXT NOT NULL,
postgres(# temperature DOUBLE PRECISION NULL,
postgres(# humidity DOUBLE PRECISION NULL
postgres(# );
CREATE TABLE
2、基于 time 分区将上一步创建的普通表转换为超表
postgres=# SELECT create_hypertable('conditions', 'time');
create_hypertable
-------------------------
(1,public,conditions,t)
(1 row)
postgres=#
关于超表概念以及如何创建超表,请参考超表和创建超表
3、插入数据并查询
postgres=# INSERT INTO conditions(time, location, temperature, humidity)
SELECT now(), to_char(i, 'FM0000'), random()*i, random()*i FROM generate_series(1,10000) i;
INSERT 0 10000
postgres=# SELECT * FROM conditions ORDER BY time DESC LIMIT 10;
time | location | temperature | humidity
-------------------------------+----------+--------------------+--------------------
2021-02-04 14:25:05.861412+08 | 0001 | 0.8499118449648186 | 0.7081022857469108
2021-02-04 14:25:05.861412+08 | 0002 | 1.032869253930933 | 0.6919497591141521
2021-02-04 14:25:05.861412+08 | 0003 | 2.181961480555376 | 2.798874636756544
2021-02-04 14:25:05.861412+08 | 0004 | 2.0593786791351505 | 3.1111923049135726
2021-02-04 14:25:05.861412+08 | 0005 | 4.631874193357941 | 1.152362876862405
2021-02-04 14:25:05.861412+08 | 0006 | 3.1238939110304145 | 3.5332473749587052
2021-02-04 14:25:05.861412+08 | 0007 | 3.0055947001481975 | 2.681226072554999
2021-02-04 14:25:05.861412+08 | 0008 | 5.97665218304374 | 4.776195580392255
2021-02-04 14:25:05.861412+08 | 0009 | 0.4395944889294263 | 3.79820483119601
2021-02-04 14:25:05.861412+08 | 0010 | 5.213994089740517 | 6.613006901233973
(10 rows)
---针对过去3小时的数据,每15分钟采集度量一次,按照时间和温度降序排序
postgres=# SELECT time_bucket('15 minutes', time) AS fifteen_min,
postgres-# location, COUNT(*),
postgres-# MAX(temperature) AS max_temp,
postgres-# MAX(humidity) AS max_hum
postgres-# FROM conditions
postgres-# WHERE time > NOW() - interval '3 hours'
postgres-# GROUP BY fifteen_min, location
postgres-# ORDER BY fifteen_min DESC, max_temp DESC;
fifteen_min | location | count | max_temp | max_hum
------------------------+----------+-------+----------------------+------------------------
2021-02-04 14:15:00+08 | 9974 | 1 | 9959.154515189259 | 8393.882609488966
2021-02-04 14:15:00+08 | 9963 | 1 | 9893.781627907236 | 7968.817036613598
2021-02-04 14:15:00+08 | 9852 | 1 | 9831.658490260146 | 4594.882403274832
2021-02-04 14:15:00+08 | 9962 | 1 | 9668.102999374136 | 9060.332750304518
2021-02-04 14:15:00+08 | 9807 | 1 | 9632.48883539376 | 6923.888752408443
2021-02-04 14:15:00+08 | 9627 | 1 | 9613.157259692176 | 9241.248136652985
2021-02-04 14:15:00+08 | 9800 | 1 | 9590.790872497493 | 2464.9034796248
2021-02-04 14:15:00+08 | 9600 | 1 | 9537.116866999975 | 3404.4305571978384
2021-02-04 14:15:00+08 | 9954 | 1 | 9523.032966159419 | 1555.249976678025
2021-02-04 14:15:00+08 | 9908 | 1 | 9499.432451454542 | 1110.053567264063
2021-02-04 14:15:00+08 | 9806 | 1 | 9481.471221781818 | 8187.133268036299
2021-02-04 14:15:00+08 | 9552 | 1 | 9378.450791195617 | 5158.115086741863
2021-02-04 14:15:00+08 | 9451 | 1 | 9371.378793857662 | 2626.3288166080547
2021-02-04 14:15:00+08 | 9654 | 1 | 9340.183911422057 | 8080.395366505775
2021-02-04 14:15:00+08 | 9419 | 1 | 9337.334558829742 | 1577.535230574912
2021-02-04 14:15:00+08 | 9929 | 1 | 9335.892011237425 | 7598.584466488536
2021-02-04 14:15:00+08 | 9803 | 1 | 9333.235219857552 | 6644.544463200405
2021-02-04 14:15:00+08 | 9899 | 1 | 9327.24793091667 | 1563.9061617953776
2021-02-04 14:15:00+08 | 9750 | 1 | 9327.22125169839 | 3425.3518007559906
2021-02-04 14:15:00+08 | 9914 | 1 | 9326.780445982797 | 7394.6374437081995
2021-02-04 14:15:00+08 | 9872 | 1 | 9302.263030958715 | 1101.1974349270733
2021-02-04 14:15:00+08 | 9581 | 1 | 9277.185015285479 | 6663.525557202216
2021-02-04 14:15:00+08 | 9708 | 1 | 9273.063941424782 | 3175.170956246176
2021-02-04 14:15:00+08 | 9848 | 1 | 9260.316420825136 | 3523.155521805003
2021-02-04 14:15:00+08 | 9632 | 1 | 9259.105259260195 | 227.1827826256217
2021-02-04 14:15:00+08 | 9466 | 1 | 9254.85035699184 | 2616.3724030114145
2021-02-04 14:15:00+08 | 9860 | 1 | 9247.49280995008 | 9379.583724363356
2021-02-04 14:15:00+08 | 9235 | 1 | 9226.944980400396 | 1061.4152797323275
......
关于 time_bucket()等 TimescaleDB 提供的 API 函数,请参考 API Functions
使用示例 2
该示例演示了如何分析纽约市出租车的运营情况,关于示例数据及更多信息介绍,请参考官方 Tutorial
1、创建示例数据库及环境
postgres=# CREATE DATABASE nyc_data;
CREATE DATABASE
postgres=# \c nyc_data
You are now connected to database "nyc_data" as user "pg12".
nyc_data=# CREATE EXTENSION IF NOT EXISTS timescaledb;
2、创建普通测试表、超表及索引
nyc_data=# CREATE TABLE "rides"(
nyc_data(# vendor_id TEXT,
nyc_data(# pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
nyc_data(# dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
nyc_data(# passenger_count NUMERIC,
nyc_data(# trip_distance NUMERIC,
nyc_data(# pickup_longitude NUMERIC,
nyc_data(# pickup_latitude NUMERIC,
nyc_data(# rate_code INTEGER,
nyc_data(# dropoff_longitude NUMERIC,
nyc_data(# dropoff_latitude NUMERIC,
nyc_data(# payment_type INTEGER,
nyc_data(# fare_amount NUMERIC,
nyc_data(# extra NUMERIC,
nyc_data(# mta_tax NUMERIC,
nyc_data(# tip_amount NUMERIC,
nyc_data(# tolls_amount NUMERIC,
nyc_data(# improvement_surcharge NUMERIC,
nyc_data(# total_amount NUMERIC
nyc_data(# );
CREATE TABLE
nyc_data=# SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE);
create_hypertable
--------------------
(1,public,rides,t)
(1 row)
nyc_data=# CREATE INDEX ON rides (vendor_id, pickup_datetime DESC);
CREATE INDEX
nyc_data=# CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
CREATE INDEX
nyc_data=# CREATE INDEX ON rides (passenger_count, pickup_datetime DESC);
CREATE INDEX
3、创建辅助表并插入数据
nyc_data=# CREATE TABLE IF NOT EXISTS "payment_types"(
nyc_data(# payment_type INTEGER,
nyc_data(# description TEXT
nyc_data(# );
CREATE TABLE
nyc_data=# INSERT INTO payment_types(payment_type, description) VALUES
nyc_data-# (1, 'credit card'),
nyc_data-# (2, 'cash'),
nyc_data-# (3, 'no charge'),
nyc_data-# (4, 'dispute'),
nyc_data-# (5, 'unknown'),
nyc_data-# (6, 'voided trip');
INSERT 0 6
nyc_data=# CREATE TABLE IF NOT EXISTS "rates"(
nyc_data(# rate_code INTEGER,
nyc_data(# description TEXT
nyc_data(# );
CREATE TABLE
nyc_data=# INSERT INTO rates(rate_code, description) VALUES
nyc_data-# (1, 'standard rate'),
nyc_data-# (2, 'JFK'),
nyc_data-# (3, 'Newark'),
nyc_data-# (4, 'Nassau or Westchester'),
nyc_data-# (5, 'negotiated fare'),
nyc_data-# (6, 'group ride');
INSERT 0 6
4、向测试表导入测试数据
nyc_data=# \COPY rides FROM nyc_data_rides.csv CSV;
COPY 10906860
5、分析处理数据
--- 2016年1月每天都多少次打车行为
nyc_data=# SELECT date_trunc('day', pickup_datetime) as day, COUNT(*) FROM rides GROUP BY day ORDER BY day;
day | count
---------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
2016-01-06 00:00:00 | 348516
2016-01-07 00:00:00 | 364894
2016-01-08 00:00:00 | 392070
2016-01-09 00:00:00 | 405825
2016-01-10 00:00:00 | 351788
2016-01-11 00:00:00 | 342651
2016-01-12 00:00:00 | 367390
2016-01-13 00:00:00 | 395090
2016-01-14 00:00:00 | 396473
2016-01-15 00:00:00 | 401289
2016-01-16 00:00:00 | 411899
2016-01-17 00:00:00 | 379156
2016-01-18 00:00:00 | 341481
2016-01-19 00:00:00 | 385187
2016-01-20 00:00:00 | 382105
2016-01-21 00:00:00 | 399654
2016-01-22 00:00:00 | 420162
2016-01-23 00:00:00 | 78133
2016-01-24 00:00:00 | 159766
2016-01-25 00:00:00 | 282087
2016-01-26 00:00:00 | 327655
2016-01-27 00:00:00 | 359180
2016-01-28 00:00:00 | 383326
2016-01-29 00:00:00 | 414039
2016-01-30 00:00:00 | 435369
2016-01-31 00:00:00 | 361505
2017-11-17 00:00:00 | 2
(32 rows)
---前7天平均每个顾客的打车费是多少
nyc_data=# SELECT date_trunc('day', pickup_datetime) AS day, avg(fare_amount)
nyc_data-# FROM rides WHERE passenger_count = 1 AND pickup_datetime < '2016-01-08'
nyc_data-# GROUP BY day ORDER BY day;
day | avg
---------------------+---------------------
2016-01-01 00:00:00 | 12.5464748850129787
2016-01-02 00:00:00 | 12.1129878886746750
2016-01-03 00:00:00 | 12.8262352076841150
2016-01-04 00:00:00 | 11.9116533573721472
2016-01-05 00:00:00 | 11.7534235580737452
2016-01-06 00:00:00 | 11.7824805635293235
2016-01-07 00:00:00 | 11.9498961299166930
(7 rows)
---一个月内每种费率类型的次数是多少
nyc_data=# SELECT rate_code, COUNT(vendor_id) AS num_trips
nyc_data-# FROM rides WHERE pickup_datetime < '2016-02-01'
nyc_data-# GROUP BY rate_code ORDER BY rate_code;
rate_code | num_trips
-----------+-----------
1 | 10626315
2 | 225019
3 | 16822
4 | 4696
5 | 33688
6 | 102
99 | 216
(7 rows)
---2016年第一天每5分钟有多少次乘车?
nyc_data=# SELECT time_bucket('5 minute', pickup_datetime) AS five_min, count(*)
nyc_data-# FROM rides WHERE pickup_datetime < '2016-01-02 00:00'
nyc_data-# GROUP BY five_min ORDER BY five_min;
five_min | count
---------------------+-------
2016-01-01 00:00:00 | 703
2016-01-01 00:05:00 | 1482
2016-01-01 00:10:00 | 1959
2016-01-01 00:15:00 | 2200
2016-01-01 00:20:00 | 2285
2016-01-01 00:25:00 | 2291
2016-01-01 00:30:00 | 2349
2016-01-01 00:35:00 | 2328
2016-01-01 00:40:00 | 2440
2016-01-01 00:45:00 | 2372
2016-01-01 00:50:00 | 2388
2016-01-01 00:55:00 | 2473
2016-01-01 01:00:00 | 2395
2016-01-01 01:05:00 | 2510
2016-01-01 01:10:00 | 2412
2016-01-01 01:15:00 | 2482
2016-01-01 01:20:00 | 2428
2016-01-01 01:25:00 | 2433
2016-01-01 01:30:00 | 2337
2016-01-01 01:35:00 | 2366
2016-01-01 01:40:00 | 2325
2016-01-01 01:45:00 | 2257
2016-01-01 01:50:00 | 2316
2016-01-01 01:55:00 | 2250
2016-01-01 02:00:00 | 2303
2016-01-01 02:05:00 | 2259
2016-01-01 02:10:00 | 2253
2016-01-01 02:15:00 | 2176
2016-01-01 02:20:00 | 2138
......
版权声明: 本文为 InfoQ 作者【华为云数据库小助手】的原创文章。
原文链接:【http://xie.infoq.cn/article/479b6123288c260bd7382c8f4】。文章转载请联系作者。
华为云数据库小助手
提供华为云数据库最新鲜热门技术干货 2021.06.24 加入
汇聚华为云数据库社区热门技术干货,跟踪最新数据库新闻动态,提供问题交流平台,互动共同成长! 传送门:https://developer.huaweicloud.com/techfield/db.html
评论