写点什么

PostgreSQL 插件之 TimescaleDB

发布于: 4 小时前

TimescaleDB

TimescaleDB 是一个开源的 PostgreSQL 的时序数据库扩展。基于 PostgreSQL 的强大能力,TimescaleDB 能够提供跨时间和空间的自动分区以及完整的 SQL 支持。

TimescaleDB 既像传统的关系型数据库一样支持全量 SQL,又像 NoSQL 型数据库一样可扩展。它具有如下的优势:

简单易用

  • PostgreSQL 原生支持的所有的 SQL 接口(包括二级索引,基于非时间的聚合,子查询,JOIN,窗口函数)

  • 无需更改任何连接和使用 PostgreSQL 的客户端和工具

  • 面向时间的功能、API 函数以及优化

  • 强大的数据保存策略的支持

扩展性强

  • 透明的时间/空间分区功能,可进行向上和向外扩展

  • 高数据写入率(包括批处理提交,内存索引,事务支持,对数据回填的支持)

  • 单个节点上大小合适的块(二维数据分区)可确保即使在大数据量下也能快速提取

  • 跨块和服务器的并行操作

可靠性高

  • 基于可靠的 PostgreSQL

  • 受益于 PostgreSQL 20 多年研究(包括流复制,备份)的基础

  • 灵活的管理选项(与现有的 PostgreSQL 生态系统和工具兼容)

更多关于 TimescaleDB 的信息,请参考官方文档

系统要求

  • PostgreSQL 9.6+

使用说明

创建插件

CREATE EXTENSION IF NOT EXISTS timescaledb;
复制代码

删除插件

DROP EXTENSION IF EXISTS timescaledb;
复制代码

使用示例 1

1、创建普通测试表

postgres=# CREATE TABLE conditions (postgres(#   time        TIMESTAMPTZ       NOT NULL,postgres(#   location    TEXT              NOT NULL,postgres(#   temperature DOUBLE PRECISION  NULL,postgres(#   humidity    DOUBLE PRECISION  NULLpostgres(# );CREATE TABLE
复制代码

2、基于 time 分区将上一步创建的普通表转换为超表

postgres=# SELECT create_hypertable('conditions', 'time');    create_hypertable    ------------------------- (1,public,conditions,t)(1 row)
postgres=#
复制代码
  • 关于超表概念以及如何创建超表,请参考超表和创建超表

3、插入数据并查询

postgres=# INSERT INTO conditions(time, location, temperature, humidity)SELECT now(), to_char(i, 'FM0000'), random()*i, random()*i FROM generate_series(1,10000) i;INSERT 0 10000postgres=# SELECT * FROM conditions ORDER BY time DESC LIMIT 10;             time              | location |    temperature     |      humidity      -------------------------------+----------+--------------------+-------------------- 2021-02-04 14:25:05.861412+08 | 0001     | 0.8499118449648186 | 0.7081022857469108 2021-02-04 14:25:05.861412+08 | 0002     |  1.032869253930933 | 0.6919497591141521 2021-02-04 14:25:05.861412+08 | 0003     |  2.181961480555376 |  2.798874636756544 2021-02-04 14:25:05.861412+08 | 0004     | 2.0593786791351505 | 3.1111923049135726 2021-02-04 14:25:05.861412+08 | 0005     |  4.631874193357941 |  1.152362876862405 2021-02-04 14:25:05.861412+08 | 0006     | 3.1238939110304145 | 3.5332473749587052 2021-02-04 14:25:05.861412+08 | 0007     | 3.0055947001481975 |  2.681226072554999 2021-02-04 14:25:05.861412+08 | 0008     |   5.97665218304374 |  4.776195580392255 2021-02-04 14:25:05.861412+08 | 0009     | 0.4395944889294263 |   3.79820483119601 2021-02-04 14:25:05.861412+08 | 0010     |  5.213994089740517 |  6.613006901233973(10 rows)
---针对过去3小时的数据,每15分钟采集度量一次,按照时间和温度降序排序postgres=# SELECT time_bucket('15 minutes', time) AS fifteen_min,postgres-# location, COUNT(*),postgres-# MAX(temperature) AS max_temp,postgres-# MAX(humidity) AS max_humpostgres-# FROM conditionspostgres-# WHERE time > NOW() - interval '3 hours'postgres-# GROUP BY fifteen_min, locationpostgres-# ORDER BY fifteen_min DESC, max_temp DESC; fifteen_min | location | count | max_temp | max_hum ------------------------+----------+-------+----------------------+------------------------ 2021-02-04 14:15:00+08 | 9974 | 1 | 9959.154515189259 | 8393.882609488966 2021-02-04 14:15:00+08 | 9963 | 1 | 9893.781627907236 | 7968.817036613598 2021-02-04 14:15:00+08 | 9852 | 1 | 9831.658490260146 | 4594.882403274832 2021-02-04 14:15:00+08 | 9962 | 1 | 9668.102999374136 | 9060.332750304518 2021-02-04 14:15:00+08 | 9807 | 1 | 9632.48883539376 | 6923.888752408443 2021-02-04 14:15:00+08 | 9627 | 1 | 9613.157259692176 | 9241.248136652985 2021-02-04 14:15:00+08 | 9800 | 1 | 9590.790872497493 | 2464.9034796248 2021-02-04 14:15:00+08 | 9600 | 1 | 9537.116866999975 | 3404.4305571978384 2021-02-04 14:15:00+08 | 9954 | 1 | 9523.032966159419 | 1555.249976678025 2021-02-04 14:15:00+08 | 9908 | 1 | 9499.432451454542 | 1110.053567264063 2021-02-04 14:15:00+08 | 9806 | 1 | 9481.471221781818 | 8187.133268036299 2021-02-04 14:15:00+08 | 9552 | 1 | 9378.450791195617 | 5158.115086741863 2021-02-04 14:15:00+08 | 9451 | 1 | 9371.378793857662 | 2626.3288166080547 2021-02-04 14:15:00+08 | 9654 | 1 | 9340.183911422057 | 8080.395366505775 2021-02-04 14:15:00+08 | 9419 | 1 | 9337.334558829742 | 1577.535230574912 2021-02-04 14:15:00+08 | 9929 | 1 | 9335.892011237425 | 7598.584466488536 2021-02-04 14:15:00+08 | 9803 | 1 | 9333.235219857552 | 6644.544463200405 2021-02-04 14:15:00+08 | 9899 | 1 | 9327.24793091667 | 1563.9061617953776 2021-02-04 14:15:00+08 | 9750 | 1 | 9327.22125169839 | 3425.3518007559906 2021-02-04 14:15:00+08 | 9914 | 1 | 9326.780445982797 | 7394.6374437081995 2021-02-04 14:15:00+08 | 9872 | 1 | 9302.263030958715 | 1101.1974349270733 2021-02-04 14:15:00+08 | 9581 | 1 | 9277.185015285479 | 6663.525557202216 2021-02-04 14:15:00+08 | 9708 | 1 | 9273.063941424782 | 3175.170956246176 2021-02-04 14:15:00+08 | 9848 | 1 | 9260.316420825136 | 3523.155521805003 2021-02-04 14:15:00+08 | 9632 | 1 | 9259.105259260195 | 227.1827826256217 2021-02-04 14:15:00+08 | 9466 | 1 | 9254.85035699184 | 2616.3724030114145 2021-02-04 14:15:00+08 | 9860 | 1 | 9247.49280995008 | 9379.583724363356 2021-02-04 14:15:00+08 | 9235 | 1 | 9226.944980400396 | 1061.4152797323275 ......
复制代码


  • 关于 time_bucket()等 TimescaleDB 提供的 API 函数,请参考 API Functions


使用示例 2

  • 该示例演示了如何分析纽约市出租车的运营情况,关于示例数据及更多信息介绍,请参考官方 Tutorial

1、创建示例数据库及环境

postgres=# CREATE DATABASE nyc_data;CREATE DATABASEpostgres=# \c nyc_dataYou are now connected to database "nyc_data" as user "pg12".nyc_data=# CREATE EXTENSION IF NOT EXISTS timescaledb;
复制代码

2、创建普通测试表、超表及索引

nyc_data=# CREATE TABLE "rides"(nyc_data(#     vendor_id TEXT,nyc_data(#     pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,nyc_data(#     dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,nyc_data(#     passenger_count NUMERIC,nyc_data(#     trip_distance NUMERIC,nyc_data(#     pickup_longitude  NUMERIC,nyc_data(#     pickup_latitude   NUMERIC,nyc_data(#     rate_code         INTEGER,nyc_data(#     dropoff_longitude NUMERIC,nyc_data(#     dropoff_latitude  NUMERIC,nyc_data(#     payment_type INTEGER,nyc_data(#     fare_amount NUMERIC,nyc_data(#     extra NUMERIC,nyc_data(#     mta_tax NUMERIC,nyc_data(#     tip_amount NUMERIC,nyc_data(#     tolls_amount NUMERIC,nyc_data(#     improvement_surcharge NUMERIC,nyc_data(#     total_amount NUMERICnyc_data(# );CREATE TABLEnyc_data=# SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE); create_hypertable  -------------------- (1,public,rides,t)(1 row)
nyc_data=# CREATE INDEX ON rides (vendor_id, pickup_datetime DESC);CREATE INDEXnyc_data=# CREATE INDEX ON rides (rate_code, pickup_datetime DESC);CREATE INDEXnyc_data=# CREATE INDEX ON rides (passenger_count, pickup_datetime DESC);CREATE INDEX
复制代码

3、创建辅助表并插入数据

nyc_data=# CREATE TABLE IF NOT EXISTS "payment_types"(nyc_data(#     payment_type INTEGER,nyc_data(#     description TEXTnyc_data(# );CREATE TABLEnyc_data=# INSERT INTO payment_types(payment_type, description) VALUESnyc_data-# (1, 'credit card'),nyc_data-# (2, 'cash'),nyc_data-# (3, 'no charge'),nyc_data-# (4, 'dispute'),nyc_data-# (5, 'unknown'),nyc_data-# (6, 'voided trip');INSERT 0 6nyc_data=# CREATE TABLE IF NOT EXISTS "rates"(nyc_data(#     rate_code   INTEGER,nyc_data(#     description TEXTnyc_data(# );CREATE TABLEnyc_data=# INSERT INTO rates(rate_code, description) VALUESnyc_data-# (1, 'standard rate'),nyc_data-# (2, 'JFK'),nyc_data-# (3, 'Newark'),nyc_data-# (4, 'Nassau or Westchester'),nyc_data-# (5, 'negotiated fare'),nyc_data-# (6, 'group ride');INSERT 0 6
复制代码

4、向测试表导入测试数据

nyc_data=# \COPY rides FROM nyc_data_rides.csv CSV;COPY 10906860
复制代码

5、分析处理数据

--- 2016年1月每天都多少次打车行为nyc_data=# SELECT date_trunc('day', pickup_datetime) as day, COUNT(*) FROM rides GROUP BY day ORDER BY day;         day         | count  ---------------------+-------- 2016-01-01 00:00:00 | 345037 2016-01-02 00:00:00 | 312831 2016-01-03 00:00:00 | 302878 2016-01-04 00:00:00 | 316171 2016-01-05 00:00:00 | 343251 2016-01-06 00:00:00 | 348516 2016-01-07 00:00:00 | 364894 2016-01-08 00:00:00 | 392070 2016-01-09 00:00:00 | 405825 2016-01-10 00:00:00 | 351788 2016-01-11 00:00:00 | 342651 2016-01-12 00:00:00 | 367390 2016-01-13 00:00:00 | 395090 2016-01-14 00:00:00 | 396473 2016-01-15 00:00:00 | 401289 2016-01-16 00:00:00 | 411899 2016-01-17 00:00:00 | 379156 2016-01-18 00:00:00 | 341481 2016-01-19 00:00:00 | 385187 2016-01-20 00:00:00 | 382105 2016-01-21 00:00:00 | 399654 2016-01-22 00:00:00 | 420162 2016-01-23 00:00:00 |  78133 2016-01-24 00:00:00 | 159766 2016-01-25 00:00:00 | 282087 2016-01-26 00:00:00 | 327655 2016-01-27 00:00:00 | 359180 2016-01-28 00:00:00 | 383326 2016-01-29 00:00:00 | 414039 2016-01-30 00:00:00 | 435369 2016-01-31 00:00:00 | 361505 2017-11-17 00:00:00 |      2(32 rows)
---前7天平均每个顾客的打车费是多少nyc_data=# SELECT date_trunc('day', pickup_datetime) AS day, avg(fare_amount)nyc_data-# FROM rides WHERE passenger_count = 1 AND pickup_datetime < '2016-01-08'nyc_data-# GROUP BY day ORDER BY day; day | avg ---------------------+--------------------- 2016-01-01 00:00:00 | 12.5464748850129787 2016-01-02 00:00:00 | 12.1129878886746750 2016-01-03 00:00:00 | 12.8262352076841150 2016-01-04 00:00:00 | 11.9116533573721472 2016-01-05 00:00:00 | 11.7534235580737452 2016-01-06 00:00:00 | 11.7824805635293235 2016-01-07 00:00:00 | 11.9498961299166930(7 rows)
---一个月内每种费率类型的次数是多少nyc_data=# SELECT rate_code, COUNT(vendor_id) AS num_trips nyc_data-# FROM rides WHERE pickup_datetime < '2016-02-01'nyc_data-# GROUP BY rate_code ORDER BY rate_code; rate_code | num_trips -----------+----------- 1 | 10626315 2 | 225019 3 | 16822 4 | 4696 5 | 33688 6 | 102 99 | 216(7 rows)
---2016年第一天每5分钟有多少次乘车?nyc_data=# SELECT time_bucket('5 minute', pickup_datetime) AS five_min, count(*)nyc_data-# FROM rides WHERE pickup_datetime < '2016-01-02 00:00'nyc_data-# GROUP BY five_min ORDER BY five_min; five_min | count ---------------------+------- 2016-01-01 00:00:00 | 703 2016-01-01 00:05:00 | 1482 2016-01-01 00:10:00 | 1959 2016-01-01 00:15:00 | 2200 2016-01-01 00:20:00 | 2285 2016-01-01 00:25:00 | 2291 2016-01-01 00:30:00 | 2349 2016-01-01 00:35:00 | 2328 2016-01-01 00:40:00 | 2440 2016-01-01 00:45:00 | 2372 2016-01-01 00:50:00 | 2388 2016-01-01 00:55:00 | 2473 2016-01-01 01:00:00 | 2395 2016-01-01 01:05:00 | 2510 2016-01-01 01:10:00 | 2412 2016-01-01 01:15:00 | 2482 2016-01-01 01:20:00 | 2428 2016-01-01 01:25:00 | 2433 2016-01-01 01:30:00 | 2337 2016-01-01 01:35:00 | 2366 2016-01-01 01:40:00 | 2325 2016-01-01 01:45:00 | 2257 2016-01-01 01:50:00 | 2316 2016-01-01 01:55:00 | 2250 2016-01-01 02:00:00 | 2303 2016-01-01 02:05:00 | 2259 2016-01-01 02:10:00 | 2253 2016-01-01 02:15:00 | 2176 2016-01-01 02:20:00 | 2138 ......
复制代码


发布于: 4 小时前阅读数: 4
用户头像

提供华为云数据库最新鲜热门技术干货 2021.06.24 加入

汇聚华为云数据库社区热门技术干货,跟踪最新数据库新闻动态,提供问题交流平台,互动共同成长! 传送门:https://developer.huaweicloud.com/techfield/db.html

评论

发布
暂无评论
PostgreSQL插件之TimescaleDB