写点什么

从零开始学 Flink:数据转换的艺术

作者:郝培强
  • 2025-11-30
    广东
  • 本文字数:3728 字

    阅读完需:约 12 分钟

从零开始学Flink:数据转换的艺术

在实时数据处理流程中,数据转换(Transformation)是连接数据源与输出结果的桥梁,也是体现计算逻辑的核心环节。Flink 提供了丰富的数据转换操作,让开发者能够灵活地对数据流进行各种处理和分析。本文将以 Flink DataStream API 为核心,带你探索 Flink 数据转换的精妙世界,并结合之前文章中的 Kafka Source 实现一个完整的数据处理流程。


一、数据转换概览数据转换是指将原始输入数据通过一系列操作转换为所需输出结果的过程。在 Flink 中,这些操作主要分为以下几类:


基本转换:如映射(Map)、过滤(Filter)、扁平映射(FlatMap)等键控转换:如分组(KeyBy)、聚合(Reduce、Aggregate)等多流转换:如联合(Union)、连接(Join)、拆分(Split)等状态转换:如键控状态(Keyed State)、算子状态(Operator State)等这些转换操作就像数据的"加工厂",让原始数据经过一系列"工序"后,变成有价值的信息产品。


二、环境准备与依赖配置为了演示数据转换,我们将继续使用之前文章中的 Kafka Source 环境。如果您已经完成了《从零开始学 Flink:数据源》中的环境搭建,可以直接使用现有配置;如果还没有,请先参考该文章完成环境准备。


  1. 版本说明 Flink:1.20.1Kafka:3.4.0JDK:17+gradle 8.3+

  2. 核心依赖除了基础的 Flink 和 Kafka 依赖外,我们在本文中将引入一些额外的依赖来支持更丰富的数据处理场景:


dependencies {// Flink 核心依赖 implementation 'org.apache.flink:flink-java:1.20.1'implementation 'org.apache.flink:flink-streaming-java_2.12:1.20.1'


// Flink Kafka Connectorimplementation 'org.apache.flink:flink-connector-kafka_2.12:1.20.1'
// 日志依赖implementation 'org.apache.logging.log4j:log4j-api:2.17.1'implementation 'org.apache.logging.log4j:log4j-core:2.17.1'implementation 'org.apache.logging.log4j:log4j-slf4j-impl:2.17.1'
// JSON处理库(用于处理JSON格式数据)implementation 'com.fasterxml.jackson.core:jackson-databind:2.14.2'implementation 'com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.14.2'
复制代码


}三、基本转换操作基本转换是 Flink 中最常用、最简单的数据转换操作,它们对数据流中的每个元素进行独立处理,不涉及状态管理。


  1. 映射(Map)Map 操作将输入流中的每个元素转换为另一个元素。例如,将字符串转换为大写:


// 从 Kafka 读取字符串数据 DataStream<String> kafkaStream = env.fromSource(kafkaSource,WatermarkStrategy.noWatermarks(),"Kafka Source");


// 使用 Map 将字符串转换为大写 DataStream<String> upperCaseStream = kafkaStream.map(s -> s.toUpperCase());


upperCaseStream.print("UppercaseData");2. 过滤(Filter)Filter 操作根据条件过滤掉不需要的元素,只保留满足条件的元素:


// 过滤出包含"flink"关键词的消息 DataStream<String> filteredStream = kafkaStream.filter(s -> s.toLowerCase().contains("flink"));


filteredStream.print("FilteredData");3. 扁平映射(FlatMap)FlatMap 操作类似于 Map,但它可以将一个元素转换为零个、一个或多个元素,常用于数据拆分场景:


// 将每行文本拆分为单词 DataStream<String> wordStream = kafkaStream.flatMap((String value, Collector<String> out) -> {// 按空格拆分字符串 String[] words = value.split(" ");// 将每个单词发送到输出流 for (String word : words) {out.collect(word);}});

https://www.douban.com/review/17254120/

https://www.douban.com/location/drama/review/17254120/

https://movie.douban.com/review/17254120/

https://music.douban.com/review/17254120/

https://book.douban.com/review/17254120/

https://read.douban.com/review/17254120/

https://www.douban.com/review/17254120

https://www.douban.com/location/drama/review/17254120

https://movie.douban.com/review/17254120

https://music.douban.com/review/17254120

https://book.douban.com/review/17254120

https://read.douban.com/review/17254120

https://www.douban.com/review/17254118/

https://www.douban.com/location/drama/review/17254118/

https://movie.douban.com/review/17254118/

https://music.douban.com/review/17254118/

https://book.douban.com/review/17254118/

https://read.douban.com/review/17254118/

https://www.douban.com/review/17254118

https://www.douban.com/location/drama/review/17254118

https://movie.douban.com/review/17254118

https://music.douban.com/review/17254118

https://book.douban.com/review/17254118

https://read.douban.com/review/17254118

https://www.douban.com/review/17254113/

https://www.douban.com/location/drama/review/17254113/

https://movie.douban.com/review/17254113/

https://music.douban.com/review/17254113/

https://book.douban.com/review/17254113/

https://read.douban.com/review/17254113/

https://www.douban.com/review/17254113

https://www.douban.com/location/drama/review/17254113

https://movie.douban.com/review/17254113

https://music.douban.com/review/17254113

https://book.douban.com/review/17254113

https://read.douban.com/review/17254113

https://www.douban.com/review/17254110/

https://www.douban.com/location/drama/review/17254110/

https://movie.douban.com/review/17254110/

https://music.douban.com/review/17254110/

https://book.douban.com/review/17254110/

https://read.douban.com/review/17254110/

https://www.douban.com/review/17254110

https://www.douban.com/location/drama/review/17254110

https://movie.douban.com/review/17254110

https://music.douban.com/review/17254110

https://book.douban.com/review/17254110

https://read.douban.com/review/17254110

https://www.douban.com/review/17254104/

https://www.douban.com/location/drama/review/17254104/

https://movie.douban.com/review/17254104/

https://music.douban.com/review/17254104/

https://book.douban.com/review/17254104/

https://read.douban.com/review/17254104/

https://www.douban.com/review/17254104

https://www.douban.com/location/drama/review/17254104

https://movie.douban.com/review/17254104

https://music.douban.com/review/17254104

https://book.douban.com/review/17254104

https://read.douban.com/review/17254104

https://www.douban.com/review/17254101/

https://www.douban.com/location/drama/review/17254101/

https://movie.douban.com/review/17254101/

https://music.douban.com/review/17254101/

https://book.douban.com/review/17254101/

https://read.douban.com/review/17254101/

https://www.douban.com/review/17254101

https://www.douban.com/location/drama/review/17254101

https://movie.douban.com/review/17254101

https://music.douban.com/review/17254101

https://book.douban.com/review/17254101

https://read.douban.com/review/17254101

https://www.douban.com/review/17254068/

https://www.douban.com/location/drama/review/17254068/

https://movie.douban.com/review/17254068/

https://music.douban.com/review/17254068/

https://book.douban.com/review/17254068/

https://read.douban.com/review/17254068/

https://www.douban.com/review/17254068

https://www.douban.com/location/drama/review/17254068

https://movie.douban.com/review/17254068

https://music.douban.com/review/17254068

https://book.douban.com/review/17254068

https://read.douban.com/review/17254068

https://www.douban.com/review/17254064/

https://www.douban.com/location/drama/review/17254064/

https://movie.douban.com/review/17254064/

https://music.douban.com/review/17254064/

https://book.douban.com/review/17254064/

https://read.douban.com/review/17254064/

https://www.douban.com/review/17254064

https://www.douban.com/location/drama/review/17254064

https://movie.douban.com/review/17254064

https://music.douban.com/review/17254064

https://book.douban.com/review/17254064

https://read.douban.com/review/17254064

https://www.douban.com/review/17254059/

https://www.douban.com/location/drama/review/17254059/

https://movie.douban.com/review/17254059/

https://music.douban.com/review/17254059/

https://book.douban.com/review/17254059/

https://read.douban.com/review/17254059/

https://www.douban.com/review/17254059

https://www.douban.com/location/drama/review/17254059

https://movie.douban.com/review/17254059

https://music.douban.com/review/17254059

https://book.douban.com/review/17254059

https://read.douban.com/review/17254059


wordStream.print("WordData");四、键控转换操作键控转换是基于键(Key)对数据进行分组和聚合的操作,是实现复杂业务逻辑的基础。

用户头像

郝培强

关注

还未添加个人签名 2025-11-19 加入

还未添加个人简介

评论

发布
暂无评论
从零开始学Flink:数据转换的艺术_郝培强_InfoQ写作社区