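Flink Source Code Analysis: How Offsets Are Saved
Published: June 7, 2020
Flink manages Kafka offsets in one of two ways:
1. Checkpointing disabled: offset handling relies entirely on Kafka's own API.
2. Checkpointing enabled: when a checkpoint completes, the offsets are committed to Kafka or ZooKeeper.
This article covers only the second case, with checkpointing enabled.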
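How the two cases map onto a commit mode can be sketched in a few lines. This is a simplified stand-in for the connector's mode selection, not the Flink source: the class name `OffsetCommitModeSketch`, the nested `Mode` enum, and the three boolean parameters are assumptions made for illustration. Only the `ON_CHECKPOINTS` constant appears in the code quoted in this article; the `DISABLED` and `KAFKA_PERIODIC` cases are assumptions about how the remaining combinations are handled.

```java
/** Hypothetical, simplified stand-in for the connector's offset commit mode selection. */
final class OffsetCommitModeSketch {

    /** ON_CHECKPOINTS appears in the quoted source; the other two names are assumptions. */
    enum Mode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    /**
     * With checkpointing enabled, offsets are committed when a checkpoint completes
     * (unless that has been switched off). Without checkpointing, committing is left
     * to the Kafka client's own periodic auto commit.
     */
    static Mode fromConfiguration(boolean enableAutoCommit,
                                  boolean enableCommitOnCheckpoint,
                                  boolean enableCheckpointing) {
        if (enableCheckpointing) {
            return enableCommitOnCheckpoint ? Mode.ON_CHECKPOINTS : Mode.DISABLED;
        }
        return enableAutoCommit ? Mode.KAFKA_PERIODIC : Mode.DISABLED;
    }

    public static void main(String[] args) {
        // Case 2 of the list above: checkpointing enabled.
        System.out.println(fromConfiguration(true, true, true));   // prints ON_CHECKPOINTS
        // Case 1 of the list above: checkpointing disabled.
        System.out.println(fromConfiguration(true, true, false));  // prints KAFKA_PERIODIC
    }
}
```

The point of the sketch is that once checkpointing is on, the Kafka client's auto-commit setting no longer decides anything in this model; the checkpoint lifecycle does.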
notifyCheckpointComplete in FlinkKafkaConsumerBase

    @Override
    // Called once a checkpoint has completed.
    public final void notifyCheckpointComplete(long checkpointId) throws Exception {
        if (!running) {
            LOG.debug("notifyCheckpointComplete() called on closed source");
            return;
        }

        final AbstractFetcher<?, ?> fetcher = this.kafkaFetcher;
        if (fetcher == null) {
            LOG.debug("notifyCheckpointComplete() called on uninitialized source");
            return;
        }

        if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS) {
            // only one commit operation must be in progress
            if (LOG.isDebugEnabled()) {
                LOG.debug("Committing offsets to Kafka/ZooKeeper for checkpoint " + checkpointId);
            }

            try {
                final int posInMap = pendingOffsetsToCommit.indexOf(checkpointId);
                if (posInMap == -1) {
                    LOG.warn("Received confirmation for unknown checkpoint id {}", checkpointId);
                    return;
                }

                @SuppressWarnings("unchecked")
                Map<KafkaTopicPartition, Long> offsets =
                    (Map<KafkaTopicPartition, Long>) pendingOffsetsToCommit.remove(posInMap);

                // remove older checkpoints in map
                for (int i = 0; i < posInMap; i++) {
                    pendingOffsetsToCommit.remove(0);
                }

                if (offsets == null || offsets.size() == 0) {
                    LOG.debug("Checkpoint state was empty.");
                    return;
                }

                // Commit the offsets through the kafkaFetcher.
                fetcher.commitInternalOffsetsToKafka(offsets, offsetCommitCallback);
            } catch (Exception e) {
                if (running) {
                    throw e;
                }
                // else ignore exception if we are no longer running
            }
        }
    }
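The bookkeeping around pendingOffsetsToCommit is easier to follow in isolation. The sketch below is a minimal model of it, assuming an insertion-ordered map from checkpoint id to an offsets snapshot. `PendingOffsetsSketch`, `onSnapshot`, `onCheckpointComplete`, and the String partition keys are hypothetical names, and a plain `LinkedHashMap` stands in for whatever indexed map type the connector really uses.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the pending-offsets bookkeeping; not the Flink class. */
final class PendingOffsetsSketch {

    // checkpoint id -> offsets captured for that checkpoint, in checkpoint order
    private final LinkedHashMap<Long, Map<String, Long>> pending = new LinkedHashMap<>();

    /** Remember the offsets that belong to a checkpoint that has been started. */
    void onSnapshot(long checkpointId, Map<String, Long> offsets) {
        pending.put(checkpointId, offsets);
    }

    /**
     * Returns the offsets for the completed checkpoint and drops it together with
     * every older pending checkpoint. Returns null for an unknown checkpoint id,
     * in which case nothing is dropped.
     */
    Map<String, Long> onCheckpointComplete(long checkpointId) {
        if (!pending.containsKey(checkpointId)) {
            return null;
        }
        Iterator<Map.Entry<Long, Map<String, Long>>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Map<String, Long>> entry = it.next();
            long id = entry.getKey();
            Map<String, Long> offsets = entry.getValue();
            it.remove(); // older checkpoints are superseded by the completed one
            if (id == checkpointId) {
                return offsets;
            }
        }
        return null; // not reached: the id was present
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingOffsetsSketch tracker = new PendingOffsetsSketch();
        tracker.onSnapshot(1L, Collections.singletonMap("topic-0", 10L));
        tracker.onSnapshot(2L, Collections.singletonMap("topic-0", 20L));
        tracker.onSnapshot(3L, Collections.singletonMap("topic-0", 30L));

        System.out.println(tracker.onCheckpointComplete(2L)); // prints {topic-0=20}
        System.out.println(tracker.pendingCount());           // prints 1 (only checkpoint 3 remains)
    }
}
```

Dropping the older entries is safe because a completed checkpoint already covers everything the earlier ones would have committed.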
Following the call into the kafkaFetcher, commitInternalOffsetsToKafka hands off to doCommitInternalOffsetsToKafka:

    @Override
    protected void doCommitInternalOffsetsToKafka(
            Map<KafkaTopicPartition, Long> offsets,
            @Nonnull KafkaCommitCallback commitCallback) throws Exception {

        @SuppressWarnings("unchecked")
        List<KafkaTopicPartitionState<TopicPartition>> partitions = subscribedPartitionStates();

        Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>(partitions.size());

        for (KafkaTopicPartitionState<TopicPartition> partition : partitions) {
            Long lastProcessedOffset = offsets.get(partition.getKafkaTopicPartition());
            if (lastProcessedOffset != null) {
                checkState(lastProcessedOffset >= 0, "Illegal offset value to commit");

                // committed offsets through the KafkaConsumer need to be 1 more than the last processed offset.
                // This does not affect Flink's checkpoints/saved state.
                long offsetToCommit = lastProcessedOffset + 1;

                offsetsToCommit.put(partition.getKafkaPartitionHandle(), new OffsetAndMetadata(offsetToCommit));
                partition.setCommittedOffset(offsetToCommit);
            }
        }

        // record the work to be committed by the main consumer thread and make sure the consumer notices that
        consumerThread.setOffsetsToCommit(offsetsToCommit, commitCallback);
    }
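The "+1" in this method is the detail most worth remembering: Flink tracks the offset of the last record it processed, while an offset committed to Kafka names the next record to read. A small worked example, using plain String keys in place of TopicPartition (the class `CommitOffsetSketch` and the method `toCommitOffsets` are hypothetical names for illustration):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the last-processed to committed offset conversion. */
final class CommitOffsetSketch {

    /** Converts "last processed" offsets into the values that would be committed. */
    static Map<String, Long> toCommitOffsets(Map<String, Long> lastProcessed) {
        Map<String, Long> offsetsToCommit = new HashMap<>(lastProcessed.size());
        for (Map.Entry<String, Long> entry : lastProcessed.entrySet()) {
            long last = entry.getValue();
            if (last < 0) {
                throw new IllegalStateException("Illegal offset value to commit");
            }
            // The committed offset is the position of the NEXT record to consume.
            offsetsToCommit.put(entry.getKey(), last + 1);
        }
        return offsetsToCommit;
    }

    public static void main(String[] args) {
        Map<String, Long> lastProcessed = new HashMap<>();
        lastProcessed.put("orders-0", 41L); // record 41 was the last one processed

        // A consumer resuming from the committed offset starts at record 42.
        System.out.println(toCommitOffsets(lastProcessed).get("orders-0")); // prints 42
    }
}
```

Committing 41 instead would make a consumer that restarts from the committed offset read record 41 a second time.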
As shown, this calls consumerThread.setOffsetsToCommit:

    void setOffsetsToCommit(
            Map<TopicPartition, OffsetAndMetadata> offsetsToCommit,
            @Nonnull KafkaCommitCallback commitCallback) {

        // record the work to be committed by the main consumer thread and make sure the consumer notices that
        /*
         * A non-null return value means the KafkaConsumerThread has been too slow: it has not yet
         * handed the previous offsets to consumer.commitAsync(), so the new offsets overwrite the old ones.
         * This is the key step. The value is assigned directly to nextOffsetsToCommit, which is an
         * AtomicReference, so the object reference is swapped atomically.
         */
        if (nextOffsetsToCommit.getAndSet(Tuple2.of(offsetsToCommit, commitCallback)) != null) {
            log.warn("Committing offsets to Kafka takes longer than the checkpoint interval. " +
                    "Skipping commit of previous offsets because newer complete checkpoint offsets are available. " +
                    "This does not compromise Flink's checkpoint integrity.");
        }

        // if the consumer is blocked in a poll() or handover operation, wake it up to commit soon
        handover.wakeupProducer();

        synchronized (consumerReassignmentLock) {
            if (consumer != null) {
                consumer.wakeup();
            } else {
                // the consumer is currently isolated for partition reassignment;
                // set this flag so that the wakeup state is restored once the reassignment is complete
                hasBufferedWakeup = true;
            }
        }
    }
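The hand-off through the AtomicReference behaves like a one-slot mailbox in which the newest value wins. A minimal sketch using only java.util.concurrent, where `CommitMailboxSketch`, `offer`, and `take` are hypothetical names and a String stands in for the offsets-plus-callback tuple:

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical one-slot mailbox modelling the AtomicReference hand-off. */
final class CommitMailboxSketch {

    private final AtomicReference<String> next = new AtomicReference<>();

    /**
     * Publishes a commit request. Returns true when an older request was still
     * waiting and has now been overwritten, the situation that triggers the warning log.
     */
    boolean offer(String request) {
        return next.getAndSet(request) != null;
    }

    /** Takes the pending request and clears the slot in one atomic step; null if empty. */
    String take() {
        return next.getAndSet(null);
    }

    public static void main(String[] args) {
        CommitMailboxSketch mailbox = new CommitMailboxSketch();
        System.out.println(mailbox.offer("offsets@cp-1")); // prints false: the slot was empty
        System.out.println(mailbox.offer("offsets@cp-2")); // prints true: cp-1 was never picked up
        System.out.println(mailbox.take());                // prints offsets@cp-2
        System.out.println(mailbox.take());                // prints null
    }
}
```

Losing the offsets of the older checkpoint is harmless here, since the newer checkpoint's offsets are at least as far along.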
nextOffsetsToCommit now holds a value. Next, look at the run method of KafkaConsumerThread:

    @Override
    public void run() {
        // early exit check
        if (!running) {
            return;
        }
        ......
        // main fetch loop
        while (running) {
            // check if there is something to commit
            // (commitInProgress defaults to false)
            if (!commitInProgress) {
                // get and reset the work-to-be committed, so we don't repeatedly commit the same
                // setOffsetsToCommit has already assigned nextOffsetsToCommit, and it is read here,
                // so commitOffsetsAndCallback is not null
                final Tuple2<Map<TopicPartition, OffsetAndMetadata>, KafkaCommitCallback> commitOffsetsAndCallback =
                        nextOffsetsToCommit.getAndSet(null);

                if (commitOffsetsAndCallback != null) {
                    log.debug("Sending async offset commit request to Kafka broker");

                    // also record that a commit is already in progress
                    // the order here matters! first set the flag, then send the commit command.
                    commitInProgress = true;
                    consumer.commitAsync(commitOffsetsAndCallback.f0, new CommitCallback(commitOffsetsAndCallback.f1));
                }
            }
            ....
    }
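The role of the commitInProgress flag can be modelled without threads. The sketch below runs the commit check of the loop by hand, one iteration at a time. It assumes that the callback passed to commitAsync clears the flag once the broker has answered; that step is not shown in the excerpt above. All names other than `nextOffsetsToCommit` and `commitInProgress` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-threaded model of the commit check in the fetch loop. */
final class CommitLoopSketch {

    private final AtomicReference<String> nextOffsetsToCommit = new AtomicReference<>();
    private volatile boolean commitInProgress; // false by default
    final List<String> sent = new ArrayList<>(); // records every commit that was "sent"

    /** Called by the checkpoint side to publish new offsets. */
    void setOffsetsToCommit(String offsets) {
        nextOffsetsToCommit.set(offsets);
    }

    /** One pass through the commit check at the top of the fetch loop. */
    void loopIteration() {
        if (!commitInProgress) {
            String offsets = nextOffsetsToCommit.getAndSet(null);
            if (offsets != null) {
                commitInProgress = true; // set the flag first, then send
                sent.add(offsets);       // stands in for consumer.commitAsync(...)
            }
        }
    }

    /** Assumed behaviour of the async commit callback: it clears the flag. */
    void onCommitComplete() {
        commitInProgress = false;
    }

    public static void main(String[] args) {
        CommitLoopSketch loop = new CommitLoopSketch();

        loop.setOffsetsToCommit("cp-1");
        loop.loopIteration();
        System.out.println(loop.sent); // prints [cp-1]

        loop.setOffsetsToCommit("cp-2");
        loop.loopIteration();          // cp-1 is still in flight, so cp-2 waits
        System.out.println(loop.sent); // prints [cp-1]

        loop.onCommitComplete();
        loop.loopIteration();
        System.out.println(loop.sent); // prints [cp-1, cp-2]
    }
}
```

At most one commit is in flight at a time, and any offsets that arrive in the meantime simply wait in the reference until the flag is cleared.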
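That completes the offset commit path. When a checkpoint completes, notifyCheckpointComplete passes the checkpoint's offsets to the fetcher, the fetcher stores them in nextOffsetsToCommit, and the KafkaConsumerThread picks them up in its fetch loop and sends them to the Kafka broker asynchronously through consumer.commitAsync.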
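Copyright notice: This article is an original work by InfoQ author shengjk1.
Original link: http://xie.infoq.cn/article/4106f684280aa60ac7cb7c269
This article is licensed under CC-BY 4.0. When reposting, please retain the original source and this copyright notice.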