How to get Kafka offsets for structured query for manual and reliable offset management?
Spark 2.2 introduced a Kafka source for Structured Streaming. As I understand it, it relies on an HDFS checkpoint directory to store offsets and guarantee "exactly-once" message delivery.

Correct. On every trigger, Spark Structured Streaming saves offsets to the offsets directory in the checkpoint location (defined using the checkpointLocation option or the spark.sql.streaming.checkpointLocation Spark property, or randomly assigned) that is …
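One way to read the offsets yourself, instead of relying only on the checkpoint, is to pull them out of the query's progress reports. The sketch below is a minimal illustration, not the answer's own code. It assumes a progress report shaped like the dictionary PySpark returns from `query.lastProgress`, where each Kafka source carries an `endOffset` of the form `{"topic": {"partition": offset}}`. The helper names `parse_kafka_offsets` and `end_offsets_from_progress`, and the sample values, are hypothetical.

```python
import json
from typing import Dict, Tuple, Union

# (topic, partition) -> offset
OffsetMap = Dict[Tuple[str, int], int]


def parse_kafka_offsets(raw: Union[str, dict, None]) -> OffsetMap:
    """Flatten a Kafka offset report into {(topic, partition): offset}.

    Accepts either a JSON string or an already-parsed dict, since the
    report can show up in both forms depending on the API used.
    """
    if raw is None:
        return {}
    data = json.loads(raw) if isinstance(raw, str) else raw
    offsets: OffsetMap = {}
    for topic, partitions in data.items():
        for partition, offset in partitions.items():
            offsets[(topic, int(partition))] = int(offset)
    return offsets


def end_offsets_from_progress(progress: dict) -> OffsetMap:
    """Collect end offsets of all Kafka sources in one progress report."""
    merged: OffsetMap = {}
    for source in progress.get("sources", []):
        # Assumption: Kafka sources are recognizable by their description.
        if "Kafka" in source.get("description", ""):
            merged.update(parse_kafka_offsets(source.get("endOffset")))
    return merged


# Hypothetical progress report, trimmed to the fields used here.
sample_progress = {
    "batchId": 7,
    "sources": [
        {
            "description": "KafkaSource[Subscribe[events]]",
            "startOffset": {"events": {"0": 100, "1": 250}},
            "endOffset": {"events": {"0": 120, "1": 260}},
        }
    ],
}

if __name__ == "__main__":
    for (topic, partition), offset in sorted(
        end_offsets_from_progress(sample_progress).items()
    ):
        print(topic, partition, offset)
```

With a running query, the same helper could be fed `query.lastProgress`, and the resulting map written to an external store if you want offset tracking that does not depend on the checkpoint directory alone.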