It is a bit more complex than you described.
The auto.offset.reset config kicks in ONLY if your consumer group does not have a valid offset committed somewhere (the two offset storages currently supported are Kafka and ZooKeeper), and it also depends on what sort of consumer you use.
If you use the high-level Java consumer, consider the following scenarios:
- You have a consumer in consumer group group1 that has consumed 5 messages and died. The next time you start this consumer, it won’t even look at the auto.offset.reset config; it will continue from the place where it died, because it simply fetches the stored offset from the offset storage (Kafka or ZK, as I mentioned).
- You have messages in a topic (like you described) and you start a consumer in a new consumer group, group2. There is no offset stored anywhere, so this time the auto.offset.reset config decides whether to start from the beginning of the topic (earliest) or from the end of the topic (latest).
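The two scenarios above can be sketched as a small decision function. This is only a toy model in plain Java, not Kafka client code: the class StartingOffsetSketch, the ResetPolicy enum, and the method startingOffset are hypothetical names, and the topic size of 10 messages is an assumption made for the example. It also treats a committed offset that falls outside the available range as "not valid", which is my reading of the "valid offset" condition.

```java
import java.util.OptionalLong;

/** Toy model of how a consumer group picks its starting offset; not Kafka client code. */
final class StartingOffsetSketch {

    /** Stand-in for the two auto.offset.reset values discussed above. */
    enum ResetPolicy { EARLIEST, LATEST }

    /**
     * @param committed offset stored for the group, empty if nothing was ever committed
     * @param logStart  first offset still available in the partition
     * @param logEnd    offset that the next produced message will receive
     * @param policy    stand-in for the auto.offset.reset setting
     */
    static long startingOffset(OptionalLong committed, long logStart, long logEnd, ResetPolicy policy) {
        if (committed.isPresent()) {
            long offset = committed.getAsLong();
            // A committed offset inside the available range wins; the policy is ignored.
            if (offset >= logStart && offset <= logEnd) {
                return offset;
            }
        }
        // No usable committed offset: fall back to the reset policy.
        return policy == ResetPolicy.EARLIEST ? logStart : logEnd;
    }

    public static void main(String[] args) {
        // Assumed topic: 10 messages, offsets 0..9, nothing deleted yet.
        // group1 consumed 5 messages before dying, so offset 5 is stored.
        System.out.println(startingOffset(OptionalLong.of(5), 0, 10, ResetPolicy.LATEST));    // 5
        // group2 is new: nothing stored, so the policy decides.
        System.out.println(startingOffset(OptionalLong.empty(), 0, 10, ResetPolicy.EARLIEST)); // 0
        System.out.println(startingOffset(OptionalLong.empty(), 0, 10, ResetPolicy.LATEST));   // 10
    }
}
```

Note that for group1 the result is the same whichever policy you pass in, which is the whole point of the first scenario.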
One more thing that affects which offset values earliest and latest correspond to is the log retention policy. Imagine you have a topic with retention configured to 1 hour. You produce 5 messages, and then an hour later you post 5 more messages. latest still points to the end of the log, just as in the previous example, but earliest can no longer be 0, because Kafka will already have removed the first 5 messages, and thus the earliest available offset will be 5.
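The retention example can be worked through with a little arithmetic. The sketch below is an idealized model, not Kafka code: RetentionSketch and its methods are hypothetical names, the exact timestamps (5 messages at minute 0, 5 more at minute 61, checked at minute 62) are assumptions, and it pretends messages expire one by one, whereas real Kafka deletes whole log segments, so the actual cleanup is coarser.

```java
import java.time.Duration;

/** Idealized model of earliest/latest offsets under time-based retention. */
final class RetentionSketch {

    /** Earliest available offset: the number of leading messages older than the retention window. */
    static long earliestOffset(long[] producedAtMillis, long nowMillis, Duration retention) {
        long cutoff = nowMillis - retention.toMillis();
        int earliest = 0;
        // Skip every leading message that was produced before the cutoff.
        while (earliest < producedAtMillis.length && producedAtMillis[earliest] < cutoff) {
            earliest++;
        }
        return earliest;
    }

    /** Latest offset: the offset the next produced message would receive. */
    static long latestOffset(long[] producedAtMillis) {
        return producedAtMillis.length;
    }

    public static void main(String[] args) {
        long[] producedAt = new long[10];
        // Offsets 0..4 are produced at minute 0 (array default), offsets 5..9 at minute 61.
        for (int i = 5; i < 10; i++) {
            producedAt[i] = Duration.ofMinutes(61).toMillis();
        }
        long now = Duration.ofMinutes(62).toMillis();

        System.out.println(earliestOffset(producedAt, now, Duration.ofHours(1))); // 5
        System.out.println(latestOffset(producedAt));                             // 10
    }
}
```

So a brand-new group with earliest would start at offset 5 here, not at 0.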
None of the above applies to SimpleConsumer: every time you run it, it will decide where to start from using the auto.offset.reset config.
If you use a Kafka version older than 0.9, you have to replace earliest and latest with smallest and largest.
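As a rough illustration of the naming difference, here is how the setting could be put into a java.util.Properties object. The keys group.id and auto.offset.reset are the real config names; the class ResetConfigSketch, the method consumerProps, and the olderThan09 flag are hypothetical helpers for this sketch. A real consumer needs more settings (broker or ZooKeeper addresses, deserializers), which are left out here.

```java
import java.util.Properties;

/** Builds only the offset-reset part of a consumer configuration. */
final class ResetConfigSketch {

    /**
     * @param groupId            consumer group name
     * @param startFromBeginning true to start from the oldest available message
     * @param olderThan09        true when targeting a Kafka version older than 0.9
     */
    static Properties consumerProps(String groupId, boolean startFromBeginning, boolean olderThan09) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);

        // Same meaning, different spelling depending on the Kafka version.
        String value;
        if (startFromBeginning) {
            value = olderThan09 ? "smallest" : "earliest";
        } else {
            value = olderThan09 ? "largest" : "latest";
        }
        props.setProperty("auto.offset.reset", value);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps("group2", true, false).getProperty("auto.offset.reset")); // earliest
        System.out.println(consumerProps("group2", true, true).getProperty("auto.offset.reset"));  // smallest
    }
}
```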