ffmpeg wrong audio file after conversion in AAC

The audio stream starts with a padding frame that the decoder needs in order to decode the first real frame. This is a technical requirement of MDCT-based audio codecs such as AAC. In a timed-sample container like MP4 or MKV, that first frame has a negative presentation timestamp, so it is not presented. A raw AAC bitstream carries no such timestamps, so that first frame is decoded naively and played. Each frame holds 1024 samples, which lasts about 21 ms at 48 kHz or about 23 ms at 44.1 kHz. Your difference in timing is due to that offset. Rewrap the audio into a container such as M4A to avoid it.
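To make the 21-23 ms figure concrete, here is a minimal sketch of the arithmetic. The only input taken from the answer is the 1024-sample frame size; the two sample rates are common values chosen for illustration, and the function name is hypothetical.

```python
AAC_FRAME_SAMPLES = 1024  # samples per AAC frame, as stated in the answer


def frame_duration_ms(sample_rate_hz):
    """Return the duration of one AAC frame in milliseconds."""
    return AAC_FRAME_SAMPLES / sample_rate_hz * 1000.0


# Illustrative sample rates (assumed, not taken from the question).
for rate in (44100, 48000):
    print(f"{rate} Hz: {frame_duration_ms(rate):.2f} ms")
# prints:
# 44100 Hz: 23.22 ms
# 48000 Hz: 21.33 ms
```

If the offset you measure is close to one of these values, a single undropped padding frame is a plausible explanation.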

For background, from Apple:

AAC requires data beyond the source PCM audio samples in order to
correctly encode and decode audio samples due to the nature of the
encoding algorithm. AAC encoding uses a transform over consecutive
sets of 2048 audio samples, applied every 1024 audio samples
(overlapped). For correct audio to be decoded, both transforms for any
period of 1024 audio samples are needed. For this reason, encoders add
at least 1024 samples of silence before the first ‘true’ audio sample,
and often add more. This is called variously “priming”, “priming
samples”, or “encoder delay”.

and

The lack of explicit representation for encoder delay and remainder
samples is not a problem unique to AAC encoding. With MPEG-4 and
ADTS/MPEG-2 bitstreams and file containers, there is still no
satisfactory, explicit representation for either the encoder delay or
remainder samples. MP3 also has these data dependencies and delays in
its bitstream, as do proprietary codecs such as AC-3 and others.
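As a rough illustration of how priming and remainder samples relate to the frame count, the sketch below pads a source of a given length out to whole 1024-sample frames. The function name and the example numbers are assumptions for illustration only: the quoted text says encoders add at least 1024 priming samples, and 2112 is used here merely as one plausible larger value.

```python
import math

FRAME_SAMPLES = 1024  # samples per AAC frame


def packet_layout(source_samples, priming_samples):
    """Return (frame_count, remainder_samples) for an encoded stream.

    remainder_samples is the trailing padding needed to fill the last
    frame after the priming samples and the real audio are laid out.
    """
    total = priming_samples + source_samples
    frames = math.ceil(total / FRAME_SAMPLES)
    remainder = frames * FRAME_SAMPLES - total
    return frames, remainder


# Hypothetical example: 5389 real samples preceded by 2112 priming samples.
print(packet_layout(5389, 2112))  # prints (8, 691)
```

A container that records the priming and remainder counts lets a player trim both ends; a raw bitstream has nowhere to store them, so the decoded output comes out longer and shifted.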
