iOS: Stream Audio from One iOS Device to Another

The API you need to look at is “Audio Queue Services”.

Right, here’s a basic overview of what you need to do.

When you play back audio, you set up an audio queue. The queue asks you for some audio data, and when it has played all that back, it asks for more. This goes on until you either stop the queue or there’s no more data to play back.
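To make that pull model concrete, here is a minimal simulation in plain C++ with no Core Audio involved. FakeBuffer, FillCallback and runFakeQueue are hypothetical names, not part of Audio Queue Services. A real queue also cycles through a small number of buffers (three is common) rather than the single one used here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an audio queue buffer: a byte array plus
// the number of valid bytes the callback put into it.
struct FakeBuffer {
    std::vector<unsigned char> data;
    std::size_t validBytes = 0;
};

// The callback fills the buffer and returns false when it had no data to give.
using FillCallback = std::function<bool(FakeBuffer &)>;

// Simulates the queue's pull loop: ask for data, "play" it, ask again.
// Returns how many times the callback was invoked.
std::size_t runFakeQueue(std::size_t bufferBytes, const FillCallback &fill) {
    assert(bufferBytes > 0);
    FakeBuffer buffer;
    buffer.data.resize(bufferBytes);
    std::size_t calls = 0;
    while (true) {
        ++calls;
        if (!fill(buffer)) break;  // no more data: the queue stops asking
        // A real queue would now play buffer.validBytes bytes of audio.
    }
    return calls;
}
```

The point is who is in charge: the queue decides when it needs data, and your code only answers its requests.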

The two main lower-level audio APIs in iOS are Audio Units and Audio Queue Services. By lower-level, I mean an API that is a bit more nitty-gritty than saying “just play back this MP3” or whatever.

My experience has been that Audio Units give lower latency, but Audio Queue Services are better suited to streaming audio. So I think the latter is the better option for you.

A key part of what you need to do is buffering: loading enough data that there are no gaps in your playback. You might handle this by loading a larger amount of data up front, before you start the queue, so that playback is always running behind the data you’ve already received. You then keep a sufficiently large buffer in memory while more data arrives on a background thread.
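The sketch below shows one way to structure that buffer, assuming the network code hands you raw PCM bytes. JitterBuffer and its methods are hypothetical: a background thread calls push as data arrives, you start the queue once readyToStart returns true, and the playback callback calls pull, which pads with silence if the network falls behind. It uses a mutex for clarity; in production audio code a lock-free ring buffer is the usual choice, because blocking inside the audio callback can cause glitches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical buffer between the network thread (producer) and the
// audio callback (consumer).
class JitterBuffer {
public:
    explicit JitterBuffer(std::size_t prebufferBytes) : mPrebuffer(prebufferBytes) {}

    // Called from the background (network) thread as data arrives.
    void push(const unsigned char *bytes, std::size_t count) {
        assert(bytes != nullptr || count == 0);
        std::lock_guard<std::mutex> lock(mMutex);
        mData.insert(mData.end(), bytes, bytes + count);
    }

    // True once enough data is buffered to start playback "ahead".
    bool readyToStart() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mData.size() >= mPrebuffer;
    }

    // Called from the playback callback. Copies up to `count` bytes into
    // `out` and zero-fills the rest (zero is silence for signed integer
    // and float linear PCM). Returns the number of real bytes copied.
    std::size_t pull(unsigned char *out, std::size_t count) {
        std::lock_guard<std::mutex> lock(mMutex);
        std::size_t n = std::min(count, mData.size());
        auto end = mData.begin() + static_cast<std::ptrdiff_t>(n);
        std::copy(mData.begin(), end, out);
        mData.erase(mData.begin(), end);
        std::fill(out + n, out + count, static_cast<unsigned char>(0));
        return n;
    }

private:
    mutable std::mutex mMutex;
    std::deque<unsigned char> mData;
    std::size_t mPrebuffer;
};
```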

The sample project I would recommend studying closely is Apple’s SpeakHere. In particular, look at SpeakHereController.mm and AQPlayer.mm.

The controller handles things like starting and stopping AQPlayer. AQPlayer represents an AudioQueue. Look closely at AQPlayer::AQBufferCallback. That’s the callback method that is invoked when the queue wants more data.

You’ll need to make sure that the format you set up for the queue and the format of the data you receive match exactly. Check things like the number of channels (mono or stereo?), the number of frames per packet, integer or float samples, and the sample rate. If anything doesn’t match up, you’ll get EXC_BAD_ACCESS errors as you work your way through the respective buffers, or white noise, or (in the case of wrong sample rates) audio that sounds slowed down or sped up.
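As a sketch of that check, assuming you send a small format description along with the stream (an assumption; nothing here says how the two devices agree on a format), you could compare the fields before starting playback. PcmFormat is a hypothetical struct loosely modeled on AudioStreamBasicDescription, with a plain isFloat flag standing in for the real kAudioFormatFlagIsFloat bit in mFormatFlags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified mirror of the AudioStreamBasicDescription
// fields that matter for this check. NOT the Apple type.
struct PcmFormat {
    double        sampleRate;        // frames per second, e.g. 44100.0
    std::uint32_t channelsPerFrame;  // 1 = mono, 2 = stereo
    std::uint32_t bitsPerChannel;    // e.g. 16 for integer, 32 for float
    bool          isFloat;           // float vs integer samples
    std::uint32_t framesPerPacket;   // 1 for linear PCM
};

// True only if every field that affects byte layout or playback speed matches.
bool formatsMatch(const PcmFormat &queue, const PcmFormat &incoming) {
    assert(queue.sampleRate > 0 && incoming.sampleRate > 0);
    return queue.sampleRate == incoming.sampleRate &&
           queue.channelsPerFrame == incoming.channelsPerFrame &&
           queue.bitsPerChannel == incoming.bitsPerChannel &&
           queue.isFloat == incoming.isFloat &&
           queue.framesPerPacket == incoming.framesPerPacket;
}
```

A sample-rate mismatch (say 22050 vs 44100) is exactly the wrong-speed case, and this check catches it before anything plays.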

Note that SpeakHere runs two audio queues: one for recording and one for playback. All audio work is done with buffers of data, so you’re always passing around pointers to buffers. For example, during playback you might have a memory buffer holding 20 seconds of audio. Perhaps every second your callback will be invoked by the queue, essentially saying “give me another second’s worth of data, please”. You could think of it as a playback head that moves through your data, requesting more as it goes.
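Here is the arithmetic behind that example as a small sketch. The numbers are illustrative assumptions, not SpeakHere’s settings: mono 32-bit float samples at 44,100 Hz, with each queue buffer holding one second of audio. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to hold `seconds` of linear PCM audio.
// Assumes an integral sample rate (the cast truncates any fraction).
std::uint64_t bytesForSeconds(double sampleRate, std::uint32_t channels,
                              std::uint32_t bytesPerSample, std::uint32_t seconds) {
    assert(sampleRate > 0 && channels > 0 && bytesPerSample > 0);
    std::uint64_t bytesPerSecond =
        static_cast<std::uint64_t>(sampleRate) * channels * bytesPerSample;
    return bytesPerSecond * seconds;
}

// How many callbacks it takes to play `totalBytes` when each queue buffer
// holds `bufferBytes` (the last buffer may be only partly filled).
std::uint64_t callbacksNeeded(std::uint64_t totalBytes, std::uint64_t bufferBytes) {
    assert(bufferBytes > 0);
    return (totalBytes + bufferBytes - 1) / bufferBytes;  // ceiling division
}
```

With those assumptions, 20 seconds is 3,528,000 bytes, each callback asks for 176,400 bytes, and the queue calls back 20 times.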

Let’s look at this in a bit more detail. Unlike SpeakHere, you’re going to be working with in-memory buffers rather than writing the audio out to a temporary file.

Note that if you’re dealing with large amounts of data on an iOS device, you’ll have no choice but to hold the bulk of it on disk. That’s especially true if the user can replay or rewind the audio: you’ll need to hold it all somewhere!

Anyway, assuming that AQPlayer will be reading from memory, we’ll need to alter it as follows.

First, somewhere to hold the data, in AQPlayer.h:

float *mMyAudioBuffer;  // new member: start of the in-memory audio data
void SetAudioBuffer(float *inAudioBuffer) { mMyAudioBuffer = inAudioBuffer; }

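You already have that data in an NSData object, so you can just pass in the pointer returned by [myData bytes]. Since bytes returns a const void *, you’ll need a cast, e.g. (float *)[myData bytes], and the NSData object must stay alive for as long as the queue is reading from it. Its length, [myData length], gives you the end position used below.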

What provides that data to the audio queue? The callback function set up in AQPlayer:

void AQPlayer::AQBufferCallback(void *                  inUserData,
                            AudioQueueRef           inAQ,
                            AudioQueueBufferRef     inCompleteAQBuffer) 

The function we’ll use to add part of our data to the audio queue is AudioQueueEnqueueBuffer:

AudioQueueEnqueueBuffer(inAQ, inCompleteAQBuffer, 0, NULL);

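inAQ is the reference to the queue, as received by our callback.
inCompleteAQBuffer is the audio queue buffer that the queue has finished playing and wants refilled.
The last two arguments (0 and NULL) are the number of packet descriptions and the packet description array, which a constant bit rate format such as linear PCM doesn’t need.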

So how do you get your data (the pointer returned by calling bytes on your NSData object) into the audio queue buffer inCompleteAQBuffer?

Using a memcpy:

memcpy(inCompleteAQBuffer->mAudioData, THIS->mMyAudioBuffer + (THIS->mPlayBufferPosition / sizeof(float)), numBytesToCopy);
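The division by sizeof(float) in that memcpy is easy to get wrong. mMyAudioBuffer is a float *, so adding n to it moves n floats (4n bytes with the usual 4-byte float), while the playback position is counted in bytes. This sketch, with hypothetical helper names, checks that the float-pointer expression lands on the same address as the byte offset; it assumes the position is always a whole number of floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: the address `bytePosition` bytes into a float
// buffer, computed the same way as the memcpy source pointer.
const float *sourceAt(const float *buffer, std::size_t bytePosition) {
    // A position that isn't a whole number of floats would be silently
    // rounded down by the division, so insist on alignment.
    assert(bytePosition % sizeof(float) == 0);
    return buffer + (bytePosition / sizeof(float));
}

// Copies `numBytesToCopy` bytes starting `bytePosition` bytes into `buffer`.
void copyChunk(void *dest, const float *buffer, std::size_t bytePosition,
               std::size_t numBytesToCopy) {
    std::memcpy(dest, sourceAt(buffer, bytePosition), numBytesToCopy);
}
```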

You’ll also need to set the buffer size:

    inCompleteAQBuffer->mAudioDataByteSize = numBytesToCopy;

numBytesToCopy is always going to be the same, unless you’re just about to run out of data. For example, if each buffer holds 2 seconds’ worth of audio data and you have 9 seconds to play back, then for the first four callbacks you will pass 2 seconds’ worth. For the final callback you will only have 1 second’s worth of data left, and numBytesToCopy must reflect that.

    // Calculate how many bytes remain. It could be less than a normal buffer
    // size, for example if the buffer size is 0.5 seconds and recording stopped
    // halfway through that. In that case we copy across only the recorded bytes
    // and don't enqueue any more buffers.
    SInt64 numRemainingBytes = THIS->mPlayBufferEndPosition - THIS->mPlayBufferPosition;

    SInt64 numBytesToCopy = numRemainingBytes < THIS->mBufferByteSize ? numRemainingBytes : THIS->mBufferByteSize;
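To check the 9-second example, this sketch runs the same remaining-versus-buffer-size rule in a loop and records every numBytesToCopy. chunkSizes is a hypothetical name, and std::int64_t stands in for Core Audio’s SInt64 (both are signed 64-bit integers).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns numBytesToCopy for each callback, applying the rule
// "remaining < buffer size ? remaining : buffer size" until the data runs out.
std::vector<std::int64_t> chunkSizes(std::int64_t endPosition, std::int64_t bufferByteSize) {
    assert(endPosition >= 0 && bufferByteSize > 0);
    std::vector<std::int64_t> sizes;
    std::int64_t position = 0;
    while (position < endPosition) {
        std::int64_t numRemainingBytes = endPosition - position;
        std::int64_t numBytesToCopy =
            numRemainingBytes < bufferByteSize ? numRemainingBytes : bufferByteSize;
        sizes.push_back(numBytesToCopy);
        position += numBytesToCopy;  // advance the playback head
    }
    return sizes;
}
```

With 176,400 bytes per second (mono 32-bit float at 44.1 kHz, again an assumption), 9 seconds in 2-second buffers gives four chunks of 352,800 bytes followed by one of 176,400.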

Finally, we advance the playback head. In our callback, we’ve given the queue some data to play. What happens the next time we get the callback? We don’t want to give it the same data again, not unless you’re doing some funky DJ loop stuff!

So we advance the head, which is basically just a byte offset into our audio buffer. It moves through the buffer like the needle on a record:

    THIS->mPlayBufferPosition += numBytesToCopy;

That’s it! There’s some other logic, but you can get that from studying the full callback method in SpeakHere.
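Putting the pieces together, here is a Core-Audio-free sketch of the callback body so the order of operations can be tested on its own. FakeAQBuffer and FakePlayer are hypothetical stand-ins for AudioQueueBuffer and the AQPlayer members used above (field names copied from the snippets), and the enqueue call is replaced by a counter. In the real callback you would call AudioQueueEnqueueBuffer at that point and, once the data runs out, stop the queue instead of enqueuing; SpeakHere’s full callback shows how it handles that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for AudioQueueBuffer: only the three fields we use.
struct FakeAQBuffer {
    void *mAudioData;
    std::uint32_t mAudioDataByteSize;
    std::uint32_t mAudioDataBytesCapacity;
};

// Hypothetical stand-in for the AQPlayer members used in the snippets above.
struct FakePlayer {
    float *mMyAudioBuffer;
    std::int64_t mPlayBufferPosition;     // playback head, in bytes
    std::int64_t mPlayBufferEndPosition;  // total bytes of audio
    std::int64_t mBufferByteSize;         // bytes per queue buffer
    int enqueuedCount = 0;                // stands in for AudioQueueEnqueueBuffer
    bool isDone = false;
};

// Same shape as AQPlayer::AQBufferCallback: copy, set size, enqueue, advance.
void fakeBufferCallback(void *inUserData, FakeAQBuffer *inCompleteAQBuffer) {
    FakePlayer *THIS = static_cast<FakePlayer *>(inUserData);
    std::int64_t numRemainingBytes = THIS->mPlayBufferEndPosition - THIS->mPlayBufferPosition;
    if (numRemainingBytes <= 0) {
        THIS->isDone = true;  // real code: stop the queue, don't enqueue
        return;
    }
    std::int64_t numBytesToCopy = numRemainingBytes < THIS->mBufferByteSize
                                      ? numRemainingBytes : THIS->mBufferByteSize;
    assert(numBytesToCopy <= inCompleteAQBuffer->mAudioDataBytesCapacity);
    std::memcpy(inCompleteAQBuffer->mAudioData,
                THIS->mMyAudioBuffer + (THIS->mPlayBufferPosition / sizeof(float)),
                static_cast<std::size_t>(numBytesToCopy));
    inCompleteAQBuffer->mAudioDataByteSize = static_cast<std::uint32_t>(numBytesToCopy);
    ++THIS->enqueuedCount;  // real code: AudioQueueEnqueueBuffer(inAQ, inCompleteAQBuffer, 0, NULL);
    THIS->mPlayBufferPosition += numBytesToCopy;  // move the playback head on
}
```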

A couple of points I must emphasize. First, don’t just copy and paste my code above. Make absolutely sure you understand what you are doing. You will undoubtedly hit problems, and you’ll need to understand what’s happening.

Second, make sure the audio formats are the same and, even better, that you understand the audio format. This is covered in the “Recording Audio” chapter of the Audio Queue Services Programming Guide. Look at Listing 2-8, “Specifying an audio queue’s audio data format”.

It’s crucial to understand that the most primitive unit of data is a single sample, either an integer or a float. Mono or stereo, you have one or two channels in a frame; that defines how many integers or floats are in each frame. Then you have frames per packet (probably 1, for uncompressed audio). Your sample rate is the number of frames per second, which with one frame per packet is also the number of packets per second.
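Here is that hierarchy as arithmetic, built from the bottom up. PcmLayout is a hypothetical mirror of the relevant AudioStreamBasicDescription fields (the real ones are mBytesPerFrame, mBytesPerPacket and so on), not the Apple type, and the sketch assumes packed, interleaved linear PCM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the linear-PCM fields of AudioStreamBasicDescription.
struct PcmLayout {
    double        sampleRate;        // frames per second
    std::uint32_t channelsPerFrame;  // samples per frame (1 mono, 2 stereo)
    std::uint32_t bitsPerChannel;    // size of one integer or float sample
    std::uint32_t framesPerPacket;   // 1 for linear PCM
    std::uint32_t bytesPerFrame;
    std::uint32_t bytesPerPacket;
};

// Sample -> frame -> packet, assuming interleaved channels.
PcmLayout makeLayout(double sampleRate, std::uint32_t channels, std::uint32_t bitsPerSample) {
    assert(sampleRate > 0 && channels > 0 && bitsPerSample % 8 == 0);
    PcmLayout f{};
    f.sampleRate = sampleRate;
    f.channelsPerFrame = channels;
    f.bitsPerChannel = bitsPerSample;
    f.framesPerPacket = 1;
    f.bytesPerFrame = channels * (bitsPerSample / 8);  // one sample per channel
    f.bytesPerPacket = f.bytesPerFrame * f.framesPerPacket;
    return f;
}

// Bytes of audio consumed per second of playback.
double bytesPerSecond(const PcmLayout &f) {
    return f.sampleRate * f.bytesPerFrame;
}
```

Note that 16-bit stereo and 32-bit float mono both come out at 4 bytes per frame, so a mix-up between them won’t show up as a size problem; it is more likely to come out as noise.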

It’s all covered in the docs. Just make sure everything matches up or you will have some pretty strange sounds!

Good luck!
