Windows Media Foundation recording audio

I apologize for the late response, and I hope you can still find this valuable. I recently completed a project similar to yours (recording webcam video along with a selected microphone to a single video file with audio). The key is to creating an aggregate media source.

// http://msdn.microsoft.com/en-us/library/windows/desktop/dd388085(v=vs.85).aspx
HRESULT CreateAggregateMediaSource(IMFMediaSource *videoSource,
                                   IMFMediaSource *audioSource,
                                   IMFMediaSource **aggregateSource)
{
    *aggregateSource = nullptr;
    IMFCollection *pCollection = nullptr;

    HRESULT hr = ::MFCreateCollection(&pCollection);

    if (S_OK == hr)
        hr = pCollection->AddElement(videoSource);

    if (S_OK == hr)
        hr = pCollection->AddElement(audioSource);

    if (S_OK == hr)
        hr = ::MFCreateAggregateSource(pCollection, aggregateSource);

    SafeRelease(&pCollection);
    return hr;
}

When configuring the sink writer, you will add 2 streams (one for audio and one for video).
Of course, you will also configure the writer correctly for the input stream types.

HRESULT        hr                  = S_OK;
IMFMediaType  *videoInputType      = nullptr;
IMFMediaType  *videoOutputType     = nullptr;
DWORD          videoOutStreamIndex = 0u;
DWORD          audioOutStreamIndex = 0u;
IMFSinkWriter *writer              = nullptr;

// [other create and configure writer]

if (S_OK == hr))
    hr = writer->AddStream(videoOutputType, &videoOutStreamIndex);    

// [more configuration code]

if (S_OK == hr)
    hr = writer->AddStream(audioOutputType, &audioOutStreamIndex);

Then when reading the samples, you will need to pay close attention to the reader streamIndex, and sending them to the writer appropriately. You will also need to pay close attention to the format that the codec expects. For instance, IEEE float vs PCM, etc. Good luck, and I hope it is not too late.

Leave a Comment