Help with SAPI v5.1 SpeechRecognitionEngine always gives same wrong result with C#

How did you create your WAV file? It looks like it has a high bitrate, and the recognizer supports only certain audio formats. Try:

  • 8 bits per sample
  • single channel mono
  • 22,050 samples per second
  • PCM encoding

You have about 3 seconds of audio, yet the file size is 520 KB. That seems too big for the supported formats: at 8-bit mono PCM and 22,050 samples per second, 3 seconds of audio comes to roughly 65 KB.
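As a rough check of the size argument, uncompressed PCM needs samples per second × channels × bytes per sample for every second of audio. The sketch below is only illustrative: the class and method names are made up, it ignores the small WAV header, and it treats "about 3 seconds" as exactly 3.

```csharp
using System;

class WavSizeEstimate
{
    // Size in bytes of the raw PCM payload (WAV header not included).
    static long PcmBytes(int samplesPerSecond, int channels, int bitsPerSample, int seconds)
    {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        return (long)samplesPerSecond * blockAlign * seconds;
    }

    static void Main()
    {
        // The format suggested above: 8-bit mono PCM at 22,050 samples per second.
        long expected = PcmBytes(22050, 1, 8, 3);
        Console.WriteLine("Expected size: {0} bytes", expected); // 66150 bytes

        // Byte rate implied by a 520 KB file holding about 3 seconds of audio.
        long actualBytes = 520L * 1024;
        Console.WriteLine("Implied byte rate: {0} bytes/second", actualBytes / 3);
        Console.WriteLine("Suggested format byte rate: {0} bytes/second", PcmBytes(22050, 1, 8, 1));
    }
}
```

The implied byte rate comes out several times higher than the suggested format allows, which is why the file looks suspicious before it is even opened.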

You can use the RecognizerInfo class to list the audio formats your recognizer supports; see the RecognizerInfo.SupportedAudioFormats property.
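A minimal sketch of that lookup, assuming a project that references the System.Speech assembly. It walks every installed recognizer rather than only the default one; the class name and output layout are my own choices.

```csharp
using System;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

class ListSupportedFormats
{
    static void Main()
    {
        // Each installed recognizer reports its own list of accepted input formats.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine("{0} ({1})", info.Name, info.Culture);

            int index = 0;
            foreach (SpeechAudioFormatInfo format in info.SupportedAudioFormats)
            {
                Console.WriteLine("  {0}:", index++);
                Console.WriteLine("  EncodingFormat = {0}", format.EncodingFormat);
                Console.WriteLine("  BitsPerSample = {0}", format.BitsPerSample);
                Console.WriteLine("  BlockAlign = {0}", format.BlockAlign);
                Console.WriteLine("  ChannelCount = {0}", format.ChannelCount);
                Console.WriteLine("  SamplesPerSecond = {0}", format.SamplesPerSecond);
                Console.WriteLine();
            }
        }
    }
}
```

If you only care about the recognizer your code actually uses, the RecognizerInfo property of your SpeechRecognitionEngine instance gives you the same information for that one engine.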

Update:

Your audio file is kind of a mess: it is very noisy, and it is in an unsupported format. Audacity reports it as stereo, 44.1 kHz, 32-bit float. I silenced the noise at the beginning and end, resampled to 22.05 kHz, reduced it from stereo to a single mono channel, and then exported it as an uncompressed 8-bit unsigned PCM WAV. After that, recognition works fine.
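If you would rather check a file's format from code than open it in an audio editor, the values sit in the "fmt " chunk of the WAV header. This is a rough sketch that assumes a plain RIFF/WAVE file; "input.wav" is a placeholder path and the class name is made up.

```csharp
using System;
using System.IO;
using System.Text;

class WavFormatProbe
{
    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "input.wav"; // placeholder path

        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
            reader.ReadUInt32(); // overall RIFF size, not needed here
            string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (riff != "RIFF" || wave != "WAVE")
            {
                Console.WriteLine("Not a RIFF/WAVE file.");
                return;
            }

            // Walk the chunks until the "fmt " chunk turns up.
            while (reader.BaseStream.Position + 8 <= reader.BaseStream.Length)
            {
                string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint chunkSize = reader.ReadUInt32();

                if (chunkId == "fmt ")
                {
                    // Format tag: 1 = PCM, 3 = IEEE float, 6 = A-law, 7 = mu-law, 65534 = extensible.
                    ushort formatTag = reader.ReadUInt16();
                    ushort channels = reader.ReadUInt16();
                    uint samplesPerSecond = reader.ReadUInt32();
                    reader.ReadUInt32(); // average bytes per second
                    ushort blockAlign = reader.ReadUInt16();
                    ushort bitsPerSample = reader.ReadUInt16();

                    Console.WriteLine("FormatTag = {0}", formatTag);
                    Console.WriteLine("ChannelCount = {0}", channels);
                    Console.WriteLine("SamplesPerSecond = {0}", samplesPerSecond);
                    Console.WriteLine("BlockAlign = {0}", blockAlign);
                    Console.WriteLine("BitsPerSample = {0}", bitsPerSample);
                    return;
                }

                // Skip any other chunk; chunk data is padded to an even length.
                reader.BaseStream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
            }

            Console.WriteLine("No fmt chunk found.");
        }
    }
}
```

A file the recognizer can read should report one channel, and a format tag, sample rate, and bit depth that match one of the formats your recognizer lists as supported.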

On my Windows 7 machine, my default recognizer supports only the following audio formats:

  0:
  Encodingformat = Pcm
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond  = 16000

  1:
  Encodingformat = Pcm
  BitsPerSample = 16
  BlockAlign = 2
  ChannelCount = 1
  SamplesPerSecond  = 16000

  2:
  Encodingformat = Pcm
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond  = 22050

  3:
  Encodingformat = Pcm
  BitsPerSample = 16
  BlockAlign = 2
  ChannelCount = 1
  SamplesPerSecond  = 22050

  4:
  Encodingformat = ALaw
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond  = 22050

  5:
  Encodingformat = ULaw
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond  = 22050

You should also remove the numeric choices from the grammar. Right now the recognizer returns two alternates: “three” and “3”. This probably isn’t what you want. You could use a semantic result value in your grammar to return the number 3 for the word “three”.
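Here is a sketch of what that could look like. Your grammar code is not shown, so the word list, the class name, and the "input.wav" path are all assumptions. It also assumes an English recognizer and a project that references System.Speech.

```csharp
using System;
using System.Speech.Recognition;

class NumberGrammarSketch
{
    static void Main()
    {
        // Spoken words only; each word carries its number as a semantic value,
        // so the digits themselves never need to appear as grammar choices.
        string[] words = { "one", "two", "three", "four", "five" };
        var numbers = new Choices();
        for (int i = 0; i < words.Length; i++)
        {
            numbers.Add(new GrammarBuilder(new SemanticResultValue(words[i], i + 1)));
        }

        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new Grammar(new GrammarBuilder(numbers)));
            engine.SetInputToWaveFile("input.wav"); // placeholder path

            RecognitionResult result = engine.Recognize();
            if (result != null)
            {
                // Text holds the spoken word; Semantics.Value should hold the attached number.
                Console.WriteLine("Text: {0}, value: {1}", result.Text, result.Semantics.Value);
            }
            else
            {
                Console.WriteLine("Nothing recognized.");
            }
        }
    }
}
```

With the grammar built this way there is only one phrase per number, so the duplicate "3" alternate goes away and your code reads the number from the semantics instead of parsing the text.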
