The most likely cause is that the recognizer_instance.energy_threshold property starts out too high for your environment, so speech is treated as silence. Either lower the threshold manually, or call recognizer_instance.adjust_for_ambient_noise(source, duration = 1) before listening so that it is calibrated against the ambient noise. You can learn more about it in the Speech Recognition library documentation.
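A minimal sketch of the calibration step, assuming the SpeechRecognition package (imported as speech_recognition) and a working microphone via PyAudio. The helper name listen_calibrated and its calibration_seconds parameter are made up for illustration; only adjust_for_ambient_noise, listen, energy_threshold, Recognizer and Microphone come from the library.

```python
def listen_calibrated(recognizer, source, calibration_seconds=1.0):
    """Calibrate the energy threshold on ambient noise, then capture one phrase.

    Hypothetical helper: `recognizer` is expected to behave like
    speech_recognition.Recognizer and `source` like an opened Microphone.
    """
    # Samples the source for `calibration_seconds` and updates
    # recognizer.energy_threshold to match the current noise level.
    recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
    # Blocks until a phrase is detected and has ended, then returns the audio.
    return recognizer.listen(source)


def main():
    # Imported here so the helper above can be exercised without a microphone.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Threshold before calibration:", recognizer.energy_threshold)
        audio = listen_calibrated(recognizer, source)
        print("Threshold after calibration:", recognizer.energy_threshold)
    print("Captured", len(audio.get_raw_data()), "bytes of audio")
```

Call main() with the microphone plugged in and stay quiet for the first second so that only background noise is sampled. If the threshold printed after calibration is still far above your speaking level, setting recognizer.energy_threshold to a lower value by hand is the other option mentioned above.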