How to create or convert text to audio at chromium browser?

There are several possible workarounds that have found which provide the ability to create audio from text; two of which require requesting an external resource, the other uses meSpeak.js by @masswerk.

Using approach described at Download the Audio Pronunciation of Words from Google, which suffers from not being able to pre-determine which words actually exist as a file at the resource without writing a shell script or performing a HEAD request to check if a network error occurs. For example, the word “do” is not available at the resource used below.

window.addEventListener("load", () => {

  const textarea = document.querySelector("textarea");

  const audio = document.createElement("audio");

  const mimecodec = "audio/webm; codecs=opus";

  audio.controls = "controls";

  document.body.appendChild(audio);

  audio.addEventListener("canplay", e => {
    audio.play();
  });

  let words = textarea.value.trim().match(/\w+/g);

  const url = "https://ssl.gstatic.com/dictionary/static/sounds/de/0/";

  const mediatype = ".mp3";

  Promise.all(
    words.map(word =>
      fetch(`https://query.yahooapis.com/v1/public/yql?q=select * from data.uri where url="${url}${word}${mediatype}"&format=json&callback=`)
      .then(response => response.json())
      .then(({query: {results: {url}}}) =>
        fetch(url).then(response => response.blob())
        .then(blob => blob)
      )
    )
  )
  .then(blobs => {
    // const a = document.createElement("a");
    audio.src = URL.createObjectURL(new Blob(blobs, {
                  type: mimecodec
                }));
    // a.download = words.join("-") + ".webm";
    // a.click()
  })
  .catch(err => console.log(err));
});
<textarea>what it does my ninja?</textarea>

Resources at Wikimedia Commons Category:Public domain are not necessary served from same directory, see How to retrieve Wiktionary word content?, wikionary API – meaning of words.

If the precise location of the resource is known, the audio can be requested, though the URL may include prefixes other than the word itself.

fetch("https://upload.wikimedia.org/wikipedia/commons/c/c5/En-uk-hello-1.ogg")
.then(response => response.blob())
.then(blob => new Audio(URL.createObjectURL(blob)).play());

Not entirely sure how to use the Wikipedia API, How to get Wikipedia content using Wikipedia’s API?, Is there a clean wikipedia API just for retrieve content summary? to get only the audio file. The JSON response would need to be parsed for text ending in .ogg, then a second request would need to be made for the resource itself.

fetch("https://en.wiktionary.org/w/api.php?action=parse&format=json&prop=text&callback=?&page=hello")
.then(response => response.text())
.then(data => {
  new Audio(location.protocol + data.match(/\/\/upload\.wikimedia\.org\/wikipedia\/commons\/[\d-/]+[\w-]+\.ogg/).pop()).play()
})
// "//upload.wikimedia.org/wikipedia/commons/5/52/En-us-hello.ogg\"

which logs

Fetch API cannot load https://en.wiktionary.org/w/api.php?action=parse&format=json&prop=text&callback=?&page=hello. No 'Access-Control-Allow-Origin' header is present on the requested resource

when not requested from same origin. We would need to try to use YQL again, though not certain how to formulate the query to avoid errors.

The third approach uses a slightly modified version of meSpeak.js to generate the audio without making an external request. The modification was to create a proper callback for .loadConfig() method

fetch("https://gist.githubusercontent.com/guest271314/f48ee0658bc9b948766c67126ba9104c/raw/958dd72d317a6087df6b7297d4fee91173e0844d/mespeak.js")
  .then(response => response.text())
  .then(text => {
    const script = document.createElement("script");
    script.textContent = text;
    document.body.appendChild(script);

    return Promise.all([
      new Promise(resolve => {
        meSpeak.loadConfig("https://gist.githubusercontent.com/guest271314/8421b50dfa0e5e7e5012da132567776a/raw/501fece4fd1fbb4e73f3f0dc133b64be86dae068/mespeak_config.json", resolve)
      }),
      new Promise(resolve => {
        meSpeak.loadVoice("https://gist.githubusercontent.com/guest271314/fa0650d0e0159ac96b21beaf60766bcc/raw/82414d646a7a7ef11bb04ddffe4091f78ef121d3/en.json", resolve)
      })
    ])
  })
  .then(() => {
    // takes approximately 14 seconds to get here
    console.log(meSpeak.isConfigLoaded());
    meSpeak.speak("what it do my ninja", {
      amplitude: 100,
      pitch: 5,
      speed: 150,
      wordgap: 1,
      variant: "m7"
    });
})
.catch(err => console.log(err));

one caveat of the above approach being that it takes approximately 14 and a half seconds for the three files to load before the audio is played back. However, avoids external requests.

It would be a positive to either or both 1) create a FOSS, developer maintained database or directory of sounds for both common and uncommon words; 2) perform further development of meSpeak.js to reduce load time of the three necessary files; and use Promise based approaches to provide notifications of the progress of of the loading of the files and readiness of the application.

In this users’ estimation, it would be a useful resource if developers themselves created and contributed to an online database of files which responded with an audio file of the specific word. Not entirely sure if github is the appropriate venue to host audio files? Will have to consider the possible options if interest in such a project is shown.

Leave a Comment