Browser Speech Synthesis API with play, pause and stop functionality

The Browser Speech Synthesis API is a powerful tool that allows developers to add text-to-speech capabilities to their websites and applications with JavaScript. This API can be used to enhance the accessibility of your content by allowing users to listen to text rather than reading it.

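You don't need any third-party library to implement this, and the API is available in most modern browsers. Support and the set of available voices vary by browser and platform, so the code below checks for the API before using it.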

We will also be adding play, pause and stop buttons to control the reading of an article.

Introduction to the Speech Synthesis API

The Speech Synthesis API is part of the Web Speech API, which includes features for both speech recognition and speech synthesis. The speechSynthesis object is a global variable that controls speech synthesis, and SpeechSynthesisUtterance is the interface representing the speech request.
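As a minimal sketch of how these two pieces fit together, the function below builds a SpeechSynthesisUtterance and hands it to speechSynthesis.speak(). The name speakText and its two extra parameters are illustrative assumptions, not part of the API; the parameters exist only so the function can be tried with stand-ins outside a browser.

```javascript
// Hypothetical helper: speak a string with the Web Speech API.
// `synth` and `Utterance` default to the real browser globals.
function speakText(text, synth = globalThis.speechSynthesis,
                   Utterance = globalThis.SpeechSynthesisUtterance) {
    if (!synth || !Utterance) {
        return null; // API not available in this environment
    }
    const utterance = new Utterance(text); // the speech request
    synth.speak(utterance);                // add it to the speaking queue
    return utterance;
}
```

In a page, calling speakText('Hello, world') from a click handler should be enough to hear the text. Some browsers block speech that is not triggered by a user action.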

HTML and CSS

Create buttons for play, pause and stop functionality. Wrap the content in a div and assign an id to both the buttons and the content wrapper for easy access in JavaScript.

<style>
#play.played, #pause.paused, #stop.stopped {
  background-color: #4CAF50; /* Green color for active state */
  color: white;
}
</style>

<div style="display:flex; gap: 12px;">
  <button id="play">Play</button>
  <button id="pause">Pause</button>
  <button id="stop">Stop</button>
</div>

<div id="article-content">
  <p>The content text that will be converted from text to speech.</p>
</div>

JavaScript

Next, we write the JavaScript code to handle the speech control, text extraction and button functionality.

onload Function

This function is executed when the window has finished loading and checks if the browser supports the Web Speech API's speech synthesis feature.

onload = function() {
    if ('speechSynthesis' in window) {
        // Code for speech synthesis goes here
    } else {
        // Code to handle lack of support
        const msg = document.createElement('h5');
        msg.textContent = "Detected no support for Speech Synthesis";
        msg.style.textAlign = 'center';
        msg.style.backgroundColor = 'red';
        msg.style.color = 'white';
        msg.style.marginTop = msg.style.marginBottom = 0;
        document.body.insertBefore(msg, document.querySelector('div'));
    }
}

If the browser supports speech synthesis, the rest of the code runs; otherwise, the page displays a message indicating the lack of support.

Initialize Controls and Variables

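These lines select the play, pause, and stop buttons from the DOM using their IDs. The flag variable tracks whether a narration has been started and has not yet finished or been stopped.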

var playEle = document.querySelector('#play');
var pauseEle = document.querySelector('#pause');
var stopEle = document.querySelector('#stop');
var flag = false;

Add Event Listeners

We add click event listeners to our buttons to handle play, pause, and stop actions.

playEle.addEventListener('click', onClickPlay);
pauseEle.addEventListener('click', onClickPause);
stopEle.addEventListener('click', onClickStop);

Define the Play Function

The play function initiates the speech synthesis and handles resuming paused speech.

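var utterance; // declared explicitly rather than as an implicit global

function onClickPlay() {
    if (!flag) { // Nothing in progress: start a new narration
        flag = true;
        utterance = new SpeechSynthesisUtterance(document.querySelector('#article-content').textContent);
        utterance.voice = speechSynthesis.getVoices()[0]; // first available voice; the list may still be empty
        utterance.onend = function() { // Reset the state when narration ends
            flag = false;
            playEle.className = pauseEle.className = '';
            stopEle.className = 'stopped';
        };
        playEle.className = 'played';
        stopEle.className = '';
        speechSynthesis.speak(utterance);
    }
    if (speechSynthesis.paused) { // Unpause/resume narration
        playEle.className = 'played';
        pauseEle.className = '';
        speechSynthesis.resume();
    }
}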

The voice assignment in onClickPlay selects the first available voice from the list of voices returned by getVoices(). Here's a breakdown:

utterance: This is an instance of SpeechSynthesisUtterance, which contains the text to be spoken and properties related to the speech.
voice: This property of the SpeechSynthesisUtterance instance allows you to specify which voice to use for the speech synthesis.
getVoices(): This method returns an array of SpeechSynthesisVoice objects, representing the voices available on the device.
[0]: This selects the first voice in the array returned by getVoices().
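Taking index 0 uses whatever voice the browser lists first, and in some browsers the list is still empty when the page loads. The sketch below shows one way to wait for the list and prefer a voice by language. The names pickVoice and loadVoices are hypothetical, and the approach assumes the browser fires the voiceschanged event on speechSynthesis, which not every browser does reliably.

```javascript
// Hypothetical helper: choose the first voice whose language tag starts
// with `langPrefix`, else the first voice, else null for an empty list.
function pickVoice(voices, langPrefix = 'en') {
    if (!voices || voices.length === 0) {
        return null;
    }
    const prefix = langPrefix.toLowerCase();
    const match = voices.find(function(v) {
        return typeof v.lang === 'string' && v.lang.toLowerCase().startsWith(prefix);
    });
    return match || voices[0];
}

// Hypothetical helper: resolve with the voice list, waiting for the
// `voiceschanged` event when the list is still empty.
function loadVoices(synth = globalThis.speechSynthesis) {
    return new Promise(function(resolve) {
        const voices = synth.getVoices();
        if (voices.length > 0) {
            resolve(voices);
            return;
        }
        synth.addEventListener('voiceschanged', function() {
            resolve(synth.getVoices());
        }, { once: true });
    });
}
```

A possible use is loadVoices().then(function(voices) { utterance.voice = pickVoice(voices, 'en'); }) before calling speak(). Leaving voice unset lets the browser fall back to its default voice.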

Define the Pause Function

The pause function pauses the speech synthesis if it is currently speaking.

function onClickPause() {
    if (speechSynthesis.speaking && !speechSynthesis.paused) { // Pause narration
        pauseEle.className = 'paused';
        playEle.className = '';
        speechSynthesis.pause();
    }
}
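The speaking and paused checks can be folded into one helper that reports the current state. This is only a sketch, and the function name playbackState is hypothetical. It relies on the specified behaviour that speaking stays true while an utterance is paused, which is why paused is tested together with speaking.

```javascript
// Hypothetical helper: summarise the synthesizer state as a string.
function playbackState(synth) {
    if (synth.speaking && synth.paused) {
        return 'paused';  // an utterance was started and is on hold
    }
    if (synth.speaking) {
        return 'playing'; // an utterance is currently being spoken
    }
    return 'stopped';     // nothing has started, or narration finished
}
```

With this helper, the condition in the pause handler would read playbackState(speechSynthesis) === 'playing'.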

Define the Stop Function

The stop function stops the speech synthesis and resets the state.

function onClickStop() {
    if (speechSynthesis.speaking) { // Stop narration
        stopEle.className = 'stopped';
        playEle.className = pauseEle.className = '';
        flag = false;
        speechSynthesis.cancel();
    }
}

Putting It All Together

The complete code looks like this:

onload = function() {
    if ('speechSynthesis' in window) {

        var playEle = document.querySelector('#play');
        var pauseEle = document.querySelector('#pause');
        var stopEle = document.querySelector('#stop');
        var flag = false; // true while a narration is in progress
        var utterance;    // the current utterance, declared explicitly

        playEle.addEventListener('click', onClickPlay);
        pauseEle.addEventListener('click', onClickPause);
        stopEle.addEventListener('click', onClickStop);

        function onClickPlay() {
            if (!flag) { // Nothing in progress: start a new narration
                flag = true;
                utterance = new SpeechSynthesisUtterance(document.querySelector('#article-content').textContent);
                utterance.voice = speechSynthesis.getVoices()[0]; // first available voice
                utterance.onend = function() { // Reset the state when narration ends
                    flag = false;
                    playEle.className = pauseEle.className = '';
                    stopEle.className = 'stopped';
                };
                playEle.className = 'played';
                stopEle.className = '';
                speechSynthesis.speak(utterance);
            }
            if (speechSynthesis.paused) { // Unpause/resume narration
                playEle.className = 'played';
                pauseEle.className = '';
                speechSynthesis.resume();
            }
        }

        function onClickPause() {
            if (speechSynthesis.speaking && !speechSynthesis.paused) { // Pause narration
                pauseEle.className = 'paused';
                playEle.className = '';
                speechSynthesis.pause();
            }
        }

        function onClickStop() {
            if (speechSynthesis.speaking) { // Stop narration
                stopEle.className = 'stopped';
                playEle.className = pauseEle.className = '';
                flag = false;
                speechSynthesis.cancel();
            }
        }

    } else { // Speech synthesis not supported
        const msg = document.createElement('h5');
        msg.textContent = "Detected no support for Speech Synthesis";
        msg.style.textAlign = 'center';
        msg.style.backgroundColor = 'red';
        msg.style.color = 'white';
        msg.style.marginTop = msg.style.marginBottom = 0;
        document.body.insertBefore(msg, document.querySelector('div'));
    }
}

Explanation

Checking for Speech Synthesis Support: The script first checks if the browser supports the Speech Synthesis API.
Initializing Variables and Event Listeners: It then gets references to the play, pause, and stop buttons and attaches event listeners to them.
Handling Play, Pause, and Stop Actions: The functions onClickPlay, onClickPause, and onClickStop handle the respective actions, changing the button styles and controlling the speech synthesis accordingly.
Fallback for Unsupported Browsers: If the Speech Synthesis API is not supported, a message is displayed to the user.
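The button highlighting can also be driven by the utterance's own events instead of the click handlers. The sketch below is an alternative, not the approach used in this article. The function name bindButtonState and the shape of its buttons argument are hypothetical. It assumes the start, pause, resume and end events of SpeechSynthesisUtterance fire as specified, which may vary between browsers.

```javascript
// Hypothetical helper: update button classes from utterance events.
// `buttons` is expected to look like { play, pause, stop }, where each
// value has a writable className (for example, a button element).
function bindButtonState(utterance, buttons) {
    function highlight(active) {
        buttons.play.className = active === 'play' ? 'played' : '';
        buttons.pause.className = active === 'pause' ? 'paused' : '';
        buttons.stop.className = active === 'stop' ? 'stopped' : '';
    }
    utterance.onstart = utterance.onresume = function() { highlight('play'); };
    utterance.onpause = function() { highlight('pause'); };
    utterance.onend = function() { highlight('stop'); };
    return utterance;
}
```

It would be called once per utterance before speak(), for example bindButtonState(utterance, { play: playEle, pause: pauseEle, stop: stopEle }).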

Conclusion

Depending on your requirements for a text-to-speech solution, this may not provide the comprehensive features you're seeking. However, it offers an easy-to-implement option that can help you evaluate whether this approach could work for your website.