Getting started with Speech Synthesis Markup Language (SSML)

    The World Wide Web Consortium (W3C) created Speech Synthesis Markup Language (SSML), an XML-based markup language, to help generate natural-sounding synthesized speech. Plivo's Speak XML element supports SSML-based speech, powered by Amazon Polly. It supports 27 languages and more than 40 voices, and lets developers control pronunciation, pitch, and volume.

    Here's how SSML appears within Plivo Speak XML elements:

    <Response>
        <!-- Basic text-to-speech -->
        <Speak voice="MAN">Go Green, Go Plivo</Speak>
        <!-- Text-to-speech using SSML -->
        <Speak voice="Polly.Joey">
            <emphasis level="moderate">Go Green, Go Plivo</emphasis>
        </Speak>
    </Response>
    

    To synthesize SSML speech on Plivo, specify one of the Amazon Polly voices in the voice attribute of Plivo's <Speak> XML tag. Polly voice names must carry the Polly. prefix, as in Polly.Joey.

    For example:

    <Response>
        <Speak voice="Polly.Joey">
            <emphasis level="moderate">Go Green, Go Plivo</emphasis>
        </Speak>
    </Response>
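
    The same document can also be produced programmatically. The following is a minimal sketch that uses only Python's standard xml.etree.ElementTree module rather than a Plivo SDK; the function name build_speak_response and its parameters are illustrative assumptions, not part of any Plivo API.

```python
import xml.etree.ElementTree as ET


def build_speak_response(text: str, voice: str = "Polly.Joey", level: str = "moderate") -> str:
    """Return a <Response> document whose <Speak> wraps `text` in <emphasis>.

    Hypothetical helper for illustration; it is not part of a Plivo SDK.
    """
    if not voice.startswith("Polly."):
        # SSML is only synthesized for Polly-prefixed voices, per the text above.
        raise ValueError(f"SSML requires a Polly voice, got {voice!r}")
    response = ET.Element("Response")
    speak = ET.SubElement(response, "Speak", {"voice": voice})
    emphasis = ET.SubElement(speak, "emphasis", {"level": level})
    emphasis.text = text  # ElementTree escapes &, <, and > when serializing
    return ET.tostring(response, encoding="unicode")


print(build_speak_response("Go Green, Go Plivo"))
```

    Building the tree with an XML library instead of string concatenation means special characters in the spoken text are escaped automatically.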
    

    SSML tags

    You can use these SSML tags within Plivo XML.

    SSML tag | Action | Description
    <break> | Add a pause | Use this tag to include a pause in the speech.
    <emphasis> | Emphasize words | Use this tag to change the rate and volume of the speech.
    <lang> | Specify another language for specific words | Use this tag to set the natural language of the text.
    <p> | Add a pause between paragraphs | Use this tag to represent a paragraph.
    <phoneme> | Use phonetic pronunciation | Use this tag to set phonetic pronunciation for specific text.
    <prosody> | Control volume, speaking rate, and pitch | Use this tag to modify the volume, speaking rate, and pitch of the tagged text.
    <s> | Add a pause between sentences | Use this tag to represent a sentence. This adds a strong break before and after the tag.
    <say-as> | Control how special types of words are spoken | Use this tag to describe how to interpret the text.
    <sub> | Pronounce acronyms and abbreviations | Use this tag to pronounce the specified words or phrases as different words or phrases.
    <w> | Improve pronunciation by specifying parts of speech | Use this tag to customize the pronunciation of a word by specifying its part of speech.

    Note: Plivo doesn’t support these Amazon Polly-specific tags in Plivo XML:

    • <amazon:auto-breaths>
    • <amazon:effect name="drc">
    • <amazon:effect phonation="soft">
    • <amazon:effect vocal-tract-length>
    • <amazon:effect name="whispered">
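
    Because unsupported tags are easy to miss in hand-written markup, it can help to check SSML before returning it. The sketch below is one possible approach: it scans the markup placed inside <Speak> and reports any tag name that isn't in the table of supported tags. The function name find_unsupported_tags is hypothetical, and the regular expression is a lightweight check, not a full XML parser.

```python
import re
from typing import List

# Tag names taken from the table of supported SSML tags above.
SUPPORTED_TAGS = {"break", "emphasis", "lang", "p", "phoneme", "prosody", "s", "say-as", "sub", "w"}


def find_unsupported_tags(ssml: str) -> List[str]:
    """Return the sorted names of tags in `ssml` that are not in SUPPORTED_TAGS.

    `ssml` is assumed to be the markup placed inside <Speak>, not the whole response.
    """
    # Match opening or self-closing tags such as <break/> or <amazon:effect ...>;
    # closing tags (</...>) are skipped because "/" is not a valid first character.
    names = re.findall(r"<\s*([A-Za-z][\w:.-]*)", ssml)
    return sorted({name for name in names if name not in SUPPORTED_TAGS})


sample = '<emphasis level="strong">Hi</emphasis><amazon:auto-breaths>there</amazon:auto-breaths>'
print(find_unsupported_tags(sample))  # ['amazon:auto-breaths']
```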

    SSML voices

    Plivo supports these Amazon Polly voices for use with Plivo XML:

    Language | Female | Male
    Australian English (en-AU) | Polly.Nicole | Polly.Russell
    Brazilian Portuguese (pt-BR) | Polly.Vitória | Polly.Ricardo
    Canadian French (fr-CA) | Polly.Chantal | -
    Danish (da-DK) | Polly.Naja | Polly.Mads
    Dutch (nl-NL) | Polly.Lotte | Polly.Ruben
    French (fr-FR) | Polly.Lea, Polly.Celine | Polly.Mathieu
    German (de-DE) | Polly.Vicki, Polly.Marlene | Polly.Hans
    Hindi (hi-IN) | Polly.Aditi | -
    Icelandic (is-IS) | Polly.Dora | Polly.Karl
    Indian English (en-IN) | Polly.Raveena, Polly.Aditi | -
    Italian (it-IT) | Polly.Carla | Polly.Giorgio
    Japanese (ja-JP) | Polly.Mizuki | Polly.Takumi
    Korean (ko-KR) | Polly.Seoyeon | -
    Mandarin Chinese (cmn-CN) | Polly.Zhiyu | -
    Norwegian (nb-NO) | Polly.Liv | -
    Polish (pl-PL) | Polly.Ewa, Polly.Maja | Polly.Jacek, Polly.Jan
    Portuguese - Iberic (pt-PT) | Polly.Ines | Polly.Cristiano
    Romanian (ro-RO) | Polly.Carmen | -
    Russian (ru-RU) | Polly.Tatyana | Polly.Maxim
    Spanish - Castilian (es-ES) | Polly.Conchita | Polly.Enrique
    Spanish - Mexican (es-MX) | Polly.Mia | -
    US - Spanish (es-US) | Polly.Penelope, Polly.Lupe-Standard | Polly.Miguel
    Swedish (sv-SE) | Polly.Astrid | -
    Turkish (tr-TR) | Polly.Filiz | -
    UK English (en-GB) | Polly.Amy, Polly.Emma | Polly.Brian
    US English (en-US) | Polly.Joanna, Polly.Salli, Polly.Kendra, Polly.Kimberly, Polly.Ivy | Polly.Matthew, Polly.Justin, Polly.Joey
    Welsh (cy-GB) | Polly.Gwyneth | -
    Welsh English (en-GB-WLS) | - | Polly.Geraint

    Character limit

    To ensure quick synthesis, Plivo caps the length of text that can be synthesized in one <Speak> tag at 3,000 characters.
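
    Longer messages therefore need to be spread across several <Speak> elements. The following is a minimal sketch of one way to do that for plain text, assuming the limit counts characters of the text itself; the source doesn't say whether SSML markup counts toward the limit, and this helper (split_for_speak, a hypothetical name) doesn't try to keep tags intact.

```python
MAX_SPEAK_CHARS = 3000  # per-<Speak> limit stated above


def split_for_speak(text: str, limit: int = MAX_SPEAK_CHARS) -> list:
    """Split plain `text` into chunks of at most `limit` characters.

    Breaks at the last space inside each window when one exists, so words
    stay intact; falls back to a hard split for unbroken runs of characters.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space found in the window: hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


long_text = "word " * 1000  # 5,000 characters
print([len(chunk) for chunk in split_for_speak(long_text)])  # [2999, 1999]
```

    Each returned chunk can then be placed in its own <Speak> element within the same <Response>.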

    Pricing

    Support for SSML-based speech synthesis is currently in beta and free for all Plivo users. We expect to eventually charge for text-to-speech on the basis of the number of characters synthesized.

    SSML support in Plivo Server SDKs

    SSML tags are supported in all of our Server SDKs.

    Example

    These examples use the Joey voice for US English (en-US). Use the voice attribute of the <Speak> tag to specify the voice for your text.

    say-as

    The say-as tag describes how to interpret the text.

    from flask import Flask, Response, request, url_for
    from plivo import plivoxml
    
    app = Flask(__name__)
    
    @app.route("/ssml/", methods=["GET", "POST"])
    def ssml():
        element = plivoxml.ResponseElement()
        response = (
            element.add(
                plivoxml.SpeakElement(content="The date is", voice="Polly.Joey", language="en-US")
                .add_say_as("20200626", interpret_as="date")
            )
            .to_string(False)
        )
        print(response)
        return Response(response, mimetype="text/xml")
    
    if __name__ == "__main__":
        app.run(host="0.0.0.0", debug=True)
    
    class PlivoController < ApplicationController
      def ssml
        response = Plivo::XML::Response.new
        speak_elem = response.addSpeak('The date is', voice: 'Polly.Joey', language: 'en-US')
        speak_elem.addSayAs('20200626', 'interpret-as' => 'date')
        xml = Plivo::XML::PlivoXML.new(response)
        puts xml.to_xml()
        render xml: xml.to_xml
      end
    end
    
    var plivo = require('plivo');
    var express = require('express');
    var app = express();
    app.set('port', (process.env.PORT || 5000));
    app.use(express.static(__dirname + '/public'));
    
    app.all('/ssml/', function (request, response) {
        // Respond to both GET and POST, matching the other SDK examples.
        var r = new plivo.Response();
        const speakElem = r.addSpeak('The date is', {
            'voice': 'Polly.Joey',
            'language': 'en-US'
        });
        speakElem.addSayAs('20200626', {
            'interpret-as': 'date',
        });
        console.log(r.toXML());
        response.set({
            'Content-Type': 'text/xml'
        });
        response.end(r.toXML());
    });
    
    app.listen(app.get('port'), function () {
        console.log('Node app is running on port', app.get('port'));
    });
    
    <?php
    
    namespace App\Http\Controllers;
    
    require '../vendor/autoload.php';
    use Plivo\RestClient;
    use Plivo\XML\Response;
    use Illuminate\Http\Request;
    
    class ReceivecallController extends Controller
    {
        public function ssml()
        {
            $response = new Response();
            $speak_elem = $response->addSpeak('The date is', ['language'=>"en-US", 'voice'=>"Polly.Joey"]);
            $speak_elem->addSayAs('20200626', ['interpret-as'=>"date"]);
            $xml_response = $response->toXML(); 
            return response($xml_response, 200)->header('Content-Type', 'application/xml');
        }
    }
    
    package com.example.SsmlHandler;
    import com.plivo.api.exceptions.PlivoXmlException;
    import com.plivo.api.xml.*;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.web.bind.annotation.*;
    @SpringBootApplication
    @RestController
    public class SsmlApplication {
    	public static void main(String[] args) {
    		SpringApplication.run(SsmlApplication.class, args);
    	}
    	@RequestMapping(value = "/ssml/", produces = { "application/xml" }, method = { RequestMethod.GET, RequestMethod.POST })
    	public Response SsmlHandler() throws PlivoXmlException {
    		Response response = new Response().children(new Speak("The date is", "Polly.Joey", "en-US", 1).
    						children(new SayAs("20200626", "date")));
    		System.out.println(response.toXmlString());
    		return response;
    	}
    }
    
    package main
    
    import (
    	"net/http"
    
    	"github.com/go-martini/martini"
    	"github.com/plivo/plivo-go/xml"
    )
    
    func main() {
    	m := martini.Classic()
    	m.Any("/ssml/", func(w http.ResponseWriter, r *http.Request) string {
    		w.Header().Set("Content-Type", "application/xml")
    		response := xml.ResponseElement{
    			Contents: []interface{}{
    				new(xml.SpeakElement).
    					AddSpeak("The date is", "Polly.Joey", "en-US", 1).
    					AddSayAs("20200626", "date", ""),
    			},
    		}
    		return response.String()
    	})
    
    	m.Run()
    }
    
    using System.Collections.Generic;
    using Plivo.XML;
    using Microsoft.AspNetCore.Mvc;
    
    namespace Voicemail.Controllers
    {
        public class SsmlController : Controller
        {
            // GET: /<controller>/
            public IActionResult Index()
            {
                var resp = new Response();
                Speak speak_elem = new Speak("The date is", new Dictionary<string, string>() {
                    {"voice","Polly.Joey"},
                    {"language","en-US"},
                });
                resp.Add(speak_elem);
                speak_elem.AddSayAs("20200626", new Dictionary<string, string>() {
                    { "interpret-as", "date" }
                });
                var output = resp.ToString();
                return this.Content(output, "text/xml");
            }
        }
    }
    

    The rendered XML document would be:

    <Response>
        <Speak voice="Polly.Joey">The date is
          <say-as interpret-as="date">20200626</say-as>
        </Speak>
    </Response>
    

    w

    The w tag lets you customize the pronunciation of a word by specifying its part of speech.

    from flask import Flask, Response, request, url_for
    from plivo import plivoxml
    
    app = Flask(__name__)
    
    @app.route("/ssml/", methods=["GET", "POST"])
    def ssml():
        element = plivoxml.ResponseElement()
        response = (
            element.add(
                plivoxml.SpeakElement(content="The word", voice="Polly.Joey", language="en-US")
                .add_say_as("read", interpret_as="characters")
                .add_s("may be interpreted as either the present simple form")
                .add_w("read", role="amazon:VB")
                .add_s("or the past participle form")
                .add_w("read", role="amazon:VBD")
            )
            .to_string(False)
        )
        print(response)
        return Response(response, mimetype="text/xml")
    
    if __name__ == "__main__":
        app.run(host="0.0.0.0", debug=True)
    
    class PlivoController < ApplicationController
      def ssml
        response = Plivo::XML::Response.new
        speak_elem = response.addSpeak('The word', voice: 'Polly.Joey', language: 'en-US')
        speak_elem.addSayAs('read', 'interpret-as' => 'characters')
        speak_elem.addS('may be interpreted as either the present simple form')
        speak_elem.addW('read', 'role' => 'amazon:VB')
        speak_elem.addS('or the past participle form')
        speak_elem.addW('read', 'role' => 'amazon:VBD')
        xml = Plivo::XML::PlivoXML.new(response)
        puts xml.to_xml()
        render xml: xml.to_xml
      end
    end
    
    var plivo = require('plivo');
    var express = require('express');
    var app = express();
    app.set('port', (process.env.PORT || 5000));
    app.use(express.static(__dirname + '/public'));
    
    app.all('/ssml/', function(request, response) {
        // Respond to both GET and POST, matching the other SDK examples.
        var r = new plivo.Response();
        const speakElem = r.addSpeak('The word', {
            'voice': 'Polly.Joey',
            'language': 'en-US'
        });
        speakElem.addSayAs('read', {
            'interpret-as': 'characters'
        });
        speakElem.addS('may be interpreted as either the present simple form');
        speakElem.addW('read', {
            'role': 'amazon:VB'
        });
        speakElem.addS('or the past participle form');
        speakElem.addW('read', {
            'role': 'amazon:VBD'
        });
        console.log(r.toXML());
        response.set({
            'Content-Type': 'text/xml'
        });
        response.end(r.toXML());
    });
    
    app.listen(app.get('port'), function() {
        console.log('Node app is running on port', app.get('port'));
    });
    
    <?php
    
    namespace App\Http\Controllers;
    
    require '../vendor/autoload.php';
    use Plivo\RestClient;
    use Plivo\XML\Response;
    use Illuminate\Http\Request;
    
    class ReceivecallController extends Controller
    {
        public function ssml()
        {
            $response = new Response();
            $speak_elem = $response->addSpeak('The word', ['language'=>"en-US", 'voice'=>"Polly.Joey"]);
            $speak_elem->addSayAs('read', ['interpret-as'=>"characters"]);
            $speak_elem->addS('may be interpreted as either the present simple form');
            $speak_elem->addW('read', ['role'=>"amazon:VB"]);
            $speak_elem->addS('or the past participle form');
            $speak_elem->addW('read', ['role'=>"amazon:VBD"]);
            $xml_response = $response->toXML(); 
            return response($xml_response, 200)->header('Content-Type', 'application/xml');
        }
    }
    
    package com.example.SsmlHandler;
    import com.plivo.api.exceptions.PlivoXmlException;
    import com.plivo.api.xml.*;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.web.bind.annotation.*;
    @SpringBootApplication
    @RestController
    public class SsmlApplication {
    	public static void main(String[] args) {
    		SpringApplication.run(SsmlApplication.class, args);
    	}
    	@RequestMapping(value = "/ssml/", produces = { "application/xml" }, method = { RequestMethod.GET, RequestMethod.POST })
    	public Response Ssml() throws PlivoXmlException {
    		Response response = new Response().children(new Speak("The word","Polly.Joey","en-US",1)
    				.children(new SayAs("read", "characters"))
    				.addS("may be interpreted as either the present simple form")
    				.addW("read", "amazon:VB")
    				.addS("or the past participle form")
    				.addW("read", "amazon:VBD"));
    		System.out.println(response.toXmlString());
    		return response;
    	}
    }
    
    package main
    
    import (
    	"net/http"
    
    	"github.com/go-martini/martini"
    	"github.com/plivo/plivo-go/xml"
    )
    
    func main() {
    	m := martini.Classic()
    	m.Any("/ssml/", func(w http.ResponseWriter, r *http.Request) string {
    		w.Header().Set("Content-Type", "application/xml")
    		response := xml.ResponseElement{
    			Contents: []interface{}{
    				new(xml.SpeakElement).
    					AddSpeak("The word", "Polly.Joey", "en-US", 1).
    					AddSayAs("read", "characters", "").
    					AddS("may be interpreted as either the present simple form").
    					AddW("read", "amazon:VB").
    					AddS("or the past participle form").
    					AddW("read", "amazon:VBD"),
    			},
    		}
    		return response.String()
    	})
    
    	m.Run()
    }
    
    using System.Collections.Generic;
    using Plivo.XML;
    using Microsoft.AspNetCore.Mvc;
    
    namespace Voicemail.Controllers
    {
        public class SsmlController : Controller
        {
            // GET: /<controller>/
            public IActionResult Index()
            {
                var resp = new Response();
                Speak speak_elem = new Speak("The word", new Dictionary<string, string>() {
                    {"voice","Polly.Joey"},
                    {"language","en-US"},
                });
                resp.Add(speak_elem);
                speak_elem.AddSayAs("read", new Dictionary<string, string>() {
                    { "interpret-as", "characters" }
                });
                speak_elem.AddS("may be interpreted as either the present simple form");
                speak_elem.AddW("read", new Dictionary<string, string>() {
                    { "role", "amazon:VB" }
                });
                speak_elem.AddS("or the past participle form");
                speak_elem.AddW("read", new Dictionary<string, string>() {
                    { "role", "amazon:VBD" }
                });
                var output = resp.ToString();
                return this.Content(output, "text/xml");
            }
        }
    }
    

    The rendered XML document would be:

    <Response>
        <Speak voice="Polly.Joey">The word
          <say-as interpret-as="characters">read</say-as>
          <s>
              may be interpreted as either the present simple form
          </s>
          <w role="amazon:VB">read</w>
          <s>or the past participle form</s>
          <w role="amazon:VBD">read</w>
        </Speak>
    </Response>
    

    More examples

    <Response>
        <Speak voice="Polly.Joey">I can speak in a
          <prosody pitch="high">higher pitched voice</prosody>,
          or I can speak
          <prosody pitch="low">in a lower pitched voice</prosody>
        </Speak>
    </Response>
    
    <Response>
        <Speak voice="Polly.Joey">I can speak
          <prosody rate="x-slow">really slowly</prosody>,
          or I can speak
          <prosody rate="x-fast">really fast</prosody>
        </Speak>
    </Response>
    
    <Response>
        <Speak voice="Polly.Joey">I can also speak
          <prosody volume="x-loud">very loudly</prosody>,
          or I can speak
          <prosody volume="x-soft">very quietly</prosody>.
        </Speak>
    </Response>
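
    The prosody examples above use named values for pitch, rate, and volume. As a rough sketch, a small helper can reject misspelled values before the XML is sent. The value sets below are the named labels from the W3C SSML specification; whether Plivo accepts every one of them is an assumption, and relative values such as percentages are deliberately left out. The helper name prosody is hypothetical and uses Python's standard xml.etree.ElementTree module, not a Plivo SDK.

```python
import xml.etree.ElementTree as ET

# Named values defined for <prosody> attributes by the W3C SSML specification.
PROSODY_VALUES = {
    "pitch": {"x-low", "low", "medium", "high", "x-high", "default"},
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast", "default"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}


def prosody(text: str, **attrs: str) -> ET.Element:
    """Build a <prosody> element after checking each attribute against PROSODY_VALUES."""
    for name, value in attrs.items():
        if name not in PROSODY_VALUES:
            raise ValueError(f"unknown prosody attribute: {name}")
        if value not in PROSODY_VALUES[name]:
            raise ValueError(f"unsupported {name} value: {value}")
    element = ET.Element("prosody", attrs)
    element.text = text
    return element


speak = ET.Element("Speak", {"voice": "Polly.Joey"})
speak.text = "I can speak "
speak.append(prosody("really slowly", rate="x-slow"))
print(ET.tostring(speak, encoding="unicode"))
```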