Receive DTMF/Speech Input using PHP

    Overview

    Capturing user input is a critical capability in any phone system. User input, whether dual-tone multi-frequency (DTMF) digit presses or speech, is useful in many use cases such as IVR phone systems, conversational IVRs, virtual assistants, and voice-based forms and surveys. Plivo's Voice platform offers features for implementing business use cases that involve secure capture of DTMF and speech inputs.

    Set Up Your PHP Dev Environment

    Operating System    Instructions
    OS X                You can install PHP using the official installer. You can also install it from here.
    Linux               To install PHP on Linux you can find the instructions here.
    Windows             To install PHP on Windows you can use the official installer.

    Install Composer

    Composer is a dependency manager for PHP that is used in all modern PHP frameworks, such as Symfony and Laravel. We highly recommend using Composer as the package manager for your web project.

    OS X:

    1. Download the latest version of Composer.
    2. Run the following command in Terminal to confirm that Composer runs:

       $ php ~/Downloads/composer.phar --version
      

      Note: PHAR (PHP archive) is an archive format for PHP that can be run on the command line

    3. Run the following commands to copy the file to the bin directory and make it executable:

       $ cp ~/Downloads/composer.phar /usr/local/bin/composer
       $ sudo chmod +x /usr/local/bin/composer
      
    4. To check if the path has /usr/local/bin, use

       $ echo $PATH
      

      If the path does not include /usr/local/bin, add it to $PATH. To keep the change across sessions, add the export line to ~/.bash_profile and reload the file:

       $ export PATH=$PATH:/usr/local/bin
       $ source ~/.bash_profile
      

      Note: If your PATH doesn’t include the /usr/local/bin directory, we recommend adding it so that you can run Composer from any directory.

    5. You can also check the version of Composer by running the following command:

       $ composer --version
      

    Linux:

    1. Run the following command to download Composer:

       $ curl -sS https://getcomposer.org/installer | php
      
    2. Run the following command to make the composer.phar file executable:

       $ chmod +x composer.phar
      

      Note: PHAR (PHP archive) is an archive format for PHP that can be run on the command line

    3. Run the following command to make Composer globally available for all system users:

       $ mv composer.phar /usr/local/bin/composer
      

    Windows:

    1. Download and run the Windows Installer for Composer.

      Note: Make sure to allow Windows Installer for Composer to make changes to your php.ini file.

    2. If you have any terminal windows open, close all instances and open a fresh terminal instance.
    3. Run the following command to verify that Composer is installed:

       $ composer -V
      

    Install Laravel & Create a Laravel Project

    • Use the below command to install the Laravel installer globally:

      $ composer global require laravel/installer
      

    With the tools above installed, you can create a new Laravel project. The create-project command generates the standard Laravel folder structure for you.

    • Change to the directory where you want to keep the project:

      $ cd mylaravelapp
      
    • Use the below command to create your Laravel project:

      $ composer create-project laravel/laravel quickstart --prefer-dist
      

    This will create a quickstart directory with the necessary folders & files for development.

    Install Plivo

    • To install the stable release, run the following command in the project directory:

      $ composer require plivo/plivo-php
      
    • To install a specific release, run the following command in the project directory:

      $ composer require plivo/plivo-php:4.15.0
      
    • Alternatively, you can download this source and run

      $ composer install
      

    This generates the autoload files, which you can include using the following line in your PHP source code to start using the SDK.

    <?php
    require 'vendor/autoload.php';
    

    Detect DTMF inputs

    Outline

    In this section, we will show you how to implement a multi-level IVR phone system and capture digit press inputs (DTMF) on the Plivo voice platform.

    The example IVR phone tree below has been implemented using the GetInput XML feature:

    1. Caller dials a phone number, and a virtual assistant answers the call.
    2. The first branch of the IVR phone tree will include three choices, such as “Press 1 for your account balance. Press 2 for your account status. Press 3 to speak to a representative.”
    3. Options 1 and 2 will automatically retrieve the information and play the caller a text-to-speech message, and option 3 will redirect the caller to the second branch of the IVR.
    4. The second branch of the IVR will have two options, such as “Press 1 for Sales. Press 2 for Support.”
    5. If the caller presses “1”, the call will be connected to the sales representative. If the caller presses “2”, the call will be connected to the support representative.
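
    The phone tree above can also be written down as data before any controller code exists. The sketch below is a minimal illustration in plain PHP; the constant IVR_TREE, the function nextStep, and the action labels are hypothetical names chosen for this example and are not part of the Plivo SDK.

```php
<?php
// Hypothetical model of the two-level menu described above.
// Branch names ("first", "second") and action labels are illustrative only.
const IVR_TREE = [
    'first' => [
        '1' => ['type' => 'speak', 'value' => 'account balance'],
        '2' => ['type' => 'speak', 'value' => 'account status'],
        '3' => ['type' => 'menu',  'value' => 'second'],
    ],
    'second' => [
        '1' => ['type' => 'dial', 'value' => 'sales'],
        '2' => ['type' => 'dial', 'value' => 'support'],
    ],
];

/**
 * Look up what should happen for a digit pressed in a given branch.
 * Returns null when the branch is unknown or the digit is not a valid choice.
 */
function nextStep(string $branch, string $digit): ?array
{
    return IVR_TREE[$branch][$digit] ?? null;
}
```

    For example, nextStep('first', '3') returns the entry that points to the second menu, while any digit outside the table returns null, which corresponds to the "wrong input" message in the call flow.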

    Create a Laravel Controller to Detect DTMF Inputs

    Change to the newly created project directory (the quickstart directory) and run the below command to create a Laravel controller for inbound calls.

    $ php artisan make:controller MultilevelivrController
    

    This will generate a controller named MultilevelivrController in the app/Http/Controllers/ directory. Open the app/Http/Controllers/MultilevelivrController.php file and add the following code:

    
    <?php

    namespace App\Http\Controllers;

    use Illuminate\Http\Request;
    use Plivo\XML\Response;

    // Laravel already loads Composer's autoloader, so no explicit require is needed here.
    class MultilevelivrController extends Controller
    {
        // Message that Plivo reads when the caller does nothing at all
        private const NO_INPUT = "Sorry, I didn't catch that. Please hangup and try again later.";

        // Return the generated XML to Plivo with the correct content type
        private function xmlResponse(Response $response)
        {
            return response($response->toXML(), 200)->header('Content-Type', 'text/xml');
        }

        // GetInput XML to handle the incoming call
        public function detectDtmf()
        {
            // Welcome message - first branch
            $welcome_message = "Welcome to the demo app, Press 1 for your account balance. Press 2 for your account status. Press 3 to talk to our representative";
            $response = new Response();
            $get_input = $response->addGetInput(
                [
                    // Replace the host with your own public URL; the path must match the route in routes/web.php
                    'action' => "https://dc40fc05d7dc.ngrok.io/firstbranch",
                    'method' => "POST",
                    'digitEndTimeout' => "5",
                    'inputType' => "dtmf",
                    'redirect' => "true",
                ]);
            $get_input->addSpeak($welcome_message, ['language' => "en-US", 'voice' => "Polly.Salli"]);
            // Played only if GetInput finishes without any input
            $response->addSpeak(self::NO_INPUT);
            return $this->xmlResponse($response);
        }

        // Action URL block for DTMF
        public function firstBranch(Request $request)
        {
            // Message for the second branch
            $representative_branch = "Press 1 to talk to our Sales representative. Press 2 to talk to our Support representative";
            // input() reads both POST body and query string parameters
            $digit = $request->input('Digits');
            $response = new Response();

            if ($digit == "1") {
                // Single quotes keep the dollar sign literal
                $response->addSpeak('Your account balance is $20.');
            } elseif ($digit == "2") {
                $response->addSpeak("Your account status is active.");
            } elseif ($digit == "3") {
                $get_input = $response->addGetInput(
                    [
                        'action' => "https://dc40fc05d7dc.ngrok.io/secondbranch",
                        'method' => "POST",
                        'digitEndTimeout' => "5",
                        'inputType' => "dtmf",
                        'redirect' => "true",
                    ]);
                $get_input->addSpeak($representative_branch, ['language' => "en-US", 'voice' => "Polly.Salli"]);
            } else {
                $response->addSpeak(self::NO_INPUT);
            }
            return $this->xmlResponse($response);
        }

        // Action URL block for Sales and Support branch
        public function secondBranch(Request $request)
        {
            // Message that Plivo reads when the caller inputs a wrong digit
            $wrong_input = "Sorry, it's a wrong input.";
            $digit = $request->input('Digits');
            $from_number = $request->input('From');
            $response = new Response();
            $params = ['callerId' => $from_number];

            if ($digit == "1") {
                $dial = $response->addDial($params);
                $dial->addNumber("<Number 1>"); // Sales representative's number
            } elseif ($digit == "2") {
                $dial = $response->addDial($params);
                $dial->addNumber("<Number 2>"); // Support representative's number
            } else {
                $response->addSpeak($wrong_input);
            }
            return $this->xmlResponse($response);
        }
    }
    

    Add a Route

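    Now add a route for each function in the MultilevelivrController class. Open the routes/web.php file and add the below lines at the end of the file. The array syntax is used because recent Laravel releases no longer resolve 'Controller@method' strings without extra namespace configuration. Because Plivo sends POST requests without a CSRF token, you may also need to exclude these paths from Laravel's CSRF verification.

    Route::match(['get', 'post'], '/detectdtmf', [\App\Http\Controllers\MultilevelivrController::class, 'detectDtmf']);
    Route::match(['get', 'post'], '/firstbranch', [\App\Http\Controllers\MultilevelivrController::class, 'firstBranch']);
    Route::match(['get', 'post'], '/secondbranch', [\App\Http\Controllers\MultilevelivrController::class, 'secondBranch']);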
    

    Control the gathering of DTMF inputs

    You can control DTMF collection by using the various attributes available for GetInput XML, such as digitEndTimeout, numDigits, finishOnKey, and executionTimeout.

    digitEndTimeout: You can use this attribute to set the time interval between successive digit inputs. The default value is auto and the allowed values are 2 to 10 seconds or auto. If the end-user has not provided any new digit input within the digitEndTimeout period, the digits entered to that point will be processed.

    numDigits: You can use this attribute to set the maximum number of digits the end-user can provide on the call in the current operation. The default value is 32 and the allowed values are 1 to 32.

    If the end-user provides more digit inputs than the numDigits allows, Plivo will only send the maximum number of digits specified as numDigits to the action URL and the rest of the digit inputs will be ignored. For example, if numDigits is specified as ‘4’ and if the user provides 5 digits, then the last digit input will be ignored.

    finishOnKey: You can use this attribute to define the key that end-users need to press to submit their digit input. The default value is # and the allowed values are 0-9, *, #, <empty string>, or ‘none’. When you set the value to <empty string> or ‘none’, DTMF input collection ends based on the timeout or the numDigits attribute.

    Note: The above three attributes apply to input types dtmf and dtmf speech and do not apply to the speech input type. Also, if all three attributes are specified, finishOnKey takes priority.

    executionTimeout: You can use this attribute to configure the maximum time during which input detection is performed. This lets Plivo move on to the next element in the XML response when the end-user does not provide any input on the call. The default value is 15 seconds, and the allowed values are 5 to 60 seconds.
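
    Since each of these attributes has a fixed range, it can help to check values locally before they reach the XML. The sketch below is a minimal pre-flight check in plain PHP that encodes only the ranges listed above. The function names intInRange and validateDtmfAttributes are hypothetical helpers for this example, not part of the Plivo SDK, and Plivo's own validation may differ.

```php
<?php
// Hypothetical pre-flight check, not part of the Plivo SDK.
// The allowed ranges are the ones listed in the attribute descriptions above.

/** True when $value is an integer (or integer string) between $min and $max inclusive. */
function intInRange($value, int $min, int $max): bool
{
    $options = ['options' => ['min_range' => $min, 'max_range' => $max]];
    return filter_var($value, FILTER_VALIDATE_INT, $options) !== false;
}

/** Returns a list of problems; an empty list means every supplied attribute is in range. */
function validateDtmfAttributes(array $attrs): array
{
    $errors = [];

    if (array_key_exists('digitEndTimeout', $attrs)
        && $attrs['digitEndTimeout'] !== 'auto'
        && !intInRange($attrs['digitEndTimeout'], 2, 10)) {
        $errors[] = 'digitEndTimeout must be "auto" or 2 to 10';
    }
    if (array_key_exists('numDigits', $attrs) && !intInRange($attrs['numDigits'], 1, 32)) {
        $errors[] = 'numDigits must be 1 to 32';
    }
    if (array_key_exists('finishOnKey', $attrs)) {
        $key = (string) $attrs['finishOnKey'];
        // A single digit, * or #, or one of the two "no finish key" values
        if ($key !== '' && $key !== 'none' && preg_match('/\A[0-9*#]\z/', $key) !== 1) {
            $errors[] = 'finishOnKey must be 0-9, *, #, an empty string, or "none"';
        }
    }
    if (array_key_exists('executionTimeout', $attrs) && !intInRange($attrs['executionTimeout'], 5, 60)) {
        $errors[] = 'executionTimeout must be 5 to 60';
    }

    return $errors;
}
```

    Calling validateDtmfAttributes on the array you intend to pass to addGetInput returns an empty array when all supplied values are within the documented ranges, and one message per out-of-range attribute otherwise.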

    Detect speech inputs

    In this segment, you can learn how to use the GetInput XML feature to capture speech inputs and implement a simple IVR phone system.

    Outline

    Let’s consider the simple IVR phone tree below:

    1. Caller dials a phone number, and a virtual assistant answers the call.
    2. The first branch of the IVR phone tree will include two choices, such as “Say Sales to talk to our Sales representative. Say Support to talk to our Support representative”.
    3. If the caller says “sales” then the call will be connected to the sales representative or if the caller says “support” then the call will be connected to the support representative.

    Create a Laravel Controller to Detect Speech Inputs

    Change to the newly created project directory (the quickstart directory) and run the below command to create a Laravel controller for inbound calls.

    $ php artisan make:controller SpeechdetectionController
    

    This will generate a controller named SpeechdetectionController in the app/Http/Controllers/ directory. Open the app/Http/Controllers/SpeechdetectionController.php file and add the following code:

    
    <?php

    namespace App\Http\Controllers;

    use Illuminate\Http\Request;
    use Plivo\XML\Response;

    // Laravel already loads Composer's autoloader, so no explicit require is needed here.
    class SpeechdetectionController extends Controller
    {
        // Return the generated XML to Plivo with the correct content type
        private function xmlResponse(Response $response)
        {
            return response($response->toXML(), 200)->header('Content-Type', 'text/xml');
        }

        // GetInput XML to handle the incoming call
        public function detectSpeech()
        {
            // Welcome message - first branch
            $welcome_message = "Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative";
            // Message that Plivo reads when the caller does nothing at all
            $no_input = "Sorry, I didn't catch that. Please hangup and try again later.";
            $response = new Response();
            $get_input = $response->addGetInput(
                [
                    // Replace the host with your own public URL; the path must match the route in routes/web.php
                    'action' => "https://dc40fc05d7dc.ngrok.io/repbranch",
                    'method' => "POST",
                    'interimSpeechResultsCallback' => "https://dc40fc05d7dc.ngrok.io/repbranch",
                    'interimSpeechResultsCallbackMethod' => "POST",
                    'inputType' => "speech",
                    'redirect' => "true",
                ]);
            $get_input->addSpeak($welcome_message, ['language' => "en-US", 'voice' => "Polly.Salli"]);
            // Played only if GetInput finishes without any input
            $response->addSpeak($no_input);
            return $this->xmlResponse($response);
        }

        // Action URL block for Sales and Support branch
        public function repBranch(Request $request)
        {
            // Message that Plivo reads when the caller says something other than the expected words
            $wrong_input = "Sorry, it's a wrong input.";
            // input() reads both POST body and query string parameters;
            // lowercase and trim so that "Sales" and " sales " both match
            $speech = strtolower(trim((string) $request->input('Speech')));
            $from_number = $request->input('From');
            $response = new Response();
            $params = ['callerId' => $from_number];

            if ($speech == "sales") {
                $dial = $response->addDial($params);
                $dial->addNumber("<Number 1>"); // Sales representative's number
            } elseif ($speech == "support") {
                $dial = $response->addDial($params);
                $dial->addNumber("<Number 2>"); // Support representative's number
            } else {
                $response->addSpeak($wrong_input);
            }
            return $this->xmlResponse($response);
        }
    }
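
    Transcribed speech rarely arrives as a bare keyword; callers may say a full sentence, and the text may carry capital letters or punctuation. The exact shape of the Speech parameter is not specified here, so the following is only a sketch of a more tolerant match than a strict string comparison. The function name matchSpeechIntent and the default keyword list are hypothetical and mirror the example menu.

```php
<?php
/**
 * Map a transcribed phrase to one of the expected keywords.
 * Strips punctuation, lowercases the text, and looks for a keyword as a whole word.
 * Hypothetical helper for illustration; not part of the Plivo SDK.
 */
function matchSpeechIntent(?string $speech, array $keywords = ['sales', 'support']): ?string
{
    if ($speech === null) {
        return null;
    }
    // Replace anything that is not a letter, digit, or space, so "Sales." becomes "Sales "
    $normalized = strtolower(trim(preg_replace('/[^A-Za-z0-9 ]+/', ' ', $speech)));
    $words = preg_split('/\s+/', $normalized, -1, PREG_SPLIT_NO_EMPTY);

    // The first keyword in the list that appears in the phrase wins
    foreach ($keywords as $keyword) {
        if (in_array($keyword, $words, true)) {
            return $keyword;
        }
    }
    return null;
}
```

    In the controller, the result of matchSpeechIntent($request->input('Speech')) could replace the direct comparison, with a null result leading to the "wrong input" message.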
    

    Add a Route

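    Now add a route for each function in the SpeechdetectionController class. Open the routes/web.php file and add the below lines at the end of the file. The array syntax is used because recent Laravel releases no longer resolve 'Controller@method' strings without extra namespace configuration. Because Plivo sends POST requests without a CSRF token, you may also need to exclude these paths from Laravel's CSRF verification.

    Route::match(['get', 'post'], '/detectspeech', [\App\Http\Controllers\SpeechdetectionController::class, 'detectSpeech']);
    Route::match(['get', 'post'], '/repbranch', [\App\Http\Controllers\SpeechdetectionController::class, 'repBranch']);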
    

    Speech recognition model & hints

    Speech Model

    You can select the Automatic Speech Recognition (ASR) model using the speechModel attribute. Choose the model that best matches your use case:

    • You can set the speechModel as “command_and_search” for shorter audio clips. For example, if you expect callers to use voice commands or voice search, then you can use this model.
    • If you want to transcribe the audio from a phone call, you can set the model as “phone_call”.
    • You can explore both these models and see which one is best suited to your use-case.
    • You can set the model as “default” if your use-case does not suit the above models.

    Example XML:

    <Response>
    <GetInput action="https://example.com/action/" method="POST" inputType="speech" speechModel="command_and_search" redirect="true">
    <Speak>Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative</Speak>
    </GetInput>
    <Speak>Sorry, I didn't catch that. Please hangup and try again later.</Speak>
    </Response>
    

    Hints

    You can use the hints attribute to improve speech transcription results. With this attribute, you define the words and phrases that are common in your use case. For example, if your use case is a call center where callers mostly use voice commands to reach support or sales, you can pass the keywords “support” and “sales” as hints.

    • Allowed values: a non-empty string of comma-separated phrases.
    • Limitations are:
      • Phrases per request: 500.
      • Characters per request: 10000.
      • Characters per phrase: 100.
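
    The limits above are easy to exceed when hints are generated from a product catalog or a contact list. The sketch below joins phrases into a hints value and rejects input that breaks a limit. The function name buildHints is hypothetical. Two assumptions are made that the text above does not confirm: the per-request character limit is checked against the joined string including commas, and lengths are measured with strlen, which counts bytes and so equals the character count only for ASCII text.

```php
<?php
/**
 * Join phrases into a comma-separated hints value and enforce the documented limits.
 * Throws InvalidArgumentException when a limit is exceeded.
 * Hypothetical helper for illustration; not part of the Plivo SDK.
 */
function buildHints(array $phrases): string
{
    // Trim each phrase and drop empty entries
    $clean = [];
    foreach ($phrases as $phrase) {
        $phrase = trim((string) $phrase);
        if ($phrase !== '') {
            $clean[] = $phrase;
        }
    }

    if ($clean === []) {
        throw new InvalidArgumentException('hints must contain at least one phrase');
    }
    if (count($clean) > 500) {
        throw new InvalidArgumentException('hints allow at most 500 phrases per request');
    }
    foreach ($clean as $phrase) {
        // strlen counts bytes; this matches characters only for ASCII phrases
        if (strlen($phrase) > 100) {
            throw new InvalidArgumentException('each phrase may be at most 100 characters');
        }
    }

    $hints = implode(',', $clean);
    // Assumption: the request limit is applied to the joined string, commas included
    if (strlen($hints) > 10000) {
        throw new InvalidArgumentException('hints allow at most 10000 characters per request');
    }
    return $hints;
}
```

    The returned string can then be used as the value of the hints attribute, for example buildHints(['sales', 'support']) for the menu in this guide.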

    Example XML:

    <Response>
    <GetInput action="https://example.com/action/" method="POST" inputType="speech" hints="sales,support" redirect="true">
    <Speak>Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative</Speak>
    </GetInput>
    <Speak>Sorry, I didn't catch that. Please hangup and try again later.</Speak>
    </Response>
    

    Control the gathering of speech inputs

    You can control speech input collection by using the various attributes available for GetInput XML, such as speechEndTimeout, language, profanityFilter, and executionTimeout.

    speechEndTimeout: You can use this attribute to set the time that Plivo has to wait for more speech inputs once silence is detected. The default value is auto and the allowed values are 2 to 10 seconds or auto. If the end-user has not provided any new speech input within the speechEndTimeout period, the speech collected to that point will be processed.

    language: You can use this attribute to specify the language (along with the national/regional dialect) of the audio to be recognized on calls. The default language for speech detection is en-US. You can choose your preferred language from the list of supported languages.

    profanityFilter: If you set this attribute to “true”, Plivo filters profane words out of the transcription of speech inputs. The filter works on single words and does not catch combinations of words. The default value is “false”, so Plivo does not filter profane words if you set this attribute to “false” or leave it undefined.

    Note: The above three attributes apply to input types speech and dtmf speech and do not apply to the dtmf input type.

    executionTimeout: You can use this attribute to configure the maximum time during which speech detection is performed. This lets Plivo move on to the next element in the XML response when the end-user does not provide any input on the call. The default value is 15 seconds, and the allowed values are 5 to 60 seconds.

    Example XML

    <Response>
    <GetInput action="https://example.com/action/" method="POST" inputType="speech" speechEndTimeout="5" language="en-US" profanityFilter="true" executionTimeout="25" redirect="true">
    <Speak>Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative</Speak>
    </GetInput>
    <Speak>Sorry, I didn't catch that. Please hangup and try again later.</Speak>
    </Response>
    

    Real-time Speech Recognition

    You can use the interimSpeechResultsCallback attribute to perform real-time speech recognition. Set this attribute to a URL on your application server to receive real-time callbacks with the user’s recognized speech while the user is still speaking on the call. Plivo sends the transcribed result to your server URL with parameters such as StableSpeech, UnstableSpeech, Stability, and SequenceNumber.

    • UnstableSpeech: This will hold the interim transcribed result of the user’s speech, which may be refined when more speech is collected from the user.
    • StableSpeech: This will hold the stable transcribed result of the user’s speech.
    • Stability: This field holds the UnstableSpeech stability score. Values range from 0.0 to 1.0, with 0.0 being completely unstable and 1.0 being completely stable. This value depicts the estimation of the probability that the recognizer will not change its guess about the interim speech result.
    • SequenceNumber: This argument will hold the sequence number of the interim speech callback that will help you to order the incoming callback requests.
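
    Interim callbacks are separate HTTP requests, so they may not reach your server in the order they were sent. The sketch below shows one way to put a batch of received callbacks back in order using SequenceNumber and to check the Stability score of an entry. The function names and the 0.8 threshold are assumptions made for this example; the parameter names are the ones listed above.

```php
<?php
/**
 * Sort received interim callbacks by their SequenceNumber, lowest first.
 * Each callback is an array of the request parameters described above.
 * Hypothetical helper for illustration; not part of the Plivo SDK.
 */
function orderInterimCallbacks(array $callbacks): array
{
    usort($callbacks, function (array $a, array $b): int {
        return (int) $a['SequenceNumber'] <=> (int) $b['SequenceNumber'];
    });
    return $callbacks;
}

/**
 * True when the callback's Stability score meets the threshold.
 * The 0.8 default is an arbitrary example value, not a Plivo recommendation.
 */
function isStableEnough(array $callback, float $threshold = 0.8): bool
{
    return (float) ($callback['Stability'] ?? 0.0) >= $threshold;
}
```

    A callback handler could store each request as it arrives, call orderInterimCallbacks on the stored batch, and use isStableEnough to decide whether the latest UnstableSpeech is worth acting on before the final result reaches the action URL.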

    Example XML

    <Response>
    <GetInput action="https://example.com/action/" method="POST" interimSpeechResultsCallback="https://example.com/interimcallback/" interimSpeechResultsCallbackMethod="POST" inputType="speech" redirect="true">
    <Speak>Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative</Speak>
    </GetInput>
    <Speak>Sorry, I didn't catch that. Please hangup and try again later.</Speak>
    </Response>
    

    Data logging preferences

    You can use the log attribute of the GetInput XML to manage input logging preferences. If you define this attribute as “false” then logging will be disabled and Plivo will not log the digit and speech inputs. The default value for this is “true”.
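
    To see where the log attribute sits in the XML, the sketch below renders a GetInput element from an attribute map using plain string building. The function name renderGetInput is hypothetical and exists only to preview the output without the SDK; in an application, the addGetInput call shown earlier would normally produce this XML, and whether it accepts a log key should be confirmed against the SDK documentation.

```php
<?php
/**
 * Render a <GetInput> element inside a <Response> from an attribute map and a prompt.
 * Plain string building for illustration only; not part of the Plivo SDK.
 */
function renderGetInput(array $attributes, string $prompt): string
{
    $attrs = '';
    foreach ($attributes as $name => $value) {
        // Escape attribute values so URLs with & or quotes stay valid XML
        $escaped = htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $attrs .= sprintf(' %s="%s"', $name, $escaped);
    }
    $speak = htmlspecialchars($prompt, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return "<Response><GetInput{$attrs}><Speak>{$speak}</Speak></GetInput></Response>";
}
```

    For example, passing ['action' => 'https://example.com/action/', 'inputType' => 'dtmf', 'log' => 'false'] with a prompt yields a GetInput element carrying log="false", which is the setting to use when callers enter sensitive digits such as a PIN.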