Receive DTMF/Speech Input using .NET

    Overview

    Capturing user input is a critical capability in any phone system. User input, captured either as dual-tone multi-frequency (DTMF) digit presses or as speech, is useful in many use cases, such as IVR phone systems, conversational IVRs, virtual assistants, and voice-based forms and surveys. Plivo’s Voice platform offers features that you can use to implement business use cases that involve securely capturing DTMF and speech input.

    Set Up Your .NET Dev Environment

    In this section, we’ll walk you through how to set up a .NET Framework app in under five minutes and start handling incoming calls & callbacks.

    Install .NET Framework

    You must install .NET Framework (version 4.6 or higher) and Plivo’s .NET SDK to receive incoming calls. Here’s how.

    Operating System | Instructions
    OS X & Linux | To see if you already have .NET Framework installed, run the command dotnet --version in the terminal. If you do not have it installed, you can install it from here.
    Windows | To install .NET Framework on Windows, follow the instructions listed here.

    Install Plivo .NET Package using Visual Studio

    • Create an MVC web app.

    • Configure the MVC app and provide a project name.

    • Install the Plivo NuGet package.

    Detect DTMF inputs

    Outline

    In this section, we will show you how to implement a multi-level IVR phone system and capture digit press inputs (DTMF) on the Plivo voice platform.

    The example IVR phone tree below has been implemented using the GetInput XML feature:

    1. Caller dials a phone number, and a virtual assistant answers the call.
    2. The first branch of the IVR phone tree will include three choices, such as “Press 1 for your account balance. Press 2 for your account status. Press 3 to speak to a representative.”
    3. Options 1 and 2 will automatically retrieve the information and play the caller a text-to-speech message, and option 3 will redirect the caller to the second branch of the IVR.
    4. The second branch of the IVR will have two options, such as “Press 1 for Sales. Press 2 for Support.”
    5. If the caller presses “1”, the call is connected to the sales representative; if the caller presses “2”, the call is connected to the support representative.

    Create an MVC Controller to Detect DTMF inputs

    Navigate to the Controllers directory in the “Receivecall” app, create a controller named MultilevelIvrController.cs, and paste in the following code.

    using System;
    using System.Collections.Generic;
    using Plivo.XML;
    using System.Diagnostics;
    using Microsoft.AspNetCore.Mvc;
    
    namespace Receivecall.Controllers {
      public class MultilevelIvrController: Controller {
    
        //  Welcome message - firstbranch
        String WelcomeMessage = "Welcome to the demo app, Press 1 for your account balance. Press 2 for your account status. Press 3 to talk to our representative";
        // Message for Second branch
        String RepresentativeBranch = "Press 1 to talk to our Sales representative. Press 2 to talk to our Support representative";
        // This is the message that Plivo reads when the caller does nothing at all
        String NoInput = "Sorry, I didn't catch that. Please hangup and try again later.";
        // This is the message that Plivo reads when the caller inputs a wrong digit.
        String WrongInput = "Sorry, it's a wrong input.";
    
        // GET: /<controller>/
        public IActionResult Index() {
          var resp = new Response();
          GetInput get_input = new GetInput("", new Dictionary < string, string > () {
            {
              "action",
              "https://abf0444f4108.ngrok.io/multilevelivr/firstbranch/"
            },
            {
              "method",
              "POST"
            },
            {
              "digitEndTimeout",
              "5"
            },
            {
              "inputType",
              "dtmf"
            },
            {
              "redirect",
              "true"
            },
          });
          resp.Add(get_input);
          get_input.AddSpeak(WelcomeMessage, new Dictionary < string, string > () {});
          resp.AddSpeak(NoInput, new Dictionary < string, string > () {});
    
          var output = resp.ToString();
          return this.Content(output, "text/xml");
        }
        // First branch of IVR phone tree
        public IActionResult FirstBranch() {
          String digit = Request.Form["Digits"];
          Debug.WriteLine("Digit pressed : " + digit);
    
          var resp = new Response();
    
          if (digit == "1") {
            // Add Speak XML Tag
            resp.AddSpeak("Your account balance is $20.", new Dictionary < string, string > () {});
          }
          else if (digit == "2") {
            // Add Speak XML Tag
            resp.AddSpeak("Your account status is active.", new Dictionary < string, string > () {});
          }
          else if (digit == "3") {
            String getinput_action_url = "https://abf0444f4108.ngrok.io/multilevelivr/secondbranch/";
    
            // Add GetInput XML Tag
            GetInput get_input = new GetInput("", new Dictionary < string, string > () {
              {
                "action",
                getinput_action_url
              },
              {
                "method",
                "POST"
              },
              {
                "digitEndTimeout",
                "5"
              },
              {
                "inputType",
                "dtmf"
              },
              {
                "redirect",
                "true"
              },
            });
            resp.Add(get_input);
            get_input.AddSpeak(RepresentativeBranch, new Dictionary < string, string > () {});
            resp.AddSpeak(NoInput, new Dictionary < string, string > () {});
          }
          else {
            // Add Speak XML Tag
            resp.AddSpeak(WrongInput, new Dictionary < string, string > () {});
          }
    
          Debug.WriteLine(resp.ToString());
    
          var output = resp.ToString();
          return this.Content(output, "text/xml");
        }
        // Second branch of IVR phone tree
        public IActionResult SecondBranch() {
          String FromNumber = Request.Form["From"];
          var resp = new Response();
          String digit = Request.Form["Digits"];
          Debug.WriteLine("Digit pressed : " + digit);
    
          // Add Dial XML Tag
          if (digit == "1") {
            Dial dial = new Dial(new
            Dictionary < string, string > () {
              {
                "callerId",
                FromNumber
              },
              {
                "action",
                "https://abf0444f4108.ngrok.io/multilevelivr/vmdrop/"
              },
              {
                "method",
                "POST"
              },
              {
                "redirect",
                "false"
              }
            });
    
            dial.AddNumber("14156667777", new Dictionary < string, string > () {});
            resp.Add(dial);
          }
          else if (digit == "2") {
            Dial dial = new Dial(new
            Dictionary < string, string > () {
              {
                "callerId",
                FromNumber
              },
              {
                "action",
                "https://abf0444f4108.ngrok.io/multilevelivr/vmdrop/"
              },
              {
                "method",
                "POST"
              },
              {
                "redirect",
                "false"
              }
            });
    
            dial.AddNumber("14156667778", new Dictionary < string, string > () {});
            resp.Add(dial);
          }
          else {
            resp.AddSpeak(WrongInput, new Dictionary < string, string > () {});
          }
    
          Debug.WriteLine(resp.ToString());
    
          var output = resp.ToString();
          return this.Content(output, "text/xml");
        }
      }
    }
    

    Before starting the app, update Properties/launchSettings.json by setting applicationUrl as follows:

    "applicationUrl": "http://localhost:5000/"
    

    Run the project, and you should see your basic server app in action at http://localhost:5000/multilevelivr/

    Control the gathering of DTMF inputs

    You can fine-tune DTMF collection by using the attributes available for GetInput XML, such as digitEndTimeout, numDigits, finishOnKey, and executionTimeout.

    digitEndTimeout: You can use this attribute to set the time interval between successive digit inputs. The default value is auto and the allowed values are 2 to 10 seconds or auto. If the end-user has not provided any new digit input within the digitEndTimeout period, the digits entered to that point will be processed.

    numDigits: You can use this attribute to set the maximum number of digits the end-user can provide on the call in the current operation. The default value is 32 and the allowed values are 1 to 32.

    If the end-user provides more digit inputs than the numDigits allows, Plivo will only send the maximum number of digits specified as numDigits to the action URL and the rest of the digit inputs will be ignored. For example, if numDigits is specified as ‘4’ and if the user provides 5 digits, then the last digit input will be ignored.

    finishOnKey: You can use this attribute to define the key that end-users need to press to submit their digit input. The default value is #, and the allowed values are 0-9, *, #, an empty string, or ‘none’. When you set the value to an empty string or ‘none’, DTMF input collection ends based on the timeout or the numDigits attribute.

    Note: The above three attributes apply to the dtmf and dtmf speech input types and do not apply to the speech input type. If all three attributes are specified, finishOnKey takes priority.
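
    To make the interaction between numDigits and finishOnKey concrete, here is a small stand-alone model of the rules described above. It is a sketch for reasoning about the attributes, not Plivo's implementation: the names DtmfCollector and Collect are invented for illustration, digitEndTimeout is left out because it depends on timing rather than on the keys pressed, and the model assumes the finish key itself is not part of the submitted digits.

```csharp
using System;
using System.Text;

public static class DtmfCollector
{
    // Returns the digits that would be submitted for a sequence of key presses.
    // numDigits: maximum number of digits to collect (1 to 32).
    // finishOnKey: the terminating key; null, "" or "none" disables it.
    public static string Collect(string keyPresses, int numDigits = 32, string finishOnKey = "#")
    {
        if (numDigits < 1 || numDigits > 32)
            throw new ArgumentOutOfRangeException(nameof(numDigits), "Allowed values are 1 to 32.");

        bool hasFinishKey = !string.IsNullOrEmpty(finishOnKey) && finishOnKey != "none";
        var collected = new StringBuilder();

        foreach (char key in keyPresses)
        {
            // The finish key is checked first, so it takes priority.
            // Assumption: the finish key is not included in the result.
            if (hasFinishKey && key.ToString() == finishOnKey)
                break;

            collected.Append(key);

            // Once numDigits is reached, any later key presses are ignored.
            if (collected.Length == numDigits)
                break;
        }

        return collected.ToString();
    }
}
```

    For example, DtmfCollector.Collect("12345", numDigits: 4) returns "1234", which matches the numDigits example above, and DtmfCollector.Collect("12#34") returns "12".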

    executionTimeout: You can use this attribute to configure the maximum execution time during which input detection is performed. You can use this to process the next element in the XML response when the end-user does not provide any input on the call. The default value is 15 seconds, and the allowed values are 5 to 60 seconds.
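
    The attributes above can be combined on a single GetInput element. The sketch below builds such a response with System.Xml.Linq rather than the Plivo SDK, so that it runs without extra packages. The class name PinPromptXml, the prompt text, and the specific attribute values are illustrative assumptions; pass your own action URL.

```csharp
using System;
using System.Xml.Linq;

public static class PinPromptXml
{
    // Builds a <Response> that asks for a 4-digit PIN over DTMF.
    public static string Build(string actionUrl)
    {
        var getInput = new XElement("GetInput",
            new XAttribute("action", actionUrl),
            new XAttribute("method", "POST"),
            new XAttribute("inputType", "dtmf"),
            new XAttribute("numDigits", "4"),          // accept at most 4 digits
            new XAttribute("finishOnKey", "#"),        // caller may submit early with #
            new XAttribute("digitEndTimeout", "5"),    // seconds allowed between digits
            new XAttribute("executionTimeout", "20"),  // overall limit for this prompt, in seconds
            new XAttribute("redirect", "true"),
            new XElement("Speak", "Please enter your 4 digit PIN, followed by the hash key."));

        // The trailing <Speak> plays only if no input arrives before executionTimeout.
        var response = new XElement("Response",
            getInput,
            new XElement("Speak", "Sorry, I didn't catch that. Please hang up and try again later."));

        return response.ToString();
    }
}
```

    In a controller, the returned string can be sent back with this.Content(xml, "text/xml"), in the same way as the examples above.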

    Detect speech inputs

    In this segment, you can learn how to use the GetInput XML feature to capture speech inputs and implement a simple IVR phone system.

    Outline

    Let’s consider the simple IVR phone tree below:

    1. Caller dials a phone number, and a virtual assistant answers the call.
    2. The first branch of the IVR phone tree will include two choices, such as “Say Sales to talk to our Sales representative. Say Support to talk to our Support representative”.
    3. If the caller says “sales”, the call is connected to the sales representative; if the caller says “support”, the call is connected to the support representative.

    Create an MVC Controller to Detect Speech inputs

    Navigate to the Controllers directory in the “Receivecall” app, create a controller named IvrspeechController.cs, and paste in the following code.

    using System;
    using System.Collections.Generic;
    using Plivo.XML;
    using System.Diagnostics;
    using Microsoft.AspNetCore.Mvc;
    
    namespace Receivecall.Controllers {
      public class IvrspeechController: Controller {
        //  Welcome message - firstbranch
        String WelcomeMessage = "Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative";
        // This is the message that Plivo reads when the caller does nothing at all
        String NoInput = "Sorry, I didn't catch that. Please hangup and try again later.";
        // This is the message that Plivo reads when the caller's speech does not match an option.
        String WrongInput = "Sorry, it's a wrong input.";
    
        public IActionResult Index() {
          var resp = new Response();
          GetInput get_input = new GetInput("", new Dictionary < string, string > () {
            {
              "action",
              "https://3273948bbc57.ngrok.io/ivrspeech/firstbranch/"
            },
            {
              "method",
              "POST"
            },
            {
              "interimSpeechResultsCallback",
              "https://3273948bbc57.ngrok.io/ivrspeech/firstbranch/"
            },
            {
              "interimSpeechResultsCallbackMethod",
              "POST"
            },
            {
              "inputType",
              "speech"
            },
            {
              "redirect",
              "true"
            },
          });
          resp.Add(get_input);
          get_input.AddSpeak(WelcomeMessage, new Dictionary < string, string > () {});
          resp.AddSpeak(NoInput, new Dictionary < string, string > () {});
    
          var output = resp.ToString();
          return this.Content(output, "text/xml");
        }
        // First branch of IVR phone tree
        public IActionResult FirstBranch() {
          String speech = Request.Form["Speech"];
          String FromNumber = Request.Form["From"];
          Debug.WriteLine("Speech Input is :" + speech);
          Dial dial = new Dial(new
          Dictionary < string, string > () {
            {
              "callerId",
              FromNumber
            }
          });
    
          var resp = new Response();
    
          if (speech == "sales") {
            dial.AddNumber("14156667777", new Dictionary < string, string > () {});
            resp.Add(dial);
          }
          else if (speech == "support") {
            dial.AddNumber("14156667778", new Dictionary < string, string > () {});
            resp.Add(dial);
          }
          else {
            // Add Speak XML Tag
            resp.AddSpeak(WrongInput, new Dictionary < string, string > () {});
          }
    
          Debug.WriteLine(resp.ToString());
    
          var output = resp.ToString();
          return this.Content(output, "text/xml");
        }
      }
    }
    

    Before starting the app, update Properties/launchSettings.json by setting applicationUrl as follows:

    "applicationUrl": "http://localhost:5000/"
    

    Run the project, and you should see your basic server app in action at http://localhost:5000/ivrspeech/
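
    The controller above compares the transcript to "sales" and "support" exactly, so a transcript that differs in capitalization, surrounding whitespace, or punctuation would fall through to the wrong-input message. The helper below is a minimal sketch of a more forgiving match. SpeechRouter and MatchDepartment are hypothetical names, and the numbers are the sample numbers used in the controller.

```csharp
using System;
using System.Linq;

public static class SpeechRouter
{
    // Returns the number to dial for a transcript, or null when nothing matches.
    public static string MatchDepartment(string speech)
    {
        if (string.IsNullOrWhiteSpace(speech))
            return null;

        // Lower-case the transcript, replace anything that is not a letter
        // or digit with a space, then split it into words.
        var cleaned = new string(speech.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray());
        var words = cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        if (words.Contains("sales")) return "14156667777";
        if (words.Contains("support")) return "14156667778";
        return null;
    }
}
```

    With this helper, FirstBranch could call SpeechRouter.MatchDepartment(speech) and add the Dial element only when the result is not null, falling back to the WrongInput message otherwise.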

    Speech recognition model & hints

    Speech Model

    You can select the Automatic Speech Recognition (ASR) model by using the speechModel attribute. Choose the model that best fits your use case.

    • You can set the speechModel as “command_and_search” for shorter audio clips. For example, if you expect callers to use voice commands or voice search, then you can use this model.
    • If you want to transcribe the audio from a phone call, you can set the model as “phone_call”.
    • You can explore both these models and see which one is best suited to your use-case.
    • You can set the model as “default” if your use-case does not suit the above models.

    Example XML:

    <Response>
    <GetInput action="https://example.com/action/" method="POST" inputType="speech" speechModel="command_and_search" redirect="true">
    <Speak>Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative</Speak>
    </GetInput>
    <Speak>Sorry, I didn't catch that. Please hangup and try again later.</Speak>
    </Response>
    

    Hints

    You can use the hints attribute to improve speech transcription results. Using this attribute, you can define the words and phrases that are common in your use case. For example, if your use case is a call center and callers mostly use voice commands to reach support and sales, you can use the keywords “support” and “sales” as hints.

    • Allowed values: a non-empty string of comma-separated phrases.
    • Limitations are:
      • Phrases per request: 500.
      • Characters per request: 10000.
      • Characters per phrase: 100.
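
    The limits above can be checked before a hints value is placed in the XML. The following is a minimal sketch of such a check; HintsValidator is a hypothetical helper, not part of the Plivo SDK. How characters are counted towards the per-request limit is not specified above, so this version assumes only the characters of the trimmed phrases count, and it also assumes that empty phrases (for example, from a trailing comma) should be rejected.

```csharp
using System;
using System.Linq;

public static class HintsValidator
{
    public const int MaxPhrases = 500;
    public const int MaxTotalCharacters = 10000;
    public const int MaxCharactersPerPhrase = 100;

    // Returns null when the hints value is acceptable,
    // otherwise a short description of the first problem found.
    public static string Validate(string hints)
    {
        if (string.IsNullOrWhiteSpace(hints))
            return "hints must be a non-empty string";

        var phrases = hints.Split(',').Select(p => p.Trim()).ToArray();

        if (phrases.Any(p => p.Length == 0))
            return "hints contains an empty phrase";
        if (phrases.Length > MaxPhrases)
            return "hints contains more than 500 phrases";
        if (phrases.Any(p => p.Length > MaxCharactersPerPhrase))
            return "a phrase is longer than 100 characters";
        if (phrases.Sum(p => p.Length) > MaxTotalCharacters)
            return "hints is longer than 10000 characters in total";

        return null;
    }
}
```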

    Example XML:

    <Response>
    <GetInput action="https://example.com/action/" method="POST" inputType="speech" hints="sales,support" redirect="true">
    <Speak>Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative</Speak>
    </GetInput>
    <Speak>Sorry, I didn't catch that. Please hangup and try again later.</Speak>
    </Response>
    

    Control the gathering of speech inputs

    You can fine-tune speech input collection by using the attributes available for GetInput XML, such as speechEndTimeout, language, profanityFilter, and executionTimeout.

    speechEndTimeout: You can use this attribute to set the time that Plivo has to wait for more speech inputs once silence is detected. The default value is auto and the allowed values are 2 to 10 seconds or auto. If the end-user has not provided any new speech input within the speechEndTimeout period, the speech collected to that point will be processed.

    language: You can use this attribute to specify the language (along with the national/regional dialect) of the audio to be recognized on calls. The default language for speech detection is en-US. You can choose your preferred language from the language list available here.

    profanityFilter: If you set this attribute to “true”, Plivo filters out profane words that end-users say while providing speech input during transcription. The profanity filter works on single words and does not work on combinations of words. The default value is “false”, so if you set this attribute to “false” or do not define it, Plivo does not filter profane words.

    Note: The above three attributes apply to input types speech and dtmf speech and do not apply to the dtmf input type.

    executionTimeout: You can use this attribute to configure the maximum execution time during which speech detection is performed. You can use this to process the next element in the XML response when the end-user does not provide any input on the call. The default value is 15 seconds, and the allowed values are 5 to 60 seconds.

    Example XML:

    <Response>
    <GetInput action="https://example.com/action/" method="POST" inputType="speech" speechEndTimeout="5" language="en-US" profanityFilter="true" executionTimeout="25" redirect="true">
    <Speak>Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative</Speak>
    </GetInput>
    <Speak>Sorry, I didn't catch that. Please hangup and try again later.</Speak>
    </Response>
    

    Real-time Speech Recognition

    You can use the interimSpeechResultsCallback attribute to perform real-time speech recognition. Set this attribute to a URL on your application server to receive real-time callbacks with the user’s recognized speech while the user is still speaking on the call. Plivo sends the transcribed result to your server URL with attributes such as StableSpeech, UnstableSpeech, Stability, and SequenceNumber.

    • UnstableSpeech: This will hold the interim transcribed result of the user’s speech, which may be refined when more speech is collected from the user.
    • StableSpeech: This will hold the stable transcribed result of the user’s speech.
    • Stability: This field holds the UnstableSpeech stability score. Values range from 0.0 to 1.0, with 0.0 being completely unstable and 1.0 being completely stable. This value depicts the estimation of the probability that the recognizer will not change its guess about the interim speech result.
    • SequenceNumber: This argument will hold the sequence number of the interim speech callback that will help you to order the incoming callback requests.
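
    Because interim callbacks are separate HTTP requests, they may not be handled in the order they were sent, which is where SequenceNumber helps. The sketch below keeps only the most recent result by sequence number. InterimSpeechResult and InterimSpeechTracker are hypothetical types, and treating SequenceNumber as an integer and Stability as a double are assumptions; in a controller, the values would be read from the request form fields listed above.

```csharp
using System;

public class InterimSpeechResult
{
    public int SequenceNumber { get; set; }
    public string StableSpeech { get; set; }
    public string UnstableSpeech { get; set; }
    public double Stability { get; set; }
}

public class InterimSpeechTracker
{
    // The newest result accepted so far, or null if none has arrived.
    public InterimSpeechResult Latest { get; private set; }

    // Stores the result if it is newer than the one already held.
    // Returns false for duplicates and for callbacks that arrive late.
    public bool Accept(InterimSpeechResult result)
    {
        if (result == null)
            throw new ArgumentNullException(nameof(result));

        if (Latest != null && result.SequenceNumber <= Latest.SequenceNumber)
            return false;

        Latest = result;
        return true;
    }
}
```

    One tracker per call would be needed in practice, for example keyed by a call identifier sent with each callback.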

    Example XML:

    <Response>
    <GetInput action="https://example.com/action/" method="POST" interimSpeechResultsCallback="https://example.com/interimcallback/" interimSpeechResultsCallbackMethod="POST" inputType="speech" redirect="true">
    <Speak>Welcome to the demo app, Say Sales to talk to our Sales representative. Say Support to talk to our Support representative</Speak>
    </GetInput>
    <Speak>Sorry, I didn't catch that. Please hangup and try again later.</Speak>
    </Response>
    

    Data logging preferences

    You can use the log attribute of the GetInput XML to manage input logging preferences. If you set this attribute to “false”, logging is disabled and Plivo does not log the digit and speech inputs. The default value is “true”.