United States Patent5414796
Jacobs , ; et al.May 9, 1995

Title

Variable rate vocoder

Abstract

An apparatus and method for performing speech signal compression, by variable rate coding of frames of digitized speech samples. The level of speech activity for each frame of digitized speech samples is determined and an output data packet rate is selected from a set of rates based upon the determined level of frame speech activity. A lowest rate of the set of rates corresponds to a detected minimum level of speech activity, such as background noise or pauses in speech, while a highest rate corresponds to a detected maximum level of speech activity, such as active vocalization. Each frame is then coded according to a predetermined coding format for the selected rate wherein each rate has a corresponding number of bits representative of the coded frame. A data packet is provided for each coded frame with each output data packet of a bit rate corresponding to the selected rate.


Inventors:Jacobs; Paul E. (San Diego, CA), Gardner; William R.  (San Diego, CA), Lee; Chong U.  (San Diego, CA), Gilhousen; Klein S.  (San Diego, CA), Lam; S. Katherine  (San Diego, CA), Tsai; Ming-Chang  (San Diego, CA)
Assignee:QUALCOMM Incorporated (San Diego, CA)
Appl. No.:004484
Filed:January 14, 1993

Current U.S. Class:704/221 704/222 704/224 704/500 704/201 
Field of Search:381/30-31,35-36,38,40-41,47 395/2,2.1,2.3,2.31,2.33 375/27,122

U.S. Patent Documents
32580January 1888Atal et al.
3633107January 1972Brady
4012595March 1977Ota
4076958February 1978Fulghum
4214125July 1980Mozer et al.
4360708November 1982Taguchi et al.
4379949April 1983Chen et al.
4535472August 1985Tomcik
4610022September 1986Kitayama et al.
4672669January 1987DesBlache et al.
4672670June 1987Wang et al.
4677671June 1987Galand et al.
4771465September 1988Bronson et al.
4797925January 1989Lin
4797929January 1989Gerson et al.
4827517May 1989Atal et al.
4831636May 1989Taniguchi et al.
4843612June 1989Brusch et al.
4850022June 1989Honda et al.
4852179July 1989Fette
4856068August 1989Quatieri, Jr. et al.
4864561September 1989Ashenfelter et al.
4868867September 1989Davidson et al.
4885790December 1989McAulay et al.
4890327December 1989Bertrand et al.
4896361January 1990Gerson
4899384February 1990Crouse et al.
4899385February 1990Ketchum et al.
4903301February 1990Kondo et al.
4905288February 1990Gerson et al.
4918734April 1990Muramatsu et al.
4933957June 1990Bottau et al.
4937873June 1990McAulay et al.
4965789October 1990Bottau et al.
4991214February 1991Freeman et al.
5007092April 1991Galand et al.
5023910June 1991Thomson
5054072October 1991McAulay et al.
5060269October 1991Zinser
5077798December 1991Ichikawa et al.
5091945February 1992Kleijn
5093863March 1992Galand et al.
5103459April 1992Gilhousen et al.
5113448May 1992Nomura et al.
5140638August 1992Moulsley et al.
5159611October 1992Tomita et al.
5161210November 1992Druyvesteyn et al.
5187745February 1993Yip et al.
5202953April 1993Taguchi
5214741May 1993Akamine et al.
5222189June 1993Fielder
5235671August 1993Mazor
Other References
W Bastiaan Kleijn, Daniel J. Krasinski, and Richard H. Ketchum, "Fast Methods for the CLEP Speech Coding Algorithm". IEEE Transaction on Signal Processing vol. 38, No. 8 Aug. 1990. .
Richard C. Rose, Thomas P. Barnwell, III, "Design and Performance of an Analysis-by-Synthesis Class of Predictive Speech Coders". IEEE Transactions on Signal Processing, vol. 38, No. 9 Sep. 1990. .
Ioannis S. Dedes, Dhadesugoor R. Vaman, and Cadathur V. Chakravarthy, "Variable Rate Adaptive Predictive Filter". IEEE Transactions on Signal Processing. vol. 40, No. 3 Mar. 1992. .
"DSP Chips can Produce Random Numbers Using Proven Algorithm" by Paul Menner, EDN, Jan. 21, 1991, pp. 141-145. .
"Variable Rate Speech Coding: A Review" by N. S. Jayans, 1984 IEEE. .
"Variable Rate Speech Coding with Onling Segmentation and Fast Algebraic Codes" by R. DiFrancesco et al, 1990 IEEE pp. 233-236. .
"Finite State CELP for Variable Rate Speech Coding" by Saeed V. Vaseghi, 1990 IEEE, pp. 37-40. .
"Variable Rate Speech Coding for Asynchronous Transfer Mode" by Hiroshi Nakada et al., 1990 IEEE Transaction on Communications, vol. 38, No. 3, Mar. 1990, pp. 277-284. .
"A 4.8 KBPS Code Excited Linear Predictive Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988. .
"Stochastic Coding of Speech Signals at Very Low Bit Rates" by Bishnu S. Atal, et al., 1984 IEEE. .
Taniguchi et al, "Combined source and channel coding based on multimode coding"; ICASSP 90: 1990 Inter. Conf. on Acoustics, Speech and Signal Processing; 3-6 Apr. 1990, pp. 477-480 vol. 1. .
Swaminathan et al, "Half Rate CELP codec candidate for North American digital cellular systems"; 1992 IEEE Inter. Conf. on Selected Topics in Wireless Communications; pp. 192-194. .
Taniguchi et al, "ADPCM with a multiquantizer for speech coding"; IEEE Journal on Selectid Areas in Communications; Feb. 1988, pp. 410-424, vol. 6 iss. 2. .
Atal et al. "Adaptive Predictive Coding of Speech Signals", The Bell System Technical Journal, Oct. 1970, pp. 229-238. .
Schroeder et al. "Stochastic Coding of Speech Signals at Very Low Bit Rates: the Importance of Speech Perception," Speech Communication 4 (1985), pp. 155-162. .
Schroeder et al. "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," 1985 IEEE Communications, pp. 9370940 (25.1.1-25.1.4). .
Bishnu S. Atal. "Predictive Coding of Speech at Low Bit Rates", IEEE Transactions on Communications, vol. Com-30, No. 4, Apr. 1982, pp. 600-614. .
Shinghai et al. "Improving Performance of Multi-Pulse LPC Coders at Low Bit Rates", IEEE Transactions on Communications, 1984, pp. 1.3.1-1.3.4. .
Atal et al. "Stochastic Coding of Speech Signals at Very Low Bit Rates," 1984 IEEE. .
Wang et al. "Phonetically-Based Vector Excitation Coding of Speech at 3.6 kbps," 1989 IEEE, pp. 49-52..~
Primary Examiner: MacDonald; Allen R.
Assistant Examiner: Hafiz; Tariq
Attorney, Agent or Firm:Miller; Russell B. English; Sean

Parent Case Text



This is a Continuation of application Ser. No. 07/713,661, filed Jun. 11, 1991, now abandoned.

Claims


We claim:
1. A method of speech signal compression, by variable rate coding of frames of digitized speech samples, comprising the steps of:
determining a level of speech activity for a frame of digitized speech samples;
selecting an encoding rate from a set of rates based upon said determined level of speech activity for said frame;
coding said frame according to a coding format of a set of coding formats for said selected rate wherein each rate has a corresponding different coding format and wherein each coding format provides for a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model; and
generating for said frame a data packet of said parameter signals.

2. The method of claim 1 wherein said step of determining said level of frame speech activity comprises the steps of:
measuring speech activity in said frame of digitized speech samples;
comparing said measured speech activity with at least one speech activity threshold level of a predetermined set of activity threshold levels; and
adaptively adjusting in response to said comparison at least one of said at least one speech activity threshold levels with respect to a level of activity of a previous frame of digitized speech samples.

3. The method of claim 1 further comprising the steps of:
providing a rate command indicative of a preselected encoding rate for said frame; and
modifying said selected encoding rate to provide said preselected encoding rate for coding of said frame at said preselected encoding rate.

4. The method of claim 3 wherein said preselected rate is less than a predetermined maximum rate, said method further comprising the steps of:
providing an additional data packet; and
combining said data packet with said additional data packet within a transmission frame for transmission.

5. The method of claim 1 wherein said step of providing said data packet of said parameter signals comprises:
generating a variable number of bits to represent linear predictive coefficient (LPC) vector signals of said frame of digitized speech samples, wherein said variable number of bits representing said LPC vector signals is responsive to said measured speech activity level;
generating a variable number of bits to represent pitch vector signals of said frame of digitized speech samples, wherein said variable number of bits representing said pitch vector signals is responsive to said measured speech activity level; and
generating variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples, wherein said variable number of bits representing said codebook excitation vector signals is responsive to said measured speech activity level.

6. The method of claim 1 wherein said step coding said frame comprises:
generating for said frame a variable number of linear prediction coefficients wherein said variable number of said linear prediction coefficients is responsive to said selected encoding rate;
generating for said frame a variable number of pitch coefficients wherein said variable number of said pitch coefficients is responsive to said selected encoding rate; and
generating for said frame a variable number of codebook excitation values wherein said variable number of said codebook excitation values is responsive to said selected encoding rate.

7. The method of claim 1 wherein said step of determining a level of speech activity comprises summing the squares of the values of said digitized speech samples.

8. The method of claim 7 further comprising the step of generating error protection bits for said data packet.

9. The method of claim 8 wherein said step of generating error protection bits for said data packet wherein the number of said protection bits is responsive to said frame of speech activity level.

10. The method of claim 1 wherein said step of adaptively adjusting speech activity threshold levels comprises the steps of:
comparing said measured speech activity to said at least one of speech activity thresholds and incrementally increasing said at least one of speech activity thresholds toward the level of said frame speech activity when said frame speech activity exceeds said at least one of said speech activity thresholds; and
comparing said measured speech activity to said at least one of speech activity thresholds and decreasing said at least one of speech activity thresholds to the level of said frame speech activity when said frame speech activity is less than said at least one of speech activity thresholds.

11. The method of claim 10 wherein said step of selecting an encoding rate is responsive to an external rate signal.

12. The method of claim 8 wherein said step of generating error protection for said data packet further comprises determining the values of said error protection bits in accordance with a cyclic block code.

13. The method of claim 1 further comprising the step of pre-multiplying said digitized speech samples by a predetermined windowing function.

14. The method of claim 1 further comprising the step of converting said LPC coefficients to line spectral pair (LSP) values.

15. The apparatus of claim 1 wherein said input frame of digitized samples comprises digitized values for approximately twenty milliseconds of speech.

16. The apparatus of claim 1 wherein said input frame of digitized samples comprises approximately 160 digitized samples.

17. The apparatus of claim 1 wherein said output data packet comprises:
one hundred and seventy one bits comprised of forty bits for LPC data, forty bits for pitch data, eighty bits for excitation vector data and eleven bits for error protection when said output data rate is full rate;
eighty bits comprised of twenty bits for LPC information, twenty bits for pitch information and forty bits for excitation vector data when said output data rate is half rate;
forty bits comprised of ten bits for LPC information, ten bits for pitch information and twenty bits for excitation vector data when said output data rate is quarter rate; and
sixteen bits comprised of ten bits for LPC information and six bits for excitation vector information when said output data rate is eighth rate.

18. An apparatus for compressing an acoustical signal into variable rate data comprising:
means for determining a level of audio activity for an input frame of digitized samples of said acoustical signal;
means for selecting an output data rate from a predetermined set of rates based upon said determined level of audio activity within said frame;
means for coding said frame according to a coding format of a set of coding formats for said selected rate to provide a plurality of parameter signals wherein each rate has a corresponding different coding format with each coding format providing a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model; and
means for providing for said frame a corresponding data packet at a data rate corresponding to said selected rate.

19. The apparatus for compressing an acoustical signal of claim 18 wherein said output data packet comprises:
a variable number of bits to represent LPC vector signals of said frame of digitized speech samples, wherein said variable number of bits for representing said LPC vector signals is responsive to said level of audio activity;
a variable number of bits to represent pitch vector signals of said frame of digitized speech samples, wherein said variable number of bits for representing said pitch vector signals is responsive to said level of audio activity; and
a variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples, wherein said variable number of bits for representing said codebook excitation vector signals is responsive to said level of audio activity.

20. The apparatus for compressing an acoustical signal of claim 18 wherein said means for determining said level of audio activity comprises:
means for determining an energy value for said input frame;
means for comparing said input frame energy with said at least one audio activity thresholds; and
means for providing an indication when said input frame activity exceeds each corresponding one of said at least one audio activity thresholds.

21. The apparatus of claim 20 further comprising a means for adaptively adjusting said at least one of said at least one audio activity thresholds.

22. The apparatus of claim 18 wherein said means for determining said energy of said input frame comprises:
squaring means for squaring said digitized audio samples of a frame; and
summing means for summing said squares of digitized audio samples of a frame.

23. The apparatus of claim 18 wherein said means for determining a level of audio activity comprises:
means for calculating a set of linear predictive coefficients for said input frame of digitized samples of said acoustical signals; and
means for determining said level of audio activity in accordance with at least one of said linear predictive coefficients.

24. The apparatus of claim 18 further comprising means for providing error protection bits for said data packet responsive to said selected output data rate.

25. The apparatus of claim 24 wherein said means for providing error protection bits provides the values of said error protection bits in accordance with a cyclic block code.

26. The apparatus of claim 18 further comprising a means for converting said LPC coefficients to line spectral pair (LSP) values.

27. The apparatus of claim 18 wherein said set of rates comprises full rate, half rate, quarter rate and eighth rate:

28. The apparatus of claim 18 wherein said set of rates comprises 16 Kbps, 8 Kbps, 4 Kbps and 2 Kbps.

29. A circuit for compressing an acoustical signal into variable rate data comprising:
a circuit for determining a level of audio activity for an input frame of digitized samples of said acoustical signal;
a circuit for selecting an output data rate from a predetermined set of rates based upon said determined level of audio activity within said frame;
a circuit for coding said frame according to a coding format of a set of coding formats for said selected rate to provide a plurality of parameter signals wherein each rate has a corresponding different coding format with each coding format providing a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model; and
a circuit for providing for said frame a corresponding data packet at a data rate corresponding to said selected rate.

30. The circuit of claim 29 wherein said circuit for determining said level of audio activity comprises:
a circuit for determining an energy value for said input frame;
a circuit for comparing said input frame energy with said at least one audio activity thresholds; and
a circuit for providing an indication when said input frame activity exceeds each corresponding one of said at least one audio activity thresholds.

31. The circuit of claim 30 wherein said circuit for determining said level of audio activity further comprises a circuit for adaptively adjusting said at least one of said at least one audio activity thresholds.

32. The circuit of claim 29 wherein said circuit for determining said energy of said input frame determines said energy by summing the squares of the values of said digitized samples.

33. The circuit of claim 29 wherein said circuit for determining a level of audio activity determines said energy by calculating a set of linear predictive coefficients for said input frame and determines said level of audio activity in accordance with at least one of said linear predictive coefficients.

34. The circuit of claim 29 wherein said input frame of digitized samples comprises digitized speech for a duration of approximately twenty milliseconds.

35. The circuit of claim 29 wherein said input frame of digitized samples comprises 160 digitized samples.

36. The circuit of claim 29 further comprising means for providing error protection bits for said data packet responsive to said selected output data rate.

37. The circuit of claim 36 further comprises a means for determining the values of said error protection bits in accordance with a cyclic block code.

38. The circuit of claim 37 wherein said cyclic block code operates in accordance with a generator polynomial of 1+x.sup.3 +x.sup.5 +x.sup.6 +x.sup.8 +x.sup.9 +x.sup.10.

39. The circuit of claim 29 further comprising means for pre-multiplying said digitized samples by a predetermined windowing function.

40. The circuit claim 39 wherein said predetermined windowing function is a Hamming window.

41. The circuit of claim 29 wherein said set of rates comprises full rate, half rate, quarter rate and eighth rate.

42. The circuit for claim 29 wherein said set of rates comprises 16 Kbps, 8 Kbps, 4 Kbps and 2 Kbps.

43. The circuit of claim 29 wherein said output data packet comprises:
a variable number of bits to represent LPC vector signals of said frame of digitized speech samples, wherein said variable number of bits for representing said LPC vector signals is responsive to said level of audio activity;
a variable number of bits to represent pitch vector signals of said frame of digitized speech samples, wherein said variable number of bits for representing said pitch vector signals is responsive to said level of audio activity; and
a variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples, wherein said variable number of bits for representing said codebook excitation vector signals is responsive to said level of audio activity.

44. The circuit of claim 43 wherein said output data packet further comprises a variable number of bits for error protection, wherein said variable number of bits for error protection is responsive to said level of audio activity.

45. The circuit of claim 29 wherein said output data packet comprises:
one hundred and seventy one bits comprised of forty bits for LPC data, forty bits for pitch data, eighty bits for excitation vector data and eleven bits for error protection when said output data rate is full rate;
eighty bits comprised of twenty bits for LPC information, twenty bits for pitch information and forty bits for excitation vector data when said output data rate is half rate;
forty bits comprised of ten bits for LPC information, ten bits for pitch information and twenty bits for excitation vector data when said output data rate is quarter rate; and
sixteen bits comprised of ten bits for LPC information and six bits for excitation vector information when said output data rate is eighth rate.

46. The circuit of claim 29 wherein said means for selecting an encoding rate is responsive to an external rate signal.

47. The circuit of claim 29 further comprising a means for converting said LPC coefficients to line spectral pair (LSP) values.

48. A method of speech signal compression by variable rate coding of frames of digitized speech samples comprising the steps of:
multiplying one frame of digitized speech samples in a sequence of said frames of digitized speech samples by a windowing function to provide a windowed frame of speech data;
calculating a set of autocorrelation coefficients from said windowed frame of speech;
determining an encoding rate from said set of autocorrelation coefficients;
calculating from said set of autocorrelation coefficients a set of linear predictive coding (LPC) coefficients;
converting said set of LPC coefficients to a set of line spectral pair values;
quantizing said set of line spectral pair (LSP) coefficients in accordance with said rate command and said encoding rate;
selecting a pitch value from a predetermined set of pitch values to provide a selected pitch value for each pitch subframe in each frame of digitized speech;
quantizing said selected pitch value in accordance with said encoding rate and said rate command;
selecting a codebook value from a predetermined set of pitch values to provide a selected pitch value for a pitch frame;
quantizing said selected codebook value in accordance with said encoding rate and said rate command; and
generating an output data packet comprising said quantized line spectral pair values, quantized selected pitch value, and quantized selected codebook value.

Description

BACKGROUND OF THE INVENTION

I. Field of the Invention

The present invention relates to speech processing. Specifically, the present invention relates to a novel and improved method and system for compressing speech wherein the amount of compression dynamically varies while minimally impacting the quality of the reconstructed speech. Furthermore, since the compressed speech data is intended to be sent over a channel which may introduce errors, the method and system of the present invention also minimizes the impact of channel errors on voice quality.

II. Description of the Related Art

Transmission of voice by digital techniques has become widespread, particularly in long distance and digital radio telephone applications. This, in turn, has created interest in determining the least amount of information which can be sent over the channel which maintains the perceived quality of the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of 64 kilobits per second (kbps) is required to achieve a speech quality of conventional analog telephone. However, through the use of speech analysis, followed by the appropriate coding, transmission, and resynthesis at the receiver, a significant reduction in the data rate can be achieved.

Devices which employ techniques to compress voiced speech by extracting parameters that relate to a model of human speech generation are typically called vocoders. Such devices are composed of an encoder, which analyzes the incoming speech to extract the relevant parameters, and a decoder, which resynthesizes the speech using the parameters which it receives over the transmission channel. In order to be accurate, the model must be constantly changing. Thus the speech is divided into blocks of time, or analysis flames, during which the parameters are calculated. The parameters are then updated for each new frame.

Of the various classes of speech coders the Code Excited Linear Predictive Coding (CELP), Stochastic Coding or Vector Excited Speech Coding are of one class. An example of a coding algorithm of this particular class is described in the paper "A
4.8 kbps Code Excited Linear Predictive Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.

The function of the vocoder is to compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies inherent in speech. Speech typically has short term redundancies due primarily to the filtering operation of the vocal tract, and long term redundancies due to the excitation of the vocal tract by the vocal cords. In a CELP coder, these operations are modelled by two filters, a short term formant filter and a long term pitch filter. Once these redundancies are removed, the resulting residual signal can be modelled as white gaussian noise, which also must be encoded. The basis of this technique is to compute the parameters of a filter, called the LPC filter, which performs short-term prediction of the speech waveform using a model of the human vocal tract. In addition, long-term effects, related to the pitch of the speech, are modeled by computing the parameters of a pitch filter, which essentially models the human vocal chords. Finally, these filters must be excited, and this is done by determining which one of a number of random excitation waveforms in a codebook results in the closest approximation to the original speech when the waveform excites the two filters mentioned above. Thus the transmitted parameters relate to three items (1) the LPC filter, (2) the pitch filter and (3) the codebook excitation.

Although the use of vocoding techniques further the objective in attempting to reduce the amount of information sent over the channel while maintaining quality reconstructed speech, other techniques need be employed to achieve further reduction. One technique previously used to reduce the amount of information sent is voice activity gating. In this technique no information is transmitted during pauses in speech. Although this technique achieves the desired result of data reduction, it suffers from several deficiencies.

In many cases, the quality of speech is reduced due to clipping of the initial parts of word. Another problem with gating the channel off during inactivity is that the system users perceive the lack of the background noise which normally accompanies speech and rate the quality of the channel as lower than a normal telephone call. A further problem with activity gating is that occasional sudden noises in the background may trigger the transmitter when no speech occurs, resulting in annoying bursts of noise at the receiver.

In an attempt to improve the quality of the synthesized speech in voice activity gating systems, synthesized comfort noise is added during the decoding process. Although some improvement in quality is achieved from adding comfort noise, it does not substantially improve the overall quality since the comfort noise does not model the actual background noise at the encoder.

A more preferred technique to accomplish data compression, so as to result in a reduction of information that needs to be sent, is to perform variable rate vocoding. Since speech inherently contains periods of silence, i.e. pauses, the amount of data required to represent these periods can be reduced. Variable rate vocoding most effectively exploits this fact by reducing the data rate for these periods of silence. A reduction in the data rate, as opposed to a complete halt in data transmission, for periods of silence overcomes the problems associated with voice activity gating while facilitating a reduction in transmitted information.

It is therefore an object of the present invention to provide a novel and improved method and system for compressing speech using a variable rate vocoding technique.

SUMMARY OF THE INVENTION

The present invention implements a vocoding algorithm of the previously mentioned class of speech coders, Code Excited Linear Predictive Coding (CELP), Stochastic Coding or Vector Excited Speech Coding. The CELP technique by itself does provide a significant reduction in the amount of data necessary to represent speech in a manner that upon resynthesis results in high quality speech. As mentioned previously the vocoder parameters are updated for each frame. The vocoder of the present invention provides a variable output data rate by changing the frequency and precision of the model parameters.

The present invention differs most markedly from the basic CELP technique by producing a variable output data rate based on speech activity. The structure is defined so that the parameters are updated less often, or with less precision, during pauses in speech. This technique allows for an even greater decrease in the amount of information to be transmitted. The phenomenon which is exploited to reduce the data rate is the voice activity factor, which is the average percentage of time a given speaker is actually talking during a conversation. For typical two-way telephone conversations, the average data rate is reduced by a factor of 2 or more. During pauses in speech, only background noise is being coded by the vocoder. At these times, some of the parameters relating to the human vocal tract model need not be transmitted.

As mentioned previously a prior approach to limiting the amount of information transmitted during silence is called voice activity gating, a technique in which no information is transmitted during moments of silence. On the receiving side the period may be filled in with synthesized "comfort noise". In contrast, a variable rate vocoder is continuously transmitting data which in the preferred embodiment is at rates which range between approximately 8 kbps and 1 kbps. A vocoder which provides a continuous transmission of data eliminates the need for synthesized "comfort noise", with the coding of the background noise providing a more natural quality to the resynthesized speech. The present invention therefore provides a significant improvement in resynthesized speech quality over that of voice activity gating by allowing a smooth transition between speech and background.

The present invention further incorporates a novel technique for masking the occurrence of errors. Because the data is intended for transmission over a channel that may be noisy, a radio link for example, it must accommodate errors in the data. Previous techniques using channel coding to reduce the number of errors encountered can provide some success in reducing errors. However, channel coding alone does not fully provide the level of errors protection necessary to ensure high quality in the reconstructed speech. In the variable rate vocoder where vocoding is occurring continuously, an error may destroy data relating to some interesting speech event, such as the start of a word or a syllable. A typical problem with linear prediction coding (LPC) based vocoders, is that errors in the parameters relating to the vocal tract model will cause sounds which are vaguely human-like, and which may change the sound of the original word enough to confuse the listener. In the present invention, errors are masked to decrease their perceptibility to the listener. Error masking thus as implemented in the present invention provides a drastic decrease in the affect of errors on speech intelligibility.

Because the maximum amount that any parameter can change is limited to smaller ranges at low rates, errors in the parameters transmitted at these rates will affect speech quality less. Since errors in the different rates have different perceived effects on speech quality, the transmission system can be optimized to give more protection to the higher rate data. Therefore as an added feature, the present invention provides a robustness to channel errors.

The present invention in implementing a variable rate output version of the CELP algorithm results in speech compression which dynamically varies from 8:1 to 64:1 depending on the voice activity. The just mentioned compression factors are cited with reference to a .mu.law input, with the compression factors higher by a factor of 2 for a linear input. Rate determination is made on a frame by frame basis so as to take full advantage of the voice activity factor. Even though less data is produced for pauses in speech, the perceived degradation of the resynthesized background noise is minimized. Using the techniques of the present invention, near-toll quality speech can be achieved at a maximum data rate of 8 kbps and an average data rate on the order of 3.5 kbps in normal conversation.

Since the present invention enables short pauses in speech to be detected, a decrease in the effective voice activity factor is realized. Rate decisions can be made on a frame by frame basis with no hangover, so the data rate may be lowered for pauses in speech as short as the frame duration, typically 20 msec. in the preferred embodiment. Therefore pauses such as those between syllables may be captured. This technique decreases the voice activity factor beyond what has traditionally been considered, as not only long duration pauses between phrases, but also shorter pauses can be encoded at lower rates.

Since rate decisions are made on a frame basis, there is no clipping of the initial part of the word, such as in a voice activity gating system. Clipping of this nature occurs in voice activity gating system due to a delay between detection of the speech and a restart in transmission of data. Use of a rate decision based upon each frame results in speech where all transitions have a natural sound.

With the vocoder always transmitting, the speaker's ambient background noise will continually be heard on the receiving end thereby yielding a more natural sound during speech pauses. The present invention thus provides a smooth transition to background noise. What the listener hears in the background during speech will not suddenly change to a synthesized comfort noise during pauses as in a voice activity gating system.

Since background noise is continually vocoded for transmission, interesting events in the background can be sent with full clarity. In certain cases the interesting background noise may even be coded at the highest rate. Maximum rate coding may occur, for example, when there is someone talking loudly in the background, or if an ambulance drives by a user standing on a street corner. Constant or slowly varying background noise will, however, be encoded at low rates.

The use of variable rate vocoding has the promise of increasing the capacity of a Code Division Multiple Access (CDMA) based digital cellular telephone system by more than a factor of two. CDMA and variable rate vocoding are uniquely matched, since, with CDMA, the interference between channels drops automatically as the rate of data transmission over any channel decreases. In contrast, consider systems in which transmission slots are assigned, such as TDMA or FDMA. In order for such a system to take advantage of any drop in the rate of data transmission, external intervention is required to coordinate the reassignment of unused slots to other users. The inherent delay in such a scheme implies that the channel may be reassigned only during long speech pauses. Therefore, full advantage cannot be taken of the voice activity factor. However, with external coordination, variable rate vocoding is useful in systems other than CDMA because of the other mentioned reasons.

In a CDMA system speech quality can be slightly degraded at times when extra system capacity is desired. Abstractly speaking, the vocoder can be thought of as multiple vocoders all operating at different rates with different resultant speech qualities. Therefore the speech qualities can be mixed in order to further reduce the average rate of data transmission. Initial experiments show that by mixing full and half rate vocoded speech, e.g. the maximum allowable data rate is varied on a frame by frame basis between 8 kbps and 4 kbps, the resulting speech has a quality which is better than half rate variable, 4 kbps maximum, but not as good as full rate variable, 8 kbps maximum.

It is well known that in most telephone conversations, only one person talks at a time. As an additional function for full-duplex telephone links a rate interlock may be provided. If one direction of the link is transmitting at the highest transmission rate, then the other direction of the link is forced to transmit at the lowest rate. An interlock between the two directions of the link can guarantee no greater than 50% average utilization of each direction of the link. However, when the channel is gated off, such as the case for a rate interlock in activity gating, there is no way for a listener to interrupt the talker to take over the talker role in the conversation. The present invention readily provides the capability of a rate interlock by control signals which set the vocoding rate.

Finally, it should be noted that by using a variable rate vocoding scheme, signalling information can share the channel with speech data with a very minimal impact on speech quality. For example, a high rate frame may be split into two pieces, half for sending the lower rate voice data and the other half for the signalling data. In the vocoder of the preferred embodiment only a slight degradation in speech quality between full and half rate vocoded speech is realized. Therefore, the vocoding of speech at the lower rate for shared transmission with other data results in an almost imperceptible difference in speech quality to the user.

BRIEF DESCRIPTION OF THE DRAWINGS

The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:

FIGS. 1a-1e illustrates in graphical form the vocoder analysis frames and subframes for various rates;

FIGS. 2a-2d are a series of charts illustrating the vocoder output bit distribution for various rates;

FIG. 3 is a generalized block diagram of an exemplary encoder;

FIG. 4 is an encoder flow chart;

FIG. 5 is a generalized block diagram of an exemplary decoder;

FIG. 6 is a decoder flow chart;

FIG. 7 is a more detailed functional block diagram of the encoder;

FIG. 8 is a block diagram of an exemplary Hamming window and autocorrelation subsystems;

FIG. 9 is a is a block diagram of an exemplary rate determination subsystem;

FIG. 10 is a block diagram of an exemplary LPC analysis subsystem;

FIG. 11 is a block diagram of an exemplary LPC to LSP transformation subsystem;

FIG. 12 is a block diagram of an exemplary LPC quantization subsystem;

FIG. 13 is a block diagram of exemplary LSP interpolation and LSP to LPC transformation subsystems;

FIG. 14 is a block diagram of the adaptive codebook for the pitch search;

FIG. 15 is a block diagram of the encoder' decoder;

FIG. 16 is a block diagram of the pitch search subsystem;

FIG. 17 is a block diagram of the codebook search subsystem;

FIG. 18 is a block diagram of the data packing subsystem;

FIG. 19 is a more detailed functional block diagram of the decoder;

FIGS. 20a-20d are charts illustrating the decoder received parameters and subframe decoding data for various rates;

FIGS. 21a-21c are charts further illustrating the decoder received parameters and subframe decoding data for special conditions;

FIG. 22 is a block diagram of the LSP inverse quantization subsystem;

FIG. 23 is a block diagram in greater detail of the decoder with postfiltering and automatic gain control; and

FIG. 24 is a chart illustrating the adaptive brightness filter characteristics.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In accordance with the present invention, sounds such as speech and/or background noise are sampled and digitized using well known techniques. For example the analog signal may be converted to a digital format by the standard 8 bit/.mu.law format followed by a .mu.law/uniform code conversion. In the alternative, the analog signal may be directly converted to digital form in a uniform pulse code modulation (PCM) format. Each sample in the preferred embodiment is thus represented by one 16
bit word of data. The samples are organized into frames of input data wherein each frame is comprised of a predetermined number of samples. In the exemplary embodiment disclosed herein an 8 kHz sampling rate is considered. Each frame is comprised of
160 samples or of 20 msec. of speech at the 8 kHz sampling rate. It should be understood that other sampling rates and frame sizes may be used.

The field of vocoding includes many different techniques for speech coding, one of which is the CELP coding technique. A summary of the CELP coding technique is described in the previously mentioned paper "A 4.8 kbps Code Excited Linear Predictive Coder". The present invention implements a form of the CELP coding techniques so as to provide a variable rate in coded speech data wherein the LPC analysis is performed upon a constant number of samples, and the pitch and codebook searchs are performed on varying numbers of samples depending upon the transmission rate. In concept the CELP coding techniques as applied to the present invention are discussed with reference to FIGS. 3 and 5.

In the preferred embodiment of the present invention, the speech analysis frames are 20 msec. in length, implying that the extracted parameters are transmitted in a burst 50 times per second. Furthermore the rate of data transmission is varied from roughly 8 kbps to 4 kbps to 2 kbps, and to 1 kbps. At full rate (also referred to as rate 1), data transmission is at an 8.55 kbps rate with the parameters encoded for each frame using 171 bits including an 11 bit internal CRC (Cyclic Redundancy Check). Absent the CRC bits the rate would be 8 kbps. At half rate (also referred to as rate 1/2), data transmission is at a 4 kbps rate with the parameters encoded for each frame using 80 bits. At quarter rate (also referred to as rate 1/4), data transmission is at a 2 kbps rate with the parameters encoded for each frame using 40 bits. At eighth rate (also referred to as rate 1/8), data transmission is slightly less than a 1 kbps rate with the parameters encoded for each frame using 16 bits.

FIG. 1 graphically illustrates an exemplary analysis frame of speech data 10 and the relationship of a Hamming window 12 used in LPC analysis. LPC analysis frame, and pitch and codebook subframes for the different rates are illustrated in graphical form in FIGS. 2a-2d. It should be understood that the LPC analysis frame for all rates is the same size.

Referring now to the drawings, and in particular FIG. 1a, LPC analysis is accomplished using the 160 speech data samples of frame 10 which are windowed using Hamming window 12. As illustrated in FIG. 1a, the samples, s(n) are numbered 0-159
within each frame. Hamming window 12 is positioned such that it is offset within frame 10 by 60 samples. Thus Hamming window 12 starts at the 60.sup.th sample, s(59), of the current data frame 10 and continues through and inclusive of the 59.sup.th sample, s(58), of a following data frame 14. The weighted data generated for a current frame, frame 10, therefore also contains data that is based on data from the next frame, frame 14.

Depending upon the data transmission rate, searches are performed to compute the pitch filter and codebook excitation parameters multiple times on different subframes of data frame 10 as shown in FIGS. 1b-1e. It should be understood that in the preferred embodiment that only one rate is selected for frame 10 such that the pitch and codebook searches are done in various size subframes corresponding to the selected rate as described below. However for purposes of illustration, the subframe structure of the pitch and codebook searches for the various allowed rates of the preferred embodiment for frame 10 are shown in FIGS. 1b-1e.

At all rates, there is one LPC computation per frame 10 as illustrated in FIG. 1a. As illustrated in FIG. 1b, at full rate there are two codebook subframes 18 for each pitch subframe 16. At full rate there are four pitch updates, one for each of the four pitch subframes 16, each 40 samples long (5 msec.). Furthermore at full rate there are eight codebook updates, one for each of the eight codebook subframes 18, each 20 samples long (2.5 msec.).

At half rate, as illustrated in FIG. 1c, there are two codebook subframes 22 for each pitch subframe 20. Pitch is updated twice, once for each of the two pitch frames 20 while the codebook is updated four times, once for each of the four codebook subframe 22. At quarter rate, as illustrated in FIG. 1d, there are two codebook subframes 26 for the single pitch subframe 20. Pitch is updated once for pitch subframe 24 while the codebook twice, once for each of the two codebook subframe 26. As illustrated in FIG. 1e, at eighth rate, pitch is not determined and the codebook is updated only once in frame 28 which corresponds to frame 10.

Additionally, although the LPC coefficients are computed only once per frame, they are linearly interpolated, in a Line Spectral Pair (LSP) representation, up to four times using the resultant LSP frequencies from the previous frame to approximate the results of LPC analysis with the Hamming window centered on each subframe. The exception is that at full rate, the LPC coefficients are not interpolated for the codebook subframes. Further details on the LSP frequency computation is described later herein.

In addition to performing the pitch and codebook searches less often at lower rates, less bits are also allocated for the transmission of the LPC coefficients. The number of bits allocated at the various rates is shown in FIGS. 2a-2d. Each one of FIGS. 2a-2d represents the number of vocoder encoded data bits allocated to each 160 sample frame of speech. In FIGS. 2a-2d, the number in the respective LPC block 30a-30d is the number of bits used at the corresponding rate to encode the short term LPC coefficients. In the preferred embodiment the number of bits used to encode the LPC coefficients at full, half, quarter and eighth rates are respectively 40, 20, 10 and 10.

In order to implement variable rate coding, the LPCs are first transformed into Line Spectrum Pairs (LSP) and the resulting LSP frequencies are individually encoded using DPCM coders. The LPC order is 10, such that there are 10 LSP frequencies and 10 independent DPCM coders. The bit allocation for the DPCM coders is according to Table I.

TABLE I ______________________________________ DPCM CODER NUMBER 1 2 3 4 5 6 7 8 9 10 ______________________________________ RATE 1 4 4 4 4 4 4 4 4 4 4 RATE 1/2 2 2 2 2 2 2 2 2 2 2 RATE 1/4 1 1 1 1 1 1 1 1 1 1 RATE 1/8 1 1 1 1 1 1 1 1 1
1 ______________________________________

Both at the encoder and the decoder the LSP frequencies are converted back to LPC filter coefficients before for use in the pitch and codebook searches.

With respect to the pitch search, at full rate as illustrated in FIG. 2a, the pitch update is computed four times, once for each quarter of the speech frame. For each pitch update at the full rate, 10 bits are used to encode the new pitch parameters. Pitch updates are done a varying numbers of times for the other rates as shown in FIGS. 2b-2d. As the rate decreases the number of pitch updates also decreases. FIG. 2b illustrates the pitch updates for half rate which are computed twice, once for each half of the speech frame. Similarly FIG. 2c illustrates the pitch updates for quarter rate which is computed once every full speech frame. As was for full rate, 10 bits are used to encode the new pitch parameters for each half and quarter rate pitch update. However for eighth rate, as illustrated in FIG. 2d, no pitch update is computed since this rate is used to encode frames when little or no speech is present and pitch redundancies do not exist.

For each 10 bit pitch update, 7 bits represent the pitch lag and 3 bits represent the pitch gain. The pitch lag is limited to be between 17 and 143. The pitch gain is linearly quantized to between 0 and 2 for representation by the 3 bit value.

With respect to the codebook search, at full rate as illustrated in FIG. 2a, the codebook update is computed eight times, once for each eighth of the speech frame. For each codebook update at the full rate, 10 bits are used to encode the new codebook parameters. Codebook updates are done a varying number of times in the other rates as shown in FIGS. 2b-2d. However, as the rate decreases the number of codebook updates also decreases. FIGS. 2b illustrates the codebook updates for half rate which is computed four times, once for each quarter of the speech frame. FIG. 2c illustrates the codebook updates for quarter rate which is computed twice, once for each half of the speech frame. As was for full rate, 10 bits are used to encode the new codebook parameters for each half and quarter rate pitch update. Finally, FIG. 2d illustrates the codebook updates for eighth rate which is computed once every full speech frame. It should be noted that at eighth rate 6 are transmitted, 2 bits representative of the codebook gain while the other 4 bits are random bits. Further discussion on the bit allocations for the codebook updates are described in further detail below.

The bits allocated for the codebook updates represent the data bits needed to vector quantize the pitch prediction residual. For full, half and quarter rates, each codebook update is comprised of 7 bits of codebook index plus 3 bits of codebook gain for a total of 10 bits. The codebook gain is encoded using a differential pulse code modulation (DPCM) coder operating in the log domain. Although a similar bit arrangement can be used for eighth rate, an alternate scheme is preferred. At eighth rate codebook gain is represented by 2 bits while 4 randomly generated bits are used with the received data as a seed to a pseudorandom number generator which replaces the codebook.

Referring to the encoder block diagram illustrated in FIG. 3, the LPC analysis is done in an open-loop mode. From each frame of input speech samples s(n) the LPC coefficients (.alpha..sub.1 -.alpha..sub.10) are computed, as described later, by LPC analysis/quantization 50 for use in formant synthesis filter 60.

The computation of the pitch search, however, is done in a closed-loop mode, often referred to as an analysis-by-synthesis method. However, in the exemplary implementation novel hybrid closed-loop/open-loop technique is used in conducting the pitch search. In the pitch search encoding is performed by selecting parameters which minimize the mean square error between the input speech and the synthesized speech. For purposes of simplification in this portion of the discussion the issue of rate is not considered. However further discussion on the effect of the selected rate on pitch and codebook searches is discussed in more detail later herein.

In the conceptual embodiment illustrated in FIG. 3, perceptual weighting filter 52 is characterized by the following equations: ##EQU1## is the formant prediction filter and .mu. is a perceptual weighting parameter, which in the exemplary embodiment .mu.=0.8. Pitch synthesis filter 58 is characterized by the following equation: ##EQU2## Formant synthesis filter 60, a weighted filter as discussed below, is characterized by the following equation: ##EQU3##

The input speech samples s(n) are weighted by perceptual weighting filter 52 so that the weighted speech samples x(n) are provided to a sum input of adder 62. Perceptual weighting is utilized to weight the error at the frequencies where there is less signal power. It is at these low signal power frequencies that the noise is more perceptually noticeable. The synthesized speech samples x'(n) are output from formant synthesis filter 60 to a difference input of adder 62 where subtracted from the x(n) samples. The difference in samples output from adder 62 are input to mean square error (MSE) element 64 where they are squared and then summed. The results of MSE element 64 are provided to minimization element 66 which generates values for pitch lag L, pitch gain b, codebook index I and codebook gain.

In minimization element 66 all possible values for L, the pitch lag parameter in P(z), are input to pitch synthesis filter 58 along with the value c(n) from multiplier 56. During the pitch search there is no contribution from the codebook, i.e. c(n)=0. The values of L and b that minimize the weighted error between the input speech and the synthesized speech are chosen by minimization element 66. Pitch synthesis filter 58 generates and outputs the value p(n) to formant synthesis filter 60. Once the pitch lag L and the pitch gain b for the pitch filter are found, the codebook search is performed in a similar manner.

It should be understood that FIG. 3 is a conceptual representation of the analysis-by-synthesis approach taken in the present invention. In the exemplary implementation of the present invention, the filters are not used in the typical closed loop feedback configuration. In the present invention, the feedback connection is broken during the search and replaced with an open loop formant residual, the details of which are provided later herein.

Minimization element 66 then generates values for codebook index I and codebook gain G. The output values from codebook 54, selected from a plurality of random gaussian vector values according to the codebook index I, are multiplied in multiplier
56 by the codebook gain G to produce the sequence of values c(n) used in pitch synthesis filter 58. The codebook index I and the codebook gain G that minimize the mean square error are chosen for transmission.

It should be noted that perceptual weighting W(z) is applied to both the input speech by perceptual weighting filter 52 and the synthesized speech by the weighting function incorporated within formant synthesis filter 60. Formant synthesis filter 60 is therefore actually a weighted formant synthesis filter, which combines the weighting function of equation 1 with the typical formant prediction filter characteristic 1/A(z) to result in the weighted formant synthesis function of equation 3.

It should be understood that in the alternative, perceptual weighting filter 52 may be placed between adder 62 and MSE element 64. In this case formant synthesis filter 60 would have the normal filter characteristic of 1/A(z).

FIG. 4 illustrates a flow chart of the steps involved in encoding speech with the encoder of FIG. 3. For purposes of explanation steps involving rate decision are included in the flow chart of FIG. 4. The digitized speech samples are obtained, block 80, from the sampling circuitry from which the LPC coefficients are then calculated, block 82. As part of the LPC coefficient calculation Hamming window and autocorrelation techniques are used. An initial rate decision is made, block 84, for the frame of interest based on frame energy in the preferred embodiment.

In order to efficiently code the LPC coefficients in a small number of bits, the LPC coefficients are transformed into Line Spectrum Pair (LSP) frequencies, block 86, and then quantized, block 88, for transmission. As an option an additional rate determination may be made, block 90, with an increase in the rate being made if the quantization of the LSPs for the initial rate is deemed insufficient, block 92.

For the first pitch subframe of the speech frame under analysis the LSP frequencies are interpolated and transformed to LPC coefficients, block 94, for use in conducting the pitch search. In the pitch search the codebook excitation is set to zero. In the pitch search, blocks 96 and 98, which is an analysis by synthesis method as previously discussed, for each possible pitch lag L the synthesized speech is compared with the original speech. For each value of L, an integer value, the optimum pitch gain b is determined. Of the sets of L and b values, the optimal L and b value set provide the minimum perceptually weighted mean square error between the synthesized speech and the original speech. For the determined optimum values of L and b for that pitch subframe, the value b is quantized, block 100, fox; transmission along with the corresponding L value. In an alternate implementation of the pitch search, the values b and L may be quantized values as part of the pitch search with these quantized values being used in conducting the pitch search. Therefore, in this implementation the need for quantization of the selected b value after the pitch search, block 100, is eliminated.

For the first codebook subframe of the speech frame under analysis the LSP frequencies are interpolated and transformed to LPC coefficients, block 102, for use in conducting the codebook search. In the exemplary embodiment however, at full rate the LSP frequencies are interpolated only down to the pitch subframe level. This interpolation and transformation step is performed for the codebook search in addition to that of the pitch search due to a difference in pitch and codebook subframe sizes for each rate, except for rate 1/8 where the issue is moot since no pitch data is computed. In the codebook search, blocks 104 and 106, the optimum pitch lag L and pitch gain b values are used in the pitch synthesis filter such that for each possible codebook index I the synthesized speech is compared with the original speech. For each value of I, an integer value, the optimum codebook gain G is determined. Of the sets of I and G values, the optimal I and G value set provides the minimum error between the synthesized speech and the original speech. For the determined optimum values of I and G for that codebook subframe, the value G is quantized, block 108, for transmission along with the corresponding I value. Again in an alternate implementation of the codebook search, the value of G may quantized as part of the codebook search with these quantized values being used in conducting the codebook search. In this alternate implementation the need for quantization of the selected G value after the codebook search, block 108, is eliminated.

After the codebook search a decoder within the encoder is run on the optimal values of I, G, L and b. Running of the encoder's decoder reconstructs the encoder filter memories for use in future subframes.

A check is then made block 110, to determine whether the codebook subframe upon which analysis was just completed was the last codebook subframe of the set of codebook subframes corresponding to the pitch subframe for which the pitch search was conducted. In other words a determination is made as to whether there are any more codebook subframes which correspond to the pitch subframe. In the exemplary embodiment there are only two codebook subframes per pitch subframe. If it is determined that there is another codebook subframe which corresponds to the pitch frame, steps 102-108 are repeated for that codebook subframe.

Should there be no more codebook subframes corresponding to the pitch frame, a check is made, block 112, to determine whether any other pitch subframes exist within the speech frame under analysis. If there is another pitch subframe in the current speech frame under analysis, steps 94-110 are repeated for each pitch subframe and corresponding codebook subframes. When all computations for the current speech frame under analysis are completed, values representative of the LPC coefficients for the speech frame, the pitch lag L and gain b for each pitch subframe, and the codebook index I and gain G for each codebook subframe are packed for transmission, block 114.

Referring to FIG. 5, a decoder block diagram is illustrated wherein the received values for the LPC coefficients (.alpha..sub.i 's), pitch lags and gains (L & b), and codebook indices and gains (I & G) are used to synthesize the speech. Again in FIG. 5, as is FIG. 3, rate information is not considered for purposes in simplification of the discussion. Data rate information can be sent as side information and in some instances can be derived at the channel demodulation stage.

The decoder is comprised of codebook 130 which is provided with the received codebook indices, or for eighth rate the random seed. The output from codebook 130 is provided to one input of multiplier 132 while the other input of multiplier 132
receives the codebook gain G. The output of multiplier 132 is provided along with the pitch lag L and gain b to pitch synthesis filter 134. The output from pitch synthesis filter 134 is provided along with the LPC coefficients .alpha..sub.i to formant synthesis filter 136. The output from formant synthesis filter 136 is provided to adaptive postfilter 138 where filtered and output therefrom is the reconstructed speech. As discussed later herein, a version of the decoder is implemented within the encoder. The encoder's decoder does not include adaptive postfilter 138, but does include a perceptual weighting filter.

FIG. 6 is a flow chart corresponding to the operation of the decoder of FIG. 5. At the decoder, speech is reconstructed from the received parameters, block 150. In particular, the received value of the codebook index is input to the codebook which generates a codevector, or codebook output value, block 152. The multiplier receives the codevector along with the received codebook gain G and multiplies these values, block 154, with the resulting signal provided to the pitch synthesis filter. It should be noted that the codebook gain G is reconstructed by decoding and inverse quantizing the received DPCM parameters. The pitch synthesis filter is provided with the received pitch lag L and gain b values along with the multiplier output signal so as to filter the multiplier output, block 156.

The values resulting from filtering the codebook vector by the pitch synthesis filter are input to the formant synthesis filter. Also provided to the formant synthesis filter are LPC coefficients .alpha..sub.i 's for use in filtering the pitch synthesis filter output signal, block 158. The LPC coefficients are reconstructed at the decoder for interpolation by decoding the received DPCM parameters into quantized LSP frequencies, inverse quantizing the LSP frequencies and transforming the LSP frequencies to LPC coefficients .alpha..sub.i 's. The output from the formant synthesis filter is provided to the adaptive postfilter where quantization noise is masked, and the reconstructed speech is gain controlled, block 160. The reconstructed speech is output, block 162, for conversion to analog form.

Referring now to the block diagram illustration of FIGS. 7a and 7b, further details on the speech encoding techniques of the present invention are described. In FIG. 7a, each frame of digitized speech samples is provided to a Hamming window subsystem 200 where the input speech is windowed before computation of the autocorrelation coefficients in autocorrelation subsystem 202.

Hamming window subsystem 200 and autocorrelation subsystem 202. are illustrated in an exemplary implementation in FIG. 8. Hamming window subsystem 200 which is comprised of lookup table 250, typically an a 80.times.16 bit Read Only Memory (ROM), and multiplier 252. For each rate the window of speech is centered between the 139th and the 140th sample of each analysis frame which is 160 samples long. The window for computing the autocorrelation coefficients is thus offset from the analysis frame by 60 samples.

Windowing is done using a ROM table containing 80 of the 160 W.sub.H (n) values, since the Hamming window is symmetric around the center. The offset of the Hamming window is accomplished by skewing the address pointer of the ROM by 60 positions with respect to the first sample of an analysis frame. These values are multiplied in single precision with the corresponding input speech samples by multiplier 252. Let s(n) be the input speech signal in the analysis window. The windowed speech signal s.sub.w (n) is thus defined by:

and

Exemplary values, in hexadecimal, of the contents of lookup table 250 are set forth in Table II. These values are interpreted as two's complement numbers having 14 fractional bits with the table being read in the order of left to right, top to bottom.

TABLE II __________________________________________________________________________ 0 .times. 051f 0 .times. 0525 0 .times. 0536 0 .times. 0554 0 .times. 057d 0 .times. 05b1 0 .times. 05f2 0 .times. 063d 0 .times. 0694 0 .times. 06f6
0 .times. 0764 0 .times. 07dc 0 .times. 085e 0 .times. 08ec 0 .times. 0983 0 .times. 0a24 0 .times. 0ad0 0 .times. 0b84 0 .times. 0c42 0 .times. 0d09 0 .times. 0dd9 0 .times. 0eb0 0 .times. 0f90 0 .times. 1077 0 .times. 1166 0 .times. 125b 0 .times. 1357 0 .times. 1459 0 .times. 1560 0 .times. 166d 0 .times. 177f 0 .times. 1895 0 .times. 19af 0 .times. 1acd 0 .times. 1bee 0 .times. 1d11 0 .times. 1e37 0 .times. 1f5e 0 .times. 2087 0 .times. 21b0 0 .times. 22da 0 .times. 2403 0 .times. 252d 0 .times. 2655 0 .times. 277b 0 .times. 28a0 0 .times. 29c2 0 .times. 2ae1 0 .times. 2bfd 0 .times. 2d15 0 .times. 2e29 0 .times. 2f39 0 .times. 3043 0 .times. 3148 0 .times. 3247 0 .times. 333f 0 .times. 3431 0 .times. 351c 0 .times. 3600 0 .times. 36db 0 .times. 37af 0 .times. 387a 0 .times. 393d 0 .times. 39f6 0 .times. 3aa6 0 .times. 3b4c 0 .times. 3be9 0 .times. 3c7b 0 .times. 3d03 0 .times. 3d80 0 .times. 3df3 0 .times. 3e5b 0 .times. 3eb7 0 .times. 3f09 0 .times. 3f4f 0 .times. 3f89 0 .times. 3fb8 0 .times. 3fdb 0 .times. 3ff3 0 .times. 3fff __________________________________________________________________________

Autocorrelation subsystem 202 is comprised of register 254, multiplexer 256, shift register 258, multiplier 260, adder 262, circular shift register 264 and buffer 266. The windowed speech samples s.sub.w (n) are computed every 20 msec. and latched into register 254. On sample s.sub.w (0), the first sample of an LPC analysis frame, shift registers 258 and 264 are reset to 0. On each new sample s.sub.w (n), multiplexer 256 receives a new sample select signal which allows the sample to enter from register 254. The new sample s.sub.w (n) is also provided to multiplier 260 where multiplied by the sample s.sub.w (n-10), which is in the last position SR10 of shift register 258. The resultant value is added in adder 262 with the value in the last position CSR11 of circular shift register 264.

Shift registers 258 and 260 clocked once, replacing s.sub.w (n-1) by s.sub.w (n) in the first position SR1 of shift register 258 and replacing the value previously in position CSR10. Upon clocking of shift register 258 the new sample select signal is removed from input to multiplexer 256 such that the sample s.sub.w (n-9) currently in the position SR10 of shift register 260 is allowed to enter multiplexer 256. In circular shift register 264 the value previously in position CSR11 is shifted into the first position CSR1. With the new sample select signal removed from multiplexer, shift register 258 is set to provide a circular shift of the data in the shift register like that of circular shift register 264.

Shift registers 258 and 264 are both clocked 11 times in all for every sample such that 11 multiply/accumulate operations are performed. After 160 samples have been clocked in, the autocorrelation results, which are contained in circular shift register 264, are clocked into buffer 266 as the values R(0)-R(10). All shift registers are reset to zero, and the process repeats for the next frame of windowed speech samples.

Referring back to FIG. 7a, once the autocorrelation coefficients have been computed for the speech frame, a rate determination subsystem 204 and an LPC analysis subsystem 206 use this data to respectively compute a frame data rate and LPC coefficients. Since these operations are independent from one another they may be computed in any order or even simultaneously. For purposes of explanation herein, the rate determination is described first.

Rate determination subsystem 204 has two functions: (1) to determine the rate of the current frame, and (2) to compute a new estimate of the background noise level. The rate for the current analysis frame is initially determined based on the current frame's energy, the previous estimate of the background noise level, the previous rate, and the rate command from a controlling microprocessor. The new background noise level is estimated using the previous estimate of the background noise level and the current frame energy.

The present invention utilizes an adaptive thresholding technique for rate determination. As the background noise changes so do the thresholds which are used in selecting the rate. In the exemplary embodiment, three thresholds are computed to determine a preliminary rate selection RT.sub.p. The thresholds are quadratic functions of the previous background noise estimate, and are shown below:

and

where B is the previous background noise estimate.

The frame energy is compared to the three thresholds T1(B), T2(B) and T3(B). If the frame energy is below all three thresholds, the lowest rate of transmission (1 kbps), rate 1/8 where RT.sub.p =4, is selected. If the frame energy is below two thresholds, the second rate of transmission (2 kbps), rate 1/4 where RT.sub.p =3, is selected. If the frame energy is below only one threshold, the third rate of transmission (4 kbps), rate 1/2 where RT.sub.p =2, is selected. If the frame energy is above all of the thresholds, the highest rate of transmission (8 kbps), rate 1 where RT.sub.p =1, is selected.

The preliminary rate RT.sub.p may then be modified based on the previous frame final rate RT.sub.r. If the preliminary rate RT.sub.p is less than the previous frame final rate minus one (RT.sub.r -1), an intermediate rate RT.sub.m is set where RT.sub.m =(RT.sub.r -1). This modification process causes the rate to slowly ramp down when a transition from a high energy signal to a low energy signal occurs. However should the initial rate selection be equal to or greater than the previous rate minus one (RT.sub.r -1), the intermediate rate RT.sub.m is set to the same as the preliminary rate RT.sub.p, i.e. RT.sub.m =RT.sub.p. In this situation the rate thus immediately increases when a transition from a low energy signal to a high energy signal occurs.

Finally, the intermediate rate RT.sub.m is further modified by rate bound commands from a microprocessor. If the rate RT.sub.m is greater than the highest rate allowed by the microprocessor, the initial rate RT.sub.i is set to the highest allowable value. Similarly, if the intermediate rate RT.sub.m is less than the lowest rate allowed by the microprocessor, the initial rate RT.sub.i is set to the lowest allowable value.

In certain cases it may be desirable to code all speech at a rate determined by the microprocessor. The rate bound commands can be used to set the frame rate at the desired rate by setting the maximum and minimum allowable rates to the desired rate. The rate bound commands can be used for special rate control situations such as rate interlock, and dim and burst transmission, both described later. In an alternate embodiment the LPC coefficients can be calculated prior to the rate determination. Since the calculated LPC coefficients reflect the spectral properties of the input speech frame, the coefficients can be used as an indication of speech activity. Thus, the rate determination can be done based upon the calculated LPC coefficients.

FIG. 9 provides an exemplary implementation of the rate decision algorithm. To start the computation, register 270 is preloaded with the value 1 which is provided to adder 272. Circular shift registers 274, 276 and 278 are respectively loaded with the first, second and third coefficients of the quadratic threshold equations (7)-(9). For example, the last, middle and first positions of circular shift register 274 are respectively loaded with the first coefficient of the equations from which T1, T2 and T3 are computed. Similarly, the last, middle and first positions of circular shift register 276 are respectively loaded with the second coefficient of the equations from which T1, T2 and T3 are computed. Finally, the last, middle and first positions of circular shift register 278 are respectively loaded with the constant term of the equations from which T1, T2 and T3 are computed. In each of circular shift registers 274, 276 and 278, the value is output from the last position.

In computing the first threshold T1 the previous frame background noise estimate B is squared by multiplying the value by itself in multiplier 280. The resultant B.sup.2 value is multiplied by the first coefficient, 5.544613(10.sup.-6), which is output from the last position of circular shift register 274. This resultant value is added in adder 286 with the product of the background noise B and the second coefficient, 4.047152, output from the last position of circular shift register 276, from multiplier 284. The output value from adder 286 is then added in adder 288 with the constant term, 363.1293, output from the last position of circular shift register 278. The output from adder 288 is the computed value of T1.

The computed value of T1 output from adder 290 is subtracted in adder 288 from the frame energy value E.sub.f which in the exemplary embodiment is the value R(0) in the linear domain, provided from the autocorrelation subsystem.

In an alternative implementation, frame energy E.sub.f may also be represented in the log domain in dB where it is approximated by the log of the first autocorrelation coefficient R(0) normalized by the effective window length: ##EQU4## where L.sub.A is the autocorrelation window length. It should also be understood that voice activity may also be measured from various other parameters including pitch prediction gain or formant prediction gain G.sub.a : ##EQU5## where E.sup.(10) is the prediction residual energy after the 10th iteration and E.sup.(0) is the initial LPC prediction residual energy, as described later with respect to LPC analysis, which is the same as R(0).

From the output of adder 290, the complement of the sign bit of the resulting two's complement difference is extracted by comparator or limiter 292 and provided to adder 272 where added with the output of register 270. Thus, if the difference between R(0) and T1 is positive, register 270 is incremented by one. If the difference is negative, register 270 remains the same.

Circular registers 274, 276 and 278 are then cycled so the coefficients of the equation for T2, equation (8) appear at the output thereof. The process of computing the threshold value T2 and comparing it with the frame energy is repeated as was discussed with respect to the process for threshold value T1. Circular registers 274, 276 and 278 are then again cycled so the coefficients of the equation for T3, equation (9) appear at the output thereof. The computation for threshold value T3 and comparison to the frame energy as was described above. After completion of all three threshold computations and comparisons, register 270 contains the initial rate estimate RT.sub.i. The preliminary rate estimate RT.sub.p is provided to rate ramp down logic 294. Also provided to logic 294 is the previous frame final rate RT.sub.r from LSP frequency quantization subsystem that is stored in register 298. Logic 296 computes the value (RT.sub.r -1) and provides as an output the larger of the preliminary rate estimate RT.sub.p and the value RT.sub.r -1). The value RT.sub.m is provided to rate limiter logic 296.

As mentioned previously, the microprocessor provides rate bound commands to the vocoder, particularly to logic 296. In a digital signal processor implementation, this command is received in logic 296 before the LPC analysis portion of the encoding process is completed. Logic 296 ensures that the rate does not exceed the rate bounds and modifies the value RT.sub.m should it exceed the bounds. Should the value RT.sub.m be within the range of allowable rates it is output from logic 296 as the initial rate value RT.sub.i. The initial rate value RT.sub.i is output from logic 296 to LSP quantization subsystem 210 of FIG. 7a.

The background noise estimate as mentioned previously is used in computing the adaptive rate thresholds. For the current frame the previous frame background noise estimate B is used in establishing the rate thresholds for the current frame. However for each frame the background noise estimate is updated for use in determining the rate thresholds for the next frame. The new background noise estimate B' is determined in the current frame based on the previous frame background noise estimate B and the current frame energy E.sub.f.

In determining the new background noise estimate B' for use during the next frame (as the previous frame background noise estimate B) two values are computed. The first value V.sub.1 is simply the current frame energy E.sub.f. The second value V.sub.2 is the larger of B+1 and KB, where K=1.00547. To prevent the second value from growing too large, it is forced to be below a large constant M=160,000. The smaller of the two values V.sub.1 or V.sub.2 is chosen as the new background noise estimate B'.

Mathematically,

and the new background noise estimate B' is:

where min (x,y) is the minimum of x and y, and max (x,y) is the maximum of x and y.

FIG. 9 further shows an exemplary implementation of the background noise estimation algorithm. The first value V.sub.1 is simply the current frame energy E.sub.f provided directly to one input of multiplexer 300.

The second value V.sub.2 is computed from the values KB and B+1, which are first computed. In computing the values KB and B+1, the previous frame background noise estimate B stored in register 302 is output to adder 304 and multiplier 306. It should be noted that the previous frame background noise estimate B stored in register 302 for use in the current frame is the same as the new background noise estimate B' computed in the previous frame. Adder 304 is also provided with an input value of
1 for addition with the value B so as to generate the term B+1. Multiplier 304 is also provided with an input value of K for multiplication with the value B so as to generate the term KB. The terms B+1 and KB are output respectively from adder 304 and multiplier 306 to separate inputs of both multiplexer 308 and adder 310.

Adder 310 and comparator or limiter 312 are used in selecting the larger of the terms B+1 and KB. Adder 310 subtracts the term B+1 from KB and provides the resulting value to comparator or limiter 312. Limiter 312 provides a control signal to multiplexer 308 so as to select an output thereof as the larger of the terms B+1 and KB. The selected term B+1 or KB is output from multiplexer 308 to limiter 314 which is a saturation type limiter which provides either the selected term if below the constant value M, or the value M if above the value M. The output from limiter 314 is provided as the second input to multiplexer 300 and as an input to adder 316.

Adder 316 also receives at another input the frame energy value E.sub.f. Adder 316 and comparator or limiter 318 are used in selecting the smaller of the value E.sub.f and the term output from limiter 314. Adder 316 subtracts the frame energy value from the value output from limiter 314 and provides the resulting value to comparator or limiter 318. Limiter 318 provides a control signal to multiplexer 300 for selecting the smaller of the E.sub.f value and the output from limiter 314. The selected value output from multiplexer 300 is provided as the new background noise estimate B' to register 302 where stored for use during the next frame as the previous frame background noise estimate B.

Referring back to FIG. 7, each of the autocorrelation coefficients R(0)-R(10) are output from autocorrelation subsystem 202 to LPC analysis subsystem 206. The LPC coefficients computed in LPC analysis subsystem 206 in both the perceptual weighting filter 52 and formant synthesis filter 60.

The LPC coefficients may be obtained by the autocorrelation method using Durbin's recursion as discussed in Digital Processing of Speech Signals, Rabiner & Schafer, Prentice-Hall, Inc., 1978. This technique is an efficient computational method for obtaining the LPC coefficients. The algorithm can be stated in the following equations: ##EQU6## The ten LPC coefficients are labeled .alpha..sub.j.sup.(10), for 1<=j<=10

Prior to encoding of the LPC coefficients, the stability of the filter must be ensured. Stability of the filter is achieved by radially scaling the poles of the filter inward by a slight amount which decreases the magnitude of the peak frequency responses while expanding the bandwidth of the peaks. This technique is commonly known as bandwidth expansion, and is further described in the article "Spectral Smoothing in PARCOR Speech Analysis-Synthesis" by Tohkura et.al., ASSP Transactions, December 1978. In the present case bandwidth expansion can be efficiently done by scaling each LPC coefficient. Therefore, as set forth in Table III, the resultant LPC coefficients are each multiplied by a corresponding hex value to yield the final output LPC coefficients .alpha..sub.1 -.alpha..sub.10 of LPC analysis subsystem 206. It should be noted that the values presented in Table III are given in hexadecimal with 15 fractional bits in two's complement notation. In this form the value
0.times.8000 represents -1.0 and the value 0.times.7333 (or 29491) represents 0.899994=29491/32768.

TABLE III ______________________________________ .alpha..sub.1 = .alpha..sub.1.sup.(10) .cndot. 0 .times. 7333 .alpha..sub.2 = .alpha..sub.2.sup.(10) .cndot. 0 .times. 67ae .alpha..sub.3 = .alpha..sub.3.sup.(10) .cndot. 0 .times. 5d4f .alpha..sub.4 = .alpha..sub.4.sup.(10) .cndot. 0 .times. 53fb .alpha..sub.5 = .alpha..sub.5.sup.(10) .cndot. 0 .times. 4b95 .alpha..sub.6 = .alpha..sub.6.sup.(10) .cndot. 0 .times. 4406 .alpha..sub.7 = .alpha..sub.7.sup.(10) .cndot. 0 .times. 3d38 .alpha..sub.8 = .alpha..sub.8.sup.(10) .cndot. 0 .times. 3719 .alpha..sub.9 = .alpha..sub.9.sup.(10) .cndot. 0 .times. 3196 .alpha..sub.10 = .alpha..sub.10.sup.(10) .cndot. 0 .times. ______________________________________ 2ca1

The operations are preferrably performed in double precision, i.e. 32 bit divides, multiplies and additions. Double precision accuracy is preferred in order to maintain the dynamic range of the autocorrelation functions and filter coefficients.

In FIG. 10, a block diagram of an exemplary embodiment of the LPC subsystem 206 is shown which implements equations (15)-(20) above. LPC subsystem 206 is comprised of three circuit portions, a main computation circuit 330 and two buffer update circuits 332 and 334 which are used to update the registers of the main computation circuit 330. Computation is begun by first loading the values R(1)-R(10) into buffer 340. To start the calculation, register 348 is preloaded with the value R(1) via multiplexer 344. Register is initialized with R(0) via multiplexer 350, buffer 352 (which holds 10 .alpha.j.sup.(i-1) values) is initialized to all zeroes via multiplexer 354, buffer 356 (which holds 10 .alpha.j.sup.(i) values) is initialized to all zeroes via multiplexer 358, and i is set to 1 for the computational cycle. For purposes of clarity counters for i and j and other computational cycle control are not shown but the design and integration of this type of logic circuitry is well within the ability of one skilled in the art in digital logic design.

The .alpha.j.sup.(i-1) value is output from buffer 356 to compute the term k.sub.i E.sup.(i-1) as set forth in equation (14). Each value R(i-j) is output from buffer 340 for multiplication with the .alpha.j.sup.(i-1) value in multiplier 360. Each resultant value is subtracted in adder 362 from the value in register 346. The result of each subtraction is stored in register 346 from which the next term is subtracted. There are i-1 multiplications and accumulations in the i.sup.th cycle, as indicated in the summation term of equation (14). At the end of this cycle, the value in register 346 is divided in divider 364 by the value E.sup.(i-1) from register 348 to yield the value k.sub.i.

The value k.sub.i is then used in buffer update circuit 332 to calculate the value E.sup.(i) as in equation (19) above, which is used as the value E.sup.(i-1) during the next computational cycle of k.sup.i. The current cycle value k.sub.i is multiplied by itself in multiplier 366 to obtain the value k.sub.i.sup.2. The value k.sub.i.sup.2 is then subtracted from the value of 1 in adder 368. The result of this addition is multiplied in multiplier 370 with the value E.sup.(i) from register
348. The resulting value E.sup.(i) is input to register 348 via multiplexer 350 for storage as the value E.sup.(i-1) for the next cycle.

The value k.sub.i is then used to calculate the value .alpha..sub.i.sup.(i) as in equation (15). In this case the value k.sub.i is input to buffer 356 via multiplexer 358. The value k.sub.i is also used in buffer update circuit 334 to calculate the values .alpha..sub.j.sup.(i) from the values .alpha..sub.j.sup.(i-1) as in equation (18). The values currently stored in buffer 352 are used in computing the values .alpha..sub.j.sup.(i). As indicated in equation (18), there are i-1 calculations in the i.sup.th cycle. In the i=1 iteration no such calculations are required For each value of j for the i.sup.th cycle a value of .alpha..sub.j.sup.(i) is computed. In computing each value of .alpha..sub.j.sup.(i), each value of .alpha..sub.j.sup.(i-1) is multiplied in multiplier 372 with the value k.sub.i for output to adder 374. In adder 374 the value k.sub.i .alpha..sub.i-j.sup.(i-1) is subtracted from the value .alpha..sub.j.sup.(i-1) also input to adder 374. The result of each multiplication and addition is provided as the value of .alpha..sub.j.sup.(i) to buffer 356 via multiplexer 358.

Once the values .alpha..sub.i.sup.(i) and .alpha..sub.j.sup.(i) are computed for the current cycle, the values just computed and stored in buffer 356 are output to buffer 352 via multiplexer 354. The values stored in buffer 356 are stored in corresponding positions in buffer 352. Buffer 352 is thus updated for computing the value k.sub.i for the i+1 cycle.

It is important to note that data .alpha..sub.j.sup.(i-1) generated at the end of a previous cycle is used during the current cycle to generate updates .alpha..sub.j.sup.(i) for a next cycle. This previous cycle data must be retained in order to completely generate updated data for the next cycle. Thus two buffers 356 and 352 are utilized to preserve this previous cycle data until the updated data is completely generated.

The above description is written with respect to a parallel transfer of data from buffer 356 to buffer 352 upon completion of the calculation of the updated values. This implementation ensures that the old data is retained during the entire process of computing the new data, without loss of the old data before completely used as would occur in a single buffer arrangement. The described implementation is one of several implementations that are readily available for achieving the same result. For example, buffers 352 and 356 may be multiplexed such that upon calculating the value k.sub.i for a current cycle from values stored in a first buffer, the updates are stored in the second buffer for use during the next computational cycle. In this next cycle the value k.sub.i is computed from the values stored in the second buffer. The values in the second buffer and the value k.sub.i are used to generate updates for the next cycle with these updates stored in the first buffer. This alternating of buffers enables the retention of proceeding computational cycle values, from which updates are generated, while storing update values without overwriting the proceeding values which are needed to generate the updates. Usage of this technique can minimize the delay associated with the computation of the value k.sub.i for the next cycle. Therefore the updates for the multiplications/accumulations in computing k.sub.i may be done at the same time as the next value of .alpha..sub.j.sup.(i-1) is computed.

The ten LPC coefficients .alpha..sub.j.sup.(10), stored in buffer 356 upon completion of the last computational cycle (i=10), are scaled to arrive at the corresponding final LPC coefficients .alpha..sub.j. Scaling is accomplished by providing a scale select signal to multiplexers 344, 376 and 378 so that the scaling values stored in lookup table 342, hex values of Table III, are selected for output through multiplexer 344. The values stored in lookup table 342 are clocked out in sequence and input to multiplier 360. Multiplier 360 also receives via multiplexer 376 the .alpha..sub.j.sup.(10) values sequentially output from register 356. The scaled values are output from multiplier 360 via multiplexer 378 as an output to LPC to LSP transformation subsystem 208 (FIG. 7).

In order to efficiently encode each of the ten scaled LPC coefficients in a small number of bits, the coefficients are transformed into Line Spectrum Pair frequencies as described in the article "Line Spectrum Pair (LSP) and Speech Data Compression", by Soong and Juang, ICASSP '84. The computation of the LSP parameters is shown below in equations (21) and (22) along with Table IV.

The LSP frequencies are the ten roots which exist between 0 and .pi. of the following equations:

where the p.sub.n and q.sub.n values for n=1, 2, 3, 4 and are defined recursively in Table IV.

TABLE IV ______________________________________ p.sub.1 = -(.alpha..sub.1 + .alpha..sub.10) - 1 q.sub.1 = -(.alpha..sub.1 - .alpha..sub.10) + 1 p.sub.2 = -(.alpha..sub.2 + .alpha..sub.9) - p.sub.1 q.sub.2 = -(.alpha..sub.2 - .alpha..sub.9) + q.sub.1 p.sub.3 = -(.alpha..sub.3 + .alpha..sub.8) - p.sub.2 q.sub.3 = -(.alpha..sub.3 - .alpha..sub.8) + q.sub.2 p.sub.4 = -(.alpha..sub.4 + .alpha..sub.7) - p.sub.3 q.sub.4 = -(.alpha..sub.4 - .alpha..sub.7) + q.sub.3 p.sub.5 = -(.alpha..sub.5 + .alpha..sub.6) - p.sub.4 q.sub.5 = -(.alpha..sub.5 - .alpha..sub.6) + ______________________________________ q.sub.4

In Table IV, the .alpha..sub.1, . . . , .alpha..sub.10 values are the scaled coefficients resulting from the LPC analysis. The ten roots of equations (21) and (22) are scaled to between 0 and 0.5 for simplicity. A property of the LSP frequencies is that, if the LPC filter is stable, the roots of the two functions alternate; i.e. the lowest root, .omega..sub.1, is the lowest root of P(.omega.), the next lowest root, .omega..sub.2, is the lowest root of Q(.omega.), and so on. Of the ten frequencies, the odd frequencies are the roots of the P(.omega.), and the even frequencies are the roots of the Q(.omega.).

The root search is done as follows. First, the p and q coefficients are computed in double precision by adding the LPC coefficients as shown above. P(.omega.) is then evaluated every .pi./256 radians and these values are then evaluated for sign changes, which identify a root in that subregion. If a root is found, a linear interpolation between the two bounds of this region is then done to approximate the location of the root. One Q root is guaranteed to exist between each pair of P roots (the fifth Q root exists between the fifth P root and .pi.) due to the ordering property of the frequencies. A binary search is done between each pair of P roots to determine the location of the Q roots. For ease in implementation, each P root is approximated by the closest .pi./256 value and the binary search is done between these approximations. If a root is not found, the previous unquantized values of the LSP frequencies from the last frame in which the roots were found are used.

Referring now to FIG. 11, an exemplary implementation of the circuitry used to generate the LSP frequencies is illustrated. The above described operation requires a total of 257 possible cosine values between 0 and .pi., which are stored in double precision in a lookup table, cosine lookup table 400 which is addressed by mod 256 counter 402. For each value of j input to lookup table 400, an output of cos .omega., cos 2.omega., cos 3.omega.cos 4.omega. and cos 5.omega. are provided where:

where j is a count value.

The values cos .omega., cos 2.omega., cos 3.omega. and cos 4.omega. output from lookup table 400 are input to a respective multiplier 404, 406, 408, and 410, while the value cos 5.omega. is input directly to summer 412. These values are multiplied in a respective multiplier 404, 406, 408, and 410 with a respective one of the values p.sub.4, p.sub.3, p.sub.2 and p.sub.1 input thereto via multiplexers 414, 416, 418 and 420. The resultant values from this multiplication are also input to summer 412. Furthermore the value p.sub.5 is provided through multiplexer 422 to multiplier 424 with the constant value 0.5, i.e. 1/2, also provided to multiplier 424. The resultant value output from multiplier 424 is provided as another input to summer 412. Multiplexers 414-422 select between the values p.sub.1 -p.sub.5 or q.sub.1 -q.sub.5 in response to a p/q coefficient select signal, so as to use the same circuitry for computation of both the P(.omega.) and Q(.omega.) values. The circuitry for generating the p.sub.1 -p.sub.5 or q.sub.1 -q.sub.5 values is not shown but is readily implemented using a series of adders for adding and subtracting the LPC coefficients and p.sub.1 -p.sub.5 or q.sub.1 -q.sub.5 values, along with registers for storing the p.sub.1 -p.sub.5 or q.sub.1 -q.sub.5 values.

Summer 412 sums the input values to provide the output P(.omega.) or Q(.omega.) value as the case may be. For purposes of ease in further discussion the case of the values of P(.omega.) will be considered with the values of Q(.omega.) computed in a similar fashion using the q.sub.1 -q.sub.5 values. The current value of P(.omega.) is output from summer 412 where stored in register 426. The preceding value of P(.omega.), previously stored in register 426 is shifted to register 428. The sign bits of the current and previous values of P(.omega.) are exclusive OR'ed in exclusive OR gate 430 to give an indication of a zero crossing or sign change, in the form of an enable signal that is sent to linear interpolator 434. The current and previous value of P(.omega.) are also output from registers 426 and 428 to linear interpolator 434 which is responsive to the enable signal for interpolating the point between the two values of P(.omega.) at which the zero crossing occurs. This linear interpolation fractional value result, the distance from the value j-1, is provided to buffer 436 along with the value j from counter 256. Gate 430 also provides the enable signal to buffer 436 which permits the storage of the value j and the corresponding fractional value FV.sub.j.

The fractional value is subtracted from the value j as output from buffer 436 in adder 438, or in the alternative may be subtracted therefrom as input to buffer 436. In the alternative a register in the j line input to buffer 436 may be used such that the value j-1 is input to buffer 436 with the fractional value input also input thereto. The fractional value may be added to the value j-1 either before storage in register 436 or upon output thereof. In any case the combined value of j+FV.sub.j or (j-1)+FV.sub.j is output to divider 440 where divided by the input constant value of 512. The division operation may be simply be performed by merely changing the binary point location in the representative binary word. This division operation provides the necessary scaling to arrive at a LSP frequency between 0 and 0.5.

Each function evaluation of P(.omega.) or Q(.omega.) requires 5 cosine lookups, 4 double precision multiplications, and 4 additions. The computed roots are typically only accurate to about 13 bits, and are stored in single precision. The LSP frequencies are provided to LSP quantization subsystem 210 (FIG. 7) for quantization.

Once the LSP frequencies have been computed, they must be quantized for transmission. Each of the ten LSP frequencies centers roughly around a bias value. It should be noted that the LSP frequencies approximate the bias values when the input speech has flat spectral characteristics and no short term prediction can be done. The biases are subtracted out at the encoder, and a simple DPCM quantizer is used. At the decoder, the bias is added back. The negative of the bias value, in hexadecimal, for each LSP frequency, .omega..sub.1 -.omega..sub.10, as provided from the LPC to LSP transformation subsystem is set forth in Table V. Again the values given in Table V are in two's complement with 15 fractional bits. The hex value
0.times.8000 (or -32768) represents -1.0. Thus the first value in Table V, the value 0.times.fa2f (or -1489) represents -0.045441=-1489/32768.

TABLE V ______________________________________ Negative LSP Bias frequency Value ______________________________________ .omega..sub.1 0 .times. fa2f .omega..sub.2 0 .times. f45e .omega..sub.3 0 .times. ee8c .omega..sub.4 0 .times. e8bb .omega..sub.5 0 .times. e2e9 .omega..sub.6 0 .times. dd18 .omega..sub.7 0 .times. d746 .omega..sub.8 0 .times. d175 .omega..sub.9 0 .times. cba3 .sub. .omega..sub.10 0 .times. c5d2 ______________________________________

The predictor used in the subsystem is 0.9 times the quantized LSP frequency from the previous frame stored in a buffer in the subsystem. This decay constant of 0.9 is inserted so that channel errors will eventually die off.

The quantizers used are linear, but vary in dynamic range and step size with the rate. Also, in high rate frames more bits are transmitted for each LSP frequency, therefore the number of quantization levels depends upon the rate. In Table VI, the bit allocation and the dynamic range of the quantization are shown for each frequency at each rate. For example, at rate 1, .omega..sub.1 is uniformly quantized using 4 bits (that is, into 16 levels) with the highest quantization level being 0.025
and the lowest being -0.025.

TABLE VI ______________________________________ RATE Full Half Quarter Eighth ______________________________________ .omega..sub.1 4: .+-. .025 2: .+-. .015 1: .+-. .01 1: .+-. .01 .omega..sub.2 4: .+-. .04 2: .+-. .015 1: .+-. .01 1: .+-. .015 .omega..sub.3 4: .+-. .07 2: .+-. .03 1: .+-. .01 1: .+-. .015 .omega..sub.4 4: .+-. .07 2: .+-. .03 1: .+-. .01 1: .+-. .015 .omega..sub.5 4: .+-. .06 2: .+-. .03 1: .+-. .01 1: .+-. .015 .omega..sub.6 4: .+-. .06 2: .+-. .02 1: .+-. .01 1: .+-. .015 .omega..sub.7 4: .+-. .05 2: .+-. .02 1: .+-. .01 1: .+-. .01 .omega..sub.8 4: .+-. .05 2: .+-. .02 1: .+-. .01 1: .+-. .01 .omega..sub.9 4: .+-. .04 2: .+-. .02 1: .+-. .01 1: .+-. .01 .sub. .omega..sub.10 4: .+-. .04 2: .+-. .02 1: .+-. .01 1: .+-. .01 Total 40 bits 20 bits 10 bits 10 bits ______________________________________

If the quantization ranges for the rate chosen by the rate decision algorithm are not large enough or a slope overflow occurs, the rate is bumped up to the next higher rate. The rate continues to be bumped up until the dynamic range is accommodated or full rate is reached. In FIG. 12 an exemplary block diagram illustration of one implementation of the optional rate bump up technique is provided.

FIG. 12 illustrates in block diagram form an exemplary implementation of the LSP quantization subsystem 210 which includes the rate bump up circuitry. In FIG. 12, the current frame LSP frequencies are output from divider 440 (FIG. 11) to register 442 where they are stored for output during a rate bump up determination in the next frame. The previous frame LSP frequencies and the current frame LSP frequencies are output respectfully output from register 440 and divider 440 to rate bump up logic 442 for a current frame rate bump up determination. Rate bump up logic 442 also receives the initial rate decision, along with the rate the rate bound commands from rate determination subsystem 204. In determining whether a rate increase is necessary, logic 442 compares the previous frame LSP frequencies with the current frame LSP frequencies based on the sum of the square of the difference between the current and previous frame LSP frequencies. The resulting value is then compared with a threshold value for which if exceeded is an indication that an increase in rate is necessary to ensure high quality encoding of the speech. Upon exceeding the threshold value, logic 442 increments the initial rate by one rate level so as to provide an output of the final rate used throughout the encoder.

In FIG. 12, each LSP frequency value .omega..sub.1 -.omega..sub.10 is input one at a time to adder 450 along with the corresponding bias value. The bias value is subtracted from the input LSP value and the result thereof output to adder 452. Adder 452 also receives as an input a predictor value, a previous frame corresponding LSP value multiplied by a decay constant. The predictor value is subtracted from the output of adder 450 by adder 452. The output of adder 452 is provided as an input to quantizer 454.

Quantizer 454 is comprised of limiter 456, minimum dynamic range lookup table 458, inverse step size lookup table 460, adder 462, multiplier 464 and bit mask 466. Quantization is performed in quantizer 454 by first determining whether the input value is within the dynamic range of quantizer 454. The input value is provided to limiter 456 which limits the input value to the upper and lower bounds of the dynamic range if the input exceeds the bounds provided by lookup table 458. Lookup table
458 provides the stored bounds, according to Table VI, to limiter 456 in response to the rate input and the LSP frequency index i input thereto. The value output from limiter 456 is input to adder 462 where the minimum of the dynamic range, provided by lookup table 458 is subtracted therefrom. The value output from lookup table 458 is again determined by the rate and LSP frequency index i in accordance with the minimum dynamic range values, disregarding the value sign, set forth in Table VI. For example the value in lookup table 458 for (full rate, .omega..sub.1) is 0.025.

The output from adder 462 is then multiplied in multiplier 464 by a value selected from lookup table 460. Lookup table 460 contains values corresponding to the inverse of the step size for each LSP value at each rate in accordance with the values set forth in Table VI. The value output from lookup table 460 is selected by the rate and LSP frequency index i. For each rate and LSP frequency index i the value stored in lookup table 460 is the quantity ((2.sup.n- 1)/dynamic range), where n is the number of bits representing the quantized value. Again for example, the value in lookup table 460 for (rate 1, .omega..sub.1) is (15/0.05) or 300.

The output from multiplier 464 is a value between 0 and 2.sup.n-1 which is provided to bit mask 466. Bit mask 466 in response to the rate and LSP frequency index extracts from the input value the appropriate number of bits according to Table VI. The extracted bits are the n integer value bits of the input value so as to provide a bit limited output .DELTA..omega..sub.i. The values .DELTA..omega..sub.i are the quantized unbiased differentially encoded LSP frequencies that are transmitted over the channel representative of the LPC coefficients.

The value .DELTA..omega..sub.i is also fed back through a predictor comprised of inverse quantizer 468, adder 470, buffer 472 and multiplier 474. Inverse quantizer 468 is comprised of step size lookup table 476, minimum dynamic range lookup table 478, multiplier 480 and adder 482.

The value .DELTA..omega..sub.i is input to multiplier 480 along with a selected value from lookup table 476. Lookup table 476 contains values corresponding to the step size for each LSP value at each rate in accordance with the values set forth in Table VI. The value output from lookup table 476 is selected by the rate and LSP frequency index i. For each rate and LSP frequency index i the value stored in lookup table 460 is the quantity (dynamic range/2.sup.n-1), where n is the number of bits representing the quantized value. Multiplier 480 multiplies the input values and provides an output to adder 482.

Adder 482 receives as another input a value from lookup table 478. The value output from lookup table 478 is determined by the rate and LSP frequency index i in accordance with the minimum dynamic range values, disregarding the value sign, set forth in Table VI. Adder 482 adds the minimum dynamic range value provided by lookup table 478 with the value output from multiplier 480 with resulting value output to adder 470.

Adder 470 receives as another input the predictor value output from multiplier 474. These values are added in adder 470 and stored in ten word storage buffer 472. Each value previous frame value output from buffer 472 during the current frame is multiplied in multiplier 474 by a constant, 0.9. The predictor values as output from multiplier 474 are provided to both adders 452 and 470 as previously discussed.

In the current frame the value stored in buffer 472 is the previous frame reconstructed LSP values minus the bias value. Similarly in the current frame the value output from adder 470 is the current frame reconstructed LSP values also without bias. In the current frame the output from buffer 472 and adder 470 are respectively provided to adders 484 and 486 where the bias is added into the values. The values output from adders 484 and 486 are respectively the previous frame reconstructed LSP frequency values and the current frame reconstructed LSP frequency values. LSP smoothing is done at the lower rates according to the equation:

where

a=0 for full rate;

a=0.1 for half rate;

a=0.5 for quarter rate; and

a=0.85 for eighth rate.

The previous frame (f-1) reconstructed LSP frequency .omega.'.sub.i,f-1 values and the current frame (f) reconstructed LSP frequency values .omega.'.sub.i,f are output from quantization subsystem 210 to pitch subframe LSP interpolation subsystem
216 and codebook subframe LSP interpolation subsystem 226. The quantized LSP frequency values .DELTA..omega..sub.i are output from LSP quantization subsystem 210 to data assembler subsystem 236 for transmission.

The LPC coefficients used in the weighting filter and the formant synthesis filter described later are appropriate for the pitch subframe which is being encoded. For pitch subframes, the interpolation of the LPC coefficients is done once for each pitch subframe and are as follows in Table VII:

TABLE VII ______________________________________ Rate 1: .omega..sub.i = 0.75.omega.'.sub.i,f-1 + 0.25.omega.'.sub.i,f for pitch subframe 1 .omega..sub.i = 0.5.omega.'.sub.i,f-1 + 0.5.omega.'.sub.i,f for pitch subframe 2 .omega..sub.i =
0.25.omega.'.sub.i,f-1 + 0.75.omega.'.sub.i,f for pitch subframe 3 .omega..sub.i = .omega.'.sub.i,f Rate 1/2: .omega..sub.i = 0.625.omega.'.sub.i,f-1 + 0.375.omeg