United States Patent 5222189
Fielder, June 22, 1993
Title: Low time-delay transform coder, decoder, and encoder/decoder for high-quality audio
Abstract
A low bit-rate (192 kbits per second) transform encoder/decoder system (44.1 kHz or 48 kHz sampling rate) for high-quality music applications employs short time-domain sample blocks (128 samples/block) so that the system signal propagation delay is short enough for real-time aural feedback to a human operator. Carefully designed pairs of analysis/synthesis windows are used to achieve sufficient transform frequency selectivity despite the use of short sample blocks. A synthesis window in the decoder has characteristics such that the product of its response and that of an analysis window in the encoder produces a composite response which sums to unity for two adjacent overlapped sample blocks. Adjacent time-domain signal sample blocks are overlapped and added to cancel the effects of the analysis and synthesis windows. A technique is provided for deriving suitable analysis/synthesis window pairs. In the encoder, a discrete transform having a function equivalent to the alternate application of a modified Discrete Cosine Transform and a modified Discrete Sine Transform according to the Time Domain Aliasing Cancellation technique or, alternatively, a Discrete Fourier Transform is used to generate frequency-domain transform coefficients. The transform coefficients are nonuniformly quantized by assigning a fixed number of bits and a variable number of bits determined adaptively based on psychoacoustic masking. A technique is described for assigning the fixed bit and adaptive bit allocations. The transmission of side information regarding adaptively allocated bits is not required. Error codes and protected data may be scattered throughout the formatted frames output by the encoder in order to reduce sensitivity to noise bursts.
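The unity-sum condition in the abstract can be checked numerically. The sketch below is illustrative only: it assumes 50% overlap between 128-sample blocks (a 64-sample hop) and uses a sine window for both analysis and synthesis, a common textbook pair chosen for brevity rather than the window pair the patent derives.

```python
import numpy as np

BLOCK = 128          # samples per block, as stated in the abstract
HOP = BLOCK // 2     # assumption: 50% overlap between adjacent blocks

n = np.arange(BLOCK)
analysis = np.sin(np.pi * (n + 0.5) / BLOCK)   # stand-in analysis window
synthesis = analysis.copy()                     # stand-in synthesis window
product = analysis * synthesis                  # composite response

# The tail of one block lines up with the head of the next block.
overlap_sum = product[:HOP] + product[HOP:]
max_error = np.max(np.abs(overlap_sum - 1.0))
print(f"max deviation from unity: {max_error:.2e}")
```

Here the product is sin^2, and sin^2 plus its copy shifted by a quarter period (cos^2) is exactly one; any pair whose product has this property would pass the same check.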
Inventors: Fielder; Louis D. (Millbrae, CA)
Assignee: Dolby Laboratories Licensing Corporation (San Francisco, CA)
Appl. No.: 582956
Filed: September 26, 1990
PCT 102e Date: September 26, 1990
PCT 371 Date: September 26, 1990
PCT File Date: January 29, 1990
PCT No.: PCT/US90/00507
Current U.S. Class: 704/229; 704/230
Field of Search: 381/29-40; 395/2
U.S. Patent Documents
4216354, August 1980, Esteban et al.
4455649, June 1984, Esteban et al.
4703480, October 1987, Westall et al.
4790016, December 1988, Mazor et al.
4914701, April 1990, Zibman
5109417, April 1992, Fielder et al.
5115240, April 1992, Fujiwara et al.
Foreign Patent Documents
0176243, Apr. 1986, EP
0193143, Sep. 1986, EP
0217017, Apr. 1987, EP
0289080, Nov. 1988, EP
3440613, Apr. 1986, DE
3639753, Sep. 1988, DE
87/00723, Nov. 1987, WO
8903574, Apr. 1989, WO
Other References
D. Esteban, C. Galand, "32 KBPS CCITT Compatible Split Band Coding Scheme," IEEE Int. Conf. on Acoust., Speech, and Signal Proc., 1978, pp. 320-325.
Lee, "Effects of Delayed Speech Feedback," J. Acoust. Soc. Am., vol. 22, Nov. 1950, pp. 824-826.
Cooley, Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series," Math. Comput., vol. 19, 1965, pp. 297-301.
Parks, McClellan, "Chebyshev Approximation for Nonrecursive Digital Filters with Linear Phase," IEEE Trans., vol. CT-19, Mar. 1972, pp. 189-194.
Brigham, The Fast Fourier Transform, Englewood Cliffs, NJ: Prentice-Hall, Inc., 1974, pp. 166-169.
Lee, Lipschutz, "Floating-Point Encoding for Transcription of High-Fidelity Audio Signals," J. Audio Eng. Soc., vol. 25, May 1977, pp. 266-272.
Harris, "On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform," Proc. IEEE, vol. 66, Jan. 1978, pp. 51-83.
Tribolet, Crochiere, "Frequency Domain Coding of Speech," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-27, Oct. 1979, pp. 512-530.
Crochiere, "A Weighted Overlap-Add Method of Short-Time Fourier Analysis/Synthesis," IEEE Trans., vol. ASSP-28, Feb. 1980, pp. 99-102.
S. Prakash, V. V. Rao, "Fixed-Point Error Analysis of Radix-4 FFT," Signal Processing, vol. 3, Apr. 1981, pp. 123-133.
Brandenburg, Schramm, "A 16 Bit Adaptive Transform Coder for Real-Time Processing of Sound Signals," Signal Processing II, 1983, pp. 359-362.
Smith, Digital Transmission Systems, New York, NY: Van Nostrand Reinhold Co., 1985, pp. 228-236.
Fielder, "Pre- and Postemphasis Techniques as Applied to Audio Recording Systems," J. Audio Eng. Soc., vol. 33, 1985, pp. 649-657.
Press, Flannery, Teukolsky, Vetterling, Numerical Recipes: The Art of Scientific Computing, New York: Cambridge University Press, 1986, pp. 254-259.
Peterson, Weldon, Error-Correcting Codes, Cambridge, Mass.: The M.I.T. Press, 1986, pp. 269-309, 361-362.
Princen, Bradley, "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation," IEEE Trans., vol. ASSP-34, Oct. 1986, pp. 1153-1161.
Stoll, Theile, "New Digital Sound Transmission Methods--How is Sound Quality Assessed," Report, 14th Meeting of Audio Engineers, Munich, Nov. 1986.
Brandenburg, "OCF--A New Coding Algorithm for High Quality Sound Signals," IEEE Int. Conf. on Acoust., Speech, and Signal Proc., 1987, pp. 141-144.
Johnson, Bradley, "Adaptive Transform Coding Incorporating Time Domain Aliasing Cancellation," Speech Communication, vol. 6, 1987, pp. 299-308.
Fielder, "Evaluation of the Audible Distortion and Noise Produced by Digital Audio Converters," J. Audio Eng. Soc., vol. 35, Jul. 1987, pp. 517-534.
Audio Engineering Handbook, K. B. Benson ed., San Francisco: McGraw-Hill, 1988, pp. 1.40-1.42, 4.8-4.10.
Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE J. on Selected Areas in Comm., vol. 6, Feb. 1988, pp. 314-323.
Brandenburg, Kapust, et al., "Real-Time Implementation of Low Complexity Transform Coding," AES Preprint 2581, 84th Convention, Paris, 1988.
Lookabaugh, "Variable Rate and Adaptive Frequency Domain Vector Quantization of Speech," PhD Dissertation, Stanford University, Jun. 1988, pp. 166-182.
Brandenburg, Kapust, et al., "Low Bit Rate Codecs for Audio Signals Implementation in Real Time," AES Preprint 2707, 85th Convention, Nov. 1988.
Brandenburg, Seitzer, "OCF: Coding High Quality Audio with Data Rates of 64 kBit/Sec," AES Preprint 2723, 85th Convention, Los Angeles, Nov. 1988.
Feiten, "Spectral Properties of Audio Signals and Masking with Aspect to Bit Data Reduction," AES Preprint 2795, 86th Convention, Hamburg, Mar. 1989.
Edler, "Coding of Audio Signals with Overlapping Block Transform and Adaptive Window Functions," Frequenz, vol. 43, No. 9, 1989, pp. 252-256.
Primary Examiner: Knepper; David D.
Attorney, Agent or Firm: Gallagher; Thomas A., Lathrop; David N.
Parent Case Text
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of U.S. patent application Ser. No. 07/458,894 filed Dec. 29, 1989, application Ser. No. 07/439,868 filed Nov. 20, 1989, abandoned, and application Ser. No. 07/303,714 filed Jan. 27, 1989, abandoned.
Claims
I claim:
1. An encoder for the encoding of audio information comprising signal samples, said encoder comprising
means for receiving said signal samples,
subband means, including adaptive bit allocation means, for defining subbands and for generating subband information in response to said signal samples, said subband information for each of said subbands including one or more digital words, each of said digital words comprising an adaptive portion and a non-adaptive portion, wherein coding accuracy of said adaptive portion is established by said adaptive bit allocation means, and
formatting means for assembling digital information including said subband information into a digital output having a format suitable for transmission or storage.
2. An encoder according to claim 1 wherein the coding accuracy of said non-adaptive portion is less than the accuracy required to have no audible quantizing noise.
3. An encoder according to claim 1 wherein said subband means generates said subband information by applying a discrete transform function to blocks of said signal samples.
4. An encoder according to claim 1 wherein said subband means comprises filter bank means and means for storing coding information defining the coding accuracy for said non-adaptive portion, wherein said coding information is preestablished by comparing a representative frequency response for said filter bank means for each of said subbands to a corresponding psychoacoustic masking threshold representative of one or more of said subbands.
5. An encoder according to claim 4 wherein a psychoacoustic masking threshold having a relatively high selectivity for frequencies below a masking tone or narrow band of noise is taken as representative of the psychoacoustic masking threshold in lower frequency subbands and a psychoacoustic masking threshold having a relatively low selectivity for frequencies below a masking tone or narrow band of noise is taken as representative of the psychoacoustic masking threshold in higher frequency subbands.
6. An encoder according to claim 5 wherein a psychoacoustic masking threshold for a single tone or very narrow band of noise at about 1 kHz is taken as representative for subbands within the frequency range of about 500 Hz to 2 kHz and a psychoacoustic masking threshold for a single tone or very narrow band of noise at about 4 kHz is taken as representative for subbands above about 2 kHz.
7. An encoder according to claim 4 wherein said coding information defines said coding accuracy for said non-adaptive portion at a level less than the accuracy required to have no quantizing noise in excess of said corresponding psychoacoustic masking threshold.
8. An encoder according to claim 7 wherein said coding information defines said coding accuracy at a level two bits fewer than said accuracy required to have no quantizing noise in excess of said corresponding psychoacoustic masking threshold.
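Claims 5 through 8 can be illustrated with a small sketch. It assumes 64 uniform subbands at a 48 kHz sampling rate, and the per-subband "required bits" figures are hypothetical placeholders for the comparison of filter-bank response against masking threshold that claim 4 describes; only the 500 Hz and 2 kHz boundaries and the two-bit margin come from the claims.

```python
SAMPLE_RATE = 48_000   # one of the two rates named in the abstract
NUM_BANDS = 64         # assumption: 64 uniform subbands per block


def band_center_hz(k: int) -> float:
    """Center frequency of uniform subband k (assumed layout)."""
    return (k + 0.5) * SAMPLE_RATE / (2 * NUM_BANDS)


def representative_curve(center_hz: float):
    """Claim 6: 1 kHz masking curve for ~500 Hz to 2 kHz, 4 kHz curve above ~2 kHz."""
    if 500 <= center_hz <= 2000:
        return "1 kHz"
    if center_hz > 2000:
        return "4 kHz"
    return None  # the claims do not say which curve applies below ~500 Hz


def fixed_bits(required_bits: list[int], margin: int = 2) -> list[int]:
    """Claim 8: non-adaptive accuracy two bits fewer than the masking-derived need."""
    return [max(0, r - margin) for r in required_bits]


# Hypothetical per-band requirements; a real table would come from comparing
# each band's filter response with its representative masking threshold.
print(fixed_bits([7, 5, 1]))  # [5, 3, 0]
print([representative_curve(band_center_hz(k)) for k in (0, 1, 4, 5)])
# [None, '1 kHz', '1 kHz', '4 kHz']
```

The adaptive portion of each word would then add bits on top of these fixed counts.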
9. An encoder according to claim 1 or 4 wherein said subband means represents said subband information in block-floating-point form comprising one or more mantissas and one or more exponents, wherein said coding accuracy of said adaptive portion is based on an effective exponent value for each of said digital words, said effective exponent value derived from the value or values of said one or more exponents.
10. An encoder according to claim 9 wherein said subband information comprises one or more mantissas and a subband exponent for each of said subbands, each of said mantissas corresponding to a respective one of said digital words, said effective exponent value for each of said digital words equal to the value of the corresponding subband exponent.
11. An encoder according to claim 9 wherein said subband information comprises one or more mantissas and a subband exponent for each of said subbands, and one or more master exponents, each master exponent associated with a set of subbands, each of said mantissas corresponding to a respective one of said digital words, said effective exponent value for each of said digital words derived from a combination of the values of the corresponding subband exponent and the associated master exponent.
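A minimal sketch of block-floating-point coding with subband and master exponents (claims 9 to 11). The exponent convention (number of left shifts that normalizes the subband peak into [0.5, 1)), the weight of 8 shifts per master-exponent unit, and the 0-or-1 master range are all hypothetical choices for illustration; the claims only say the effective exponent is derived from a combination of the two. Mantissas are left unquantized here.

```python
MASTER_STEP = 8   # hypothetical: one master-exponent unit stands for 8 shifts
MAX_MASTER = 1    # hypothetical: the master exponent is 0 or 1
MAX_SHIFT = 31    # cap so that an all-zero subband still gets an exponent


def shifts(peak: float) -> int:
    """Left shifts needed to bring |peak| into [0.5, 1); assumes |peak| < 1."""
    e = 0
    while e < MAX_SHIFT and abs(peak) * 2 ** e < 0.5:
        e += 1
    return e


def encode_group(bands):
    """Split each subband's exponent into a shared master part and a subband part."""
    raw = [shifts(max(abs(x) for x in band)) for band in bands]
    master = min(min(raw) // MASTER_STEP, MAX_MASTER)
    subband_exps = [r - master * MASTER_STEP for r in raw]
    # Claim 11: effective exponent = combination of master and subband exponents.
    effective = [master * MASTER_STEP + s for s in subband_exps]
    mantissas = [[x * 2 ** e for x in band] for band, e in zip(bands, effective)]
    return master, subband_exps, mantissas


def decode_group(master, subband_exps, mantissas):
    """Undo the scaling using the same effective exponents."""
    return [[m * 2.0 ** -(master * MASTER_STEP + s) for m in band]
            for band, s in zip(mantissas, subband_exps)]


bands = [[0.001, -0.0005], [0.0002]]
master, subband_exps, mantissas = encode_group(bands)
print(master, subband_exps)                                    # 1 [1, 4]
print(decode_group(master, subband_exps, mantissas) == bands)  # True
```

Claim 10 corresponds to the case where the master exponent is always zero, so each effective exponent is simply the subband exponent.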
12. An encoder according to claim 9, wherein subband information generated in response to an interval of said signal samples constitutes a subband information block, said subband means further comprising means for estimating the relative energy level of each subband represented in a subband information block, wherein said adaptive bit allocation means assigns bits to at least some digital words, said adaptive bit allocation means comprising
means for allocating at most a maximum number of bits to each of the digital words of a first group of subbands possessing the greatest energy levels and stopping when a certain number of bits has been allocated to each of the digital words of said first group of subbands, and
means for allocating bits to the digital words of a second group of subbands adjoining subbands in which each of the digital words has been allocated said certain number of bits, each of the subbands of said second group of subbands constituting one subband of a pair of subbands immediately adjacent to said subbands in which digital words have been allocated said certain number of bits.
13. An encoder according to claim 12 wherein said certain number of bits is equal to said maximum number of bits.
14. An encoder according to claim 12 wherein said means for estimating the relative energy level estimates said relative energy level based upon the effective exponent value of each subband represented in a subband information block.
15. An encoder according to claim 14 wherein said means for estimating the relative energy level comprises
means for ascertaining the effective exponent value of the subband which contains the maximum of the values represented by each mantissa in combination with its associated effective exponent value, and
means for assigning a level number to each of all subbands represented in said subband information block, said level number equal to said maximum number of bits reduced by the absolute value of the difference between the ascertained effective exponent value and the effective exponent value corresponding to the subband for which a level is to be assigned, but in no case assigning a level number less than zero.
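The level-number rule of claim 15 maps directly to a few lines of code. This sketch assumes a value is recovered as mantissa * 2**(-exponent), so a larger exponent means a quieter subband; the input values are made up.

```python
def level_numbers(mantissas, exponents, max_bits):
    """Claim 15: level = max_bits - |E_ref - E_i|, never below zero.

    mantissas: per-subband lists of mantissas.
    exponents: effective exponent of each subband (assumed convention:
               value = mantissa * 2**-exponent).
    """
    peaks = [max(abs(m) for m in band) * 2.0 ** -e
             for band, e in zip(mantissas, exponents)]
    # Exponent of the subband that holds the largest decoded value.
    ref = exponents[peaks.index(max(peaks))]
    return [max(0, max_bits - abs(ref - e)) for e in exponents]


print(level_numbers([[0.5], [0.75, -0.6], [0.6]], [3, 1, 6], max_bits=4))  # [2, 4, 0]
```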
16. An encoder according to claim 12 wherein said means for allocating bits to the digital words constituting said second group of subbands allocates bits to the digital words of said adjacent subbands on the low-frequency side before bits are allocated to the digital words of said adjacent subbands on the high-frequency side.
17. An encoder according to claim 12 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
18. An encoder according to claim 12 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said means further comprising a means for reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
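The allocation steps of claims 12 and 16 to 18 can be sketched as below. This is a simplified, hypothetical reading rather than the patent's exact procedure: the first group is every subband tied for the top level, each neighbouring subband receives min(max_bits, level) bits per word (a rule the claims do not specify), and any overshoot is trimmed one bit at a time from the weakest, highest-frequency subband. It assumes a non-negative budget.

```python
def allocate_bits(levels, words_per_band, max_bits, budget):
    """Return per-subband lists of adaptive bits, one entry per digital word."""
    n = len(levels)
    alloc = [[0] * words_per_band[b] for b in range(n)]

    # First group (claims 12-13): the highest-level subbands get max_bits per word.
    top = max(levels)
    first = [b for b in range(n) if levels[b] == top]
    for b in first:
        alloc[b] = [max_bits] * words_per_band[b]

    # Second group (claims 12, 16): immediate neighbours, low-frequency side first.
    low = [b - 1 for b in first if b - 1 >= 0 and b - 1 not in first]
    high = [b + 1 for b in first
            if b + 1 < n and b + 1 not in first and b + 1 not in low]
    for b in low + high:
        # Hypothetical rule: a neighbour's words get min(max_bits, level) bits.
        alloc[b] = [min(max_bits, levels[b])] * words_per_band[b]

    # Trim (claims 17-18): remove single bits from selected words until the
    # number of allocated bits equals the budget.
    used = sum(sum(words) for words in alloc)
    while used > budget:
        candidates = [b for b in range(n) if any(alloc[b])]
        b = min(candidates, key=lambda i: (levels[i], -i))  # weakest, then highest frequency
        w = max(range(len(alloc[b])), key=lambda j: alloc[b][j])
        alloc[b][w] -= 1
        used -= 1
    return alloc


print(allocate_bits([2, 5, 5, 3, 0, 1], [2] * 6, max_bits=4, budget=20))
# [[0, 0], [4, 4], [4, 4], [2, 2], [0, 0], [0, 0]]
```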
19. An encoder according to claim 9 wherein said formatting means assembles bits representing said non-adaptive portion of each of said digital words and bits representing said one or more exponents apart from bits representing said adaptive portion of each of said digital words.
20. An encoder according to claim 19 wherein said formatting means assembles said digital information into frames and inserts the bits representing said non-adaptive portion of each of said digital words and the bits representing said one or more exponents into preestablished positions within a respective one of said frames.
21. An encoder according to claim 20 wherein said formatting means inserts into a respective one of said frames the bits representing said non-adaptive portion of each of said digital words and the bits representing said one or more exponents ahead of the bits representing said adaptive portion of each of said digital words.
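Claims 19 to 21 place the exponents and the non-adaptive bits at fixed positions ahead of the adaptive bits, so a decoder can find them without first knowing the adaptive allocation. A hypothetical packing sketch, using unsigned integer codes and a "0"/"1" string in place of a real bit stream (actual signed mantissas would need a signed encoding):

```python
def to_bits(value: int, width: int) -> str:
    """Fixed-width unsigned binary field (empty when width is 0)."""
    if width == 0:
        return ""
    assert 0 <= value < 2 ** width, "value does not fit its field"
    return format(value, f"0{width}b")


def pack_frame(exponents, exp_width, words):
    """words: (fixed_value, fixed_width, adaptive_value, adaptive_width) per digital word."""
    head = "".join(to_bits(e, exp_width) for e in exponents)
    fixed = "".join(to_bits(fv, fw) for fv, fw, _, _ in words)
    adaptive = "".join(to_bits(av, aw) for _, _, av, aw in words)
    # Exponents and non-adaptive bits first, adaptive bits last (claim 21).
    return head + fixed + adaptive


frame = pack_frame([3, 1], 4, [(2, 2, 1, 1), (1, 2, 0, 0), (3, 2, 5, 3)])
print(frame)  # 001100011001111101
```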
22. An encoder according to claim 1 or 4, wherein subband information generated in response to an interval of said signal samples constitutes a subband information block, said subband means further comprising means for estimating the relative energy level of each subband represented in a subband information block, wherein said adaptive bit allocation means assigns bits to at least some digital words, said adaptive bit allocation means comprising
means for allocating at most a maximum number of bits to each of the digital words of a first group of subbands possessing the greatest energy levels and stopping when a certain number of bits has been allocated to each of the digital words of said first group of subbands, and
means for allocating bits to the digital words of a second group of subbands adjoining subbands in which each of the digital words has been allocated said certain number of bits, each of the subbands of said second group of subbands constituting one subband of a pair of subbands immediately adjacent to said subbands in which digital words have been allocated said certain number of bits.
23. An encoder according to claim 22 wherein said certain number of bits is equal to said maximum number of bits.
24. An encoder according to claim 22 wherein said means for allocating bits to the digital words constituting said second group of subbands allocates bits to the digital words of said adjacent subbands on the low-frequency side before bits are allocated to the digital words of said adjacent subbands on the high-frequency side.
25. An encoder according to claim 22 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
26. An encoder according to claim 22 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said means further comprising a means for reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
27. An encoder according to claim 1 or 4 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
28. An encoder according to claim 1 or 4 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said means further comprising a means for reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
29. An encoder according to claim 1 or 4 wherein said formatting means assembles bits representing said non-adaptive portion of each of said digital words apart from bits representing said adaptive portion of each of said digital words.
30. An encoder according to claim 29 wherein said formatting means assembles said digital information into frames and inserts the bits representing said non-adaptive portion of each of said digital words into pre-established positions within a respective one of said frames.
31. An encoder according to claim 30 wherein said formatting means inserts into a respective one of said frames the bits representing said non-adaptive portion of each of said digital words ahead of the bits representing said adaptive portion of each of said digital words.
32. An encoder for the encoding of audio information comprising signal samples, said encoder having a short signal propagation delay, comprising
means for receiving and grouping said signal samples into overlapping signal sample blocks, the length of the overlap constituting an overlap interval, said signal sample blocks having a time period resulting in a signal propagation delay short enough so that an encoding/decoding system employing the encoder is usable for real-time aural feedback to a human operator,
analysis-window means for weighting each signal sample block by an analysis window, wherein said analysis window constitutes one window of an analysis-synthesis window pair, wherein the product of both windows in said window pair is equal to a product window prederived from an analysis-only window permitting the design of a filter bank in which transform-based digital filters have the ability to trade off steepness of transition band rolloff against depth of stopband rejection in the filter characteristics, and wherein said product window overlapped with itself sums to a constant value across the overlap interval,
means for generating transform coefficients by applying a discrete transform function to each of said analysis-window weighted signal sample blocks,
means for quantizing each of said transform coefficients, and
formatting means for assembling the quantized transform coefficients into a digital output having a format suitable for transmission or storage.
33. An encoder according to claim 32 wherein said product window is derived from an analysis-only window selected from the set of the Kaiser-Bessel window, the Dolph-Chebyshev window, and windows derived from finite impulse response filter coefficients using the Parks-McClellan method.
34. An encoder according to claim 32 wherein said means for generating transform coefficients alternately applies a modified Discrete Cosine Transform and a modified Discrete Sine Transform in accordance with the Time-Domain Aliasing Cancellation technique and wherein said product window is derived from a Kaiser-Bessel window having an alpha value in the range of four through seven.
35. An encoder according to claim 32 wherein said means for generating transform coefficients applies a Discrete Fourier Transform and wherein said product window is derived from a Kaiser-Bessel window having an alpha value in the range of one and one-half through three.
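Claim 34 pairs a TDAC transform with a Kaiser-Bessel-derived product window. The sketch below substitutes the single-kernel MDCT form of TDAC (cosine basis only) for the alternating MDCT/MDST the claim recites, and assumes 128-sample blocks, 50% overlap, alpha = 4 and the convention beta = pi * alpha when calling numpy.kaiser. It checks that windowed analysis, windowed synthesis and overlap-add reconstruct the input wherever two blocks overlap.

```python
import numpy as np


def kbd_window(length: int, alpha: float) -> np.ndarray:
    """Kaiser-Bessel-derived window; assumes beta = pi * alpha for np.kaiser."""
    half = length // 2
    kaiser = np.kaiser(half + 1, np.pi * alpha)
    cumulative = np.cumsum(kaiser)
    rising = np.sqrt(cumulative[:half] / cumulative[half])
    return np.concatenate([rising, rising[::-1]])


def mdct_basis(m: int) -> np.ndarray:
    """(m, 2m) MDCT cosine basis."""
    n = np.arange(2 * m)
    k = np.arange(m)
    return np.cos(np.pi / m * (n[None, :] + 0.5 + m / 2) * (k[:, None] + 0.5))


def analyze(x, window):
    """Window each 2m-sample block (hop m) and take its MDCT."""
    m = len(window) // 2
    basis = mdct_basis(m)
    return [basis @ (x[i:i + 2 * m] * window) for i in range(0, len(x) - 2 * m + 1, m)]


def synthesize(coeffs, window, length):
    m = len(window) // 2
    basis = mdct_basis(m)
    y = np.zeros(length)
    for j, c in enumerate(coeffs):
        # Inverse transform, synthesis window, overlap-add. The 2/m factor suits
        # a window obeying w[n]**2 + w[n + m]**2 == 1.
        y[j * m:j * m + 2 * m] += window * (2.0 / m) * (basis.T @ c)
    return y


window = kbd_window(128, alpha=4.0)  # alpha inside claim 34's range of 4 to 7
x = np.random.default_rng(0).standard_normal(640)
y = synthesize(analyze(x, window), window, len(x))
# Only the first and last hop lack an overlapping partner block.
print(np.max(np.abs(y[64:-64] - x[64:-64])))
```

The time-domain aliasing each block introduces is cancelled by the aliasing of its neighbour during overlap-add, which is why the first and last 64 samples are excluded from the comparison.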
36. An encoder according to claim 32 wherein said product window is prederived by
(1) defining an initial window comprising substantially any window in said class of analysis windows having a length equal to one plus the number of samples in the overlap interval,
(2) defining a first unit pulse function, the duration of which is equal to the length of said signal blocks less the overlap interval,
(3) obtaining an interim window by convolving said initial window with said first unit pulse function,
(4) defining a scaling factor by convolving said initial window with a second unit pulse function of duration equal to one, and
(5) obtaining said product window by dividing each element of said interim window by said scaling factor.
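The five steps of claim 36 can be followed literally with NumPy. Two assumptions: the initial window is a Kaiser-Bessel window with beta = pi * alpha, and the scaling factor in step (4) is taken as the sum of the initial window's samples, which is our reading of that step rather than a quotation. With that normalization, two copies of the product window overlapped by the overlap interval add to one; using its square root as both the analysis and the synthesis window is one possible split, also an assumption.

```python
import numpy as np


def product_window(block_len: int, overlap: int, alpha: float) -> np.ndarray:
    # (1) initial window: Kaiser-Bessel, length overlap + 1 (beta = pi * alpha assumed)
    initial = np.kaiser(overlap + 1, np.pi * alpha)
    # (2) first unit pulse, duration block_len - overlap
    pulse = np.ones(block_len - overlap)
    # (3) interim window: full convolution, whose length is block_len
    interim = np.convolve(initial, pulse)
    # (4) scaling factor: sum of the initial window (assumed reading of step 4)
    scale = initial.sum()
    # (5) product window
    return interim / scale


wp = product_window(128, 64, alpha=4.0)
analysis = synthesis = np.sqrt(wp)  # assumption: identical analysis/synthesis windows
print(len(wp), np.max(np.abs(wp[:64] + wp[64:] - 1.0)))
```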
37. An encoder according to claim 32 wherein said steepness of transition band rolloff is maximized for a desired depth of stopband rejection.
38. An encoder according to claim 37 wherein the desired depth of stopband rejection is determined empirically by listening tests.
39. An encoder according to claim 37 wherein said transition band rolloff generally follows the lower slope of the human ear's psychoacoustic masking curve within a critical band.
40. A decoder for the reproduction of audio information comprising signal samples from a coded signal including digital information, said decoder comprising
deformatting means, including adaptive bit allocation means, for defining subbands and for deriving subband information in response to said coded signal, and for reconstructing digital words using said derived subband information, said digital words comprising an adaptive portion and a non-adaptive portion, wherein coding accuracy of said adaptive portion is established by said adaptive bit allocation means,
inverse subband means for generating signal samples in response to said subband information, and
means for generating said reproduction of audio information in response to said signal samples.
41. A decoder according to claim 40 wherein the coding accuracy of said non-adaptive portion is less than the accuracy required to have no audible quantizing noise.
42. A decoder according to claim 40 wherein said inverse subband means generates said signal samples by applying an inverse discrete transform function to blocks of said subband information.
43. A decoder according to claim 40 wherein said inverse subband means comprises inverse filter bank means and means for storing coding information defining the coding accuracy for said non-adaptive portion, wherein said coding information is preestablished by comparing a representative frequency response for said inverse filter bank means for each of said subbands to a corresponding psychoacoustic masking threshold representative of one or more of said subbands.
44. A decoder according to claim 43 wherein a psychoacoustic masking threshold having a relatively high selectivity for frequencies below a masking tone or narrow band of noise is taken as representative of the psychoacoustic masking threshold in lower frequency subbands and a psychoacoustic masking threshold having a relatively low selectivity for frequencies below a masking tone or narrow band of noise is taken as representative of the psychoacoustic masking threshold in higher frequency subbands.
45. A decoder according to claim 44 wherein a psychoacoustic masking threshold for a single tone or very narrow band of noise at about 1 kHz is taken as representative for subbands within the frequency range of about 500 Hz to 2 kHz and a psychoacoustic masking threshold for a single tone or very narrow band of noise at about 4 kHz is taken as representative for subbands above about 2 kHz.
46. A decoder according to claim 43 wherein said coding information defines said coding accuracy for said non-adaptive portion at a level less than the accuracy required to have no quantizing noise in excess of said corresponding psychoacoustic masking threshold.
47. A decoder according to claim 46 wherein said coding information defines said coding accuracy at a level two bits fewer than said accuracy required to have no quantizing noise in excess of said corresponding psychoacoustic masking threshold.
48. A decoder according to claim 40 or 43 wherein said subband information is expressed in block-floating-point form comprising one or more mantissas and one or more exponents, wherein said coding accuracy of said adaptive portion is based on an effective exponent value for each of said digital words, said effective exponent value derived from the value or values of said one or more exponents.
49. A decoder according to claim 48 wherein said subband information comprises one or more mantissas and a subband exponent for each of said subbands, each of said mantissas corresponding to a respective one of said digital words, said effective exponent value for each of said digital words equal to the value of the corresponding subband exponent.
50. A decoder according to claim 48 wherein said subband information comprises one or more mantissas and a subband exponent for each of said subbands, and one or more master exponents, each master exponent associated with a set of subbands, each of said mantissas corresponding to a respective one of said digital words, said effective exponent value for each of said digital words derived from a combination of the values of the corresponding subband exponent and the associated master exponent.
51. A decoder according to claim 48 wherein said derived subband information generated in response to an interval of said coded signal constitutes a subband information block, said decoder further comprising means for estimating the relative energy level of each subband represented in a subband information block, and wherein said adaptive bit allocation means assigns bits to at least some digital words, said adaptive bit allocation means comprising
means for allocating at most a maximum number of bits to each of the digital words of a first group of subbands possessing the greatest energy levels and stopping when a certain number of bits has been allocated to each of the digital words of said first group of subbands, and
means for allocating bits to the digital words of a second group of subbands adjoining subbands in which each of the digital words has been allocated said certain number of bits, each of the subbands of said second group of subbands constituting one subband of a pair of subbands immediately adjacent to said subbands in which digital words have been allocated said certain number of bits.
52. A decoder according to claim 51 wherein said certain number of bits is equal to said maximum number of bits.
53. A decoder according to claim 51 wherein said means for estimating the relative energy level estimates said relative energy level based upon the effective exponent value.
54. A decoder according to claim 53 wherein said means for estimating the relative energy level comprises
means for ascertaining the effective exponent value of the subband which contains the maximum of the values represented by each mantissa in combination with its associated effective exponent value, and
means for assigning a level number to each of all subbands represented in said subband information block, said level number equal to said maximum number of bits reduced by the absolute value of the difference between the ascertained effective exponent value and the effective exponent value corresponding to the subband for which a level is to be assigned, but in no case assigning a level number less than zero.
55. A decoder according to claim 51 wherein said means for allocating bits to the digital words constituting said second group of subbands allocates bits to the digital words of said adjacent subbands on the low-frequency side before bits are allocated to the digital words of said adjacent subbands on the high-frequency side.
56. A decoder according to claim 51 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
57. A decoder according to claim 51 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said means further comprising a means for reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
58. A decoder according to claim 48 wherein said deformatting means reconstructs each digital word from bits representing said non-adaptive portion and bits representing said one or more exponents assembled in said coded signal apart from bits representing said adaptive portion.
59. A decoder according to claim 58 wherein said deformatting means reconstructs each digital word from bits representing said non-adaptive portion and bits representing said one or more exponents which occupy pre-established positions within said subband information block.
60. A decoder according to claim 59 wherein said deformatting means reconstructs each digital word from bits representing said non-adaptive portion and bits representing said one or more exponents which occupy positions in said subband information block ahead of bits representing said adaptive portion.
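On the decoder side (claims 58 to 60), the exponents and non-adaptive bits can be read from their pre-established positions first; the adaptive field widths are then recomputed from the exponents, consistent with the abstract's statement that no side information about the adaptive allocation is transmitted. In this hypothetical sketch, `adaptive_widths_from` stands in for the decoder's own allocation routine, and the frame is a "0"/"1" string.

```python
def unpack_frame(bits, n_exponents, exp_width, fixed_widths, adaptive_widths_from):
    """Read exponents, then fixed fields, then adaptive fields, in frame order."""
    pos = 0

    def read(width):
        nonlocal pos
        value = int(bits[pos:pos + width], 2) if width else 0
        pos += width
        return value

    exponents = [read(exp_width) for _ in range(n_exponents)]
    fixed = [read(w) for w in fixed_widths]             # pre-established positions
    adaptive_widths = adaptive_widths_from(exponents)   # recomputed, not transmitted
    adaptive = [read(w) for w in adaptive_widths]
    return exponents, fixed, adaptive


print(unpack_frame("001100011001111101", 2, 4, [2, 2, 2], lambda exps: [1, 0, 3]))
# ([3, 1], [2, 1, 3], [1, 0, 5])
```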
61. A decoder according to claim 40 or 43 wherein said derived subband information generated in response to an interval of said coded signal constitutes a subband information block, said decoder further comprising means for estimating the relative energy level of each subband represented in a subband information block, and wherein said adaptive bit allocation means assigns bits to at least some digital words, said adaptive bit allocation means comprising
means for allocating at most a maximum number of bits to each of the digital words of a first group of subbands possessing the greatest energy levels and stopping when a certain number of bits has been allocated to each of the digital words of said first group of subbands, and
means for allocating bits to the digital words of a second group of subbands adjoining subbands in which each of the digital words has been allocated said certain number of bits, each of the subbands of said second group of subbands constituting one subband of a pair of subbands immediately adjacent to said subbands in which digital words have been allocated said certain number of bits.
62. A decoder according to claim 61 wherein said certain number of bits is equal to said maximum number of bits.
63. A decoder according to claim 61 wherein said means for allocating bits to the digital words constituting said second group of subbands allocates bits to the digital words of said adjacent subbands on the low-frequency side before bits are allocated to the digital words of said adjacent subbands on the high-frequency side.
64. A decoder according to claim 61 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
65. A decoder according to claim 61 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said means further comprising a means for reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
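Claims 61 through 65 describe the allocation order only in functional terms. The Python sketch below is one hypothetical reading, not the patent's own procedure: subbands at the greatest level receive the maximum number of bits per word (taking the "certain number" equal to the maximum, as in claim 62), then the immediate neighbours of fully allocated subbands receive bits, low-frequency side first, until the pool is exhausted; any overshoot is trimmed back from the most recently allocated word. The function name `allocate_bits`, the one-bit grant to the second group, and the trimming order are all assumptions. Because the procedure depends only on the levels, a decoder could rerun it on the same exponent data, which is consistent with the abstract's statement that no allocation side information is transmitted.

```python
from typing import List, Tuple


def allocate_bits(levels: List[int], words: List[int],
                  max_bits: int, budget: int) -> List[List[int]]:
    """Hypothetical sketch of the two-group allocation of claims 61-65.

    levels[i]: relative energy level of subband i (higher means louder)
    words[i]:  number of digital words in subband i
    Returns adaptive bits per word; the total never exceeds `budget`.
    """
    n = len(levels)
    alloc = [[0] * words[i] for i in range(n)]
    history: List[Tuple[int, int]] = []  # (subband, word) in allocation order
    used = 0

    def give(sb: int, bits: int) -> bool:
        """Grant `bits` to each word of subband `sb`; False once the pool is spent."""
        nonlocal used
        for w in range(words[sb]):
            if used >= budget:  # claim 64: stop when the pool is used up
                return False
            alloc[sb][w] = bits
            history.append((sb, w))
            used += bits
        return True

    # First group: subbands at the greatest level get max_bits per word.
    top = max(levels)
    running = True
    for i in range(n):
        if top > 0 and levels[i] == top:
            if not give(i, max_bits):
                running = False
                break

    # Second group: immediate neighbours of fully allocated subbands,
    # low-frequency side before high-frequency side (claim 63).
    # Granting one bit per word here is an arbitrary illustrative choice.
    full = [i for i in range(n) if alloc[i] and all(b == max_bits for b in alloc[i])]
    for side in (-1, 1):
        for i in full:
            j = i + side
            if running and 0 <= j < n and not any(alloc[j]):
                running = give(j, 1)

    # Claim 65: if the last grant overshot the pool, trim the most recently
    # allocated words until the total equals the budget exactly.
    while used > budget:
        sb, w = history.pop()
        cut = min(used - budget, alloc[sb][w])
        alloc[sb][w] -= cut
        used -= cut
    return alloc
```

For example, `allocate_bits([1, 4, 2, 4, 0, 0], [2] * 6, 4, 100)` gives both words of subbands 1 and 3 four bits each and one bit per word to subbands 0, 2 and 4; with a budget of 10 the pool runs out partway through subband 3 and the overshoot is trimmed.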
66. A decoder according to claim 40 or 43 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
67. A decoder according to claim 40 or 43 wherein said adaptive bit allocation means stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said means further comprising a means for reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
68. A decoder according to claim 40 or 43 wherein said deformatting means reconstructs each digital word from bits representing said non-adaptive portion assembled in said coded signal apart from bits representing said adaptive portion.
69. A decoder according to claim 68 wherein said deformatting means reconstructs each digital word from bits representing said non-adaptive portion which occupy pre-established positions within said subband information block.
70. A decoder according to claim 69 wherein said deformatting means reconstructs each digital word from bits representing said non-adaptive portion which occupy positions in said subband information block ahead of bits representing said adaptive portion.
71. A decoder for the reproduction of audio information comprising signal samples from a coded signal generated by an encoder that groups said signal samples into overlapping signal sample blocks, the length of the overlap constituting an overlap interval, weights each sample block with an analysis window, generates transform coefficients by applying a discrete transform to the analysis-window weighted signal sample blocks, quantizes each transform coefficient and assembles the quantized transform coefficients into a digital output having a format suitable for transmission or storage, said decoder comprising
means for receiving said digital output for deriving said quantized transform coefficients therefrom,
means for reconstructing decoded transform coefficients from the deformatted quantized transform coefficients,
means for generating signal sample blocks by applying an inverse discrete transform function to said decoded transform coefficients, said inverse discrete transform having characteristics inverse to those of said discrete transform in the encoder, said signal sample blocks having a time period resulting in a signal propagation delay short enough so that an encoding/decoding system employing the decoder is usable for real-time aural feedback to a human operator,
synthesis window means for weighting the signal sample blocks by a synthesis window, wherein a product window equal to the product of said synthesis window and said analysis window is prederived from an analysis-only window permitting the design of a filter bank in which transform-based digital filters have the ability to trade off steepness of transition band rolloff against depth of stopband rejection in the filter characteristics, and wherein said product window overlapped with itself sums to a constant value across the overlap interval, and
means for cancelling the weighting effects of the analysis window means and the synthesis window means to recover said signal samples by adding overlapped signal sample blocks across said overlap interval.
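The cancellation step in claim 71 relies only on the product window summing to a constant across the overlap. A minimal Python sketch, with the transform, quantizer and inverse transform deliberately left out and a sine window swapped in as a simple stand-in (its square overlapped by half a block sums to one; it is not one of the windows named in claim 72), shows analysis weighting, synthesis weighting and overlap-add recovering the input wherever two blocks overlap. The names `sine_window` and `analyze_synthesize` are illustrative.

```python
import math
from typing import List


def sine_window(n: int) -> List[float]:
    """Stand-in window: sin^2 overlapped by n/2 sums to exactly one."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def analyze_synthesize(x: List[float], n: int) -> List[float]:
    """Window, (skip the codec), window again, and overlap-add at hop n/2."""
    hop = n // 2
    w = sine_window(n)
    out = [0.0] * len(x)
    for start in range(0, len(x) - n + 1, hop):
        block = [x[start + k] * w[k] for k in range(n)]  # analysis window
        # A real coder would transform, quantize and inverse-transform here.
        block = [block[k] * w[k] for k in range(n)]      # synthesis window
        for k in range(n):
            out[start + k] += block[k]                   # overlap-add
    return out
```

Only the first and last half-blocks, which see a single window, are not restored; in a streaming coder the neighbouring blocks complete them.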
72. A decoder according to claim 71 wherein said product window is derived from an analysis-only window selected from the set of the Kaiser-Bessel window, the Dolph-Chebyshev window, and windows derived from finite impulse filter coefficients using the Parks-McClellan method.
73. A decoder according to claim 71 wherein said means for generating transform coefficients alternately applies an inverse modified Discrete Cosine Transform and an inverse modified Discrete Sine Transform in accordance with the Time-Domain Aliasing Cancellation technique and wherein said product window is derived from a Kaiser-Bessel window having an alpha value in the range of four through seven.
74. A decoder according to claim 71 wherein said means for generating transform coefficients applies an inverse Discrete Fourier Transform and wherein said product window is derived from a Kaiser-Bessel window having an alpha value in the range of one and one-half through three.
75. A decoder according to claim 71 wherein said product window is prederived by
(1) defining an initial window comprising substantially any window in said class of analysis windows having a length equal to one plus the number of samples in the overlap interval,
(2) defining a first unit pulse function the duration of which is equal to the length of said signal blocks less the overlap interval,
(3) obtaining an interim window by convolving said initial window with said first unit pulse function,
(4) defining a scaling factor by convolving said initial window with a second unit pulse function of duration equal to one, and
(5) obtaining said product window by dividing each element of said interim window by said scaling factor.
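Steps (1) through (5) can be sketched in Python as follows, using a Kaiser-Bessel initial window (one of the choices in claim 72). Two points are assumptions: the Kaiser-Bessel argument is written as pi times alpha, and the scaling factor of step (4) is read as the sum (area) of the initial window, which is the reading under which the result overlap-adds to one at a hop of `block_len - overlap`. Taking the element-wise square root gives one possible analysis/synthesis pair whose product equals the product window; the claims do not require that particular split.

```python
import math
from typing import List


def bessel_i0(x: float) -> float:
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term, k = 1.0, 1.0, 0
    while True:
        k += 1
        term *= (x / 2.0) / k
        total += term * term
        if term * term < 1e-17 * total:
            return total


def kaiser(length: int, alpha: float) -> List[float]:
    """Kaiser-Bessel window; the pi*alpha argument is an assumed convention."""
    m = length - 1
    den = bessel_i0(math.pi * alpha)
    return [bessel_i0(math.pi * alpha * math.sqrt(max(0.0, 1.0 - (2.0 * k / m - 1.0) ** 2))) / den
            for k in range(length)]


def convolve(a: List[float], b: List[float]) -> List[float]:
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def product_window(block_len: int, overlap: int, alpha: float) -> List[float]:
    initial = kaiser(overlap + 1, alpha)          # (1) length = overlap + 1
    pulse = [1.0] * (block_len - overlap)         # (2) first unit pulse
    interim = convolve(initial, pulse)            # (3) length = block_len
    scale = sum(initial)                          # (4) read as the window's area
    return [v / scale for v in interim]           # (5) normalize


def analysis_synthesis_pair(block_len: int, overlap: int, alpha: float) -> List[float]:
    """One possible split: identical windows, each the square root of the product."""
    return [math.sqrt(v) for v in product_window(block_len, overlap, alpha)]
```

With `block_len=16`, `overlap=8` and `alpha=4.0` (inside the four-through-seven range claim 73 gives for TDAC), `P[n] + P[n + 8]` equals one to rounding error for every `n` in the first half, and the window is symmetric.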
76. A decoder according to claim 71 wherein said steepness of transition band rolloff is maximized for a desired depth of stopband rejection.
77. A decoder according to claim 76 wherein the desired depth of stopband rejection is determined empirically by listening tests.
78. A decoder according to claim 76 wherein said transition band rolloff generally follows the lower slope of the human ear's psychoacoustic masking curve within a critical band.
79. An encoding method for the encoding of audio information comprising signal samples, said encoding method comprising
receiving said signal samples,
defining subbands and generating subband information in response to said signal samples, said subband information for each of said subbands including one or more digital words, each of said digital words comprising an adaptive portion and a non-adaptive portion, wherein coding accuracy of said adaptive portion is established by adaptive bit allocating, and
assembling digital information including said subband information into a digital output having a format suitable for transmission or storage.
80. An encoding method according to claim 79 wherein the coding accuracy of said non-adaptive portion is less than the accuracy required to have no audible quantizing noise.
81. An encoding method according to claim 79 wherein said generating subband information applies a discrete transform function to blocks of said signal samples.
82. An encoding method according to claim 79 wherein said generating subband information comprises filtering and storing coding information defining the coding accuracy for said non-adaptive portion, wherein said coding information is preestablished by comparing a representative frequency response for said filtering for each of said subbands to a corresponding psychoacoustic masking threshold representative of one or more of said subbands.
83. An encoding method according to claim 82 wherein a psychoacoustic masking threshold having a relatively high selectivity for frequencies below a masking tone or narrow band of noise is taken as representative of the psychoacoustic masking threshold in lower frequency subbands and a psychoacoustic masking threshold having a relatively low selectivity for frequencies below a masking tone or narrow band of noise is taken as representative of the psychoacoustic masking threshold in higher frequency subbands.
84. An encoding method according to claim 83 wherein a psychoacoustic masking threshold for a single tone or very narrow band of noise at about 1 kHz is taken as representative for subbands within the frequency range of about 500 Hz to 2 kHz and a psychoacoustic masking threshold for a single tone or very narrow band of noise at about 4 kHz is taken as representative for subbands above about 2 kHz.
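Claim 84's frequency ranges map directly onto transform bins. The sketch below assumes, hypothetically, an MDCT-style transform whose `block_len / 2` coefficients are spaced `fs / block_len` apart (375 Hz for the 128-sample blocks at 48 kHz mentioned in the abstract), and returns `None` below about 500 Hz, where the claim is silent. Both function names are illustrative.

```python
from typing import Optional


def bin_center_hz(k: int, block_len: int, fs_hz: float) -> float:
    """Centre of coefficient k, assuming block_len/2 bins spaced fs/block_len apart."""
    return (k + 0.5) * fs_hz / block_len


def representative_masker_hz(f_hz: float) -> Optional[float]:
    """Pick the masking curve claim 84 names for a subband centred at f_hz."""
    if 500.0 <= f_hz <= 2000.0:
        return 1000.0   # 1 kHz curve: higher selectivity below the masker
    if f_hz > 2000.0:
        return 4000.0   # 4 kHz curve: lower selectivity below the masker
    return None         # the claim does not cover this range
```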
85. An encoding method according to claim 82 wherein said coding information defines said coding accuracy for said non-adaptive portion at a level less than the accuracy required to have no quantizing noise in excess of said corresponding psychoacoustic masking threshold.
86. An encoding method according to claim 85 wherein said coding information defines said coding accuracy at a level two bits fewer than said accuracy required to have no quantizing noise in excess of said corresponding psychoacoustic masking threshold.
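Claim 86 fixes the non-adaptive accuracy two bits below what would keep quantizing noise under the masking threshold. Using the standard rule of roughly 6.02 dB of signal-to-noise ratio per quantizer bit, a hypothetical per-subband requirement in dB (how far below the signal the noise must sit, as found by comparing the filter response with the representative masking curve) can be converted as shown. The input values and the name `fixed_bits` are illustrative, not figures from the patent.

```python
import math
from typing import List

DB_PER_BIT = 20.0 * math.log10(2.0)  # approx. 6.02 dB of SNR per quantizer bit


def fixed_bits(required_snr_db: List[float], reduction: int = 2) -> List[int]:
    """Non-adaptive bits per subband: bits for the masking requirement, minus `reduction`."""
    out = []
    for snr in required_snr_db:
        needed = math.ceil(snr / DB_PER_BIT)     # bits to keep noise under the mask
        out.append(max(0, needed - reduction))   # claim 86: two bits fewer
    return out
```

A subband needing 30 dB would thus need 5 bits, of which 3 are assigned non-adaptively; the adaptive allocation is then expected to supply the remainder where the signal warrants it.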
87. An encoding method according to claim 79 or 82 wherein said generating subband information represents said subband information in block-floating-point form comprising one or more mantissas and one or more exponents, wherein said coding accuracy of said adaptive portion is based on an effective exponent value for each of said digital words, said effective exponent value derived from the value or values of said one or more exponents.
88. An encoding method according to claim 87 wherein said subband information comprises one or more mantissas and a subband exponent for each of said subbands, each of said mantissas corresponding to a respective one of said digital words, said effective exponent value for each of said digital words equal to the value of the corresponding subband exponent.
89. An encoding method according to claim 87 wherein said subband information comprises one or more mantissas and a subband exponent for each of said subbands, and one or more master exponents, each master exponent associated with a set of subbands, each of said mantissas corresponding to a respective one of said digital words, said effective exponent value for each of said digital words derived from a combination of the values of the corresponding subband exponent and the associated master exponent.
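A minimal block-floating-point sketch for claims 87 through 89, under assumed conventions: an exponent counts the left shifts that bring the subband's peak magnitude into [0.5, 1) (capped at a hypothetical 4-bit maximum of 15), mantissas are the shifted values, and the effective exponent of a word is its subband exponent plus the master exponent of the set that subband belongs to. The patent's actual bit widths and the rule combining master and subband exponents may differ.

```python
from typing import List


def subband_exponent(values: List[float], max_exp: int = 15) -> int:
    """Left shifts that bring the peak |value| into [0.5, 1); values assumed < 1."""
    peak = max(abs(v) for v in values)
    e = 0
    while e < max_exp and peak * 2 ** (e + 1) < 1.0:
        e += 1
    return e


def to_mantissas(values: List[float], exponent: int) -> List[float]:
    """Scale each value by the shared subband exponent."""
    return [v * 2 ** exponent for v in values]


def effective_exponents(sub_exps: List[int], master_exps: List[int],
                        master_of: List[int]) -> List[int]:
    """Claim 89 style: combine each subband exponent with its set's master exponent.

    Simple addition is an assumed combination rule; master_of[i] is the index
    of the master exponent covering subband i.
    """
    return [e + master_exps[master_of[i]] for i, e in enumerate(sub_exps)]
```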
90. An encoding method according to claim 87, wherein subband information generated in response to an interval of said signal samples constitutes a subband information block, said generating subband information further comprising estimating the relative energy level of each subband represented in a subband information block, wherein said adaptive bit allocating assigns bits to at least some digital words, said adaptive bit allocating comprising
allocating at most a maximum number of bits to each of the digital words of a first group of subbands possessing the greatest energy levels and stopping when a certain number of bits has been allocated to each of the digital words of said first group of subbands, and
allocating bits to the digital words of a second group of subbands adjoining subbands in which each of the digital words has been allocated said certain number of bits, each of the subbands of said second group of subbands constituting one subband of a pair of subbands immediately adjacent to said subbands in which digital words have been allocated said certain number of bits.
91. An encoding method according to claim 90 wherein said certain number of bits is equal to said maximum number of bits.
92. An encoding method according to claim 90 wherein said estimating the relative energy level estimates said relative energy level based upon the effective exponent value of each subband represented in a subband information block.
93. An encoding method according to claim 92 wherein said estimating the relative energy level comprises
ascertaining the effective exponent value of the subband which contains the maximum of the values represented by each mantissa in combination with its associated effective exponent value, and
assigning a level number to each of all subbands represented in said subband information block, said level number equal to said maximum number of bits reduced by the absolute value of the difference between the ascertained effective exponent value and the effective exponent value corresponding to the subband for which a level is to be assigned, but in no case assigning a level number less than zero.
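The level-number rule of claim 93 is short enough to state directly in code. In this Python sketch a mantissa `m` with effective exponent `e` is assumed to represent `m * 2**-e`; the reference exponent is that of the subband holding the largest represented value, and each subband's level is the maximum number of bits reduced by the exponent distance, floored at zero. The name `level_numbers` is illustrative.

```python
from typing import List


def level_numbers(exponents: List[int], mantissas: List[List[float]],
                  max_bits: int) -> List[int]:
    """Claim 93: max_bits minus |reference exponent - subband exponent|, floored at 0."""
    # Assumed convention: mantissa m with effective exponent e represents m * 2**-e.
    peak_sb = max(range(len(exponents)),
                  key=lambda i: max(abs(m) for m in mantissas[i]) * 2.0 ** -exponents[i])
    ref = exponents[peak_sb]
    return [max(0, max_bits - abs(ref - e)) for e in exponents]
```

For exponents `[2, 0, 1, 5]` and single mantissas `0.5, 0.75, 0.6, 0.9`, subband 1 holds the peak, so with `max_bits=4` the levels are `[2, 4, 3, 0]`.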
94. An encoding method according to claim 90 wherein said allocating bits to the digital words constituting said second group of subbands allocates bits to the digital words of said adjacent subbands on the low-frequency side before bits are allocated to the digital words of said adjacent subbands on the high-frequency side.
95. An encoding method according to claim 90 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
96. An encoding method according to claim 90 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said adaptive bit allocating further comprising reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
97. An encoding method according to claim 87 wherein said assembling digital information assembles bits representing said non-adaptive portion of each of said digital words and bits representing said one or more exponents apart from bits representing said adaptive portion of each of said digital words.
98. An encoding method according to claim 97 wherein said assembling digital information assembles said digital information into frames and inserts the bits representing said non-adaptive portion of each of said digital words and the bits representing said one or more exponents into pre-established positions within a respective one of said frames.
99. An encoding method according to claim 98 wherein said assembling digital information inserts into a respective one of said frames the bits representing said non-adaptive portion of each of said digital words and the bits representing said one or more exponents ahead of the bits representing said adaptive portion of each of said digital words.
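Claims 97 through 99 place the exponents and the non-adaptive portions at fixed positions ahead of the adaptive bits. The Python sketch below, with hypothetical field widths (4-bit exponents, 2-bit non-adaptive codes), shows why that helps: a decoder can read the protected fields from known offsets before it knows any adaptive word length, then recompute those lengths from the exponents. Treating the non-adaptive and adaptive portions as separate integer codes, and all function names, are assumptions.

```python
from typing import List, Tuple


def to_bits(value: int, width: int) -> List[int]:
    """MSB-first binary code of `value` in `width` bits."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def pack_frame(exponents: List[int], fixed_codes: List[int],
               adaptive_codes: List[int], adaptive_widths: List[int],
               exp_width: int = 4, fixed_width: int = 2) -> List[int]:
    frame: List[int] = []
    for e in exponents:                  # fixed positions at the head of the frame
        frame += to_bits(e, exp_width)
    for f in fixed_codes:                # non-adaptive portions, also fixed positions
        frame += to_bits(f, fixed_width)
    for a, w in zip(adaptive_codes, adaptive_widths):
        frame += to_bits(a, w)           # adaptive portions: widths vary per frame
    return frame


def read_protected(frame: List[int], n_exponents: int, n_words: int,
                   exp_width: int = 4, fixed_width: int = 2) -> Tuple[List[int], List[int]]:
    """Read exponents and non-adaptive codes without knowing any adaptive widths."""
    exps = [from_bits(frame[i * exp_width:(i + 1) * exp_width]) for i in range(n_exponents)]
    base = n_exponents * exp_width
    fixed = [from_bits(frame[base + i * fixed_width:base + (i + 1) * fixed_width])
             for i in range(n_words)]
    return exps, fixed
```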
100. An encoding method according to claim 79 or 82, wherein subband information generated in response to an interval of said signal samples constitutes a subband information block, said generating subband information further comprising estimating the relative energy level of each subband represented in a subband information block, wherein said adaptive bit allocating assigns bits to at least some digital words, said adaptive bit allocating comprising
allocating at most a maximum number of bits to each of the digital words of a first group of subbands possessing the greatest energy levels and stopping when a certain number of bits has been allocated to each of the digital words of said first group of subbands, and
allocating bits to the digital words of a second group of subbands adjoining subbands in which each of the digital words has been allocated said certain number of bits, each of the subbands of said second group of subbands constituting one subband of a pair of subbands immediately adjacent to said subbands in which digital words have been allocated said certain number of bits.
101. An encoding method according to claim 100 wherein said certain number of bits is equal to said maximum number of bits.
102. An encoding method according to claim 100 wherein said allocating bits to the digital words constituting said second group of subbands allocates bits to the digital words of said adjacent subbands on the low-frequency side before bits are allocated to the digital words of said adjacent subbands on the high-frequency side.
103. An encoding method according to claim 100 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
104. An encoding method according to claim 100 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said adaptive bit allocating further comprising reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
105. An encoding method according to claim 79 or 82 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
106. An encoding method according to claim 79 or 82 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said adaptive bit allocating further comprising reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
107. An encoding method according to claim 79 or 82 wherein said assembling digital information assembles bits representing said non-adaptive portion of each of said digital words apart from bits representing said adaptive portion of each of said digital words.
108. An encoding method according to claim 107 wherein said assembling digital information assembles said digital information into frames and inserts the bits representing said non-adaptive portion of each of said digital words into pre-established positions within a respective one of said frames.
109. An encoding method according to claim 108 wherein said assembling digital information inserts into a respective one of said frames the bits representing said non-adaptive portion of each of said digital words ahead of the bits representing said adaptive portion of each of said digital words.
110. An encoding method for the encoding of audio information comprising signal samples, said encoding method having a short signal propagation delay, comprising
receiving and grouping said signal samples into overlapping signal sample blocks, the length of the overlap constituting an overlap interval, said signal sample blocks having a time period resulting in a signal propagation delay short enough so that an encoding/decoding method employing the encoding method is usable for real-time aural feedback to a human operator,
weighting each signal sample block by an analysis window, wherein said analysis window constitutes one window of an analysis-synthesis window pair, wherein the product of both windows in said window pair is equal to a product window prederived from an analysis-only window permitting the design of a filter bank in which transform-based digital filters have the ability to trade off steepness of transition band rolloff against depth of stopband rejection in the filter characteristics, and wherein said product window overlapped with itself sums to a constant value across the overlap interval,
generating transform coefficients by applying a discrete transform function to each of said analysis-window weighted signal sample blocks,
quantizing each of said transform coefficients, and
assembling the quantized transform coefficients into a digital output having a format suitable for transmission or storage.
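The block grouping of claim 110 and one block's contribution to delay reduce to simple arithmetic. This sketch computes block start indices for a hop of `block_len - overlap` and the time spanned by one block; with the 128-sample blocks from the abstract that is about 2.67 ms at 48 kHz and about 2.90 ms at 44.1 kHz. A full encoder/decoder chain adds further buffering and processing delay, which this does not model. The function names are illustrative.

```python
from typing import List


def block_starts(num_samples: int, block_len: int, overlap: int) -> List[int]:
    """Start index of each complete overlapping block; hop = block_len - overlap."""
    hop = block_len - overlap
    return list(range(0, num_samples - block_len + 1, hop))


def block_duration_ms(block_len: int, fs_hz: float) -> float:
    """Time spanned by one block, one component of the coder's propagation delay."""
    return 1000.0 * block_len / fs_hz
```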
111. An encoding method according to claim 110 wherein said product window is derived from an analysis-only window selected from the set of the Kaiser-Bessel window, the Dolph-Chebyshev window, and windows derived from finite impulse filter coefficients using the Parks-McClellan method.
112. An encoding method according to claim 110 wherein said generating transform coefficients alternately applies a modified Discrete Cosine Transform and a modified Discrete Sine Transform in accordance with the Time-Domain Aliasing Cancellation technique and wherein said product window is derived from a Kaiser-Bessel window having an alpha value in the range of four through seven.
113. An encoding method according to claim 110 wherein said generating transform coefficients applies a Discrete Fourier Transform and wherein said product window is derived from a Kaiser-Bessel window having an alpha value in the range of one and one-half through three.
114. An encoding method according to claim 110 wherein said product window is prederived by
(1) defining an initial window comprising substantially any window in said class of analysis windows having a length equal to one plus the number of samples in the overlap interval,
(2) defining a first unit pulse function, the duration of which is equal to the length of said signal blocks less the overlap interval,
(3) obtaining an interim window by convolving said initial window with said first unit pulse function,
(4) defining a scaling factor by convolving said initial window with a second unit pulse function of duration equal to one, and
(5) obtaining said product window by dividing each element of said interim window by said scaling factor.
115. An encoding method according to claim 110 wherein said steepness of transition band rolloff is maximized for a desired depth of stopband rejection.
116. An encoding method according to claim 115 wherein the desired depth of stopband rejection is determined empirically by listening tests.
117. An encoding method according to claim 115 wherein said transition band rolloff generally follows the lower slope of the human ear's psychoacoustic masking curve within a critical band.
118. A decoding method for the reproduction of audio information comprising signal samples from a coded signal including digital information, said decoding method comprising
defining subbands and deriving subband information in response to said coded signal, and reconstructing digital words using said derived subband information, said digital words comprising an adaptive portion and a non-adaptive portion, wherein coding accuracy of said adaptive portion is established by adaptive bit allocating,
generating signal samples in response to said subband information, and
generating said reproduction of audio information in response to said signal samples.
119. A decoding method according to claim 118 wherein the coding accuracy of said non-adaptive portion is less than the accuracy required to have no audible quantizing noise.
120. A decoding method according to claim 118 wherein said generating signal samples applies an inverse discrete transform function to blocks of said subband information.
121. A decoding method according to claim 118 wherein said generating signal samples comprises inverse filtering and storing coding information defining the coding accuracy for said non-adaptive portion, wherein said coding information is preestablished by comparing a representative frequency response for said inverse filtering for each of said subbands to a corresponding psychoacoustic masking threshold representative of one or more of said subbands.
122. A decoding method according to claim 121 wherein a psychoacoustic masking threshold having a relatively high selectivity for frequencies below a masking tone or narrow band of noise is taken as representative of the psychoacoustic masking threshold in lower frequency subbands and a psychoacoustic masking threshold having a relatively low selectivity for frequencies below a masking tone or narrow band of noise is taken as representative of the psychoacoustic masking threshold in higher frequency subbands.
123. A decoding method according to claim 122 wherein a psychoacoustic masking threshold for a single tone or very narrow band of noise at about 1 kHz is taken as representative for subbands within the frequency range of about 500 Hz to 2 kHz and a psychoacoustic masking threshold for a single tone or very narrow band of noise at about 4 kHz is taken as representative for subbands above about 2 kHz.
124. A decoding method according to claim 121 wherein said coding information defines said coding accuracy for said non-adaptive portion at a level less than the accuracy required to have no quantizing noise in excess of said corresponding psychoacoustic masking threshold.
125. A decoding method according to claim 124 wherein said coding information defines said coding accuracy at a level two bits fewer than said accuracy required to have no quantizing noise in excess of said corresponding psychoacoustic masking threshold.
126. A decoding method according to claim 118 or 121 wherein said subband information is expressed in block-floating-point form comprising one or more mantissas and one or more exponents, wherein said coding accuracy of said adaptive portion is based on an effective exponent value for each of said digital words, said effective exponent value derived from the value or values of said one or more exponents.
127. A decoding method according to claim 126 wherein said subband information comprises one or more mantissas and a subband exponent for each of said subbands, each of said mantissas corresponding to a respective one of said digital words, said effective exponent value for each of said digital words equal to the value of the corresponding subband exponent.
128. A decoding method according to claim 126 wherein said subband information comprises one or more mantissas and a subband exponent for each of said subbands, and one or more master exponents, each master exponent associated with a set of subbands, each of said mantissas corresponding to a respective one of said digital words, said effective exponent value for each of said digital words derived from a combination of the values of the corresponding subband exponent and the associated master exponent.
129. A decoding method according to claim 126 wherein said derived subband information generated in response to an interval of said coded signal constitutes a subband information block, said decoding method further comprising estimating the relative energy level of each subband represented in a subband information block, and wherein said adaptive bit allocating assigns bits to at least some digital words, said adaptive bit allocating comprising
allocating at most a maximum number of bits to each of the digital words of a first group of subbands possessing the greatest energy levels and stopping when a certain number of bits has been allocated to each of the digital words of said first group of subbands, and
allocating bits to the digital words of a second group of subbands adjoining subbands in which each of the digital words has been allocated said certain number of bits, each of the subbands of said second group of subbands constituting one subband of a pair of subbands immediately adjacent to said subbands in which digital words have been allocated said certain number of bits.
130. A decoding method according to claim 129 wherein said certain number of bits is equal to said maximum number of bits.
131. A decoding method according to claim 129 wherein said estimating the relative energy level estimates said relative energy level based upon the effective exponent value.
132. A decoding method according to claim 131 wherein said estimating the relative energy level comprises
ascertaining the effective exponent value of the subband which contains the maximum of the values represented by each mantissa in combination with its associated effective exponent value, and
assigning a level number to each of all subbands represented in said subband information block, said level number equal to said maximum number of bits reduced by the absolute value of the difference between the ascertained effective exponent value and the effective exponent value corresponding to the subband for which a level is to be assigned, but in no case assigning a level number less than zero.
133. A decoding method according to claim 129 wherein said allocating bits to the digital words constituting said second group of subbands allocates bits to the digital words of said adjacent subbands on the low-frequency side before bits are allocated to the digital words of said adjacent subbands on the high-frequency side.
134. A decoding method according to claim 129 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
135. A decoding method according to claim 129 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said adaptive bit allocating further comprising reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
136. A decoding method according to claim 126 wherein said reconstructing digital words reconstructs each digital word from bits representing said non-adaptive portion and bits representing said one or more exponents assembled in said coded signal apart from bits representing said adaptive portion.
137. A decoding method according to claim 136 wherein said reconstructing digital words reconstructs each digital word from bits representing said non-adaptive portion and bits representing said one or more exponents which occupy pre-established positions within said subband information block.
138. A decoding method according to claim 137 wherein said reconstructing digital words reconstructs each digital word from bits representing said non-adaptive portion and bits representing said one or more exponents which occupy positions in said subband information block ahead of bits representing said adaptive portion.
139. A decoding method according to claim 118 or 121 wherein said derived subband information generated in response to an interval of said coded signal constitutes a subband information block, said decoding method further comprising estimating the relative energy level of each subband represented in a subband information block, and wherein said adaptive bit allocating assigns bits to at least some digital words, said adaptive bit allocating comprising
allocating at most a maximum number of bits to each of the digital words of a first group of subbands possessing the greatest energy levels and stopping when a certain number of bits has been allocated to each of the digital words of said first group of subbands, and
allocating bits to the digital words of a second group of subbands adjoining subbands in which each of the digital words has been allocated said certain number of bits, each of the subbands of said second group of subbands constituting one subband of a pair of subbands immediately adjacent to said subbands in which digital words have been allocated said certain number of bits.
140. A decoding method according to claim 139 wherein said certain number of bits is equal to said maximum number of bits.
141. A decoding method according to claim 139 wherein said allocating bits to the digital words constituting said second group of subbands allocates bits to the digital words of said adjacent subbands on the low-frequency side before bits are allocated to the digital words of said adjacent subbands on the high-frequency side.
142. A decoding method according to claim 139 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
143. A decoding method according to claim 139 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said adaptive bit allocating further comprising reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
144. A decoding method according to claim 118 or 121 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals a limited number of adaptively allocatable bits.
145. A decoding method according to claim 118 or 121 wherein said adaptive bit allocating stops allocating bits when the number of bits allocated equals or exceeds a limited number of adaptively allocatable bits, said adaptive bit allocating further comprising reducing the number of bits adaptively allocated to selected digital words until the number of bits adaptively allocated equals said limited number of adaptively allocatable bits.
146. A decoding method according to claim 118 or 121 wherein said reconstructing digital words reconstructs each digital word from bits representing said non-adaptive portion assembled in said coded signal apart from bits representing said adaptive portion.
147. A decoding method according to claim 146 wherein said reconstructing digital words reconstructs each digital word from bits representing said non-adaptive portion which occupy pre-established positions within said subband information block.
148. A decoding method according to claim 147 wherein said reconstructing digital words reconstructs each digital word from bits representing said non-adaptive portion which occupy positions in said subband information block ahead of bits representing said adaptive portion.
149. A decoding method for the reproduction of audio information comprising signal samples from a coded signal generated by an encoding method that groups said signal samples into overlapping signal sample blocks, the length of the overlap constituting an overlap interval, weights each sample block with an analysis window, generates transform coefficients by applying a discrete transform to the analysis-window weighted signal sample blocks, quantizes each transform coefficient and assembles the quantized transform coefficients into a digital output having a format suitable for transmission or storage, said decoding method comprising
receiving said digital output for deriving said quantized transform coefficients therefrom,
reconstructing decoded transform coefficients from the deformatted quantized transform coefficients,
generating signal sample blocks by applying an inverse discrete transform function to said decoded transform coefficients, said inverse discrete transform having characteristics inverse to those of said discrete transform in the encoding method, said signal sample blocks having a time period resulting in a signal propagation delay short enough so that an encoding/decoding method employing the decoding method is usable for real-time aural feedback to a human operator,
weighting the signal sample blocks by a synthesis window, wherein a product window equal to the product of said synthesis window and said analysis window is prederived from an analysis-only window permitting the design of a filter bank in which transform-based digital filters have the ability to trade off steepness of transition band rolloff against depth of stopband rejection in the filter characteristics, and wherein said product window overlapped with itself sums to a constant value across the overlap interval, and
cancelling the weighting effects of the analysis window and the synthesis window to recover said signal samples by adding overlapped signal sample blocks across said overlap interval.
150. A decoding method according to claim 149 wherein said product window is derived from an analysis-only window selected from the set of the Kaiser-Bessel window, the Dolph-Chebyshev window, and windows derived from finite impulse filter coefficients using the Parks-McClellan method.
151. A decoding method according to claim 149 wherein said generating transform coefficients alternately applies an inverse modified Discrete Cosine Transform and an inverse modified Discrete Sine Transform in accordance with the Time-Domain Aliasing Cancellation technique and wherein said product window is derived from a Kaiser-Bessel window having an alpha value in the range of four through seven.
152. A decoding method according to claim 151 wherein said generating transform coefficients applies an inverse Discrete Fourier Transform and wherein said product window is derived from a Kaiser-Bessel window having an alpha value in the range of one and one-half through three.
153. A decoding method according to claim 149 wherein said product window is prederived by
(1) defining an initial window comprising substantially any window in said class of analysis windows having a length equal to one plus the number of samples in the overlap interval,
(2) defining a first unit pulse function the duration of which is equal to the length of said signal blocks less the overlap interval,
(3) obtaining an interim window by convolving said initial window with said first unit pulse function,
(4) defining a scaling factor by convolving said initial window with a second unit pulse function of duration equal to one, and
(5) obtaining said product window by dividing each element of said interim window by said scaling factor.
154. A decoding method according to claim 149 wherein said steepness of transition band rolloff is maximized for a desired depth of stopband rejection.
155. A decoding method according to claim 154 wherein the desired depth of stopband rejection is determined empirically by listening tests.
156. A decoding method according to claim 154 wherein said transition band rolloff generally follows the lower slope of the human ear's psychoacoustic masking curve within a critical band.
157. A method for defining coding information which defines the coding accuracy of digital words representing spectral information in a plurality of frequency subbands, said digital words generated in response to an input signal by a split-band encoder comprising a filter bank, wherein said coding information comprises a nonadaptive coding accuracy, said method comprising
(1) obtaining a predicted quantizing noise spectrum of said split-band encoder for a frequency subband based upon a representative frequency response of said filter bank for said frequency subband,
(2) generating a subband value equal to the number of bits required to quantize spectral energy within said frequency subband such that said predicted quantizing noise spectrum does not exceed a representative psychoacoustic masking threshold for spectral energy within said frequency subband,
(3) setting said nonadaptive coding accuracy for said frequency subband equal to or less than said subband value, and
(4) reiterating the previous steps for each of said plurality of frequency subbands.
158. A method according to claim 157 wherein said nonadaptive coding accuracy for at least one of said plurality of frequency subbands is set equal to a value less than the respective subband value.
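Claims 139 through 143 outline an adaptive allocation. The highest-energy subbands first receive a maximum number of bits per digital word. Allocation then extends to immediately adjacent subbands, low-frequency side first, until a limited pool of adaptively allocatable bits is used. The Python sketch below follows that outline and is only one possible reading. The function name, the `neighbour_bits` amount given to each adjacent subband, and the rule of stopping at the first subband the pool cannot cover are all hypothetical. The energies and word counts are made-up illustration values, not data from the patent.

```python
from typing import List


def allocate_adaptive_bits(energies: List[float], words: List[int],
                           max_bits: int, neighbour_bits: int,
                           pool: int) -> List[int]:
    """Hypothetical sketch of the allocation outlined in claims 139-142.

    Returns the number of adaptive bits given to each digital word of every
    subband. `neighbour_bits` and the stopping rule are assumptions.
    """
    n = len(energies)
    alloc = [0] * n
    remaining = pool

    # First group: subbands in order of decreasing estimated energy receive
    # max_bits per word; stop at the first subband the pool cannot cover.
    for k in sorted(range(n), key=lambda i: energies[i], reverse=True):
        cost = max_bits * words[k]
        if cost > remaining:
            break
        alloc[k] = max_bits
        remaining -= cost

    # Second group: still-empty subbands immediately adjacent to a fully
    # allocated one, trying the low-frequency neighbour before the high one.
    fully_allocated = [k for k in range(n) if alloc[k] == max_bits]
    for k in fully_allocated:
        for j in (k - 1, k + 1):
            if 0 <= j < n and alloc[j] == 0:
                cost = neighbour_bits * words[j]
                if cost <= remaining:
                    alloc[j] = neighbour_bits
                    remaining -= cost
    return alloc


ENERGIES = [1.0, 9.0, 2.0, 8.0, 0.5, 0.1]  # made-up energies, low to high frequency
WORDS = [1, 1, 2, 2, 4, 4]                  # made-up digital words per subband
alloc = allocate_adaptive_bits(ENERGIES, WORDS, max_bits=4, neighbour_bits=1, pool=15)
print(alloc)  # [1, 4, 1, 4, 0, 0]
```

In this example, the two strongest subbands (1 and 3) take 12 of the 15 bits. The remaining 3 bits go to the adjacent subbands 0 and 2, so the total ends exactly at the pool size, as in claim 142.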
Description
TECHNICAL FIELD
The invention relates in general to the high-quality low bit-rate digital signal processing of audio signals, such as music signals. More particularly, the invention relates to transform encoders and decoders for such signals, wherein the encoders and decoders have a short signal-propagation delay. Short delays are important in applications such as broadcast audio where a speaker must monitor his own voice. A delay in voice feedback causes serious speech disruption unless the delay is very short.
BACKGROUND ART
INTRODUCTION
Transform coding of high-quality signals in the prior art has used long signal sample block lengths to achieve low bit-rate coding without creating objectionable audible distortion. For example, a transform coder disclosed in EP 0 251 028 uses a block length of 1024 samples. Long block lengths have been necessary because shorter blocks degrade transform coder selectivity. Filter selectivity is critical because transform coders with sufficient filter bank selectivity can exploit psychoacoustic masking properties of human hearing to reduce bit-rate requirements without degrading the subjective quality of the coded signal.
Coders using long block lengths suffer from two problems: (1) audible distortion of signals with large transients caused by the temporal spreading of the transient's effects throughout the transform block, and (2) excessive propagation delay of the signal through the encoding and decoding process. In prior art coders, these processing delays are too great for applications such as broadcast audio where a speaker must monitor his own voice. A delay in voice feedback causes serious speech disruption unless the delay is kept very short.
The background art is discussed in more detail in the following Background Summary.
BACKGROUND SUMMARY
There is considerable interest among those in the field of signal processing to discover methods which minimize the amount of information required to represent adequately a given signal. By reducing required information, signals may be transmitted over communication channels with lower bandwidth, or stored in less space. With respect to digital techniques, minimal informational requirements are synonymous with minimal binary bit requirements.
Two factors limit the reduction of bit requirements:
(1) A signal of bandwidth W may be accurately represented by a series of samples taken at a frequency no less than 2·W. This is the Nyquist sampling rate. Therefore, a signal T seconds in length with a bandwidth W requires at least 2·W·T samples for accurate representation.
(2) Quantization of signal samples which may assume any of a continuous range of values introduces inaccuracies in the representation of the signal which are proportional to the quantizing step size or resolution. These inaccuracies are called quantization errors. Because each additional bit halves the step size, these errors decrease as the number of bits available to represent each quantized sample increases.
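The relation between step size, bit count, and quantizing error can be checked numerically. The Python sketch below assumes a uniform mid-rise quantizer on the range [-1, 1). Its maximum error is half the step size, and that step halves with every added bit. The names `quantize` and `max_error` are illustrative and are not part of any coder described in this document.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Uniform mid-rise quantizer for x in [-1, 1) with 2**bits levels."""
    step = 2.0 / (1 << bits)
    half_levels = 1 << (bits - 1)
    index = max(-half_levels, min(half_levels - 1, math.floor(x / step)))
    return (index + 0.5) * step          # reconstruct at the centre of the cell


def max_error(bits: int, grid: int = 10000) -> float:
    """Largest |x - quantize(x)| over an evenly spaced test grid on [-1, 1)."""
    return max(abs(x - quantize(x, bits))
               for x in (-1.0 + 2.0 * i / grid for i in range(grid)))


for bits in range(2, 9):
    bound = 1.0 / (1 << bits)            # half of the step size 2 / 2**bits
    print(f"{bits} bits: bound {bound:.5f}, measured {max_error(bits):.5f}")
```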
If coding techniques are applied to the full bandwidth, all quantizing errors, which manifest themselves as noise, are spread uniformly across the bandwidth. Techniques which may be applied to selected portions of the spectrum can limit the spectral spread of quantizing noise. Two such techniques are subband coding and transform coding. By using these techniques, quantizing errors can be reduced in particular frequency bands where quantizing noise is especially objectionable by quantizing that band with a smaller step size.
Subband coding may be implemented by a bank of digital bandpass filters. Transform coding may be implemented by any of several time-domain to frequency-domain transforms which simulate a bank of digital bandpass filters. Although transforms are easier to implement and require less computational power and hardware than digital filters, they have less design flexibility in the sense that each bandpass filter "frequency bin" represented by a transform coefficient has a uniform bandwidth. By contrast, a bank of digital bandpass filters can be designed to have different subband bandwidths. Transform coefficients can, however, be grouped together to define "subbands" having bandwidths which are multiples of a single transform coefficient bandwidth. The term "subband" is used hereinafter to refer to selected portions of the total signal bandwidth, whether implemented by a subband coder or a transform coder. A subband as implemented by a transform coder is defined by a set of one or more adjacent transform coefficients or frequency bins. The bandwidth of a transform coder frequency bin depends upon the coder's sampling rate and the number of samples in each signal sample block (the transform length).
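For a DFT-style transform, the bin spacing is the sampling rate divided by the transform length. The Python sketch below compares two block lengths at two sampling rates. The first is the 128-sample block of the preferred embodiment described later in this document. The second is the 1024-sample block of the EP 0 251 028 coder cited above. The function name is illustrative.

```python
def bin_width_hz(sample_rate_hz: float, transform_length: int) -> float:
    """Nominal bandwidth of one transform frequency bin (DFT-style spacing)."""
    return sample_rate_hz / transform_length


for fs in (44100.0, 48000.0):
    for n in (128, 1024):
        print(f"fs={fs:.0f} Hz, N={n}: {bin_width_hz(fs, n):.1f} Hz per bin")
```

With 128-sample blocks at 44.1 kHz, each bin is about 344.5 Hz wide. That is eight times wider than with 1024-sample blocks, which is why short blocks make it harder to keep subbands narrower than the ear's critical bands at low frequencies.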
Two characteristics of subband bandpass filters are particularly critical to the performance of high-quality music signal processing systems. The first is the bandwidth of the regions between the filter passband and stopbands (the transition bands). The second is the attenuation level in the stopbands. As used herein, the measure of filter "selectivity" is the steepness of the filter response curve within the transition bands (steepness of transition band rolloff), and the level of attenuation in the stopbands (depth of stopband rejection).
These two filter characteristics are critical because the human ear displays frequency-analysis properties resembling those of highly asymmetrical tuned filters having variable center frequencies. The frequency-resolving power of the human ear's tuned filters varies with frequency throughout the audio spectrum. Below about 500 Hz, the ear can discern signals that are closer together in frequency; the bandwidth it can resolve widens as frequency progresses upward to the limits of audibility. The effective bandwidth of such an auditory filter is referred to as a critical band. An important quality of the critical band is that psychoacoustic-masking effects are most strongly manifested within a critical band: a dominant signal within a critical band can suppress the audibility of other signals anywhere within that critical band. Signals at frequencies outside that critical band are not masked as strongly. See generally, the Audio Engineering Handbook, K. Blair Benson ed., McGraw-Hill, San Francisco, 1988, pages 1.40-1.42 and 4.8-4.10.
Psychoacoustic masking is more easily accomplished by subband and transform coders if the subband bandwidth throughout the audible spectrum is about half the critical bandwidth of the human ear in the same portions of the spectrum. This is because the critical bands of the human ear have variable center frequencies that adapt to auditory stimuli, whereas subband and transform coders typically have fixed subband center frequencies. To optimize the opportunity to utilize psychoacoustic-masking effects, any distortion artifacts resulting from the presence of a dominant signal should be limited to the subband containing the dominant signal. If the subband bandwidth is about half or less than half of the critical band (and if the transition band rolloff is sufficiently steep and the stopband rejection is sufficiently deep), the most effective masking of the undesired distortion products is likely to occur even for signals whose frequency is near the edge of the subband passband bandwidth. If the subband bandwidth is more than half a critical band, there is the possibility that the dominant signal will cause the ear's critical band to be offset from the coder's subband so that some of the undesired distortion products outside the ear's critical bandwidth are not masked. These effects are most objectionable at low frequencies where the ear's critical band is narrower.
Transform coding performance depends upon several factors, including the signal sample block length, transform coding errors, and aliasing cancellation.
BLOCK LENGTH
Inasmuch as the transform function must wait for the receipt of all signal samples in the entire block before performing the transform, the shortest theoretical time delay in an encode/decode system is twice the time period of the signal sample block. In practical systems, computation adds further delays such that the actual time delay is likely to be three or four times the time period of the signal sample block. If the encode/decode system must operate in an environment requiring a short propagation delay, a short block length is therefore required.
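The delay multipliers given above translate directly into milliseconds. The Python sketch below assumes a 44.1 kHz sampling rate. The 2x theoretical factor and the 3x to 4x practical factors are the ones stated in the text, and the function names are illustrative.

```python
from typing import Tuple


def block_period_ms(block_len: int, sample_rate_hz: float) -> float:
    """Duration of one signal sample block, in milliseconds."""
    return 1000.0 * block_len / sample_rate_hz


def delay_estimates_ms(block_len: int, sample_rate_hz: float) -> Tuple[float, float, float]:
    """Theoretical minimum (2x block period) and practical range (3x to 4x)."""
    t = block_period_ms(block_len, sample_rate_hz)
    return 2.0 * t, 3.0 * t, 4.0 * t


for n in (128, 1024):
    low, mid, high = delay_estimates_ms(n, 44100.0)
    print(f"N={n}: minimum {low:.1f} ms, practical {mid:.1f}-{high:.1f} ms")
```

For 128-sample blocks, the minimum is about 5.8 ms and the practical range about 8.7 to 11.6 ms. For 1024-sample blocks, the practical delay is roughly 70 to 93 ms.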
As block lengths become shorter, transform encoder and decoder performance is adversely affected not only by the consequential widening of the frequency bins, but also by degradation of the response characteristics of the bandpass filter frequency bins: (1) decreased rate of transition band rolloff, and (2) reduced level of stopband rejection. This degradation in filter performance results in the undesired creation of or contribution to transform coefficients in nearby frequency bins in response to a desired signal. These undesired contributions are called sidelobe leakage.
Thus, depending on the sampling rate, a short block length may result in a nominal filter bandwidth exceeding the ear's critical bandwidth at some or all frequencies, particularly low frequencies. Even if the nominal subband bandwidth is narrower than the ear's critical bandwidth, degraded filter characteristics manifested as a broad transition band and/or poor stopband rejection may result in significant signal components outside the ear's critical bandwidth. In such cases, greater constraints are ordinarily placed on other aspects of the system, particularly quantization accuracy.
Another disadvantage resulting from short sample block lengths is the exacerbation of transform coding errors, described in the next section.
TRANSFORM CODING ERRORS
Discrete transforms do not produce a perfectly accurate set of frequency coefficients because they work with only a finite segment of the signal. Strictly speaking, discrete transforms produce a time-frequency representation of the input time-domain signal rather than a true frequency-domain representation which would require infinite transform lengths. For convenience of discussion here, however, the output of discrete transforms will be referred to as a frequency-domain representation. In effect, the discrete transform assumes the sampled signal only has frequency components whose periods are a submultiple of the finite sample interval. This is equivalent to an assumption that the finite-length signal is periodic. The assumption in general is not true. The assumed periodicity creates discontinuities at the edges of the finite time interval which cause the transform to create phantom high-frequency components.
One technique which minimizes this effect is to reduce the discontinuity prior to the transformation by weighting the signal samples such that samples near the edges of the interval are close to zero. Samples at the center of the interval are generally passed unchanged, i.e., weighted by a factor of one. This weighting function is called an "analysis window" and may be of any shape, but certain windows contribute more favorably to subband filter performance.
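The periodicity assumption and the benefit of tapering the block edges can both be seen in the DFT of a sinusoid that completes a non-integer number of cycles in the block. The Python sketch below uses a Hann window only as a familiar example of a tapering window; it is not a window proposed in this document. It evaluates the DFT directly in O(N²) time so that only the standard library is needed. The tone frequency and the four-bin guard distance are arbitrary illustration choices.

```python
import cmath
import math
from typing import List

N = 128          # block length of the preferred embodiment
CYCLES = 10.5    # tone completing a non-integer number of cycles per block


def dft_magnitudes(x: List[float]) -> List[float]:
    """Magnitudes of bins 0..N/2 of a direct (O(N^2)) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def hann(n: int) -> List[float]:
    """Periodic Hann window: zero at the first sample, one at the centre."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n) for t in range(n)]


def far_leakage_db(mags: List[float], guard: int = 4) -> float:
    """Largest magnitude more than `guard` bins from the peak, in dB re the peak."""
    peak = max(range(len(mags)), key=mags.__getitem__)
    far = [m for k, m in enumerate(mags) if abs(k - peak) > guard]
    return 20.0 * math.log10(max(far) / mags[peak])


tone = [math.sin(2.0 * math.pi * CYCLES * t / N) for t in range(N)]
rect_db = far_leakage_db(dft_magnitudes(tone))
hann_db = far_leakage_db(dft_magnitudes([s * w for s, w in zip(tone, hann(N))]))
print(f"no window: {rect_db:.1f} dB, Hann window: {hann_db:.1f} dB")
```

Without a window, bins more than four away from the tone remain within roughly 20 dB of the peak. The tapered block lowers them by more than a further 20 dB. The exact figures depend on the tone frequency and on how far from the peak one measures.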
As used herein, the term "analysis window" refers merely to the windowing function performed prior to application of the forward transform. As will be discussed below, the design of an analysis window used in the invention is constrained by synthesis window design considerations. Therefore, design and performance properties of an "analysis window" as that term is commonly used in the art may differ from such analysis windows as implemented in this invention.
While there is no single criterion which may be used to assess a window's quality, general criteria include steepness of transition band rolloff and depth of stopband rejection. In some applications, the ability to trade steeper rolloff for deeper rejection level is a useful quality.
The analysis window is a time-domain function. If no other compensation is provided, the recovered or "synthesized" signal will be distorted according to the shape of the analysis window. There are several compensation methods. For example:
(a) The recovered signal interval or block may be multiplied by an inverse window, one whose weighting factors are the reciprocal of those for the analysis window. A disadvantage of this technique is that it clearly requires that the analysis window not go to zero at the edges.
(b) Consecutive input signal blocks may be overlapped. By carefully designing the analysis window such that two adjacent windows add to unity across the overlap, the effects of the window will be exactly compensated. (But see the following paragraph.) When used with certain types of transforms such as the Discrete Fourier Transform (DFT), this technique increases the number of bits required to represent the signal since the portion of the signal in the overlap interval must be transformed and transmitted twice. For these types of transforms, it is desirable to design the window with an overlap interval as small as possible.
(c) The synthesized output from the inverse transform may also need to be windowed. Some transforms, including one used in the current invention, require it. Further, quantizing errors may cause the inverse transform to produce a time-domain signal which does not go to zero at the edges of the finite time interval. Left alone, these errors may distort the recovered time-domain signal most strongly within the window overlap interval. A synthesis window can be used to shape each synthesized signal block at its edges. In this case, the signal will be subjected to an analysis and a synthesis window, i.e., the signal will be weighted by the product of the two windows. Therefore, both windows must be designed such that the product of the two will sum to unity across the overlap. See the discussion in the previous paragraph.
Short transform sample blocks impose greater compensation requirements on the analysis and synthesis windows. As the transform sample blocks become shorter there is more sidelobe leakage through the filter's transition band and stopband. A well-shaped analysis window reduces this leakage.
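Claim 153 outlines how a product window meeting the requirement in (c) can be prederived from a Kaiser-Bessel window. The Python sketch below follows those five steps. It then checks that two adjacent product windows, each 128 samples long and overlapped by 64 samples, sum to one.

The sketch rests on three assumptions:

- The Kaiser-Bessel window is parameterised with beta = pi × alpha.
- The scaling factor of step (4) is taken to be the sum of the initial window's samples, which is the normalisation that makes overlapped copies add to unity.
- Splitting the product window into equal square-root analysis and synthesis windows is only one possible choice.

All function names are illustrative.

```python
import math
from typing import List


def bessel_i0(x: float, terms: int = 60) -> float:
    """Zeroth-order modified Bessel function of the first kind, by power series."""
    total = term = 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_bessel(length: int, alpha: float) -> List[float]:
    """Kaiser-Bessel window; beta = pi * alpha is an assumed parameterisation."""
    m = length - 1
    beta = math.pi * alpha
    return [bessel_i0(beta * math.sqrt(max(0.0, 1.0 - ((2.0 * k - m) / m) ** 2)))
            for k in range(length)]


def product_window(block_len: int, overlap: int, alpha: float) -> List[float]:
    """Product (analysis x synthesis) window following steps (1)-(5) of claim 153."""
    kernel = kaiser_bessel(overlap + 1, alpha)       # (1) overlap + 1 samples
    pulse_len = block_len - overlap                   # (2) unit pulse of N - overlap ones
    interim = [sum(kernel[max(0, n - pulse_len + 1):min(n, overlap) + 1])
               for n in range(block_len)]             # (3) convolution = running sums
    scale = sum(kernel)                               # (4) read here as the kernel's total
    return [v / scale for v in interim]               # (5) normalise


N, OVERLAP, ALPHA = 128, 64, 4.0     # 50% overlap, alpha within claim 151's range
wp = product_window(N, OVERLAP, ALPHA)
hop = N - OVERLAP
print(max(abs(wp[n] + wp[n + hop] - 1.0) for n in range(OVERLAP)))
analysis = synthesis = [math.sqrt(v) for v in wp]   # one possible split (assumption)
```

The same function also handles the shorter overlaps suggested for a DFT-based coder. In that case, the product window is flat at one between the two overlap regions.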
Sidelobe leakage is undesirable because it causes the transform to create spectral coefficients which misrepresent the frequency of signal components outside the filter's passband. This misrepresentation is a distortion called aliasing.
ALIASING CANCELLATION
The Nyquist theorem holds that a signal may be accurately recovered from discrete samples when the interval between samples is no larger than one-half the period of the signal's highest frequency component. When the sampling rate is below this Nyquist rate, higher-frequency components are misrepresented as lower-frequency components. The lower-frequency component is an "alias" for the true component.
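The folding rule can be written in a few lines of Python. Sampling at fs cannot distinguish a frequency f from f + k·fs or from fs - f, so any component above fs/2 reappears at a lower frequency. The function name is illustrative, and the 44.1 kHz rate is the one used elsewhere in this document.

```python
def apparent_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency in [0, fs/2] at which a sinusoid of f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz                            # f and f + k*fs give identical samples
    return fs_hz - f if f > fs_hz / 2 else f    # mirror the upper half into [0, fs/2]


print(apparent_frequency(30000.0, 44100.0))   # 14100.0: a 30 kHz tone aliases to 14.1 kHz
print(apparent_frequency(15000.0, 44100.0))   # 15000.0: below fs/2, represented faithfully
```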
Subband filters and finite digital transforms are not perfect passband filters. The transition between the passband and stopband is not infinitely sharp, and the attenuation of signals in the stopband is not infinitely great. As a result, even if a passband-filtered input signal is sampled at the Nyquist rate suggested by the passband cut-off frequency, frequencies in the transition band above the cutoff frequency will not be faithfully represented.
It is possible to design the analysis and synthesis filters such that aliasing distortion is automatically cancelled by the inverse transform. Quadrature Mirror Filters in the time domain possess this characteristic. Some transform coder techniques, including one used in the present invention, also cancel alias distortion.
Suppressing the audible consequences of aliasing distortion in transform coders becomes more difficult as the sample block length is made shorter. As explained above, shorter sample blocks degrade filter performance: the passband bandwidth increases, the passband-stopband transition becomes less sharp, and the stopband rejection deteriorates. As a result, aliasing becomes more pronounced. If the alias components are coded and decoded with insufficient accuracy, these coding errors prevent the inverse transform from completely cancelling aliasing distortion. The residual aliasing distortion will be audible unless the distortion is psychoacoustically masked. With short sample blocks, however, some transform frequency bins may have a wider passband than the auditory critical bands, particularly at low frequencies where the ear's critical bands have the greatest resolution. Consequently, alias distortion may not be masked. One way to minimize the distortion is to increase quantization accuracy in the problem subbands, but that increases the required bit rate.
BIT-RATE REDUCTION TECHNIQUES
The two factors listed above (Nyquist sample rate and quantizing errors) should dictate the bit-rate requirements for a specified quality of signal transmission or storage. Techniques may be employed, however, to reduce the bit rate required for a given signal quality. These techniques exploit a signal's redundancy and irrelevancy. A signal component is redundant if it can be predicted or otherwise provided by the receiver. A signal component is irrelevant if it is not needed to achieve a specified quality of representation. Several techniques used in the art include:
(1) Prediction: a periodic or predictable characteristic of a signal permits a receiver to anticipate some component based upon current or previous signal characteristics.
(2) Entropy coding: components with a high probability of occurrence may be represented by abbreviated codes. Both the transmitter and receiver must have the same code book. Entropy coding and prediction have the disadvantages that they increase computational complexity and processing delay. Also, they inherently provide a variable rate output, thus requiring buffering if used in a constant bit-rate system.
(3) Nonuniform coding: representations by logarithms or nonuniform quantizing steps allow coding of large signal values with fewer bits at the expense of greater quantizing errors.
(4) Floating point: floating-point representation may reduce bit requirements at the expense of lost precision. Block-floating-point representation uses one scale factor or exponent for a block of floating-point mantissas, and is commonly used in coding time-domain signals. Floating point is a special case of nonuniform coding.
(5) Bit allocation: the receiver's demand for accuracy may vary with time, signal content, strength, or frequency. For example, lower frequency components of speech are usually more important for comprehension and speaker recognition, and therefore should be transmitted with greater accuracy than higher frequency components. Different criteria apply with respect to music signals. Some general bit-allocation criteria are:
(a) Component variance: more bits are allocated to transform coefficients with the greatest level of AC power.
(b) Component value: more bits are allocated to transform coefficients which represent frequency bands with the greatest amplitude or energy.
(c) Psychoacoustic masking: fewer bits are allocated to signal components whose quantizing errors are masked (rendered inaudible) by other signal components. This method is unique to those applications where audible signals are intended for human perception. Masking is understood best with respect to single-tone signals rather than multiple-tone signals and complex waveforms such as music signals.
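Item (4) above mentions block-floating-point representation. The Python sketch below assumes one shared power-of-two exponent per block and fixed-length signed mantissas. The function names, the 8-bit mantissa width, and the sample values are hypothetical, and the sketch does not describe the coefficient format used by this invention.

```python
import math
from typing import List, Tuple


def block_float_encode(block: List[float], mantissa_bits: int) -> Tuple[int, List[int]]:
    """Hypothetical block-floating-point coder: one shared exponent per block."""
    peak = max(abs(v) for v in block)
    # frexp gives peak = m * 2**e with 0.5 <= m < 1, so every |value| / 2**e < 1.
    exponent = math.frexp(peak)[1] if peak > 0 else 0
    scale = 1 << (mantissa_bits - 1)
    mantissas = [max(-scale, min(scale - 1, round(v / 2.0 ** exponent * scale)))
                 for v in block]
    return exponent, mantissas


def block_float_decode(exponent: int, mantissas: List[int], mantissa_bits: int) -> List[float]:
    """Rebuild approximate values from the shared exponent and the mantissas."""
    scale = 1 << (mantissa_bits - 1)
    return [m / scale * 2.0 ** exponent for m in mantissas]


BLOCK = [0.30, -0.12, 0.05, 0.20]
exponent, mantissas = block_float_encode(BLOCK, mantissa_bits=8)
decoded = block_float_decode(exponent, mantissas, mantissa_bits=8)
print(exponent, mantissas)   # -1 [77, -31, 13, 51]
```

Only one exponent is sent for the whole block, so the bit cost of the scale information is shared by every value in it. The price is that small values in a block with a large peak are represented with coarser relative precision.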
DISCLOSURE OF INVENTION
It is an object of this invention to provide for the digital processing of wideband audio information, particularly music, using an encode/decode apparatus and method having a signal propagation delay short enough to be usable for real-time aural feedback to a human operator.
It is a further object of this invention to provide such an encode/decode apparatus and method suitable for the high-quality transmission or storage and reproduction of music, wherein the quality of reproduction is suitable, for example, for broadcast audio links.
It is a further object of the invention to provide a quality of reproduction subjectively as good as that obtainable from Compact Discs.
It is yet a further object of the invention to provide such an encode/decode apparatus and method embodied in a digital processing system having a low bit rate.
It is a further object of the invention to provide such an encode/decode apparatus and method embodied in a digital processing system having a high degree of immunity against signal corruption by transmission paths.
It is yet a further object of the invention to provide such an encode/decode apparatus and method embodied in a digital processing system requiring a small amount of space to store the encoded signal.
Yet a further object of the invention is to provide an encode/decode apparatus and method embodied in a digital processing system employing transform coding having short transform blocks to achieve a short signal propagation delay but which provides the high quality reproduction of music while employing a low bit rate.
Yet another object of this invention is to compensate for the negative effects on transform coder performance resulting from the use of short transform blocks.
Another object of the invention is to provide improved psychoacoustic-masking techniques in a transform coder processing music signals.
It is still another object of the invention to provide techniques for psychoacoustically compensating for otherwise audible distortion artifacts in a transform coder.
Further details of the above objects and still other objects of the invention are set forth throughout this document, particularly in the section describing the Modes for Carrying Out the Invention, below.
In accordance with the teachings of the present invention, an encoder provides for the digital encoding of wideband audio information, the encoder having a short signal propagation delay. The wideband audio signals are sampled and quantized into time-domain sample blocks, the sample blocks having a time period resulting in a signal propagation delay short enough so that an encode/decode system employing the encoder is usable for real-time aural feedback to a human operator. Each sample block is then modulated by an analysis window. Frequency-domain spectral components are then generated in response to the analysis-window weighted time-domain sample block. A transform coder having adaptive bit allocation nonuniformly quantizes each transform coefficient, and those coefficients are assembled into a digital output having a format suitable for storage or transmission. Error correction codes may be used in applications where the transmitted signal is subject to noise or other corrupting effects of the communication path.
Also in accordance with the teachings of the present invention, a decoder provides for the high-quality reproduction of digitally encoded wideband audio signals encoded by the encoder of the invention. The decoder receives the digital output of the encoder via a storage device or transmission path. It derives the nonuniformly coded spectral components from the formatted digital signal and reconstructs the frequency-domain spectral components therefrom. Time-domain signal sample blocks are generated in response to frequency-domain spectral components by means having characteristics inverse to those of the means in the encoder which generated the frequency-domain spectral components. The sample blocks are modulated by a synthesis window. The synthesis window has characteristics such that the product of the synthesis-window response and the response of the analysis-window in the encoder produces a composite response which sums to unity for two adjacent overlapped sample blocks. Adjacent sample blocks are overlapped and added to cancel the weighting effects of the analysis and synthesis windows and recover a digitized representation of the time-domain signal which is then converted to a high-quality analog output.
Further in accordance with the teachings of the present invention, an encoder/decoder system provides for the digital encoding and high-quality reproduction of wideband audio information, the system having a short signal propagation delay. In the encoder portion of the system, the analog wideband audio signals are sampled and quantized into time-domain sample blocks, the sample blocks having a time period resulting in a signal propagation delay short enough so that an encode/decode system employing the encoder is usable for real-time aural feedback to a human operator. Each sample block is then modulated by an analysis window. Frequency-domain spectral components are then generated in response to the analysis-window weighted time-domain sample block. Nonuniform spectral coding, including adaptive bit allocation, quantizes each spectral component, and those components are assembled into a digital format suitable for storage or transmission over communication paths susceptible to signal-corrupting noise. The decoder portion of the system receives the digital output of the encoder via a storage device or transmission path. It derives the nonuniformly coded spectral components from the formatted digital signal and reconstructs the frequency-domain spectral components therefrom. Time-domain signal sample blocks are generated in response to frequency-domain transform coefficients by means having characteristics inverse to those of the means in the encoder which generated the frequency-domain transform coefficients. The sample blocks are modulated by a synthesis window. The synthesis window has characteristics such that the product of the synthesis-window response and the analysis-window response in the encoder produces a composite response which sums to unity for two adjacent overlapped sample blocks. Adjacent sample blocks are overlapped and added to cancel the weighting effects of the analysis and synthesis windows and recover a digitized representation of the time-domain signal, which is then converted to a high-quality analog output.
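The sum-to-unity condition on the composite (analysis times synthesis) window can be checked numerically. The following Python sketch is illustrative only: it assumes identical analysis and synthesis windows and uses a simple sine window as a stand-in for the window pairs derived in this patent (see FIG. 19), with adjacent 128-sample blocks overlapped by half a block.

```python
import math


def sine_window(n: int) -> list[float]:
    """Sine window of length n (a stand-in, not this patent's window pair)."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def composite_overlap_sum(analysis: list[float], synthesis: list[float]) -> list[float]:
    """Sum the analysis*synthesis products of two blocks overlapped by half a block.

    Returns one value per sample of the overlap region; each should equal 1.0.
    """
    n = len(analysis)
    half = n // 2
    product = [a * s for a, s in zip(analysis, synthesis)]
    # The second half of the earlier block lines up with the first half of the later block.
    return [product[half + i] + product[i] for i in range(half)]


N = 128  # block length of the preferred 44.1 kHz embodiment
w = sine_window(N)
sums = composite_overlap_sum(w, w)  # same window used for analysis and synthesis
print(max(abs(x - 1.0) for x in sums))  # a value near 0 (floating-point rounding only)
```

For this window, w[i + 64] is the cosine counterpart of w[i], so each overlapped pair of squared values sums to one.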
In an embodiment of the encoder of the present invention, a discrete transform generates frequency-domain spectral components in response to the analysis-window weighted time-domain sample blocks. Preferably, the discrete transform has a function equivalent to the alternate application of a modified Discrete Cosine Transform (DCT) and a modified Discrete Sine Transform (DST). In an alternative embodiment, the discrete transform is implemented by a Discrete Fourier Transform (DFT); however, virtually any time-domain to frequency-domain transform can be used.
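A minimal sketch of TDAC analysis and synthesis shows how overlap-add cancels the time-domain aliasing introduced by a critically sampled transform. Two assumptions should be noted: it applies the modified DCT to every block (a widely used later variant) rather than alternating a modified DCT and a modified DST, and it uses a sine window rather than the window pairs of this invention. Direct sums are used for clarity instead of an FFT.

```python
import math
import random


def sine_window(length):
    """Sine window; it satisfies w[n]**2 + w[n + length//2]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """MDCT: 2M windowed samples -> M coefficients (direct O(M^2) sum)."""
    m = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs):
    """IMDCT: M coefficients -> 2M time-aliased samples.

    The 2/M scaling suits a window whose squares sum to one across the overlap.
    """
    m = len(coeffs)
    return [2.0 / m * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                          for k in range(m))
            for n in range(2 * m)]


def tdac_round_trip(signal, m):
    """Window, transform, inverse-transform, window again, and overlap-add.

    Blocks are 2m samples long and advance by m samples (half a block).
    len(signal) must be a multiple of m.
    """
    w = sine_window(2 * m)
    padded = [0.0] * m + list(signal) + [0.0] * m
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):
        block = [padded[start + n] * w[n] for n in range(2 * m)]
        y = imdct(mdct(block))
        for n in range(2 * m):
            out[start + n] += y[n] * w[n]
    # Each original sample is covered by two blocks whose aliasing terms cancel.
    return out[m:m + len(signal)]


M = 16  # kept small for speed; M = 64 would give 128-sample blocks
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(4 * M)]
x_hat = tdac_round_trip(x, M)
print(max(abs(a - b) for a, b in zip(x, x_hat)))  # a value near 0
```

Each inverse-transformed block on its own contains time-reversed alias copies of its input; only after the windowed overlap-add with its neighbours do those copies cancel.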
In a preferred embodiment of the invention for a two-channel encoder, a single FFT is utilized to simultaneously calculate the forward transform for one signal sample block from each channel. In a preferred embodiment of the invention for a two-channel decoder, a single FFT is utilized to simultaneously calculate the inverse transform for two transform blocks, one from each of the two channels.
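One standard way to transform two real-valued blocks with a single complex FFT is to pack one channel into the real part and the other into the imaginary part, then separate the results using the conjugate symmetry of real-signal spectra. The Python sketch below shows that packing trick for a plain DFT; it does not reproduce the additional pre- and post-processing that this patent's TDAC computation would require, and the small recursive FFT is written out only to keep the example self-contained.

```python
import cmath
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def two_real_ffts(left, right):
    """Transform two real blocks with one complex FFT.

    Packs z = left + j*right, then separates the spectra with
    L[k] = (Z[k] + conj(Z[N-k])) / 2 and R[k] = (Z[k] - conj(Z[N-k])) / (2j).
    """
    n = len(left)
    z = fft([complex(a, b) for a, b in zip(left, right)])
    spec_l, spec_r = [], []
    for k in range(n):
        zc = z[(n - k) % n].conjugate()
        spec_l.append((z[k] + zc) / 2)
        spec_r.append((z[k] - zc) / 2j)
    return spec_l, spec_r


random.seed(1)
N = 128
left = [random.uniform(-1.0, 1.0) for _ in range(N)]   # one block from channel 1
right = [random.uniform(-1.0, 1.0) for _ in range(N)]  # one block from channel 2
spec_l, spec_r = two_real_ffts(left, right)
ref_l = fft([complex(v) for v in left])
ref_r = fft([complex(v) for v in right])
print(max(abs(a - b) for a, b in zip(spec_l + spec_r, ref_l + ref_r)))  # near 0
```

The same symmetry works in reverse for the decoder: two conjugate-symmetric spectra can be combined into one complex spectrum and inverse-transformed together.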
In the preferred embodiments of the encoder and decoder, the sampling rate is 44.1 kHz. While the sampling rate is not critical, 44.1 kHz is a suitable sampling rate and it is convenient because it is also the sampling rate used for Compact Discs. An alternative embodiment employs a 48 kHz sampling rate. In the preferred embodiment employing the 44.1 kHz sampling rate, the nominal frequency response extends to 15 kHz and the time-domain sample blocks have a length of 128 samples to provide an acceptably low signal-propagation delay so that the system is usable for providing real-time aural feedback to a human operator (such as for broadcast audio). When a person's own voice is returned to his ears after a delay, speech disturbances are created unless the delay is kept very short. See for example "Effects of Delayed Speech Feedback" by Bernard S. Lee, Journal of the Acoustical Soc. of America, vol. 22, no. 6, November 1950, pp. 824-826. The overall encode/decode system is assumed to have a delay of about three times the sample-block period, or about 10 milliseconds (msec) or less (128 samples at 44.1 kHz span about 2.9 msec, so three block periods come to about 8.7 msec), which is sufficiently short to overcome speech disturbance problems. In the preferred embodiment, the serial bit rate of the encoder output is on the order of 192 kBits per second (including overhead information such as error correction codes). Other bit rates yielding varying levels of signal quality may be used without departing from the basic spirit of the invention.
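The delay figures follow from simple arithmetic. The sketch below assumes, as the text does, an overall delay of about three block periods, and also reports how many bits the nominal 192 kBits per second stream carries during one block period (this says nothing about how those bits are divided among channels or fields).

```python
BLOCK_LEN = 128      # samples per time-domain block
BIT_RATE = 192_000   # nominal serial bit rate, bits per second


def block_period_ms(block_len: int, fs_hz: float) -> float:
    """Duration of one sample block in milliseconds."""
    return 1000.0 * block_len / fs_hz


def bits_per_block_period(bit_rate: float, block_len: int, fs_hz: float) -> float:
    """Bits carried by the serial stream during one block period."""
    return bit_rate * block_len / fs_hz


for fs in (44_100, 48_000):
    period = block_period_ms(BLOCK_LEN, fs)
    bits = bits_per_block_period(BIT_RATE, BLOCK_LEN, fs)
    print(f"{fs} Hz: block {period:.2f} ms, ~3 blocks {3 * period:.2f} ms, "
          f"{bits:.1f} bits per block period")
# 44100 Hz: block 2.90 ms, ~3 blocks 8.71 ms, 557.3 bits per block period
# 48000 Hz: block 2.67 ms, ~3 blocks 8.00 ms, 512.0 bits per block period
```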
In a preferred embodiment of the encoder, the nonuniform transform coder computes a variable bit-length code word for each transform coefficient. The code-word bit length is the sum of a fixed number of bits and a variable number of bits determined by adaptive bit allocation, based on whether, because of current signal content, noise in the subband is less subject to psychoacoustic masking than noise in other subbands. A fixed number of bits is assigned to each subband based on empirical observations regarding psychoacoustic-masking effects of a single-tone signal in the subband under consideration. The assignment of fixed bits takes into consideration the poorer subjective performance of the system at low frequencies due to the greater selectivity of the ear at low frequencies. Although masking performance in the presence of complex signals ordinarily is better than in the presence of single tone signals, masking effects in the presence of complex signals are not as well understood nor are they as predictable. The system is not aggressive in the sense that most of the bits are fixed bits and relatively few bits are adaptively assigned. This approach has several advantages. First, the fixed bit assignment inherently compensates for the undesired distortion products generated by the inverse transform because the empirical procedure which established the required fixed bit assignments included the inverse transform process. Second, the adaptive bit-allocation algorithm can be kept relatively simple. Third, adaptively assigned bits are more sensitive to signal transmission errors occurring between the encoder and decoder, since such errors can result in incorrect assignment as well as incorrect values for these bits in the decoder; keeping their number small limits this exposure.
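The split between fixed and adaptive bits can be illustrated with a small Python sketch. Everything specific in it is hypothetical: the fixed-bit table is invented (it is not Table I), exponents are taken to count left shifts so that a smaller exponent means a larger amplitude, and the adaptive rule simply shares a small pool of bits among the subbands whose exponents lie within `spread` of the loudest one. The point is only the structure: code-word length equals fixed bits plus adaptively allocated bits, with most bits fixed.

```python
def code_word_lengths(fixed_bits, exponents, pool, spread=2, max_extra=4):
    """Return per-subband code-word lengths = fixed bits + adaptive bits.

    Hypothetical adaptive rule: subbands whose exponent is within `spread`
    of the smallest exponent (loudest subband) share the pool round-robin,
    each receiving at most `max_extra` additional bits.
    """
    adaptive = [0] * len(fixed_bits)
    loudest = min(exponents)
    eligible = [i for i, e in enumerate(exponents) if e - loudest <= spread]
    while pool > 0 and any(adaptive[i] < max_extra for i in eligible):
        for i in eligible:
            if pool > 0 and adaptive[i] < max_extra:
                adaptive[i] += 1
                pool -= 1
    return [f + a for f, a in zip(fixed_bits, adaptive)]


fixed = [6, 6, 5, 5, 4, 4, 3, 3]     # invented fixed allocation per subband
exps = [3, 1, 0, 2, 5, 7, 9, 12]     # invented exponents for one block
lengths = code_word_lengths(fixed, exps, pool=6)
print(lengths)  # [6, 8, 7, 7, 4, 4, 3, 3]
```

Here 36 of the 42 bits are fixed and only 6 are adaptive, mirroring the non-aggressive balance described above.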
The empirical technique for allocating bits in accordance with the invention may be better understood by reference to FIG. 13, which shows critical-band spectra of the output noise and distortion (i.e., the noise and distortion are shown with respect to auditory critical bands) resulting from a 500 Hz tone (sine wave) for three different bit allocations compared to auditory masking. The Figure is intended to demonstrate an empirical approach rather than any particular data.
Allocation A (the solid line) is a reference, showing the noise and distortion products produced by the 500 Hz sine wave when an arbitrary number of bits are allocated to each of the transform coefficients. Allocation B (the short dashed line) shows the noise and distortion products for the same relative bit allocation as allocation A but with 2 fewer bits per transform coefficient. Allocation C (the long dashed line) is the same as allocation A for frequencies in the lower part of the audio band up to about 1500 Hz. Allocation C is then the same as allocation B for frequencies in the upper part of the audio band above about 1500 Hz. The dotted line shows the auditory masking curve for a 500 Hz tone.
It will be observed that audible noise is present at frequencies below the 500 Hz tone for all three cases of bit allocation due to the rapid fall-off of the masking curve: the noise and distortion product curves are above the masking threshold from about 100 Hz to 300 or 400 Hz. The removal of two bits (allocation A to allocation B) exacerbates the audible noise and distortion; adding back the two bits over a portion of the spectrum including the region below the tone, as shown in allocation C, restores the original audible noise and distortion levels. Audible noise is also present at high frequencies, but does not change substantially when bits are removed and added because, at that extreme portion of the audio spectrum, the noise and distortion products created by the 500 Hz tone are relatively low.
By observing the noise and distortion created in response to tones at various frequencies for various bit allocations, bit lengths for the various transform coefficients can be allocated that result in acceptable levels of noise and distortion with respect to auditory masking throughout the audio spectrum. With respect to the example in FIG. 13, in order to lower the level of the noise and distortion products below the masking threshold in the region from about 100 Hz to 300 or 400 Hz, additional bits could be added to the reference allocation for the transform coefficient containing the 500 Hz tone and nearby coefficients until the noise and distortion dropped below the masking threshold. Similar steps would be taken for other tones throughout the audio spectrum until the overall transform-coefficient bit-length allocation resulted in acceptably low audible noise in the presence of tones, taken one at a time, throughout the audio spectrum. This is most easily done by way of computer simulations. The fixed bit allocation is then set somewhat lower than this result by removing one or more bits from each transform coefficient across the spectrum (such as allocation B). Adaptively allocated bits are added to reduce the audible noise to acceptable levels in the problem regions as required (such as allocation C). Thus, empirical observations regarding the increase and decrease of audible noise with respect to bit allocation, such as in the example of FIG. 13, form the basis of the fixed and adaptive bit allocation scheme of the present invention.
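The iterative procedure described above (add bits near a test tone until the noise falls below its masking curve, then trim the result to form the fixed allocation) can be sketched in Python. The noise model is a deliberately crude assumption, not the simulation used for this invention: each coefficient's quantization noise is taken to sit about 6.02 dB per bit below its level and to leak into neighbouring bins at a fixed number of dB per bin. The tone level and masking values in the usage example are invented for illustration, and only one tone is processed.

```python
DB_PER_BIT = 6.02  # noise reduction per added bit for a uniform quantizer


def noise_spectrum(level_db, bits, leak_db_per_bin=12.0):
    """Modeled noise in each bin: the loudest contribution from any coefficient.

    Hypothetical model: coefficient i produces noise level_db[i] - 6.02*bits[i]
    in its own bin, falling off by leak_db_per_bin per bin of distance.
    """
    n = len(level_db)
    return [max(level_db[i] - DB_PER_BIT * bits[i] - leak_db_per_bin * abs(i - j)
                for i in range(n))
            for j in range(n)]


def fit_allocation(level_db, mask_db, start_bits, max_bits=16, leak_db_per_bin=12.0):
    """Add one bit at a time until the modeled noise is at or below the mask."""
    bits = list(start_bits)
    n = len(bits)
    while True:
        noise = noise_spectrum(level_db, bits, leak_db_per_bin)
        over = [j for j in range(n) if noise[j] > mask_db[j]]
        if not over:
            return bits
        j = over[0]
        # Give the bit to the coefficient contributing the most noise in bin j.
        culprit = max(range(n), key=lambda i: level_db[i] - DB_PER_BIT * bits[i]
                      - leak_db_per_bin * abs(i - j))
        if bits[culprit] >= max_bits:
            raise ValueError("mask cannot be met within max_bits")
        bits[culprit] += 1


def derive_fixed(bits, bits_removed=1):
    """Fixed allocation: the fitted allocation with bits removed across the board."""
    return [max(b - bits_removed, 0) for b in bits]


# One test tone in bin 3; the other bins hold low-level signal.
level = [-60.0, -60.0, -60.0, 0.0, -60.0, -60.0, -60.0, -60.0]
# Invented masking curve: steep below the tone, gentler above it.
mask = [-80.0, -70.0, -50.0, -20.0, -25.0, -30.0, -35.0, -40.0]
fitted = fit_allocation(level, mask, start_bits=[2] * 8)
fixed = derive_fixed(fitted)
print(fitted, fixed)  # [4, 2, 2, 8, 2, 2, 2, 2] [3, 1, 1, 7, 1, 1, 1, 1]
```

In a fuller version, fits for tones across the spectrum would be combined (for example by taking the per-coefficient maximum) before bits are removed to form the fixed allocation.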
In a preferred embodiment of the encoder, the nonuniformly quantized transform coefficients are expressed by a block-floating-point representation comprised of block exponents and variable-length code words. As described above, the variable-length code words are further comprised of a fixed bit-length portion and a variable length portion of adaptively assigned bits. For each signal sample block, the encoded signal is assembled into frames composed of exponents and the fixed-length portion of the code words followed by a string of all adaptively allocated bits. The exponents and fixed-length portion of code words are assembled separately from adaptively allocated bits to reduce vulnerability to noise burst errors.
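Block-floating-point quantization can be sketched as follows. The exponent range, the convention that the exponent counts left shifts, and the mantissa format (a signed integer with a given number of bits) are assumptions chosen for illustration; the actual field widths used by this invention are given in its tables and frame formats, which are not reproduced here.

```python
def quantize_block_float(coeffs, mantissa_bits, max_exp=15):
    """Quantize a group of coefficients in [-1, 1) with one shared exponent.

    The exponent is the number of left shifts (0..max_exp) that can be applied
    before the largest magnitude would reach 1.0. Mantissas are signed
    integers in the two's-complement range of mantissa_bits bits.
    """
    peak = max(abs(c) for c in coeffs)
    exp = 0
    while exp < max_exp and peak * 2 ** (exp + 1) < 1.0:
        exp += 1
    scale = 2 ** (mantissa_bits - 1)
    mantissas = []
    for c in coeffs:
        q = round(c * 2 ** exp * scale)
        mantissas.append(max(-scale, min(scale - 1, q)))  # clamp to the mantissa range
    return exp, mantissas


def dequantize_block_float(exp, mantissas, mantissa_bits):
    """Undo the shared shift and mantissa scaling."""
    scale = 2 ** (mantissa_bits - 1)
    return [m / scale / 2 ** exp for m in mantissas]


coeffs = [0.05, -0.03, 0.01, 0.002]
exp, mant = quantize_block_float(coeffs, mantissa_bits=6)
print(exp, mant)  # 4 [26, -15, 5, 1]
restored = dequantize_block_float(exp, mant, mantissa_bits=6)
```

Because the shared exponent scales the whole group up before rounding, the quantization step for small-valued groups shrinks in proportion, which is what lets short code words retain useful precision.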
Unlike many coders in the prior art, an encoder conforming to this invention need not transmit side information regarding the assignment of adaptively allocated bits in each frame. The decoder can deduce the correct assignment by applying the same allocation algorithm to the exponents as that used by the encoder.
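Because the adaptive allocation is computed only from the exponents, a decoder can reproduce it without side information. The Python sketch below assembles a toy frame in the order described above (exponents and the fixed-length portions of the code words first, then all adaptively allocated bits) and decodes it by re-running the allocation rule. The 4-bit exponent field, the allocation rule, and the choice to place the most significant bits of each code word in the fixed portion are hypothetical, not the formats of FIGS. 20 and 21.

```python
EXP_BITS = 4  # hypothetical width of each exponent field


def adaptive_allocation(exponents, pool, max_extra=3):
    """Hypothetical deterministic rule: one bit per pass to each subband,
    loudest (smallest exponent) first, until the pool is spent."""
    order = sorted(range(len(exponents)), key=lambda i: (exponents[i], i))
    extra = [0] * len(exponents)
    for _ in range(max_extra):
        for i in order:
            if pool == 0:
                return extra
            extra[i] += 1
            pool -= 1
    return extra


def to_bits(value, width):
    return [(value >> (width - 1 - k)) & 1 for k in range(width)]


def from_bits(bits, pos, width):
    value = 0
    for k in range(width):
        value = (value << 1) | bits[pos + k]
    return value, pos + width


def encode_frame(exponents, codes, fixed_bits, pool):
    """codes[i] is an unsigned code word of fixed_bits[i] + extra[i] bits."""
    extra = adaptive_allocation(exponents, pool)
    head, tail = [], []
    for e in exponents:
        head += to_bits(e, EXP_BITS)
    for code, f, a in zip(codes, fixed_bits, extra):
        head += to_bits(code >> a, f)               # fixed portion (high-order bits)
        tail += to_bits(code & ((1 << a) - 1), a)   # adaptive portion (low-order bits)
    return head + tail  # no allocation side information is sent


def decode_frame(bits, n_subbands, fixed_bits, pool):
    pos = 0
    exponents = []
    for _ in range(n_subbands):
        e, pos = from_bits(bits, pos, EXP_BITS)
        exponents.append(e)
    extra = adaptive_allocation(exponents, pool)  # same rule as the encoder
    high = []
    for f in fixed_bits:
        v, pos = from_bits(bits, pos, f)
        high.append(v)
    codes = []
    for v, a in zip(high, extra):
        low, pos = from_bits(bits, pos, a)
        codes.append((v << a) | low)
    return exponents, codes


exponents, codes, fixed_bits = [2, 0, 5, 1], [19, 27, 5, 9], [4, 4, 3, 3]
frame = encode_frame(exponents, codes, fixed_bits, pool=3)
print(len(frame), decode_frame(frame, 4, fixed_bits, pool=3))
```

If a transmission error corrupts an exponent, the decoder's recomputed allocation can differ from the encoder's, which is one reason the exponents are among the data singled out for protection.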
In applications where frame synchronization is required, the encoder portion of the invention appends the formatted data to frame synchronization bits. The formatted data bits are first randomized to reduce the probability of long sequences of bits with values of all ones or zeroes. This is necessary in many environments such as T-1 carrier which will not tolerate such sequences beyond specified lengths. In asynchronous applications, randomization also reduces the probability that valid data within the frame will be mistaken for the block synchronization sequence. In the decoder portion of the invention, the formatted data bits are recovered by removing the frame synchronization bits and applying an inverse randomization process.
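The randomization step can be illustrated with an additive scrambler: the data bits are XORed with a pseudo-random sequence that is restarted for every frame, so applying the same operation again restores them. The specific generator (a 7-stage linear-feedback shift register with recurrence b[n] = b[n-7] XOR b[n-4]), its seed, and the 8-bit sync pattern are hypothetical choices for this sketch; the text above does not specify them.

```python
SYNC = [1, 1, 1, 0, 0, 1, 0, 1]  # hypothetical frame-synchronization pattern


def pn_sequence(length, seed=0x7F):
    """Pseudo-random bits from a 7-stage LFSR: b[n] = b[n-7] XOR b[n-4]."""
    state = seed & 0x7F  # must be nonzero
    out = []
    for _ in range(length):
        fb = ((state >> 6) ^ (state >> 3)) & 1
        out.append(fb)
        state = ((state << 1) | fb) & 0x7F
    return out


def scramble(bits):
    """XOR with the PN sequence; the same call also descrambles."""
    return [b ^ p for b, p in zip(bits, pn_sequence(len(bits)))]


def build_frame(data_bits):
    return SYNC + scramble(data_bits)


def parse_frame(frame):
    if frame[:len(SYNC)] != SYNC:
        raise ValueError("frame synchronization pattern not found")
    return scramble(frame[len(SYNC):])


def longest_run(bits):
    """Length of the longest run of identical bits."""
    best = run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        prev = b
        best = max(best, run)
    return best


data = [0] * 40  # a long run of zeros, such as an encoded silent passage might produce
frame = build_frame(data)
print(longest_run(data), longest_run(frame[len(SYNC):]))  # 40, then a value no larger than 7
```

A 7-stage register started from a nonzero state can never output more than seven equal bits in a row, so the long run of zeros is broken up regardless of the data.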
In applications where the encoded signal is subject to corruption, error correction codes are utilized to protect the most critical information, that is, the exponents and fixed portions of the lowest-frequency coefficient code words. Error codes and the protected data are scattered throughout the formatted frame to reduce sensitivity to noise burst errors, i.e., to increase the length of a noise burst required before critical data cannot be corrected.
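Scattering protected data across a frame works in the same way as ordinary block interleaving. In the sketch below (generic interleaving, not the specific layout of FIG. 21), bit j of every code word is sent before bit j + 1 of any code word, so a burst of consecutive channel errors is divided among the code words. With four code words, an 8-bit burst leaves two errors in each, which a code able to correct two errors per word (a hypothetical capability here) could repair; without interleaving, the whole burst could land in a single code word.

```python
def interleave(codewords):
    """Send bit j of every code word before bit j + 1 of any code word."""
    length = len(codewords[0])
    return [cw[j] for j in range(length) for cw in codewords]


def deinterleave(stream, num_codewords):
    """Invert interleave(): regroup the stream into the original code words."""
    length = len(stream) // num_codewords
    return [[stream[j * num_codewords + r] for j in range(length)]
            for r in range(num_codewords)]


codewords = [[0] * 12 for _ in range(4)]  # four all-zero 12-bit code words
stream = interleave(codewords)
for k in range(10, 18):                   # noise burst: 8 consecutive channel bits flipped
    stream[k] ^= 1
received = deinterleave(stream, 4)
print([sum(cw) for cw in received])       # [2, 2, 2, 2]
```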
The various features of the invention and its preferred embodiments are set forth in greater detail in the following section describing the Modes for Carrying Out the Invention and in the accompanying drawings.
BRIEF DESCRIPTION OF DRAWINGS
FIGS. 1a and 1b are functional block diagrams illustrating the basic structure of the invention, particularly for the TDAC transform version of the invention.
FIGS. 2a through 2e are block diagrams showing the hardware architecture for one embodiment of the invention, particularly for the TDAC transform version of the invention.
FIGS. 3a and 3b are block diagrams showing in greater detail the serial-communications interface of the processor for a two-channel embodiment of the invention.
FIG. 4 is a hypothetical graphical representation showing a time-domain signal sample block.
FIG. 5 is a further hypothetical graphical representation of a time-domain signal sample block showing discontinuities at the edges of the sample block caused by a discrete transform's implicit assumption that the signal within the block is periodic.
FIG. 6a is a functional block diagram showing the modulation of a function X(t) by a function W(t) to provide the resulting function Y(t).
FIGS. 6b through 6d are further hypothetical graphical representations showing the modulation of a time-domain signal sample block by an analysis window.
FIG. 7 is a flow chart showing the high level logic for the nonuniform quantizer utilized in the invention.
FIG. 8 is a flow chart showing more detailed logic for the adaptive bit allocation process utilized in the invention.
FIG. 9 is a graphical representation showing a representative TDAC coder filter characteristic response curve and two psychoacoustic masking curves.
FIG. 10 is a graphical representation showing a TDAC coder filter characteristic response with respect to a 4 kHz psychoacoustic masking curve.
FIG. 11 is a graphical representation showing a TDAC coder filter characteristic response with respect to a 1 kHz psychoacoustic masking curve.
FIG. 12 is a graphical representation illustrating a composite masking curve derived from the psychoacoustic masking curves of several tones.
FIG. 13 is a graphical representation showing the spectral levels of coding noise and distortion of an encoded 500 Hz tone for three different bit allocation schemes with respect to the psychoacoustic masking curve for a 500 Hz tone.
FIGS. 14a through 14e are hypothetical graphical representations illustrating a time-domain signal grouped into a series of overlapped and windowed time-domain signal sample blocks.
FIGS. 15a through 15d are hypothetical graphical representations illustrating the time-domain aliasing distortion created by the TDAC transform.
FIGS. 16a through 16g are hypothetical graphical representations illustrating the cancellation of time-domain aliasing by overlap-add during TDAC transform signal synthesis.
FIG. 17 is a graphical representation comparing filter transition band rolloff and stopband rejection of a filter bank using an analysis-only window with that of a filter bank using the analysis window of an analysis-synthesis window pair designed for the preferred TDAC transform embodiment of the invention.
FIG. 18 is a hypothetical graphical representation showing the overlap-add property of adjacent windowed blocks.
FIG. 19 is a hypothetical graphical representation comparing the shape of several convolved Kaiser-Bessel analysis windows, for alpha values ranging from 4 to 7, with a sine-tapered window.
FIG. 20 is a schematic representation illustrating the format of a frame of two encoded transform blocks without error correction, particularly for the TDAC transform version of the invention.
FIG. 21 is a schematic representation illustrating the format of a frame of two encoded transform blocks with error correction codes, particularly for the TDAC transform version of the invention.
FIGS. 22a and 22b are functional block diagrams illustrating the basic structure of the invention, particularly for the DFT version of the invention.
FIG. 23 is a graphical representation comparing the shapes of two coder analysis windows for the TDAC transform and DFT coders.
FIG. 24 is a graphical representation comparing the characteristic filter response of a TDAC transform coder using windows with 100% overlap to the response of a DFT coder using windows with 25% overlap.
FIG. 25 is a schematic representation illustrating the format of a frame of two encoded transform blocks without error correction, particularly for the DFT version of the invention.
FIG. 26 is a schematic representation illustrating the format of a frame of two encoded transform blocks with error correction codes, particularly for the DFT version of the invention.
Table I shows master exponents, subband grouping, and coefficient bit lengths for the TDAC transform coder.
Table II shows subband grouping and coefficient bit lengths for the DFT coder.