United States Patent
6229854
Kikuchi, et al.
May 8, 2001
Title
Video coding/decoding apparatus
Abstract
A video coding/decoding apparatus comprises a prediction circuit that divides an input video signal into large regions and small regions in a hierarchical fashion and produces a prediction signal by performing prediction region by region, a subtracter for generating a prediction error signal for a prediction signal at the lowest level, a DCT circuit for coding a prediction error signal, a quantization circuit and a variable-length encoder, a variable-length encoder for coding the prediction mode and motion vector information obtained at each level from the prediction circuit, and a multiplexer for multiplexing the code strings obtained from the variable-length encoder and dividing them into the upper-layer and lower-layer code strings to output the code strings obtained at the variable-length encoder particularly as upper-layer code strings.
Inventors:
Kikuchi; Yoshihiro (Yokohama, JP), Watanabe; Toshiaki (Yokohama, JP), Dachiku; Kenshi (Kawasaki, JP), Ida; Takashi (Kawasaki, JP), Yamaguchi; Noboru (Yashio, JP), Chujoh; Takeshi (Tokyo, JP)
Assignee:
Kabushiki Kaisha Toshiba (Kawasaki, JP)
Appl. No.:
636509
Filed:
August 10, 2000
Foreign Application Priority Data
Mar 10, 1995 [JP] 7-050993
May 31, 1995 [JP] 7-134406
Sep 29, 1995 [JP] 7-277180
Oct 27, 1995 [JP] 7-280443
Current U.S. Class:
375/240.27
375/240.13
375/240.16
Field of Search:
375/240.01,240.23,240.26,240.27,240.28,240.22,240.12,240.13,240.11,240.16 346/401.1,402.1,407.1,408.1,412.1,413.1,414.1,415.1,416.1,417.1,418.1,422.1 382/238-239
U.S. Patent Documents
4821119    April 1989        Gharavi
5068724    November 1991     Krause et al.
5231384    July 1993         Kuriacose
5302949    April 1994        Yoshinari et al.
5351095    September 1994    Kerdranvat
5386234    January 1995      Veltman et al.
5392037    February 1995     Kato
5475435    December 1995     Yonemitsu et al.
5488616    January 1996      Takishima et al.
5534927    July 1996         Shishikui et al.
5563593    October 1996      Puri
5608458    March 1997        Chen et al.
5625356    April 1997        Lee et al.
5680174    October 1997      Sugiyama
5715005    February 1998     Masaki
5731840    March 1998        Kikuchi et al.
5852469    December 1998     Nagai et al.
Foreign Patent Documents
0 485 230      May 1992     EP
0 490 544      Jun. 1992    EP
0 556 507      Aug. 1993    EP
0 619 552      Oct. 1994    EP
0 691 789      Jan. 1996    EP
4-3684         Jan. 1992    JP
5-122685       May 1993     JP
5-217300       Aug. 1993    JP
5-219489       Aug. 1993    JP
WO 92/14339    Aug. 1992    WO
WO 93/20653    Oct. 1993    WO
Primary Examiner:
Le; Vu
Attorney, Agent or Firm:
Oblon, Spivak, McClelland, Maier & Neustadt, P.C.
Parent Case Text
This application is a division of Ser. No. 09/223,780, filed Dec. 31, 1998, now U.S. Pat. No. 6,128,342, which is a division of Ser. No. 08/916,006, filed Aug. 21, 1997, now U.S. Pat. No. 5,912,706, which is a continuation of Ser. No. 08/613,175, filed Mar. 8, 1996, now U.S. Pat. No. 5,731,840.
Claims
What is claimed is:
1. A video coding apparatus, comprising:
prediction signal generating means for selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, to selectively generate an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
first coding means for coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information, and for coding the motion vector obtained for each of the regions by said prediction signal generating means in the interframe predictive coding method to obtain plural coded motion vectors;
prediction error generating means for generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating means and the input video signal;
second coding means for coding the prediction error signal for each of the regions to obtain plural coded prediction error information; and
code string generating means for outputting a code string divided into at least first and second layers, the first layer including a synchronization code, the mode information for each of the plurality of regions and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
2. A video coding apparatus, comprising:
prediction signal generating means for selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, to selectively generate an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
first coding means for coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information;
second coding means for coding the motion vector obtained for each of the regions by said prediction signal generating means in the interframe predictive coding method to obtain plural coded motion vectors;
prediction error generating means for generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating means and the input video signal;
third coding means for coding the prediction error signal for each of the regions to obtain plural coded prediction error signals; and
code string generating means for outputting a code string divided into at least first and second layers, the first layer including a synchronization code and the mode information for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
3. A video coding apparatus, comprising:
prediction signal generating means for selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, to selectively generate an intraframe prediction signal with the intraframe predictive coding method and an interframe prediction signal with the interframe predictive coding method;
first coding means for coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information;
prediction error generating means for generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating means and the input video signal;
second coding means for coding the prediction error signal for each of the regions to obtain plural coded prediction error signals; and
code string generating means for outputting a code string divided into at least first and second layers, the first layer including a synchronization code and the mode information for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
4. A video coding apparatus, comprising:
prediction signal generating means for selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, to selectively generate an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
first coding means for coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information;
second coding means for coding the motion vector obtained for each of the regions by said prediction signal generating means in the interframe predictive coding method to obtain plural coded motion vectors;
prediction error generating means for generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating means and the input video signal;
third coding means for coding the prediction error signal for each of the regions to obtain plural coded prediction error signals; and
code string generating means for outputting a code string divided into at least first and second layers, the first layer including a synchronization code, the mode information for each of the plurality of regions and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
5. A video coding apparatus, comprising:
prediction signal generating means for selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, to selectively generate an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
first coding means for coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information, and for coding the motion vector obtained for each of the regions by said prediction signal generating means in the interframe predictive coding method to obtain plural coded motion vectors;
prediction error generating means for generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating means and the input video signal;
second coding means for coding the prediction error signal for each of the regions to obtain plural coded prediction error information; and
code string generating means for outputting a code string divided into at least first and second layers, the first layer including a synchronization code and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
6. A video coding apparatus, comprising:
a prediction signal generating section configured to select one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, and to selectively generate an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
a first coding section configured to code coding mode information for each of the regions, the coding mode information indicating one of the intraframe and interframe predictive coding methods, for obtaining plural coded coding mode information, and to code the motion vector obtained for each of the regions by said prediction signal generating section in the interframe predictive coding method for obtaining plural coded motion vectors;
a prediction error generating section configured to generate a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating section and the input video signal;
a second coding section configured to code the prediction error signal for each of the regions for obtaining plural coded prediction error information; and
a code string generating section configured to output a code string divided into at least first and second layers, the first layer including a synchronization code, the mode information for each of the plurality of regions and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
7. A video coding apparatus, comprising:
a prediction signal generating section configured to select one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, for selectively generating an intraframe prediction signal with the intraframe predictive coding method, and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
a first coding section configured to code coding mode information for each of the regions, the coding mode information indicating one of the intraframe and interframe predictive coding methods, for obtaining plural coded coding mode information;
a second coding section configured to code the motion vector obtained for each of the regions by said prediction signal generating section in the interframe predictive coding method, for obtaining plural coded motion vectors;
a prediction error generating section configured to generate a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating section and the input video signal;
a third coding section configured to code the prediction error signal for each of the regions for obtaining plural coded prediction error signals; and
a code string generating section configured to output a code string divided into at least first and second layers, the first layer including a synchronization code and the mode information for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
8. A video coding apparatus, comprising:
a prediction signal generating section configured to select one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, for selectively generating an intraframe prediction signal with the intraframe predictive coding method and an interframe prediction signal with the interframe predictive coding method;
a first coding section configured to code coding mode information for each of the regions, the coding mode information indicating one of the intraframe and interframe predictive coding methods, for obtaining plural coded coding mode information;
a prediction error generating section configured to generate a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating section and the input video signal;
a second coding section configured to code the prediction error signal for each of the regions for obtaining plural coded prediction error signals; and
a code string generating section configured to output a code string divided into at least first and second layers, the first layer including a synchronization code and the mode information for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
9. A video coding apparatus, comprising:
a prediction signal generating section configured to select one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, for selectively generating an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
a first coding section configured to code coding mode information for each of the regions, the coding mode information indicating one of the intraframe and interframe predictive coding methods, for obtaining plural coded coding mode information;
a second coding section configured to code the motion vector obtained for each of the regions by said prediction signal generating section in the interframe predictive coding method, for obtaining plural coded motion vectors;
a prediction error generating section configured to generate a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating section and the input video signal;
a third coding section configured to code the prediction error signal for each of the regions to obtain plural coded prediction error signals; and
a code string generating section configured to output a code string divided into at least first and second layers, the first layer including a synchronization code, the mode information for each of the plurality of regions and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
10. A video coding apparatus, comprising:
a prediction signal generating section configured to select one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, and to selectively generate an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
a first coding section configured to code mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, for obtaining plural coded coding mode information, and to code the motion vector obtained for each of the regions by said prediction signal generating section in the interframe predictive coding method for obtaining plural coded motion vectors;
a prediction error generating section configured to generate a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating section and the input video signal;
a second coding section configured to code the prediction error signal for each of the regions for obtaining plural coded prediction error information; and
a code string generating section configured to output a code string divided into at least first and second layers, the first layer including a synchronization code and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
11. A video coding method, comprising:
selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions and selectively generating an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information;
coding the motion vector obtained for each of the regions by said prediction signal generating step in the interframe predictive coding method to obtain plural coded motion vectors;
generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating step and the input video signal;
coding the prediction error signal for each of the regions to obtain plural coded prediction error information; and
outputting a code string divided into at least first and second layers, the first layer including a synchronization code, the mode information for each of the plurality of regions and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
12. A video coding method, comprising:
selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions and selectively generating an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information;
coding the motion vector obtained for each of the regions by said prediction signal generating step in the interframe predictive coding method to obtain plural coded motion vectors;
generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating step and the input video signal;
coding the prediction error signal for each of the regions to obtain plural coded prediction error signals; and
outputting a code string divided into at least first and second layers, the first layer including a synchronization code and the mode information for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
13. A video coding method, comprising:
selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions and selectively generating an intraframe prediction signal with the intraframe predictive coding method and an interframe prediction signal with the interframe predictive coding method;
coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information;
generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating step and the input video signal;
coding the prediction error signal for each of the regions to obtain plural coded prediction error signals; and
outputting a code string divided into at least first and second layers, the first layer including a synchronization code and the mode information for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
14. A video coding method, comprising:
selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions and selectively generating an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information;
coding the motion vector obtained for each of the regions by said prediction signal generating step in the interframe predictive coding method to obtain plural coded motion vectors;
generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating step and the input video signal;
coding the prediction error signal for each of the regions to obtain plural coded prediction error signals; and
outputting a code string divided into at least first and second layers, the first layer including a synchronization code, the mode information for each of the plurality of regions and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
15. A video coding method, comprising:
selecting one of an intraframe predictive coding method and an interframe predictive coding method for coding an input video signal divided into a plurality of regions, to selectively generate an intraframe prediction signal with the intraframe predictive coding method and a set of an interframe prediction signal and a motion vector with the interframe predictive coding method;
coding coding mode information for each of the regions, the mode information indicating one of the intraframe and interframe predictive coding methods, to obtain plural coded coding mode information;
coding the motion vector obtained for each of the regions by said prediction signal generating step in the interframe predictive coding method to obtain plural coded motion vectors;
generating a prediction error signal for each of the regions based on one of the intraframe prediction signal and the interframe prediction signal generated by the prediction signal generating step and the input video signal;
coding the prediction error signal for each of the regions to obtain plural coded prediction error information; and
outputting a code string divided into at least first and second layers, the first layer including a synchronization code and the motion vector for each of the plurality of regions, and the second layer including the predictive error information for the plurality of regions.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a video coding apparatus wherein a video signal is compression-coded at high efficiency and a video decoding apparatus for decoding the compression-coded signal to reconstruct the original video signal, and more particularly to a video coding/decoding apparatus that is immune to errors in the transmission channel or storage medium and assures high-quality transmission and storage.
2. Description of the Related Art
In a system for transmitting and storing a video signal, such as a videophone, a teleconference system, a personal digital assistant, a digital video disk system, or a digital TV broadcasting system, a video signal is compression-coded into code strings with a small amount of information, which are transmitted to a transmission channel and stored in a storage medium. The transmitted and stored code strings are decoded to reconstruct the original video signal.
For video-signal compression-coding techniques applied to such a system, various methods have been developed, including motion compensation, discrete cosine transform (DCT), sub-band coding, pyramid coding, and combinations of these techniques. Furthermore, ISO MPEG1, MPEG2, ITU-T H.261, and ITU-T H.262 have been established as international standards for compression-coding a video signal. Each of these standards uses motion-compensated adaptive predictive discrete cosine transform coding, which is described in detail in, for example, reference 1: Hiroshi Yasuda, "International Standard for Multimedia Coding," Maruzen, June 1991.
When the code strings obtained by coding a video signal as described above are transmitted and stored via a radio transmission channel that is prone to errors, the picture signal reconstructed on the decoding side may be degraded due to errors in transmission and storage. One known measure to deal with such errors is the multi-layered coding system which, under conditions where code strings can be transmitted via a plurality of transmission channels each having a different error probability, divides the code strings into several layers and transmits the upper-layer code strings via the transmission channels with lower error probabilities to reduce the degradation of picture quality due to errors. One proposed layer division method allocates the mode information, the motion compensation information, and the low-frequency components of the prediction error signal to the upper layers, and the high-frequency components of the prediction error signal to the lower layer.
In a conventional layered video coding apparatus, a prediction circuit detects a motion vector between the input video signal and the reference video signal obtained by coding and local decoding and stored in the frame memory, performs the motion compensation prediction of a specific unit region (referred to as a prediction region) on the basis of the motion vector, and produces a motion compensation prediction signal. By subtracting the prediction signal from the input video signal, a prediction error signal is produced. The prediction error signal undergoes discrete cosine transform in blocks of a specific size at a DCT circuit and is converted into DCT coefficient information. The DCT coefficient information is quantized at a quantizer. The quantized DCT coefficient information is branched into two pieces of information; one piece of DCT coefficient information undergoes variable-length coding at a first variable-length coding circuit and the other piece of DCT coefficient information undergoes inverse quantization. The inverse quantized information is subjected to inverse discrete cosine transform. The inverse DCT information is added to the prediction signal to produce a local decoded signal. The local decoded signal is stored in the frame memory as a reference video signal.
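The coding loop described above can be illustrated with a minimal Python sketch. It is not the circuit of the patent: it works on one-dimensional sample rows instead of two-dimensional pictures, and the block size `N`, the search range, the quantization step `qstep`, and all function names are assumptions chosen only for illustration. Variable-length coding of the quantized coefficients is omitted.

```python
import math

N = 8  # assumed block size; the text only says "blocks of a specific size"

def dct(block):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct() above (orthonormal 1-D DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out

def motion_search(block, reference, pos, search_range=4):
    """Return the displacement that minimizes the sum of absolute differences."""
    best_mv, best_sad = 0, None
    for mv in range(-search_range, search_range + 1):
        start = pos + mv
        if start < 0 or start + len(block) > len(reference):
            continue  # candidate falls outside the reference signal
        candidate = reference[start:start + len(block)]
        sad = sum(abs(a - b) for a, b in zip(block, candidate))
        if best_sad is None or sad < best_sad:
            best_mv, best_sad = mv, sad
    return best_mv

def code_block(block, reference, pos, qstep=4):
    """Predict, transform and quantize one block, then locally decode it."""
    mv = motion_search(block, reference, pos)
    prediction = reference[pos + mv:pos + mv + len(block)]
    error = [x - p for x, p in zip(block, prediction)]   # prediction error signal
    levels = [round(c / qstep) for c in dct(error)]      # quantized DCT coefficients
    # Local decoding: inverse quantization, inverse DCT, add the prediction.
    recon_error = idct([level * qstep for level in levels])
    local_decoded = [p + round(e) for p, e in zip(prediction, recon_error)]
    return mv, levels, local_decoded
```

In this sketch `levels` stands for the information passed to the first variable-length coding circuit and `mv` for the information passed to the second, while `local_decoded` is what would be written back to the frame memory as the reference for the next picture.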
The prediction mode and motion vector information related to prediction outputted from the prediction circuit is subjected to variable-length coding at a second variable-length coding circuit. The code strings outputted from each of the first and second variable-length coding circuits are multiplexed at a multiplexer, divided into upper-layer code strings and lower-layer code strings, and outputted to the transmission channels. Specifically, the upper-layer code strings are outputted to transmission channels having a relatively low probability of transmission errors, and the lower-layer code strings are outputted to transmission channels having a relatively high probability of transmission errors.
The multiplexer divides the code strings into the upper-layer code strings and the lower-layer code strings in a manner that allocates the mode information representing the prediction mode at the prediction circuit, the motion vector information (MV), and the low-frequency-band DCT coefficient information in the variable-length-coded DCT coefficient information to the upper-layer code strings and the remaining high-frequency-band DCT coefficient information in the variable-length-coded DCT coefficient information to the lower-layer code strings.
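The conventional layer division can be sketched as follows. This is a simplified illustration, assuming that each prediction region is represented by a dictionary holding its mode, motion vector, and a list of quantized DCT coefficients ordered from low to high frequency. The names `split_layers`, `merge_layers`, and `low_freq_count` are hypothetical and do not appear in the patent.

```python
def split_layers(regions, low_freq_count=3):
    """Allocate mode, MV and low-frequency coefficients to the upper layer,
    and the remaining high-frequency coefficients to the lower layer."""
    upper, lower = [], []
    for region in regions:
        coeffs = region["coeffs"]
        upper.append({
            "mode": region["mode"],
            "mv": region["mv"],
            "low": coeffs[:low_freq_count],
        })
        lower.append({"high": coeffs[low_freq_count:]})
    return upper, lower

def merge_layers(upper, lower, coeff_count):
    """Rebuild the regions on the decoding side.

    If the lower layer was lost (lower is None), the high-frequency
    coefficients are replaced by zeros, so only fine detail is lost."""
    regions = []
    for index, up in enumerate(upper):
        if lower is not None:
            high = lower[index]["high"]
        else:
            high = [0] * (coeff_count - len(up["low"]))
        regions.append({
            "mode": up["mode"],
            "mv": up["mv"],
            "coeffs": up["low"] + high,
        })
    return regions
```

Calling `merge_layers(upper, None, coeff_count)` models the loss of the lower layer: the mode and motion vector of every region survive, which is why this information is placed in the layer sent over the more reliable channel.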
Such a conventional multi-layered video coding apparatus has the following problems. A first problem is that since each prediction region contains only one piece of motion vector information whose error would cause the picture quality to deteriorate seriously, if an error occurs in the motion vector information, the motion information cannot be decoded for the prediction region at all, leading to a serious picture-quality deterioration. To reduce such a picture-quality deterioration, all of the motion vector information (MV) should be allocated to the upper-layer code strings. In general, however, there is a limit to the ratio of the code amount of the code strings in each layer to the total code amount of the code strings in all of the layers. If all of the motion vector information is allocated to the upper-layer code strings, the limit may be exceeded. If, to avoid this, the motion vector information is allocated to the lower-layer code strings, error resilience decreases seriously.
Furthermore, since the individual code words of the two transmitted code strings are made up of the variable-length codes created at the first and second variable-length coding circuits, the variable-length codes may be out of synchronization due to errors in decoding. With the conventional video coding apparatus, however, multiplexing is effected in such a manner that important information related to prediction including the mode information and motion vector information, whose errors would lead to a serious deterioration of the decoded picture, is mingled with DCT coefficient information including the prediction error signal, whose errors would not cause a serious deterioration. Thus, when synchronization failure has occurred during the decoding of the code words containing the unimportant information, this may introduce errors into the code words containing the important information, causing a serious deterioration of the reconstructed picture. Should this happen, synchronization cannot be recovered until a synchronization code appears. Consequently, all of the pieces of information on the decoded pictures obtained until then have become erroneous, raising the problem that a serious deterioration develops in a wide range of the picture.
Furthermore, many conventional video coding systems use the technique for calculating the difference between adjacent motion vectors and subjecting the difference to variable-length coding in order to increase the coding efficiency. Since variable length coding is used, even an error in part of a code string will cause synchronization failure in variable length coding, which will permit the error to have an adverse effect on all of the subsequent code strings, bringing about a serious deterioration of quality of reconstructed video signal. Since the difference between adjacent motion vectors is coded, if an error occurs in one motion vector, the error will affect all of the pieces of the motion vector information obtained by computing the difference between the erroneous motion vector and each of the other motion vectors and coding the difference, with the result that the quality of reconstructed video signal will degrade considerably.
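The propagation of a single error through differentially coded motion vectors can be illustrated with a short sketch. It assumes, for illustration only, that each vector is coded as the difference from the immediately preceding vector and that the channel error alters one difference value. The function names are hypothetical, and the loss of variable-length code synchronization is not modeled.

```python
def encode_differences(vectors):
    """Code each motion vector as the difference from the previous one."""
    prev = (0, 0)
    diffs = []
    for vx, vy in vectors:
        diffs.append((vx - prev[0], vy - prev[1]))
        prev = (vx, vy)
    return diffs


def decode_differences(diffs):
    """Rebuild motion vectors by accumulating the received differences."""
    prev = (0, 0)
    vectors = []
    for dx, dy in diffs:
        prev = (prev[0] + dx, prev[1] + dy)
        vectors.append(prev)
    return vectors


vectors = [(2, 1), (3, 1), (3, 2), (4, 2)]
diffs = encode_differences(vectors)

# Simulate a channel error in the second difference only.
corrupted = list(diffs)
corrupted[1] = (corrupted[1][0] + 5, corrupted[1][1])

decoded = decode_differences(corrupted)
# Positions of the motion vectors that no longer match the originals.
wrong = [i for i, (a, b) in enumerate(zip(vectors, decoded)) if a != b]
```

Although only one difference is corrupted, every vector from that position onward is reconstructed incorrectly, because each one is accumulated from the erroneous value.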
Furthermore, when there is a limit to the amount of codes that can be transferred over a transmission channel with a low error rate, part of the motion vector information must be coded in a lower layer with a high error rate, bringing about a substantial deterioration of picture quality. When a picture to be coded contains large motion, the amount of codes in the motion vector information is very large. When the coding rate is relatively low, the motion vector information alone may account for more than half of the total amount of codes. This increases the proportion of the motion vector information that must be coded in a lower layer, so that errors are more likely to enter the motion vector information and a serious deterioration of picture quality becomes more liable to develop.
On the other hand, many conventional video coding systems including the international standard systems use block matching motion compensation that divides the input motion picture into square blocks (referred to as motion compensation blocks) and performs motion compensation by representing the motion of each of these blocks by a motion vector. With the block matching motion compensation, when a motion compensation block contains regions with different motions, the vector to be obtained is the average of the motions in the respective regions, so that each region cannot be predicted with high accuracy, causing the problem that the quality of reconstructed video signal may deteriorate at the boundaries or the edges of the regions. When the coding rate is low, a motion compensation block must be made larger relative to the size of the picture, making degradation of picture quality from block matching more serious.
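Block matching as described above can be sketched with an exhaustive search that minimizes the sum of absolute differences (SAD). The matching criterion, the search range, and the names `sad` and `block_match` are assumptions made for illustration. Frames are plain lists of rows of pixel values.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and its displaced reference."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def block_match(cur, ref, bx, by, size, search):
    """Return the displacement (dx, dy) with the smallest SAD for one block."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that would read outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Synthetic 8x8 reference frame and a current frame whose content is displaced.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
       for y in range(8)]
mv, cost = block_match(cur, ref, bx=2, by=2, size=2, search=2)
```

A single vector is returned per block, so a block that straddles two differently moving regions can only be given one compromise displacement, which is the limitation discussed in the text.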
To overcome the problem with the block matching motion compensation, a segmentation based motion compensation scheme has been studied which divides the motion compensation blocks along the boundary of the object and performs motion compensation using a different motion vector for each region. The segmentation based motion compensation scheme requires an additional piece of information (region shape information) to indicate how the regions have been divided. Although the motion compensation efficiency is improved more as the region shape is represented more accurately, the volume of the region shape information increases accordingly. Therefore, the point of improvements in the coding efficiency is how efficiently the region shape is represented. When the coding rate is low, the ratio of the side information including the motion vector information and the region shape information gets larger, making the problem more significant.
Schemes for coding the region shape information include a method of chain-coding the region shape information, a method of approximating the region shape using several division patterns, and a method of approximating the region shape through interpolation by expressing the shape in approximate blocks. With any method, however, it is difficult to represent the shape of a region with a high accuracy using a small amount of codes, so that segmentation based compensation coding does not necessarily improve the coding efficiency remarkably. Furthermore, a method has been studied which estimates the region shape information from the decoded picture of an already coded frame at both of the coding unit and the decoding unit and consequently requires no independent region shape information. With this method, however, the amount of processing at the decoding unit increases significantly, and the decoded reconstructed picture contains coding distortion, so that it is difficult to effect region division with a high accuracy and better results are not necessarily obtained.
As described above, with the conventional video coding apparatus, since only one piece of information related to prediction, such as motion vector information whose error would degrade the quality of the decoded picture seriously, is coded for each prediction region, resistance to error is low.
To increase resistance to error, pieces of information on all of the predictions must be transferred via transmission channels having low error probabilities. Since there is a limit to the ratio of the code amount of code strings in each layer to the total code amount of code strings in all of the layers, the code strings must be transferred over transmission channels having different error probabilities, thus impairing the feature of multi-layered coding to alleviate the deterioration of picture quality due to errors.
Furthermore, with the conventional video coding apparatus, since the relatively important information including information related to prediction and the relatively unimportant information are mingled in code strings, an error occurring in the unimportant information affects the important information, resulting in a serious deterioration of picture quality.
As described above, with the conventional video coding/decoding apparatus using variable length coding to code the motion vector information, even if a measure to cope with errors, such as multi-layered coding, is taken, an error in only part of the code words in the motion vector information spreads to the subsequent code words, so that the error has an adverse effect on the entire screen. Since all of the pieces of the motion vector information cannot be coded in the upper layers, many errors occur in pieces of the motion vector information, making a significant deterioration of picture quality liable to develop in the decoded picture.
Additionally, with the conventional video coding/decoding apparatus using block matching motion compensation, when a motion compensation block contains regions with different motions, the motion compensation efficiency decreases, causing the quality of reconstructed video signal to deteriorate. In addition, the amount of codes in the region shape information is large, making the coding efficiency lower.
Furthermore, with the conventional video coding/decoding apparatus using segmentation based compensation, the amount of codes in the region shape information is large, thus decreasing the coding efficiency.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a video coding/decoding apparatus with high error resilience.
Another object of the present invention is to provide a video coding/decoding apparatus having a high coding efficiency as well as a high error resilience.
According to a first aspect of the present invention, there is provided a video coding apparatus comprising: a prediction section for predicting each of a plurality of first regions and each of a plurality of second regions smaller than the first regions at respective prediction levels ranging from a first prediction level at which motion prediction is effected with a first accuracy and to a second prediction level at which motion prediction is effected with a second accuracy higher than the first accuracy, for generating prediction signals corresponding to the first and second prediction levels, the first and second regions being obtained by dividing an input video signal at the respective levels; a prediction error generating section for generating a prediction error signal on the basis of the prediction signals obtained by the prediction section and the input video signal; a first coding section for coding the prediction error signal generated by the prediction error generating section to output first coded information; a second coding section for coding information on the prediction which is carried out at each of the prediction levels by the prediction section to output second coded information; and a code string output section for outputting the coded information obtained by the first and second coding section in the form of hierarchical code strings.
According to a second aspect of the present invention, there is provided a video decoding apparatus comprising: a first decoding section for obtaining a prediction error signal by decoding the upper-layer coded information and lower-layer coded information obtained in a high-level prediction with a low accuracy and in a low-level prediction with a high accuracy, respectively; a second decoding section for obtaining information on prediction by decoding the upper-layer coded information and the lower-layer coded information; a deciding section for deciding whether or not the prediction error signals decoded at the first and second decoding section and the information on prediction have been decoded correctly; and picture generating section for reconstructing the video signal on the basis of the information decoded at the second decoding section, when the deciding section has decided that the first decoding section has not decoded the prediction error signal correctly.
According to a third aspect of the present invention, there is provided a video coding apparatus comprising: a prediction section for obtaining a motion vector from an input video signal and a reference video signal and generating a prediction signal on the basis of the motion vector; a prediction error generating section for generating a prediction error signal on the basis of the prediction signal from the prediction section and the input video signal; a first coding section for coding the prediction error signal; a second coding section for generating and coding the vector-quantized motion information corresponding to the motion vector; and a reference picture generating section for generating the reference video signal on the basis of result obtained by local-decoding the prediction error signal.
According to a fourth aspect of the present invention, there is provided a video decoding apparatus comprising: a first decoding section for decoding the prediction error information from a code string containing index information indicating a vector-quantized motion vector and coded prediction error information; a second decoding section for decoding the index information from the code string and obtaining a vector-quantized motion vector; a prediction section for generating a prediction signal by performing the motion compensation prediction of the preceding reconstructed video signal using the motion vectors obtained from the second decoding section; and a reconstructing section for reconstructing a video signal from the prediction signal and the prediction error signal.
According to a fifth aspect of the present invention, there is provided a video coding apparatus comprising: a prediction section for predicting each of a plurality of first regions and each of a plurality of second regions smaller than the first regions at respective prediction levels ranging from a first prediction level at which motion prediction is effected with a first accuracy and to a second prediction level at which motion prediction is effected with a second accuracy higher than the first accuracy, for generating prediction signals corresponding to the first and second prediction levels, the first and second regions being obtained by dividing an input video signal at the respective levels; a prediction error generating section for generating a prediction error signal on the basis of the prediction signals obtained by the prediction section and the input video signal; a first coding section for coding the prediction error signal generated by the prediction error generating section to output first coded information; a second coding section for coding information on the prediction which is carried out at each of the prediction levels by the prediction section to output second coded information; and a code string output section for outputting the coded information obtained by the first and second coding section in the form of hierarchical code strings, and wherein the prediction section obtains motion vectors for the first and second regions from the input video signal and reference video signal and on the basis of the motion vectors and generates a prediction signal corresponding to each of the prediction levels from the first prediction level to the second prediction level, and the second coding section generates and codes the vector-quantized motion information corresponding to the motion vector as information on the prediction.
With the video coding apparatus according to the first aspect of the invention, because the input video signal is predicted in a hierarchical fashion over as many regions as possible and the pieces of information on the predictions obtained at the individual levels containing not only the lowest level but also higher levels are coded, even if an error has occurred in the information on the prediction at a specific level, the video decoding apparatus can produce a prediction signal from the information on the predictions, provided that there is no error in the information on the predictions at the higher levels. This helps reduce the deterioration of picture quality of the decoded image when an error has occurred in information on the prediction.
With the video decoding apparatus according to the second aspect of the invention, pieces of information on the predictions at the individual levels of hierarchical predictions are decoded. When the prediction error signal is not decoded correctly because an error has occurred in the prediction at a specific level, the video signal is decoded by using the information on the prediction at higher levels to produce a prediction signal, whereby the deterioration of picture quality of the decoded image can be reduced.
With the video decoding apparatus according to the third aspect of the invention, the probability that erroneous information will be judged to be free from error is lowered by checking whether the decoded information is information that can actually occur in coding motion pictures, so that the deterioration of picture quality of the decoded picture due to use of erroneous information in decoding the motion picture can be suppressed.
With the video coding apparatus and video decoding apparatus according to the fifth and sixth aspects of the invention, motion compensation prediction is performed using vector-quantized motion information. In vector quantization, a series of sampled values is quantized into a single code vector, enabling the redundancy in the sampled values to be used directly for information compression. Therefore, by vector-quantizing the motion information and representing the motion information using the code book indexes specifying code vectors in a code book, it is possible to effect motion compensation prediction efficiently while suppressing the amount of codes in the motion information.
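Representing motion information by a code book index can be sketched as below. The code book contents and the nearest-vector selection rule are assumptions chosen for brevity. An encoder could equally select the code vector that minimizes the prediction error. The names `CODE_BOOK`, `quantize`, and `dequantize` are hypothetical.

```python
# Hypothetical code book of candidate motion vectors (x, y).
CODE_BOOK = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (2, 2), (-2, 2), (4, 0)]


def quantize(mv, code_book):
    """Return the index of the code vector nearest to mv (squared distance)."""
    return min(
        range(len(code_book)),
        key=lambda i: (code_book[i][0] - mv[0]) ** 2 + (code_book[i][1] - mv[1]) ** 2,
    )


def dequantize(index, code_book):
    """Look up the code vector that an index refers to."""
    return code_book[index]


index = quantize((2, 1), CODE_BOOK)
approx_mv = dequantize(index, CODE_BOOK)
```

Only the index needs to be transmitted. With eight code vectors, three bits identify a motion vector regardless of its magnitude.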
Furthermore, since the vector-quantized motion information has a smaller bias in the volume of generated information than the directly coded motion vector information, even fixed-length coding enables the motion information to be coded at a relatively high coding efficiency. With the video decoding apparatus, even when an error has got mixed in the transmission channel, use of fixed-length coding prevents the error from spreading over a wide range due to synchronization failure as found in variable-length coding, with the result that the quality of the reconstructed picture at the time when an error has occurred is improved remarkably. Therefore, with the first video coding/decoding apparatus, the error resilience is improved while a high coding efficiency is maintained.
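The way fixed-length coding confines a transmission error can be shown with a small sketch. The 3-bit word length, the bit-string representation, and the helper names are assumptions for illustration only.

```python
def pack_indexes(indexes, bits):
    """Concatenate code book indexes as fixed-length binary words."""
    return "".join(format(i, "0{}b".format(bits)) for i in indexes)


def unpack_indexes(bitstring, bits):
    """Split the bit string into fixed-length words and convert them to indexes."""
    return [int(bitstring[p:p + bits], 2) for p in range(0, len(bitstring), bits)]


def flip_bit(bitstring, pos):
    """Simulate a single-bit channel error at position pos."""
    flipped = "1" if bitstring[pos] == "0" else "0"
    return bitstring[:pos] + flipped + bitstring[pos + 1:]


indexes = [5, 0, 3, 7, 2]
stream = pack_indexes(indexes, bits=3)
received = unpack_indexes(flip_bit(stream, 4), bits=3)
# Positions of the indexes that were changed by the bit error.
damaged = [k for k, (a, b) in enumerate(zip(indexes, received)) if a != b]
```

Because the word boundaries are known in advance, the single bit error changes only the index that contains it, and the following indexes are still decoded correctly.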
Furthermore, candidates for the motion vectors corresponding to code vectors stored in a code book are arranged in such a manner that candidates for motion vectors with smaller movement are arranged with higher pixel accuracy. Since motion vectors with smaller movement usually appear more frequently, such an arrangement with higher pixel accuracy makes smaller a prediction error signal for motion compensation prediction, so that the coding efficiency is improved. On the other hand, since motion vectors with larger movement appear less frequently, an arrangement with high pixel accuracy does not contribute much to a reduction in the prediction error signal. By decreasing the pixel accuracy and reducing the number of candidates for motion vectors to be searched for, the coding efficiency can be improved more.
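One possible way to arrange candidates with finer pixel accuracy for smaller movement is sketched below. The ring boundaries and step sizes (half-pixel up to 1, integer-pixel up to 4, two-pixel up to 8) are hypothetical values chosen for illustration, and `candidate_vectors` is not a name from the specification.

```python
from fractions import Fraction


def candidate_vectors(rings):
    """Build motion vector candidates from (max_abs, step) rings, innermost first.

    Each ring contributes the grid points with the given step whose larger
    component magnitude lies beyond the previous ring and within max_abs.
    """
    candidates = []
    inner = Fraction(-1)  # nothing is excluded from the innermost ring
    for max_abs, step in rings:
        max_abs, step = Fraction(max_abs), Fraction(step)
        n = int(max_abs / step)
        for iy in range(-n, n + 1):
            for ix in range(-n, n + 1):
                vx, vy = ix * step, iy * step
                if max(abs(vx), abs(vy)) > inner:
                    candidates.append((vx, vy))
        inner = max_abs
    return candidates


rings = [(1, Fraction(1, 2)), (4, 1), (8, 2)]
cands = candidate_vectors(rings)
```

With these assumed rings the search covers 153 candidates, whereas a uniform half-pixel grid over the same range of plus or minus 8 pixels would contain 33 x 33 = 1089.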
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate presently preferred embodiments of the invention and, together with the general description given above and the detailed description of the preferred embodiments given below, serve to explain the principles of the invention.
FIG. 1 is a block diagram of a video coding apparatus according to a first embodiment of the present invention;
FIG. 2 illustrates the motion compensation regions in the video coding apparatus of FIG. 1 and the motion vectors corresponding to the regions;
FIGS. 3A and 3B show an upper-layer code string and a lower-layer code string outputted from the video coding apparatus of FIG. 1;
FIG. 4 is a block diagram of a video decoding apparatus corresponding to the video coding apparatus of the first embodiment;
FIG. 5 is a block diagram of a video coding apparatus according to a second embodiment of the present invention;
FIG. 6 is a block diagram of a motion compensation adaptive prediction circuit in the video coding apparatus of the second embodiment;
FIG. 7 is a block diagram of a video decoding apparatus corresponding to the video coding apparatus of the second embodiment;
FIG. 8 is a block diagram of a motion compensation adaptive prediction circuit in the video decoding apparatus of FIG. 7;
FIG. 9 is a block diagram of a video coding apparatus according to a third embodiment of the present invention;
FIG. 10 is a block diagram of a motion compensation adaptive prediction circuit in the video coding apparatus of the third embodiment;
FIGS. 11A and 11B are diagrams to help explain the operation of the motion compensation adaptive prediction circuit of FIG. 10;
FIG. 12 is a block diagram of a video decoding apparatus corresponding to the video coding apparatus of the third embodiment of the invention;
FIG. 13 is a block diagram of a motion compensation adaptive prediction circuit in the video decoding apparatus of FIG. 12;
FIG. 14 is a block diagram of a motion compensation adaptive prediction circuit in a video coding apparatus according to a fourth embodiment of the present invention;
FIG. 15 is a diagram to help explain the operation of the motion compensation adaptive prediction circuit of FIG. 14;
FIG. 16 is another block diagram of the motion compensation adaptive prediction circuit in the video coding apparatus of the fourth embodiment;
FIG. 17 illustrates an example of motion vector code vectors stored in a motion vector code book used in the video coding apparatus and video decoding apparatus in the second, third, or fourth embodiment;
FIG. 18 illustrates another example of motion vector code vectors stored in a motion vector code book used in the video coding apparatus and video decoding apparatus in the second, third, or fourth embodiment;
FIG. 19 is another block diagram of the motion compensation adaptive prediction circuit in the video decoding apparatus corresponding to the video coding apparatus according to the second embodiment of the invention;
FIG. 20 is a diagram to help explain the operation of the motion compensation adaptive prediction circuit of FIG. 19;
FIG. 21 is another block diagram of the motion compensation adaptive prediction circuit in the video decoding apparatus corresponding to the video coding apparatus according to the second embodiment of the invention;
FIG. 22 is a block diagram of a video coding apparatus according to a fifth embodiment of the present invention;
FIG. 23 shows an example of candidates for a motion vector to be searched for at the motion compensation adaptive prediction circuit of FIG. 22;
FIG. 24 is a block diagram of a video decoding apparatus corresponding to the video coding apparatus of the fifth embodiment;
FIG. 25 is a block diagram of a video coding apparatus according to a sixth embodiment of the present invention;
FIG. 26 is a diagram to help explain the operation of the motion compensation adaptive prediction circuit in the video coding apparatus of FIG. 25;
FIG. 27 is a block diagram of a video decoding apparatus corresponding to the video coding apparatus of the sixth embodiment;
FIG. 28 is a block diagram of the motion compensation adaptive prediction circuit in a video coding apparatus according to a seventh embodiment of the present invention;
FIG. 29 is a diagram to help explain the operation of the motion compensation adaptive prediction circuit of FIG. 28;
FIG. 30 is another block diagram of the motion compensation adaptive prediction circuit in the video decoding apparatus corresponding to the video coding apparatus of the seventh embodiment;
FIGS. 31A to 31D are diagrams to help explain the region shape in segmentation based compensation and a method of searching for a motion vector;
FIG. 32 is a flowchart of the procedure for creating a code book for segmentation based compensation using vector quantization;
FIGS. 33A and 33B are diagrams to help explain a method of predicting small-region motion vectors from large-region representative motion vectors in the video coding apparatus and video decoding apparatus of the seventh embodiment;
FIG. 34 is a diagram to help explain a method of switching between vector quantization code books in the video coding apparatus and video decoding apparatus of the seventh embodiment;
FIG. 35 is a block diagram of a motion compensation adaptive prediction circuit in a video coding apparatus according to an eighth embodiment of the present invention;
FIG. 36 is a diagram to help explain a method of searching for a large-region representative motion vector in the motion compensation prediction circuit of FIG. 35;
FIG. 37 is a diagram to help explain a method of searching for a small-region motion vector without region division in the motion compensation prediction circuit of FIG. 35;
FIG. 38 is a block diagram of a system to which a video coding/decoding apparatus according to the present invention is applied;
FIG. 39 is a schematic block diagram of a video coding apparatus used in the system of FIG. 38;
FIG. 40 is a schematic block diagram of a video decoding apparatus used in the system of FIG. 38;
FIG. 41 is a schematic block diagram of a recording apparatus to which a video coding system of the present invention is applied; and
FIG. 42 is a block diagram of a reconstructing apparatus that reconstructs the coded data recorded on a recording medium by the recording apparatus of FIG. 41.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Hereinafter, referring to the accompanying drawings, embodiments of the present invention will be explained.
A video coding apparatus shown in FIG. 1 comprises a local decoding and prediction circuit section 1 that produces prediction signals in the range from the upper to the lower layers according to the motion in an input video signal, a prediction error generator section 2 that generates a prediction error signal, a first coding circuit section 3 that encodes a prediction error signal, a second coding circuit section 4 that encodes information on prediction, and a multiplexer that multiplexes the code strings obtained from the first and second coding circuit sections.
The input video signal 11 is first used in prediction at a prediction circuit 12 in the local decoding and prediction circuit section 1. Specifically, the prediction circuit 12 senses the motion vector between the input video signal 11 and the reference video signal obtained by coding/decoding and stored in a frame memory 13 and on the basis of the motion vector, produces a motion compensation prediction signal. The prediction circuit 12 can operate in two modes, a motion compensation prediction mode (interframe prediction mode) and an intra-frame prediction mode in which the input video signal 11 is coded directly, and selects the mode most suitable for coding and outputs a prediction signal 14 corresponding to the mode. Namely, the prediction circuit 12 outputs a motion compensation prediction signal in the motion compensation prediction mode and "0" as a prediction signal in the intra-frame prediction mode.
A subtracter 15 produces a prediction error signal 16 by subtracting the prediction signal 14 from the input video signal 11. A discrete cosine transform (DCT) circuit 17 subjects the prediction error signal 16 to discrete cosine transform in blocks of a specific size and produces DCT coefficient information 18. The DCT coefficient information 18 is quantized at a quantizer 19. Since in the intra-frame prediction mode, the prediction signal 14 is "0", the subtracter 15 outputs the input video signal 11 directly as the prediction error signal 16.
The DCT coefficient information 20 quantized at the quantizer 19 is branched into two pieces of information; one piece of information is subjected to variable-length coding at a first variable-length encoder 21 and the other piece of information is subjected to inverse quantization at an inverse quantizer 22. The output of the inverse quantizer 22 undergoes inverse discrete cosine transform at an inverse discrete cosine transform (inverse DCT) circuit 23. That is, the inverse quantizer 22 and inverse DCT circuit 23 carry out the processings opposite to those in the quantizer 19 and DCT circuit 17, respectively, and the inverse DCT circuit 23 produces a signal approximate to the prediction error signal 16 at the output. The output of the inverse DCT circuit 23 is added to the prediction signal 14 from the prediction circuit 12 at an adder 24, which then produces a local decoded signal 25. The local decoded signal 25 is stored in the frame memory 13 as a reference video signal.
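The signal path through the subtracter 15, DCT circuit 17, quantizer 19, inverse quantizer 22, inverse DCT circuit 23, and adder 24 can be sketched as follows. For brevity the sketch uses a one-dimensional orthonormal DCT on a row of eight samples and a uniform quantizer with a single step size. Both are simplifying assumptions, as are the function names.

```python
import math


def dct(block):
    """Orthonormal one-dimensional DCT-II."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            block[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def encode_block(inp, prediction, step):
    """Return quantized levels and the local decoded signal for one block."""
    error = [a - b for a, b in zip(inp, prediction)]          # subtracter 15
    levels = [round(c / step) for c in dct(error)]            # DCT 17 and quantizer 19
    recon_error = idct([level * step for level in levels])    # inverse quantizer 22, inverse DCT 23
    local_decoded = [p + e for p, e in zip(prediction, recon_error)]  # adder 24
    return levels, local_decoded


inp = [52, 55, 61, 66, 70, 61, 64, 73]
prediction = [50, 54, 60, 64, 68, 62, 63, 70]
levels, local_decoded = encode_block(inp, prediction, step=1.0)
# Intra-frame mode: the prediction signal is "0", so the input itself is coded.
intra_levels, intra_decoded = encode_block(inp, [0] * 8, step=1.0)
```

The local decoded signal differs from the input only by the quantization error, which is why the encoder stores it, rather than the input, as the reference that the decoder will also have.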
The prediction circuit 12, as described later, outputs the large-region prediction mode/motion vector information 26 and the small-region prediction mode/motion vector information 27 as information on prediction, which are subjected to variable-length coding at a variable-length encoder 28 and a variable-length encoder 29, respectively. The code strings outputted from the variable-length encoders 21, 28, and 29 are multiplexed at a multiplexer 30 and divided into upper-layer code strings 31 and lower-layer code strings 32, which are outputted to a transmission channel/storage medium (not shown).
Under the conditions where code strings can be transmitted/stored via transmission channels/storage mediums having different error probabilities, the upper-layer code strings 31 are transmitted and stored via transmission channels/storage mediums having lower error probabilities and the lower-layer code strings 32 are transmitted and stored via transmission channels/storage mediums having higher error probabilities, so that errors are made as unlikely as possible to occur in the upper-layer code strings. When the code strings 31, 32 are subjected to error correcting/detecting coding, more powerful error correcting/detecting coding is applied to the upper-layer code strings 31 so that they have a lower error probability than the lower-layer code strings 32.
Next, the configuration and operation of the prediction circuit 12 will be explained in detail with reference to FIG. 2. The prediction circuit 12 divides the input video signal 11 into as many regions as possible in a hierarchical fashion in the range from the highest to lowest levels, performs the motion compensation prediction of the input video signal for each region divided at each level, and thereby produces a prediction signal. In the example of FIG. 2, the prediction circuit 12 performs region division and prediction at two levels. Specifically, at the first level, the prediction circuit 12 broadly divides the input video signal 11 as shown by the solid-line regions in FIG. 2 (referred to as large regions), performs motion compensation prediction of these large regions with low pixel accuracy, and then at the second level, further divides the large regions into broken-line regions (referred to as small regions) if necessary, and effects motion compensation prediction of these small regions with high pixel accuracy.
Then, variable-length encoders 28 and 29 encode not only information on the prediction of the large regions outputted from the prediction circuit 12 but also information on the prediction of the small regions in connection with the large regions. In this way, if information on the prediction of the small regions should be lost through an error in the transmission channel/storage medium, the decoding apparatus can still perform prediction with low accuracy provided that information on the prediction of the large regions is decoded correctly, thereby preventing a serious deterioration of picture quality of the decoded image.
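The fallback behavior at the decoding side can be sketched in a few lines. The flag `small_info_ok` and the function name are hypothetical. How an error is detected is outside the scope of this sketch.

```python
def vectors_for_large_region(large_mv, small_mvs, small_info_ok):
    """Return one motion vector per small region of a large region.

    small_mvs holds the small-region vectors as decoded; small_info_ok is a
    hypothetical flag saying whether that information was decoded without a
    detected error.
    """
    if small_info_ok:
        return list(small_mvs)
    # Fall back to the coarser large-region vector for every small region.
    return [large_mv for _ in small_mvs]


good = vectors_for_large_region((2, 1), [(2.5, 1.0), (1.5, 1.0)], small_info_ok=True)
fallback = vectors_for_large_region((2, 1), [(9.0, -7.5), (0.0, 0.0)], small_info_ok=False)
```

When the small-region information is unusable, prediction continues with the large-region vector, which is less accurate but still tracks the overall motion.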
The information on the prediction outputted from the prediction circuit 12 consists of information indicating the prediction mode and information indicating the motion vector. Information indicating the prediction mode of large regions and information indicating motion vectors (shown by the solid-line arrows in FIG. 2), that is, large-region prediction mode/motion vector information 26, are subjected to variable-length coding at the variable-length encoder 28. In this case, for the motion vector information, the difference from the already coded adjacent large-region motion vector information may be subjected to variable-length coding or the motion vector information may undergo fixed-length coding without computing the difference. Additionally, the motion vector information may be subjected to fixed-length coding in some regions and to variable-length coding in the remaining regions.
On the other hand, information indicating the prediction mode of small regions and information indicating motion vectors (shown by the broken-line arrows in FIG. 2), that is, small-region prediction mode/motion vector information 27 undergoes variable-length coding at the variable-length encoder 29. In this case, the difference between the motion vector information and the large-region motion vector information may be calculated for each small region and subjected to variable-length coding. The differences may be coded together for each large region by block encoding and vector quantization. When the difference value is subjected to variable-length coding, by expressing a large-region motion vector as a reversible function (e.g., the average value) on the basis of small-region motion vectors, any one of the small-region motion vectors can be computed on the basis of the large-region motion vector and the other small-region motion vectors, thereby eliminating the necessity of coding.
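The reversible-function idea can be illustrated with a minimal Python sketch. It assumes the large-region motion vector is stored as the exact average of its small-region motion vectors (kept here as fractions so that no rounding occurs); the function names and the sample vectors are hypothetical and do not come from the specification.

```python
from fractions import Fraction

def large_from_small(small_mvs):
    """Large-region MV defined as the exact average of the small-region MVs (assumption)."""
    n = len(small_mvs)
    return (Fraction(sum(v[0] for v in small_mvs), n),
            Fraction(sum(v[1] for v in small_mvs), n))

def recover_missing(large_mv, other_small_mvs):
    """Recover the one small-region MV that was not coded."""
    n = len(other_small_mvs) + 1  # total number of small regions
    x = large_mv[0] * n - sum(v[0] for v in other_small_mvs)
    y = large_mv[1] * n - sum(v[1] for v in other_small_mvs)
    return (int(x), int(y))

# Hypothetical vectors for four small regions inside one large region.
small = [(3, -1), (4, 0), (2, -2), (5, 1)]
large = large_from_small(small)
recovered = recover_missing(large, small[:3])
```

Because the average is invertible once all but one of its inputs are known, only three of the four small-region vectors in this example would need to be coded.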
FIGS. 3A and 3B show the structure of an upper-layer code string 31 and a lower-layer code string 32. A uniquely decodable synchronization code is put at the head of each coding frame or each region unit in the upper-layer code string 31 shown in FIG. 3A. PSC represents a synchronization code for a frame unit. The synchronization code PSC is followed by a picture header indicating coding information on the frame. The picture header includes a frame number indicating the temporal position of the frame, information indicating the prediction mode of the frame (frame prediction mode), and information indicating the length of each code string in the upper and lower layers of the frame (upper layer code amount and lower layer code amount).
Furthermore, the sizes of the large and small regions and information indicating the pixel accuracy of motion compensation (large-region MC accuracy and small-region MC accuracy) may be added to the picture header as shown in FIG. 3A, whereby the amount of codes in the motion vector information can be controlled by changing the motion compensation accuracy frame by frame. As a result, even when the ratio of the amount of codes in the upper-layer code strings 31 to that in the lower-layer code strings 32 is determined on account of the limits of the transmission channel/storage medium, the amount of codes can be allocated according to the situation. Because the total amount of codes in the motion vector information on each frame can be controlled, the optimum motion compensation accuracy can be selected on the basis of the motion compensation accuracy and the amount of codes in the motion vector information, whereby the coding efficiency can be improved.
One of the features is that pieces of coding information on the individual regions are arranged in descending order of importance. Here, information with much importance means information whose error would cause a serious deterioration of the decoded image. Specifically, behind the picture header in the upper-layer code string 31, information indicating the prediction mode (mode information) which has the greatest importance is placed. This mode information consists of the large-region prediction mode and motion vector information and small-region prediction mode information.
Placed behind the code string of the prediction mode information is the DC component of the DCT coefficient information (intra DC) in the region for which the intra-frame prediction mode has been selected, that is, the code string of the DC component among the code strings obtained by coding by means of the variable-length encoder 21 the DCT coefficient information obtained by passing the prediction error signal 16 through the DCT circuit 17 and quantizer 19. Furthermore, put in a region for which the motion compensation prediction mode has been selected is the large-region motion vector information (large-region MV), that is, the code string of the motion vector information among the code strings obtained by coding by means of the variable-length encoder 28 the large-region prediction mode/motion vector information 26 outputted from the prediction circuit 12.
For example, where the prediction mode information is inserted into the code string, first the large-region mode information may be inserted into the code string in units of one frame and then the small-region mode information may be inserted therein. Alternatively, large-region mode information and small-region mode information contained in the large-region mode information may be inserted together in the code string for each large region.
On the other hand, placed first in the lower-layer code string 32 shown in FIG. 3B is the small-region motion vector information (small-region MV), that is, the code string of the motion vector among the code strings obtained by coding the small-region prediction mode/motion vector information 27 outputted from the prediction circuit 12 at the variable-length encoder 29. Put behind that string are the high-frequency components of the DCT coefficient information, that is, the code string of the high-frequency components among the code strings obtained by coding by means of the variable-length encoder 21 the DCT coefficient information produced by passing the prediction error signal 16 through the DCT circuit 17 and quantizer 19.
As described above, motion compensation prediction is performed hierarchically and the prediction mode information and the large-region motion vector information are allocated to the upper-layer code string 31 and small-region motion vector information is assigned to the lower-layer code string 32. Therefore, even if the small-region motion vector information contained in the lower-layer code string 32 has been lost due to an error in the transmission channel/storage medium, the video decoding apparatus can perform motion compensation prediction with low accuracy using the large-region motion vector information contained in the upper-layer code string 31, so that the probability that the picture quality of the decoded image will deteriorate seriously can be reduced.
Furthermore, in the embodiment, because the code strings within each of the upper and lower layers are also arranged in descending order of importance, an error occurring in less important information has no adverse effect on more important information, preventing a significant deterioration of picture quality.
Next, an embodiment of a video decoding apparatus according to the present invention will be explained. FIG. 4 is a block diagram of a video decoding apparatus corresponding to the video coding apparatus of FIG. 1.
In FIG. 4, the upper-layer code string 31 and lower-layer code string 32 outputted from the video coding apparatus of FIG. 1 pass through the transmission channel/storage medium and become the upper-layer code string 41 and lower-layer code string 42, which enter a demultiplexer 43, which then separates these code strings into variable-length codes 44 of quantized DCT coefficient information, variable-length codes 45 of large-region prediction mode and motion vector information, and a variable-length code 46 of small-region prediction mode and motion vector information, which are in turn supplied to variable-length decoders 47, 48, and 49, respectively.
The variable-length decoder 47 subjects the variable-length codes 44 to variable-length decoding and produces quantized DCT coefficient information 50. The quantized DCT coefficient information 50 is subjected to inverse quantization at an inverse quantizer 53. The resulting signal undergoes inverse discrete cosine transform at an inverse DCT circuit 54, which produces a prediction error signal 55. An adder 56 adds the prediction error signal 55 to the prediction signal 59 from the prediction circuit 57 and produces a reconstructed video signal 61. The reconstructed video signal 61 is outputted to the outside of the video decoding apparatus and is also stored in a frame memory 58 as a reference video signal.
On the other hand, the variable-length decoders 48 and 49 subject the variable-length codes 45, 46 to variable-length decoding and produce large-region prediction mode and motion vector information 51 and small-region prediction mode and motion vector information 52, respectively. These pieces of information 51 and 52 are inputted to the prediction circuit 57. The prediction circuit 57 predicts the video signal from the reference video signal stored in the frame memory 58, the large-region prediction mode and motion vector information 51, and the small-region prediction mode and motion vector information 52, and produces a prediction signal 59.
An error detecting circuit 60 determines whether there is an error in the upper-layer code string 41 and the lower-layer code string 42 on the basis of the state of the demultiplexer 43 and variable-length decoders 47, 48 and 49 and supplies the determination result to the prediction circuit 57. If the error detecting circuit 60 has sensed that neither the upper-layer code string 41 nor the lower-layer code string 42 has an error, the prediction circuit 57 will output the same prediction signal 59 as the prediction signal 14 in FIG. 1 on the basis of the reference video signal stored in the frame memory 58.
The above process is the process of reconstructing a picture signal in a manner that corresponds to the video coding apparatus of FIG. 1. The processes carried out at the inverse quantizer 53, inverse DCT circuit 54, adder 56, and frame memory 58 are basically the same as those at the inverse quantizer 22, inverse DCT circuit 23, adder 24, and frame memory 13 in FIG. 1. The variable-length decoders 47, 48, 49 and the demultiplexer 43 perform the reverse of the processes effected at the variable-length encoders 21, 28, 29 and multiplexer 30.
If the error detecting circuit 60 has sensed that at least one of the upper-layer and lower-layer code strings has an error, for example, a reconstructed image will be created using the information more important than the information in which an error has been sensed, as follows:
(1) If an error has been sensed in the DCT coefficient information in the block with motion compensation prediction mode, the prediction error signal for the block will be set to 0 and a reconstructed video signal 61 will be created using as a prediction signal 59 the motion compensation prediction signal obtained from the properly decoded mode information, large-region motion vector information, and small-region motion vector information.
(2) If an error has occurred in the small-region motion vector information, a reconstructed video signal 61 will be set to a motion compensation prediction signal created using the large-region motion vector information.
(3) If an error has occurred in the large-region motion vector information, concealment will be performed. If the large-region motion vector information can be predicted from the motion vector information of the surrounding regions or that of the already decoded frame, this predicted motion vector is used. Otherwise, the picture signal of the already decoded frame will be used directly as a reconstructed video signal 61.
(4) If an error has occurred in the AC components of the DCT coefficient information in the block for which the intra-frame prediction mode has been selected, a picture signal for the block will be predicted from the DC component of the DCT coefficient information and the correctly decoded picture signal in the surrounding blocks, or a picture signal for the block will be predicted from the picture signal of the already decoded frame.
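Cases (1) to (4) above can be summarized as a small decision function. The following Python sketch is illustrative only: the error labels, the returned strategy names, and the function name are hypothetical, and the order in which the cases are tested (most important information first) is an assumption consistent with the importance ordering described earlier.

```python
def choose_concealment(errors, intra, neighbour_mv_available=False):
    """Pick a fallback for one block from the set of fields found to be in error.

    errors: set of labels among {"large_mv", "small_mv", "dct"} (hypothetical names).
    intra:  True if the block uses the intra-frame prediction mode.
    """
    if intra:
        # Case (4): AC coefficients lost in an intra block.
        if "dct" in errors:
            return "predict_from_dc_and_neighbours_or_previous_frame"
        return "normal_decoding"
    if "large_mv" in errors:
        # Case (3): try a predicted vector, otherwise reuse the decoded frame.
        if neighbour_mv_available:
            return "mc_with_predicted_large_mv"
        return "copy_previous_frame"
    if "small_mv" in errors:
        # Case (2): fall back to the coarser large-region vector.
        return "mc_with_large_mv"
    if "dct" in errors:
        # Case (1): keep the motion-compensated prediction, drop the residual.
        return "mc_with_zero_residual"
    return "normal_decoding"
```

For instance, `choose_concealment({"small_mv", "dct"}, intra=False)` selects motion compensation with the large-region vector, since the small-region vector is the more important of the two damaged fields.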
When variable-length codes are used, an error may bring about synchronization failure and have an adverse effect on the subsequent codes until re-synchronization is performed by detecting a synchronizing code. Should this happen, the subsequent codes will not be used in decoding. For example, when an error has occurred in the small-region motion vector information in the lower-layer code string 42, the error may spread over the motion vector information on the regions subsequent to that small region and over the DCT coefficient information behind it. In such a case, pieces of information over which the error has spread are not used for decoding. Even when such synchronization failure has taken place, an error occurring in information of little importance will not have an adverse effect on information of great importance because codes are arranged in descending order of importance in a code string, so that a serious deterioration of the reconstructed image can be prevented.
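The effect of ordering by importance can be sketched in a few lines of Python. The field labels below are illustrative names for the items of FIGS. 3A and 3B, not identifiers from the specification, and the model assumes that everything from the first damaged field up to the next synchronization code is discarded.

```python
# Illustrative field order within each layer, most important first.
UPPER_LAYER_ORDER = ["picture_header", "mode_info", "intra_dc_or_large_region_mv"]
LOWER_LAYER_ORDER = ["small_region_mv", "dct_high_frequency"]

def usable_fields(order, first_bad_field=None):
    """Return the fields that can still be decoded after a synchronization failure.

    Fields placed before the first damaged field are unaffected; the damaged
    field and everything behind it are dropped.
    """
    if first_bad_field is None:
        return list(order)
    return list(order[:order.index(first_bad_field)])
```

Under this model an error in the DCT part of the lower layer leaves the small-region motion vectors usable, whereas the reverse ordering would have lost them.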
Concrete methods of sensing an error in the code string 41 or 42 at the error detecting circuit 60 are as follows.
A first method is to use error detecting codes, such as parity check codes or CRC codes. In this case, variable-length codes are subjected to error detecting coding at the multiplexer 30 in the video coding apparatus of FIG. 1 and an error detection process is carried out at the demultiplexer 43 in the video decoding apparatus of FIG. 4. The detecting result is supplied to the error detecting circuit 60.
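As a minimal illustration of the first method, the Python sketch below appends a CRC to a code string and verifies it on reception. The use of the 32-bit CRC from Python's standard `zlib` module, the 4-byte trailer layout, and the function names are assumptions made for the example; the specification does not fix a particular CRC.

```python
import zlib

def protect(payload: bytes) -> bytes:
    """Encoder side: append a 4-byte CRC-32 to the code string."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(frame: bytes) -> bool:
    """Decoder side: recompute the CRC and compare it with the received one."""
    payload, tag = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tag

sent = protect(b"motion vectors")
damaged = bytearray(sent)
damaged[0] ^= 0x01  # flip one bit to simulate a channel error
```

Here `check(sent)` succeeds and `check(bytes(damaged))` fails, and that result is what would be reported to the error detecting circuit 60.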
A second method is such that when a code word that does not exist in a code word table has been detected, the code word is determined to be erroneous. When variable-length codes are used, an error can spread over not only the portion where the error has been detected, but also the code strings before and after the portion. Therefore, the error detection process is performed on all of the code words.
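The second method can be sketched with a toy variable-length code table. The table below is hypothetical and deliberately incomplete, so that some bit patterns (here any run of four zeros) match no code word; a real coder would use its own tables.

```python
# Hypothetical prefix-free code table; "0000" is not the start of any code word.
TABLE = {"1": 0, "01": 1, "001": 2, "0001": 3}

def decode(bits):
    """Decode a string of '0'/'1' characters.

    Returns (symbols, ok). ok is False as soon as the bits read so far cannot
    be the beginning of any code word, or if the string ends mid code word.
    """
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in TABLE:
            symbols.append(TABLE[buffer])
            buffer = ""
        elif not any(code.startswith(buffer) for code in TABLE):
            return symbols, False  # code word not in the table: error
    return symbols, buffer == ""
```

Note that the symbols decoded just before the detected position are not necessarily trustworthy either, which is why the text applies the check to all code words.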
A third method is to determine an error by whether the motion vector information, prediction signal, DCT coefficient information, prediction error signal, and reconstructed video signal are signals impossible to appear in coding the moving image. Since the present invention is characterized by using the third method, a detailed explanation will be given.
For example, when the motion vector shown in the motion vector information has exceeded the previously determined searching range or gone outside the screen, it is determined to be erroneous.
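A plausibility test of this kind might look as follows in Python. The parameter names, the integer-pixel vectors, and the choice to require the whole referenced block to lie inside the frame are assumptions for illustration.

```python
def mv_is_plausible(mv, block_xy, block_size, frame_size, search_range):
    """Return False if a decoded motion vector could not have been produced by the encoder.

    mv:           (dx, dy) in pixels
    block_xy:     top-left corner (x, y) of the block being predicted
    frame_size:   (width, height) of the frame
    search_range: maximum absolute displacement the encoder searches (assumed known)
    """
    dx, dy = mv
    if abs(dx) > search_range or abs(dy) > search_range:
        return False  # exceeds the previously determined searching range
    x, y = block_xy[0] + dx, block_xy[1] + dy
    width, height = frame_size
    # The referenced block must lie entirely inside the screen.
    return 0 <= x and 0 <= y and x + block_size <= width and y + block_size <= height
```

For a 176x144 frame and a search range of 15 pixels, a vector of (20, 0) is rejected by the first test, and a vector of (-5, 0) for the block at (0, 0) is rejected by the second.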
Furthermore, an error can be detected by checking the DCT coefficient information subjected to inverse quantization at the inverse quantizer 53. If the pixel value of an input picture signal 11 is in the range of 0 to D-1 and the DCT block size is N×N, the DCT coefficient will take a value in the following range:
<Intraframe Prediction Mode>
DC component: 0 to N×D
AC component: -(N/2×D) to (N/2×D)
<Interframe Prediction Mode>
-(N/2×D) to (N/2×D)
When the decoded DCT coefficient takes a value outside the range, it will be determined to be erroneous. In this case, all or part of the DCT coefficients for the block in which an error has been detected should be determined to be 0 or the decoded value should be estimated from the decoded values of the surrounding blocks.
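The range test on dequantized DCT coefficients can be written directly from the bounds listed above. In this Python sketch the function and parameter names are hypothetical; `n` and `d` correspond to N and D in the text.

```python
def dct_coeff_in_range(value, is_dc, intra, n, d):
    """Check a dequantized DCT coefficient against the bounds given in the text.

    Intra DC:            0 .. N*D
    Intra AC, all inter: -(N/2)*D .. (N/2)*D
    """
    if intra and is_dc:
        return 0 <= value <= n * d
    bound = (n / 2) * d
    return -bound <= value <= bound

def block_has_error(coeffs, intra, n, d):
    """coeffs is a flat list for one block with the DC coefficient first."""
    return any(not dct_coeff_in_range(c, i == 0, intra, n, d)
               for i, c in enumerate(coeffs))
```

With N = 8 and D = 256, for example, an intra DC value of 2000 is accepted while an AC value of 1500 marks the block as erroneous.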
Additionally, an error can be detected using the pixel value of the reconstructed video signal 61. If the pixel value of the input picture signal 11 is in the range of 0 to D-1, the DCT block size is N×N, and the quantization step-size is Q (in the case of linear quantization), the range in which the pixel value of the reconstructed video signal 61 can lie is:
When the pixel value of the reconstructed video signal 61 has exceeded the range, it will be determined to be erroneous. In that case, for example, the reconstructed video signal 61 should be obtained by the prediction error signal 55 being made 0 in the inter-frame prediction mode (motion compensation prediction mode) and part of the DCT coefficients inputted to the inverse DCT circuit 54 being made 0 in the intraframe prediction mode, or the reconstructed video signal should be estimated from the pixel values of the surrounding blocks of the reconstructed video signal 61.
As described above, with the present invention, by adding the determination of whether the decoded information or signal is information or a signal impossible to appear in the encoding to the error detection at the error detecting circuit 60 in the video decoding apparatus, a more accurate error detection can be made. This helps prevent the deterioration of quality of the reconstructed image which could take place as a result of using the erroneous information or signal directly for the reconstruction of video signals without correcting the erroneous information or signal by an error process.
The present invention may be practiced or embodied in still other ways without departing from the spirit or essential character thereof. For example, while in the embodiment, the code strings outputted from the video coding apparatus are divided into two layers, they may be divided into three layers or more. For instance, the frame synchronizing code (PSC), picture header, and mode information may be allocated to the first layer, the DC component of the DCT coefficient information in the intra-frame prediction mode (intra DC) and the large-region motion vector information be allocated to the second layer, the small-region motion vector information be allocated to the third layer, and the DCT coefficient information other than the DCT coefficient information allocated to the other layer may be allocated to the fourth layer. The DCT coefficients may be further divided into several layers, such as the low-frequency band components and the high-frequency band components.
When fixed-length coding and the variable-length coding of the difference motion vector information are used for the coding of the motion vector information as described earlier, the fixed-length coding which would have no adverse effect on the subsequent code strings due to synchronization failure is used in coding the large-region motion vector information, and placed first in each frame or specific region unit, followed by variable-length coded motion vector information. By doing this, even if an error in the variable-length coded section has caused a synchronization failure, the error will have no adverse effect on the fixed-length coded motion vector information. This makes it possible to estimate the motion vector in which an error has occurred from the fixed-length coded motion vector information and produce a prediction signal with low accuracy, so that the deterioration of picture quality of the reconstructed image due to an error can be reduced.
The method of determining whether the information reconstructed at the video decoding apparatus of FIG. 4 is information impossible to appear in coding motion pictures is applied to not only the layered code strings, but also a video decoding apparatus that decodes the original picture signal from the code strings obtained from an ordinary video coding apparatus.
A second embodiment of the present invention will be described with reference to FIG. 5. In this embodiment, because motion compensation adaptive prediction and discrete cosine transform coding are basically the same as those in the above embodiment, an explanation of them will not be given.
In FIG. 5, an inputted video signal 121 is first used in prediction at a motion compensation adaptive prediction circuit 101. The motion compensation adaptive prediction circuit 101 senses the motion vector between a video signal 121 and a local decoded picture signal 125 of the local-decoded frame outputted from a frame memory 102 and produces a motion compensation prediction signal on the basis of the motion vector. The motion compensation adaptive prediction circuit 101 has a motion compensation prediction mode (interframe prediction mode) and an intra-frame prediction mode in which the video signal 121 is coded directly and selects the optimum one from these modes and outputs a prediction signal 122 corresponding to each mode. Namely, the motion compensation adaptive prediction circuit 101 outputs a motion compensation prediction signal in the motion compensation prediction mode and "0" in the intra-frame prediction mode as the prediction signal 122. The motion compensation adaptive prediction circuit 101 also outputs as motion information 126 the motion vector index indicating the vector quantized motion vector information used in motion compensation prediction.
A subtracter 103 produces a prediction error signal 123 by subtracting the prediction signal 122 from the video signal 121. A discrete cosine transform (DCT) circuit 104 subjects the prediction error signal 123 to discrete cosine transform in blocks of a fixed size and produces DCT coefficient information. The DCT coefficient information is quantized at a quantizer 105. Since in the intra-frame prediction mode, the prediction signal 122 is "0", the subtracter 103 outputs the video signal 121 directly as the prediction error signal 123.
The DCT coefficient information quantized at the quantizer 105 is branched into two pieces of information; one piece of information is subjected to variable-length coding at a variable-length encoder 106 and the other piece of information is subjected to inverse quantization at an inverse quantizer 107. The output of the inverse quantizer 107 undergoes inverse discrete cosine transform at an inverse discrete cosine transform (inverse DCT) circuit 108. That is, the inverse quantizer 107 and inverse DCT circuit 108 carry out the processing opposite to that in the quantizer 105 and DCT circuit 104 and the inverse DCT circuit 108 produces a signal approximate to the prediction error signal 123 at the output. The output of the inverse DCT circuit 108 is added to the prediction signal 122 from the motion compensation adaptive prediction circuit 101 at an adder 109, which then produces a local decoded picture signal. The local decoded picture signal is stored in the frame memory 102.
The coded prediction error signal (the variable-length codes of DCT coefficient information) 124 outputted from the variable-length encoder 106 and the motion information (motion vector index) 126 outputted from the motion compensation adaptive prediction circuit 101 are multiplexed at a multiplexer 110, and the result is outputted to a transmission channel/storage medium (not shown) as an output code string 127.
Next, the motion compensation adaptive prediction circuit 101, a characteristic portion of the present invention, will be explained. The motion compensation adaptive prediction circuit 101 obtains a motion vector by vector quantization and performs motion compensation prediction using the vector-quantized motion vector.
The motion compensation adaptive prediction circuit 101 shown in FIG. 6 comprises a prediction circuit 201, an error calculator 202, a controller 203, and a code book 204. The code book 204 stores vector-quantized motion vector candidates in the form of code vectors.
The prediction circuit 201 generates a prediction signal 122 corresponding to the code vector 213 taken out from the code book 204 on the basis of the local decoded picture signal 125 of the local-decoded frame from the frame memory of FIG. 5, that is, a signal obtained by performing the motion compensation prediction of the local decoded picture signal 125 using the motion vector corresponding to the code vector 213.
The error calculator 202 computes the magnitude of the difference (error) between the video signal 121 and the prediction signal 122 and generates an error level signal 211 indicating the magnitude. The magnitude of error may be determined by, for example, the absolute sum or the square sum of errors, or by the square error sum with the weighted low-frequency-band components.
The controller 203 gives a code book index 212 specifying a code vector to the code book 204 and thereby takes a code vector 213 minimizing the magnitude of the error given by the error level signal 211 out of the code book 204 and supplies it to the prediction circuit 201. The controller 203 converts the code book index 212 into a fixed-length code to produce a motion vector index and outputs it as motion information 126 to the multiplexer 110 of FIG. 5.
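The loop formed by the controller 203, code book 204, prediction circuit 201, and error calculator 202 can be sketched as follows. This is a simplified Python illustration under several assumptions: frames are lists of pixel rows, code vectors are integer-pixel displacements that stay inside the reference frame, the error measure is the absolute sum, and all function names are hypothetical.

```python
def prediction_error(cur, ref, x, y, size, mv):
    """Absolute sum of errors between the current block and its motion-compensated prediction."""
    dx, dy = mv
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))

def search_code_book(cur, ref, x, y, size, code_book):
    """Try every code vector and return the index giving the smallest prediction error."""
    best_index, best_error = None, None
    for index, mv in enumerate(code_book):
        error = prediction_error(cur, ref, x, y, size, mv)
        if best_error is None or error < best_error:
            best_index, best_error = index, error
    return best_index

def to_fixed_length(index, code_book_size):
    """Express the code book index as a fixed-length binary string."""
    bits = max(1, (code_book_size - 1).bit_length())
    return format(index, "0{}b".format(bits))
```

With a code book of four entries every index costs exactly two bits, so a bit error corrupts one vector but cannot shift the position of the codes that follow.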
Next, a video decoding apparatus in the present embodiment will be explained. FIG. 7 is a block diagram of a first video decoding apparatus corresponding to the video coding apparatus of FIG. 5.
In FIG. 7, the output bit stream 127 sent from the video coding apparatus of FIG. 5 is transmitted over a transmission channel or stored in a storage medium and becomes an input bit stream 321. The input code string is supplied to a demultiplexer 310, which separates the string into a coded prediction error signal (a variable-length code of quantized DCT coefficient information) 322 and motion information (motion vector index) 323. A variable-length decoder 306, an inverse quantizer 307, and an inverse DCT circuit 308 subject the coded prediction error signal 322 to the processes opposite to those at the variable-length encoder 106, quantizer 105, and DCT circuit 104 of FIG. 5, and produce a prediction error signal 324.
An adder 309 adds the prediction error signal 324 to the prediction signal 326 from the motion compensation adaptive prediction circuit 301 and produces a reconstructed video signal 327. The reconstructed video signal 327 is outputted outside the video decoding apparatus and stored in a frame memory 302.
The motion information 323 is inputted to the motion compensation adaptive prediction circuit 301. The motion compensation adaptive prediction circuit 301 performs motion compensation prediction using the motion information 323, on the basis of the reconstructed video signal 325 of the preceding frame outputted from the frame memory 302 and outputs a prediction signal 326.
FIG. 8 is a block diagram of the motion compensation adaptive prediction circuit 301 of FIG. 7, which contains a prediction section 401 and a code book 402. The code book 402 has the same structure as the code book 204 of FIG. 6 and stores vector-quantized motion vector candidates in the form of code vectors. From the code book 402, a code vector 410 corresponding to the motion information (motion vector index) 323, that is, the vector-quantized motion information, is taken.
The prediction section 401 creates a prediction signal 326 corresponding to the code vector 410 from the code book 402 on the basis of the reconstructed video signal of the coded frame shown in FIG. 7, that is, a signal obtained by performing the motion compensation prediction of the reconstructed video signal 325 of the coded frame, using the motion vector corresponding to the code vector 410.
As described above, the video coding/decoding apparatus according to the second embodiment performs motion compensation prediction using the vector-quantized motion information. Namely, because the motion information 126 can be expressed by a code book index specifying a code vector in the code book 204, it is possible to perform efficient motion compensation prediction while suppressing the amount of the motion information.
Furthermore, such vector-quantized motion information has a smaller bias in the occurrence than non-quantized motion vector information, so that the fixed-length coding of the vector-quantized motion information at the controller 203 as explained above enables the motion information to be coded with a relatively high coding efficiency. Use of fixed-length coding eliminates the problem that an error spreads over a wide range due to synchronization failure as found in variable-length coding when an error has been introduced at a transmission line or storage media.
Accordingly, with the present embodiment, it is possible to achieve excellent error resilience while maintaining high coding efficiency.
FIG. 9 shows a video coding apparatus according to a third embodiment of the present invention. The same parts as those in FIG. 5 are indicated by the same reference symbols and an explanation will be centered on the difference from the second embodiment. The third embodiment differs from the second embodiment in the configuration of the motion compensation adaptive prediction circuit 101. Specifically, in the third embodiment, another variable-length encoder 111 is added.
In the motion compensation adaptive prediction circuit 101 of the present embodiment, motion vectors are divided into two levels, large-region representative motion vectors indicating motion in large regions and small-region motion vectors indicating motion in small regions obtained by subdividing the large regions, and searching is effected at these two levels. The large-region representative motion vectors are searched for by a widely used conventional method, such as the block matching method. The obtained motion vector information is subjected to variable-length coding and the resulting information is outputted. On the other hand, for the small-region motion vectors, the difference motion vectors obtained on the basis of the difference between the large-region representative motion vectors and the small-region motion vectors are coded by vector quantization.
The reason why motion compensation prediction is effected in a hierarchical fashion is that since the difference motion vectors based on the difference between the large-region representative motion vectors and the small-region motion vectors are generally distributed in the vicinity of difference=0 with a high probability, as compared with the motion vectors for which the difference has not been calculated, vector quantization is effected at a high coding efficiency. By combining such large-region representative motion vectors and the vector quantization of the difference motion vectors based on the difference between the large-region representative motion vectors and the small-region motion vectors, it is possible to achieve higher error resilience while maintaining almost the same coding efficiency as that of a conventional coding method that subjects all of the motion vector information to variable-length coding.
FIG. 10 is a block diagram of the motion compensation adaptive prediction circuit 101 of the present embodiment. The motion compensation adaptive prediction circuit comprises a first prediction circuit 221, a first error calculator 222, a first controller 223, a second prediction circuit 224, a second error calculator 225, a second controller 226, a code book 227, and an adder 228. The first prediction circuit 221, first error calculator 222, and first controller 223 are provided for motion compensation prediction using large-region representative motion vectors.
On the other hand, the second prediction circuit 224, second error calculator 225, second controller 226, code book 227, and adder 228 are provided for motion compensation prediction using the motion vectors obtained by the vector quantization of the difference motion vectors based on the difference between the large-region representative motion vectors and the small-region motion vectors. The code book 227 stores the candidates of the difference motion vectors based on the difference between the large-region representative motion vectors and the small-region motion vectors in the form of code vectors.
FIG. 11A is a drawing to help explain the operation of motion compensation prediction in a frame in the third embodiment. First, the inside of the frame is divided into large regions indicated by solid lines and the large-region representative motion vectors shown by the solid-line arrows are obtained using the block matching method. Specifically, the first controller 223 generates first motion vectors 233 one after another, each being shifted one pixel from each other, in the horizontal and vertical directions in a specific range (e.g., in the range of ±15 pixels in the horizontal and vertical directions). The first prediction circuit 221 generates a first prediction signal 231 corresponding to the first motion vector 233 on the basis of the local decoded picture signal from the frame memory 102 of FIG. 9. Then, the first error calculator 222 computes the magnitude of the difference (error) between the input picture signal 121 in a large region and the first prediction signal and generates an error level signal 232 indicating the magnitude. The magnitude of error may be determined by, for example, the absolute sum or the square sum of errors, or by the square error sum with the weighted low-frequency-band components.
The first controller 223 obtains a large-region representative motion vector 128 for which the magnitude of the error given by the error level signal 232 becomes minimal and outputs it. The variable-length encoder 111 of FIG. 9 calculates the difference between the large-region representative motion vector 128 and the adjacent large-region representative motion vectors and subjects the difference to variable-length coding.
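The full search performed by the first controller 223, first prediction circuit 221, and first error calculator 222 can be illustrated by the following minimal sketch. It is not the circuit of FIG. 10 itself: the function names `sad` and `full_search`, the representation of frames as lists of pixel rows, and the choice of the absolute sum as the error measure are assumptions made for illustration only.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Absolute sum of errors between a block of `cur` and the block of
    `ref` displaced by the candidate motion vector (dy, dx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[top + y][left + x] - ref[top + y + dy][left + x + dx])
    return total


def full_search(cur, ref, top, left, size, search_range=15):
    """Try every one-pixel shift within +/-search_range and return
    (dy, dx, error) for the candidate with the minimal error."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block leaves the frame
            # (an assumption; the source does not discuss frame edges).
            if not (0 <= top + dy and top + dy + size <= height):
                continue
            if not (0 <= left + dx and left + dx + size <= width):
                continue
            err = sad(cur, ref, top, left, dy, dx, size)
            if best is None or err < best[2]:
                best = (dy, dx, err)
    return best
```

Replacing the body of `sad` with a sum of squared differences would give the square-sum criterion also mentioned above.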
Then, the small-region motion vectors indicated by the broken-line arrows in the small regions shown by broken lines in FIG. 11A obtained by subdividing the large regions are calculated by vector quantization as the difference from the large-region representative motion vectors. Since the purpose of the vector quantization is to effect motion compensation prediction most efficiently, the selection of a difference motion vector from the code book 227 is performed on the basis of the magnitude of an error in motion compensation prediction, not on the basis of a direct square error sum of the input vectors and the code vectors as generally implemented in vector quantization. Hereinafter, the operation will be explained.
The code book 227 stores the candidates of the difference motion vectors based on the large-region representative motion vectors and the small-region motion vectors in the form of code vectors. The second controller 226 changes code book indexes 235 one after another and takes a difference motion vector 236 corresponding to the code book index 235 out of the code book 227. The adder 228 adds the large-region representative motion vector 128 to the difference motion vector 236 to obtain a small-region motion vector candidate 237. The second prediction circuit 224 obtains a prediction signal 122 corresponding to the small-region motion vector candidate 237 on the basis of the local decoded picture signal 125 from the frame memory 102 of FIG. 9. Furthermore, the second error calculator 225 computes the magnitude of the difference (error) between the video signal 121 and the prediction signal 122 and outputs an error level signal 234 indicating the magnitude.
The second controller 226 obtains as a small-region motion vector index a code book index for which the magnitude of the error given by the error level signal 234 becomes minimal, subjects the small-region motion vector index to fixed-length coding to produce a small-region motion vector 129, and outputs a prediction signal corresponding to the small-region motion vector.
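The code book search described above, in which the index is chosen by the motion compensation prediction error rather than by the distance between vectors, may be sketched as follows. The names `block_error`, `select_codebook_index`, and `fixed_length_code`, the list-of-rows frame layout, and the use of the absolute sum as the error measure are hypothetical and serve only as an illustration.

```python
def block_error(cur, ref, top, left, size, mv):
    """Absolute-sum prediction error for motion vector `mv`, or None when
    the displaced block would fall outside the reference frame."""
    dy, dx = mv
    if not (0 <= top + dy and top + dy + size <= len(ref)):
        return None
    if not (0 <= left + dx and left + dx + size <= len(ref[0])):
        return None
    return sum(
        abs(cur[top + y][left + x] - ref[top + y + dy][left + x + dx])
        for y in range(size)
        for x in range(size)
    )


def select_codebook_index(cur, ref, top, left, size, rep_mv, codebook):
    """Return (index, small_region_mv, error) for the code vector whose
    candidate motion vector gives the minimal prediction error."""
    best = None
    for index, (ddy, ddx) in enumerate(codebook):
        # Representative vector plus difference vector (role of adder 228).
        candidate = (rep_mv[0] + ddy, rep_mv[1] + ddx)
        err = block_error(cur, ref, top, left, size, candidate)
        if err is None:
            continue
        if best is None or err < best[2]:
            best = (index, candidate, err)
    return best


def fixed_length_code(index, codebook_size):
    """Fixed-length binary code word for a code book index."""
    width = max(1, (codebook_size - 1).bit_length())
    return format(index, "0{}b".format(width))
```

Because every index is written with the same number of bits, a bit error in one code word does not shift the boundaries of the code words that follow it.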
In FIG. 9, the coded prediction error signal 124 from the variable-length encoder 106, the coded large-region motion vector 130 from the variable-length encoder 111, and the coded small-region motion vector 129 from the motion compensation adaptive prediction circuit 101 are multiplexed at the multiplexer 110 and the resulting code string is sent to the transmission channel/storage medium (not shown) as an output bit stream 127.
The multiplexer 110 arranges the motion vectors in a hierarchical fashion according to importance corresponding to the degree of the picture quality deterioration of the decoded picture signal due to an error in transmission/storage and carries out different levels of error protection. A method of changing levels of error protection is, for example, to use a different error correcting/detecting code for each layer and carry out error protection in a manner that uses a stronger correcting/detecting capability for an upper-layer of greater importance. When transmission and storage can be performed via transmission channels/storage mediums having different error rates, an upper-layer with greater importance is transmitted and stored via a transmission/storage medium having a lower error rate to make an error less liable to occur in the upper-layer code string. The coding mode and quantizing step size whose error has the most significant effect are allocated to the upper layer of the highest level and the intra-frame-coded low-frequency components are allocated to the upper layer of the second highest level.
Since an error in the large-region representative motion vector 128 has an adverse effect on a wide range, it is allocated to the upper layer of the third highest level. When the variable-length encoder 111 subjects the large-region representative motion vector 128 to variable-length coding by calculating the difference between the large-region representative motion vector and the adjacent large-region representative motion vectors, it is desirable that strong protection against errors should be provided, because a synchronization failure of a variable-length code may permit an error to spread over the entire screen.
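The way an error in a differentially coded representative motion vector spreads to later regions can be seen in the following minimal sketch. The source does not state which adjacent vector serves as the reference, so the use of the immediately preceding region as predictor, and the names `differential_encode` and `differential_decode`, are assumptions.

```python
def differential_encode(mvs):
    """Code each representative motion vector as the difference from the
    previously coded one (assumed predictor: the preceding region)."""
    prev = (0, 0)
    diffs = []
    for mv in mvs:
        diffs.append((mv[0] - prev[0], mv[1] - prev[1]))
        prev = mv
    return diffs


def differential_decode(diffs):
    """Rebuild the motion vectors by accumulating the differences."""
    prev = (0, 0)
    mvs = []
    for d in diffs:
        prev = (prev[0] + d[0], prev[1] + d[1])
        mvs.append(prev)
    return mvs
```

If the first difference is corrupted, every vector decoded after it is also wrong, which is in addition to any loss of variable-length code synchronization.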
Because the vector-quantized small-region motion vector 129 has its index fixed-length coded, it has a higher error resilience than the variable-length coded large-region representative motion vector 130. As will be described later, it is possible to rearrange the indexes or learn the code book so that the magnitude of error may be minimized. Therefore, it is assumed that the small-region motion vector indexes are allocated to a lower layer, at a level below that of the representative motion vectors.
The error signals whose loss would cause a minor deterioration of picture quality or the high-frequency band components in intra-frame coding are allocated to the lower layer of the lowest level. This layer may be added with only an error detecting code used in error detecting, such as CRC, or a parity check bit. Furthermore, the layer may be further divided into sublayers according to orthogonal transform sequence, such as the relatively important low-frequency-band components and the less important high-frequency-band components in terms of the subjective quality of reconstructed image.
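As a minimal illustration of detection-only protection for the lowest layer, a single parity check bit can be appended to a code string as sketched below. The even-parity convention and the names `add_parity` and `check_parity` are assumptions; the source states only that an error detecting code such as CRC or a parity check bit may be used.

```python
def add_parity(bits):
    """Append an even-parity bit to a string of '0'/'1' characters."""
    return bits + str(bits.count("1") % 2)


def check_parity(codeword):
    """Return True when the code word still has an even number of ones."""
    return codeword.count("1") % 2 == 0
```

A single parity bit detects any odd number of bit errors but cannot locate or correct them, which matches its use in a layer where a lost code word causes only minor deterioration.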
When the types of transmission channel/storage medium having different error rates have been determined to be, for example, two layers or three layers, when the ratio of the amount of codes in each layer to the total amount of codes is restricted to a specific range, or when not so many types of error correcting and detecting codes can be used because of hardware limits, what type of code word should be used in which layer may be determined suitably. For example, when the number of layers is two and there is a limit to the amount of codes in the upper layer, the most important mode information, the low-frequency-band components in intra-frame coding, and the representative motion vectors are coded in the upper layer. The vector-quantized small-region difference motion vectors, the high-frequency-band components in intra-frame coding, and the prediction error signal are coded in the upper layer when there is room in the amount of codes in the upper layer, and are coded in the lower layer when there is no room in the amount of codes in the upper layer. In that case, since the difference motion vectors are more important, they are coded in the upper layer in preference to the others.
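The two-layer example can be sketched as a simple allocation by priority. The function name `allocate_layers`, the tuple layout of the items, and the rule that filling of the upper layer stops at the first optional item that does not fit are assumptions made for illustration; the source fixes only the order of importance.

```python
def allocate_layers(items, upper_budget_bits):
    """Split code words between an upper and a lower layer.

    `items` is a list of (name, bits, mandatory_upper) tuples listed in
    descending order of importance. Mandatory items always go to the
    upper layer; optional items go there only while room remains.
    """
    upper, lower = [], []
    used = 0
    overflow = False
    for name, bits, mandatory in items:
        fits = used + bits <= upper_budget_bits
        if mandatory or (fits and not overflow):
            upper.append(name)
            used += bits
        else:
            # Once one optional item is pushed down, less important
            # items follow it (assumed strict priority ordering).
            overflow = True
            lower.append(name)
    return upper, lower
```

Listing the difference motion vectors ahead of the high-frequency components and the prediction error signal gives them the preference described above.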
While in the example of FIG. 11A a single large region is subdivided into four small regions, controlling the size of a large region and the rate of subdivision into small regions enables much finer control of the amount of information.
Next, a video decoding apparatus in the present embodiment will be explained. FIG. 12 is a block diagram of a second video decoding apparatus corresponding to the video coding apparatus in the third embodiment of FIG. 9.
In FIG. 12, the output bit stream 127 sent from the video coding apparatus of FIG. 9 is transmitted over a transmission channel or stored in a storage medium and becomes an input bit stream 321. The input code string is supplied to a demultiplexer 310, which separates the string into a coded prediction error signal (a variable-length code of quantized DCT coefficient information) 322, variable-length coded large-region representative motion vector 328, and fixed-length coded small-region motion vector 329. A variable-length decoder 306, an inverse quantizer 307, and an inverse DCT circuit 308 subject the coded prediction error signal 322 to the processes opposite to those at the variable-length encoder 106, quantizer 105, and DCT circuit 104 which are shown in FIG. 9, and produce a prediction error signal 324.
An adder 309 adds the prediction error signal 324 to the prediction signal 326 from the motion compensation adaptive prediction circuit 301 and produces a reconstructed video signal 327. The reconstructed video signal 327 is outputted outside the video decoding apparatus and stored in a frame memory 302.
On the other hand, the variable-length coded large-region representative motion vector 328 is decoded at the variable-length decoder 311. The decoded large-region representative motion vector 330, together with the small-region motion vector 329, is inputted to the motion compensation adaptive prediction circuit 301. The motion compensation adaptive prediction circuit 301 performs motion compensation prediction using the large-region representative motion vector 330 and small-region motion vector 329, on the basis of the reconstructed video signal 325 outputted from the frame memory 302 and outputs a prediction signal 326.
FIG. 13 is a block diagram of the motion compensation adaptive prediction circuit 301 of FIG. 12, which contains a code book 411, an adder 412, and a prediction circuit 413. The code book 411 has the same structure as that of the code book 227 and stores candidates for vector-quantized difference motion vectors in the form of code vectors. The code book 411 outputs a difference motion vector 421, a code vector corresponding to the index of the small-region motion vector 329. The difference motion vector 421 is added to the large-region representative motion vector 330 at the adder 412 and thereby a small-region motion vector 422 is decoded. The prediction circuit 413 generates a prediction signal 326 corresponding to the small-region motion vector 422 from the adder 412 on the basis of the reconstructed video signal 325 of the coded frame from the frame memory 302 of FIG. 12.
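The decoder-side path of FIG. 13 (code book lookup, addition, prediction) may be sketched as follows. The names `decode_small_region_mv` and `predict_block`, the representation of the index as a string of bits, and the list-of-rows frame layout are assumptions for illustration.

```python
def decode_small_region_mv(index_bits, rep_mv, codebook):
    """Look up the difference motion vector for a fixed-length index and
    add it to the representative motion vector (role of adder 412)."""
    index = int(index_bits, 2)
    ddy, ddx = codebook[index]
    return (rep_mv[0] + ddy, rep_mv[1] + ddx)


def predict_block(ref, top, left, size, mv):
    """Copy the block of the reference frame displaced by `mv`
    (role of prediction circuit 413)."""
    dy, dx = mv
    return [
        [ref[top + y + dy][left + x + dx] for x in range(size)]
        for y in range(size)
    ]
```

The decoder must hold the same code book contents, in the same index order, as the encoder for the lookup to return the vector the encoder selected.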
A supplementary explanation about the present embodiment will be given. With the video decoding apparatus of FIG. 12, when an error has been detected in the input bit stream 321, decoding is done by a different process according to the type of code word into which an error has been introduced. For example, when an error has been detected in the coded prediction error signal 322, the prediction error signal 324 of the block is determined to be 0 and the prediction signal 326 obtained at the motion compensation adaptive prediction circuit 301 using the correctly decoded mode information and motion vector is used as a reconstructed video signal. When the prediction error signal is divided in a hierarchical fashion according to orthogonal transform sequence, it may be subjected to inverse orthogonal transform using only a lower-order sequence than the sequence in which an error has been introduced and the resulting signal may be used as an error signal.
On the other hand, since the small-region motion vector 329, a vector-quantized motion vector, has a high error resilience, even if an error is introduced into the vector, it will be processed in the same manner as when there is no error. When a lot of errors have been introduced in the small-region motion vector 329, the small-region motion vector containing errors will not be used, in order to prevent a significant deterioration of picture quality, and decoding is effected by using the representative motion vector in the region. Since the mode information and representative motion vector information are provided with strong error protection, this information has a lower error probability. If an error should be introduced, however, there is a possibility that the picture quality will deteriorate seriously. For the region of the code and the regions on which the error has an adverse effect due to synchronization failure, the reconstructed video signal 325 of the previous frame is used directly as the reconstructed video signal in those regions. When the correct motion vector has been decoded in the regions around the region in which an error has been introduced, the motion vector in the erroneous region may be estimated from the correctly decoded motion vectors and the estimated motion vector may be used in decoding.
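The error handling by type of code word can be summarized in the following minimal sketch. The function name `choose_concealment`, the string labels it returns, and the threshold `max_small_mv_errors` that stands for "a lot of errors" are hypothetical; the source gives no numerical threshold.

```python
def choose_concealment(error_in_residual, small_mv_error_count,
                       error_in_upper_layer, max_small_mv_errors=1):
    """Return (motion_source, residual_source) for one region.

    error_in_upper_layer: error detected in the mode information or
    the representative motion vector.
    """
    if error_in_upper_layer:
        # Use the previous frame's reconstructed signal directly.
        return ("previous_frame", None)
    if small_mv_error_count > max_small_mv_errors:
        # Too many errors: fall back to the representative vector.
        motion = "representative_mv"
    else:
        # Few or no errors: decode the index as if it were error-free.
        motion = "small_region_mv"
    # A damaged prediction error signal is replaced by zero.
    residual = "zero_residual" if error_in_residual else "decoded_residual"
    return (motion, residual)
```

The alternative of estimating a lost motion vector from correctly decoded neighboring regions is not covered by this sketch.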
It is assumed that the code books 227 and 411 used for vector quantization of difference motion vectors have been learned by an LBG algorithm or an annealing method. The mean squared error of the input signal (in this case, the mean squared error of the difference motion vector) is generally used as a criterion in code book learning. Because the code books 227 and 411 used in the present embodiment are used in motion compensation prediction, it is desirable that learning should be done using the mean squared value or the absolute sum of prediction error as a criterion as described earlier.
Furthermore, learning may be done by a criterion that takes transmission errors in code words into account. In this case, although the performance of vector quantization decreases when there is no error, the quality of the reconstructed picture can be increased when an error has occurred.
Additionally, what is obtained by rearranging indexes after code book learning, taking error resilience into account, may be used as a code book. This equalizes the error sensitivity of each bit in the motion vector index code to prevent a large error from occurring when an error of a small number of bits, such as one bit, has occurred in an index code. Learning is performed so that an amount of increase of the motion compensation prediction error due to the injured motion vector index becomes the smallest. Of all of the combinations of motion vector code words and index code words, the one for which the increase amount is the smallest can be selected. When the code book is large, however, the number of combinations is very large, making the volume of calculations associated with learning enormous. Therefore, learning may be done using an annealing method. The volume of calculations can be reduced by taking the assumption that the number of error bits in a single code word is less than or equal to one.
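The rearrangement of indexes under the single-bit-error assumption can be illustrated for a very small code book by exhaustive search, as in the sketch below. The cost used here, the squared distance between the intended code vector and the one selected after one bit of the index is inverted, is a stand-in for the increase in motion compensation prediction error named above, and the names `single_bit_error_cost` and `best_index_assignment` are hypothetical.

```python
from itertools import permutations


def single_bit_error_cost(assignment, bits):
    """Total squared vector distance caused by every possible one-bit
    error, where assignment[i] is the code vector stored at index i."""
    cost = 0
    for i, (y0, x0) in enumerate(assignment):
        for b in range(bits):
            y1, x1 = assignment[i ^ (1 << b)]  # index with bit b inverted
            cost += (y0 - y1) ** 2 + (x0 - x1) ** 2
    return cost


def best_index_assignment(code_vectors):
    """Try every ordering of the code vectors and keep the cheapest one.
    Feasible only for tiny code books (n! orderings)."""
    bits = (len(code_vectors) - 1).bit_length()
    assert len(code_vectors) == 1 << bits, "size must be a power of two"
    return min(
        permutations(code_vectors),
        key=lambda a: single_bit_error_cost(a, bits),
    )
```

The factorial growth of the search is the reason a method such as annealing is suggested for code books of practical size.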
The difference motion vectors may be vector-quantized for a group of small regions. For example, the difference motion vectors in all of the small regions contained in a single large region may be vector-quantized in unison. This further increases the coding efficiency.
When the vector-quantized difference motion vector is coded, a flag may be added according to the frequency of appearance and another coding method may be used. For example, a difference motion vector of 0 appears with a high frequency, so the probability is high that 0 will be selected from the code book as both the horizontal and vertical difference motion vectors. A flag distinguishing this state from the others is provided. When 0 is selected as both of the horizontal and vertical difference motion vectors, only the flag is coded. In the other cases, a flag indicating that such is not the case is added and the selected difference motion vector index is then coded. By doing this, a difference of 0 occurring at a high frequency can be represented by a short code word, thereby improving the coding efficiency.
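The flag-based coding of the zero difference motion vector may be sketched as follows. The polarity of the flag ("1" for a zero difference), the parameter `zero_index` that identifies the code book entry holding the zero vector, and the function names are assumptions made for illustration.

```python
def encode_difference_index(index, zero_index, index_bits):
    """Code the zero difference vector as the flag alone; otherwise
    send the opposite flag followed by the fixed-length index."""
    if index == zero_index:
        return "1"
    return "0" + format(index, "0{}b".format(index_bits))


def decode_difference_index(bits, zero_index, index_bits):
    """Return (index, number_of_bits_consumed) from the start of `bits`."""
    if bits[0] == "1":
        return zero_index, 1
    return int(bits[1:1 + index_bits], 2), 1 + index_bits
```

With a 3-bit index, the zero vector costs one bit instead of three, while every other vector costs four bits, so the scheme pays off only when the zero vector is selected often enough.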
In the above embodiment, although neither the intra-frame prediction mode nor Not Coded mode in which the reconstructed video signal of the previous frame is used directly as the reconstructed video signal have been explained, it is possible to switch between Not Coded mode and the motion compensation prediction mode suitably. The mode