United States Patent6161209
MoherDecember 12, 2000

Title

Joint detector for multiple coded digital signals

Abstract

A method for the joint detection of multiple coded digital signals that share the same transmission medium in a manner that causes mutual interference. The method is comprised of two steps that are applied to preliminary estimates of each digital signal, one or more times. The first step is to obtain reliability estimates for each data element of each digital signal by combining the preliminary estimates, a statistical model for the interference, and any a priori information regarding the data elements. The second step is to revise these reliability estimates for each digital signal based on the forward error correction code used for that digital signal. When the steps are repeated, the revised reliability estimates from the second step are used as a priori information for the first step.


Inventors:Moher; Michael I. (Stittsville, CA)
Assignee:Her Majesty the Queen in right of Canada, as represented by the Minister of Industry through the Communications Research Centre (Ottawa, CA)
Appl. No.:827533
Filed:March 28, 1997

Current U.S. Class:714/780 714/786 714/795 375/262 375/341 714/746 
Field of Search:371/30,43.1,43.6,43.7,43.4 375/262,341 714/746,786,794,795,792,780

U.S. Patent Documents
5323418June 1994Ayerst et al.
5446747August 1995Berrou
5506861April 1996Bottomley
5553062September 1996Schiling et al.
5659573August 1997Bruckert et al.
5757767May 1998Zehavi
5757821May 1998Jamal et al.
5761237June 1998Petersen et al.
Other References
Linear Multiuser Detectors for Synchronous Code-Division Multiple-Access Channels Lupus and Verdu, IEEE Transactions on Information Theory, vol. 35 No. 1, Jan. 1989. .
Near-Far Resistance of Multiuser Detectors in Asynchronous Channels Lupus and Verdu, IEEE vol. 38. No. 4, Apr. 1990. .
Multistage Detection in Asynchronous Code-Division Multiple-Access Communications Varanasi and Aazhang, IEEE Transactions of Communications, Nol. 38, No. 4, Apr. 1990. .
Minimum Probability of Error for Asynchronous Gaussian Multiple-Access Channels Verdu, IEEE Transactions Theory, vol. IT-32, No. 1, Jan. 1986. .
A Comparison of Optimal and Sub-Optimal MAP Decoding Algorithms Operating in the Log Domain Robertson and Hoeber, IEEE 1995. .
Very Low Rate Convulutional Codes for Maximum Theoretical Performance of Spread-Spectrum Multiple-Access Channels Viterbi, IEEE Journal on Selected Areas in Communications, vol. 8, No. 4, May 1990. .
Multiuser ML Sequence Estimator for Convolutionally Coded Asynchronous DS-CDMA Systems Giallorenzi and Wilson, IEEE Transactions on Communications, vol. 44, No. 8, Aug. 1996. .
Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate Bahl, Cocke, Jelinek and Raviv, IEEE Transactions of Information Theory, Mar. 1974. .
A Viterbi Algorithm with Soft Decision Outputs and its application Hagenauer and Hocher, IEEE 1989 pp. 1680-1686..~
Primary Examiner: Moise; Emmanuel L.
Attorney, Agent or Firm:Freedman & Associates

Claims


What is claimed is:
1. A method of detecting a plurality of digital signals that are forward error correction encoded and mutually interfere comprising the steps of:
(a) using a detector, detecting the plurality of digital signals and providing detector estimates of a first digital signal and second other digital signal from the plurality of digital signals;
b) (i) using a processor receiving the detector estimates and calculating a reliability estimate for each data element of first digital signal from the plurality of digital signals, the reliability estimate calculated from detector estimates of those data elements, a model of the interference, and a priori information determined in previous iterations, if any, concerning those data elements;
(ii) using a processor, calculating a reliability estimate for each data element of a second other digital signal from the plurality of digital signals, the reliability estimate calculated from detector estimates of those data elements, a model of the interference, and a priori information determined in previous iterations, if any, concerning those data elements;
c) using the processor, calculating a revised reliability estimate for each data element in dependence upon the reliability estimates from the step (b) and the properties of the forward error correction code for the corresponding digital signal; and
d) repeating the previous two steps, one or more times, using the revised reliability estimates provided by step (c) as a priori information for the step (b).

2. A method as defined claim 1 wherein during the first step (a), the processor uses only a subset of data when calculating the reliability estimates.

3. A method as defined in claim 1 wherein during the second step (b), the processor uses only a subset of the data when calculating the revised reliability estimates.

4. The method as defined in claim 1 wherein the first and second steps (a) and (b) provide reliability estimates for a subset of the K digital signals.

5. A method as defined in claim 1, wherein step (b) comprises the step of using a soft-output decoder, performing soft-output decoding.

6. A method as defined in claim 5 where the step of soft-output decoding is implemented for a plurality of digital signals using a single soft-output decoder.

7. A method as defined in claim 5 wherein the step of soft-output decoding is applied to a subset of K digital signals.

8. A method as defined in claim 5 where the step of soft-output decoding is implemented in parallel for each of a plurality of digital signals.

9. A method as defined in claim 1 comprising the step of:
receiving a plurality of substantially similar digital signals from a plurality of receivers;
wherein the detector detects data elements within at least two of the received digital signals and provides the preliminary estimates of those signals; and,
wherein the steps (a)(i) and (ii) are performed in dependence upon the plurality of substantially similar digital signals from a plurality of receivers.

10. The method as defined in claim 1 including the step of outputting information content of one or more of the digital signals.

11. A method of detecting a plurality of digital signals that are forward error correction encoded and mutually interfere comprising the steps of:
a) providing preliminary estimates of the plurality of detected digital signals to a processor;
b) using the processor, calculating a reliability estimate for each data element of each digital signal from preliminary estimates of those data elements, a model of the interference, and a priori information, if any, concerning those data elements;
c) using the processor, calculating a revised reliability estimate for each data element in dependence upon the reliability estimates from the step (b) and the properties of the forward error correction code for the corresponding digital signal; and,
d) providing corrected estimates of each of the plurality of digital signals, the corrected estimates corrected from the preliminary estimates based on the calculated and revised reliability estimates.

12. A method of detecting as defined in claim 11 comprising the step of repeating steps (b) and (c) one or more times, using the revised reliability estimates provided by the step (c) as a priori information for step (b).

13. A method as defined in claim 12, wherein step (c) comprises the step of soft-output decoding.

14. The method as defined in claim 12 including the step of using a detector, detecting the plurality of digital signals and providing detector estimates of a first digital signal and second other digital signal from the plurality of digital signals.

15. A system for detecting a plurality of digital signals that are forward error correction encoded and mutually interfere, given preliminary estimates of those signals, comprising:
a processor having an input and an output, the processor comprising:
means for calculating a reliability estimate for each data element of at least two different digital signal from the plurality of digital signals in dependence upon the preliminary estimates of those data elements, a statistical model of the interference, and a priori information, if any, concerning those data elements; and,
means for calculating a revised reliability estimate for each data element based on the reliability estimates calculated and the properties of the forward error correction code for the corresponding digital signal; and,
means for providing corrected estimates of the data elements of each of the first and second digital signals, the corrected estimates corrected based on the calculated and revised reliability estimates.

16. A system of detecting a plurality of digital signals as defined in claim 15, including a suitably programmed processing means for performing said calculations.

17. A system as defined in claim 16 including feed back means for providing feedback from the output to the input.

18. A system as defined in claim 15 including output means for outputting information content of one or more of the digital signals.

19. A system as defined in claim 15 comprising:
a plurality of transmitters for transmitting data signals via a common communications channel;
a model of mutual interference between signals transmitted from the transmitters from the plurality of transmitters; and,
a plurality of detectors for detecting mutually interfering digital data signals and for providing the detector estimates of those signals to the processor.

Description

FIELD OF INVENTION

This invention relates to the joint detection of multiple digital signals that are forward error correction coded and share the same transmission medium in a manner that causes mutual interference. More specifically, the present invention relates to a novel method for detection that allows the permissible interference to be increased and bandwidth to be conserved.

BACKGROUND OF THE INVENTION

In order to maximize the number of signals that can share a transmission medium, the frequency spectrum is re-used in a variety of ways. The traditional approach is to physically isolate communications signals of the same frequency in order to reduce their mutual interference to acceptable levels. Less traditional approaches use spread spectrum techniques to average the effects of interference over a bandwidth significantly greater than the information bandwidth. In both of these cases, interference will exist to some extent and, in some cases, can significantly reduce the system capacity, i.e., the information/unit time/unit bandwidth.

To increase the capacity, joint detection schemes have been proposed that take into account the effect of the interference between the different signals and perform interference cancellation. Examples of such schemes are found in

S.Verdu, "Minimum probability of error for asynchronous Gaussian multiple-access channels," IEEE Trans. Inf. Th., vol. 32, No. 1, pp. 85-96, January 1986;

R.Lupas and S. Verdu, "Linear multiuser detectors for synchronous code-division multiple-access channels," IEEE Trans. Inf. Th., vol. 35, No. 1, pp.123-136, January 1989;

R. Lupas and S. Verdu, "Near-far resistance of multiuser detectors in asynchronous channels," IEEE Trans. Comm., vol. 38, No.4, pp.496-508, April 1990;

M. K. Varanasi and B. Aazhang, "Multistage detection in asynchronous code-division multiple access communications," IEEE Trans. Comm., vol. 38, No.4, pp.509-519, April 1990;

D. L. Ayerst et al, U.S. Pat. No.: 5323418, 1994;

D. L. Schilling et al, U.S. Pat. No.: 5553062, 1996;

and include techniques such as applying linear transformations to the received samples to decorrelate the interference, and techniques such as estimating the strongest user first, subtracting it from the received signal and repeating it for the next strongest signal, etc. These techniques work well if the interference does not overwhelm the desired signal at any stage in the processing. Because of the latter constraint, these techniques have generally only been considered for spread spectrum signals. The aforementioned joint detection schemes do not take into account any forward error correction coding of the signals.

To achieve the theoretically optimum capacity when multiple signals share the same transmission medium requires the use of forward error correction coding as is described by T. M. Cover and J. A. Thomas, in Elements of Information Theory, New York: Wiley, 1991. Pedagogical techniques for achieving the theoretical capacity suggest applying a different code to each user at the transmitter and, at the receiver, estimating the digital signal with the largest signal to noise ratio (or the strongest code), subtracting its effect and then repeating for the next digital signal; very similar to the techniques which have been proposed for uncoded systems. These techniques require powerful codes that do not lead to a practical implementation. Such a technique, that is "almost practical", has been presented in the literature, for example by A. J. Viterbi, in a paper entitled "Very low rate convolutional codes for maximum theoretical performance of spread-spectrum multiple-access channels", IEEE J. Sel. Areas Comm., vol. 8, no.4, pp.641-649, May 1990, but it has the drawback that it treats the digital signals asymmetrically and requires some co-ordination between transmitters. An alternative approach to joint detection of multiple coded digital signals that is known to be optimum in a maximum likelihood sense for certain types of forward error correction codes, is described by T. R. Giallorenzi and S. G. Wilson, in a paper entitled "Multiuser ML sequence estimator for convolutional coded asynchronous DS-CDMA systems," IEEE Trans. Comm., vol. 44, No. 8, pp.997-1008, August 1996. This latter technique is a Viterbi-like algorithm that has a complexity, which is exponential in both the code memory and the number of digital signals, making it impractical for implementation. There are other approaches to the joint detection of multiple coded signals that are obvious to those practicing the art. These include approaches such as cascading a joint detector for multiple uncoded signals with standard decoding algorithm. These approaches however, are fundamentally limited by the performance of the joint detector for multiple uncoded signals.

Examples of schemes related to and for obtaining reliability estimates from the preliminary estimates are found in H. L. van Trees, Detection, Estimation and Modulation Theory:Part I, New York: Wiley, 1968.

Examples of schemes related to soft-output decoding are found in

L. R. Bahl et al, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Th., vol.20, pp284-287, March 1974;

G. Battail, "Coding for the Gaussian channel: the promise of weighted output decoding," Int. J. Sat. Comm., vol.7, pp.183-192, 1989;

J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft decision outputs and its applications," Proc. IEEE Globecom, pp.47.1.1-47.1.7, November 1989;

P. Robertson et al, "A comparison of optimal and suboptimal MAP decoding algorithms operating in the log domain," Proc. ICC, pp.1009-1013, Seattle, June 1995.

SUMMARY OF THE INVENTION

It is an object of this invention is to reduce the effects of the interference between a multiplicity of coded digital signals sharing the same transmission medium so as to permit greater interference and conserve bandwidth.

It is a further object to provide a method that does not require asymmetrical treatment of the digital signals.

It is a further object of the invention to provide a method that has a practical implementation.

The present invention provides an iterative method for reliably estimating multiple coded digital signals that share the same medium causing mutual interference. The digital signals, in general, come from different sources but they need not. The method is comprised of two steps that are applied to preliminary estimates of each digital signal, one or more times. There are various known methods of obtaining these preliminary estimates. In many cases, the best approach is to detect each digital signal as if it were the only digital signal present in the transmission medium. The performance of the present invention will depend upon the quality of these preliminary estimates. With regard to the approach taken to obtain these preliminary estimates, the present invention relies only on a statistical model of the interference between these preliminary estimates.

The first step of the method is to provide reliability estimates for each data element of each digital signal using the preliminary estimates, a statistical model for the interference, and any a priori information regarding the data elements. This interference model corresponds to the statistical distribution of the preliminary estimates assuming the transmitted data is known. The reliability estimate is defined as the conditional probability of the data elements given the interference model, the preliminary estimates, and the a priori information. On the first iteration there is often no a priori information and the reliability estimates are based on the preliminary estimate and the interference model. For binary data elements, the resulting reliability estimate is often expressed as the probability that the data element is a "0" or a "1". Properties of the forward error correction coding are not used in this step.

The second step of the method is to revise these reliability estimates for each digital signal based on the forward error correction code used for that digital signal. In the literature, decoders that revise the symbol reliabilities are often called soft-output decoders. The revised probability estimates are the conditional probabilities of the data elements given the reliabilities of all the data elements for that digital signal, and the relationships between them, as defined by the forward error correction code. In this step, the digital signals are independently decoded. This results in a significant computational saving over joint decoding.

The subsequent iterations use the revised reliability estimates for each data element of each digital signal obtained from the second step as a priori information for the first step. This improves the performance of the latter, which in turn can be used to improve the performance of the second step. On the last iteration, the decoders of the second step are configured to produce reliability estimates or hard decisions corresponding to the information elements of those digital signals of interest. The information elements may or may not be explicitly contained in the data elements of each digital signal, but they may be always be estimated through the knowledge of the forward error correction code.

In accordance with the invention, a method is provided of detecting a plurality of digital signals that are forward error correction encoded and mutually interfere. The method comprises the steps of:

a) obtaining preliminary estimates of the plurality of digital signals;

b) calculating a reliability estimate for each data element of each digital signal from preliminary estimates of those data elements, a statistical model of the interference, and a priori information, if any, concerning those data elements;

c) calculating a revised reliability estimate for each data element based on the reliability estimates from the first step and the properties of the forward error correction code for the corresponding digital signal; and

d) repeating the previous two steps, one or more times, using the revised reliability estimates provided by step (c) as a priori information for step (b).

In accordance to another aspect of the invention, a system is provided of detecting a plurality of digital signals that are forward error correction encoded and mutually interfere, given preliminary estimates of those signals, comprising: means for calculating a reliability estimate for each data element of each digital signal in dependence upon the preliminary estimates of those data elements, a statistical model of the interference, and a priori information, if any, concerning those data elements; and, for calculating a revised reliability estimate for each data element based on the reliability estimates calculated and the properties of the forward error correction code for the corresponding digital signal.

The present invention can be applied to digital signals in any shared transmission medium, or in distinct transmission media where there is crosstalk between the media.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the invention will now be described in conjunction with the drawings, in which:

FIG. 1 is schematic block diagram of a communication system to which the present invention can be applied;

FIG. 2 is a schematic block diagram of an iterative joint detector for multiple decoded signals;

FIG. 3 is flow chart of the steps required for obtaining reliability estimates;

FIG. 4 is a graph illustrating performance of present invention in a synchronous Gaussian channel for the case of five digital signals with a pairwise correlation of 0.75;

FIG. 5 is a block diagram illustrating a serial implementation of this invention;

FIG. 6 is a block diagram illustrating an alternative serial implementation to that shown in FIG. 5;

FIG. 7 is a block diagram illustrating the use of multiple preliminary estimates, in accordance with this invention;

FIG. 8 is a block diagram illustrating a feedback arrangement for implementing each iteration of the method;

FIG. 9 is a simplified diagram of a system where x=x.sub.[1,J] is the information (data) to be transmitted and x are the corresponding data estimates;

FIG. 10 is a diagram of a trellis structure for four-states;

FIG. 11 represents a comparison between bit error rate performance of two algorithms, BCJR and MLSE, in additive white Gaussian noise for two convolutional codes;

FIG. 12 is a diagram comparing the BER performance of SPC component codes and uncoded performance in additive white Gaussian noise;

FIG. 13 is an informative to plot the same results as a function of the energy per channel bit to noise density ratio, E.sub.s /N.sub.0, that is uncompensated for the code rate;

FIG. 14 is an illustration of parallel concatenated coding;

FIG. 15 shows a typical two-dimensional product code;

FIG. 16 is a block diagram of an approach applied to a two-dimensional product code shown in FIG. 15 where each row and each column of this code is a codeword in the (n,k) component block code;

FIG. 17 is a diagram of performance of an iterative decoding strategy;

FIG. 18 is a diagram illustrating performance using various decoding strategies;

FIG. 19 illustrates that Turbo coding is a parallel coding scheme;

FIG. 20 is an illustration of the original Turbo component code;

FIG. 21 is a block diagram of a Turbo Code decoder structure;

FIG. 22 is a diagram illustrating performance of Turbo Codes,

FIG. 23 is an illustration of iterative search between different constraint sets;

FIG. 24 is an illustration of Turbo detection algorithm showing difference operations and the interleaver (I) and de-interleaver (D) operations;

FIG. 25 is an illustration of extrinsic form of the Turbo detection algorithm;

FIG. 26 is an illustration of a geometric interpretation of the modified MCE algorithm:

FIG. 27 is an illustration of structure of multiuser communication system;

FIG. 28 is an illustration of a finite-state machine representation of multi-user channel with Viterbi detection;

FIG. 29 is a graph of asymptotic efficiencies for various detectors with two users having a correlation .rho.=0.5;

FIG. 30 is a graph of asymptotic efficiency for two equal-power users as a function of user cross-correlation (.rho.) for various synchronous detectors;

FIG. 31 is a graph of BER performance of optimum, decorrelating, and conventional multi-user detectors over Gaussian channel with K=2 users (.rho.=0.33);

FIG. 32 is a graph of BER performance of optimum, decorrelating, and conventional multi-user detectors over Gaussian channel with K=7 users (.rho.=1/7).

FIG. 33 is a block diagram of limiting asynchronous decorrelating receiver;

FIG. 34 is a graph of asymptotic efficiencies for various detectors with two asynchronous users with correlations .rho..sub.12 =.rho..sub.21 =0.33;

FIG. 35 is a graph of asymptotic efficiency of asynchronous optimum, decorrelating and conventional detectors over Gaussian channel as a function of cross-correlation (.rho..sub.12 +.rho..sub.21 =.rho.=0.75);

FIG. 36 is an illustration of Varanasi-Aazhang multistage interference cancellation algorithm;

FIG. 37 is an illustration of multi-user array processing algorithm;

FIG. 38 is a diagram of theoretical bound on single user capacity for a Gaussian multiple access channel with K users having the same rate and SNR;

FIG. 39 is a diagram of asymptotic efficiencies of conventional, decorrelating, and optimum detectors as a function of the number of users (K) and the cross-correlation parameter (.rho.);

FIG. 40 is a diagram of BER performance of three different detectors over a K-symmetric channel with K=5 users and a cross-correlation parameter .rho.=0.25;

FIG. 41 is a diagram of BER performance of decorrelating and optimal detectors over a K-symmetric channel with K=5 users and a cross-correlation parameter .rho.=0.75;

FIG. 42 is an illustration of conventional and decorrelating detector structures for multiuser signals with forward error correction coding;

FIG. 43 is a plot of the BER performance of conventional detector with rate 1/2 constraint length 7 convolutional code ([1011011 1111001]) for different numbers of users (K) and cross-correlation parameter .rho.=0.25;

FIG. 44 is a plot of the BER performance of decorrelating detector with rate 1/2 convolutional code for different numbers of users (K) and cross-correlation parameters (.rho.). (Code generators: [10011 11101]);

FIG. 45 is an illustration of iterative multiuser detection algorithm;

FIG. 46 is a graphical comparison of BER performance of iterative detector with two users and .rho.=0.75 to ideal single user performance;

FIG. 47 is a graphical comparison of BER performance of iterative detector with two users and .rho.=0.90 to ideal single user performance;

FIG. 48 is a graphical comparison of BER performance of iterative detector with five users and .rho.=0.60 (1,2,3,4 and 5 iterations) to ideal single user performance;

FIG. 49 is a graphical comparison of BER performance of iterative detector with five users and .rho.=0.75 (1,2,4 and 8 iterations) to ideal single user performance;

FIG. 50 is a graphical comparison of BER performance of iterative detector with five users and .rho.=0.90 (1,2,4, 8 and 16 iterations) to ideal single user performance;

FIG. 51 is a graphical comparison of BER performance of iterative detector with ten users and .rho.=0.30 (1,2,4 and 8 iterations) to ideal single user performance;

FIG. 52 is a graphical comparison of BER performance of iterative detector with ten users and .rho.=0.60 (1,2,4 and 8 iterations) to ideal single user performance;

FIG. 53 is a diagram illustrating dependence of detector performance on interleaver size for cross-correlations of .rho.=0.75 and 0.90 where results are for five users and rate 1/2 constraint length 5 convolutional code and eight iterations of the joint detector;

FIG. 54 is a graphical illustration of BER performance of multiuser detector with rate 1/2 constraint length 4 convolutional code over K-symmetric channel with .rho.=0.75 and K=5 users. Interleaver contains 500 information bits:

FIG. 55 is a graphical illustration of BER performance of multiuser detector with rate 1/2 constraint length 7 convolutional code over K-symmetric channel with .rho.=0.90 and K=5 users. Interleaver contains 500 information bits;

FIG. 56 is an illustration of co-channel interference with multi-beam satellites;

FIG. 57 is a simplified diagram of a process structure for multiuser detection with spatial diversity;

FIG. 58 is a diagram of asymptotic efficiency of different detectors over K-symmetric diversity channel for 2, 5, and 10 users;

FIG. 59 is an illustration of relationship between crosstalk parameter (.psi.) and the user correlation parameter (.rho.);

FIG. 60 is a graphical comparison of BER performance of iterative detector with two users and .rho.=0.90 to ideal single user performance;

FIG. 61 is a graphical comparison of BER performance of iterative detector with two users and .rho.=1.0 to ideal single user performance;

FIG. 62 is a graphical comparison of BER performance of iterative detector with five users and .rho.=0.75 to ideal single user performance;

FIG. 63 is a graphical comparison of BER performance of iterative detector with five users and .rho.=0.90 to ideal single user performance; and,

FIG. 64 is a graphical comparison of BER performance of iterative detector with five users and .rho.=1.0 to ideal single user performance.

FIG. 65 is a simplified diagram of theoretical capacity of K-symmetric channel with diversity for different correlation values and five users, in bits per channel use per user.

DETAILED DESCRIPTION

The present invention is a method of processing the received signal samples obtained when multiple coded signals share the same transmission medium. An example of a communications system to which this invention can be applied is illustrated in FIG. 1. In this example, each of the K independent data sequences {b.sub.k (i): k=1 . . . K, i=1, . . . } are modulated with a signaling waveform to produce a digital signal. These signaling waveforms may include filtering, frequency translations, spreading codes, etc. The signaling waveforms need not he unique. These signals then enter the transmission medium and may suffer corresponding delays and attenuation, and be degraded by noise. In this example, the communications receiver has K parallel subreceivers, one for each digital signal. These subreceivers provide preliminary estimates of the data elements of each digital signal. These preliminary estimates are often called soft decisions in the technical literature. In many cases, the best subreceiver is one that is matched to the signaling waveform for the corresponding digital signal, ignoring the presence of the other digital signals. This matching refers not only to the transmitted signaling waveform but also any delay, frequency translation, or phase rotation that may have been incidentally applied to the signal after transmission. The present invention also applies to non-optimum subreceivers. The output of these matched detectors is then sampled, once per data element period, to produce a soft decision for each data element in the corresponding digital signal.

The present invention is a method of processing the preliminary estimates provided by these K subreceivers to reduce the effects of interference. An exemplary arrangement for the processing performed in the present invention is shown in FIG. 2. In the simplest embodiment for this invention, the digital signals have the same signaling rate and are synchronous. Let b(i) be the vector of K symbols, one from each digital signal, with a common symbol time i, and let y(i) be the corresponding vector of K preliminary estimates from the K subreceivers. The statistical model for the interference, in this case, is the conditional distribution of y(i) given the transmitted data b(i). If the noise is Gaussian then, in this case, the conditional distribution of y(i) given b(i) is multivariate Gaussian for the symbol time i and independent from one symbol time to the next.

The first step of the invention requires a means for estimating reliability of the data elements of each of the digital signals. The conventional approach is to use Bayes' rule for conditional probability, one then computes the reliability estimate (conditional probability) of each data element of each digital signal that is based only on the vector of preliminary estimates y(i), the statistical model for the interference, and the a priori information regarding those data elements. Mathematically, joint reliability estimates for the K digital signals is given by ##EQU1##

The reliability estimates for the individual digital signals are given by the corresponding marginal distributions. This constitutes the first step. Exemplary steps for determining these reliability estimators are shown in FIG. 3.

In the second step of the method, each digital signal is considered independent of the others. As shown in FIG. 2, this can be implemented as K parallel decoders. For each digital signal, the reliability estimates provided by the first step are revised based on the known relationships between data elements. These known relationships are due to the forward error correction encoding. When the data sequences are finite, a preferred means for soft-output decoding is described by L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, in a paper entitled "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Th., vol. 20, pp.284-287, March 1974.

In subsequent iterations, the revised reliability estimates provided by the second step are used as a priori probabilities in the first step. In the preferred embodiment, the revised reliability estimates are treated as independent of one another; and in the preferred embodiment, not all of the revised reliability estimates provided by the second step first stage are used in each first step calculation. In particular, the first step reliability estimates for a particular digital signal only use the a priori information for those digital signals other than the one of interest.

An example of the bit error rate performance obtained with this method for the case of a synchronous Gaussian channel with five independent pseudo-randomly interleaved digital signals, when the cross-correlation between each pair of the signaling waveforms (a measure of the interference) is 0.75, is shown in FIG. 4 for 1, 2, 4, and 8 iterations. Also shown in FIG. 4 is the performance obtained when there is no interference between the users (.rho.=0).

Investigations have shown that performance improves if each of the K digital signals is pseudo-randomly interleaved relative to one another at the transmitter after forward error correction encoding. In the description of the present invention, the interleaving is considered part of the forward error correction code. However, the approach can be applied with any type of interleaving, or even with no interleaving.

The present invention does not require that the digital signals are synchronous. However, it is recommended that the interference model for the preliminary estimates include the effects of any asynchronism. The complexity of the present invention depends in part on the complexity of this interference model. There is the possibility of reducing the complexity by appropriate design of the K subreceivers. If the digital signals are not only asynchronous but also have different signaling rates then to optimize performance may require oversampling of the received signal, and constructing a corresponding interference model. Oversampling is defined as sampling at a rate higher than the transmission rate of the data elements.

The present invention does not require that all data sequences use the same forward error correction code. The use of different error correction codes will only affect the second step of the method. If the digital signals are asynchronous or the data sequences have different lengths then the direct implementation of the soft decoding method of L. R. Bahl et al, may not be appropriate. Alternatively, in this case and others where the sequence length is an issue, the soft decoding techniques can be applied to a series of overlapping blocks where the block size is less than the sequence length. In addition to L. R. Bahl et al. there are alternative soft decoding techniques as presented by J. Hagenaur and P. Hoeher, in a paper entitled "A Viterbi algorithm with soft-decision outputs and its applications", Proc. IEEE Globecoin'89, pp.47.1.1-47.1.7, November, 1989. [and by P. Robertson et al, in a paper entitled "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," Proc. ICC'95, pp.1009-1013, June 1995, that can also be used in the second step of the method.

The digital signals are not required to have the same modulation format. There are no particular issues associated with different modulation formats except to note that in the exemplary communications shown in FIG. 1, the sampling will correspond to sampling both the in phase and quadrature components of a digital signal with some modulation formats. With a binary modulation format, only the reliability of the data element "1" or a "0", but not both, needs to be stored; while with a M-ary modulation formats, at least M-1 or M reliability values should be stored corresponding to the M possible values for each data element.

The invention also applies when the digital signals have the same, different, or even time-varying power levels. The latter may be due to different propagation losses, fading and multi-path. For the best performance the available knowledge concerning the power levels and time-variations, whether it be deterministic or statistical, should be included in the interference model.

Complexity may often be an issue in either the first or the second step of the method. Sometimes, in the first step, simplifications are made to the interference model to reduce this complexity. These simplifications often result from a consideration of a subset of the available data in the model. For example, for each digital signal, the interference model may only consider the interference from the two strongest interferers and ignore the effects of the other signals. Similarly, in addition to the block processing approach mentioned above, in the second step simplifications can often be made to the decoding technique to reduce the complexity. These simplifications often result from considering only a subset of the available data. For example, some simplified decoding techniques only consider the most probable sequences (paths) at each step and ignore the less probable ones. In practice, there is usually a tradeoff between complexity and performance.

The parallel implementation shown in FIG. 2 is one implementation of this invention; however, the invention can also be implemented serially. In particular, one need not decode all K signals in the second step. It is only necessary to decode a subset of the digital signals, containing at least one signal, in the second step before repeating the first step. This approach may be appropriate when a subset of the signals has significantly greater power, and it is desirable to characterize their effect accurately first. Such an approach affects the convergence time of the algorithm.

Referring now to FIG. 8, the invention does not need to be implemented with distinct hardware or software for each iteration of the method as may be suggested by FIG. 2.

One can also use the same hardware or software in a feedback arrangement, as is more obviously suggested by FIG. 8, for implementing each iteration of the method, wherein the revised reliability estimates are fed back to the initial means for estimating reliability, and the two steps of the algorithm are repeated. This feedback implementation can also be applied to all embodiments of the method. In practice, the approach illustrated in FIG. 8 may require some buffering of the incoming preliminary estimates.

In FIG. 5, a serial implementation is shown that decodes only one signal at the second step, selecting a different one of the K digital signals for each second step. According to the invention, any number from one to all of the K digital signals is decoded on execution of the second step. The signals need not be decoded in any particular order, nor do all signals have to be decoded an equal number of times. An alternative serial implementation is shown in FIG. 6. In this case, two decodings are performed with each second step but the decoded digital signals are not distinct on subsequent decodings. In FIGS. 2, 5, and 6, the phrase "Means for soft-output decoding k" indicates a means for soft-output decoding of digital signal k.

Note that any existing interference cancellation method for uncoded signals can be applied, prior to this invention, to provide the preliminary estimates. The only requirement is that the interference model applies to the preliminary estimates after the initial interference cancellation, if any, is done.

The invention is also applicable when there are multiple preliminary estimates of the signal such as may occur when one has a number of distinct receivers. This is known in the literature as diversity reception. There are a number of ways to use the multiple preliminary estimates according to the invention. The simplest approach is using a means of combining the multiple estimates into a single estimate prior to this invention. This is illustrated in FIG. 6.

There are many methods in the literature for combining a plurality of estimates of the same signal or set of signals. Examples of such combining schemes can be found in W. C. Jakes (ed.), Microwave Mobile Communications, (1974) reprinted by New York: IEEE Press, 1993 .As shown in FIG. 7, the method remains unchanged in this application. An alternative to the approach shown in FIG. 7 is to include the multiple preliminary estimates in the interference model. In this case, the method remains the same although the means for calculating the reliability estimates may change.

An exemplary embodiment is presented below in detail. This is merely an example and is not intended to limit the scope of the above-described invention. In the embodiment described below, the following abbreviations, when used, have the associated meanings:

______________________________________ (.).sup.H Hermitian transpose (.).sup.T transpose .left brkt-top..alpha..right brkt-top. smallest integer equal to or larger than a .left brkt-bot..alpha..right brkt-bot. largest integer less than or equal to a (.sub.m .sup.n) number of combinations of n objects taken m at a time .vertline...vertline. absolute value of a scalar, determinant of a matrix, or cardinality of a set .parallel.a.parallel. L.sup.1 norm of a positive vector: .parallel..alpha..parallel. = .SIGMA..alpha.(m) .alpha.m(t) forward state probability for state m at time t .beta.m(t) backward state probability for state m at time t .gamma. signal to noise ratio .gamma.m,m'(t) probability of transition from state m to m' at time t .GAMMA.(t) the matrix of .gamma..sub.m,m' (t) m(t) the state probability for state m at time t .LAMBDA.(x) sum of intrinsic and extrinsic information for x .LAMBDA.(x) extrinsic information for x .rho. cross-correlation between user modulating waveforms .psi. crosstalk between users .sigma..sup.2 noise variance .zeta. noisy parity bits A.sub.C (X,Z) weight enumerating function for the code .right brkt-top.C.right brkt-bot. with systematic bits X and parity bits Z b the codeword bits b.epsilon..right brkt-top.B.right brkt-bot.; K-dimensional codeword bits b.epsilon..right brkt-top.B.right brkt-bot..sup.K .right brkt-top.B.right brkt-bot. space of binary N-tuples : {-1,+1}.sup.N .right brkt-top.S.right brkt-bot.(.) theoretical channel capacity d.sub.free minimum distance of a forward error correction code D.sub.g diversity gain E[X] expected value of random variable x E.sub.b /N.sub.O energy per information bit to noise density ratio E.sub.s /N.sub.O energy per channel bit to noise density ratio H cross-correlation matrix for multiple users H(x) entropy of the random variable x H[p,q] cross-entropy of the two probability distributions p and q .right brkt-top.I.right brkt-bot.(X;Y) the mutual information between random variables X and Y I.sub.C (X) indicator function for the variable x in the set C J the number of information bits in a codeword L(x) log-likelihood ratio for the binary random variable x K number of simultaneous users of a multiple access channel M number of states in a trellis; number of constraint subsets N the number of channel bits in a codeword P.sub.e probability of bit error Pr[X] probability distribution function for the random variable x p[X] probability density function for the random variable x Q(X) Gaussian error function: Q(x) = 1/2 erfc(.sqroot.x) .right brkt-top.R.right brkt-bot. the real numbers s(t) state of a trellis at time S(b,t) multiuser waveform at time t for K-user channel sequence b T the number of symbols in a codeword; the symbol period u noisy systematic bits in received codeword v the transmitted codeword - symbol representation .right brkt-top.V.right brkt-bot. the symbol alphabet w the noisy received codeword - symbol representation W diagonal matrix of multiuser channel gains x the systematic bits of a codeword x.sub.[1,N] the sequence {x(1), x(2),...,x(N)} y the noisy received codeword - bit representation z the parity bits of a systematic codeword .right brkt-top.Z.left brkt-top. the integers AWGN additive white Gaussian noise BCJR Bahl, Cocke, Jelinek, and Raviv BER bit error rate BSC binary symmetric channel CDMA code division multiple access FDMA frequency division multiple access FEC forward error correction HD hard decision LSB least significant bit MAP maximum a posteriori MCE minimum cross-entropy MLSE maximum likelihood sequence estimation MUD multiuser detection SNR signal to noise ratio SPC single parity check WEF weight enumerating function ______________________________________

A trellis structure is a requirement of both the BCJR and the Viterbi algorithm still providing a rich collection of codes upon which to draw. A simplified system, such as that shown in FIG. 9 where x=x.sub.[1,J] is the information (data) to be transmitted and x are the corresponding data estimates, is described. The transmitted codeword is represented by b=b.sub.[1,N]. For simplicity bipolar transmission will be assumed, that is b.di-elect cons.={-1,+1}.sup.N. The transmitted bits will be grouped together into T symbols, v.sub.[1,T]. This grouping arises naturally from the code structure. For example, with a rate 1/2 code, T is N/2. The noisy received bits will be denoted by y=y.sub.[1,N] and the corresponding received symbols by w.sub.[1,T]. The detection algorithm provides an estimate x of the data based on the sequence y (or w) and knowledge of the FEC code. Initially, the channel is assumed to be the additive white Gaussian noise channel.

The channel encoder is assumed to have a trellis structure such as that illustrated in FIG. 10 for four-states. It is assumed that the trellis starts and ends in the zero state, and that the appropriate null bits are added to the information stream to insure proper termination of the trellis. In general, it is assumed that the block lengths are long enough to make the effect of truncation on the code properties negligible.

The number of trellis transitions, T, corresponding to the N codebits is a function of the encoding strategy. Its relationship to the index of the channel bits or the information bits will depend upon the code rate and how the trellis is truncated. Each channel symbol, v(t), corresponds to a state transition from s(t-1) to s(t). For example with a rate 2/3 binary convolutional code, each channel symbol will have three bits and each pair of information bits will correspond to a state transition until one reaches the termination stage. In the termination stage, "0" information bits are appended to the input and the corresponding channel symbols are generated until the trellis returns to the all-zero state.

With MLSE, the objective is to determine that x which maximizes the conditional probability Pr[x.vertline.y]. For the case of a memoryless AWGN channel, this conditional probability is a product of independent Gaussian distributions, and this maximization algorithm can be solved by the Viterbi algorithm [Pro89]. The Viterbi algorithm is a maximum-likelihood decoder which minimizes the probability of codeword or sequence error. It does not, necessarily, minimize the probability of symbol or bit error.

With MAP detection the objective is to estimate a symbol based on the conditional probability Pr[x(j).vertline.y]. This minimizes the probability of bit error and these probabilities are determinable by means of the BCJR algorithm as outlined in L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Th., vol. 20, pp.284-287, March 1974 herein incorporated by reference.

The BCJR algorithm is an optimal decoder which minimizes the symbol error probability for linear codes transmitted over a discrete memoryless channel (DMC). The BCJR algorithm can provide the probabilities: Pr[x(j).vertline.y], Pr[b(n).vertline.y], and Pr[v(t).vertline.y] for the information bits, channel bits, and channel symbols, respectively; and minimizes the corresponding probability of error when one makes a decision based on these probabilities. The algorithm includes a backward and forward recursion unlike the Viterbi algorithm, which only contains a forward recursion. Below, a modified but equivalent version of this algorithm is presented. The modifications are two-fold; the first changes the quantities to which the recursion are applied to make them more amenable to analysis and also make them symmetric in the backward and forward direction; the second extends the results to cover the memoryless Gaussian channel.

The fundamental assumption of the BCJR algorithm is that the channel encoding can be modeled as a Markov process. Equivalently, the code can be represented as a trellis, such as illustrated in FIG. 10, where the next state depends only on the current state and the new input bit. The states of the trellis are indexed by the integer in m. m=0,1, . . . ,M-1. The state of the trellis at time t is denoted by s(t), and the output symbol due to the transition from s(t-1) to s(t) is denoted by v(t). A state sequence from time t to t' is denoted by s.sub.t,t', and the corresponding output sequence is v.sub.[t,t'].

Although the ultimate goal is to estimate the probability that a given symbol or information bit was transmitted, it is simpler to first derive the a posteriori probabilities of the states and transitions of the trellis based on the observations y or w. From these results, most other probabilities of interest can be obtained by performing summations over selected subsets of states and/or transitions.

Denote the M-vector of state probabilities at time t based on the set of observations by

where

Then, for a linear rate 1/n convolutional code without feedback, the probability that a "1" was the information bit is (assuming the transmitted information bit is the LSB of the state) ##EQU2## where t(j) is the channel time index corresponding to input bit index j, and LSB(s) is the least significant bit of the state s. For a recursive code, the right hand side of (4) is replaced by a summation over the set of transitions that correspond to a "1" at the input.

Define the forward estimator of state probabilities as the M-vector

and, similarly define the backward estimator of state probabilities, as

These are the estimates of the state probabilities at time t based on the past and future observations, respectively. The important result related to these quantities is the following separability theorem. First, however, define the vector product c=a.b as c(m)=a(m)b(m) for all m, and define the L.sup.1 norm for probability vectors as ##EQU3## Theorem 1.1 (Separability Theorem). The state probabilities at time t are related to the forward and backward estimators by ##EQU4##

One aspect of this theorem is, for a Markov process, the state distribution at time t given the past, is independent of the state distribution at time t given the future. Also, there is a simple way of combining the forward and backward estimates to obtain a complete estimate. It is just the normalized product of the state distributions based on the past and the future. Thus, if there are simple algorithms for obtaining the forward and backward estimates, this would be a solution to the MAP detection problem.

In fact, representing the state transition probability at time t by

and denoting the matrix of these probabilities as

the following recursion theorem for calculating the forward and backward state estimates results.

Theorem 1.2 (Recursion Theorem). The forward and backward estimates can be determined recursively as follows. ##EQU5## and ##EQU6##

Together these two theorems define the BCJR algorithm for MAP detection of a Markov source transmitted over a discrete memoryless channel. In particular, the steps are

i) .alpha.(0) and .beta.(T) are initialized according to the trellis structure. For a trellis beginning and ending in the all zeros state, one has .alpha..sub.0 (0)=1, .alpha..sub.m (0) for all m.noteq.0, and similarly for .beta.(T) .

ii) As y(t) is received, the decoder computes .GAMMA.(t), and then .alpha.(t) using (11). The obtained values of .alpha.(t) are stored for all t and m.

iii) after the complete sequence y.sub.[1,T] has been received, the decoder recursively computes .beta.(t) using (12). When the .beta.(t) have been computed they can be multiplied by the appropriate .alpha.(t) to obtain .lambda.(t) using (8).

The BCJR algorithm is a double recursion, one recursion in the forward direction and one in the reverse. Consequently, it has at least twice the complexity of the Viterbi algorithm in its most general form. It also has larger storage requirements as the results of the forward recursion must be stored to be used with the reverse recursion. Hard decisions are derived from the algorithm on the basis of whether the probability a bit is a "1" is greater than 0.5 but the algorithm's main feature is its soft-output capability.

In FIG. 11. the bit error rate performance of the two algorithms, BCJR and MLSE, in additive white Gaussian noise for two convolutional codes are compared. The two codes had constraint lengths of 5 and 7. respectively. The codes were evaluated over blocks of 400 information bits. The performance of the two codes are identical at high SNR. At low SNR, although it is not clear in the illustration, the BCJR approach has a marginal performance advantage. The BCJR is expected to be better because it is guaranteed to provide the minimum BER performance of all techniques. The MLSE approach on the other hand provides a minimum probability of codeword error.

Single parity check (SPC) codes are among the simplest codes possible. For an information word of J=N-1 bits, a single parity checksum bit is added to form a (N,N-1) code. These codes prove to be interesting component codes for iterative decoding techniques. In FIG. 12, the BER performance of SPC component codes are compared to uncoded performance in additive white Gaussian noise.

These codes perform slightly better than uncoded, as the SNR increases. The performance is shown in FIG. 12 as a function of the energy per information bit E.sub.b. It is informative to plot the same results as a function of the energy per channel bit to noise density ratio, E.sub.s /N.sub.0, that is uncompensated for the code rate. This is shown in FIG. 13. In this case, a monotonic improvement in BER performance with decreasing code rate at all E.sub.s /N.sub.0 ratios is evident. This is illustrative of the MAP detection property of minimum symbol error rate.

All of these codes are systematic. That is, some of the channel bits correspond to information bits. It is the property of MAP detection that, on average, it always improves upon the estimate of the bit provided by the channel. Thus explaining, the monotonic behaviour seen in FIG. 13.

Parallel concatenated coding implies that same information is encoded by two different encoding schemes as illustrated in FIG. 14. In a sense all systematic parity check codes fit this definition, because they could be written as a set a parallel single parity check equations, although generally more powerful component codes are considered.

A typical two-dimensional product code is illustrated in FIG. 15. In this product code, the bits are arranged in a k.times.k square and encoded horizontally using an (n,k) code. The same bits are then encoded vertically using the same (n,k) code. In the example shown in FIG. 15, the vertical encoding also encodes the parity bits of the horizontal encoding although this is not necessarily the case. Prior art implementations involve each dimension being decoded in tu. the check digits are used to correct single errors and then discarded.

While this code does not achieve the Shannon capacity limit, it is one of the first examples of a realizable code of non-zero capacity with an arbitrarily small error rate. It was observed that the rectangular structure of the product code was important to maintain independence of the symbols in the different decoding stages. It is the average error-correction capability that makes transmission at a nonzero rate possible. The relevance of this observation will become important when we discuss the behaviour and performance of Turbo codes. This approach does not achieve capacity for two reasons. First, there is a loss inherent in using hard decisions when decoding. Second, there is the fact that the information in the parity bits for a particular stage is only used for that stage and then discarded.

Therefore, a soft-input, soft-output decoding technique called weighted output decoding was proposed applied to D-dimensional product codes. The development of the soft-output Viterbi algorithm (SOVA), and the rediscovery of the BCJR algorithm further improved soft output decoding. While still concentrating on systematic parallel codes using simple component codes, the next key development was the idea of iterating the decoding process using soft-output algorithms. Lodge et al showed that by using the BCJR algorithm on each of the component codes in a multi-dimensional product code, and repeating the decoding process several times, some of the losses associated with the component-wise decoding were recouped.

A block diagram of this approach is shown in FIG. 16 for the two-dimensional product code shown in FIG. 15 where each row and each column of this code is a codeword in the (n,k) component block code. This is an (n.sup.2,k.sup.2) product code. If the component code has a minimum distance d, then the two-dimensional product code has a minimum distance d.sup.2.

The algorithm applies the BCJR algorithm to the soft decisions received for each row to produce the a posteriori probabilities for each channel bit. These output probabilities are used as an input to the column decoding, again using the BCJR algorithm. This process is then repeated with further row and column decodings. It is the transmitted symbol probabilities that are the important quantities for subsequent decodings. At the last stage the BCJR algorithm is modified to produce the a posteriori probabilities of the information bits. In the case where the code is systematic, no modification is necessary.

With this approach, remarkable performance is possible out of some very simple codes. For example, consider the two dimensional product code where each of the component codes is a single parity check (8,7) code. The overall code rate for this combination is approximately 3/4. The performance of this code using this iterative decoding strategy is illustrated in FIG. 17 for one, two and five iterations. One iteration corresponds to one row plus one column BCJR decoding.

Compared to uncoded performance this simple rate 3/4 code provides 2 dB or more performance gain for BERs less than 10.sup.-3 with a single iteration. A second iteration results in a further gain of about 0.25 dB. The improvement with subsequent iterations is marginal. For single parity check codes, there is a simple two-state trellis for each component code and thus the decoding algorithm has very low complexity.

As a second example, consider the two-dimensional product code constructed from the (7,4) Hamming code. This code has a rate of 0.326 and an asymptotic coding gain of 4.67 dB. Shown in FIG. 18 is the performance with various decoding strategies. All results are for an AWGN channel. The upper curve in this figure is for a decoder that makes hard decisions, decodes each row, and then uses the output of the row decoder to decode each column.

In FIG. 18, one iteration of the MAP algorithm provides greater than 2 dB improvement over uncoded performance at a BER of 10.sup.-3. The gain increases with increasing E.sub.b /N.sub.0. In this case a second iteration results in a further 2.5
dB improvement in performance. As with the single parity check codes, the improvement with subsequent iterations was marginal. This was found to be a general rule, i.e for two-dimensional product codes, two iterations achieved most of the gain.

Also shown in FIG. 18 is the performance with hard decision decoding of the product code. It is interesting to note that one iteration of the hard decision approach is almost exactly 2 dB worse than one iteration of the soft decision approach over the full range shown. With the hard decision approach, further iterations result in only a marginal improvement as shown.

The next significant development in this area was Turbo codes. These rely heavily on the approaches above but also made three insightful improvements. These three improvements are i) the use of recursive convolutional codes as component codes, ii) the use of random interleavers, and iii) the separation of intrinsic and extrinsic information in the decoder.

Turbo coding is a parallel coding scheme as illustrated in FIG. 19. Similar to FIG. 14, the information is systematically encoded by two encoders. However, unlike the ordered interleaving used with the product codes, the information bits are re-ordered using a pseudo-random interleaver with Turbo codes. Typically the same code is used for both encoders, although this is not necessarily the case. The information bits and the parity bits generated by each of the encoders are then transmitted over the channel. In the original Turbo encoding scheme, the parity bits are punctured in a repeating pattern to increase the code rate. Again, this is not necessary.

The component codes that are recommended for Turbo encoding are the short constraint length, systematic, recursive convolutional codes. The original Turbo component code is illustrated in FIG. 20. This is a simple 16-state code. The feedback nature of these component codes means that a single bit error corresponds to an infinite sequence of channel errors.

Although the component codes are convolutional, Turbo codes are inherently block codes with the block size being determined by the size of the interleaver. The block nature also raises a practical problem of how to correctly terminate the trellises of both encoders. This problem is avoided for evaluation purposes by using an all-zeros sequence.

In addition to this novel adaptation of the parallel coding structure, the decoding strategy applied to Turbo codes is also innovative. It is from the structure of the decoding algorithm that the scheme draws its name from an analogy to the turbo engine principle. A block diagram of the decoder structure is shown in FIG. 21. It is illustrated as a single iterated stage but also could be implemented as a number of serial stages.

In FIG. 21, the inputs to the first decoding stage are the channel samples (u, .zeta..sub.1) corresponding respectively to the systematic bits x, and the channel samples corresponding the parity bits of the first encoder z.sub.1, and the extrinsic information about the systematic bits that was determined from previous decoder stages .LAMBDA..sub.2 (x). We will later define extrinsic information but it is suffice to say that, on the first decoding, the extrinsic information is zero. If puncturing is applied at the transmitter then the parity bits must be stuffed with zeros in the appropriate positions.

The first decoding stage uses the BCJR algorithm to produce a refined (soft) estimate of the systematic bits, Pr[x(j).vertline.(u, .zeta..sub.1, .LAMBDA..sub.2, (x)]. However, for the implementation it is more convenient to express this soft estimate as the equivalent log-likelihood ratio ##EQU7##

In practice, this calculation is much simpler than equation (13) would suggest. On the first iteration this refined estimate is re-ordered to compensate for the pseudo-random interleaving at the transmitter and combined with the second set of parity samples from the channel as the input to the second decoding stage.

The second decoding stage uses the BCJR algorithm to produce a further refined estimate of the systematic bits Pr[x(j).vertline..LAMBDA..sub.1 (x), .zeta..sub.2 ], using the second set of parity bits. This second estimate is expressed as the log-likelihood, .LAMBDA..sub.2 (x(j)). This estimate can be hard-detected to provide bit estimates at this point. Alternatively, the output of the second stage can be used to provide extrinsic information to the first stage. In either case, the samples must be re-ordered to compensate for the random interleaving.

The decoding schemes discussed above rely on the implicit assumption that the bit probabilities remain independent from one iteration to another. This turns out to be true at the end of the first decoding, for approaches such as applying the BCJR algorithm to a product code. On subsequent decoding iterations, the bit estimates are correlated but the BCJR algorithm is applied as if they were independent.

To increase the independence of the inputs from one processing stage to the next, the Turbo algorithm makes use of the concepts of intrinsic and extrinsic information. Intrinsic information refers to the information inherent in a sample prior to a decoding. Extrinsic information refers to the incremental information obtained through decoding. To maintain as much independence as possible from one iteration to the next only extrinsic information is fed from one stage to the next.

Mathematically, the symbol probability obtained at the output of the BCJR algorithm is the combination of these two quantities. However, for a simple systematic code, the closed form solution to the MAP decoding problem in a memoryless channel gives ##EQU8## where b, y are the transmitted and received bit sequences, respectively, and C is the set of all codewords. Equation (14) shows that they can be separated easily. The first factor in the product on the second line of (14) represents the intrinsic information of the soft channel values at the input to the decoder, and the second term is the extrinsic information which is generated by the decoding. It is more convenient to express this relationship in terms of log-likelihood ratios because then the product becomes a sum and the denominator term cancels. In particular, the extrinsic information at the output of the second stage is

That is, it is the difference of the input and output log-likelihood ratios for the systematic bits--on the first pass, .LAMBDA..sub.1 (x)=.LAMBDA..sub.1 (x). Similarly at the output of the first stage, the extrinsic information provided to the second stage is given by

In the TIurbo decoder, the processing performed in the BCJR algorithm is unchanged from the basic algorithm. The differences occur only in the processing which is performed the inputs to and outputs from the BCJR algorithm. A hard decision can be obtained at any time with the operation

The impressive aspect of Turbo codes is their performance. In FIG. 22, an example of their performance is presented. These results correspond to the encoding strategy described above and for an interleaver size of 64K bits. Performance is shown as a function of the number of iterations.

It is clear that performance with a small number of iterations is not particularly impressive but as the number of iterations is increased performance continually improves. With 18 iterations, the code achieves a BER of 10.sup.-5 at an E.sub.b /N.sub.0 of only 0.7 dB. This is only 0.7 dB away from the Shannon limit for this code rate. What is not apparent from FIG. 22 is that, asymptotically, the performance curves have a much shallower slope. which is determined by the minimum distance of the code. Performance of Turbo codes at low E.sub.b /N.sub.0 is a function of the interleaver size. With smaller interleavers performance degrades. However, even with an interleaver size of 300 bits, a BER of 10.sup.-4 can be obtained at an E.sub.b /N.sub.0 of 2.3 dB.

Because of their amazing performance, Turbo codes have attracted a considerable amount of interest. What appears to be the most complete analysis of their behaviour is described in S. Benedetto and G. Montorsi, "Unveiling Turbo Codes: Some results on parallel concatenated coding schemes," IEEE Trans. Inf. Th., vol. 42. No.2, pp.409-428, March 1996 herein incorporated by reference. This analysis is based on determining the equivalent transfer function of the overall code, and then using standard transfer function techniques to bound performance assuming maximum likelihood detection.

For Turbo codes, the interleaver size effectively defines the block size even if convolutional codes are used. Due to the structure of the code which transmits systematic bits and the parity bits from two parallel encoders, the analysis focuses on the bivariate weight enumerating function AC(X,Z) for the component codes which explicitly shows the weights associated with the information bits, x, and the parity bits, z=.sub.(z1,[l,K],.sub.,z2,[l, K]). In particular, ##EQU9## where X and Z are indeterminants, and .sub.Aw, j is the number of codewords generated by an input information word of Hamming weight w whose parity check bits have Hamming weight j. For a component code, this bivariate weight enumerating function (WEF) can be determined with standard transfer function techniques.

When a conditional weight enumerating function .sup.Aw C1(Z) of each of the component codes is defined as the distribution of the Hamming weights of the paty check bits for any input word of Hamming weight w, then a relationship between the conditional WEF and the bivariate WEF function is ##EQU10##

Analysis of the parallel code structure for a specific interleaver is too complex in general. Consequently, the concept of a uniform interleaver is introduced. A uniform interleaver of length J is a probabilistic device that maps a given input word of weight w into all distinct ##EQU11## permutations of it with equal probability ##EQU12##

Each term of the conditional WEF A.sub.w.sup.C1 (Z) corresponds to a specific error event where the systematic bits have weight w. For each such event, the second encoder will produce a parity stream of a specified weight. From the definition of the uniform interleaver, this second parity stream is not fixed for a given weight it, error event but rather has a distribution described by ##EQU13## and this is independent of the parity stream of the first encoder. Consequently, the combined parity stream has a conditional WEF given by ##EQU14##

The WEF for the overall parallel code is then given by ##EQU15##

The performance with maximum likelihood detection is then bounded using standard transfer function techniques. For instance the union bound on the probability of bit error over an AWGN channel is given by ##EQU16##

From this analysis the following is known. Let an error event have weight .delta.=w+j where w is the number of information bit errors andj is the number of parity bit errors. Let w.sub.min be the minimum w of any finite weight error event. Let .eta..sub.min be the number of error events starting at a particular bit with w.sub.min information errors and lowest weight ##EQU17##

Note that .delta..sub.min does not necessarily equal the free distance of the code. The number of possible information error events of weight w.sub.min in an interleaver of length J is ##EQU18## So over a block of length J, there are .eta..sub.min J events of distance .delta..sub.min. The fraction of w.sub.min events which are the minimum distance .delta..sub.min is ##EQU19##

Equation (24) shows that the interleaver gain on the error coefficient is proportional to J.sup.1-w.sbsp.min. For a recursive convolutional code with a fixed block lenigtlh and terminated trellis w.sub.min .gtoreq.2. For non-recursive convolutional codes and block codes w.sub.min =1. Consequently only for recursive convolutional codes, is there an interleaver gain on the error coefficient. Note that other events with w<w.sub.min may have distances .delta.<.delta..sub.min but will have a significantly greater attenuation, J.sup.1-w, of the error coefficient. Thus, the critical design parameters for Turbo codes are not the minimum distance but rather .delta..sub.min and w.sub.min.

There are two major aspects to recent developments in coding theory. The first was the development of iterative decoding techniques based on soft-output algorithms such as the BCJR algorithm. The second was the development of new coding techniques using parallel recursive convolutional codes with a random interleaver. The latter closely approximates random codino performance yet has a simple structure which permits practical decoding, i.e., via the iterative algorithms. It is the combination of these two developments that resulted in the quantum step in performance which occurred with the introduction of Turbo codes.

For a finite alphabet X, with symbol probabilities p[x] for x(X, the entropy of the random variable x is given by ##EQU20## The various physical interpretations of this quantity are well known and are summarized as follows: the entropy characterizes the unpredictability of the outcome of randomly selecting a symbol from X.

While entropy concerns only a single distribution, cross-entropy is a measure of the difference between two probability distributions. Traditionally, it has had greater importance in statistical applications than in communications. For two distributions p and q of an alphabet X, the cross-entropy is defined as ##EQU21##

The cross-entropy of two distributions is zero only if the two distributions are identical. Otherwise the cross-entropy is positive. The cross-entropy of two distributions is not symmetric, that is,

Furthermore, cross-entropy does not satisfy the triangle equality in general. That is,

is not always true. Consequently, cross-entropy is not a true metric in the mathematical sense, and its usefulness depends upon the application.

There are statistical inference schemes associated with the concepts of entropy and cross-entropy. The first scheme to gain prominence was Jayne's Principle of Maximum Entropy. Jayne promoted this as a method of statistical inference in physics, and showed how some fundamental distributions such as the Boltzmann and Fermi distributions could be derived from this principle. Previously, this inference scheme had been used in statistical applications, and since then its use has spread to countless fields including spectral estimation.

When inferring the distribution p[x] of a random variable x, the Principle of Maximum Entropy selects that distribution which has the maximum entropy subject to the known constraints on the random variable x. The constraints referred to are typically constraints on the range and the moments of x. For example, when a random variable has a range (-.infin., +.infin.), a mean of zero, and a variance of one, the maximum entropy distribution is the normal distribution.

Implicit in the principle of maximum entropy is the assumption that there is no a priori information about the random variable other than the constraints. The Principle of Minimum Cross-entropy is a generalization of the maximum entropy principle, which permits the inclusion of a priori information. When inferring the distribution p(x) of a random variable x with an a priori distribution q[x], the Principle of Minimum Cross-entropy selects that distribution p[x] which has the minimum cross-entropy H[p,q] subject to the known constraints on the random variable x.

Mathematically, the principle of cross-entropy minimization is, given an a priori distribution q[x], determine the "closest" distribution in the cross-entropy sense p[x], which satisfies the given constraints on its moments. That is, find p[x] such that ##EQU22## subject to ##EQU23## which is a normalization constraint and

where E.sub.p is the expectation over the distribution p, and f.sub.i are functions in x that represent the equality constraints on the moments of x.

The minimum cross-entropy (MCF) distribution p[x] is found by the product of the a priori distribution and an exponential function of the constraints ##EQU24## where the {.mu..sub.k } are Lagrange multipliers with .mu..sub.o determined by (30) and the remaining .mu..sub.k determined from (31). That is, from ##EQU25##

This is, in general, a set of F non-linear equations for the parameters .mu..sub.k. The practical application of the principle of cross-entropy minimization to any problem of significant size is a computer exercise in determining the solution of this set of non-linear equations.

Under the following four axioms i) uniqueness; ii) invariance under a change of coordinate system; iii) invariance regardless of whether independent systems are considered separately or together; and iv) invariance regardless of whether independent subsets are treated conditionally or in terms of the full system density, cross-entropy is the only consistent inference scheme; all other inference schemes will lead to contradictions.

The principle of cross-entropy minimization is a very general statistical inference method. To apply cross-entropy to a decoding problem, determination of a priori distribution and constraints for a problem is necessary.

Given space of dimension N where is the set of all bipolar N-tuples

a binary block code of length N and rate J/N is a subset of . Such a code will have 2.sup.J codewords, and for many applications the number of codewords is too many to list or otherwise explicitly identify. Let y.epsilon.z,4.sup.N be the corresponding noisy received bits. Assuming a memoryless channel the (i priori distribution associated with the transmitted bit b(n), derived from the corresponding noisy channel sample, is q[b(n)]=Pr[b(n) .vertline.y(n)]. The corresponding at priori N-tuple probability is given by ##EQU26## That is, the apriori distribution q[b] is determined from the soft-decision output of the channel.

For the decoding problem the moment constraints of (31) correspond to the parity check equations of the code. For example, if the (N,J) code is a linear binary code, there will be F=2 N-J independent parity check equations. A typical parity check equation is expressed as

For this type of constraint, f.sub.i (b)=0 when the code bits satisfy the constraint and f.sub.i (b)=2 when they do not. Since the constraint is non-negative, (36) is equivalent to the expected value constraint, E[f.sub.i (b)]=0.

Theorem 2.1 (Cross-entropy decoding). Given the a priori distribution q[b] as determined from the channel values, and the parity check equations {E[f.sub.i (b)]=0, i=1, . . . ,F}, the a posteriori distribution is given by

where I.sub.i (b) is the indicator function for all vectors b which satisfy constraint f.sub.i. The parameter .mu..sub.0 is a normalization constant.

For constraints of the form (36), equation (33) can be solved by inspection when one notes that f.sub.i (b), q[b], and exp(.) are all non-negative. That is, for any b such that f.sub.i (b)>0, one must have .mu..sub.i =.infin. for (4-10) to hold. However, with .mu..sub.i =.infin., ##EQU27## and (37) follows immediately.

For the code C, the constraints {f.sub.i (b): i=1, . . . ,F} determine the code. In particular, the indicator function for a codeword is

since a codeword must satisfy all of the parity constraints simultaneously. Consequently combining (37) and (39) the aposteriori distribution can also be expressed as ##EQU28## where the required normalization has been performed. For the decoding problem, (40) is a closed form expression for the minimum cross-entropy distribution.

The a priori distribution q[b] is, in general, non-zero for all b.epsilon.. However, the a posteriori distribution is only non-zero when b is a codeword. In particular, equation (40) represents the a posteriori probability distribution of the codewords. It is the a priori probability of each codeword, scaled by the sum of the probabilities of all permissible codewords. Maximum a posteriori decoding is simply choosing that codeword for which p[b] is largest where p[b] is given by (40).

To determine if MAP decoding results in fundamental losses, when represents the processing performed by the minimum cross-entropy decoder i s

where (b; y) is the mutual information between b and y. We have the following fundamental result.

Theorem 2.2 (Lossless Decoding). Minimum cross-entropy decoding is lossless. That is,

Proof: The mutual information is defined as

where H(b) is the entropy associated with b and the conditional entropy of b given y is

It is this conditional entropy which must be shown to be unchanged after processing. For a particular realization, the vector y represents the N soft values obtained from the channel. Let the codewords be represented by ={X.sub.1, X.sub.2, . . . X.sub.2.spsb.J }. . Minimum cross-entropy decoding produces 2.sup.J soft values representing the codeword probabilities p[X.sub.i .vertline.y]. The conditional entropy after decoding is ##EQU29##

When a subset of the constraints is used when one performing the minimum cross-entropy decoding, the following theorem results.

Theorem 2.3 (Lossless Partial Decoding). Minimum cross-entropy decoding with only a subset of the constraints is lossless. That is,

Proof: Let I.sub.C.sbsb.1 (b)=I.sub.1 (b)I.sub.2 (b) . . . I.sub.F.sbsb.1 (b) and let I.sub.C.sbsb.2 (b)=I.sub.F.sbsb.1.sub.+1 (b) . . . I.sub.F (b). Then, the output of the decoding .sub.1 (y) will produce 2.sup.K.sbsp.1 soft values p.sub.C.sbsb.1 [b] corresponding to the 2.sup.K.sbsp.1 words satisfying the constraints C.sub.1. Given these values, one can compute the corresponding values for complete decoding as follows ##EQU30## and consequently, there is no loss of information with the partial decoding.

The above theorem has particular application to decoding in stages as it minimizes losses between stages.

A cross-entropy minimization process provides the probabilities for each codeword, from which is selected one with the greatest probability. Direct application of cross-entropy techniques to the decoding problem is not practical due to the complexity in calculating the probabilities for all 2.sup.J codewords.

The BCJR algorithm provides an interesting approximation to minimum cross-entropy decoding. The BCJR algorithm determines the a posteriori symbol (or bit) probabilities not the a posteriori codeword probabilities. This is equivalent to calculating the marginal distributions for each bit given the codeword probabilities. This marginal distribution can be expressed as ##EQU31##

This a posteriori symbol probability is the soft output value of the BCJR algorithm for each bit. If one restricts the problem to systematic codes, this provides a direct estimate of the transmitted bits and their associated reliability. A hard decision based on (48) is the maximum a posteriori symbol estimate, and provides the lowest probability of symbol error among all decoders.

The two theorems discussed above are extensible to the output of the BCJR algorithm. That is, if (Y) represents the processing of the channel values by the BCJR algorithm to estimate the symbol probability then

Theorem 2.4. For a given symbol the BCJR algorithm is lossless. That is,

Theorem 2.5. For a given symbol, the BCJR algorithm with only a subset of the onstraints is lossless. That is,

These lossless properties only apply to a given symbol. When applied at the codeword level, in general

that is, there is a loss of information with the BCJR algorithm. The only case when this is not true is when the b(n), knowing y are independent, that is, ##EQU32##

To minimize this loss of information, the codes used with the iterative decoding strategies below are designed to make the b(n) as independent as possible.

A drawback of the MCE decoding algorithm is its complexity. In particular, its complexity grows exponentially with the block size. This complexity issue is addressed in two ways: i) by using an iterative algorithm that only applies a subset of the constraints at a time, and ii) using the BCJR algorithm as an approximation to MCE decoding.

To understand the iterative approach conceptually, a geometric interpretation proves insightful. Let the F constraints (parity check equations) be partitioned into MW, not necessarily disjoint, subsets labeled C.sub.i, i=1, . . . , M. The abbreviation C.sub.i is used to denote the three related concepts of i) a constraint subset, ii) those probability distribution functions which satisfy this constraint subset, and iii) those codewords that satisfy this constraint subset. A first geometric observation is that these constraint subsets are convex. Any two distributions, .phi..sub.1 [b] and .phi..sub.2 [b], which satisfy the constraint E[f.sub.i (b)]=0. Let .phi.[b]=.delta..phi..sub.1 [b]+(1-.delta.).phi..sub.2 [b] result in ##EQU33## and thus the set of distributions, C.sub.i, is convex.

The following iterative MCE algorithm is proposed.

i) set p.sub.0 [b]=q[b], the a priori distribution.

ii) determine p.sub.k+1 [b] as the MCE distribution corresponding to the a priori distribution p.sub.k [b] and the constraint set C.sub.v, where v=k mod M+1.

iii) repeat step ii) until convergence occurs.

The "global" MCE distribution is that distribution which satisfies all the constraints and is "closest" to the a priori distribution, whereas a "local" MCE distribution only satisfies a particular constraint subset. This iterative algorithm is illustrated graphically in FIG. 23 for the case of two constraint subsets. The estimate distribution oscillates between two convex hulls representing those distributions, which satisfy the two respective constraint subsets. It eventually converges to a distribution that lies in the interscction of the two. Prior to convergence, the estimated distribution, p.sub.k [b], at any given step will, in general, only satisfy one constraint set. The convergence properties of this algorithm are covered by the following theorem.

Theorem 2.6 (Convergence of the iterative MCE algorithm). For the algorithm described above, the limit ##EQU34## exists and p.sub.* [b] is the MCE distribution satisfying the fixed constraint set ##EQU35## Proof: The MCE distribution p.sub.k+1
[b] satisfies ##EQU36##

Given an a priori distribution p.sub.k, the corresponding MCE distribution p.sub.k+1 based on the constraint set C.sub.v, and any other distribution p.sub.* satisfying C.sub.v ; these three distributions satisfy the triangle equality for MCE distributions. In particular,

Since cross-entropy is non-negative, this implies

and equality holds only if p.sub.k+1 =p.sub.k. For a given k, this is true for any p.sub.* in the constraint set C.sub.v. Assuming that p.sub.* is the unique global MCE distribution, the relation (57) is true for all k since p.sub.* lies in all the constraint sets. Since the cross-entropy in (57) is monotonically decreasing and is bounded below by zero, this implies that in the limit as k.fwdarw..infin. ##EQU37## where c is a nonnegative constant. This, in turn, implies that ##EQU38## for some fixed distribution p.sub.c, otherwise the cross-entropy would not have converged. But then p.sub.c must be the MCE distribution for each constraint set. Consequently, p.sub.c =p.sub.*.

When cross-entropy methods are applied to a two-dimensional product code such as illustrated in FIG. 15, an a priori distribution q(X) determined from the channel samples, as described by (35) is used. Estimates of the a posteriori symbol probabilities for each horizontal codeword, constraint subset, are made. The resulting probabilities are used as an a priori distribution for processing each of the vertical codewords. The algorithm is repeated, making a number of horizontal and vertical passes, until convergence occurs. Convergence generally occurs quite quickly.

This iterative approach is analogous to the repeated application of constraint subsets in the iterative solution to the MCE problem. The key to this approach is the BCJR algorithm, which estimates the a posteriori symbol probabilities. That is, for each codeword it determines the marginal distributions of the MCE solution for the given constraint set, analogous to (48), rather than the complete distribution, analogous to (40). This is equivalent to finding the MCE solution for the given a priori distribution and the constraint subset for a particular codeword, but with the additional constraint that the resulting symbol probabilities are independent.

It is the introduction of this additional constraint or independence assumnption that makes the algorithm suboptimum. Unlike the other constraints (parity check equations) that are time-invariant, the constraint corresponding to the independence assumption is time-varying, that is, at stage k, the constraint depends on p.sub.k-1. The independence assumption has two important consequences. Firstly, it simplifies significantly the calculation of the MCE distribution. Secondly, it is an approximation and as such it results in some degradation in performance.

It is difficult to quantify the amount of degradation that is to be expected with this additional assumption. However, it depends upon the encoding strategy and the choice of the constraint subsets used in the decoding. A major advantage of the product code structure is that when choosing constraint subsets corresponding to the horizontal and vertical parity check equations, then the independence assumption is valid for the first and second decodings (D decodings in a D-dimensional product code). TIhe assumption is not true on subsequent decodings but it is likely approximately true given the good performance of these techniques.

Turbo codes involve a use of recursive systematic codes, random interleavers, and a modified version of the iterative detection algorithm. Initially, it was thought that the function of the random interleavers was to provide approximate independence between the different stages of the decoding process. It was later found that this was only one function of the random interleavers. In addition, they also reduce the coefficient of the dominant error event. This explains why random interleavers work significantly better than block interleavers in the present application.

Referring to FIG. 24, the difference operations performed on the log-likelihoods directly are shown. In addition, the interleaver and de-interleaver are juxtaposed with the second stage BCJR algorithm. This emphasizes the parallel nature of the code, and how the interleaver/de-interleaver pair provides a different second code or constraint set in MCE termninology.

Another form of the Turbo detection algorithm is shown in FIG. 25. This is referred to herein as exirinsic form of the Turbo detection algorithm. This is equivalent to the block diagram of FIG. 24. However, it shows the symmetry between the two stages, and also shows that only the extrinsic information, .LAMBDA., is passed between the two stages. The channel samples are fixed inputs to the two decoding stages. This extrinsicform emphasizes the fact that the objective of the iterative decoding is only to increase the extrinsic information associated with the data.

This alternative representation of the Turbo detection algorithm suggests a modified cross-entropy minimization algorithm. When the BCJR algorithm in FIG. 25 is replaced by the true MCE algorithm for the given constraints and p.sub.k [b] represents the probability distribution at the output of the kth iteration of the MCE algorithm the following is known. Recall that the output distribution of the MCE algorithm has the form

where q.sub.k [b] is the a priori distribution, and g.sub.k (b) has the form ##EQU39## where the summation is over the constraints in the constraint set C.sub.v where v=k modulo M. Computing the difference between the logarithm of the input and the output of the MCE stage ##EQU40## corresponds to the extrinsic information obtained from the kth iteration of the algorithm. Let q.sub.0 [b] be the a priori distribution for b.di-elect cons. corresponding to the channel samples. Then, the a priori distribution for the kth stage is

that is, the combination of the a priori information from the channel and the previous stage's extrinsic information. The parameter .eta..sub.k is a normalization constant. The following corollary to Theorem 4.6 results.

Corollary 1.1 (Convergence of the modified MCE algorithm). The sequence of distributions {p.sub.k [b]} produced with the modified MCE algorithm using the a priori distributions

where .eta..sub.k is a normalization constant, is identical to the sequence of distributions produced with the a priori distributions

In particular, there is asymptotic convergence of the modified algorithm to the global MCE distribution p.sub.* [b].

This modified algorithm can also be interpreted geometrically. TIhe space of probability distributions on with cross-entropy as the measure is not a metric space. However, it does have some geometric similarities to Euclidean space. In particular, the triangle equality for minimum cross-entropy distributions is analogous to the Pythagorean theorem in Euclidean space. The triangle equality for cross-entropy implies that

for any distribution r[b] that satisfies the constraint set C.sub.1. Equation (66) has obvious similarities to the Pythagorean theorem. Thus, in a sense, the line joining q.sub.0 [b] and p.sub.1 [b] is "orthogonal" to the constraint set C.sub.1. That is, orthogonal in a cross-entropy sense not a Euclidean sense. As illustrated in FIG. 26, the first two steps of the modified algorithm are identical to the original algorithm and result in p.sub.2 [b].

Similarly, the line joining p.sub.1 [b] and p.sub.2 [b] is "orthogonal" to the constraint set C.sub.2. The a priori distribution for the third step is p.sub.2 [b] "minus" the information obtain in the first step, that is, p.sub.1 [b]-q.sub.0
[b]. In FIG. 26, the "minus" operation in the vector sense is illustrated as if it were a Euclidean space. However, the resulting distribution q.sub.3 [b] is "collinear"with p.sub.2 [b] and p.sub.3 [b], since the corollary showed that both p.sub.2 [b] and q.sub.3 [b]have the same "orthogonal" projection (MCE distribution) on C.sub.1.

For the case of two constraint sets, the global MCE distribution can be written as

where g.sub.odd (b) is that portion of the MCE distribution due to the constraints C.sub.1, and g.sub.even (b) is that portion due to C.sub.2. As FIG. 26 illustrates, the interim a priori distributions are estimates of these quantities, in particular,

and

In this sense, the modified algorithm is trying to independently estimate the contributions of the two constraint sets to the global solution.

The above is the general form for the modified algorithm. In certain circumstances it can be simplified, in particular, for a parallel concatenated coding strategies as illustrated in FIG. 14 and used in the Turbo coding strategy. Recall the partition ofthe vector b.di-elect cons. into (x, z.sub.1, z.sub.2) where x represents the systematic bits, and z.sub.1 and z.sub.2 represent the parity bits corresponding to the two encoders. The first observation is that for a memoryless channel

Next consider the third stage of the algorithm where the a priori distribution for the modified algorithm is ##EQU41## where the second line follows because the constraint set C.sub.2 does not depend upon z.sub.1 and thus the extrinsic information can only depend on x and z.sub.2. The next step of the algorithm represents a MCE projection onto C.sub.1. Consequently, it is only the dependence of this a priori distribution on x and z.sub.1 that will have a bearing upon the MCE projection. That is, to obtain p.sub.3 [b], we only use that portion of q.sub.3 that depends on x and z.sub.1 since C.sub.1 does not depend upon z.sub.2. In fact, if we use as the a priori distribution ##EQU42## it follows that the local MCE distribution is then ##EQU43##

Consequently, the corresponding extrinsic information ##EQU44## for the next step is unchanged and, thus, neither is the algorithm. The same argument applies for each iteration k. That is, at each iteration, the a priori distribution need only depend on the initial distribution for the systematic bits q.sub.0 [x] and the parity bits relevant for this iteration q.sub.0 [z.sub.v ], as well as the extrinsic information about the systematic bits obtained from the previous stage, g.sub.k-1 (x, z.sub.v-1).

It appears that when using the approximate MCE algorithm, performance is much better with the modified algorithm structure. Heuristically, it is because it improves the independence from one stage to another.

The differences between the modified MCE detection algorithm and the Turbo detection strategy are

i) the Turbo algorithm assumes g.sub.k-1 (x,z.sub.v-1)=g.sub.k-1 (x)g.sub.k-1 (z.sub.v-1). With this separation of the extrinsic information for the systematic and parity bits, only the first factor g.sub.k-1 (x) has to be carried forward to the next stage. And,

ii) the Turbo algorithm assumes that ##EQU45## and this assumption allows replacement of the calculation of the local MCE distribution with the BCJR algorithm.

Cross-entropy can also be useful as a stop criterion in the decoding process. For example to achieve the best performance with Turbo codes may require up to eighteen or more iterations, but the vast majority of the received codewords require only a handful of iterations while a very few require the full complement. Cross-entropy has been shown empirically to be a useful criteria for distinguishing between these cases, and can result in a significant computational saving.

The idea behind the stop criteria is to estimate the cross-entropy between the distributions corresponding to consecutive iterations of the decoding process. This can be expressed as ##EQU46## assuming statistical independence of the symbol probabilities. Ihis empirical estimate of the cross-entropy of successive pairs of distributions is compared to a threshold. Whenever the cross-entropy drops below the threshold, indicating little change in the distribution from one iteration to the next, the process is stopped. Simulations have indicated that cross-entropy drops quite dramatically at convergence, indicating that no more errors can be corrected, and this test provides an excellent stop criteria.

Multiuser detection (MUD) is described below. A problem arises in radio communications where, due to limited spectrum, reuse of a spectrum as frequently as possible to increase capacity is desirable. Spectrum reuse often implies interference between the signals of different users. Multiuser detection is one method of reducing the effect of this interference and potentially increasing system capacity.

Code division multiple access (CDMA) is often proposed as a method for sharing spectrum among several users. The conventional approach to detection in a CDMA system is to demodulate each user's signal as if it were the only one present. The detection of the desired signal is protected against the interference due to the other users by the inherent interference suppression capability of CDMA. There are two limitations to this approach.

i) All users interfere with all other users and the interference adds to cause performance degradation. This is minimized by keeping the cross-correlations between users low but there is a limit to what can be achieved in this area.

ii) In many applications there is a serious near/far problem such that high signal levels can swamp lower power users. This can imply the need for tight power control. One way around these limitations is to employ multiuser detection techniques. However, CDMA is not the only potential application of these techniques. Multiuser detection techniques are particularly attractive on the return link of any point-to-multipoint applications, e.g., at the base station of a cellular system, where all user waveforms are known; there is co-channel or adjacent channel interference; and it is required to detect all users.

A general model for multiple access communications is the following. Assuming there are users simultaneously accessing the same communications channel each using a modulating waveform s.sub.k (t) with a symbol period T. Without loss of generality we assume binary signaling with the kth user's channel bit at time i is represented by b.sub.k (i). These signals are combined in the following manner ##EQU47## where the messages of the K users are assumed to be N bits long. The parameters {.tau..sub.k } represent the relative transmission delays of the different users. Without loss of generality, it is assumed that the delays are ordered, 0<.tau..sub.1 <.tau..sub.2 < . . . <.tau..sub.k <T. For the synchronous case, .tau..sub.1 = . . . =.tau..sub.k =0. The modulating waveforms, s.sub.k (t), are assumed to be normalized to unit energy, and relative received amplitude levels of the different users are characterized by the positive parameters {w.sub.k }. For the case of equal power users, w.sub.1 =w.sub.2 = . . . =w.sub.K =1is assumed. The vector of bits at symbol interval i is represented by b(i).epsilon.{-1,+1}.sup.K, and the collection of all bits over M symbol periods is represented by the matrix b.epsilon.{-1,+1}.sup.K.times.N. The extension of the following to non-binary signaling is straightforward.

With an additive white Gaussian noise channel the corresponding received signal is

The noise process, n(t), is a zero-mean white Gaussian process with spectral density N.sub.0. This model is applied to a variety of multiple access systems including frequency division multiple access (FDMA) and CDMA. With FDMA, the modulating waveforms are essentially the carrier frequencies. Any pulse shaping that is performed will also be part of the modulating waveform. With CDMA the modulating waveforms correspond to the spreading code assigned to each user. The latter application is where multiuser detection is most often considered because the modulating waveforms are less orthogonal.

Assuming the modulating waveform of each user is known at the receiver, and that the K-user coherent receiver locks to the signaling interval and phase of each active user, when the additive noise is a white Gaussian random process, the conditional probability density function, p(r.vertline.b), is proportional to the likelihood function, L(b), defined as ##EQU48## where T.sub.0 is the observation interval. Assuming the transmitted bits are equiprobable bits, the maximum likelihood estimate of the data is that b which minimizes L(b). In particular, b is selected to minimize the log-likelihood ##EQU49##

The integral of first term of this expansion is a constant that is independent of the estimate b, and thus does not affect the minimization. When ##EQU50## so that the yk(i) is the output of a filter matched to the modulating waveform, then the integral of the second term in the expansion of (80) reduces to ##EQU51## where W.di-elect cons..sup.K.times.K is the diagonal matrix with entries W.sub.kk =w.sub.k ; and y(i)=(y.sub.1 (i),y.sub.2 (i), . . . y.sub.K (i)).sup.T.di-elect cons..sup.K is the vector of matched filter outputs. For the third term of the expansion, define the klth entry in the cross-correlation matrix H(i).di-elect cons..sup.K.times.K by ##EQU52## and then the third integral in the log-likelihood function can be written as ##EQU53##

From (80), (82) and (84), maximum likelihood detection is equivalent to choosing b to minimize the log-likelihood function .sub.Ll og (b), that ##EQU54## where the modified log-likelihood, that ignores the constant term, is given by ##EQU55##

This is a relatively complex expression that can be made muchimpler under a variety of circumstances. The only dependence of the optimum estimate on the received signal is through the matched filter samples, y(i). Thus, the vector of matched filter outputs y(i) form a set of sufficient statistics for this problem. The presence of the delay term .tau..sub.k in (81) implies that the matched filtering is individually synchronized for each user.

In general, the samples corresponding to the kth matched filter {y.sub.k (i): i=1 . . . N} are not sufficient statistics for the detection of the corresponding data {b.sub.k (i)=1 . . . N}. However, the complete samples {y(i): i=1 . . . N} do form a set of sufficient statistics for determining {b(i): i=1 . . . N}.

The minimization of .OMEGA.(b), given by (5-10), is a non-trivial problem. The brute force approach is a combinatorial problem that requires evaluating the above integral over all .vertline.A.vertline..sup.K.times.N possible symbol matrices where A={-1,+1} is the symbol alphabet. Analysis indicates that any solution has a complexity which is exponential in the number of users; however, simplification occurs under various assumptions.

In the synchronous case, all of the relative delays {.tau..sub.k } are zero. Assuming that the cross-correlation between modulating waveforms in adjacent symbol intervals is zero, that is, H(i)=0 for .vertline.i.vertline.>0 by, for example, assuming that the modulating waveform is zero outside the symbol interval, i.e., s.sub.k (t).di-elect cons.[0,T] results in Nyquist modulation of waveforms having similar cross-correlation characteristics.

From (77), (78) and (81), the output of the bank of matched filters for a synchronous system can be represented as by the equivalent discrete time system

where vector n(i) is a set of zero-mean correlated noise samples with ##EQU56##

Assuming noise, n(t), is white, it follows from (88) and (83) that E[n(i)n(j).sup.T ]=.sigma..sup.2 H(0).delta.(i-j). Thus, the system has an equivalent discrete structure, which is described by (87) and illustrated in FIG. 27.

Under these assumptions, the modified log-likelihood function reduces to ##EQU57## and, since the matched filter samples are independent from one symbol period to next, the maximization of (89) can be reduced to the independent maximization of each of the terms in the sum. That is, ##EQU58##

While this is a considerably simpler problem to solve, its complexity is still exponential in the number of users.

If the symbol alphabet was analog finding b to maximize (89) or (90) would be equivalent to mean square estimation. In fact, one suboptimum solution to this problem is to use the sign of the mean square estimate, as the data estimate. However, with a discrete symbol alphabet, the optimum solution of (89) or (90) is a combinatorical problem of searching over the 2.sup.K possible values for each b(i). The above is only optimum assuming adjacent symbols are independent. If there is a correlation between adjacent symbol periods, for example, due to FEC coding then the optimum solution will be similar to the asynchronous case.

As a point of comparison, the conventional detector simply takes the sign of the bits at the output of the matched filter, that is,

where the sign(.) operator is applied on an element by element basis. This approach is of much lower complexity than the optimum approach but clearly relies on low cross-correlations, that is, H(0) should be approximately diagonal, for good performance.

In the asynchronous case, it is no longer assumed that the relative delays of all users are zero. Without loss of generality, it is assumed that the time delays are ordered such that 0<.tau..sub.1 <.tau..sub.2 < . . . <.tau..sub.k <T. It is also generally assumed that the modulating waveforms are zero outside of the symbol interval. This implies that the cross-correlation matrices H(i)=0 for .vertline.i.vertline.>1. The corresponding discrete time representation is

which illustrates the dependence of the current output on the preceding and following symbols due to their overlap. If H(i).noteq.0 for .vertline.i.vertline.>1 then (84) can be augmented in the obvious manner to show the dependence on more distant symbols.

The noise process in the above is correlated both at each transition and between transitions according to

Under these conditions, the modified log-likelihood function is given by ##EQU59## where it is assumed the end terms, b(0) and b(M+1), are either zero or known. Relative to the synchronous case, the asynchronous case has two extra terms corresponding to the overlap between adjacent bits.

The asynchronous multiuser detection problem is a generalization of the intersymbol interference problem. It is known to derive a (matrix) whitening filter for y(i) in the general asynchronous case, such that the resulting noise vector is white. This is a precursor to equalizer desion for the present problem. A multi-user channel is modellable as a linear finite-state machine, as shown on the left-hand side of FIG. 28, similar to a convolutional encoder. This is a 2.sup.2K state machine driven by the symbol vectors and with the outputs weighted by the cross-correlation matrices as indicated by (92).

Given the analogy to convolutional encoding. it is clear that dynamic programming such as the Viterbi algorithm is an appropriate method for solving the problem. When .OMEGA..sub.a (x(m),m) is the maximum cumulative metric overall sequences of length m ending in state x(m)=(b(m),b(m+1)) where x(m) ranges over possible 2.sup.2K bit combinations of (b(m),b(m+1)), then ##EQU60## where .phi.(x(m-1),x(m),m) is the branch metric for the transition from state x(m-1) to x(m) with input y(m) and is given by (assuming the transition is permitted) ##EQU61##

The maximum likelihood sequence is the sequence {b(i), i=1 . . . M} corresponding to the maximum metric .OMEGA..sub.a (x(M),M). The Viterbi algorithm would have 2.sup.2K states, and thus would only be appropriate for relatively small systems. The computation of (96) is not as onerous as it first appears because the last three terms are constants, which depend only on the state and not the received signal. Thus they are often precomputed and stored.

H(-1)=H.sup.T (1) and consequently, the log-likelihood function in (86) can be rearranged and written as ##EQU62## with the appropriate end conditions. With this decomposition of the log-likelihood function, the metric at each step, the internal expression, depends only 2K bits and not 3K bits as in (95). This corresponds to a Viterbi algorithm with 2.sup.K states and with a time-complexity per bit of O(4.sup.K /K). A second decomposition of the log-likelihood function wherein time steps do not correspond to symbol intervals but rather to the intervals [lT+.tau..sub.i, lT+.tau..sub.i+1) has been proposed. Over such intervals, the contribution to the overall metric (86) depends only on K bits, and there are only two branches per transition. This corresponds to a finite state machine with 2.sup.K-1 states. The previous algorithms were based on a time-invariant trellis with one transition per symbol period. An alternative prior art approach corresponds to a periodic time-varying trellis with K transitions per symbol period. This latter approach reduces the number of states in the Viterbi algorithm to 2.sup.K-1 and the time complexity per bit to O(2.sup.K). However, even this further reduction in complexity is not sufficient to make it practical for large scale systems.

The efficiency of a detector is defined as the ratio between the required SNR in the multiuser system to the required SNR in an equivalent single user system having the same bit error rate. The limit of the efficiency as the background Gaussian noise level goes to zero is the asymptotic efficiency. It characterizes the underlying performance loss when the dominant impairment is the existence of other users rather than additive channel noise. Asymptotic efficiency is analogous to asymptotic coding gain. Asymptotic coding gain represents the power gain relative to an uncoded system as the SNR becomes large. Asymptotic efficiency is the gain (loss) relative a single user system as the SNR becomes large. Both are directly related to the minimum distance of the signaling waveforms. With the former it corresponds to the free distance, d.sub.free, of the forward error correction code. With the latter, it corresponds to d.sub.k,min, the normalized minimum distance between any two K-dimensional sequences: .sup.k b.sup.1, .sup.k b.sup.2 .di-elect cons.B; where .sup.k b.sup.1 and .sup.k b.sup.2 differ in at least the subsequence corresponding to the kth user, that is, ##EQU63##

If the channel was symmetric, that is, if all the cross-correlations and channel losses are identical then there would be no dependence upon k, and d.sup.2.sub.min, would be the minimum distance between any two distinct K-dimensional sequences: b.sup.1, b.sup.2 .di-elect cons.B. In [Ver86a], Verdu showed that with the optimal detector the asymptotic probability of er