Home
Patent Search
IMT Blog
REGISTER
|
SIGN IN
United States Patent
7243292
Naslund , ; et al.
July 10, 2007
Title
Error correction using finite fields of odd characteristics on binary hardware
Abstract
Binary data representing a code word of an error-correcting code is used for calculating a syndrome, wherein a given portion of the binary data comprises k groups of data bits and represents a field element of the finite field GF(p.sup.k), p being an odd prime number, the field element comprising k coefficients in accordance with a polynomial basis representation, each group of data bits of the given portion representing a corresponding one of the k coefficients. The given portion, is stored in a first general purpose register and is processed such that the k groups of data bits of the given portion are processed in parallel; determining whether the syndrome is equal to zero; and detecting and correcting errors in the binary data if the syndrome is not equal to zero.
Inventors:
Naslund; Mats
(Vallingby,
SE
)
, Blom; Rolf
(Jarfalla,
SE
)
Assignee:
Telefonaktiebolaget LM Ericsson
(,
publ
)
Appl. No.:
10/271,945
Filed:
October 17, 2002
PCT Pub Date:
July 10, 2007
Current U.S. Class:
714/781
714/782
714/784
Current International Class:
H03M 13/00 (20060101)
Field of Search:
714/781-785
U.S. Patent Documents
4354228
October 1982
Moore et al.
4891781
January 1990
Omura
5712861
January 1998
Inoue et al.
6349318
February 2002
Vanstone et al.
Other References
Stephen b. Wicker, Error control Systems for Digital Communication and Storage, Prentice-Hall, 1995, pp. 108-119 and 176-191. cited by examiner.~
Primary Examiner:
Torres; Joseph D.
Claims
What is claimed is:
1. An error-correction apparatus for carrying out arithmetic and logical operations, the apparatus comprising: means for data input to, and data output from a general purpose processing unit, the processing unit for executing a plurality of processing operations on binary data stored in a single, hardware register, wherein the processing operations always operate on all bits of the single, hardware register simultaneously, the binary data comprising multiple coefficients of a field element of an odd-characteristic finite field GF(p.sup.k), the field element comprising: k coefficients in a polynomial basis representation and k groups of binary data bits, each group of binary data bits comprising a corresponding one of the k coefficients, wherein k is greater than 1; and wherein the binary data is processed such that the k groups of binary data bits corresponding to the k coefficients are processed by parallel operations, each parallel operation being performed over a number of clock cycles independent of k during the plurality of operations, wherein at least one of the parallel operations is a finite field addition or multiplication of two arbitrary elements of GF (p.sup.k) and the single, hardware register is arranged such that each parallel operation treats the k coefficients independently, wherein the field element is stored in the single, hardware register utilizing a single guard bit between each group of binary data bits to avoid carry bit problems wherein the single, hardware register is a w-bit register where w is greater than or equal to k(m+1) and m is the logarithm to base 2 of p, rounded up to the nearest integer.
2. The error-correction apparatus of claim 1, further comprising means for loading each element of the finite field into the single, hardware register.
3. The error-correction apparatus of claim 1, wherein the processing unit comprises a plurality of hardware registers.
4. The error-correction apparatus of claim 1, wherein the processing unit is a general purpose w-bit arithmetic logic unit (ALU) and is adapted to process the k groups of binary data bits according to one or more operations including a shift operation, an addition operation, a subtraction operation, and a logical AND operation.
5. The error-correction apparatus of claim 1, wherein binary data comprising a field element is initially stored in two single, hardware registers, and operations are carried out such that a right shift by m bits over all the binary data can be carried out by coordinating the two registers such that the least significant bit in one of the two registers is shifted to the most-significant-bit side of the other of the two registers.
6. The error-correction apparatus of claim 1, wherein the k groups of data bits are stored in the single, hardware register such that at least one guard bit is inserted adjacent to the most significant bit of each group of data bits, each group of data bits being separated from an adjacent group of data bits by a corresponding at least one guard bit.
7. The error-correction apparatus of claim 6, wherein an initial value of zero is assigned to each at least one guard bit.
8. The error-correction apparatus of claim 6, wherein one guard bit is positioned adjacent to the most significant bit of each group of data bits.
9. The error-correction apparatus of claim 6, wherein multiple guard bits are positioned adjacent to the most significant bit of each group of data bits.
10. The error-correction apparatus of claim 6, further comprising: means for calculating a syndrome utilizing a segment of the binary data which comprises k groups of data bits, wherein a field element of the finite field GF(p.sup.k), p being an odd prime number, the field element comprising k coefficients in accordance with a polynomial basis representation, each one of the k groups of data bits of the segment representing a corresponding one of the k coefficients, wherein said segment is stored in a first register and is processed such that the k groups of data bits of the segment are processed in parallel, determining whether the syndrome is equal to zero, and detecting and correcting errors in the binary data if the syndrome is not equal to zero.
11. A method for carrying out arithmetic and logical operations in an error correction apparatus, the method comprising the steps of: inputting data to, and outputting data from a general purpose processing unit having a single, hardware register; executing a plurality of processing operations on binary data stored in the single, hardware register, wherein processing operations always operate on all 32 bits of the hardware register simultaneously, the binary data comprising multiple coefficients of a field element of an odd-characteristic finite field GF(p.sup.k), the field element comprising; k coefficients in a polynomial basis representation and k groups of binary data bits, each group of binary data bits comprising a corresponding one of the k coefficients, wherein k is greater than 1; wherein the binary data is processed such that the k groups of binary data bits corresponding to the k coefficients are processed are processed by parallel operations, each parallel operation being performed over a number of clock cycles independent of k during the plurality of operations, wherein at least one of the parallel operations is a finite field addition or multiplication of two arbitrary elements of GF (p.sup.k) and the hardware register is arranged such that each parallel operation treats the k coefficients independently; and storing the field element in the single, hardware register utilizing a single guard bit between each group of binary data bits to avoid carry bit problems, wherein the single hardware register is a w-bit register and w is greater than or equal to k(m+1) where m is the logarithm to base 2 of p, rounded up to the nearest integer.
12. The method of claim 11, further comprising the step of loading each element of the finite field into the single, hardware register.
13. The method of claim 11, wherein the processing unit comprises a plurality of hardware registers.
14. The method of claim 11, wherein the processing unit is a general purpose w-bit arithmetic logic unit (ALU) and is adapted for processing the k groups of binary data bits according to one or more operations including a shift operation, an addition operation, a subtraction operation, and a logical AND operation.
15. The method of claim 11, further comprising the steps of: storing the binary data comprising a field element in two single, hardware registers, and carrying out operations by coordinating a right shift by m bits over all the binary data such that the least significant bit in one of the two registers is shifted to the most-significant-bit side of the other of the two registers.
16. The method of claim 11, further comprising the step of storing the k groups of data bits in the single, hardware register such that at least one guard bit is inserted adjacent to the most significant bit of each group of data bits, each group of data bits being separated from an adjacent group of data bits by a corresponding at least one guard bit.
17. The method of claim 16, further comprising the step of assigning an initial value of zero to each at least one guard bit.
18. The method of claim 16, wherein the at least one guard bit is positioned adjacent to the most significant bit of each group of data bits.
19. The method of claim 16, wherein multiple guard bits are positioned adjacent to the most significant bit of each group of data bits.
20. The method of claim 16, further comprising the steps of: calculating a syndrome utilizing a segment of the binary data which comprises k groups of data bits, wherein a field element of the finite field GF(p.sup.k), p being an odd prime number, the field element comprising k coefficients in accordance with a polynomial basis representation, each one of the k groups of data bits of the segment representing a corresponding one of the k coefficients, wherein said segment is stored in a first register and is processed such that the k groups of data bits of the segment are processed in parallel, determining whether the syndrome is equal to zero, and detecting and correcting errors in the binary data if the syndrome is not equal to zero.
21. Computer instructions stored in memory for carrying out arithmetic and logical operations, comprising: instructions within the computer readable medium for inputting data input to, and outputting data from a single general purpose processing unit; instructions within the computer readable medium for executing a plurality of processing operations on binary data stored in a single, hardware register, wherein the processing operations are executed on all bits of the hardware register simultaneously, the binary data comprising multiple coefficients of a field element of an odd-characteristic finite field GF(p.sup.k), the field element comprising; k coefficients in a polynomial basis representation and k groups of binary data bits, each group of binary data bits comprising a corresponding one of the k coefficients, wherein k is greater than 1; wherein the binary data is processed such that the k groups of binary data bits corresponding to the k coefficients are processed are processed by parallel operations, each parallel operation being performed over a number of clock cycles independent of k during the plurality of operations, wherein at least one of the parallel operations is a finite field addition or multiplication of two arbitrary elements of GF (p.sup.k) and the hardware register is arranged such that each parallel operation treats the k coefficients independently, and instructions within the computer readable medium for storing the field element in the single, hardware register utilizing a single guard bit between each group of binary data bits to avoid carry bit problems, wherein the single hardware register is a w-bit register and w is greater than or equal to k(m+1) where m is the logarithm to base 2 of p, rounded up to the nearest integer.
22. The computer instructions of claim 21, further comprising instructions within the computer readable medium for loading each element of the finite field into the single, hardware register.
23. The computer instructions of claim 21, wherein the processing unit comprises a plurality of hardware registers.
24. The computer instructions of claim 21, wherein the processing unit is a general purpose w-bit arithmetic logic unit (ALU) and is adapted for processing the k groups of binary data bits according to one or more operations including a shift operation, an addition operation, a subtraction operation, and a logical AND operation.
25. The computer instructions of claim 21, further comprising: instructions within the computer readable medium for storing the binary data comprising a field element in two hardware registers, and carrying out operations by coordinating a right shift by m bits over all the binary data such that the least significant bit in one of the two registers is shifted to the most-significant-bit side of the other of the two registers.
26. The computer instructions of claim 21, further comprising instructions within the computer readable medium for storing the k groups of data bits in the single, hardware register such that at least one guard bit is inserted adjacent to the most significant bit of each group of data bits, each group of data bits being separated from an adjacent group of data bits by a corresponding at least one guard bit.
27. The computer instructions of claim 26, further comprising assigning an initial value of zero to each at least one guard bit.
28. The computer instructions of claim 26, further comprising instructions within the computer readable medium for positioning the at least one guard bit adjacent to the most significant bit of each group of data bits.
29. The computer instructions of claim 26, further comprising instructions within the computer readable medium for positioning multiple guard bits adjacent to the most significant bit of each group of data bits.
30. The computer instructions of claim 21, further comprising: instructions within the computer readable medium for calculating a syndrome utilizing a segment of the binary data which comprises k groups of data bits, wherein a field element of the finite field GF(p.sup.k), p being an odd prime number, the field element comprising k coefficients in accordance with a polynomial basis representation, each one of the k groups of data bits of the segment representing a corresponding one of the k coefficients, wherein said segment is stored in a first register and is processed such that the k groups of data bits of the segment are processed in parallel, instructions within the computer readable medium for determining whether the syndrome is equal to zero, and instructions within the computer readable medium for detecting and correcting errors in the binary data if the syndrome is not equal to zero.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
The present application is related to U.S. patent application entitled "Efficient arithmetic in finite fields of odd characteristic on binary hardware", Ser. No. 10/271,730, and to U.S. patent application entitled "Cryptography using finite fields of odd characteristic on binary hardware", Ser. No. 10/271,947, both filed even date herewith, the disclosures of which are incorporated herein by reference in their entirety.
BACKGROUND
1. Field of the Invention
The present invention relates to methods and apparatuses for efficiently carrying out computations in finite fields of odd prime characteristic on binary hardware. The invention is particularly useful for carrying out such computations in cryptography and in error correction, but is not limited to such uses.
2. Background Information
Some Basic Aspects of Finite Fields
A finite field (also called a Galois field) is a finite algebraic structure, possessing two well-defined operations: an "addition" and a "multiplication". A finite field with N elements exists if and only if N is the power of a prime number, i.e. N=p.sup.n for some prime p=2, 3, 5, . . . such as discussed in R. Lidl and H. Niederriter, Introduction to Finite Fields and Their Applications, Cambridge University Press, Cambridge, Revised ed., 1994. This field is unique up to an isomorphism and is normally denoted GF(p.sup.n). For a prime p, the ground field GF(p) is simply the integers under addition and multiplication modulo p. In general, if F is a field of q=p.sup.k elements (i.e. F=GF(p.sup.k)), the extension field of degree l can be defined, denoted as F[t]/(f(t)), where f(t) is a polynomial of degree l, irreducible over F. This extension field may also be referred to as GF(p.sup.lk). This then gives (the unique) finite field of q.sup.l elements. In other words, this is the field of p.sup.lk=p.sup.n elements. The number p is called the characteristic of the field. The well-known fact that the two fields of the same size are isomorphic does not necessarily mean that the mapping between the fields is trivial. However, constructions of such mappings are not necessary for the present invention and, in any event, are within the purview of one of ordinary skill in the art and are discussed in textbooks, such as Introduction to Finite Fields and Their Applications referred to above.
There are two predominant ways to represent a finite field. One representation is the normal basis representation well known to those of ordinary skill in the art and such as described in Introduction to Finite Fields and Their Applications referred to above. The main advantage with a normal basis is that it facilitates multiplying elements by themselves, i.e. squaring-type operations. The normal basis representation is not discussed further here. Some computational aspects associated with normal basis representations are discussed in U.S. Pat. No. 4,587,627 (Computational method and apparatus for finite field arithmetic), U.S. Pat. No. 4,567,600 (Method and apparatus for maintaining the privacy of digital messages conveyed by public transmission), and U.S. Pat. No. 5,854,759 (Method and apparatus for efficient finite field basis conversion), the entire contents of each of which are incorporated herein by reference.
Another representation is known as the polynomial basis representation. In this representation, field elements of GF(p.sup.k) may be thought of as polynomials of degree at most k-1 whose coefficients are field elements of the ground field GF(p), i.e., integers in the set (0, . . . , p-1). A typical element, .gamma., in the field can therefore be expressed as .gamma.=.gamma..sub.k-1t.sup.k-1+ . . . +.gamma..sub.1t+.gamma..sub.0, (1) for some integers .gamma..sub.i where
0.ltoreq..gamma..sub.i.ltoreq.p-1, and where t is a formal variable. The field element .gamma. may also be viewed as the k-dimensional vector (.gamma..sub.k-1, . . . , .gamma..sub.1, .gamma..sub.0), and the polynomial basis representation as referred to herein is intended to encompass this view. Another aspect of the polynomial basis representation is the choice of a polynomial h(t) of degree k and irreducible over GF(p) that is utilized in multiplication of field elements. This will be discussed in greater detail below. Because any two fields of the same size are isomorphic, it does not matter which irreducible h(t) is chosen. From system point of view, h(t) is a system parameter that is agreed upon for the particular use in mind.
As noted above, an extension field of degree l over the field F=GF(p.sup.k) can be denoted as F[t]/(f(t)) or as GF(p.sup.lk). An element of the extension field can be viewed as a polynomial of degree at most l-1 whose coefficients are elements of GF(p.sup.k). In other words, an element of the extension field may be viewed as a polynomial with other polynomials as field coefficients. An element .gamma. of the extension field can be written as .gamma.=.gamma..sub.l-1t.sup.l-1+ . . . +.gamma..sub.1t+.gamma..sub.0, (2) where each .gamma..sub.J is a polynomial of degree at most k-1 having coefficients in the set (0, . . . , p-1). Thus, the polynomials .gamma..sub.j can be written as .gamma..sub.J=.gamma..sub.k-1,Ju.sup.k-1+ . . . +.gamma..sub.1,ju+.gamma..sub.0,j (3) where another formal variable, u, has been chosen for these polynomials to avoid confusing them with the extension-field polynomial, whose formal variable is t. This extension-field formulation using a polynomial basis representation will be used to describe the present invention.
The sum of two elements .alpha., .beta. in GF(p.sup.k) is defined by simply adding the corresponding polynomials (or, equivalently, vectors): .alpha.+.beta.=(.alpha..sub.k-1+.beta..sub.k-1)t.sup.k-1+ . . . +(.alpha..sub.1+.beta..sub.1)t+(.alpha..sub.0+.beta..sub.0), (4) where each (integer) coefficient (.alpha..sub.i+.beta..sub.i) is computed modulo p. The complexity (in terms of the number of modulo-p operations) of adding two elements by directly using the definition in equation 4 above is equal to k. For example, for the finite field GF(3.sup.2) where p=3, a field element .alpha.=(2, 1) in vector notation can be written as the polynomial .alpha.=2t+1, and a field element .beta.=(2, 2) in vector notation can be written as the polynomial .beta.=2t+2. The sum of these field elements is (.alpha.+.beta.)=(2+2)t+(1+2) where each coefficient is evaluated modulo 3 (mod 3). Thus, the sum reduces to (.alpha.+.beta.)=t because 4 mod 3=1 and 3 mod 3=0. In vector notation, the sum is (1, 0).
The product of two field elements is defined by forming their product modulo h(t), where h(t) is a polynomial of degree k and irreducible (i.e., cannot be factored) over GF(p): .alpha..beta.=.delta..sub.2k-2t.sup.2k-2+.delta..sub.2k-3t.sup.2k-3+ . . . +.delta..sub.1t+.delta..sub.0 mod h(t) (5) where .delta..sub.i=.SIGMA..sub.j.alpha..sub.j.beta..sub.i-J mod p. Here "mod h(t)" means taking the remainder when dividing by h(t), using standard polynomial division. This leaves the result with a degree strictly less that that of h(t), i.e. less than k, as desired. The complexity of multiplying two elements according to this definition is clearly on the order of k.sup.2. Alternatively, using the Karatsuba algorithm known to those of ordinary skill in the art, multiplication can (asymptotically in k) be performed with roughly k.sup.1.6 operations, but this algorithm involves more administration of the computations. The Karatsuba algorithm is, therefore, only beneficial for large values of k, for example, k>100, as noted in .sctn.4.4.3 of D. Knuth, Seminumerical Algorithms, Vol. 2 of The Art of Computer Programming, 2.sup.nd ed, Addison-Wesley, Reading, Mass., 1981.
As an example, to multiply the field elements .alpha.=(2, 1) and .beta.=(2, 2) of finite field GF(3.sup.2), a polynomial h(t) of degree k=2 and irreducible over GF(3) must be chosen, and the polynomials 2t+1 and 2t+2 are then multiplied modulo h(t). An appropriate irreducible polynomial is h(t)=t.sup.2+t+2. Then, .alpha..beta.=(4t.sup.2+6t+2)mod h(t)=4(t.sup.2+t+2)+2t-6=2t (because 2 mod 3=2 and 6 mod 3=0). Thus, .alpha..beta.=2t or (2, 0) in vector notation.
For an extension field (also referred to as a composite field), the formulas for addition and multiplication are the same. However, it is recognized that all coefficient-wise operations are carried out over the ground field, which may itself involve polynomial arithmetic.
Subtraction in a finite field can be done by simply noting that in the field GF(p), the negative of an element x is p-x. Thus, an element x can be replaced with p-x to obtain the negative, and then normal coefficient-wise addition may be carried out to obtain the subtraction. Division can be carried out by multiplying by the inverse as known to those skilled in the art.
Conventional Utilization of Finite Fields
The use of finite fields is central to many applications. In particular, for communication purposes, finite fields are very useful. For example, by embedding messages into a finite field, one can transmit messages so that errors introduced by the transmission medium can be corrected at the receiver end. This is the principle behind error correcting codes. In addition, finite fields can be used to achieve protection (confidentiality, integrity, origin authentication, and non-repudiation) for messages by means of encryption, message authentication, and digital signatures.
To be useful, these coding and encryption operations involving finite fields must be as efficient as possible, especially if the computations are done on a lightweight platform such as a mobile phone or other handheld device. For instance, many cryptographic methods use the following exponentiation operation
.function..times..times..times..times..times..times. ##EQU00001## where g is an element in the multiplicative group of a finite field, x is an integer and "" denotes multiplication in the finite field. The reason for using the exp.sub.g(x) function is that exp.sub.g(x) can be computed with only approximately (log.sub.2 x).sup.3 field multiplications in the ground field, but no efficient (i.e. polynomial-time in log.sub.2 x) algorithm exists for the inverse transformation--finding x from exp.sub.g(x). The latter is known as the discrete logarithm problem. In other words, exp.sub.g(x) is a strong candidate for a so-called one-way function--a function easy to compute, but hard to invert. The discrete logarithm problem is well known to those of ordinary skill in the art and is discussed, for example, in Handbook of Applied Cryptography by A. Menezes, P. van Oorschot, and S. A. Vanstone, CRC Press, Boca Raton, Fla., 1997.
However, on a computationally weak platform, even (log.sub.2 x).sup.3 multiplications may be computationally excessive, and for currently recommended field sizes (e.g., key size) such computations might in many situations take about 30 seconds, for example. A conventional way to improve performance is to restrict the computations to binary finite fields (fields of characteristic two). Restricting computations to binary finite fields improves performance because most available hardware is binary in nature (e.g., CPUs, etc.). Therefore, field operations can be composed of elementary binary operations, such as bitwise XORs, which are directly and efficiently supported by the hardware.
In addition, methods have been devised to improve efficiency by carrying out computations using a binary extension field whose extension degree is a composite number (non-prime), as disclosed in E. De Win, A. Bosselaers, S. Vanderberghe, P De Gersem, and J. Vandewalle, "A fast Software Implementation for Arithmetic Operations in GF(2.sup.n)", Advances in Cryptology, Proceedings of Asiacrypt '96, LNCS 1163, Springer-Verlag, Berlin, 1996, pp. 65 76 (hereinafter "De Win et al."). In the De Win et al. method, a standard binary hardware architecture is assumed to be able to perform operations (normal arithmetic and bit-operations) on k bit quantities (i.e., the word length is k bits). It is further noted that for an even characteristic (binary) field where p=2, forming remainders modulo 2 can be done by a simple bit operation.
When n is not a prime number, the finite field GF(2.sup.n) is viewed as a "non-trivial" extension of degree l over GF(2.sup.k), where n=lk, and l,k>1. Thus, an element in the field can be written as .gamma.=.gamma..sub.l-1t.sup.l-1+ . . . +.gamma..sub.1t+.gamma..sub.0, (7) where each .gamma..sub.i is an element of GF(2.sup.k). Adding field elements .alpha. and .beta. in this representation can be done by carrying out the operation .alpha.+.beta.=(.alpha..sub.l-1+.beta..sub.l-1)t.sup.l-1+ . . . +(.alpha..sub.1+.beta..sub.1)t+(.alpha..sub.0+.beta..sub.0). (8) Since .alpha..sub.i, .beta..sub.i are elements of GF(2.sup.k), their sum, .alpha..sub.1+.beta..sub.1 can be computed as the bitwise XOR between the .alpha..sub.i and .beta..sub.i. Thus, if k is small enough to fit in a hardware register (typically k.ltoreq.32), k additions can be performed in parallel using only one operation in hardware, and a factor of k is gained in the speed of executing the addition.
Multiplication using the De Win et al. method is carried out noting that the multiplicative group of GF(2.sup.k) (or any other finite field) is always cyclic, meaning that there is an element g in GF(2.sup.k) so that any non-zero element, .alpha..sub.j, in the field can be written as .alpha..sub.j=g.sup.x for some integer 0.ltoreq.x<2.sup.k-1 (i.e., x is the discrete logarithm of .alpha..sub.j, and g is known as the generator). If k is moderately large (e.g., k.ltoreq.16), the generator g can be found by exhaustive search. Also, in this case (e.g., k.ltoreq.16), a table, ANTILOG{x}, of g.sup.x for all x where 0.ltoreq.x<2.sup.k-1 can be formed. In addition, a table for the discrete logarithms, DLOG{.alpha..sub.j}, for all non-zero .alpha..sub.J in the field GF(2.sup.k) can also be formed. That is, ANTILOG{DLOG{.alpha..sub.J}}=.alpha..sub.j (9) and DLOG{ANTILOG{x}}=x (10) for all such .alpha..sub.j and x. The product of .alpha. and .beta. in GF(p.sup.n) is computed in accordance with the equation .alpha..beta.=.delta..sub.2l-2t.sup.2l-2+.delta..sub.2l-3t.sup.2l-3+ . . . +.delta..sub.1t+.delta..sub.0 mod f(t) (11) where .delta..sub.i=.SIGMA..sub.j.alpha..sub.j.beta..sub.i-j is computed as a sum of products, and all operations take place in the field GF(2.sup.k). Given that g.sup.xg.sup.y=g.sup.x+y, each term .alpha..sub.j.beta..sub.i-j can be computed by three table look-ups in the above-noted pre-computed tables in accordance with the equation .alpha..sub.J.beta..sub.i-J=ANTILOG{DLOG{.alpha..sub.j}+DLOG{.beta..sub.i- -j} mod(2.sup.k-1)}. (12) The memory requirement is about k2.sup.k-2 bytes, and the number of operations to perform the multiplication is on the order of l.sup.2=(n/k).sup.2. A factor of k.sup.2 is thus gained in speed. The approach requires pre-computation of the tables and requires memory to store those tables. If k is moderate (e.g., k.ltoreq.16), it is feasible to use this method using on the order of 2.sup.k pre-computation operations.
In contrast, for finite fields of odd characteristic p where p is an odd prime, the situation is more complicated than for binary finite fields because the basic operations needed for odd-characteristic finite fields are not modulo-2 operations (bit-operations) but, rather, modulo-p operations. The De Win et al. addition method as described therein, for example, is not applicable to finite fields of odd characteristic (p=3, 5, 7, . . . ), and no similar method for finite fields of odd characteristic has been reported to the knowledge of Applicants. Carrying out odd-characteristic finite-field computations in a conventional manner involves modular arithmetic, which requires long divisions. Most hardware supports modular arithmetic, but only on a word-oriented level. Thus, the above-noted optimizations for computations involving binary finite fields are not realized for computations involving odd-characteristic finite fields.
For the above noted reasons, binary finite fields have been the most widely used finite fields in error correction and cryptography. However, Applicants note that restricting such computations to binary fields can have drawbacks. For example, algorithms for inverting the exp.sub.g(x) function noted above are more efficient if the field has characteristic two (a binary field) than if the field has a characteristic that is odd. Thus, the cryptographic strength of the function exp.sub.g(x) may be expected to be less for binary fields than for general odd-characteristic finite fields. Indeed, it has recently been suggested that implementing cryptography using finite fields of odd characteristic and composite degree can provide enhanced cryptographic security compared to other cryptographic approaches involving finite fields, and that the gains in cryptographic security can be expected to outweigh the computational costs of such computations (see K. Rubin and A. Silverberg, "Supersingular Abelian Varieties in Cryptology", Crypto 2002, Lecture Notes in Computer Science, Vol. 2442, ed. M. Jung, Springer-Verlag, Berlin, pp. 336 353, 2002). In addition, in the case of binary fields of composite degree where the optimizations described in the De Win et al. article referred to above are applicable, attacks on elliptic curve cryptosystems over such fields have been recently found as described in N. P. Gaudry, F. Hess, and N. P. Smart "Constructive and Destructive Facets of Weil Descent on Elliptic Curves", Technical Report CSTR-00-016, Department of Computer Science, University of Bristol, October 2000, and in N. P. Smart, "How secure are elliptic curves over composite extension fields?", Technical Report CSTR-00-017, Department of Computer Science, University of Bristol, November 2000. Thus, it is advisable to avoid such binary fields of composite degree for encryption. These attacks are much less effective if the finite field has odd characteristic (even if the degree is non-prime), so they are not a relevant threat in that case. However, as noted above, utilizing conventional computational methods involving odd-characteristic finite fields requires sacrificing the computational optimizations that would otherwise be gained using a binary finite field structure.
SUMMARY OF THE INVENTION
Applicants have recognized a need for a computational approach that enables speeding up computations involving basic finite field operations (e.g., addition, multiplication, etc.) for non-binary finite fields even if the available hardware is binary in nature and that reduces need for special modulo-p hardware. In addition, Applicants have recognized a need for a computational approach for non-binary finite fields that utilizes register space more efficiently than conventional methods. For example, it is possible to perform conventional modulo-p arithmetic using a 32-bit CPU, but if p is small (e.g., p=3 or p=7) it is inefficient to devote 32 bits of register space for the operations since the involved quantities (field element coefficients) will only have 2 or 3 significant bits. Applicants have recognized that it would be desirable to make more efficient use of the available register space given that the numbers involved are quite small. The present invention fulfils these and other needs and provides advantages as will become apparent to those of ordinary skill in the art upon reading the detailed description in conjunction with the accompanying drawings.
It should be emphasized that the terms "comprises" and "comprising", when used in this specification, are taken to specify the presence of stated features, integers, steps or components; but the use of these terms does not preclude the presence or addition of one or more other features, integers, steps, components or groups thereof.
In one exemplary aspect of the invention, there is provided an error-correction apparatus comprising an input device (e.g., an input/output device) and a processing unit configured to execute a plurality of operations on binary data intended to represent an allowed code word of an error-correcting code to detect and correct errors in the binary data. A portion of the binary data comprises k groups of data bits and represents a field element of a base field GF(p.sup.k), the field element of the base field GF(p.sup.k) having k base coefficients in accordance with a polynomial basis representation. The value p is an odd prime number. Each group of data bits represents a corresponding one of the k coefficients. The portion of the binary data is stored in a register and is processed by the processing unit such that the k groups of data bits are processed in parallel during at least some of said plurality of operations.
In another exemplary aspect of the present invention, there is provided a method of error-correction. The method comprises receiving binary data intended to represent an allowed code word of an error-correction code and calculating a syndrome based upon the binary data, wherein a given portion of the binary data comprises k groups of data bits and represents a field element of the finite field GF(p.sup.k). The value p is an odd prime number, and the field element comprises k coefficients in accordance with a polynomial basis representation, each group of data bits of the given portion representing a corresponding one of the k coefficients. The given portion of the binary data is stored in a first register and is processed such that the k groups of data bits of the given portion are processed in parallel. The method further comprises determining whether the syndrome is equal to zero. The method also comprises detecting and correcting errors in the binary data if the syndrome is not equal to zero. The detecting and correcting can be carried out by processing an error-containing portion of the binary data such that k groups of data bits of the error-containing portion of the binary data are processed in parallel. In another aspect of the present invention, there is provided an apparatus comprising a memory and a processing unit coupled to the memory for executing the steps of the method. In another aspect of the present invention, there is provided a computer-readable carrier adapted to program a computer to execute the steps of the method. Exemplary forms of a computer-readable carrier include solid-state memory, magnetic disk, optical disk or modulated wave containing an appropriate set of computer instructions that would cause a processor to carry out the above-noted steps. A modulated wave can be, for example, a radio frequency modulated wave, an audio frequency modulated wave, an optical frequency modulated wave, or a modulated binary bit stream that can be downloaded via a network connection or modem.
As used herein, the terminology "in accordance with a polynomial basis representation" is intended to include any representation mathematically equivalent to a polynomial basis representation including, for example, a vector representation.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating a system for carrying out computations involving field elements of an odd-characteristic finite field according to an exemplary aspect of the present invention.
FIG. 2A is a schematic illustration of a hardware register with a data storage scheme configured in a single-guard-bit representation according to an exemplary aspect of the present invention for the example of GF(3.sup.10).
FIG. 2B is another schematic illustration of a hardware register with a data storage scheme configured in a single-guard-bit representation according to an exemplary aspect of the present invention for the example of GF(7.sup.5).
FIG. 2C is a schematic illustration of a hardware register with a data storage scheme configured in a multiple-guard-bit representation according to an exemplary aspect of the present invention for the example of GF(3.sup.8).
FIG. 3 is a flow diagram illustrating a method of processing binary data representing field elements of an odd-characteristic finite field according to an exemplary aspect of the present invention.
FIG. 4 is a flow diagram illustrating a method of processing binary data in order to determine the sum of two field elements where p=2.sup.m-1 in accordance with the method illustrated in FIG. 3 according to an exemplary aspect of the present invention.
FIG. 5 is a schematic illustration of register contents for an example of addition in GF(3.sup.10) in accordance with the method illustrated in FIG. 4.
FIG. 6 is a functional block diagram of a hardware apparatus for carrying out computations involving field elements of an odd-characteristic finite field where p=2.sup.m-1 according to an exemplary aspect of the present invention.
FIG. 7 is another functional block diagram of a hardware apparatus for carrying out computations involving field elements of an odd-characteristic finite field where p=2.sup.m-1 according to another exemplary aspect of the present invention.
FIG. 8A is a schematic illustration of an exemplary guard-bit insertion circuit for use in conjunction with the apparatus illustrated in FIG. 7 according to an exemplary aspect of the present invention.
FIG. 8B is a schematic illustration of an exemplary guard-bit removal circuit for use in conjunction with the apparatus illustrated in FIG. 7 according to an exemplary aspect of the present invention.
FIG. 9 is a flow diagram illustrating a method of processing binary data representing field elements of an odd-characteristic finite field in order to determine the product of those elements according to an exemplary aspect of the present invention.
FIG. 10A is a schematic illustration of a DLOG look-up table for use in the method illustrated in FIG. 9 according to an exemplary aspect of the present invention.
FIG. 10B is an indexing table that reflects the finite-field elements a(t) and corresponding generator powers n corresponding to the binary information illustrated in FIG. 10A.
FIG. 11A is a schematic illustration of an ANTILOG look-up table for use in the method illustrated in FIG. 9 according to an exemplary aspect of the present invention.
FIG. 11B is an indexing table that reflects the finite-field elements a(t) and corresponding generator powers n corresponding to the binary information illustrated in FIG. 11A.
FIG. 12 is a functional block diagram illustrating a hardware apparatus for carrying out multiplication of field elements of an odd-characteristic finite field according to an exemplary aspect of the present invention.
FIG. 13 is a schematic illustration of a compression operation for compressing binary data stored in a register in a multiple-guard-bit representation according to an exemplary aspect of the present invention.
FIG. 14 is a flow diagram illustrating a method of processing binary data in order to determine the sum of two field elements where p=2.sup.m+1 in accordance with the method illustrated in FIG. 3 according to an exemplary aspect of the present invention.
FIG. 15 is a schematic illustration of register contents for an example of addition in GF(5.sup.6) in accordance with the method illustrated in FIG. 14.
FIG. 16 is a functional block diagram of a hardware apparatus for carrying out computations involving field elements of an odd-characteristic finite field where p=2.sup.m+1 according to an exemplary aspect of the present invention.
FIG. 17 is another functional block diagram of a hardware apparatus for carrying out computations involving field elements of an odd-characteristic finite field where p=2.sup.m+1 according to another exemplary aspect of the present invention.
FIG. 18 is a flow diagram illustrating a method of processing binary data in order to determine the sum of two field elements where p=2.sup.m-d and d.ltoreq.(2.sup.m+1)/3 in accordance with the method illustrated in FIG. 3 according to an exemplary aspect of the present invention.
FIG. 19 is a flow diagram illustrating a method of processing binary data in order to determine the sum of two field elements where p=2.sup.m-d and (2.sup.m+1)/3<d<2.sup.m-1 in accordance with the method illustrated in FIG. 3 according to an exemplary aspect of the present invention.
FIG. 20 is a flow diagram illustrating a method of processing binary data in order to determine the sum of two field elements where p=2.sup.m+d and d.ltoreq.p/6 in accordance with the method illustrated in FIG. 3 according to an exemplary aspect of the present invention.
FIG. 21 is a flow diagram illustrating a method of processing binary data in order to determine the sum of two field elements where p=2.sup.m+d and p/6<d<2.sup.m-1 in accordance with the method illustrated in FIG. 3 according to an exemplary aspect of the present invention.
FIG. 22 is a block diagram of a system for carrying out error correction according to an exemplary aspect of the present invention.
FIG. 23 is a flow diagram illustrating a method for carrying out error correction according to an exemplary aspect of the present invention.
FIG. 24A is a functional block diagram illustrating a system for carrying out encryption/decryption according to an exemplary aspect of the present invention.
FIG. 24B is a flow diagram illustrating an exemplary cryptographic method according to the present invention.
FIG. 25 is a flow diagram illustrating an exemplary method for carrying out key exchange according to the present invention.
FIG. 26 is a flow diagram illustrating an exemplary method of public-key cryptography according to the present invention.
FIG. 27 is a flow diagram illustrating an exemplary method of public-key cryptography according to the present invention.
FIG. 28 is a flow diagram illustrating an exemplary method of public-key cryptography according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides approaches for efficiently carrying out arithmetic and logical operations involving elements of the finite field GF(p.sup.lk) (an extension field) where p is an odd prime number. As will be discussed in detail below, one aspect of the present invention addresses how data representing elements of the field GF(p.sup.k) are stored in binary hardware and how arithmetic operations are then carried out efficiently. As referred to herein, the finite field GF(p.sup.k) should be understood to mean an odd-characteristic finite field wherein the characteristic p is an odd prime number.
Various aspects of the invention will be described below in greater detail in connection with a number of exemplary embodiments. To facilitate an understanding of the invention, many aspects of the invention are described in terms of actions to be performed by elements of a computer system. Further, it will be recognized that in each of the embodiments, the various actions could be performed by specialized circuits (e.g., discrete logic gates interconnected to perform a specialized function), by program instructions being executed by one or more processors, or by a combination of both. Moreover, the invention can additionally be considered to be embodied entirely within any form of computer-readable carrier such as solid-state memory, magnetic disk, optical disk or modulated wave containing an appropriate set of computer instructions that would cause a processor to carry out the techniques described herein. A modulated wave can be, for example, a radio frequency modulated wave, an audio frequency modulated wave, an optical frequency modulated wave, or a modulated binary bit stream that can be downloaded via a network connection or modem. Thus, the various aspects of the invention may be embodied in many different forms, and all such forms are contemplated to be within the scope of the invention. For each of the various aspects of the invention, any such form of embodiment may be referred to herein as "logic configured to" perform a described action, or alternatively as "logic that" performs a described action.
Before addressing aspects of the invention pertaining to computations involving elements of GF(p.sup.k) themselves, algorithms that relate arithmetic operations in the field GF(p.sup.k) to arithmetic operations in the extension field GF(p.sup.lk) will first be described.
Given a polynomial f(t) of degree l, irreducible over GF(p.sup.k), and given that .alpha.(=.SIGMA..sub.i.beta..sub.ix.sup.i, .alpha..sub.i in GF(p.sup.k)) and .beta.(=.SIGMA..sub.i.beta..sub.ix.sup.1, .beta..sub.1 in GF(p.sup.k)) are elements of GF(p.sup.lk) to be operated on at a high level, algorithms for addition, SUM(.alpha., .beta.), and multiplication, PRODUCT(.alpha., .beta.), in GF(p.sup.lk) are provided below. The notation GF_p_k_<op>(.alpha..sub.i, .beta..sub.j) in these algorithms denotes a procedure carrying out the operation <op> (add, multiply, etc.) on field elements .alpha..sub.i and .beta..sub.J in the field GF(p.sup.k).
First, an addition algorithm, denoted SUM(.alpha., .beta.), that relates the addition of elements .alpha. and .beta. of the extension field GF(p.sup.k) to computations to be carried out in the field GF(p.sup.k) is given below.
SUM(.alpha., .beta.): for i=0 to l-1 do .delta..sub.1=GF_p_k_ADD(.alpha..sub.1, .beta..sub.i) end
return .delta..sub.l-1t.sup.l-1+.delta..sub.l-2t.sup.l-2+ . . . +.delta..sub.1t+.delta..sub.0
where GF_p_k_ADD will be described in detail below.
In addition, a multiplication algorithm, denoted PRODUCT(.alpha., .beta.), that relates the multiplication of elements .alpha. and .beta. of the extension field GF(p.sup.lk) to computations to be carried out in the field GF(p.sup.k) is now described. Here it is assumed that necessary initializations of DLOG and ANTILOG tables have already been made. Forms of the DLOG and ANTILOG tables will be described below. In addition, exemplary DLOG and ANTILOG tables are given in FIGS. 10A and 11A for a simple illustration for GF(3.sup.2) to be described later.
TABLE-US-00001 PRODUCT(.alpha., .beta.): for i=0 to 2l-2 do .delta..sub.1=0 for j=max(0,i-l+1) to min(i, l-1) do .delta..sub.1=GF_p_k_ADD(.delta..sub.i, GF_p_k_MUL(.alpha..sub.j, .beta..sub.1-j)) end end return REDUCE(.delta..sub.2l-2 t.sup.2l-2
+ .delta..sub.2l-3 t.sup.2l-3 + . . . + .delta..sub.1t + .delta..sub.0, f(t))
where GF_p_k_MUL and REDUCE(.delta., f) (the latter computing z(t) mod f(t)) will be described in detail below.
It should be noted that the above multiplication algorithm is merely one example of possible multiplication algorithms. For large values of l (e.g., l>100), faster performance may be obtained by using Karatsuba's method instead of the simple PRODUCT algorithm above. Karatsuba's method is known to those of ordinary skill in the art and is described, for example, in Seminumerical Algorithms referred to above.
Finally, a reduction operation "mod f(t)" necessary for completing the multiplication algorithm, PRODUCT(.alpha., .beta.), can be done with a well-known algorithm given below and denoted as REDUCE(.delta., f). This algorithm can also make use of the present inventive approach for efficient arithmetic in the field GF(p.sup.k) to be described. For computational efficiency, f(t) can be chosen to be "sparse", meaning that f(t) has only a few non-zero coefficients (e.g., 3 non-zero coefficients). In this case, f(t) has form f(t)=f.sub.lt.sup.l+f.sub.jt.sup.j+f.sub.0 for some j between l and 0. It should be noted, however, that it is not necessary in general for f(t) to have such a sparse. For any value of l, an irreducible polynomial f(t) of degree l can be readily found by methods known to those of ordinary skill in the art. A general approach for determining an irreducible polynomial f(t) may be found in Seminumerical Algorithms referred to above, for example. With these comments in mind, the reduction algorithm, denoted REDUCE(.delta., f), is as follows.
REDUCE(.delta., f) tmp1=GF_p_k_MUL(GF_p_k_INVERSE(f.sub.l), f.sub.0) tmp2=GF_p_k_MUL(GF_p_k_INVERSE(f.sub.l), f.sub.j) for i=2l-2 downto 1 do .delta..sub.i-l=GF_p_k_SUB(.delta..sub.1-l, GF_p_k_MUL(tmp1, .delta..sub.i)) .delta..sub.i-l+J=GF_p_k_SUB(.delta..sub.1-l+j, GF_p_k_MUL(tmp2, .delta..sub.i)) end
return .delta..sub.l-1t.sup.l-1+.delta..sub.l-2t.sup.l-2+ . . . +.delta..sub.1t+.delta..sub.0.
The REDUCE algorithm above is just a normal polynomial division algorithm adapted for the special form of f(t) given above. It should be noted that tmp1 and tmp2 can be pre-computed because they are fixed once the representation is given, that is, once f(t) is defined. The function GF_p_k_SUB refers to field subtraction in the field GF(p.sup.k), and the function GF_p_k_INVERSE refers to multiplicative inverse computation, both of which are easily implemented given algorithms for GF_p_k_ADD and GF_p_k_MUL and both of which will be described below.
An exemplary apparatus 100 for executing the above-noted algorithms and for implementing other aspects of the invention will now be described with reference to the block diagram of FIG. 1. The apparatus 100 comprises a memory 101 and a processing unit 105 coupled to the memory 101. The apparatus 100 can also comprise an input/output device 103. The processing unit 105 comprises a plurality of registers 107 121, which are controlled by logic circuits (not shown) within the processing unit 105. The processing unit 105 can communicate with the input/output device 103 and the memory 101 via electrical connections (e.g., electrical buses) represented by the arrows shown in FIG. 1. It is also possible for the processing unit 105 to communicate with external registers (not shown) located outside the processing unit 105.
The processing unit 105 can be, for example, any conventional type of processing unit, such as a Pentium-class processor or other CPU typically found in personal computers, or it may be a special purpose processor, such as may be found in wireless phones or other handheld devices. It is common for conventional processors used in personal computers to have eight general purpose registers, such as illustrated by the eight registers 107 121 in FIG. 1 (also denoted as registers a h). The registers 107 can be, for example, 8-bit registers, 16-bit registers, 32-bit registers, 64-bit registers, etc. Present generation processors for conventional personal computers commonly have 32-bit registers.
The memory 101 can be, for example, any suitable memory capable of storing computer programs, such as a magnetic disk, a CD ROM, a magneto-optical disk, a flash memory, or other types of memory. In addition to storing computer programs, the memory 101 can also be used to store intermediate or final computational results generated by the processing unit 105 and can also be used to store look-up tables to be utilized during computations.
The input/output device 103 can be, for example, any suitable device for passing data to and/or from the processing unit 105, such as a hard-wired modem or network interface, a wireless modem, a second memory, an analog-to-digital/digital-to-analog (AD/DA) converter, or other similar types of devices. Separate input and output devices can be utilized in place of a combined input/output device if desired. In addition, the input/output device 103 can be configured to perform guard-bit insertion and guard-bit removal. Guard-bit insertion and guard-bit removal are described later in relation to FIGS. 8A and 8B, for example.
In one aspect, the memory 101 can store one or more computer programs, and the processing unit 105 can access the memory 101 to execute steps of the computer program(s). These computer programs can include, for example, programs representing the algorithms noted above and programs implementing other aspects of the invention as described below.
In addition, although a single processing system 100 having a single processing unit 105 is shown in FIG. 1, it should be understood that the processing system 100 can comprise multiple processing units 105. Moreover, it is possible to embody the present invention using multiple processing systems instead of a single processing system 100.
The remainder of the detailed description will focus on describing the inventive approaches for storing binary data representing field elements of GF(p.sup.k) in hardware registers and for executing operations on such binary data in a manner to enhance the speed of arithmetic computations involving field elements of GF(p.sup.k). In this regard, descriptions of the algorithms GF_p_k_ADD and GF_p_k_MUL, which provide for adding and multiplying field elements of the field GF(p.sup.k), will be described. In addition, other apparatuses for implementing the approaches will also be described.
According to one aspect of the invention, the apparatus 100 illustrated in FIG. 1 can be used to carry out computations involving field elements of an odd-characteristic finite field GF(p.sup.k) in a manner that enhances computational efficiency compared to conventional approaches for carrying out computations involving field elements of odd-characteristic finite fields. In particular, the processing unit 105 is configured (e.g., programmed) to store binary data representing at least a portion of a field element of an odd-characteristic finite field GF(p.sup.k) in a register, such as register 107 shown in FIG. 1, wherein p is an odd prime number and wherein the field element comprises k coefficients in accordance with a polynomial basis representation. The processing unit 105 and the register can be viewed as means for storing binary data representing at least a portion of a field element of GF(p.sup.k). The binary data comprise plural groups of data bits, wherein each group of data bits represents an associated one of the k coefficients. Thus, binary data representing multiple coefficients of a field element of the odd-characteristic finite field GF(p.sup.k) are packed into a single hardware register according to an aspect of the present invention. In contrast, conventional approaches for carrying out computations involving field elements of odd-characteristic finite fields merely place binary data representing a single coefficient of an odd-characteristic finite field into a single hardware register.
In addition, the processing unit 105 is also configured to execute at least one operation on the contents of the above-noted register 107 such that the plural groups of data bits are processed in parallel. For example, one or more operations can include a shift operation, an addition operation, a binary subtraction operation, a logical AND operation, and a NOT operation (logical negation) to name a few. In this regard, the processing unit 105 can be viewed as means for executing at least one operation on the binary data such that the plural groups of data bits are processed in parallel. Thus, by storing binary data representing multiple coefficients of a field element of GF(p.sup.k) in a single hardware register and by processing the plural groups of data bits in parallel, the speed of computations according to the present invention can be greatly increased compared to conventional methods for computations involving field elements of odd-characteristic finite fields. For example, if all k coefficients of a field element of GF(p.sup.k) are represented in a single hardware register, such as register 107 shown in FIG. 1, the speed of processing the binary data representing the field element can be increased by a factor of k for addition and k.sup.2 for multiplication over conventional methods.
Multiple coefficients of a field element of GF(p.sup.k) can be stored in a single hardware register using two exemplary approaches according to the present invention. These approaches are referred to herein as the single-guard-bit representation and the multiple-guard-bit representation, respectively, each of which has different advantages as will be described below. In describing each of these representations, it is assumed that the hardware architecture is capable of performing basic arithmetic and logical operations on w-bit words, e.g., the hardware registers can be w-bit registers for some w.gtoreq.k(m+1) where binary data representing an entire field element is to be stored in a single register. In conventional terms, this means that the hardware architecture can perform arithmetic and logical operations on binary encoded integers in the range (0 . . . 2.sup.w-1). In principle, larger values of w are preferable because more information can thereby be processed per operation. Bit positions are numbered from right to left wherein the least significant bit is indexed by "0", the next bit by "1", the next bit by "2", and so on, up to most significant bit (the word size), which is indexed by "w-1".
Examples of the single-guard-bit representation are shown in FIGS. 2A and 2B for 32-bit hardware registers. FIG. 2A is a schematic illustration of a hardware register 200 with a data storage scheme for storing binary data representing a field element .alpha..sub.1=(.alpha..sub.9,i, . . . , .alpha..sub.1,i, .alpha..sub.0,i) of GF(3.sup.10).
In the example of FIG. 2A, ten groups of bit positions 201-r (unshaded bit positions, wherein `r` is the bit position number) are allocated to store ten groups of data bits representing the field coefficients a9,i , . . . , a1,i, a0,i. Two bit positions are allocated for storing the binary data representing each coefficient aj,i (which is sufficient since aj,i.English Pound.3<22). A group of data bits representing the coefficient a0,i are stored in bit positions zero and one (from the right). Another group of data bits representing the coefficient a1,i are stored in bit positions three and four, and so on. In addition, ten bit positions 203-r are allocated to store "guard bits" (lightly shaded regions), which are initially assigned binary values of 0. In the example of FIG. 2A, bit positions two, five, eight, etc. are allocated for guard bits. The guard-bit positions (also referred to as separating-bit positions) serve to separate binary data representing the field coefficients and to accept any carry bit from an immediately preceding group of bit positions 201-r. For example, when arithmetic and logical operations are carried out, a carry bit from the group of bit positions 201-1 is prevented from carrying over into the adjacent group of bit positions 201-2 and, instead, carries over into the guard-bit position 203-1. Also, in the Example of FIG. 2A, the two most significant bit positions 205 in the register 200 are unused (darkly shaded regions). Generally, unused bit positions are located at the most significant bit locations. However, unused bit positions can also be located at the least significant bit locations. If the unused bit positions are located at the most significant bit locations, it is not necessary to assign any particular values to the unused bit positions. Otherwise, the unused bit positions must initially be assigned values of zero.
In the example of FIG. 2A for GF(3.sup.10), the ground field is GF(3), and the following mapping between integer values of each coefficient and corresponding binary data is applicable (the quantities in parentheses are binary data): 0.about.(0,
0); 1.about.(0, 1); 2.about.(1, 0); 3.about.(1, 1) where 3 also corresponds to 0 (because 3 mod 3=0). Thus, in one aspect of the present invention, a dual representation is provided wherein two different numbers in GF(p) (3 and 0 in this example, where p=3) represent a same value (zero). In GF(3.sup.k), two binary bits are used to represent each coefficient of a field element. In general for GF(p.sup.k), the number of bits used to represent a coefficient of a field element depends on the value of p. Where p is given by p=2.sup.m-1, m binary bits (not including guard bits) are used to represent each coefficient of a field element.
Another example of the single-guard-bit representation is shown in FIG. 2B. FIG. 2B is a schematic illustration of a hardware register 210 with a data storage scheme for storing a field element ai=(a4,i, . . . , a1,i, a0,i) of GF(75). In the example of FIG. 2B, five groups of bit positions 211-r (unshaded bit positions, wherein r is the bit position number) are allocated to store binary data representing the field coefficients a4,i, a1,i, a0,i. In this example, p=7=2m-1. Therefore, m=3, and three bits (not including guard bits) are allocated to store the binary data representing each coefficient aj,i. Binary data representing coefficient a0,i are stored in bit positions zero, one and two (from the right). Binary data representing coefficient a1, i are stored in bit positions four, five and six, and so on.
In the example of FIG. 2B for GF(7.sup.5), the ground field is GF(7), and the following mapping between integer values of each coefficient and corresponding binary data is applicable (the quantities in parentheses are the binary data):
0.about.(0, 0, 0); 1.about.(0, 0, 1); 2.about.(0, 1, 0); 3.about.(0, 1, 1); 4.about.(1, 0, 0); 5.about.(1, 0, 1); 6.about.(1, 1, 0); and 7.about.(1, 1, 1) where 7 also corresponds to 0 (because 7 mod 7=0). Thus, the present invention provides a dual representation wherein two different numbers in the field GF(p) (7 and 0 in this example, where p=7) represent a same value (zero).
In addition, in the example of FIG. 2B, five bit positions 213-r are allocated to store guard bits (lightly shaded regions), which are initially assigned binary values of 0. In addition, bit positions three, seven, eight, eleven, etc. are allocated for guard bits. Also, in the Example of FIG. 2B, the twelve most significant bit positions 215 in the register 210 are unused (darkly shaded regions).
An example of the multiple-guard-bit representation is shown in FIG. 2C. FIG. 2C is a schematic illustration of a hardware register 220 with a data storage scheme for storing a field element .alpha..sub.i=(.alpha..sub.7,i, . . . , .alpha..sub.1,i, .alpha..sub.0,i) of GF(3.sup.8). In the example of FIG. 2C, eight groups of bit positions 221-r (unshaded bit positions) are allocated to store binary data representing the field coefficients .alpha..sub.7,i, . . . , .alpha..sub.1,i, .alpha..sub.0,i, and adjacent groups of bit positions 221-r are separate by a group of two guard bit positions 213-r (lightly shaded bit positions). In this example, p=3=2.sup.m-1. Therefore, m=2, and two bits (not including guard bits) are allocated to store the binary data representing each coefficient .alpha..sub.J,1. Binary data representing coefficient .alpha..sub.0,i are stored in bit positions zero and one (from the right). Binary data representing coefficient .alpha..sub.1,i are stored in bit positions four and five and six, and so on. Eight groups of bit positions 223-r are allocated to store two guard bits each (lightly shaded regions), which are initially assigned binary values of 0. In the example of FIG. 2C, bit positions two, three, six, seven, eight, ten, eleven, etc. are allocated for guard bits. There are no unused bit positions in this example.
It is typically desirable to store binary data representing an entire field element of GF(p.sup.k) in a single hardware register 107. However, in cases where a field element is sufficiently large such that its binary representation exceeds the storage capacity of a single register, it is desirable to store binary data representing at least a portion of the field element in the register 107. The arithmetic and logical operations noted above can be carried out by coordinating the operations in multiple registers that together store binary data representing a single field element of GF(p.sup.k). For example, if two registers are used to store binary data representing a single field element of GF(p.sup.k), a right shift by m bits over all the binary data can be carried out by coordinating the two registers such that the least significant bit in left hand register is shifted to the most-significant-bit side of the right-hand register. (The terminology "right-hand" and "left-hand" are used merely to distinguish the registers in the sense that a left-most-bit position in a register corresponds to the most-significant-bit position. The terminology is not intended to suggest that one register is necessarily physically positioned to the left of another register). It should be noted, however, that where two registers are used to store binary data representing a field element, if unused bit spaces are present in the most-significant-bit positions of the right-hand register, a right-shift operation must be implemented to skip over the unused bit spaces.
According to another exemplary aspect of the present invention, the system 100 illustrated in FIG. 1 can be configured to execute the steps shown in the flow diagram illustrated in FIG. 3. FIG. 3 illustrates an approach 300 comprising a plurality of steps that can be executed by the processing unit 105 shown in FIG. 1. As shown at step 301 shown in FIG. 3, the processing unit 105 stores first binary data representing a first field element of GF(p.sup.k) in a first register (e.g., register 109), p being an odd prime number, wherein the first binary data comprises k groups of first data bits, and wherein each group of first data bits corresponds to an associated one of the k coefficients of the first field element. Similarly, as shown at step 303, the processing unit 105 stores second binary data representing a second field element of GF(p.sup.k) in a second register (e.g., register 111), wherein the second binary data comprises k groups of second data bits, and wherein each group of second data bits corresponds to an associated one of the k coefficients of the second field element. Further, as shown at step 305 the processing unit 105 then generates third binary data by executing at least one operation on contents of the first register and contents of the second register such that the k groups of first data bits are processed in parallel and such that the k groups of second data bits are processed in parallel. For example, the operation or operations referred to in step
305 can include an addition operation, a subtraction operation, a shift operation, a logical AND operation, and a NOT operation just to name a few. Combinations of such operations may be carried out, for example, to generate third binary data that represents a third field element equal to the sum of the first and second field elements or a third field element equal to the product of the first and second field elements as will be described in detail below.
The k groups of first data bits can be structured in the first register 109 such that at least one first guard bit is positioned adjacent to the most significant bit of each group of first data bits, each group of first data bits being separated from an adjacent group of first data bits by a corresponding at least one first guard bit. The k groups of second data bits can be structured in the second register 111 such that at least one second guard bit is positioned adjacent to the most significant bit of each group of second data bits, each group of second data bits being separated from an adjacent group of second data bits by a corresponding at least one second guard bit. In addition, the third binary data can comprise k groups of third data bits stored and structured in a third register (e.g., register 113) such that at least one third guard bit is positioned adjacent to the most significant bit of each group of third data bits, each group of third data bits being separated from an adjacent group of third data bits by a corresponding at least one third guard bit. In this regard, the third field element comprises k third coefficients in accordance with the polynomial-basis representation, and each group of third data bits represents an associated one of the k third coefficients. (In the discussion above, "first", "second" and "third" are used as labels.)
The processing unit 105 and a first register (e.g., register 109) can be viewed as means for storing first binary data representing a first field element of GF(p.sup.k). The processing unit 105 and a second register (e.g., register 111) can be view as means for storing second binary data representing a second field element of GF(p.sup.k). The processing unit 105 and a third register (e.g., register 113) can be viewed as means for storing third binary data representing a third field element of GF(p.sup.k). The processing unit 105 can be viewed as means for executing at least one operation on the first binary data and the second binary data such that the k groups of first data bits are processed in parallel and such that the k groups of second data bits are processed in parallel.
At step 307, it is determined whether or not more data should be processed. If more data should be processed, the flow then proceeds back to step 301. If the additional processing involves processing binary data that have already been stored in a manner consistent with steps 301 and/or 303 as a result of another calculation, steps 301 and/or 303 can be skipped as appropriate. If it is determined at step 307 not to process more data, the algorithm ends.
Exemplary approaches for executing step 305 shown in FIG. 3 will now be described. Step 305 can be implemented, for example, using an algorithm GF_p_k_ADD or an algorithm GF_p_k_MUL, which will be described below. GF_p_k_ADD and GF_p_k_MUL were referred to above in the discussion of the algorithms SUM(.alpha., .beta.), PRODUCT(.alpha., .beta.), and REDUCE(.delta., f). As will be described below, certain aspects of algorithms for both GF_p_k_ADD and GF_p_k_MUL depend upon the functional form of the characteristic value p and upon whether the single-guard-bit representation or the multiple-guard-bit representation is used. In particular, certain aspects of these algorithms depend on whether p is written as p=2.sup.m-1, p=2.sup.m+1 or p=2.sup.m.+-.d for some integer m and some small integer d. The integer d is to chosen such that d<2.sup.m-1. However, choosing d to be smaller, e.g. d.ltoreq.p/6, has some advantages as will be described below. Accordingly, exemplary forms for GF_p_k_ADD and exemplary forms for GF_p_k_MUL will be described below with reference to the functional form of the characteristic value p and with reference to whether the single-guard-bit representation or the multiple-guard-bit representation is used.
In view of the comments above, a question arises as to which form of GF_p_k_ADD or which form of GF_p_k_MUL should be used where a given odd prime p can be written in more than one functional form. For example, p=5 can be written as p=2.sup.m+1
for m=2, and p=5 can also be written as p=2.sup.m-d for m=3 and d=3). Generally, it is preferable to utilize the approach for p=2.sup.m-1 over approaches for the other two functional forms. In addition, it is preferable to use the approach for p=2.sup.m+1 over the approach for p=2.sup.m+d with d>1. In general, for p=2.sup.m+d, it is desirable to choose d odd and as close to 1 as possible. Given a value of p, a good (m, d)-pair can be found by trying all m=1, 2, . . . , (2 log.sub.2(p)), and for each such m, selecting d to satisfy p=2.sup.m.+-.d, until a small d is found.
Addition Using Single-Guard-Bit Representation, p=2.sup.m-1
A form of GF_p_k_ADD for the single-guard-bit representation where p=2.sup.m-1 will now be described, and it will be shown how one full addition of two field elements in GF(p.sup.k) (i.e., k additions pertaining to the k coefficients of each field element), including the associated modular reduction, can be performed with a small, fixed number of operations (and without modular reductions which require long divisions) on a hardware architecture having at least w=k(m+1) bit word size. For example, for a 32-bit architecture, full additions in GF(3.sup.10) can be performed using only five instructions.
In the single-guard-bit representation, first binary data representing a first field element .alpha..sub.1=(.alpha..sub.k-1,i, . . . , .alpha..sub.1,i, .alpha..sub.0,i) of GF(p.sup.k) is stored in a first single hardware register (e.g., register
107 shown in FIG. 1) by storing binary data representing .alpha..sub.0,1 in bit positions 0 through m-1, binary data representing .alpha..sub.1,i, in bit positions m+1 through 2m, etc., such that a group of data bits representing one field coefficient is separated by one bit position from an adjacent group of data bits representing another field coefficient. Second binary data representing a second field element .beta..sub.J is stored similarly in a second single hardware register (e.g., register 109). Bit positions v(m+1)-1 where v=1, 2, . . . , k are allocated to separate the binary data representing the coefficients .alpha..sub.0,1, .alpha..sub.1,1, etc. These positions are referred to as guard-bit positions or separating-bit positions and are initially assigned values of "0". Examples of storing binary data according to the single-guard-bit representation for a w=32-bit architecture are shown in FIGS. 2A and 2B described previously for elements of the fields GF(3.sup.10) and GF(7.sup.5), respectively. For example, in FIG. 2A for GF(3.sup.10), two bit positions are reserved for each .alpha..sub.J,i, (which is sufficient since .alpha..sub.J,i.ltoreq.3<2.sup.2).
With first and second binary data representing first and second field elements of GF(p.sup.k) stored in first and second registers, respectively, operations can be carried out to determine the sum of the first and second field elements. The contents of the first and second registers may be referred to as a and b, respectively. Let M2 be a binary quantity whose only "1" bits are in positions j(m+1)-1, j=1, 2, . . . , k, and "0" elsewhere (i.e., M2=2.sup.m+2.sup.m+1+ . . . +2.sup.k(m+1)-1), and let M1 be a binary quantity given by M1=NOT(M2) (bitwise negation). The sum of the first and second field elements can be determined by carrying out the operations given in Equation 13 c=((a+b)&M1)+(((a+b)&M2)>>m) (13) where "&" denotes bitwise logical AND, ">>" denotes right shift, "+" denotes addition with carry, and c refers to the register contents comprising third binary data that represents a third field element equal to the sum of the first and second field elements. The operations reflected in equation 13 can be executed in any manner that is desired. For example, the intermediate quantity (a+b) can be stored in a given register, and the given register can then be overwritten with the final result given by the quantity c, such that the operation (a+b) is performed only once. The binary quantities M1 and M2 may be thought of as mask quantities because, when combined with the quantity (a+b) via the respective logical AND operations as shown in Equation
13, the binary quantities M1 and M2 mask out (set to zero) bits in certain bit positions in the quantities ((a+b) & M1) and ((a+b) & M2). The binary quantity M1 masks out bits in the quantity ((a+b) & M1) corresponding to guard-bit positions. The binary quantity M2 masks out bits in the quantity ((a+b) & M2) corresponding to non-guard-bit positions.
In carrying out Equation 13 with guard bits at positions m, 2m+1, etc., no carry bit will propagate from an m-bit segment corresponding to some .alpha..sub.j,i (or .beta..sub.j,i), into the segment representing .alpha..sub.J+1,i (or .beta..sub.j+1,i). Thus, the field-element sum is really computed component-wise, modulo p, on .alpha..sub.i and .beta..sub.i. The mask operation by M1 ensures the result will have the correct representation with zeros in the guard-bit positions. In the above discussion, M2 is determined first, and then M1 is defined in terms of M2. However, it would be equivalent to first determine M1 as a binary quantity having values of zero at bit positions corresponding to bit positions of first guard bits stored in the first register and having values of one elsewhere and to then determine M2 as M2=NOT(M1).
An example of this form of GF_p_k_ADD where p=2.sup.m-1 is shown in the flow diagram of FIG. 4. The operations shown in FIG. 4 can be executed by a system such as system 100 shown in FIG. 1. It is assumed that steps 301 and 303 shown in FIG. 3
have already been executed by the processor 105 such that first binary data representing a first field element of GF(p.sup.k) are stored in a first register (e.g., register 107 shown in FIG. 1) and such that second binary data representing a second field element are stored in a second register (e.g., register 109). The steps illustrated in FIG. 4 then represent an exemplary implementation of step 305 shown in FIG. 3.
As indicated at step 401, the processing unit 105 adds the contents, a, of the first register 107, and the contents, b, of the second register 109. The addition may involve a carry into a given next most significant bit where necessary. The result of the addition can be stored in another register 111. As indicated at step 403, the processing unit 105 then executes a logical AND operation between the quantity (a+b) stored in register 111 and a first predetermined binary quantity M1 stored in one of the registers (e.g., register 113). The quantity M1 has values of zero at bit positions corresponding to bit positions of first guard bits stored in the first register 107 and has values of one at bit positions corresponding to bit positions of the groups of first data bits stored in the first register. The result of this operation can be referred to as first intermediate data c1 and is stored in one of registers (e.g., register 115).
As indicated at step 405, the processing unit executes a logical AND operation between the quantity (a+b) store in register 111 and a second predetermined binary quantity M2 where M2 is given by M2=NOT(M1). The NOT operation is bitwise logical negation. The result of this operation is stored in one of registers (e.g., register 117). Also indicated at step 405, the processing unit 105 then executes a right shift by m bits on the quantity given by ((a+b)&M2). The result of this operation can be stored in the same register 117 or in a different register. The result of this operation may be referred to as second intermediate data c2 as shown in step 405. At step 407 the processor executes addition between the first intermediate binary data c1 and the second intermediate binary data c2 to generate the third binary data, represented by c, which can be stored in one of the registers (e.g., register 119). According to this approach, the third binary data c represents the sum of the first field element and the second field element.
The algorithms according to FIGS. 3 and 4 have been described in terms of a specified sequence of steps to facilitate the description. However, it is not necessary to carry the steps indicated in FIGS. 3 and 4 in the exact order illustrated. Those of ordinary skill in the art will recognize that the order of steps can be varied and that some of the steps can be carried out simultaneously. For example, steps 301 and 303 shown in FIG. 3 can be carried out simultaneously, and steps 403 and 405
shown in FIG. 4 can be carried out simultaneously.
Additional insight into aspects of the exemplary form for GF_p_k_ADD, described above can be gained by considering the following special case for k=1. In the description above, the number "0" has two representations: both 0 itself and also p=2.sup.m-1. It is only necessary to take this duality into account during input and output operations. Given that p=0 mod p, there is no mathematical problem with this dual representation. Integers in this dual representation can be added modulo p in accordance with the following equation (a+b)mod p=((a+b)mod 2.sup.m)+((a+b)div 2.sup.m) (14) where div 2.sup.m refers to a function that returns the floor of a quotient where the divisor is 2.sup.m. Stated differently, the sum of a and b (in the dual representation) is a+b if a+b<2.sup.m; otherwise, the sum is ((a+b)mod 2.sup.m)+1. These two cases (depending on whether the sum is less than 2.sup.m or not) can thus jointly be treated by the formula (a+b)mod p=[(a+b)mod 2.sup.m]+[(a+b)div 2.sup.m]. Observe that (a+b).ltoreq.2(2.sup.m-1)=2.sup.m+1-2 and that the mod and div operations can be efficiently implemented as bit operations (logical AND, shift) since the module and the divisor are each powers of 2. Thus, given a hardware architecture that can perform operations on (at least) m+1 bit quantities and given the dual representations for a and b, the quantity (a+b) mod p (in the dual representation) can be determined in accordance with Equation 15: c=((a+b)&(2.sup.m-1))+(((a+b)&2.sup.m)>>m) (15). Because a+b<2.sup.m+1-2, no overflow results from carrying out Equation 15 if w.gtoreq.m+1, where w is the register size. Thus, instead of one addition and one modular reduction (a long division) by p, five simple operations are performed where the quantities 2.sup.m and 2.sup.m-1 are fixed and can be considered constant bit-masks. In the discussion above, it was assumed that k=1 to facilitate the discussion. Of course, the present invention is to be carried out using a value of k that is greater than one. Nevertheless, the discussion for k=1 provides insight into the form of GF_p_k_ADD and the choices of the binary quantities M1 and M2 according to the present invention for use where k is greater than one.
In addition, the dual representation, in which the number "0" is represented as both 0 itself and also as p=2.sup.m-1, facilitates determining the sum of two field elements according to the approach described above. As noted above, instead of using one addition and one modular reduction (a long division) by p to determine the sum of two field elements, the dual representation allows using five simple operations on binary data representing the two field elements to determine their sum.
With regard to the extension field GF(p.sup.lk), as noted above in the discussion regarding SUM(.alpha., .beta.) and PRODUCT(.alpha., .beta.), each element of the extension field is represented as a vector (polynomial) of length l, where each component (coefficient) is an element of GF(p.sup.k) and can be stored according to the single guard-bit representation as described above. Adding two elements in the extension field GF(p.sup.lk) can now be done using 5l operations instead of lk operations as would be required using conventional approaches. Thus, even for relatively small values of k, a significant increase in computational speed can be achieved.
In addition, as will be described later, the above-described exemplary form or GF_p_k_ADD is also applicable to binary data stored according to the multiple-guard-bit representation for p=2.sup.m-1.
EXAMPLE 1
A numerical example illustrating the approach shown in FIGS. 3 and 4 will now be described with reference to FIG. 5. The operations described below can be carried out using a system such as system 100 shown in FIG. 1, which has been previously described. FIG. 5 illustrates register contents resulting from carrying out the operations as described above with regard to FIGS. 3 and 4. In FIG. 5, reference numerals 501 517 refer to 32-bit registers, and the binary data stored within the registers
501 517 are configured according to a single guard-bit representation. In addition, in this example the binary data represents field elements of the finite field GF(3.sup.10), and the characteristic p is given by p=2.sup.m-1=3. Accordingly, M=2, and 2
bits of register space are allocated for each coefficient of the finite field element. A single guard bit (lightly shaded bit locations) separates adjacent binary data representing adjacent coefficients of the finite field element. In addition, in this example there are two unused bits of register space (darkly shaded bit locations) at the most significant bit positions of each register 501 517.
In this example, first binary data, a, representing a first field element (2, 2, 0, 2, 0, 3, 2, 2, 0, 0) (in vector notation) and second binary data, b, representing a second field element (0, 1, 2, 2, 0, 2, 1, 3, 0, 0) (in vector notation) are stored in first and second registers 501 and 503, respectively (steps 301 and 303). Each coefficient of the field elements is itself an element of the ground field GF(3), and each coefficient is represented by binary data according to the following associations: 0.about.(0, 0); 1.about.(0, 1); 2.about.(1, 0); 3.about.(1, 1). A dual representation is provided wherein two different numbers in GF(p) (3 and 0 in this example, where p=3) represent a same value (zero). Thus, binary data given by (1,
1), which corresponds to 3, also represents 0 (because 3 mod 3=0). Each guard-bit position in registers 501 and 503 is initially assigned a value of zero.
The register contents a and b stored in registers 501 and 503, respectively, are then added via addition (corresponding to step 401). The result (a+b) is stored in a third register 505. The contents (a+b) of register 505 are then combined via a logical AND operation with the contents of register 507, in which the quantity M1 has been stored), and the result c1=(a+b)&M1 is stored in register 509 (corresponding to step 403). In addition, the quantity M2=NOT (M1) is stored in register 511. The contents (a+b) of register 505 and the contents M2 of register 511 are then combined via a logical AND operation, and the result (a+b)&M2 is stored in register 513 (corresponding to step 405). The contents (a+b)&M2 of register 513 are then right shifted by m=2 bits, and the result is stored in register 515 (corresponding to step 405). The contents c1 of register 509 and the contents c2 of register 515 are then added via addition, and the result is stored in register 517. The result is given by (2, 3,
2, 1, 0, 2, 3, 2, 0, 0) (in vector notation) and is equivalent to (2, 0, 2, 1, 0, 2, 0, 2, 0, 0) as expected.
In the above example, carries are generated into three guard-bit positions (bit positions eight, fourteen and twenty) as shown in register 505 upon adding the first binary data, a, and the second binary data, b. The guard-bit positions prevent the carry bits from affecting the values of the adjacent group of data bits. Accordingly, in this example, the guard-bit positions (lightly shaded bit positions) allow carrying out operations on ten groups of data bits in parallel, where the ten groups of data bits represent the ten field coefficients.
As a matter of convenience in describing the above operations, the binary results of various steps as shown in FIG. 5 have been described as being stored in separately identified registers. However, those of ordinary skill in the art will recognize that various steps can be carried out by reusing registers in a manner that over-writes previously stored binary data from an earlier step. For example, the first and second intermediate binary data c1 and c2 shown in registers 509 and 515 can instead be stored in registers 501 and 503 by over-writing the previously stored binary a and b to utilize register space more efficiently. This completes the discussion of Example 1.
In another aspect of the invention relating to computations involving field elements of an odd-characteristic finite field where p=2.sup.m-1, a hardware apparatus can be provided for carrying out operations for the exemplary form of GF_p_k_ADD illustrated in FIG. 4. FIG. 6 is a functional block diagram of such an exemplary hardware apparatus. In particular, the apparatus 600 illustrated in FIG. 6 provides another approach for generating third binary data, denoted as c in FIGS. 4 and 6, that can represent the sum of a first field element and a second field element of GF(p.sup.k). In FIG. 6, solid lines represent electrical connections for the flow of data, and dotted lines represent electrical connections for the flow of control signals. Solid lines that cross are not connected unless a black dot is present at the intersection of the lines, such as connection 623. The apparatus 600 is described here in the discussion pertaining to the single-guard-bit representation, but the apparatus
600 is equally applicable to a multiple-guard-bit representation, which is described later.
The apparatus 600 comprises a first register 601 and a second register 603 for holding first binary data (register contents "a") and second binary data (register contents "b"), respectively. The first binary data and the second binary data represent field elements of the finite field GF(p.sup.k). Here, it is assumed that the first and second binary data in the first and second registers 601 and 603 are already configured with zeros at guard-bit locations such as illustrated, for example, as in FIG. 2A. The apparatus 600 also comprises a combinatorial logic and clock device (clock/logic) 605, an addition gate (+) 607 (also referred to as an adder), a register 609 for holding the sum of register contents a and b, a first logical AND gate (&1) 611, a mask register 613 for generating and holding a first predetermined binary quantity M1 upon input m, a NOT gate (NOT) 615, and a second logical AND gate (&2) 617. In addition, the apparatus 600 comprises a right shift gate (>>) 619 and an output register 621 for holding a result "c". Right shift gates are known to those of ordinary skill in the art, and such gates shift the values therein to the right by a selected number of bits and enter a corresponding number of zeros into the most significant bit positions. The clock/logic unit 605 can also have an output terminal (not shown) for providing a signal to be input to another hardware apparatus to initiate computations in another hardware apparatus when computations in the apparatus
600 are complete. For example, another hardware apparatus can be another apparatus 600 or a multiplier apparatus 1200 such as illustrated in FIG. 12 to be described later.
The operation of the apparatus 600 illustrated in FIG. 6 will now be described. First binary data representing a first field element and second binary data representing a second field element of GF(p.sup.k) are input on lines labeled a and b to the first register 601 and the second register 603, respectively. It is assumed that the first and second binary data are already configured with zeros at guard-bit positions (e.g., by a processor that is not shown). Binary data representing the quantity m is also provided to the right-shift gate (>>) 619. Mask register 613 receives a first predetermined binary quantity M1 (a mask quantity) from a processor (not shown), where M1 is a quantity with values as described previously. Alternatively, mask register 613 can also comprise a circuit that generates the quantity M1 upon input of binary data representing the quantity m. Making such a circuit is within the purview of one of ordinary skill in the art.
Computation is initiated by a start signal on the input line labeled s. The first and second binary data, the binary data representing the quantity m, and the start signal can be provided from a processor (not shown) or from another hardware apparatus (not shown), such as a multiplier apparatus as illustrated in FIG. 12 to be described later, via a conventional routing circuit, for example.
When the values of the first binary data and second binary data in the registers 601 and 603 are stable, a signal s1 locks those values into the registers 601 and 603, respectively. The adder 607 then adds the values provided at its two inputs from register 601 and 603. When the output of the adder 607 is stable, the output from adder 607 is locked into register 609 by a signal on the line labeled s2. The time required for a given value to become stable in a given register can be conventionally determined by one of ordinary skill in the art in view of the circuitry design, and a locking signal (e.g., on line s1 or line s2) can be timed to occur after this time. At this point, the register 609 holds binary data representing corresponding to the quantity a+b shown in step 401 of FIG. 4.
The binary data in register 609 are then directed from register 609 to the AND gate 611. The AND gate 611 performs a logical AND between the binary data from register 609 and the mask quantity M1 from mask register 613. The result of this logical AND operation is equivalent to the quantity c1 illustrated at step 403 of FIG. 4. The output from the first AND gate 611 is then directed back to the input of the first register 601, and another signal on the line labeled s1 then locks the corresponding values into the first register 601 at the appropriate time. In this regard, it will be understood that a signal on the line s1 can be timed appropriately such that it is unnecessary to provide a multiplexer or switch at the point labeled
623 to route data output from register 609. Of course, the apparatus 609 could be provided with a multiplexer or switch at the point 623 for routing data if desired.
While the operations described in the immediately preceding paragraph are being carried out, the following operations are more or less simultaneously carried out. Output from the register 609 is directed to the second AND gate 617, and the first predetermined binary quantity M1 is directed to a logical NOT gate 615. The output from the logical NOT gate 615 is also directed to an input of the second AND gate 617. The data at these the inputs of the second AND gate 617 are then combined via a logical AND operation and are directed to a right-shift gate 619. The right-shift gate 619 executes a right-shift by m bits on the data input from the second AND gate 617 according to the input on the line labeled m. The output of the right-shift gate
619 is then directed to the input of the second register 603. The result of this group of operations, which is input to the second register 603, corresponds to the quantity c2 referred to in step 405 of FIG. 4.
When the values of the binary data now stored in the first and second registers 601 and 603 are stable, the signal s1 locks these values into the first and second registers 601 and 603. At this point, the adder 607 adds the binary data from the first and second registers 601 and 603 and directs the output to output register 609. The binary data now stored in register 609 is then directed to the output register 621, and a signal on line s3 locks the binary data into the register 621 at the appropriate time. This binary data corresponds to third binary data denoted as c at step 407 of FIG. 4.
Those of ordinary skill in the art will appreciate that many variations of the apparatus 600 are possible according to the present invention. For example, each internal w-bit register 601, 603, 609, 613, and 621 can be replaced with multiple parallel (i.e., coordinated) registers, at least one of which holds binary data representing at least two coefficients of a field element. Further, the first AND gate 611, the adder 607, the second AND gate 617, the NOT gate 615, and the right-shift gate 619 shown in FIG. 6 are accordingly replaced with multiple parallel (i.e., coordinated) copies of each.
In the apparatus 600 as described with reference to FIG. 6, first and second binary data are input to first and second registers 601 and 603, respectively, with zeros already configured at appropriate guard-bit positions. The first and second binary data may be provided in this configuration by a processor (not shown), for example, that inserts zeros at guard-bit positions as appropriate. Thus, the processor (not shown) and the first register 601 can be viewed as means for storing first binary data, and the processor (not shown) and the second register 603 can be viewed as means for storing second binary data. Further, the register 621 and/or the register 609 can be viewed as means for storing third binary data. Moreover, the clock/logic device 605 and any or all of the remaining devices illustrated in FIG. 6 can be viewed as means for executing at least one operation on the first binary data and the second binary data.
By utilizing the apparatus 600 along with a processor (not shown), the apparatus 600 has flexibility to be used with field elements for various choices of p and k for the finite field GF(p.sup.k), where p is of form p=2.sup.m-1. In particular, the quantity m is a variable, and the right-shift gate 619 responds accordingly to the input value of m. In addition, the mask register 613 holds an appropriate form of the first predetermined binary quantity M1 that depends upon the quantity m. The quantity M1 is "predetermined" in the sense that once the quantity m is chosen (which determines the quantity p), the form of the quantity M1 directly follows as described above.
In another exemplary aspect of the invention, the apparatus 600 can be modified, such as shown by hardware apparatus 700 illustrated in the block diagram of FIG. 7, for a situation in which a dedicated choice of the finite field GF(p.sup.k) is made and remains unchanged. That is, the quantities m, p, and k, as well as the choice of whether the representation is a single-guard-bit representation or a multiple-guard-bit representation, are fixed, and the hardware apparatus 700 is dedicated to those choices. In this situation, the hardware apparatus 700 can receive initial binary data representing field elements wherein the initial binary data are not configured with zeros in guard-bit positions. Rather, the hardware apparatus 700 itself configures the initial binary data with zeros in appropriate guard-bit positions to generate first and second binary data without the need for a processor to configure the first and second binary data with zeros in guard-bit positions. The hardware apparatus 700 illustrated in FIG. 7 will now be described.
The hardware apparatus 700 illustrated in the functional block diagram of FIG. 7 shares various common features and operational aspects with the apparatus 600 illustrated in FIG. 6, and like features are given like reference numerals in FIGS. 6
and 7. For example, reference numerals 707, 711, 715 and 717 in FIG. 7 correspond to the reference numerals 607, 611, 615 and 617 in FIG. 6. Discussion of aspects of the apparatus 700 that are common to the apparatus 600 will not be duplicated here. Rather, aspects in which the apparatus 700 differs from the apparatus 600 will be discussed.
The apparatus 700 possesses several features not found in the apparatus 600. In particular, the apparatus 700 possesses guard-bit-insertion circuits 701' and 703' (GB insertion) and a guard-bit-removal circuit 709' (GB removal). Exemplary implementations of these circuits will be described in FIGS. 8A and 8B. As shown in FIG. 7, the guard-bit-insertion circuits 701' and 703' are functionally arranged at the input to the hardware apparatus 700, and the guard-bit-removal circuit 709' is functionally arranged between the register 709 and the output register 721. The guard-bit-insertion circuits 701' and 703' operate to receive initial binary data a' and b' (without guard bits) corresponding to first and second field elements of GF(p.sup.k) and to insert appropriate guard bits into that data. In other words, the guard-bit-insertion circuits 701' and 703' transform the initial binary data a' and b' into first binary data and second binary data having guard bits, in particular, with zeros at guard-bit positions. The guard-bit-removal circuit 709' has the opposite function--namely, to receive third binary data c representing a computational result and having guard bits and to remove those guard bits, thereby forming final binary data c' representing the computational result, but without guard bits.
The apparatus 700 also lacks certain features present in the apparatus 600 because they are not needed in the apparatus 700. In particular, the apparatus 700 lacks an input line for the quantity m into the right-shift gate 719 and into the mask register 713. Such an input line is not necessary given that m is fixed. Rather, the right-shift gate 719 is initialized once with the value of m to execute the appropriate right shift. Similarly, the mask register 713 is initialized once with the appropriate form of M1. Conventional electrical connections can be used for carrying out these initializations and are not shown in FIG. 7. The operation of the apparatus 700 illustrated in FIG. 7 is substantially similar to that described for the apparatus 600 illustrated in FIG. 6 except for the operational distinctions noted above.
In the apparatus 700 as described above, the first register 701 and the guard-bit insertion circuit 701' can be viewed as means for storing first binary data. The second register 703 and the guard bit insertion circuit 703' can be viewed as means for storing second binary data. Further, the register 709 can be viewed as means for storing third binary data. Moreover, the clock/logic device 705 and any or all of the remaining devices illustrated in FIG. 7 can be viewed as means for executing at least one operation on the first binary data and the second binary data.
The guard-bit-insertion circuits 701' and 703' and the guard-bit-removal circuit 709' referred to in FIG. 7 will now be described in greater detail with reference to FIGS. 8A and 8B. The exemplary circuits illustrated in FIGS. 8A and 8B reflect a 32-bit register arrangement configured for the field GF(7.sup.8); however, the concepts reflected in FIGS. 8A and 8B are generally applicable to registers of others sizes and to other finite fields GF(p.sup.k). FIG. 8A illustrates an exemplary guard-bit-insertion circuit 800 that can be used for guard-bit-insertion circuits (GB insertion) referred to by reference numerals 701' and 703', respectively, in FIG. 7. As shown in FIG. 8A, the circuit 800 comprises a first register 801 with a plurality of bit positions 803 (e.g., 32 bits). The circuit 800 also comprises a register 805 having plural groups 807 of bit positions intended to store binary data representing field coefficients of a field element of GF(p.sup.k) and a plurality of guard-bit positions 809 (lightly shaded regions). The circuit 800 also comprises a plurality of electrical connections 811 configured to route data from register 801 to register 805 in a manner that provides a guard-bit position 809 adjacent to the most significant bit position of the preceding group of 3-bit positions 807. The guard-bit positions 809 are electrically grounded to provide zeros for these bit values, but these electrical connections are not shown in the FIG. 8A. Such a circuit can be formed, for example, using conventional lithographic techniques.
In this manner, each 3-bit group 807 of bit positions in register 805 can store binary data representing a coefficient of a field element of GF(7.sup.8), and each group 807 of bit positions in register 805 is separated from an adjacent group 807
of bit positions by a single guard bit 809. Accordingly, the guard-bit-insertion circuit 800 allows initial binary data representing coefficients of a field element to be transferred in parallel from register 801 to register 805 in a manner that inserts guard bits between groups of data bits representing coefficients of the field element.
Similarly, an exemplary guard-bit-removal circuit 820 is illustrated in FIG. 8B for a 32-bit GF(7.sup.8) configuration. As illustrated in FIG. 8B the guard-bit-removal circuit 820 is the mirror image of the guard-bit-insertion circuit 800 shown in FIG. 8A. The guard-bit-removal circuit 820 comprises a register 825, a register 821 and a plurality of electrical connections 831. The register 825 comprises plural groups 827 of bit positions and a plurality of guard-bit positions 829, each guard-bit position being located adjacent to the most significant bit of a given group 827 of bit positions. As illustrated in FIG. 8B, the electrical connections 831 are configured such that binary data representing field coefficients of a field element stored in register 825 are transferred to register 821 in a manner that eliminates guard bits between adjacent groups of data bits representing field coefficients. That concludes the discussion of FIGS. 8A and 8B.
Multiplication Using Single-Guard-Bit Representation
According to another aspect of the invention, an exemplary form of GF_p_k_MUL for the single-guard-bit representation will now be described for computing the product of two (non-zero) field elements .alpha..sub.1 and .beta..sub.J in GF(p.sup.k). The case where one field element is zero is trivial and does not need to be described. This discussion is applicable to p written in the functional forms p=2.sup.m-1, p=2.sup.m+1, p=2.sup.m-d and p=2.sup.m+d.
As noted previously in the discussion pertaining to Equations 9 12, a multiplicative group is cyclic, and a field element g, therefore, can be found such that any other non-zero field element can be written as g.sup.x for some integer x<p.sup.k. Thus, the discrete logarithms of all field elements, as well as the corresponding anti-logarithms, can be pre-computed once, and table "look-ups" can be used to calculate the product of two field elements. Similarly, with regard to the present invention for binary data "a" representing any .alpha..sub.j according to the single-guard-bit representation (i.e., where guard bits of value zero are placed in bit positions v(m+1)-1 where v=1, 2, . . . , k--that is, every successive m-th bit position), the following relations are applicable: DLOG{a}=x (16) ANTILOG{x}=a (17) where 0.ltoreq.x<p.sup.k such that g.sup.x=.alpha..sub.J. Accordingly, in the single-guard-bit representation, multiplication of field elements of GF(p.sup.k) can be accomplished in accordance with the relation: c=ANTILOG{(DLOG{a}+DLOG{b})mod(p.sup.k-1)} (18) where "a" is first binary data (register contents) stored according to the single-guard-bit representation representing a first field element, "b" is second binary data (register contents) stored according to the single-guard-bit representation representing a second field element, and "c" is third binary data (register contents) representing a third field element equal to the product of the first and second field elements. For example, in terms of the notation used with the algorithms PRODUCT(.alpha.,.beta.) and GF_p_k_MUL described previously, "a" can represent a field element .alpha..sub.j of GF(p.sup.k), and "b" can represent a field element .beta..sub.1-J of GF(p.sup.k). Accordingly, the product of two field elements of GF(p.sup.k) in the single-guard-bit representation according to the present invention can be computed using only three table look-ups and one modular addition.
According to an exemplary aspect of the present invention, the system 100 illustrated in FIG. 1 can be used to implement the above-noted approach for multiplication of field elements. In particular, the system 100 can be configured such that the processor 105 executes the exemplary series of steps illustrated in FIG. 9 to generate third binary data referred to in step 305 of FIG. 3. It is assumed that steps 301 and 303 shown in FIG. 3 have already been executed by the processor 105 such that first binary data representing a first field element of GF(p.sup.k) are stored in a first register (e.g., register 107 shown in FIG. 1) and such that second binary data representing a second field element are stored in a second register (e.g., register
109). The steps illustrated in FIG. 9 then represent an exemplary implementation of step 305 shown in FIG. 3. The approach 900 illustrated in the flow diagram of FIG. 9 will now be described.
FIG. 9 is a flow diagram illustrating steps executed by the processing unit 105 for processing the first and second binary data to generate third binary data that represents the product of the first and second field elements. As indicated at step 901 of FIG. 9, the processor determines the quantities DLOG(a) and DLOG(b) where "a" represents the contents of the first register and "b" represents the contents of the second register. In step 901, the DLOG operation represents a look-up operation from a look-up table of discrete logarithms in binary form of non-zero field elements of GF(p.sup.k). The look-up table of discrete logarithms can be stored in memory 101 illustrated in FIG. 1. An example of a simple look-up table for the DLOG operation for GF(3.sup.2) is given in FIG. 10A. The look-up table in FIG. 10A will be described in greater detail below.
At step 903, the processing unit 105 executes addition of the quantities DLOG(a) and DLOG(b) and reduces the result of this addition modulo (p.sup.k-1). At step 905, the processing unit 105 determines the quantity ANTILOG{(DLOG(a)+DLOG(b))mod(p.sup.k-1)}. The ANTILOG operation represents a look-up operation from a look-up table of anti-logarithms, wherein ANTILOG(X)=g.sup.x, where g is a generator of GF(p.sup.k). An example of a simple look-up table for the ANTILOG operation for GF(3.sup.2) is given in FIG. 11A. The look-up table in FIG. 11A will be described in greater detail below. The result of the operations set forth in step 905 is third binary data referred to in step 305 of FIG. 3 which, in this example, represents a third field element that is the product of the first and second field elements.
Exemplary look-up tables of discrete logarithms and anti-logarithms referred to above for the single-guard-bit representation will now be described. For the sake of brevity, a look-up table of discrete logarithms may hereinafter also be referred to as a DLOG table, and a look-up table of anti-logarithms may hereinafter also be referred to as an ANTILOG table.
In one example, the DLOG table can be indexed by binary data corresponding to field elements of GF(p.sup.k) wherein the binary data are viewed as memory addresses. This type of DLOG table is referred herein as a "directly addressed" DLOG table and uses consecutive binary data corresponding to field elements as memory addresses. For p=2.sup.m-1, for example, the DLOG table can be indexed (addressed) by k(m+1)-bit binary strings. Alternatively, the DLOG table can be indexed using binary strings k(m+1)-1 bits in length if the most significant guard bit of the binary data representing each field element not used for addressing (the most significant guard bit is not needed for table look-ups). If the most significant guard bit is not used for addressing, the size of the DLOG table can be reduced by a factor of two. Assuming that binary data representing field elements is referred to as a(t) (where t is the polynomial variable in a polynomial basis representation), the corresponding looked-up value from the DLOG table at an address corresponding to a(t) is the integer "x" where x=DLOG{a(t)}. In a directly addressed DLOG table for p=2.sup.m-1, given that the single-guard-bit representation of field elements allocates every m-th bit to be a guard bit (for a total of k guard bits), there are only 2.sup.km relevant entries in the DLOG table that are actually used during look-up. These relevant entries are those for which a quantity a(t) has zeros in the guard-bit positions. Such a DLOG table overall has 2.sup.k(m+1) reserved memory locations (rows), if indexed using binary strings k(m+1) bits in length, or 2.sup.k(m+1)-1 rows, if indexed using binary strings k(m+1)-1 bits in length. Stated differently, rows in a direct addressing DLOG table for which the field element a(t) (the memory address) has a "one" in any guard-position are not used.
For example, if k=8 and p=3 (i.e., m=2), each element of the field can be represented as a 24-bit string (with zeros in every third bit position), and the DLOG table can, accordingly, be indexed by strings of length 8(2+1)=24 (equivalent to
8(log.sub.2(3)+1)). In this example, there are 224 reserved memory locations in the DLOG table, but only one in every 256 of these will contain data that is actually accessed, since look-ups will only be conducted for addresses corresponding to field elements, i.e. strings whose guard-bit locations are zeros. In the above example, there are k=8 guard-bit locations, so only one table index in every 2.sup.8 (=256) will be used for actual table look-ups. However, for moderate k and m (e.g., k(m+1)<25), implementing a look-up table in this manner is in many cases still feasible.
As noted above, the most significant guard bit is not needed to index a directly addressed DLOG table, and this observation allows saving a factor of two in the size of the DLOG table. As will be described below, for p of the form p=2.sup.m+1, p=2.sup.m-d and p=2.sup.m+d (d>1 and d odd), each coefficient of a field element is represented using m+2, m+1 and m+2 bits of binary data (not including guard bits), respectively, instead of m bits as for p=2.sup.m-1. Directly addressed DLOG tables for p of forms other than p=2.sup.m-1, therefore, are correspondingly larger than directly addressed DLOG tables for p of the form p=2.sup.m-1.
An exemplary DLOG table 1000 illustrating concepts described above for the single-guard-bit representa