One of the problems with most of the identification techniques such as ID cards, credit cards and computer passwords is that a party P proves his identity by revealing a word i(P) that is memorized or printed on a card. An adversary cooperating with a dishonest verifier can either get a copy of the card or otherwise learn the word i(P). The adversary can later on use i(P) to pretend to be P and, thus, is granted the access or services implied by i(P).

An obvious solution to this problem is to use a zero-knowledge proof to convince the verifier V that the prover P knows i(P) without revealing a single bit about i(P). Such a proof goes one step further than the zero-knowledge proofs considered in the preceding section. Previously P revealed one bit of information to V, namely, that the theorem is true, there is a 3-coloring or a satisfying truth-value assignment, etc. Not a single bit of information is now revealed. This difference can be expressed briefly by saying that, while we previously were talking about zero-knowledge proofs of theorems, we are now talking about zero-knowledge proofs of knowledge.

Of course, the latter types of proofs can be extended to concern proofs of theorems as well. This means, for instance, that P convinces V that he has settled Fermat's Last Theorem without revealing a single bit of his information, not even whether he has established the theorem or found a counterexample! A way to do this is to let i(P) consist of Fs information, beginning with the statement of the theorem or its negation and followed by the proof or counterexample.

In the following protocol the existence of a trusted agency is assumed. The only purpose of the agency is to publish a modulus n which equals the product of two large primes p and q but to keep the primes themselves secret. For a technical reason to be explained later, the primes are assumed to be congruent with 3 (mod 4). After publishing n, the agency may cease to exist.

P's secret identification i(P) consists of k numbers c,, ... , c_{k} with 1 <c.< n. His public identification pi(P) consists of k numbers d_{f}, ... ,d_{k} with 1 < dj < n, and each d_{j} satisfying one of the congruences

(*) dfj = ± 1 (mod n).

The verifier V knows the public n and pi(P). P wants to convince her that he knows i(P). The following four steps constitute one round of the protocol. The number of rounds decreases the probability of P cheating.

Step 1: P chooses a random number r, computes the numbers (+ r^{2}, mod n) and tells one of them, call it x, to V.

Step 2: V chooses a subset S of the set {1,.. . , k} and tells it to P. Step 3: P tells V the number

y = (rT_{c}, mod n), where T_{c} is the product of the numbers Cj such that j belongs to S.

Step 4: V verifies the condition

x = ± y^{2} T_{d} (mod n),

where T_{d} is the product of the numbers dj such that j belongs to S. If it is not satisfied, V rejects. Otherwise, an eventual new round is begun.

Observe first that the verification condition in Step 4 should hold because

y^{2}T_{d} ee r^{2}T^{2}T_{d} = ±r^{2} = ± x(mod«),

the second congruence being a consequence of (*). The use of r is necessary because, otherwise, V would find out any c_{}} by choosing S = {j}. The special form of the primes p and q guarantees that the ^-numbers can range over all integers with the Jacobi symbol + 1 (mod n). This implies that V can be sure that the c-numbers exist. A tacit assumption needed for (*) is that (Cj, n) = 1, for all j. If this is not the case, then n can be factorized, and the whole world collapses! A minor technicality useful in practical implementations is to use the inverses of the squares of the c-numbers rather than the squares themselves when defining (*). Of course, the whole protocol is based on the intractability of extracting square roots (mod n) when the factorization of n is unknown.

This implies that Fgets no information about the c-numbers and, in fact, Fcan play both the roles of P and V in the protocol. On the other hand, the only way for P to cheat is to guess the set S in advance, and provide ( ± r^{2}T_{d}, mod ri) as x in Step 1 and y — r in Step 3. The probability for a successful guess is 2~^{k} and, hence, 2~^{kt} in t rounds. A reason for this rapid convergence is that the k numbers in P's identification invoke an element of parallelism in the protocol. Assuming the intractability of factorization and extraction of square roots (mod n), our protocol constitutes a zero-knowledge proof of identity. It follows that not even a crooked Vera can extract any information that could later on be used to convince the true Vera about the knowledge of i(P).

We remark in passing that a rigid formalism is not needed for our purposes in this chapter. In such a formalism P and V would be machines executing algorithms within certain time bounds and having access to common or separate random numbers. In the discussion of many subtleties such a more penetrating formalism is helpful.

Example 6.5. The trusted agency has published the modulus n = 2773. P's secret identification i(P) consists of the 6-tuple

c, = 1901, c_{2} = 2114, c_{3} = 1509 ,

c_{4} = 1400, c_{s} = 2001, c_{6} = 119 .

(See here also Example 4.1.) The squares of these numbers (mod n) are, in the same order,

582, 1693, 448, 2262, 2562, 296.

P now chooses his public identification pi(P) to consist of the 6-tuple

d_{x} = 81, d_{2} = 2678, d_{3} = 1207 ,

<2_{4} = 1183, d_{5} = 2681, d_{6} = 2595.

Then the congruences (*) will be satisfied for j = 1,..., 6 and, moreover, + 1 appears on the right side for j = 1, 3,4, 5 and — 1 appears for j = 2,6. Assume that P chooses r = 1111 and tells V the number

x = ( — r^{2}, mod n) = 2437 .

Assume that V chooses S = {1,4, 5,6} and computes T_{d}= 1116. P computes T_{c} = 96 and tells V the number y = 1282. Because

y^{2} T_{d} = 1282^{2} -1116 = 2437 = x (mod n),

The verification condition holds. Similarly, the choices r = 1990,

x = (r^{2}, mod n) = 256

and S = {2,3, 5} give the values

T_{d} = 688, T_{c} = 1228, y = 707 . The verification condition — y^{2}T_{d} = —2517 = x (mod n) is satisfied. □

We observed that not even a crooked verifier can gain any information that could later on be used to convince the true verifier about the knowledge of i(P). Still, some more subtle on-line cheating schemes are conceivable. Assume that a crooked verifier and prover, V_{c} and P_{c}, collaborate in trying to convince the true verifier Kthat P_{c} knows the identification i(P) of the true prover P. Assume, further, that V_{c} is in the position to test P's knowledge of i(P). For instance, P wants to pay a bill to V_{c}. Then at the same time P_{c}, who can secretly communicate with V_{c} by radio or telephone, tries to gain access to a top-secret area, the access being granted by V if knowledge of i(P) is shown. Now P_{c} and V_{c} can act as communication links and, in fact, the whole protocol will be executed between V and P. V will be convinced of the knowledge of i(P) but gets the wrong idea that P_{c} knows i(P)l

We now discuss another identification scheme, based on a knapsack-type problem. We present first a simple version of the scheme, and then a bit more involved one. The latter can be further generalized but it is not yet properly understood to what extent, if any, such generalizations and complications contribute towards the security of the scheme.

Let A = (a,,. .., a_{n}) be a knapsack vector with an even n. It is an /VP-complete problem to pick up half of the components of A in such a way that they sum up to the same number as the remaining half. Hence, also the following problem, being more general, is /VP-complete. Given a knapsack vector A and a vector B = (fcj,.. ., b_{n}) with integer components, maybe some of them negative. Find, if possible, a permutation B_{p} of the vector B such that AB_{p} = 0. For instance, if

A = (3,7, 8,2,12,14), B = (1,1,1, - 1, - 1, - 1),

then the permutation p transposing the 2nd and 5th components but leaving the other components fixed satisfies the condition because

,4(1, - 1,1, - 1,1, - 1) = 0.

(Here the second vector in the product is understood to be a column vector.)

The setup is now as follows. The trusted agency has published the knapsack vector A = (a„ . .., a_{n}). (w need not be even.) P's public identification pi(P) is the vector B = {b_{t},.. . ,b_{n}) with integer components. His secret identification i(P) is a permutation p such that AB_{p} = 0. The protocol uses, in addition, a cryptographic hash function h(x, y). We do not define hash functions formally. The points essential for us are that the value h(x, y) can be easily computed from x and y, whereas x and y cannot be recovered from the value and, moreover, h(x,y) is not long in comparison with x and y. The previously discussed XOR-operator is a simple hash function, if xXORy does not leak information about x or y. The hash function h(x, y) can be published by the agency or agreed upon between P and the verifier V. Of course, it is desirable that not even h(x, y) and one of the arguments gives away the other argument. This condition is clearly not satisfied by the XOR-operator.

Each round in the protocol, where P tries to convince V about his knowledge of i(P), consists of the following steps.

Step 1: P chooses a random vector R and a random permutation q (both having the dimension n), and tells V the values h(q, AR) and h(pq, R_{q}).

Step 2: V chooses a number d = 0 or d = 1 and asks from P the vector C = R_{q} + d-B_{pq}. After receiving C, V asks from P either the permutation q or the permutation pq.

Step 3: If she asked for q, V verifies the condition h(q, A_{q}C) = h(q, AR). If she asked for pq, V verifies the condition

h(pq, C - dB_{pq}) = h(pq, R_{q}) .

Observe first that V has all the data needed for the verification in Step 3, either from Steps 1 and 2 or from the public information. The validity of the second verification condition is obvious by the definition of C. The validity of the first condition follows because

(The equation A_{q}B_{pq} = 0 holds because AB_{p} = 0 and, hence, the product equals 0 also if both factors are permuted by the same permutation.)

Coming back to the illustration before the protocol, assume that

R = (15,1,5,9,2,6), d= 1 and q = (1234).

We use here the customary notation for permutations: q is a mapping that permutes the components 1,2,3,4 cyclically and leaves the two other components fixed. Then

Hence, the verification condition will be satisfied.

Observe that the number of permutations is huge even for relatively small values of n. This is very important from the point of view of security.

In a more sophisticated version of the protocol, A is an m x n matrix with integer entries, and A_{p} is the matrix obtained from A by applying the permutation p to the columns of A. A small prime s is fixed, typically s = 251. A and s are published by the agency, or otherwise agreed upon by all users. As before, P's public identification pi(P) is an n-vector B. His secret identification i(P) is a permutation p such that

AB_{p} = 0 (mod s),

where the right side means an m-vector of zeroes. (In our earlier simple version, m = 1 and the congruence is an equation.) The protocol is basically the same as before but also the choice of d will be more general, now 0 < d < s. The components of all vectors are reduced modulo s and, moreover, AR and A_{q}C are now m-vectors. Everything else remains the same. Also the validity of the verification conditions follows exactly as before and, hence, an honest prover P always passes the test. It is easy to see that the probability of success for a dishonest prover P (not knowing the permutation p) is at most (s + l)/2s. The protocol is zero-knowledge because the individual messages sent by P convey no knowledge. Further generalizations of the scheme are obtained, for instance, by replacing the matrix-vector products with matrix-matrix or even with tensor-tensor products.