
Iteration of Morphisms

Many public-key cryptosystems based on the theories of automata and formal languages have been proposed. Some of them will be discussed in this and the next section. As emphasized before, the purpose is to give a feeling for the diverse possibilities of constructing public-key cryptosystems rather than to evaluate the resulting systems. Apart from security issues, such an evaluation should also take into account other aspects: ease of legal application, length of cryptotexts, etc. Some of these aspects will be mentioned below. Language-theoretic notions will be explained to the extent needed for understanding the systems. Some further language theory will be used without detailed explanation, for instance, in cryptanalysis. For language theory in general, the interested reader may consult [Sal].

Let Σ and Δ be alphabets. Recall that Σ* denotes the set of all words over Σ, including the empty word λ. In what follows, Σ and Δ may be equal, disjoint or partially overlapping. A mapping h: Σ* → Δ* is termed a morphism iff h(xy) = h(x)h(y) holds for all words x and y over Σ. It follows that always h(λ) = λ and that a morphism is completely determined by its values on the letters of Σ. A finite substitution σ is a mapping of Σ* into the set of finite subsets of Δ* such that σ(xy) = σ(x)σ(y) holds for all words x and y over Σ. The two conclusions made for morphisms hold also now. For instance, let Σ = Δ = {a, b} and

$$\sigma(a) = \{a, ab\}, \qquad \sigma(b) = \{b, bb\}.$$

Then

$$\sigma(ab) = \{ab, abb, abbb\}.$$

Observe that σ(ab) contains only three elements because the word abb is obtained in two different ways. In the sequel it will sometimes be convenient to use the notation (x)h instead of h(x), and similarly for finite substitutions. If L is a language, then

$$\sigma(L) = \{\, y \mid y \in \sigma(x) \text{ for some } x \in L \,\}.$$
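The definitions can be tried out directly. The following minimal Python sketch (the function names are ours, not from the text) represents a morphism or a finite substitution by its values on letters and extends it to words and to languages; it reproduces σ(ab) above and the convention σ(λ) = {λ}.

```python
from itertools import product

def apply_morphism(h, word):
    """h(word) for a morphism given by its letter values h[letter]."""
    return "".join(h[letter] for letter in word)

def apply_substitution(sigma, word):
    """sigma(word): all catenations obtained by picking one word
    from sigma[letter] for each letter of `word`."""
    choices = product(*(sigma[letter] for letter in word))
    return {"".join(choice) for choice in choices}

def apply_to_language(sigma, language):
    """sigma(L): the union of sigma(x) over all words x in L."""
    return set().union(*(apply_substitution(sigma, x) for x in language))

sigma = {"a": {"a", "ab"}, "b": {"b", "bb"}}
print(sorted(apply_substitution(sigma, "ab")))  # ['ab', 'abb', 'abbb']
print(apply_substitution(sigma, ""))            # {''}, i.e. sigma(λ) = {λ}
```

Sets absorb the duplicate abb automatically, which is exactly why σ(ab) has three elements rather than four.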

We now begin the description of the cryptosystem. Consider two morphisms h₀, h₁: Σ* → Σ*, as well as a nonempty word w over Σ. We say that the quadruple G = (Σ, h₀, h₁, w) is backward deterministic iff the condition

$$(w)h_{i_1}\cdots h_{i_n} = (w)h_{j_1}\cdots h_{j_m}$$

always implies the condition

$$i_1 \cdots i_n = j_1 \cdots j_m.$$

Here each iₖ and jₖ belongs to the set of indices {0, 1}. Thus, backward determinism means that, when a sequence of morphisms is applied one after the other, the outcome uniquely determines the sequence: two different sequences never lead to the same outcome.

Example 5.1. Consider the morphisms defined by

$$h_0(a) = ab,\quad h_0(b) = b,\quad h_1(a) = a,\quad h_1(b) = ba.$$

If we choose w = a, then the resulting quadruple is not backward deterministic because the outcome a is obtained by a sequence of 1's of any length. The same conclusion holds if w = b (now with sequences of 0's). On the other hand, the quadruple ({a, b}, h₀, h₁, ab) is backward deterministic. This follows because the last letter of a word reveals the morphism last applied. Using this principle one can "parse" a word w' back to the initial word, provided w' was obtained from w by some sequence of morphisms. □
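A brute-force search makes the notion concrete. The Python sketch below (function names are our own) enumerates all bit sequences up to a chosen length for the morphisms of Example 5.1 and reports the first pair of distinct sequences with the same outcome. A search of this kind can only exhibit failures of backward determinism up to the chosen bound; it never proves the property.

```python
from itertools import product

H = ({"a": "ab", "b": "b"}, {"a": "a", "b": "ba"})   # h0, h1 of Example 5.1

def apply_sequence(w, bits):
    """(w)h_{i1}...h_{in}: apply the morphisms left to right in bit order."""
    for bit in bits:
        w = "".join(H[int(bit)][c] for c in w)
    return w

def first_collision(w, max_len):
    """Two distinct bit sequences of length <= max_len with the same outcome,
    or None if there is no such pair up to that length."""
    seen = {}
    for n in range(max_len + 1):
        for bits in map("".join, product("01", repeat=n)):
            word = apply_sequence(w, bits)
            if word in seen:
                return seen[word], bits
            seen[word] = bits
    return None

print(first_collision("a", 3))   # ('', '1'), since h1(a) = a
print(first_collision("ab", 8))  # None
```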

Backward deterministic quadruples G can be used as classical cryptosystems in the following obvious fashion. A sequence of bits i₁ … iₙ is encrypted as the word $(w)h_{i_1}\cdots h_{i_n}$. Backward determinism guarantees that decryption will be unique.

For instance, if G is the quadruple in Example 5.1 with w = ab, then some plaintexts are encrypted as follows.

Plaintext	Cryptotext
0	abb
1	aba
00	abbb
01	ababa
10	abbab
11	abaa
011	abaabaa
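The table can be reproduced mechanically. In this minimal Python sketch (our own function names) decryption peels off one morphism at a time using the last-letter rule of Example 5.1. It relies on an assumption we state explicitly: the letter images of each morphism form a prefix code ({ab, b} for h₀ and {a, ba} for h₁), so each preimage can be read off from left to right.

```python
H0 = {"a": "ab", "b": "b"}
H1 = {"a": "a", "b": "ba"}
W = "ab"

def apply(h, word):
    return "".join(h[c] for c in word)

def encrypt(bits):
    """(w)h_{i1}...h_{in} for the bit string i1...in."""
    word = W
    for bit in bits:
        word = apply(H1 if bit == "1" else H0, word)
    return word

def invert(h, word):
    """Preimage of `word` under h, read off left to right; this works because
    the letter images of h form a prefix code."""
    inverse = {image: letter for letter, image in h.items()}
    result, i = [], 0
    while i < len(word):
        image = next((im for im in inverse if word.startswith(im, i)), None)
        if image is None:
            raise ValueError("not in the image of h")
        result.append(inverse[image])
        i += len(image)
    return "".join(result)

def decrypt(word):
    bits = []
    while word != W:
        if len(word) <= len(W):
            raise ValueError("not a cryptotext")
        # Images under h0 end in b, images under h1 end in a.
        bit = "0" if word.endswith("b") else "1"
        previous = invert(H0 if bit == "0" else H1, word)
        if previous == word:          # e.g. "aaa" is its own h1-preimage
            raise ValueError("not a cryptotext")
        word = previous
        bits.append(bit)
    return "".join(reversed(bits))

# Prints each plaintext of the table with its cryptotext and the decryption.
for plaintext in ["0", "1", "00", "01", "10", "11", "011"]:
    print(plaintext, encrypt(plaintext), decrypt(encrypt(plaintext)))
```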

Of course, G has to be kept secret if it is used as a classical cryptosystem in the sense described. Otherwise, there would be no difference between legal decryption and cryptanalysis. Cryptosystems of this type are referred to as functional. In general, a functional cryptosystem is specified by two functions f₀ and f₁ and an initial value x. A sequence of bits i₁ … iₙ is encrypted as the value $(x)f_{i_1}\cdots f_{i_n}$. A condition corresponding to the backward determinism defined above has to be satisfied to guarantee the uniqueness of decryption. More than two functions are needed if plaintexts contain more than two characters.
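As a toy illustration of a functional cryptosystem outside the world of words (our own choice, not from the text, and of course without any security), take f₀(x) = 2x and f₁(x) = 2x + 1 on positive integers, with initial value x = 1. The parity of a value reveals the function applied last, which plays the role of backward determinism here.

```python
F = (lambda x: 2 * x, lambda x: 2 * x + 1)   # hypothetical f0, f1
X = 1                                       # hypothetical initial value x

def encrypt(bits):
    """(x)f_{i1}...f_{in}: apply f0/f1 left to right in bit order."""
    value = X
    for bit in bits:
        value = F[int(bit)](value)
    return value

def decrypt(value):
    """Undo the functions one at a time; the parity of the current value
    tells which function was applied last."""
    bits = []
    while value != X:
        if value < X:
            raise ValueError("not a cryptotext")
        bits.append(str(value % 2))
        value //= 2
    return "".join(reversed(bits))

print(encrypt("011"), decrypt(11))  # 11 011
```

The cryptotext is just the binary numeral 1i₁…iₙ, so anyone who knows f₀, f₁ and x can decrypt; this is exactly why such a system must be kept secret when used classically.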

An obvious way to transform a functional cryptosystem into a public-key one is to provide a trapdoor leading from the publicized functions and values to some easily parsable situation. More specifically, we know the initial value x and the functions f₀, f₁, as well as a value y such that

$$y = (x)f_{i_1}\cdots f_{i_n},$$

for some composition of the functions f₀, f₁. With this information it is hard to find the sequence of bits i₁ … iₙ determining the composition, although we know that the sequence is unique. However, with the trapdoor information the equation can be transformed into the form

$$y' = (x')g_{i_1}\cdots g_{i_n},$$

where x', y', g₀, g₁ are known. Moreover, now the sequence of bits (which is the same as the original sequence) can be found easily.

Let us see how the trapdoor is constructed when the two functions are morphisms. In fact, the trapdoor will lead to two easily parsable morphisms. The publicized setup uses an alphabet much bigger than Σ, and two finite substitutions instead of two morphisms. The substitutions and the initial word are defined in such a way that the bit sequence remains unaltered when the trapdoor is used to go from the "public" equation to the easily parsable one.

More specifically, let G = (Σ, h₀, h₁, w) be backward deterministic. Let Δ be an alphabet of much greater cardinality than Σ. Typically, Σ consists of five letters, whereas Δ consists of 200 letters. Let g: Δ* → Σ* be a morphism mapping every letter to a letter or to the empty word in such a way that g⁻¹(a) is nonempty for all letters a of Σ. This means that every letter d of Δ is either a descendant of some letter in Σ or a dummy. The letter d is a descendant of a if g(d) = a. The additional condition of g⁻¹(a) being nonempty implies that every letter of Σ has at least one descendant. The letter d is a dummy if g(d) = λ.

Consider a quadruple H = (Δ, σ₀, σ₁, u), where σ₀ and σ₁ are finite substitutions on Δ defined below and u is a word over Δ satisfying g(u) = w. Equivalently, u belongs to g⁻¹(w). In general, u is not unique because dummies may occur in arbitrary positions and each descendant may also be chosen arbitrarily.

The finite substitutions σ₀ and σ₁ are not unique either. For each d in Δ, σ₀(d) is a nonempty finite set of words y such that if h₀ maps g(d) to x in Σ*, then g(y) = x. Equivalently, σ₀(d) is a finite nonempty subset of g⁻¹(h₀(g(d))). (Customarily we write the arguments of functions on the right, as here. It should cause no confusion that we write them on the left while encrypting, in order to preserve the proper order in the bit sequence.) The substitution σ₁ is defined in the same way, using h₁.

The quadruple H = (Δ, σ₀, σ₁, u) is publicized as the encryption key. A bit sequence i₁ … iₙ is encrypted by choosing an arbitrary word x from the finite set

$$(u)\sigma_{i_1}\cdots\sigma_{i_n}.$$

If the bit sequence is long, it can be divided in an arbitrary fashion into blocks that are encrypted separately.

Everything else, that is, Σ, h₀, h₁, w, g, remains as a secret trapdoor. The essential item is the "interpretation" morphism g: all other items can be computed from g and the public information. We mention in passing that in the terminology of L-systems G is a DTOL-system and H a TOL-system. L-systems, named after A. Lindenmayer, are mathematical models very suitable for computer simulation of biological growth processes. The reader is referred to [RS] for details.

The idea behind the public-key cryptosystem just described is that a cryptanalyst has to parse according to the messy TOL-system H, whereas the legal receiver who knows the trapdoor can operate in the simple and easily parsable DTOL-system G. More comments will be given below. That the public-key cryptosystem works as intended is a consequence of the next lemma.

Lemma 5.2. Let G = (Σ, h₀, h₁, w) be backward deterministic, and let g and H = (Δ, σ₀, σ₁, u) be defined as above. Use G and H to encrypt bit sequences in the way described above. Then decryption according to H is unique. Moreover, if the bit sequence i₁ … iₙ is encrypted as y according to H, then i₁ … iₙ is the decryption of g(y) according to G.

Proof. Consider the last sentence. Assume that y is a word in the set $(u)\sigma_{i_1}\cdots\sigma_{i_n}$. Then

$$g(y) = (g(u))h_{i_1}\cdots h_{i_n} = (w)h_{i_1}\cdots h_{i_n}.$$

This follows from the definition of the substitutions and of u. (An algebraically minded reader will notice that substitutions, as well as morphisms, commute with catenation by their very definition.)

To prove the uniqueness of decryption according to H, assume that some y can be decrypted both as the bit sequence i and the bit sequence j. By the last sentence, g(y) is decrypted both as i and j according to G. Since decryption according to G is unique by backward determinism, we must have i = j. □

Continuing Example 5.1, let Δ = {c₁, c₂, c₃, c₄, c₅} and define the interpretation morphism by

$$g(c_1) = b,\quad g(c_2) = g(c_4) = a,\quad g(c_3) = g(c_5) = \lambda.$$

Thus, c₂ and c₄ are descendants of a, c₁ is the only descendant of b, and c₃ and c₅ are dummies. We choose u = c₄c₃c₁; then g(u) = ab = w. To construct the substitutions, recall that the morphisms were defined by

$$h_0:\ a \to ab,\ b \to b; \qquad h_1:\ a \to a,\ b \to ba.$$

We now define σ₀ and σ₁ using the same descriptive notation.

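$$
\begin{array}{ll}
\sigma_0: & \sigma_1:\\
c_1 \to c_1,\ c_3c_1 & c_1 \to c_1c_2,\ c_3c_1c_4\\
c_2 \to c_4c_1,\ c_2c_1c_5 & c_2 \to c_2,\ c_3c_5c_4\\
c_3 \to c_5,\ c_3c_3 & c_3 \to c_3,\ c_5c_5\\
c_4 \to c_4c_1,\ \ldots,\ c_4c_1c_3 & c_4 \to c_2,\ c_4c_3\\
c_5 \to c_5,\ c_3c_5c_3 & c_5 \to c_3,\ c_5c_3
\end{array}
$$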

This is a correct definition because, when the interpretation morphism g is applied, σ₀ and σ₁ reduce to h₀ and h₁:

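$$
\begin{array}{ll}
\sigma_0: & \sigma_1:\\
b \to b,\ b & b \to ba,\ ba\\
a \to ab,\ ab & a \to a,\ a\\
\lambda \to \lambda,\ \lambda & \lambda \to \lambda,\ \lambda\\
a \to ab,\ ab,\ ab & a \to a,\ a\\
\lambda \to \lambda,\ \lambda & \lambda \to \lambda,\ \lambda
\end{array}
$$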
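The correctness condition can be checked entry by entry. The sketch below encodes g, h₀, h₁ and the substitutions in Python, writing words over Δ as space-separated letter names (our convention, not the book's); the illegible middle word of σ₀(c₄) is simply left out, and the helper names are hypothetical.

```python
G = {"c1": "b", "c2": "a", "c3": "", "c4": "a", "c5": ""}
H = ({"a": "ab", "b": "b"}, {"a": "a", "b": "ba"})
SIGMA = (
    {"c1": ["c1", "c3 c1"], "c2": ["c4 c1", "c2 c1 c5"], "c3": ["c5", "c3 c3"],
     "c4": ["c4 c1", "c4 c1 c3"], "c5": ["c5", "c3 c5 c3"]},
    {"c1": ["c1 c2", "c3 c1 c4"], "c2": ["c2", "c3 c5 c4"], "c3": ["c3", "c5 c5"],
     "c4": ["c2", "c4 c3"], "c5": ["c3", "c5 c3"]},
)

def g(word):
    """Interpretation morphism applied to a space-separated word over Delta."""
    return "".join(G[d] for d in word.split())

def violations(sigma, i):
    """Entries (d, x) of sigma with g(x) != h_i(g(d)); empty means correct."""
    return [(d, x) for d, words in sigma.items() for x in words
            if g(x) != "".join(H[i][c] for c in G[d])]

print(violations(SIGMA[0], 0), violations(SIGMA[1], 1))  # [] []
```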

To encrypt 011 using the public key, we first choose the word y₁ = c₄c₁c₅c₁ from (u)σ₀, then the word y₂ = c₂c₃c₁c₄c₃c₁c₂ from (y₁)σ₁ and, finally, the word

$$y = c_2c_5c_5c_1c_2c_2c_3c_1c_2c_2$$

from (y₂)σ₁. The legal receiver may compute

$$g(y) = abaabaa,$$

from which the plaintext 011 can be immediately recovered using the special property of h₀ and h₁ mentioned above. □
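The whole public-key round trip of this example fits in a short Python sketch (function names and the space-separated encoding of words over Δ are our own; the illegible word of σ₀(c₄) is omitted). Encryption picks one word at random from the finite set at each step, as the public key allows, and legal decryption applies g and then parses in G, as in Lemma 5.2.

```python
import random
from itertools import product

G = {"c1": "b", "c2": "a", "c3": "", "c4": "a", "c5": ""}   # secret g
H = ({"a": "ab", "b": "b"}, {"a": "a", "b": "ba"})          # secret h0, h1
W = "ab"                                                    # secret axiom w
U = "c4 c3 c1"                                              # public word u
# Public sigma_0, sigma_1 (the illegible word of sigma_0(c4) is omitted).
SIGMA = (
    {"c1": ["c1", "c3 c1"], "c2": ["c4 c1", "c2 c1 c5"], "c3": ["c5", "c3 c3"],
     "c4": ["c4 c1", "c4 c1 c3"], "c5": ["c5", "c3 c5 c3"]},
    {"c1": ["c1 c2", "c3 c1 c4"], "c2": ["c2", "c3 c5 c4"], "c3": ["c3", "c5 c5"],
     "c4": ["c2", "c4 c3"], "c5": ["c3", "c5 c3"]},
)

def image_set(sigma, word):
    """(word)sigma: every catenation of one choice per letter of `word`."""
    return {" ".join(c) for c in product(*(sigma[d] for d in word.split()))}

def encrypt(bits, rng=random):
    """Public-key encryption: an arbitrary word of (u)sigma_{i1}...sigma_{in}."""
    word = U
    for bit in bits:
        word = " ".join(rng.choice(SIGMA[int(bit)][d]) for d in word.split())
    return word

def invert(h, x):
    """Preimage under h, read left to right (the images form a prefix code)."""
    inverse = {image: letter for letter, image in h.items()}
    pre, i = "", 0
    while i < len(x):
        image = next((im for im in inverse if x.startswith(im, i)), None)
        if image is None:
            raise ValueError("not a cryptotext")
        pre, i = pre + inverse[image], i + len(image)
    return pre

def decrypt(y):
    """Legal decryption: compute g(y), then parse it in G (Lemma 5.2)."""
    x = "".join(G[d] for d in y.split())
    bits = []
    while x != W:
        if len(x) <= len(W):
            raise ValueError("not a cryptotext")
        bit = 0 if x.endswith("b") else 1   # h0-images end in b, h1-images in a
        pre = invert(H[bit], x)
        if pre == x:
            raise ValueError("not a cryptotext")
        x = pre
        bits.append(str(bit))
    return "".join(reversed(bits))

y1 = "c4 c1 c5 c1"
y2 = "c2 c3 c1 c4 c3 c1 c2"
y = "c2 c5 c5 c1 c2 c2 c3 c1 c2 c2"
print(y1 in image_set(SIGMA[0], U), y2 in image_set(SIGMA[1], y1),
      y in image_set(SIGMA[1], y2))  # True True True
print(decrypt(y))  # 011
```

Because every word in every σᵢ(d) satisfies the correctness condition, the random choices never affect g(y), which is why decryption succeeds regardless of which words the sender picked.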

Not all DTOL-systems, that is, quadruples G = (Σ, h₀, h₁, w), are backward deterministic. For instance, if all words h₀(a), where a ranges over letters of Σ, are powers of the same word x, then G cannot be backward deterministic. This follows because it is easy to verify that

$$(w)h_0h_1h_0h_0 = (w)h_0h_0h_1h_0.$$
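The claim is easy to test numerically. The morphisms below are a hypothetical choice of ours: h₀(a) = ab and h₀(b) = abab are both powers of x = ab, while h₁ is arbitrary. After the first h₀ every word is a power of x, and on powers of x the remaining morphisms only multiply exponents, so the order of the middle applications does not matter.

```python
H0 = {"a": "ab", "b": "abab"}   # hypothetical: every h0-image is a power of x = ab
H1 = {"a": "b", "b": "aab"}     # hypothetical: h1 is arbitrary

def apply_sequence(w, seq):
    """(w)h_{t1}...h_{tk} for a sequence of indices t in {0, 1}."""
    for t in seq:
        h = H1 if t else H0
        w = "".join(h[c] for c in w)
    return w

w = "ba"
left = apply_sequence(w, [0, 1, 0, 0])
right = apply_sequence(w, [0, 0, 1, 0])
print(left == right, len(left))  # True 108
```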

On the other hand, backward determinism does not guarantee easy parsing. For this purpose, the notion of strong backward determinism is more appropriate.

By definition, a quadruple G = (Σ, h₀, h₁, w) is strongly backward deterministic iff the condition

$$(w)h_{i_1}\cdots h_{i_n} = (x)h_t$$

always implies the conditions

$$t = i_n \quad\text{and}\quad x = (w)h_{i_1}\cdots h_{i_{n-1}}.$$

Thus, every word generated by a strongly backward deterministic G has a unique predecessor in Σ*, and is derived from this predecessor by a unique morphism. This means that the parsing sequence of a word in a strongly backward deterministic DTOL-system depends only on the word and, consequently, parsing (decryption) can be carried out from right to left without any look-ahead. This is not necessarily true if G is only backward deterministic: in order to find the last bit, one may even have to go back to the axiom.
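Right-to-left parsing without look-ahead can be sketched generically. The Python below (our own helper names) finds all predecessors (t, x) of a word by backtracking over preimages, assumes nonerasing morphisms, and insists on exactly one predecessor at each step, which is what strong backward determinism provides. It is tried on Example 5.1 with axiom ab, where the last letter fixes the morphism and both morphisms are injective.

```python
def preimages(h, y):
    """All words x with h(x) == y, for a nonerasing morphism h (backtracking)."""
    if y == "":
        return [""]
    result = []
    for letter, image in h.items():
        if y.startswith(image):
            result += [letter + rest for rest in preimages(h, y[len(image):])]
    return result

def parse_right_to_left(w, morphisms, word):
    """Recover i1...in from (w)h_{i1}...h_{in}, demanding at every step exactly
    one predecessor (t, x) with (x)h_t == word, i.e. no look-ahead is used."""
    bits, seen = [], set()
    while word != w:
        if word in seen:
            raise ValueError(f"parsing cycles at {word!r}")
        seen.add(word)
        steps = [(t, x) for t, h in enumerate(morphisms) for x in preimages(h, word)]
        if len(steps) != 1:
            raise ValueError(f"{len(steps)} predecessors of {word!r}")
        t, word = steps[0]
        bits.append(str(t))
    return "".join(reversed(bits))

H = ({"a": "ab", "b": "b"}, {"a": "a", "b": "ba"})   # Example 5.1
print(parse_right_to_left("ab", H, "abaabaa"))  # 011
```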

Example 5.2. Clearly, every strongly backward deterministic DTOL-system is backward deterministic. The converse does not hold. Consider G = ({a, b}, h₀, h₁, ab), where

$$h_0:\ a \to ab,\ b \to bb; \qquad h_1:\ a \to bb,\ b \to ab.$$

That G is backward deterministic is easy to show by an inductive argument: a counterexample immediately leads to a shorter counterexample, so a shortest counterexample cannot exist. On the other hand, G is not strongly backward deterministic because

$$(ab)h_0h_1 = bbababab = (abbb)h_1 = (baaa)h_0.$$
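For Example 5.2 the failure of strong backward determinism, and the look-ahead that parsing then needs, can be observed directly. This sketch (our own helper names, nonerasing morphisms assumed) lists all predecessors of bbababab and then runs a depth-first search over all parses back to the axiom; the branch through baaa dies out, so only one parse survives.

```python
def preimages(h, y):
    """All words x with h(x) == y, for a nonerasing morphism h (backtracking)."""
    if y == "":
        return [""]
    return [letter + rest
            for letter, image in h.items() if y.startswith(image)
            for rest in preimages(h, y[len(image):])]

H = ({"a": "ab", "b": "bb"}, {"a": "bb", "b": "ab"})   # h0, h1 of Example 5.2

def predecessors(word):
    """All pairs (t, x) with (x)h_t == word."""
    return [(t, x) for t, h in enumerate(H) for x in preimages(h, word)]

def parses(w, word, seen=frozenset()):
    """All bit strings s with (w)h_s == word, by depth-first search backwards;
    `seen` cuts off cycles."""
    found = [""] if word == w else []
    for t, x in predecessors(word):
        if x != word and x not in seen:
            found += [s + str(t) for s in parses(w, x, seen | {word})]
    return found

print(predecessors("bbababab"))  # [(0, 'baaa'), (1, 'abbb')]
print(parses("ab", "bbababab"))  # ['01']
```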

One can prove that strong backward determinism is a decidable property, whereas backward determinism is undecidable. □

An issue important in system design is the word length. Cryptotexts should not be too long compared with plaintexts. Fortunately, there are big classes of DTOL-systems with linear growth rate. In the transition to TOL-systems, the growth rate remains essentially the same as regards descendants of letters. The substitutions for the dummies should be defined in such a way that exponential growth is not likely to occur. Besides, block division of the plaintext can always be used to reduce growth.

As regards cryptanalysis, preprocessing is not likely to succeed. Consider trapdoor pairs (G, g) such that G is a DTOL-system resulting from H by the interpretation morphism g. Given H, there may be several such trapdoor pairs. Only one of them, say (G₁, g₁), has been used by the cryptosystem designer. If some other pair (G₂, g₂) giving rise to H is found, it can be used in decryption with the following warning. G₂ is not necessarily backward deterministic and, therefore, a cryptotext may lead to several plaintexts. However, the correct plaintext is always among them.

This observation does not make cryptanalysis by preprocessing essentially easier. It can be shown that finding any trapdoor pair is an NP-complete problem. (Some other preprocessing method might still exist.) Consequently, finding the dummy letters is also an NP-complete problem: if the dummies have been found, the construction of a trapdoor pair will be easy. This result means that it does not help much to know that dummies always have to be replaced by words consisting of dummies.

Altogether, the cryptosystem seems to be safe against cryptanalysis by preprocessing: trapdoor pairs are not easy to find. A cryptanalytic algorithm running in time kn³, where n is the length of the intercepted cryptotext and the constant k is fairly large, can be constructed using the theory of finite automata [Kar2].

In the following generalization, finding the dummy symbols is no longer sufficient for successful cryptanalysis.

Recall that the interpretation morphism g was supposed above to assume as its values only letters or the empty word. Such a very restrictive definition is not necessary. We now assume that the interpretation morphism is any surjective morphism g: Δ* → Σ*. This means that all words of Σ* appear as values of g, a condition certainly satisfied by our original definition. Otherwise, the cryptosystem design remains unaltered. However, let us be more specific.

As before, assume that G = (Σ, h₀, h₁, w) is backward deterministic (preferably strongly backward deterministic). Choose Δ to be much bigger than Σ, and let the morphism g: Δ* → Σ* be surjective. Let u be a word over Δ such that g(u) = w. Such a u exists because g is surjective. For d in Δ, let σᵢ(d), i = 0, 1, be a finite nonempty set of words x with the property

$$g(x) = h_i(g(d)).$$

Again, the surjectivity of g is needed to assure that there are such words x. As before, the decryption of a cryptotext y can be carried out by parsing the word g(y) according to G.

Lemma 5.2 remains valid also now. Cryptanalysis by preprocessing seems to be more difficult. However, the cubic-time algorithm mentioned above for analyzing intercepted cryptotexts is applicable also to the generalized system.

Example 5.3. Consider G = ({a, b}, h₀, h₁, ba), where the morphisms are defined by

$$h_0:\ a \to ab,\ b \to b; \qquad h_1:\ a \to ba,\ b \to a.$$

Then G is strongly backward deterministic for obvious reasons: the last letter of a word determines the morphism used, and the predecessor of a word is unique because both morphisms are injective. We choose Δ = {c₁, …, c₁₀} to consist of ten letters, and define the interpretation morphism by

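$$
\begin{array}{ll}
c_1 \to ab & c_6 \to \lambda\\
c_2 \to b & c_7 \to bab\\
c_3 \to \lambda & c_8 \to \lambda\\
c_4 \to b & c_9 \to ba\\
c_5 \to a & c_{10} \to aa
\end{array}
$$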

Next we may choose u = c₉ because g(c₉) = ba. To complete the definition of H, we define the substitutions σ₀ and σ₁ by

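$$
\begin{array}{ll}
\sigma_0: & \sigma_1:\\
c_1 \to c_1c_4 & c_1 \to c_4c_{10},\ c_9c_5\\
c_2 \to c_6c_2 & c_2 \to c_8c_5\\
c_3 \to \ldots & c_3 \to c_6c_8\\
c_4 \to c_2,\ c_3c_4 & c_4 \to c_3c_5\\
c_5 \to c_1,\ c_5c_2 & c_5 \to c_9,\ c_2c_5\\
c_6 \to c_8c_3 & c_6 \to c_6c_3\\
c_7 \to c_7c_8c_4 & c_7 \to c_1c_{10}\\
c_8 \to \ldots & c_8 \to c_8\\
c_9 \to c_3c_7,\ c_4c_1 & c_9 \to c_1c_6c_5,\ c_5c_9\\
c_{10} \to c_5c_7,\ c_1c_5c_2 & c_{10} \to \ldots
\end{array}
$$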

This definition is correct because, for all i and d, σᵢ(d) contains only words x satisfying g(x) = hᵢ(g(d)). For instance, from the first and the two last lines we obtain

$$
\begin{aligned}
g(c_4c_{10}) &= g(c_9c_5) = baa = h_1(ab) = h_1(g(c_1)),\\
g(c_3c_7) &= g(c_4c_1) = bab = h_0(ba) = h_0(g(c_9)),\\
g(c_5c_7) &= g(c_1c_5c_2) = abab = h_0(aa) = h_0(g(c_{10})).
\end{aligned}
$$
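The same entry-by-entry check applies to Example 5.3, together with a test of surjectivity of g (a morphism into Σ* is surjective exactly when every letter of Σ is the image of some letter of Δ). This Python sketch uses our own helper names and space-separated letter names; it includes only the legible table entries, and for σ₀(c₈) it uses the single word c₈ that occurs in the worked encryption of 01101.

```python
G = {"c1": "ab", "c2": "b", "c3": "", "c4": "b", "c5": "a",
     "c6": "", "c7": "bab", "c8": "", "c9": "ba", "c10": "aa"}
H = ({"a": "ab", "b": "b"}, {"a": "ba", "b": "a"})
# Legible entries only: sigma_0(c3) and sigma_1(c10) are missing from our copy,
# and sigma_0(c8) is represented by the one word c8 used in the worked encryption.
SIGMA = (
    {"c1": ["c1 c4"], "c2": ["c6 c2"], "c4": ["c2", "c3 c4"], "c5": ["c1", "c5 c2"],
     "c6": ["c8 c3"], "c7": ["c7 c8 c4"], "c8": ["c8"], "c9": ["c3 c7", "c4 c1"],
     "c10": ["c5 c7", "c1 c5 c2"]},
    {"c1": ["c4 c10", "c9 c5"], "c2": ["c8 c5"], "c3": ["c6 c8"], "c4": ["c3 c5"],
     "c5": ["c9", "c2 c5"], "c6": ["c6 c3"], "c7": ["c1 c10"], "c8": ["c8"],
     "c9": ["c1 c6 c5", "c5 c9"]},
)

def g(word):
    """Interpretation morphism applied to a space-separated word over Delta."""
    return "".join(G[d] for d in word.split())

def violations(sigma, i):
    """Entries (d, x) of sigma with g(x) != h_i(g(d)); empty means correct."""
    return [(d, x) for d, words in sigma.items() for x in words
            if g(x) != "".join(H[i][c] for c in G[d])]

def is_surjective(interpretation, alphabet):
    """g: Delta* -> Sigma* is onto iff every letter of Sigma is some g(d)."""
    return set(alphabet) <= set(interpretation.values())

print(violations(SIGMA[0], 0), violations(SIGMA[1], 1), is_surjective(G, "ab"))
# [] [] True
```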

The plaintext 01101 is encrypted according to the public key H, for instance, as follows:

$$
\begin{aligned}
c_9 &\to c_4c_1 \to c_3c_5c_9c_5 \to c_6c_8c_9c_5c_9c_9 \to c_8c_3c_8c_3c_7c_1c_4c_1c_4c_1\\
&\to c_8c_6c_8c_8c_6c_8c_1c_{10}c_4c_{10}c_3c_5c_9c_5c_3c_5c_4c_{10} = y.
\end{aligned}
$$

For the legal decryption one first computes

$$g(y) = abaabaaabaaabaa.$$

By the parsing rule of G, one further obtains the equations

$$h_1(bababbabbab) = g(y),\quad h_0(baababa) = bababbabbab,\quad h_1(abaa) = baababa,\quad h_1(bab) = abaa,\quad h_0(ba) = bab,$$

where the indices of h, read from the last equation to the first, give the plaintext 01101.
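Lemma 5.2 can be watched at work on this example. In the following Python sketch (our encoding and helper names) g maps each word of the H-derivation of 01101 onto the corresponding word of the G-derivation, and the legal receiver recovers the plaintext by the last-letter rule of G, reading each preimage left to right because {ab, b} and {ba, a} are prefix codes.

```python
G = {"c1": "ab", "c2": "b", "c3": "", "c4": "b", "c5": "a",
     "c6": "", "c7": "bab", "c8": "", "c9": "ba", "c10": "aa"}
H = ({"a": "ab", "b": "b"}, {"a": "ba", "b": "a"})
W = "ba"

def g(word):
    return "".join(G[d] for d in word.split())

def h(i, word):
    return "".join(H[i][c] for c in word)

def decrypt(y):
    """Parse g(y) in G: a final b means h0 was applied last, a final a means h1."""
    x, bits = g(y), []
    while x != W:
        if len(x) <= len(W):
            raise ValueError("not a cryptotext")
        bit = 0 if x.endswith("b") else 1
        inverse = {image: letter for letter, image in H[bit].items()}
        pre, i = "", 0
        while i < len(x):
            image = next((im for im in inverse if x.startswith(im, i)), None)
            if image is None:
                raise ValueError("not a cryptotext")
            pre, i = pre + inverse[image], i + len(image)
        if pre == x:                    # e.g. "bbb" is its own h0-preimage
            raise ValueError("not a cryptotext")
        x = pre
        bits.append(str(bit))
    return "".join(reversed(bits))

# The H-derivation of 01101 from u = c9 given in the text.
chain = ["c9", "c4 c1", "c3 c5 c9 c5", "c6 c8 c9 c5 c9 c9",
         "c8 c3 c8 c3 c7 c1 c4 c1 c4 c1",
         "c8 c6 c8 c8 c6 c8 c1 c10 c4 c10 c3 c5 c9 c5 c3 c5 c4 c10"]
# Lemma 5.2 at work: g maps every H-step onto the corresponding G-step.
for k, bit in enumerate("01101"):
    assert g(chain[k + 1]) == h(int(bit), g(chain[k]))
print(g(chain[-1]), decrypt(chain[-1]))  # abaabaaabaaabaa 01101
```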

In the generalized version of the cryptosystem, where the interpretation morphism is chosen more freely, dummies are not essential for security, unlike in the basic version. On the contrary, careless use of dummies may be a security risk. In the illustration above, some cryptanalytic conclusions can be based on the first six characters of the cryptotext y. This issue becomes clearer if we assume that the designer had chosen u = c₉c₃ instead of u = c₉. Then one is immediately able to separate the suffix z of any cryptotext generated by the letter c₃ in u. This follows because c₃ generates only the letters c₃, c₆, c₈ and, moreover, the last letter of any word generated by c₉ is not among these three. In z all occurrences of c₈ may be ignored, and parsing can be based on the morphisms s₀ and s₁ determined by σ₀ and σ₁:

$$s_0:\ c_3 \to c_3c_6,\ c_6 \to c_3; \qquad s_1:\ c_3 \to c_6,\ c_6 \to c_6c_3.$$

For instance, z = c₃c₃c₃c₃c₆c₃c₃c₃c₃c₆c₃ can be analyzed as follows:

$$
\begin{aligned}
s_0(c_6c_6c_6c_3c_6c_6c_6c_3c_6) &= z,\\
s_1(c_3c_3c_6c_3c_3c_6c_3) &= c_6c_6c_6c_3c_6c_6c_6c_3c_6,\\
s_0(c_6c_3c_6c_3c_6) &= c_3c_3c_6c_3c_3c_6c_3,\\
s_1(c_6c_6c_3) &= c_6c_3c_6c_3c_6,\\
s_1(c_3c_6) &= c_6c_6c_3,\\
s_0(c_3) &= c_3c_6.
\end{aligned}
$$

The plaintext 011010 can be read from the indices of s, taken from the last equation to the first. In many cases this decryption method might be even easier than legal decryption based on G!
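The cryptanalytic shortcut can be replayed as well. This sketch (our own helper names; words over {c₃, c₆} as Python tuples) parses z from right to left under s₀ and s₁, stopping as soon as the single letter c₃ is reached, and accepts a step only when exactly one predecessor exists. Stopping at c₃ matters: since (c₃)s₁s₀ = c₃, the suffix alone could be prolonged by further pairs of bits.

```python
S = ({"c3": ("c3", "c6"), "c6": ("c3",)},    # s0
     {"c3": ("c6",), "c6": ("c6", "c3")})    # s1

def preimages(s, y):
    """All tuples x with s(x) == y (s nonerasing), by backtracking."""
    if not y:
        return [()]
    return [(letter,) + rest
            for letter, image in s.items() if y[:len(image)] == image
            for rest in preimages(s, y[len(image):])]

def parse_suffix(z, start=("c3",)):
    """Peel off s0/s1 from the right until only `start` remains, requiring
    exactly one predecessor at each step."""
    bits, seen = [], set()
    while z != start:
        if z in seen:
            raise ValueError("parsing cycles")
        seen.add(z)
        steps = [(t, x) for t, s in enumerate(S) for x in preimages(s, z)]
        if len(steps) != 1:
            raise ValueError(f"{len(steps)} predecessors of {z}")
        t, z = steps[0]
        bits.append(str(t))
    return "".join(reversed(bits))

z = tuple("c3 c3 c3 c3 c6 c3 c3 c3 c3 c6 c3".split())
print(parse_suffix(z))  # 011010
```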

Date: 2015-02-16