ATGGAAGAAGAATATCGTTATATTCCTCCTCCTCAACAACAA
The first question was how to divide the letters up. He tried dividing them up in twos and in threes. First, by twos:
AT GG AA GA AG AA TA TC GT TA TA TT CC TC CT CC TC AA CA AC AA
Taking a pair of letters as one unit, the four letters available yielded a possible sixteen different combinations. He wondered if each combination might represent one letter.
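The figure of sixteen follows from simple counting: four choices for the first letter of a pair times four for the second. A minimal sketch, assuming the four letters are A, T, G and C as they appear in the sequence (the names `BASES` and `pair_units` are illustrative only):

```python
from itertools import product

BASES = "ATGC"  # assumption: the four letters that occur in the sequence

# Every ordered pair of letters: "AA", "AT", "AG", "AC", "TA", ...
pair_units = ["".join(p) for p in product(BASES, repeat=2)]

print(len(pair_units))  # 16
print(pair_units)
```

Sixteen units fall short of the twenty-six letters of the English alphabet, so a pair code could cover only part of it.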
But this immediately led him to another problem: what language was this message written in?
It probably wasn't the Japanese syllabary. There were nearly fifty characters in that, far more than the sixteen allowed by the pair method. The English and French alphabets both had twenty-six letters, while Italian only used twenty-one. But he also knew he couldn't overlook the possibility that the message was in romanized Japanese. Identifying the language of a code is sometimes half the battle.
But this was a problem that had already been solved for Ando. The fact that he'd been able to replace the numerals 178136 with the word "ring" could probably be taken as a hint from Ryuji that the present code would also yield something in English. Ando was sure of this point. And so the question of language was as good as settled.
The forty-two base letters could be split into twenty-one pairs. But several pairs were identical: there were four AA's, three TA's, three TC's, and two CC's. There were only thirteen unique pairings. Ando jotted these numbers down on a piece of paper and then paged through a book on code-solving until he found a chart showing the frequency of appearance in English of different letters of the alphabet.
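Ando's tally can be checked mechanically. The sketch below is only an illustration of the counting, not anything from the book he consulted; the helper `split_units` is a hypothetical name introduced here.

```python
from collections import Counter

SEQUENCE = "ATGGAAGAAGAATATCGTTATATTCCTCCTCCTCAACAACAA"

def split_units(sequence, size):
    """Cut the sequence into consecutive, non-overlapping units of `size` letters."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]

pairs = split_units(SEQUENCE, 2)
pair_counts = Counter(pairs)

# Keep only the pairs that occur more than once.
repeated = {unit: n for unit, n in pair_counts.items() if n > 1}

print(len(SEQUENCE))     # 42
print(len(pairs))        # 21
print(repeated)          # {'AA': 4, 'TA': 3, 'TC': 3, 'CC': 2}
print(len(pair_counts))  # 13
```

The counts agree with the ones he jotted down: twenty-one pairs, four of them repeated, thirteen distinct.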
He knew that although the English alphabet contains twenty-six letters, not all of them occur in equal numbers in everyday use. E, T, and A, for example, are common, while Q and Z might appear only once or twice per page. Most handbooks on code-breaking will include various kinds of letter frequency charts in the back, among other statistical references. Using such tables and statistics made it easier to determine the language a coded message was in.
In this case, what the figures told him was that in an English phrase of twenty-one letters, the average number of different letters used was twelve. Ando clicked his heels. What he had was thirteen different letters, not far off the average at all. This told him that, statistically speaking, there was nothing wrong with him dividing the sequence into twenty-one pairs and assuming that each pair stood for a letter.
Putting that possibility on hold for a moment, Ando next tried dividing up the sequence into sets of three:
ATG GAA GAA GAA TAT CGT TAT ATT CCT CCT CCT CAA CAA CAA
Date: 2015-12-24