B2. THE REASON FOR SHORT WORDS, TRANSPOSITION CIPHER, THE LATIN IN THE STATISTICS

Please note:
1) The photographs of the VM pages are at:
- the beautiful new scans at Beinecke: http://webtext.library.yale.edu/beinflat/pre1600.ms408.htm (click on the link there).
- black-and-white photographs: http://www.geocities.com/Athens/Delphi/8389/voygal1.htm
- a French site with color photos: http://www.almaleh.com/v1.htm
2) Complete transcript of the VM in the EVA alphabet:
- http://www.voynich.com/pages/index.htm
- Another transcription is at: http://www.dcc.unicamp.br/~stolfi/voynich/

DISCUSSION: b2. (B16) .


1) New explanation of the "shortness" of words in the VM. (Jan Hurych).

While searching in libraries, I found an interesting transposition cipher. It deserves attention because it would explain the shortness of words in the VM. (I have changed it a little; the result is in Fig.1.)

For the Latin text I have used the words "VITA NOSTRA BREVIS EST, BREVI FINIETUR, VENIT MORS . . ." etc. (the third verse of "Gaudeamus Igitur", just in case you didn't recognize it :-). The text is written horizontally into the rows of a matrix, with the spaces between words included. The enciphered text is then read off vertically, column by column, with an additional space after the end of each column.

The result is:
"VRE V IASFE T TIN AB NI RBIT NERE OVETM SIVUO TSIRR". The receiver of the message just writes the words vertically into the matrix, and the result can then be read horizontally. I have found a slight problem, however. Consider a column that begins or ends with a space (e.g. column no. 5, _RBIT, or column no. 6, NERE_): it should be written in cipher with two spaces. Otherwise the receiver of the message would not know how to place it properly, for instance whether the fifth column is "_RBIT" (correct) or "RBIT_" (incorrect). I haven't seen any "double spaces" in the VM, so the extra space was presumably replaced by some otherwise insignificant letter functioning as a "null".

As we can see, the original text has quite long words (e.g. "FINIETUR"), and yet the enciphered text has words of at most 5 letters. Well, so far we have for the most part been looking for a "short-worded" language. This method also gives an explanation of why the author of the VM wrote the letters separately (in the 15th century there was already "connected" script): it would otherwise be difficult to separate individual letters, since some letters contain others as their integral part :-). That is also probably why the author avoided commas: they do not need to be enciphered, and they would otherwise be a clear giveaway of the method.
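
The enciphering and deciphering steps of this section can be sketched in Python. This is only an illustration under stated assumptions: Fig.1 is not reproduced here, so the grid rows below (9 columns by 5 rows) are reconstructed from the enciphered text quoted above, and the function names are made up for the example.

```python
# Assumed grid, reconstructed from the enciphered text (stands in for Fig.1).
ROWS = [
    "VITA NOST",
    "RA BREVIS",
    "EST BREVI",
    " FINIETUR",
    "VENIT MOR",
]


def encipher(rows):
    """Read the grid column by column; return the list of column strings."""
    return ["".join(column) for column in zip(*rows)]


def decipher(columns):
    """Write the columns vertically again and read the grid row by row."""
    return ["".join(row) for row in zip(*columns)]


columns = encipher(ROWS)

# One separator space after each column; spaces inside a column are kept,
# so a column that starts or ends with a space produces a double space.
with_double_spaces = " ".join(columns)

# Collapsing runs of spaces loses the information about where columns start.
as_printed = " ".join(with_double_spaces.split())

print(with_double_spaces)
print(as_printed)
print(max(len(word) for word in as_printed.split()))
print(decipher(columns) == ROWS)
```

The second printed line is the enciphered text as quoted above; the first differs from it only by the double spaces before RBIT and after NERE, which is exactly the "double space" problem just described. Deciphering here works from the list of columns, i.e. it assumes the receiver can still tell where each column begins.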

The encipherment is elegant and simple to carry out - no mathematics - but really not that simple to break. We still do not know the dimensions of the matrix that would have been used for the VM. What we do know, however, is that a transposition cipher leaves the letter frequencies intact. We can now look at those frequency tables . . .
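
That a transposition only moves letters around, and so leaves their counts unchanged, can be checked on the example from this section. A minimal sketch, assuming the same grid rows reconstructed from the enciphered text (an assumption, not Fig.1 itself); the helper name is hypothetical.

```python
from collections import Counter

# Assumed plaintext grid, reconstructed from the enciphered text.
plain_rows = [
    "VITA NOST",
    "RA BREVIS",
    "EST BREVI",
    " FINIETUR",
    "VENIT MOR",
]
cipher = "VRE V IASFE T TIN AB NI RBIT NERE OVETM SIVUO TSIRR"


def letter_counts(text):
    """Count letters only; spaces and punctuation are ignored."""
    return Counter(ch for ch in text if ch.isalpha())


plain_counts = letter_counts("".join(plain_rows))
cipher_counts = letter_counts(cipher)

# The two tables are identical: only the order of the letters has changed.
print(plain_counts == cipher_counts)
print(sum(cipher_counts.values()), "letters in total")
```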

Comments (by Lukas Palatinus): I have to say openly that this theory has some drawbacks and it will require at least some improvements:

a) The grammar cannot be explained by this cipher. By "grammar" I mean certain rules for the building of words. Could this happen in this cipher too? How do we reach the state when the combination "aiin" is almost always at the end of words?

b) If we replace the end of each column by a space, the frequency of words (in the ciphered text) whose length equals the column length (5 in the example) would be much higher than that of the others. But the word-length curve of the VM has a nice bell shape. If we don't put a space at the end of the column, we may have words of different lengths and with a Pascal distribution.

c) And last but not least, transposition is the best-known cipher and can be solved quite easily. I would be surprised if anyone who tried that approach had failed to carry it through. Friedman and Manly especially were experts who would not have passed over this option. Well, maybe they did not have an EVA transcription.

The answers (by J.H.):
a) Well, grammar . . . Take for instance "controller" and "closer" - they both end with "er" but each "er" has a different function - and we don't even know what function "aiin" has in the VM language (even if it is a plain language which is not encoded). The repetition of "suffixes" may very well be some code.

b) True, the listed example has 4 five-letter words and 8 shorter ones, 2 of each length, but this is not a typical example and we do not really know the number of columns and rows used. Besides, I am not claiming that this is the solution :-), I am only describing the method of how we can "shorten" the words, nothing more (yet). And I do not think that the idea to shorten the words was so silly, on the contrary: it kept us looking for a mysterious language with "super-short" words.

c) True, a transposition cipher is easy to write - no mathematics - but it is not so easy to solve; after all, a six-letter word can be written in 6!=720 ways and we are not even talking about double transposition. Both features of such a cipher are advantageous for the writer. And we cannot use a frequency table, because the letters are already the "true ones". I believe that the transposition idea surely occurred to the experts, but I haven't read anywhere that it was because they tried to explain the "shorter" words in the VM.
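
The figures quoted in answers b) and c) can be checked with a few lines of Python. This is only a sketch; the variable names are made up for the illustration, and the count of 720 assumes that all six letters of the word are different (a word with repeated letters has fewer distinct arrangements).

```python
from collections import Counter
from math import factorial

cipher = "VRE V IASFE T TIN AB NI RBIT NERE OVETM SIVUO TSIRR"

# Answer b): tally the enciphered words by their length.
length_tally = Counter(len(word) for word in cipher.split())
for length in sorted(length_tally):
    print(length, "letters:", length_tally[length], "words")

# Answer c): orderings of a six-letter word whose letters are all distinct.
print(factorial(6))
```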

2) Letter frequency tables. (by Lukas Palatinus)

Prior discussion: My congratulations on your discovery that the page lettering was already done by the author himself. It also proves that the script is quite artificial - some letters certainly look like numerals - the author must have known that.

I am also enclosing the letter statistics for the VM, which I compiled from the transcription by Mr. Stolfi ( http://www.dcc.unicamp.br/~stolfi/voynich/Notes/015/majority.evt ). There are several visible jumps in the frequencies, typical for the VM.

The table of letter frequencies (fig. 2)
(the one for the VM is by Lukas Palatinus ("EVA" transcription), the English and Latin are by Jan Hurych)



3) Could it be the Latin language? (Jan Hurych).

Comparison shows that the curves for the VM and Latin are not only close, they even have those typical jumps and letter "groupings", see fig.3. Simply said, the result looks almost incredible, but it is still preliminary.



For my Latin table, I used the Latin text of St. Augustine (Confessions, Book 1, http://ccat.sas.upenn.edu/jod/latinconf/ ), since I could not find any table on the Net. Still, the method of enciphering is as yet unknown - for instance, we do not know the number of rows and columns in the transposition matrix. What's more, we have to try other literature to get more accurate statistics, and possibly to confirm our findings.
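
A frequency table of this kind can be compiled as sketched below. The function name and the short sample are made up for the illustration; a real comparison needs the full texts (the Confessions, the EVA transcription). How the curves in the figures were actually drawn is not stated here, so ordering the letters from most to least frequent is an assumption.

```python
from collections import Counter


def ranked_frequencies(text):
    """Return (letter, percent) pairs, most frequent letter first."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    # Sort by falling count; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(letter, 100.0 * count / total) for letter, count in ranked]


# Tiny sample only; far too short for meaningful statistics.
sample = "VITA NOSTRA BREVIS EST, BREVI FINIETUR"
for letter, percent in ranked_frequencies(sample):
    print(f"{letter} {percent:5.1f}")
```

Two such ranked lists, one per text, can then be plotted against each other to look for the "jumps" and letter "groupings" discussed above.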



To know more about the differences, I have also prepared a table for the early modern English of Francis Bacon, from The Advancement of Learning, Book 1, at: http://darkwing.uoregon.edu/~rbear/adv1.htm The difference between English and the VM is apparent here, not only in magnitude, but in the shape of the curve itself. The English curve cuts across the VM curve irregularly; it does not have the typical steps and letter "gatherings". What surprised us, however, was the close similarity of the VM curve to that of the Latin language.

