Article "Autographs of Peter I: Reading with Artificial Intelligence Technologies and Creating an Electronic Archive", journal "Historical Informatics", NotaBene.ru
Historical informatics

Autographs of Peter the Great: Reading with Artificial Intelligence Technologies and Creating an Electronic Archive

Bazarova Tat'yana Anatol'evna

ORCID: 0000-0001-9380-5921

PhD in History

Head of the Scientific Historical Archive and the Source Studies Group

195196, Russia, Saint Petersburg, Gromova str., 12

tbazarova@yandex.ru
Proskuryakova Mariya Evgen'evna

ORCID: 0000-0003-3000-999X

PhD in History

Senior researcher, Saint Petersburg Institute of History of the Russian Academy of Sciences

197082, Russia, Saint Petersburg, Optikov str., 47

m-proskuryakova@mail.ru

DOI:

10.7256/2585-7797.2022.4.39224

EDN:

QMWYXE

Received:

21-11-2022


Published:

30-12-2022


Abstract: The article is devoted to modern digital methods of working with the handwritten heritage of Peter I, which were applied within the scientific project "Autographs of Peter the Great: Reading with Artificial Intelligence Technologies". The project was initiated by the Russian Historical Society and implemented by specialists of the St. Petersburg Institute of History of the Russian Academy of Sciences and Sberbank PJSC. The article describes the methodology for preparing the dataset used to create a program for machine reading of Peter the Great's manuscripts ("Digital Peter"). The authors place special emphasis on the principles for transcribing the historical text that were developed during the project. They also analyze the cases in which Peter I used non-letter characters and the difficulties these caused in forming the dataset. The article presents the results of the resulting algorithm and identifies ways of arranging text in Peter I's manuscripts that reduce recognition quality. The authors also discuss the electronic archive "Autographs of Peter I", a continuation of the project on machine reading of the first Russian emperor's manuscripts. The archive, which is still being developed, contains digital copies of Peter's autographs, the results of their recognition by the Digital Peter program, and scholarly publications of these unique historical sources. The "Autographs of Peter I" internet portal is linked to the resource "Biochronics of Peter the Great Day by Day" (hosted on the HSE website). Linking the two sites opens up additional opportunities for researchers: each digitized autograph is placed in its historical context.


Keywords:

Data preparation, Processing of digital archives, Machine reading, Electronic archive, Peter I, Paleography, Computer vision, Digital Peter, Autographs of Peter the Great, Biochronics of Peter the Great

This article was translated automatically from the Russian original.

The first publications of Peter I's autographs appeared at the end of the 18th century, and they were not yet scholarly in nature. At the dawn of Russian historical science, scholars were interested only in the content of a document and the possibilities for interpreting it [4, 8, 9]. The situation had changed by the second half of the 19th century, when historians developed and began to apply the principles of scholarly publication. The 200th anniversary of Peter the Great's birth, in 1872, drew close attention to his epistolary legacy. Amid growing public interest in the life and work of the reformer tsar, academician Afanasy Fedorovich Bychkov, who had considerable experience in studying and publishing Peter's documents, proposed identifying, copying and publishing all the letters and papers of Peter I scattered among various state, departmental and private collections. Bychkov's idea was supported by the Minister of Public Education, Dmitry Andreevich Tolstoy, and approved by Emperor Alexander II.

At the end of 1872, the Commission for the Publication of the Letters and Papers of Emperor Peter the Great was established; its members included the St. Petersburg and Moscow scholars S. M. Solovyov, N. A. Popov, K. N. Bestuzhev-Ryumin, E. E. Zamyslovsky, N. V. Kalachov and A. E. Viktorov. In 1887, the first volume of "Letters and Papers of Emperor Peter the Great" was published. It covered 1688–1701 and included both documents written in a clerk's hand and signed by the reformer tsar and letters and decrees written by Peter I himself [5, 6]. In the preface, explaining why the first volume had been delayed, the members of the commission also noted "the difficulty of finding people who can correctly read the joined-up and illegible handwriting of Peter the Great" [5, vol. 1, p. XIII].

In total, six volumes were published before the revolution; the first part of volume 7 appeared in 1918, and the second only in 1946. The work has proceeded slowly and with long interruptions and has not been completed to this day: the first part of volume 14, covering January–June 1714, was published in 2022. Materials relating to the last ten years of the first Russian emperor's life have practically not been introduced into scholarly circulation. For this reason, digitizing Peter I's autographs, creating a single database of their digital copies, and applying modern computer technologies to the machine reading of his handwritten heritage are of particular importance.

In June 2020, staff of the St. Petersburg Institute of History of the Russian Academy of Sciences (hereinafter SPBI RAS) and Sberbank PJSC launched the project "Autographs of Peter the Great: Reading with Artificial Intelligence Technologies", initiated by the Russian Historical Society [2, 3]. The main goal of the project was to create software for recognizing Peter the Great's autographs and to make it openly available on the Internet. Historians, specialists in auxiliary historical disciplines and computer scientists worked in close cooperation to find a solution.

A working group of specialists in the history of the Petrine era, paleography and archaeography was set up at the SPBI RAS. Its key task was to select and transcribe Peter's autographs. Documents were selected using volumes 7–13 of "Letters and Papers of Emperor Peter the Great", which give current (or nearly current) archival references for documents of 1708–1713 held in the RGADA (Russian State Archive of Ancient Acts). RGADA staff digitized the materials and sent the copies to the SPBI RAS. Members of the working group transcribed the documents they received, and every reading was double-checked. The readings published in "Letters and Papers of Emperor Peter the Great" could not be used: training artificial intelligence required different principles of rendering the text, which differed significantly from the requirements of modern archaeography. Every sign (letter) used by Peter I had to be reproduced as accurately as possible. A group of Sberbank specialists headed by D. V. Dimitrov and M. S. Potanin created a trial algorithm using a small number of transcriptions (about 90 autographs). The first iteration of the algorithm made it possible to refine the transcription principles. They came down to the following: a) spaces were placed between words; b) abbreviations were not expanded; c) letters written above the line were brought down into the line; d) capital letters were not used; e) punctuation marks and line-break marks were not used; f) the obsolete letters "i" and "?" were used; g) crossed-out words and lines were reproduced; h) lines written vertically in the margin were reproduced at the end of the text; i) special symbols were introduced to reproduce the signs used by Peter I (for example, the cross in a circle with which he marked the place of a text insertion). When typed, the text was transcribed line by line, and the lines were numbered (Figure 1).
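Several of these rules are mechanical enough to be checked or applied by a script. The following is a minimal Python sketch, not the project's actual tooling, that applies rules c), d) and e) to a draft transcription and numbers its lines. It assumes a hypothetical draft convention in which superscript letters are typed in curly braces (e.g. "{к}"); the punctuation set, the function names and the sample line are illustrative only.

```python
import re
import string

# Hypothetical draft convention (not from the article): superscript letters
# are typed in curly braces, e.g. "по{с}ле"; rule c) brings them into the line.
SUPERSCRIPT = re.compile(r"\{([^{}]*)\}")

# ASCII punctuation plus Russian quotation marks, dashes and ellipsis (rule e).
# Any symbol adopted for Peter's insertion marks (rule i) must NOT be in this set.
PUNCTUATION = set(string.punctuation) | set("«»\u2013\u2014\u2026")


def normalize_line(line: str) -> str:
    """Apply rules c), d) and e) to one transcribed line."""
    line = SUPERSCRIPT.sub(r"\1", line)  # c) superscript letters into the line
    line = line.lower()  # d) no capital letters
    line = "".join(ch for ch in line if ch not in PUNCTUATION)  # e) no punctuation
    # Collapse runs of whitespace. Deciding where word breaks go (rule a)
    # remains the transcriber's job; the code only normalizes the spacing.
    return " ".join(line.split())


def to_training_lines(page_text: str) -> list[tuple[int, str]]:
    """Normalize each manuscript line and number the non-empty ones from 1."""
    normalized = (normalize_line(raw) for raw in page_text.splitlines())
    return list(enumerate((text for text in normalized if text), start=1))


if __name__ == "__main__":
    # Invented sample text, not a line from an actual autograph.
    draft = "Писмо ваше {к}о мне, получил.\nИ за то благодарствую!"
    for number, text in to_training_lines(draft):
        print(number, text)
```

Run as a script, this prints `1 писмо ваше ко мне получил` and `2 и за то благодарствую`. Rules b), g) and h) concern what the transcriber types rather than how it is post-processed, so they are deliberately not modelled here.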


Figure 1. Peter's autograph and line-by-line transcription of the text (Letter of Peter I to P. P. Shafirov, September 18, 1711 // RGADA. F. 142. Op. 2. d. 7. L. 157).

Testing the algorithm revealed two technical errors. The first was the use of a wrong symbol for the letter "i": in some transcriptions, a character from the Ukrainian alphabet (a non-standard variant) was used instead of the character from the Latin alphabet (the agreed variant). The second error arose from the variety of symbols that paleographers used to render the signs with which Peter I marked insertions in the text. In the multi-page texts that the tsar edited himself, he used several forms of the insertion mark (at least ten): a) a cross [7, l. 32, 33v, 37, 180, 195v, 221, 222, 385v, 409]; b) a cross enclosed in a semicircle on the left [7, l. 157, 180, 181v, 230, 286, 304, 385v, 409]; c) a cross in a circle [7, l. 240, 279v, 367, 409]; d) two crosses side by side [7, l. 196v, 197]; e) an oblique (St. Andrew's) cross [7, l. 367v, 385, 395]; f) an oblique cross enclosed in a semicircle on top [7, l. 424]; g) an oblique cross enclosed in a semicircle on the left [7, l. 181v]; h) an oblique cross in a circle [7, l. 409]; i) a slash crossed twice horizontally [7, l. 39v, 40, 279v]; j) two vertical lines crossed by two horizontal lines (a "grid") [7, l. 151v]; k) an oval at the top of the line connected to an oval at the bottom of the line by a slash [7, l. 306a, 307]. Some of these signs occur only in isolated cases, so no way of rendering them had been agreed in advance. During testing, lines containing characters that were neither letters nor digits were not recognized: all the text after an unidentified character was lost. Once these errors were detected, the entire body of transcriptions was therefore reviewed again, and the incorrect characters were replaced (Figure 2).
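Both errors are of a kind that a simple check run over the transcriptions before training can catch. The sketch below illustrates the idea and is not the procedure the team used: it replaces the Ukrainian/Cyrillic "і" (U+0456) with the Latin "i" (U+0069) and reports, line by line, every character outside an agreed alphabet, since such characters caused the rest of a line to be lost. The alphabet, the replacement table and the placeholder insertion-mark symbols are all hypothetical.

```python
import unicodedata

# Hypothetical training alphabet: lowercase Cyrillic, digits, space, the
# agreed Latin "i", and placeholder symbols for insertion marks. A real
# project would add every other character it agreed on (obsolete letters etc.).
CYRILLIC = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")
ALLOWED = CYRILLIC | set("0123456789 i") | {"+", "\u2295", "\u2297"}

# Hypothetical fixes for known look-alikes: the Ukrainian "і" (U+0456)
# becomes the Latin "i", and a heavy cross (U+271A) typed by one
# transcriber becomes the agreed "+".
REPLACEMENTS = {"\u0456": "i", "\u271a": "+"}


def fix_line(line: str) -> str:
    """Swap known look-alike characters for the agreed ones."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in line)


def unknown_chars(line: str) -> list[tuple[int, str, str]]:
    """List (position, character, Unicode name) for characters outside ALLOWED."""
    return [
        (pos, ch, unicodedata.name(ch, "UNNAMED"))
        for pos, ch in enumerate(line)
        if ch not in ALLOWED
    ]


def audit(lines: list[str]) -> tuple[list[str], dict[int, list[tuple[int, str, str]]]]:
    """Fix known look-alikes, then report 1-based line numbers still needing review."""
    fixed: list[str] = []
    problems: dict[int, list[tuple[int, str, str]]] = {}
    for number, line in enumerate(lines, start=1):
        line = fix_line(line)
        bad = unknown_chars(line)
        if bad:
            problems[number] = bad
        fixed.append(line)
    return fixed, problems


if __name__ == "__main__":
    fixed, problems = audit(["и \u0456 \u271a", "да #"])
    print(fixed)
    print(problems)
```

In the example, the first line is repaired silently, while line 2 is reported because "#" (standing in for a sign such as the "grid" for which no symbol had been agreed) is not in the alphabet. Reporting rather than auto-replacing unknown characters leaves the decision about rare signs to the paleographers.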


Figure 2. Signs used by Peter I to mark insertions in the text ("Maxim" on the actions of the Russian army and navy in Finland, May 11–12, 1713. Fragment // RGADA. F. 9. Otd. I. Book 21. L. 409).

 

After the manuscripts had been transcribed, the team moved on to marking up the data in a specialized program to which Sberbank staff provided access. Photocopies of all the transcribed documents (719 pages) were uploaded to the program. Paleographers had to separate each line of text from the next with polylines, tracing the elements of the text as carefully as possible, including the letters written above the line and all the elements of the letters (loops, half-loops and stems). This was necessary so that the program would receive the largest possible set of templates (variants of Peter I's letter and numeral forms) for correct recognition later. At this stage, graduate and undergraduate students of the Higher School of Economics (HSE University) joined the team. Together, 10,512 lines were transcribed and marked up; they formed the dataset passed on to programmers for further work and used in a competition for the best solution. Sberbank PJSC held the competition in the fall of 2020 in order to obtain alternative recognition algorithms and to identify different approaches to the task (Figure 3).
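The markup step pairs two things for every line: the region traced on the image and the numbered transcription of the same line. A minimal sketch of that pairing follows. It assumes, hypothetically, that each region is stored as a closed polygon of (x, y) pixel coordinates and that regions can be put in reading order by their highest point; the article does not describe the markup program's export format, and the class and function names are illustrative.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class LineSample:
    page_id: str
    line_no: int
    polygon: list[Point]  # region traced by the paleographer (hypothetical format)
    text: str  # transcription of the same numbered line

    def bbox(self) -> tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) for cropping the line image."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)

    def area(self) -> float:
        """Polygon area by the shoelace formula; near-zero values flag bad markup."""
        pts = self.polygon
        twice_area = sum(
            x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])
        )
        return abs(twice_area) / 2


def pair_page(
    page_id: str, polygons: list[list[Point]], transcript: list[tuple[int, str]]
) -> list[LineSample]:
    """Pair line regions with numbered transcription lines, top to bottom."""
    if len(polygons) != len(transcript):
        raise ValueError(
            f"{page_id}: {len(polygons)} marked regions but {len(transcript)} lines"
        )
    # Assumption: ordering by the top edge matches the line numbering.
    # Superscript letters or sloping lines can break this, so mismatches
    # still need manual review.
    ordered = sorted(polygons, key=lambda poly: min(y for _, y in poly))
    return [
        LineSample(page_id, line_no, poly, text)
        for (line_no, text), poly in zip(transcript, ordered)
    ]
```

Failing loudly when the number of regions and transcribed lines differ is a deliberate choice: a silent off-by-one would pair every following line image with the wrong text.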

 


Figure 3. Markup of a photocopy of Peter's autograph and line-by-line transcription of the text (RGADA. F. 142. Op. 2. d. 7. L. 157).

 

The work of the team of historians, paleographers and data scientists resulted in a computer program available on the Digital Peter website at https://www.sber.ru/digital-petr/ [2]. Users upload digital copies of Peter's autographs to the site for recognition. After recognition, the interface displays both the image of the document and the text read line by line (all lines are numbered). The recognized text can also be downloaded in DOCX format.

After work on the site was completed, the software was tested on several dozen of Peter's manuscripts that had not been included in the training dataset and had not been published before. To check recognition quality, a document from the collection of the SPBI RAS was uploaded to the site: Peter I's remarks on the "Line of Battle" of Russian ships, 1719 [1, l. 4] (Figure 4).

 


Figure 4. Recognition of Peter I's autograph on the Digital Peter website https://www.sber.ru/digital-petr/ (Peter I's remarks on the "Line of Battle" of Russian ships, 1719 // Archive of the SPBI RAS. Coll. 277. Op. 1. d. 3. L. 4).

 

Out of 365 letters in Peter's text (64 of them written above the line), the algorithm "made mistakes" in only four places, all of them words that are difficult to read. In line 1, a crossed-out word with traces of editing ("they will see") was misread. In line 13, "he" was read instead of "with us": here Peter I moved one syllable below the line, and the ink is smeared. In line 9, the algorithm inserted a spurious space while correctly recognizing all the letters of the word "in // to the petitioner". Finally, in line 4 the words "on the main topmast" were not recognized and were "read" as "nagra stange": the word was new to the program, since it did not occur in the autographs of Peter I used to train it. The machine reading did not distort the meaning of the document. Recognition accuracy was 97.4% (Figure 5).
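The article does not say how the 97.4% figure was computed. A common definition in handwritten text recognition, assumed here, is character accuracy = 1 - CER, where the character error rate (CER) is the Levenshtein edit distance between the reference transcription and the machine reading, divided by the length of the reference. Under that assumption, 97.4% corresponds to a CER of 2.6%, which implies more than four character edits (four single-character errors in 365 letters would give about 98.9%); this is consistent with a word such as "nagra stange" being wrong in several characters at once. A self-contained sketch of the assumed metric:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(
                min(
                    prev[j] + 1,  # delete ca
                    cur[j - 1] + 1,  # insert cb
                    prev[j - 1] + (ca != cb),  # substitute (free if equal)
                )
            )
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """1 - CER, with CER = edit distance / reference length (assumed metric)."""
    if not reference:
        raise ValueError("reference transcription is empty")
    return 1 - levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Toy English strings, not the 1719 document: a spurious space is one edit.
    ref = "to the petitioner"
    hyp = "to the peti tioner"
    print(levenshtein(ref, hyp), round(char_accuracy(ref, hyp), 3))
```

This prints `1 0.941`: one inserted space in a 17-character reference. Whether spaces count as characters, and whether the metric is averaged per line or over the whole page, changes the figure, which is why the definition should be stated alongside any reported accuracy.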

 


Figure 5. Text recognition results on the Digital Peter website (Archive of the SPBI RAS. Coll. 277. Op. 1. d. 3. L. 4).

 

Testing the program revealed four groups of problematic cases in which the algorithm fails to "read" the text or makes errors: a) lines written vertically (beside or below the main text); b) lines written by Peter I between the lines of a clerk's text; c) text arranged in two or more columns; d) letters of the Latin alphabet. For texts with these features, reading accuracy decreases. However, as far as can be judged from the set of Peter's autographs studied, such documents are few. The result naturally also depends on the quality of the digital copy of the archival document uploaded to the program.

During the project on Peter I's autographs, the algorithm was also tested on cursive documents of the second half of the 17th century (3,500 lines were transcribed and marked up), and it showed high recognition accuracy (95.8%). These results cannot be used at present, because the website created within the project "Autographs of Peter the Great: Reading with Artificial Intelligence Technologies" is designed exclusively for deciphering Peter's manuscripts, and the program does not work when other documents are uploaded. Nevertheless, the SPBI RAS team has accumulated unique experience in building the datasets needed to train the algorithm. This experience can be used in the future to create new models for machine reading of texts from various periods and in various alphabets.

In 2022, the jubilee year marking the 350th anniversary of Peter the Great's birth, the Digital Peter project was continued. At the initiative of the Russian Historical Society, researchers together with IT specialists A. Y. Khodot and E. V. Rigin began work on a new website, also aimed at studying and popularizing the epistolary heritage of Peter I. It was inspired by "Pushkin Digital" (https://pushkin-digital.ru/), the digital academic edition of the works of A. S. Pushkin, which has become a true multimedia encyclopedia of the great Russian poet's works. The following pages were programmed for the website "Autographs of Peter I" (https://peterscript.ru/):

1. Home page.

2. A catalog of documents with two display modes (list and tiles), contextual search, sorting by various parameters, grouping by year, and a choice of the number of results per page that the site remembers.

3. A document page with two columns on tablets and personal computers and one column on mobile devices. The first column displays the images of the manuscript, with zooming and switching between images; the second shows the Digital Peter transcription of each page of the manuscript, images of the publication, and the publication in PDF format. The Digital Peter transcription and the publication images are linked to each image of the manuscript (Figure 6).

 


Figure 6. Peter I's autograph letter to G. F. Dolgoruky, June 13, 1709 (RGADA. F. 9. Otd. I. Book 60. L. 30), and the results of reading Peter's text with the "Digital Peter" program, as presented on the website "Autographs of Peter I. Electronic archive" https://peterscript.ru/document/14

 

4. A text page managed through the admin panel.
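Item 2 describes standard catalog behaviour. As an illustration only (the article does not describe how peterscript.ru is implemented), a minimal in-memory Python version of search, sorting, grouping by year and pagination could look like the following. The `Doc` fields, the function names and the numeric ids are hypothetical; the two sample records are the letters shown in Figures 1 and 6.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class Doc:
    doc_id: int  # placeholder identifier
    dated: date
    recipient: str
    archive: str
    title: str


def search(
    docs: list[Doc], query: str = "", sort_key: str = "dated", reverse: bool = False
) -> list[Doc]:
    """Case-insensitive substring search in title and recipient, then sort."""
    q = query.casefold()
    hits = [d for d in docs if q in d.title.casefold() or q in d.recipient.casefold()]
    return sorted(hits, key=lambda d: getattr(d, sort_key), reverse=reverse)


def group_by_year(docs: list[Doc]) -> dict[int, list[Doc]]:
    """Map year -> documents of that year, in chronological order."""
    ordered = sorted(docs, key=lambda d: d.dated)
    return {year: list(items) for year, items in groupby(ordered, key=lambda d: d.dated.year)}


def paginate(docs: list[Doc], page: int, per_page: int) -> list[Doc]:
    """Return the 1-based page `page` holding `per_page` results."""
    start = (page - 1) * per_page
    return docs[start : start + per_page]


if __name__ == "__main__":
    catalog = [
        Doc(1, date(1711, 9, 18), "P. P. Shafirov", "RGADA", "Letter to P. P. Shafirov"),
        Doc(2, date(1709, 6, 13), "G. F. Dolgoruky", "RGADA", "Letter to G. F. Dolgoruky"),
    ]
    for year, items in group_by_year(search(catalog)).items():
        print(year, [d.title for d in items])
    print(paginate(search(catalog, "shafirov"), page=1, per_page=20))
```

`itertools.groupby` only merges adjacent items, which is why the documents are sorted by date before grouping. Remembering the chosen page size between visits is a client-side concern (for instance a cookie or browser storage) and is left out of this sketch.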

The site currently contains more than 170 letters, decrees and notes of Peter I from 1707–1713, the originals of which are held in the RGADA and the Archive of the SPBI RAS. Each autograph image has a short title giving the date, the place of storage and the publication. Users can sort and search by these parameters and can also search by the recipients of Peter's letters. Each digital copy of an archival document is accompanied by a copy of its publication in "Letters and Papers of Emperor Peter the Great" in PDF and JPG formats and by its transcription from the Digital Peter website. The same text is thus presented in several forms. A specialist or an interested reader can read Peter's handwriting independently and compare the result with the publication in "Letters and Papers of Emperor Peter the Great" (Figure 7) or with the reading produced by artificial intelligence.


Figure 7. Peter I's autograph letter to G. F. Dolgoruky, June 13, 1709 (RGADA. F. 9. Otd. I. Book 60. L. 30), and the scholarly publication of this letter in "Letters and Papers of Emperor Peter the Great" (Moscow, 1950. Vol. 9. Issue 1. No. 3227. P. 208), as presented on the website "Autographs of Peter I. Electronic archive" https://peterscript.ru/document/14

 

It was also decided to "link" the documents posted on the website to the biochronics of Peter the Great (Itinera Petri: Biochronics of Peter the Great Day by Day; https://spb.hse.ru/humart/history/peter/). This database, prepared by E. V. Anisimov and hosted on the website of HSE University in St. Petersburg, makes it possible to trace the life and activities of the first Russian emperor day by day. Linking the two sites opens up additional opportunities for researchers: for example, one can find out what the emperor did on the day a letter posted on the website was sent, what other decrees he signed, whom he met, or who accompanied him on a journey. Each digitized autograph is thus placed in its historical context.
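At the data level, such a link is essentially a join on dates. The sketch below assumes, hypothetically, that the biochronics can be exported as a mapping from a date to that day's entries, and that both resources give dates in the same calendar. Peter's documents are dated Old Style; if one side used New Style, 18th-century dates would first need shifting by 11 days. The function name and the sample entries are placeholders, not real chronicle data.

```python
from datetime import date, timedelta

# Hypothetical export format: date -> that day's chronicle entries.
# Note: datetime.date is proleptic Gregorian. Using it for Old Style dates as
# plain labels works for these years but cannot represent Julian 29 Feb 1700.
Chronicle = dict[date, list[str]]


def context_for(day: date, chronicle: Chronicle, window: int = 0) -> Chronicle:
    """Entries from `window` days before to `window` days after `day`, in date order."""
    found: Chronicle = {}
    for offset in range(-window, window + 1):
        current = day + timedelta(days=offset)
        if current in chronicle:
            found[current] = chronicle[current]
    return found


if __name__ == "__main__":
    # Placeholder entries, not real chronicle data.
    chronicle = {
        date(1709, 6, 12): ["entry for 12 June"],
        date(1709, 6, 13): ["entry for 13 June"],
    }
    letter_date = date(1709, 6, 13)  # letter to G. F. Dolgoruky (Figure 6)
    print(context_for(letter_date, chronicle))
    print(context_for(letter_date, chronicle, window=1))
```

A small window around the letter's date is useful because a letter may be dated on the day it was finished rather than the day it was dispatched, so the neighbouring days often hold the relevant context.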

Researchers and everyone interested in Russian history thus have a unique opportunity to work with Peter's legacy even before the fundamental edition "Letters and Papers of Emperor Peter the Great" is completed, and to assess the contribution Peter I made to the formation of Russian statehood.

References
1. Archive of the Saint Petersburg Institute of History of the Russian Academy of Sciences. Coll. 277 (Collection of documents of Peter I). Opis’ 1. Delo 3.
2. Bazarova, T. A., Dimitrov, D. V., Potanin, M. S., & Proskuryakova, M. E. (2020). Распознать и транскрибировать: Автографы Петра Великого и технологии искусственного интеллекта [To Recognize and Transcribe: Autographs of Peter the Great and Artificial Intelligence Technologies]. Vorontsovo pole, 4, 36–41.
3. Bazarova, T. A., Potanin, M. S., & Proskuryakova, M. E. (2022). “…Он нисколько не заботился о синтаксисе”: Может ли искусственный интеллект прочитать письма Петра I? [“He did not care about the syntax”: Can artificial intelligence read the letters of Peter I?]. Rodina, 6 (June), 112–115.
4. Golikov, I. I. (1837–1843). Деяния Петра Великого, мудрого преобразителя России, собранные из достоверных источников и расположенные по годам [Acts of Peter the Great, the wise reformer of Russia, collected from reliable sources and arranged by years]. Vol. 1–15. Moscow: Printing house of N. Stepanov.
5. Письма и бумаги императора Петра Великого [Letters and papers of Emperor Peter the Great]. (1887–2022). Vol. 1–14. Saint Petersburg (Leningrad), Moscow.
6. Pod’’yapol’skaya, E. P. (1974). Об истории и научном значении издания “Письма и бумаги императора Петра Великого” [On the history and scientific significance of the publication “Letters and Papers of Emperor Peter the Great”]. In: S. O. Shmidt (Ed.), Arheograficheskij ezhegodnik za 1972 god (pp. 56–70). Moscow: Nauka.
7. Russian State Archive of Ancient Acts. Fond 9 (Documents from Peter I office). Otdelenie I. Kniga 21.
8. Tumansky, F. O. (Ed.). (1787–1788). Собрание разных записок и сочинений, служащих к доставлению полного сведения о жизни и деяниях государя императора Петра Великого [A collection of various notes and writings that serve to deliver complete information about the life and acts of the Emperor Peter the Great]. Parts 1–10. Saint Petersburg: Printed by Shnor.
9. Shcherbatov, M. M. (Ed.). (1774). Тетрати записныя всяким писмам и делам, кому что приказано и в котором числе от его императорского величества Петра Великого 1704, 1705 и 1706 годов: С приложением примечаний о службах тех людей, к которым сей государь писывал [Notebooks for all sorts of letters and acts, to whom what is ordered and in which number from His Imperial Majesty Peter the Great 1704, 1705 and 1706: With notes on the services of those people to whom this sovereign wrote]. Saint Petersburg: Senate printing house.

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

Peter the Great's Autographs: Reading with Artificial Intelligence Technologies and Creating an Electronic Archive // Historical Informatics. Recent political, scientific and educational events, including the speech of the head of our state at the international conference "Journey into the World of Artificial Intelligence" on November 24, 2022, have struck the imagination of many scholars in the humanities. Against this background, the relevance of the reviewed article is obvious. Its scientific novelty lies both in a new formulation of the problem and in new solutions to it. The article is an important step in the practical use of artificial intelligence for introducing manuscripts into scholarly circulation. A modern reader without special training can hardly read the manuscripts of Peter the Great. The article opens with a brief reminder that the scholarly idea of reading Peter's manuscripts arose in the middle of the 19th century, and that the first publications of Peter's documents appeared at the end of the 19th century. It should be noted that Peter's texts were also deciphered in the early 20th century by the well-known bibliophile A. F. Shidlovsky, vice-governor of the Arkhangelsk province, who discovered Peter's manuscripts in the Arkhangelsk archives and published them in 1909. The authors note that the reading, study and publication of the "Letters and Papers of Emperor Peter the Great" have proceeded slowly and with long interruptions and have not yet been completed. The task was to use modern computer technologies for machine reading of this handwritten heritage, to digitize Peter the Great's autographs and create a single database for storing their digital copies, and to create software for recognizing the autographs and make it openly available on the Internet.
To train artificial intelligence, new principles of rendering the text had to be developed; they differ significantly from those of modern archaeography. In effect, a special alphabet of the letters and signs that Peter used was created, which greatly facilitates further work. The article explains how painstaking it was to establish the meaning of each sign or letter in Peter's papers, since the machine had to read them correctly without confusing them. The results of this work will interest a wide variety of readers. The text is considerably enlivened by the illustrations, which show the intermediate results of the work. The final part of the article reports the high accuracy of text recognition (97.4%). The information in the article is varied and impressive: more than 170 letters, decrees and notes of Peter I for 1707–1713, the originals of which are held in two archives, have already been posted. The style, structure and content of the article are consistent and accessible even to a young general reader. The bibliography covers the central archives holding Peter's documents and the main studies on reading hard-to-read texts. I recommend the article for publication.