Rinchinov O.S. - The Diachronic Corpus of the Buryat Language as a Digital Tool for Historical Research: Approaches, Solutions and Experiments pp. 26-34


Abstract: The article examines the diachronic corpus of the Buryat language, compiled from chronicles written in Old Mongolian and used to reconstruct the history and historical geography of the Buryat people. In this context, the article discusses the main problems of the semantic markup of corpus data. The corpus currently exceeds 82,000 words. The research novelty is that classical Mongolian texts presented in Latin transliteration are processed with computational linguistics methods for the first time. The author describes approaches to developing an ontological outline of the historical and cultural subject area and identifies the elements of kinship and geographical context. A simulation experiment in MS Access and SQL demonstrates the advantages of the authority control methodology, in particular the “family” and “place” categories, for the initial analysis of corpus data and the formation of semantic clusters. The use of authority records has significantly accelerated the accumulation of empirical data for automating the analysis of the content of texts in the corpus. These experiments have allowed the author to outline further steps to create and improve semantic markup tools for the diachronic corpus of the Buryat language and to turn the corpus into a convenient tool for historical research.
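The authority-control idea can be sketched with Python's standard `sqlite3` module standing in for MS Access. This is a minimal sketch under assumptions: the tables `authority`, `variant` and `token`, the helper names and the sample transliterated forms (“Qori”, “Selengge”) are hypothetical illustrations, not the author's schema. Each spelling found in the corpus points to one authority record, so a single join groups all occurrences of a family or place name into one semantic cluster.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
SCHEMA = """
CREATE TABLE authority (
    id       INTEGER PRIMARY KEY,
    heading  TEXT NOT NULL,   -- preferred (authorized) form of the name
    category TEXT NOT NULL    -- 'family' or 'place'
);
CREATE TABLE variant (
    form         TEXT PRIMARY KEY,  -- lower-cased spelling found in the corpus
    authority_id INTEGER NOT NULL REFERENCES authority(id)
);
CREATE TABLE token (
    doc      TEXT,     -- source text (chronicle) identifier
    position INTEGER,  -- word position in that text
    form     TEXT      -- word as transliterated in the corpus
);
"""

def build_corpus_db():
    """Create an in-memory database holding two sample authority records."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO authority VALUES (?, ?, ?)",
                    [(1, "Qori", "family"), (2, "Selengge", "place")])
    con.executemany("INSERT INTO variant VALUES (?, ?)",
                    [("qori", 1), ("khori", 1), ("selengge", 2), ("selenge", 2)])
    return con

def semantic_clusters(con, category):
    """Group corpus tokens of one category under their authority heading."""
    rows = con.execute("""
        SELECT a.heading, t.doc, t.position, t.form
        FROM token AS t
        JOIN variant AS v ON v.form = lower(t.form)
        JOIN authority AS a ON a.id = v.authority_id
        WHERE a.category = ?
        ORDER BY a.heading, t.doc, t.position
    """, (category,)).fetchall()
    clusters = {}
    for heading, doc, position, form in rows:
        clusters.setdefault(heading, []).append((doc, position, form))
    return clusters

con = build_corpus_db()
con.executemany("INSERT INTO token VALUES (?, ?, ?)",
                [("chronicle_A", 3, "Qori"), ("chronicle_B", 7, "Khori"),
                 ("chronicle_A", 9, "Selenge")])
print(semantic_clusters(con, "family"))
# {'Qori': [('chronicle_A', 3, 'Qori'), ('chronicle_B', 7, 'Khori')]}
```

Keeping variant spellings in a separate table means that a newly encountered spelling needs only one new row to join its existing cluster.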
Thorvaldsen G. - Automating Historical Source Transcription with Record Linkage Techniques. Work in progress on the 1950 census for Norway pp. 94-103


Abstract: The article addresses the transcription of the handwritten materials of the 1950 Norwegian population census, which comprise 801,000 scanned double-sided questionnaires. Optical character recognition programs have been improving for over four decades, and researchers now aim to extend similar techniques to handwritten historical sources. The article analyzes studies carried out by the Center of Historical Documents at the University of Tromsø on handwritten text recognition and considers how various text recognition techniques can be applied to nominative sources. Since it is difficult to distinguish and separate individual handwritten characters, whole words are mathematically clustered by image similarity or searched for in sources that have already been transcribed. After quality control of the recognition results, the software uses the line numbers to place the information from the transcribed cells, which then becomes part of the census database. In addition, special software has been developed to process handwritten numerical codes, data on occupations and education, and so on. The methods described in the article improve the quality of handwritten text transcription and can be used to recognize entries in nominative sources in Russia, for instance, parish registers and vital records. The main goals remain the search for methods and algorithms that optimally link different variables and the rationalization of interactive proofreading methods.
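The word-level clustering step can be illustrated with a simple leader clustering over word-image feature vectors. The abstract does not specify the features or the clustering algorithm, so everything here is an assumption: each word image is taken to be already reduced to a numeric feature vector, cosine similarity with a hypothetical threshold of 0.95 decides cluster membership, and an operator transcribes only one representative per cluster, whose text is then copied to all members.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def leader_clustering(features, threshold=0.95):
    """Put each word image into the first cluster whose leader is similar enough,
    otherwise start a new cluster led by that image. The threshold is hypothetical."""
    leaders = []  # index of the image that founded each cluster
    labels = []   # cluster id assigned to every image
    for i, vec in enumerate(features):
        for cid, leader in enumerate(leaders):
            if cosine(vec, features[leader]) >= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels, leaders

def propagate(labels, leaders, transcribe):
    """Transcribe only each cluster leader (via the callback) and copy the text to all members."""
    text_of = {cid: transcribe(leader) for cid, leader in enumerate(leaders)}
    return [text_of[cid] for cid in labels]

# Hypothetical feature vectors for three word images from occupation cells.
features = [[1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
labels, leaders = leader_clustering(features)
print(propagate(labels, leaders, {0: "fisker", 2: "bonde"}.get))
# ['fisker', 'fisker', 'bonde']
```

The gain comes from the ratio of images to clusters: the operator keys in two words here, but three cells receive a transcription.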
Akasheva A.A., Chechin A.V. - A Technique to Reconstruct Nizhniy Novgorod Land Survey Plan and Borders in 1784 Based on Special Geodetic Software pp. 111-142


Abstract: A current task of historical GIS is to georeference old maps in a modern coordinate system. Such maps inevitably contain many inaccuracies, so algorithms are needed that account for them and allow the sources to be positioned with the smallest possible deformations. This task is also relevant for the Russian plans of the General Survey, whose distinctive feature is that they record accurate geodetic characteristics of land plots. The research subject is a set of Nizhny Novgorod plans of the late 18th century, which served as the basis for a technique to reconstruct the city borders and land survey plans. The research methodology is based on the principles of historicism, systematicity and objectivity. The authors emphasize the role of statistical methods and apply specifically historical methods (historical-typological and historical-genetic), the geodetic method of processing and adjusting traverses, modeling and cartometry. The research novelty lies in the algorithm for reconstructing city borders and historical land survey plans, the technological solutions for studying the object with geodetic software, and new data on land surveying and on the cartographic materials it produced in a specific region of Russia. The main result is the positioning of the borders of Nizhny Novgorod in a conditional coordinate system. The traverses of the plots studied were found to have significant angular and relative linear misclosures: 3°29' and 1/31 for the settlement plots, 2°49' and 1/80 for the pasture plots, and 0°37' and 1/139 for the Blagoveshchenskiy Monastery. A raster land survey plan of Nizhny Novgorod has been produced, which can be further used for georeferencing and creating historical GIS.
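The kind of misclosure figures quoted above can be computed for any closed traverse. The sketch below is not the authors' geodetic software; it assumes each leg is given as an azimuth in degrees and a horizontal distance, computes the angular misclosure as the measured sum of interior angles minus (n - 2) * 180°, reports the relative linear misclosure as 1/N with N = perimeter / linear misclosure, and distributes the misclosure with the compass (Bowditch) rule, one common adjustment method among several.

```python
import math

def angular_misclosure(interior_angles_deg):
    """Measured sum of interior angles minus the theoretical (n - 2) * 180 degrees."""
    n = len(interior_angles_deg)
    return sum(interior_angles_deg) - (n - 2) * 180.0

def coordinate_increments(legs):
    """legs: (azimuth in degrees, horizontal distance) for each side of a closed traverse."""
    return [(d * math.sin(math.radians(az)), d * math.cos(math.radians(az)))
            for az, d in legs]

def linear_misclosure(legs):
    """Return (f_east, f_north, N) where the relative misclosure is reported as 1/N."""
    incs = coordinate_increments(legs)
    f_e = sum(de for de, _ in incs)
    f_n = sum(dn for _, dn in incs)
    perimeter = sum(d for _, d in legs)
    f = math.hypot(f_e, f_n)
    return f_e, f_n, (perimeter / f if f else math.inf)

def compass_rule(legs, start=(0.0, 0.0)):
    """Adjust vertex coordinates by distributing the misclosure in proportion
    to leg length (Bowditch rule); the last vertex returns to the start."""
    f_e, f_n, _ = linear_misclosure(legs)
    perimeter = sum(d for _, d in legs)
    e, n = start
    points = [start]
    for (de, dn), (_, d) in zip(coordinate_increments(legs), legs):
        e += de - f_e * d / perimeter
        n += dn - f_n * d / perimeter
        points.append((e, n))
    return points

# Hypothetical four-sided plot whose last side was measured 1 m short.
legs = [(0, 100.0), (90, 100.0), (180, 100.0), (270, 99.0)]
print(round(angular_misclosure([90.1, 89.9, 90.05, 90.0]), 2))  # 0.05 degrees
print(round(linear_misclosure(legs)[2]))                          # 399, i.e. 1/399
```

A small N (such as 1/31) means the closing error is large relative to the plot perimeter, which is why such plots need adjustment before they can be positioned.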
Lyakhovitskii E.A., Tsypkin D.O. - Infrared Text Visualization to Study Old Russian Scripts pp. 148-156


Abstract: The article studies a script as a material object, that is, as a system of traces left by a writing instrument on a writing material (paper or vellum). These traces combine relief and a colorant (for instance, ink). A text understood as a combination of such traces is characterized by variations in colorant thickness and chemical composition at different levels of the text structure. These variations are determined by different aspects of the writing skill and can be used to characterize it. The article aims to present the advantages of a new electro-optical spectrozonal method for examining historical inks in the study of handwritten scripts. It discusses the technology of digitally imaging documents in the near-infrared region followed by computer processing of the images. As a result, the work outlines the main lines of research into the information potential of a text as a physical object (a system of traces) by means of spectrozonal imaging: the study of the traces of the writing instrument to reconstruct the system of movements and the writing technique, the identification of zones written at different times, and the search for corrections.
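The computer-processing step can be illustrated by comparing co-registered visible-light and near-infrared images of the same page. This is a hypothetical sketch, not the authors' method: pixels are grayscale values (0 = black, 255 = white), a pixel counts as ink if it is dark enough in visible light, and ink whose darkness drops sharply in the infrared is flagged separately. Inks of different composition can respond differently in the near infrared (carbon-based inks, for example, typically stay dark), which is what makes such a map useful for spotting corrections or later additions; both thresholds are illustrative assumptions.

```python
def darkness(value, white=255):
    """Grayscale value (0 = black, 255 = white) converted to darkness in [0, 1]."""
    return (white - value) / white

def ink_response_map(visible, infrared, ink_threshold=0.4, ratio_threshold=0.5):
    """Classify pixels of co-registered visible and near-infrared grayscale images.

    'B' background, 'P' ink that persists in the NIR image,
    'F' ink whose darkness in NIR falls below ratio_threshold of its visible darkness.
    Both thresholds are illustrative assumptions.
    """
    result = []
    for vis_row, ir_row in zip(visible, infrared):
        row = []
        for v, ir in zip(vis_row, ir_row):
            dv = darkness(v)
            if dv < ink_threshold:
                row.append("B")          # too light to be ink
            elif darkness(ir) / dv < ratio_threshold:
                row.append("F")          # ink fades in the infrared
            else:
                row.append("P")          # ink stays dark in the infrared
        result.append("".join(row))
    return result

# Hypothetical 2 x 3 crops: the third column is an ink stroke that fades in the NIR image.
visible = [[250, 30, 30], [245, 25, 35]]
infrared = [[250, 40, 220], [240, 30, 215]]
for line in ink_response_map(visible, infrared):
    print(line)
# BPF
# BPF
```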
Kuznetsov A. - Computer Analysis of Latin Texts: Latent Semantic Analysis of Historia de regibus Gothorum, Wandalorum et Sueborum by Isidoro de Sevilla pp. 202-217


Abstract: The article studies the Latin text of the chronicle “Historia de regibus Gothorum, Wandalorum et Sueborum” by the famous 7th-century theologian and scholar Isidoro de Sevilla using advanced text mining methods. The main goal is to verify the hypothesis that the author had a notion of a hierarchy of barbarians. The main focus is on clarifying the implicit semantic relationships between different parts of the chronicle in order to determine the author’s attitude to the three barbarian groups. The text was analyzed in the R programming language. The specific method is latent semantic analysis, which compares and clusters texts in a semantic space constructed through the singular value decomposition of the term-document matrix. The research novelty is that this is the first full-cycle latent semantic analysis of a Medieval Latin text, covering text preprocessing, the construction of the semantic space and the calculation of the semantic similarity of texts using the cosine similarity measure. The results suggest that Isidoro de Sevilla did build a hierarchy of the three barbarian groups: his descriptions of the Visigoths and the Suebi are more similar to each other, while the Vandals stand apart.
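The latent semantic analysis pipeline can be sketched in pure Python (the author worked in R). This is a minimal illustration under stated assumptions: documents are pre-tokenized lists of lemmas, the term-document matrix holds raw counts (the abstract does not state the weighting scheme), and the leading singular values and right singular vectors are obtained from the eigendecomposition of A^T A by power iteration, standing in for a library routine such as R's `svd()`. Documents are then compared by cosine similarity in the reduced space.

```python
import math

def term_document_matrix(docs):
    """Rows = terms, columns = documents, cells = raw term counts."""
    vocab = sorted({w for doc in docs for w in doc})
    row_of = {w: i for i, w in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc:
            a[row_of[w]][j] += 1.0
    return vocab, a

def top_eigenpairs(m, k, iterations=1000):
    """Leading eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):          # remove the found component (deflation)
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_document_vectors(docs, k=2):
    """Eigenvectors of A^T A are the right singular vectors V of A and its
    eigenvalues are squared singular values, so document j gets the
    coordinates sigma_c * V[j][c] (rows of V_k Sigma_k)."""
    _, a = term_document_matrix(docs)
    n_docs = len(docs)
    ata = [[sum(a[t][i] * a[t][j] for t in range(len(a))) for j in range(n_docs)]
           for i in range(n_docs)]
    pairs = top_eigenpairs(ata, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(n_docs)]

def cosine(a, b):
    """Cosine similarity of two vectors in the latent space."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical lemmatized fragments standing in for three parts of the chronicle.
parts = [["rex", "gothi", "bellum", "rex"],
         ["rex", "suevi", "bellum"],
         ["vandali", "mare", "africa"]]
vectors = lsa_document_vectors(parts, k=2)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Power iteration is used only to keep the sketch dependency-free; for a real corpus a library SVD routine is faster and numerically more robust.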
Thorvaldsen G. - Record Linkage in the Historical Population Register of Norway pp. 212-231


Abstract: The Historical Population Register of Norway contains data on the country's population from 1800 to 1964; information on the population from 1964 to the present is kept in the Central Population Register. The historical register is built from parish registers and civil records, which fill in the gaps between the population censuses conducted every ten years. The censuses of 1801 and those from 1865 onwards were nominative, that is, they listed people by name. The article is devoted to the problems of linking census records and parish registers (record linkage) from 1800 to 1920. Special attention is paid to the identification of individuals and the difficulties of linking records. The main problem is to identify the same person in records from different years, given the large number of namesakes and the variation in how names and ages were recorded. Creating stable identifiers for individuals and a procedure for linking records from various sources required new software that combines automatic and manual methods. Analysis of local databases suggests that between two thirds and 90% of records may be linked successfully, depending on the period and region. The historical register of Norway is unique in its territorial coverage and in the variety of historical sources it draws on.
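The combination of automatic and manual linkage can be sketched as a scoring rule that accepts clear matches and sends ambiguous ones to an operator. The abstract does not describe the actual matching rules, so this sketch assumes a hypothetical name-variant table, `difflib.SequenceMatcher` as the string similarity measure, a ±2-year birth-year tolerance and a margin rule under which near-ties (typically namesakes) go to manual review.

```python
from difflib import SequenceMatcher

# Hypothetical spelling-variant table; a real register would use a much larger one.
NAME_VARIANTS = {"kristian": "christian", "kristen": "christen"}

def normalize(name):
    """Lower-case a name and map known spelling variants to one standard form."""
    return " ".join(NAME_VARIANTS.get(p, p) for p in name.lower().split())

def score(census_rec, parish_rec, max_year_diff=2):
    """Similarity in [0, 1]; 0 if the birth years differ by more than max_year_diff."""
    year_diff = abs(census_rec["born"] - parish_rec["born"])
    if year_diff > max_year_diff:
        return 0.0
    name_sim = SequenceMatcher(None, normalize(census_rec["name"]),
                               normalize(parish_rec["name"])).ratio()
    return max(name_sim - 0.05 * year_diff, 0.0)  # small penalty per year of difference

def link(census_rec, candidates, accept=0.9, margin=0.05):
    """Return ('auto', match), ('manual', ranked candidates) or ('none', [])."""
    ranked = sorted(((score(census_rec, c), c) for c in candidates),
                    key=lambda pair: pair[0], reverse=True)
    ranked = [(s, c) for s, c in ranked if s > 0]
    if not ranked:
        return "none", []
    best_score, best = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    if best_score >= accept and best_score - runner_up >= margin:
        return "auto", best
    return "manual", [c for _, c in ranked]  # ambiguous: let an operator decide

census = {"name": "Kristian Olsen", "born": 1841}
parish = [{"name": "Christian Olsen", "born": 1841, "id": 1},
          {"name": "Christian Olsen", "born": 1850, "id": 2},
          {"name": "Hans Berg", "born": 1841, "id": 3}]
status, match = link(census, parish)
print(status, match["id"])  # auto 1
```

A second “Christian Olsen” born in 1841 would tie with the first, and the record would be routed to manual review rather than linked automatically.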
Frolov A. - Tools of Geoinformatics to Study Pistsovye Knigi pp. 218-233


Abstract: The article discusses methods of systematizing and visualizing codicological observations on an archival manuscript by means of geoinformatics. This approach makes it possible to summarize the information of a historical source and make it as accessible as possible to a wide range of Internet users. The resulting web project can be used for both research and educational purposes. The paper is based on the results of a codicological study of the 1542 Novgorod pistsovaya kniga (cadastral survey book) of Vodskaya Pyatina compiled by Semen Klushin (stored in the Russian State Archive of Ancient Acts, RGADA). The physical medium of a historical text, i.e. the manuscript, is treated as a special space with its own reference system. This makes geoinformatics methods applicable to determining the topology (i.e. the mutual relationships) of its objects. Because the approach is being tested for the first time, the main attention is paid to describing the key stages of processing the codicological source materials to turn them into a GIS project based on a relational database. The resulting web resource makes it possible to visualize a significant amount of manuscript data. However, it should not be considered a map or a spatial model. It may rather be described as a codicological GIS scheme of the manuscript, published as a web resource but without a map. The scheme is adjusted and controlled with database tools and is not limited to a cartographic interface.
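The idea of treating the manuscript as a space with its own reference system can be sketched with a relational table in which each leaf has “coordinates” (quire number and position within the quire). This is a hypothetical illustration, not the schema of the web project: it assumes regular quires with an even number of leaves, for which leaf i and leaf n + 1 - i of an n-leaf quire form one bifolium, so a topological relation such as conjugacy becomes a simple self-join.

```python
import sqlite3

def build_manuscript_db(quire_sizes):
    """Store every leaf with coordinates in the manuscript's own reference system.

    quire_sizes: leaf count of each quire in order (hypothetical collation).
    Each leaf gets a running folio number plus (quire, position, quire size).
    """
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE leaf (
        folio INTEGER PRIMARY KEY, quire INTEGER, pos INTEGER, quire_size INTEGER)""")
    folio = 1
    for quire, size in enumerate(quire_sizes, start=1):
        for pos in range(1, size + 1):
            con.execute("INSERT INTO leaf VALUES (?, ?, ?, ?)", (folio, quire, pos, size))
            folio += 1
    return con

def conjugate_folio(con, folio):
    """Leaf i and leaf n + 1 - i of a regular n-leaf quire form one bifolium."""
    row = con.execute("""
        SELECT b.folio
        FROM leaf AS a
        JOIN leaf AS b ON b.quire = a.quire AND b.pos = a.quire_size + 1 - a.pos
        WHERE a.folio = ?
    """, (folio,)).fetchone()
    return row[0] if row else None

con = build_manuscript_db([8, 6])   # two hypothetical quires: 8 and 6 leaves
print(conjugate_folio(con, 1), conjugate_folio(con, 4), conjugate_folio(con, 9))
# 8 5 14
```

Irregular quires (singletons, inserted or lost leaves) would need explicit collation data rather than this arithmetic rule.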
Bryukhanova E.A., Eremin A.A. - 1897 Census Primary Data Representativeness: Cartographic Approach pp. 232-241


Abstract: The authors assess the representativeness and preservation of the primary materials of the 1897 Census stored in Russian and foreign archives. The study of these document collections shows that the term “census papers” covers heterogeneous materials, including several different forms used depending on the type of household and region, as well as first, second and third copies of census forms. A distinctive feature of the article is the presentation of its conclusions as cartograms based on modern and historical maps. The study uses source analysis and spatial analysis, together with a comprehensive approach that treats census papers as a unified historical source irrespective of where they are stored. The research novelty lies in identifying and introducing into scholarly use a body of nominative 1897 Census data. In addition, the authors propose an original approach that takes into account both the number of populated places and the number of census papers preserved for them, which allowed them to assess the degree of preservation of census materials in the uezds of the Russian Empire. The article concludes that census papers in varying states of preservation have been identified for 47% of guberniyas and 25.5% of uezds. The surviving collections cover regions of European Russia and Siberia, and partly those of the Caucasus and Central Asia. The volume of preserved census papers and their “territorial spread” allow them to be considered a comprehensive source on the history of the population of the Russian Empire at the turn of the 19th and 20th centuries.
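The approach of weighing surviving census papers against the number of populated places can be sketched as a per-uezd share that feeds a cartogram legend. The abstract does not give the authors' formula, so the share (settlements with surviving papers divided by all settlements), the class boundaries and the sample uezds below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Uezd:
    name: str
    settlements_total: int        # populated places in the uezd
    settlements_with_papers: int  # places for which census papers survive

def preservation_share(u):
    """Share of populated places covered by surviving census papers (0..1)."""
    if u.settlements_total == 0:
        return 0.0
    return u.settlements_with_papers / u.settlements_total

def cartogram_class(share, bounds=(0.0, 0.25, 0.5, 0.75)):
    """Map a share to a legend class; the class bounds are illustrative assumptions."""
    labels = ("none", "fragmentary", "partial", "substantial", "near-complete")
    if share <= bounds[0]:
        return labels[0]
    for label, upper in zip(labels[1:], bounds[1:]):
        if share < upper:
            return label
    return labels[-1]

def coverage_percent(uezds):
    """Percentage of uezds with at least one surviving census paper."""
    covered = sum(1 for u in uezds if u.settlements_with_papers > 0)
    return 100.0 * covered / len(uezds) if uezds else 0.0

# Hypothetical sample uezds.
uezds = [Uezd("Uezd A", 100, 0), Uezd("Uezd B", 40, 10),
         Uezd("Uezd C", 10, 9), Uezd("Uezd D", 50, 30)]
for u in uezds:
    print(u.name, cartogram_class(preservation_share(u)))
print(coverage_percent(uezds))  # 75.0
```

Normalizing by the number of populated places keeps a large uezd with a few surviving papers from looking as well preserved as a small uezd with the same number of papers.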
