Article "Computer Analysis of Latin Texts: Latent Semantic Analysis of the 'History of the Goths, Vandals and Suevi' by Isidore of Seville" - journal "Historical Informatics" - NotaBene.ru
Historical informatics
Reference:

Computer Analysis of Latin Texts: Latent Semantic Analysis of “Historia de regibus Gothorum, Wandalorum et Sueborum” by Isidoro de Sevilla

Kuznetsov Alexey

PhD in History

Researcher, General History Institute of the Russian Academy of Sciences

119334, Russia, Moscow, Leninskii prospekt, 32 a, office 1426

kuznetsovaleks@rambler.ru

DOI:

10.7256/2585-7797.2020.2.32961

Review date:

22-05-2020


Publish date:

30-07-2020


Abstract.

The article studies the Latin text of the chronicle “Historia de regibus Gothorum, Wandalorum et Sueborum”, written by the famous 7th-century theologian and scholar Isidoro de Sevilla, using advanced text mining methods. The main goal is to test the hypothesis that the author held a notion of a hierarchy among the barbarian peoples. The focus is on uncovering the implicit semantic relationships between different parts of the chronicle in order to establish the author’s attitude toward the three barbarian groups. The text was analyzed in the R programming language. The method is latent semantic analysis, which makes it possible to compare and cluster texts within a semantic space built through singular value decomposition of the term-document matrix. The novelty of the study is that, for the first time, a full-cycle latent semantic analysis of a medieval Latin text has been carried out, covering text preprocessing, construction of the semantic space, and calculation of the semantic similarity of texts using the cosine similarity measure. The results suggest that Isidoro de Sevilla did construct a hierarchy of the three barbarian groups: the descriptions of the Visigoths and the Suebi are more similar to each other, while the Vandals stand apart.

Keywords: vector space text representations, text mining, semantic space, cluster analysis, singular value decomposition, latent semantic analysis, computational text analysis, early Middle Age historiography, Isidore of Seville, term-document matrix
This article is written in Russian. The full text is available in Russian.
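
The pipeline named in the abstract (term-document matrix, singular value decomposition, cosine similarity) can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's code: the study itself was carried out in R, whereas the sketch uses Python with NumPy; the three toy "documents" are invented bags of Latin lemmas rather than passages of the chronicle; raw term counts are used without any term weighting; and the function names and the choice of k = 2 dimensions are hypothetical.

```python
import numpy as np

# Toy corpus: invented bags of lemmas, NOT taken from Isidore's chronicle.
docs = {
    "doc_a": ["rex", "bellum", "gens", "rex", "pax"],
    "doc_b": ["rex", "pax", "gens", "fides", "rex"],
    "doc_c": ["urbs", "ignis", "praeda", "bellum", "urbs"],
}


def term_document_matrix(docs):
    """Build a raw-count term-document matrix (rows: terms, columns: documents)."""
    names = list(docs)
    terms = sorted({tok for tokens in docs.values() for tok in tokens})
    row = {term: i for i, term in enumerate(terms)}
    tdm = np.zeros((len(terms), len(names)))
    for j, name in enumerate(names):
        for tok in docs[name]:
            tdm[row[tok], j] += 1
    return terms, names, tdm


def lsa_document_vectors(tdm, k):
    """Project documents into a k-dimensional semantic space via truncated SVD."""
    _, s, vt = np.linalg.svd(tdm, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


terms, names, tdm = term_document_matrix(docs)
doc_vecs = lsa_document_vectors(tdm, k=2)  # k=2 is an arbitrary choice for the toy data
sim = cosine_similarity_matrix(doc_vecs)

for i, name in enumerate(names):
    print(name, np.round(sim[i], 3))
```

In this toy setting doc_a and doc_b share most of their vocabulary, so they end up closer to each other in the reduced space than either is to doc_c. Grouping texts by such pairwise similarities is the kind of comparison the abstract describes, although a real analysis would also depend on preprocessing (lemmatization, stop-word removal) and on the number of retained dimensions.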
