Article 'The Chekhov Digital project: tasks and problems of implementing semantic markup of texts (on the example of A. P. Chekhov's story "The Death of an Official")' - journal 'Litera' - NotaBene.ru
Litera
Reference:

The Chekhov Digital project: tasks and problems of implementing semantic markup of texts (on the example of A. P. Chekhov's story "The Death of an Official")

Severina Elena Mikhailovna

ORCID: 0000-0001-6518-2771

Doctor of Philosophy

Professor, Institute of Philology, Journalism and Intercultural Communications, Southern Federal University

344006, Russia, Rostov-on-Don, Universitetskiy, 93

emkovalenko@sfedu.ru
Larionova Marina Chengarovna

ORCID: 0000-0002-2955-2621

Doctor of Philology

Head of the Department of Humanities Research, Southern Scientific Center of the Russian Academy of Sciences

344006, Russia, Rostov-On-Don, 41 Chekhov str.

chengarovna@yandex.ru

DOI:

10.25136/2409-8698.2023.10.68862

EDN:

IHSMSE

Received:

30-10-2023


Published:

06-11-2023


Abstract: The article considers a model for preparing machine-readable (semantic) markup of texts for the Chekhov Digital project, using as an example the philological interpretation of individual significant elements of A. P. Chekhov's story "The Death of an Official" and the explicit presentation of this information on the basis of the Text Encoding Initiative (TEI/XML) standard for digital publication. Drawing on the work of literary researchers, significant entities have been identified for marking up the corpus of the writer's texts, but the question of how they are represented in the text remains quite complex. A philological examination of such aspects as "properties, states and events; character traits" in an excerpt from the story was carried out from the point of view of what TEI markup can do to preserve philological knowledge in a machine-readable format. One of the objectives of the Chekhov Digital project is to go beyond a simple digitized text and provide useful digital tools for the researcher. The article presents elements of machine-readable markup that make it possible to mark up significant entities in Chekhov's texts in order to organize semantic search across the corpus, and considers the problems and research tasks that arise in such interdisciplinary projects because specialists from different fields of knowledge must combine their efforts. The project implements the principle of open research data, whose most important task is to create scientific communities around data. Work on the project has led to scientific cooperation between researchers of the Higher School of Economics, the Southern Scientific Center of the Russian Academy of Sciences and the Southern Federal University.


Keywords:

project Chekhov Digital, digital edition, Chekhov, philological expertise, TEI, machine-readable markup, semantic search, expert annotation, author's technologies, natural language processing

This article is automatically translated. You can find original text of the article here.

The Chekhov Digital project is a semantic edition of the Complete Works and Letters of A. P. Chekhov in 30 volumes (hereinafter PSSiP) [1], developed in the format of the Text Encoding Initiative (TEI/XML) digital publication standard [2]. The academic edition of the writer's works and letters [1], including early editions and variant versions of the text, consists of two series, Works (vols. I-XVIII) and Letters (vols. I-XII), for which semantic machine-readable markup is being developed. The semantic edition of the PSSiP makes it possible to include the writer's texts in a digital cultural context and opens up new opportunities for conducting academic research in a digital format and for using literary texts in digital projects and applications [3]. The project is being developed by the Center for Digital Humanities Research of the Institute of Philology, Journalism and Intercultural Communication of the Southern Federal University (IFZHIMKK SFU) together with the International Laboratory of Language Convergence of the Higher School of Economics and the Department of Humanities Research of the Southern Scientific Center of the Russian Academy of Sciences.

Within the framework of the Chekhov Digital project, a digital resource is being developed that includes not only structural markup based on the TEI/XML digital publication standard, but also markup of significant entities in Chekhov's texts, notes and comments. This makes the documents machine-readable and allows tools to be developed for fairly complex semantic search in the texts of the writer's works and letters. The Text Encoding Initiative provides an extensive list of tags for marking up a wide variety of information, including information external to the text, from the specifics of presenting information in different types of text (play, story, novel or letter) to proper names and social categories such as social status, professional affiliation, etc. Information of this kind creates difficulties for automatic text processing technologies, since it can be expressed in different ways, but describing it explicitly with TEI markup reduces the complexity of processing by computer methods. At the same time, any markup is a form of text interpretation, so the digital format is one variant of the interpretation of Chekhov's texts. The texts marked up in this way are placed in open access on the project website (http://chekhov-digital.sfedu.ru/), which additionally presents some digital research tools, such as semantic search, visualization tools, etc.

Drawing on the work of literary researchers, significant entities were identified for marking up the corpus of the writer's texts: "names, titles; dates; seasons; colors; properties, states and events; character traits; natural phenomena; social status; profession; animals / plants; comments; notes" [4]. At the same time, the question of how these entities are represented in the text remains quite complex and requires further work. Some meanings have quite distinct lexical forms of representation; aspects such as "properties, states and events; character traits", however, require that specific texts be considered both from a philological point of view and from the point of view of what TEI markup can do to preserve philological knowledge in a machine-readable format. As an example of this kind of research, we consider an excerpt from A. P. Chekhov's short story "The Death of an Official":

"One fine evening, a no less fine executor, Ivan Dmitrich Chervyakov, was sitting in the second row of chairs and looking through binoculars at "The Bells of Corneville". He looked and felt at the height of bliss. But suddenly... In stories, this "but suddenly" is often found. The authors are right: life is so full of surprises! But suddenly his face wrinkled, his eyes rolled up, his breathing stopped... he took the binoculars away from his eyes, bent down and... apchi!!! Sneezed, as you can see. Sneezing is not forbidden to anyone anywhere. Men sneeze, and police officers, and sometimes even privy councillors. Everyone sneezes. Chervyakov was not at all embarrassed, wiped himself with a handkerchief and, like a polite person, looked around him: had he bothered anyone with his sneezing? But here he did have to be embarrassed. He saw that the little old man sitting in front of him, in the first row of chairs, was diligently wiping his bald head and neck with a glove and muttering something. In the old man, Chervyakov recognized the civil general Brizzhalov, who served in the Department of Ways of Communication" [5, 164].

The word "one" in the first sentence is used as an indefinite pronoun ("a certain", "some") and emphasizes the typicality of the situation: in the context of the story it is obvious that this is not the hero's first visit to the theater, so the pronoun becomes a marker of a property, a characteristic of the situation. The phrase "fine evening" marks a property of subjective perception rather than an objective characteristic of the time of day; this is how Chervyakov perceives it. The expression "no less fine executor" is a semantically complex construction. It is again the hero's self-assessment, while the combination of "a fine evening" with "a fine executor" expresses the author's irony: "executor" combined with the twofold "fine" forms a contrast, since the word "executor", on the one hand, denotes a low social status ("executor, m., Lat. (one who carries out): an official in a chancery or government office charged with police and housekeeping duties" [6, 662]), and on the other hand, is akin to "execution", that is, punishment.

This example already shows that standard morphological markup captures only the specifics of language forms. Marking semantics therefore requires a completely different approach, one that takes into account the corresponding markers: properties and characteristics of the situation (typicality); properties of subjective perception (time and profession/social status); and the author's irony, built on the principle of contrast (a profession that is "fine" in the hero's self-assessment, yet confers a low social status and hints at the possibility of punishment).

"Ivan Dmitrich Chervyakov" is not just a name: 1) "Ivan" reflects a national stereotype (V. G. Korolenko recalled Chekhov's words about the hero of the play "Ivanov": "Ivan Ivanovich Ivanov. Do you understand? Thousands of Ivanovs... an ordinary man, not a hero at all..." [7, 143]); 2) the colloquial form "Dmitrich" emphasizes the insignificance of his social status; 3) "Chervyakov" (from chervyak, "worm") predicts the hero's servility and deference in advance, even before the main action. The name becomes a property: it characterizes the hero as an insignificant, petty person.

However, and this is still the first sentence of the story, Chervyakov is sitting in the second row of chairs, right behind the general. The combination of entities (action, place, state) forms a contrast between the hero's social insignificance and his high self-esteem: the place acquires the value of a character property. He feels like a significant person. Recall that he is an "executor", that is, he has power over his subordinates, but at the same time he is "Chervyakov", dependent on his superiors: "in one aspect of his existence such a person always stands among those who tremble wordlessly, in the other among those who peremptorily sit in judgment" [8, 27-28]. "Executor Chervyakov" is thus a contextual oxymoron, and "the fine executor Chervyakov" is an oxymoron squared.

Specialized algorithms for named entity recognition (NER) for the Russian language (for example, the SlovNet library, https://github.com/natasha/slovnet) make it possible to mark the hero's surname, first name and patronymic in the texts automatically, but information about the markers of the national stereotype, social status and character of the hero must be entered additionally. For example, character names extracted from the text by NER algorithms can be marked up with the following tags: <person xml:id="Worms" subtype="personage"> <persName full_name="Ivan Dmitrievich Chervyakov"> <forename type="first"> Ivan </forename> <forename type="patronym"> Dmitrich </forename> <surname> Chervyakov </surname> </persName> </person>.
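The step from extracted name parts to a TEI description can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the helper `build_person` is hypothetical, the name parts are hard-coded where an NER step would normally supply them, and only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Expanded form of the xml:id attribute name as ElementTree stores it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_person(xml_id, forename, patronym, surname):
    """Return a <person> element for one character (hypothetical helper)."""
    person = ET.Element("person", {XML_ID: xml_id, "subtype": "personage"})
    pers_name = ET.SubElement(person, "persName")
    ET.SubElement(pers_name, "forename", {"type": "first"}).text = forename
    ET.SubElement(pers_name, "forename", {"type": "patronym"}).text = patronym
    ET.SubElement(pers_name, "surname").text = surname
    return person


# Name parts as an NER step might return them; hard-coded here for illustration.
person = build_person("Worms", "Ivan", "Dmitrich", "Chervyakov")
tei_fragment = ET.tostring(person, encoding="unicode")
print(tei_fragment)
```

Building the element through an XML library rather than by string concatenation guarantees that every opening tag is closed, which matters once such descriptions are generated for many characters.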

Marking up the character's name makes it possible to access the character's description from anywhere in the text through the ref="#Worms" attribute, which points to the element carrying xml:id="Worms". The description may include information external to the text: the character's socio-economic status is marked with the <socecStatus> tag; state or condition with the <state> tag; employment or profession with <occupation>; events with <event>; and the <trait> tag can specify a property that records both the hero's character (for example, "insignificant") and the stereotype attached to his name. Moreover, these tags with the appropriate attributes can be used both in the description of the hero, marking his universal characteristics (<trait>) throughout the text, and in a specific situation or event described in the text (<state>), which significantly reduces the complexity of automatic text processing. A contextual oxymoron can also be marked in a universal way, for example with the tag used to segment the text and classify elements: <seg type="oxymoron"> fine executor, <person xml:id="Worms"> Ivan Dmitrich Chervyakov </person></seg>. Text markup is always a form of interpretation, so we believe that it is very important to rely on philological expertise in the digital publication of literary texts.
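How a reference of the form ref="#Worms" leads back to the character's description can be shown with a small sketch. The toy document below, including the `<TEI>`, `<listPerson>` and `<p>` wrappers and the function name `resolve`, is an assumption made for illustration; it is not taken from the project's files.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Toy document: one character description and one in-text reference to it.
SAMPLE = """
<TEI>
  <listPerson>
    <person xml:id="Worms" subtype="personage">
      <persName>Ivan Dmitrich Chervyakov</persName>
      <occupation>executor</occupation>
      <trait type="character" key="insignificant"/>
    </person>
  </listPerson>
  <p><trait ref="#Worms" type="character" key="ambitious">was sitting in the second row</trait></p>
</TEI>
"""


def resolve(root, ref):
    """Return the element whose xml:id matches a '#id' reference, or None."""
    target = ref.lstrip("#")
    for element in root.iter():
        if element.get(XML_ID) == target:
            return element
    return None


root = ET.fromstring(SAMPLE)
in_text = root.find(".//p/trait")            # the annotation inside the running text
person = resolve(root, in_text.get("ref"))   # the description it points to
occupation = person.findtext("occupation")
print(person.findtext("persName"), "|", occupation)
```

The same lookup works in the other direction: starting from a description, one can collect every in-text element whose `ref` equals `"#" + xml:id`.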

But the first sentence is not over yet. Chervyakov "was looking through binoculars at 'The Bells of Corneville'". Usually a play is "watched" and an opera is "listened to"; here, however, the verb "looked" combined with the preposition "at" (Russian na, literally "on") and the title of the comic opera, instead of the words "play", "performance", etc., organizes the space from top to bottom. That is, Chervyakov, who is "at the height of bliss", seems to look down on "The Bells of Corneville" from above and is metaphorically located over them, which emphasizes not so much his enjoyment of the performance, as it may seem, as his ambition. So a simple action, "looked at" (entity-action), in conjunction with the title of the comic opera becomes a characterological property of the hero, ambition (entity-property), which can be marked with the tag <trait type="character" key="ambitious">.

The choice of the work that Chervyakov "looked at" is also noteworthy. The plot of R. Planquette's comic opera is connected with the theme of social elevation: in the finale the heroine turns out to be the daughter of the former owner of Corneville Castle. Her adoptive father is a rich man and a miser. In this way Chekhov hints at the secret dreams of his hero. And for the researcher, this is another argument that in a literary work, especially one by Chekhov, nothing is accidental and every detail is functional.

Thus, the first sentences can be marked up as follows using the tags described above: "One fine evening, a no less <seg type="oxymoron"> fine executor, <trait type="character" key="insignificant"> <person xml:id="Worms"> Ivan Dmitrich Chervyakov </person> </trait>, </seg> <trait ref="#Worms" type="character" key="ambitious"> was sitting in the second row of chairs and looking through binoculars at <name xml:id="Q959242"> "The Bells of Corneville" </name>. He looked and felt at the height of bliss </trait>."

This markup uses an identifier (ID) from the Wikidata database, in which Q959242 is the ID of the entry for Robert Planquette's comic opera "Les cloches de Corneville" (URL: https://www.wikidata.org/wiki/Q959242). Such a formalized approach makes it possible to describe the hero's main characteristics, his insignificance and at the same time his ambition, at the very beginning of the text; these characteristics predetermine the dramatic finale.
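Once a sentence carries such markup, the annotated entities can be queried mechanically. The sketch below is only an illustration of that idea: the enclosing `<s>` element and the variable names are assumptions, and the Wikidata URL is formed by simple string concatenation from the ID stored in xml:id.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The marked-up opening of the story, wrapped in a hypothetical <s> element.
SENTENCE = (
    '<s>One fine evening, a no less <seg type="oxymoron">fine executor, '
    '<trait type="character" key="insignificant">'
    '<person xml:id="Worms">Ivan Dmitrich Chervyakov</person></trait>,</seg> '
    '<trait ref="#Worms" type="character" key="ambitious">was sitting in the '
    'second row of chairs and looking through binoculars at '
    '<name xml:id="Q959242">The Bells of Corneville</name>. '
    'He looked and felt at the height of bliss</trait>.</s>'
)

root = ET.fromstring(SENTENCE)

# Character traits in document order.
trait_keys = [t.get("key") for t in root.iter("trait")]
# Rhetorical devices recorded on <seg> elements.
devices = [s.get("type") for s in root.iter("seg")]
# External links built from Wikidata IDs stored in xml:id.
wikidata_urls = [
    "https://www.wikidata.org/wiki/" + n.get(XML_ID) for n in root.iter("name")
]

print(trait_keys, devices, wikidata_urls)
```

A semantic search over the corpus could be built on the same kind of query, for example by returning every passage whose `<trait>` has key="ambitious".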

The next sentence begins with the adversative conjunction "but". The question arises whether function words should be marked up. The case under analysis indicates that they should. The first paragraph of the story is built on the principle of antithesis: the absence of an event versus an event, pleasure and self-admiration versus a fall for which the preceding text prepares the reader. The "higher" the rise, the "lower" the fall, from "executor" to "Chervyakov". This is another example of how a place-entity turns into a property-entity. "In stories, this "but suddenly" is often found. The authors are right: life is so full of surprises!" These two sentences reinforce the antithesis, emphasizing the surprise and spontaneity of the event. But they also have other artistic functions: 1) they mark the genre (a story is a narrative about an event, usually an unusual, "sudden" one that goes beyond everyday life); 2) by appealing to numerous authors of stories, they point to a genre cliché; 3) they introduce the figure of the author-narrator into the story and signal a change of point of view, thanks to which Chervyakov now turns from subject into object. It is no coincidence that Chekhov chooses sneezing, an action that is almost impossible to control, as the starting point of the plot. The actor is not the hero himself but his individual parts: "his face wrinkled", "his eyes rolled up", "his breathing stopped" (a thoroughly Gogolian technique, cf. N. V. Gogol's story "The Nose"). Imperfective verbs are replaced by perfective ones, which conveys how quickly the states change. The series ends with the onomatopoeia "apchi!!!", which has no person, gender, etc. at all: the action (entity), which is also the event (entity), takes place by itself, without the participation of the actor.
The hero, for his part, performs actions that destroy the image created by the first part of the antithesis: he takes the binoculars away from his eyes and bends down, thus moving (entity-action) from top to bottom, which at the same time is a literal realization of the metaphor of "looking down from above" (entity-character trait).

Automatic morphosyntactic markup of the text yields tokens (word forms), lemmas (dictionary forms), part-of-speech tags and grammatical features for each token, as well as a tree of syntactic relations between heads and dependents (for the complete list of UD v2 relations and their definitions, see https://universaldependencies.org/u/dep/index.html). The markup was produced with the automatic morphological and syntactic analyzer UDPipe 2 [9]. Our study used a UDPipe model trained on UD-SynTagRus 2.6, the dependency treebank for the Russian language (SynTagRus, [10]). This corpus is currently a standard Russian-language data set for training modern neural network parsers (UDPipe, Stanford NLP, Turku NLP, DeepPavlov, etc.). Some features of syntactic constructions can also be used for semantic markup. For example, in the sentences "But suddenly..." and "In stories, this "but suddenly" is often found", the construction "but suddenly" is tagged as "ADV (root/conj) + CCONJ (cc)", performing the function of coordination, and "but" is always attached as a dependent of "suddenly" [11]. In the sentence "But suddenly his face wrinkled, his eyes rolled up, his breathing stopped... he took the binoculars away from his eyes, bent down and... apchi!!!", this construction breaks apart and each word becomes a dependent of the verb "wrinkled", forming the following relations: "VERB (root) + CCONJ (cc)" is a coordinating construction (wrinkled + but), and "VERB (root) + ADV (advmod)" is a construction that modifies the meaning of the verb (wrinkled + suddenly). Such changes in the use of constructions can be regarded as markers of a change in semantics; however, their use in an automatic markup algorithm requires further study.
In turn, the automatically obtained part-of-speech tags and grammatical features make it possible to mark the pattern of changing states in which imperfective verbs are replaced by perfective ones. But the artistic functions and the metaphor discussed above cannot be marked up automatically and therefore must be marked up by an expert, for example: <seg type="metaphor"> he took the binoculars away from his eyes, bent down and </seg>. The markup can be supplemented with the specific meaning of the metaphor, but even in this general form it will facilitate the search for metaphors in the text. Onomatopoeia can be marked in the text as follows: <vocal who="#Worms"> apchi!!! </vocal>, which makes it possible to search for such elements.
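Both observations, the attachment of "but" and "suddenly" and the switch from imperfective to perfective verbs, can be checked on parser output with a few lines of code. The sketch below rests on loud assumptions: the rows are hand-written, simplified fragments in a reduced column set (real CoNLL-U files produced by UDPipe are tab-separated and have ten columns), they are not actual UDPipe output, and the function names are hypothetical.

```python
# Reduced CoNLL-U-style rows: ID FORM LEMMA UPOS FEATS HEAD DEPREL.
# Hand-written for illustration; NOT actual UDPipe output.
SENT_1 = [
    "1 Червяков Червяков PROPN Case=Nom 2 nsubj",
    "2 сидел сидеть VERB Aspect=Imp|Tense=Past 0 root",
    "3 и и CCONJ _ 4 cc",
    "4 глядел глядеть VERB Aspect=Imp|Tense=Past 2 conj",
]
SENT_2 = [
    "1 Но но CCONJ _ 4 cc",
    "2 вдруг вдруг ADV Degree=Pos 4 advmod",
    "3 лицо лицо NOUN Case=Nom 4 nsubj",
    "4 поморщилось поморщиться VERB Aspect=Perf|Tense=Past 0 root",
    "5 глаза глаз NOUN Case=Nom 6 nsubj",
    "6 подкатились подкатиться VERB Aspect=Perf|Tense=Past 4 conj",
]


def parse_feats(feats):
    """Turn 'Aspect=Perf|Tense=Past' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_sentence(rows):
    """Parse the reduced rows into a list of token dictionaries."""
    tokens = []
    for row in rows:
        tid, form, lemma, upos, feats, head, deprel = row.split()
        tokens.append({"id": int(tid), "form": form, "lemma": lemma, "upos": upos,
                       "feats": parse_feats(feats), "head": int(head),
                       "deprel": deprel})
    return tokens


def head_of(tokens, form):
    """Return the word form that the given word depends on (None for the root)."""
    by_id = {t["id"]: t for t in tokens}
    for t in tokens:
        if t["form"].lower() == form:
            return by_id[t["head"]]["form"] if t["head"] else None
    return None


def first_shift(aspects):
    """Index of the first perfective verb that follows an imperfective one."""
    for i in range(1, len(aspects)):
        if aspects[i - 1] == "Imp" and aspects[i] == "Perf":
            return i
    return None


sentences = [parse_sentence(SENT_1), parse_sentence(SENT_2)]
aspects = [t["feats"]["Aspect"] for s in sentences for t in s
           if t["upos"] == "VERB" and "Aspect" in t["feats"]]
shift = first_shift(aspects)
but_head = head_of(sentences[1], "но")
suddenly_head = head_of(sentences[1], "вдруг")
print(aspects, shift, but_head, suddenly_head)
```

A position reported by `first_shift` could then be offered to the expert as a candidate for a `<state>` or `<event>` annotation; whether it deserves one remains a philological decision.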

Then the narrator enters the narrative again; it is to him that the comment about the spontaneity and randomness of sneezing belongs: "Sneezed, as you can see. Sneezing is not forbidden to anyone anywhere. Men sneeze, and police officers, and sometimes even privy councillors. Everyone sneezes." The verb "is not forbidden", with a negative particle in an impersonal sentence where, as A. A. Potebnya believed [12], the actor is understood to be an impersonal force or element, stands in contrast to the "executor", who "scolds" and "forbids". Sneezing is a matter of physiology alone, not of social status, which Chekhov emphasizes by listing social roles and using the intensifying particle "even". On careful reading, however, the social theme breaks through in the fragment "and sometimes even privy councillors": the Russian rank is literally "secret councillor", and combining "sneezing" with "secret" undermines the idea that sneezing is open to all, since "secret" councillors sneeze in secret. The irony is underlined by the adverb "sometimes" (entity-time) and the intensifying particle "even" (entity-degree of action). This technique resembles embroidery in stem stitch: each new stitch begins from the middle of the previous one. So it is with Chekhov: one theme begins "inside" another, and each new "stitch" captures both themes.

This can be seen in the following example: "Chervyakov was not at all embarrassed, wiped himself with a handkerchief and, like a polite person, looked around him: had he bothered anyone with his sneezing?" "Not at all embarrassed" is undercut by the verb "wiped himself", since in Russian it means not only "wiped himself dry" but also "swallowed something unpleasant or offensive"; cf. the idiom "to wipe someone's nose", that is, "to show one's superiority". Figuratively speaking, physiology has shown its superiority over the hero's sense of self, his inner "height", and he is forced metaphorically to "wipe himself off". So Chervyakov was embarrassed after all, which is why he checks whether his sneezing has disturbed anyone. The characteristic "polite person" sounds ironic here. And the next sentence, "But here he did have to be embarrassed", follows from the previous ones logically, "in stem stitch", helped by the repetition of the verb "to be embarrassed" (action-state), now in a direct rather than a negated sense; that is, Chervyakov has fallen from the pedestal on which he had placed himself.

With the TEI markup language we can mark both onomatopoeic elements and the social status of the sneezers, which will facilitate the search for such elements: "<vocal who="#Worms"> Sneezed </vocal>, as you can see. <vocal> Sneezing </vocal> is not forbidden to anyone anywhere. <socecStatus type="man"> Men </socecStatus> <vocal> sneeze </vocal>, and <socecStatus type="police chief"> police officers </socecStatus>, and sometimes even <socecStatus type="privy councilor"> privy councillors </socecStatus>. Everyone <vocal> sneezes </vocal>."

At the same time, the question of whether all elements of the philological analysis need to be preserved in the markup remains open. For example, the metaphor "wiped himself with a handkerchief" should be marked up for search with the <seg type="metaphor"> tag, but is it worth marking up the ironic characterization of the character as a "polite person" by introducing additional tags, especially since the boundaries of the context within which automatic search would recognize the irony are ambiguous?

The verb "saw" in the following sentence correlates with the initial "looked at" ("The Bells of Corneville"). But now it is the simple action of a confused person. The point of view has shifted back to the hero. He realizes that in front of him, in the first row (an echo of Chervyakov's second row), there is a "little old man" (the diminutive starichok rather than the neutral starik, "old man": this is how he looks to the confused Chervyakov) who is nevertheless a general, and that he does not "wipe himself" but "wipes" his bald head and neck and mutters something. Chervyakov takes this personally, that is, he again metaphorically "wipes himself off" and is "embarrassed" even more. The general's surname is also a speaking one: "it is associated with the verbs "splash", "disdain" or "grumble", "bother"" [13, 36]. That is, the general is not as harmless as he may seem [see 14]. It can be assumed that the name of the department in which the general serves, "Ways of Communication", was not chosen by Chekhov by chance: in addition to the direct meaning, a figurative one can be seen in it, since ways of communication (transport routes) are also ways of communicating. It is precisely this communication that will not take place in the story.

This kind of analysis has shown that attention must be paid to verb forms and to non-standard noun forms (the diminutive "little old man"), whose characteristics can be extracted from the text with a morphosyntactic parser (see above) and marked up automatically. The general's surname can be extracted from the text with NER algorithms and marked up together with his social status, "civil general": <person xml:id="Brizzhalov" subtype="personage"> <persName> Brizzhalov </persName> <socecStatus type="civil general"> civil general </socecStatus> </person>. In this case, the description of the character can be accessed from the text through the ref="#Brizzhalov" attribute, which points to xml:id="Brizzhalov".

 

Thus, the first paragraph is in the full sense the beginning of the story: it outlines the main storyline and the characteristics of the characters and foreshadows the dramatic finale. This fragment is permeated with internal connections and echoes, similar to those that V. B. Kataev called "dramatic rhymes", which form a "single resonating space" [15, 3-4]. Sneezing in the story turns out to be not only a physiological but also a social act. A "close reading" of even one fragment reveals the writer's artistic "technology", including the nature of the laconism inherent in his stories.

The complexity of semantic markup is determined by many aspects, including the author's "technology": the presence of figurative meanings in words and constructions; the impossibility of automatically recognizing irony that has no verbal means of expression; the "layering" of artistic techniques, when irony is supplemented by metaphor, contextual oxymoron, etc.; the "stem stitch" technique, when one theme begins "inside" another and each new "stitch" captures both themes; and so on. In addition, the processing "technology" itself has difficulties associated with the automatic identification of entities in the text, the limitations of formal methods for marking them up and of the markup language itself, and the need to involve experts at different stages of markup, which means that researchers must have specialized knowledge in the field of digital philology.

Text markup is always a form of interpretation, so we believe that in the digital publication of literary texts it is very important to rely on philological expertise, which helps to identify the appropriate categories for markup. For example, our research has shown that it is necessary to classify and mark up such artistic devices as metaphor, antithesis and oxymoron, and to identify and mark up Chekhov's "speaking" surnames with their properties and characteristics, the figurative meanings of Chekhov's text, ironic characterizations of people and situations, and much more.

Thus, the development of expert markup, i.e. the preservation of philological knowledge in machine-readable form, poses a number of tasks for researchers. For example, implementing extended search in semantic editions involves marking up a wider range of significant elements that will be of interest to philologists, such as color, smell and taste; natural phenomena; space, etc. But the question of how these entities are represented in the text remains quite complex and requires additional research. In addition, there is a danger that a predetermined interpretation will be fixed in the texts in machine-readable form; at the same time, such an interpretation can become the basis of a new type of commentary that combines texts and meanings, complicating and enriching the understanding of the text and making it possible to place it in a broader historical, cultural or biographical context. Using expert markup to train machine learning algorithms for marking up other texts and for creating text-related applications in various fields of knowledge seems a very promising task. And this list is far from exhaustive. Each of these issues should be treated as a problem that requires its own understanding and solution.
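One way expert markup might be prepared for machine learning is to convert inline tags into plain text plus labelled character spans, a form that many sequence-labelling tools accept. The following is a minimal sketch under stated assumptions: the set `LABELLED_TAGS`, the function `to_spans` and the `<p>` wrapper are hypothetical, and the choice to label a span by its type attribute (falling back to the tag name) is one possible convention among many.

```python
import xml.etree.ElementTree as ET

LABELLED_TAGS = {"seg", "vocal"}  # assumption: tags treated as training labels


def to_spans(element):
    """Flatten an element to plain text plus (start, end, label) spans."""
    parts, spans = [], []
    pos = 0

    def walk(node):
        nonlocal pos
        start = pos
        if node.text:
            parts.append(node.text)
            pos += len(node.text)
        for child in node:
            walk(child)
            if child.tail:  # text that follows the child inside this node
                parts.append(child.tail)
                pos += len(child.tail)
        if node.tag in LABELLED_TAGS:
            spans.append((start, pos, node.get("type", node.tag)))

    walk(element)
    return "".join(parts), spans


SAMPLE = (
    '<p>But suddenly <seg type="metaphor">he took the binoculars away from his '
    'eyes, bent down and</seg> <vocal who="#Worms">apchi!!!</vocal></p>'
)
text, spans = to_spans(ET.fromstring(SAMPLE))
labelled = [(text[start:end], label) for start, end, label in spans]
print(text)
print(labelled)
```

Because the offsets refer to the plain text, the same spans can be compared with the output of an automatic tagger to measure how far it agrees with the expert annotation.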

Our task is to raise the question of the complexity of implementing such interdisciplinary projects and the need to combine the efforts of specialists from different fields of knowledge.

References
1. Chekhov, A.P. (1974-1983). Polnoe sobranie sochinenij i pisem: V 30 t. [Complete works and letters: In 30 volumes]. Academy of Sciences of the USSR. Institute of World Literature named by A. M. Gorky. Moscow: Nauka.
2. TEI Consortium, eds. TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 4.6.0. Last updated on 4th April 2023. TEI Consortium. Retrieved from http://www.tei-c.org/Guidelines/P5/
3. Severina, E.M., Bonch-Osmolovskaya, A.A., & Kudin, A.M. (2022). Digital Philological Practices: the Chekhov Digital Project. Current Issues in Philology and Pedagogical Linguistics, 2, 153-165. Retrieved from https://doi.org/10.29025/2079-6021-2022-2-153-165
4. Severina, E.M., & Larionova, M.Ch. (2020). New philological practices: Digital Edition by A. P. Chekhov. Philology: Scientific Research, 10, 13-21. doi:10.7256/2454-0749.2020.10.33970
5. Chekhov, A. P. (1975). Smert' chinovnika [Death of a Government Clerk]. Polnoe sobranie sochinenij i pisem: V 30 t. [Complete works and letters: In 30 volumes]. Sochineniya: V 18 t. [Works: In 18 volumes] V. 2, 164-166. Moscow: Nauka.
6. Dal, V.I. (1989). Explanatory dictionary of the living Great Russian language (Vol. 4). Moscow: Russkiy yazyk.
7. Korolenko, V.G. (1960). Anton Pavlovich Chekhov. In Chekhov in the memories of contemporaries (pp. 135-148). Moscow: Goslitizdat.
8. Berdnikov, G.P. (1984). A.P. Chekhov: Ideological and creative quests. Moscow: Khudozhestvennaya literatura.
9. Straka, M., Straková, J., & Hajič, J. (2019). UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging. In Proceedings of the 16th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 95-103. Association for Computational Linguistics, Stroudsburg, PA, USA.
10. Dyachenko, P.V., Iomdin, L.L., Lazursky, A.V., Mityushin, L.G., Podlesskaya, O.Yu., Sizov, V.G., Frolova, T.I., & Tsynman, L.L. (2015). A deeply annotated Corpus of Russian Texts (Syntagrus): contemporary state of affairs. Trudy Instituta russkogo yazyka im. V.V. Vinogradova, 6, 272-300.
11. de Marneffe, M.-C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J., & Manning, C. D. (2014). Universal Stanford dependencies: A cross-linguistic typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 4585-4592. Reykjavik, Iceland. European Language Resources Association (ELRA). Retrieved from https://nlp.stanford.edu/pubs/USD_LREC14_paper_camera_ready.pdf
12. Potebnya, A. A. (1968). Iz zapisey po russkoy grammatike [From Notes on Russian Grammar: In 4 Volumes. Volume 3]. Moscow: Prosveshchenie.
13. Bolotova, E.A. (2019). Anthropological linguistic mosaic «Speaking Names». The Scientific Heritage, 4, 35-39.
14. Larionova, M. Ch., & Shepeleva, O. A. (2019). What killed Chervyakov? Traditional Culture in A.P. Chekhov’s Story «Death of a Government Clerk». Proceedings of Southern Federal University. Philology, 1, 36-41. Retrieved from https://doi.org/10.23683/1995-0640-2019-1-36-41
15. Kataev, V.B. (2008). "Steppe": Dramaturgy of Prose. Taganrog Bulletin, Materials of the International Scientific and Practical Conference «"Steppe" by A.P. Chekhov: 120 Years», 3, 3-9. Taganrog: LLC "Publishing House Lukomorie".

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

Digital text processing is currently a productive task that directs researchers to pay closer attention to language. The language system is thereby assessed in terms of both form and meaning, which makes it possible to describe in more detail how a natural sign system functions. The reviewed article is an analytical overview of the Chekhov Digital project, a semantic edition of the Complete Works and Letters of A. P. Chekhov in 30 volumes (hereinafter PSSiP), developed in the format of the Text Encoding Initiative (TEI/XML) digital publication standard. As noted at the beginning of the work, "the project is being developed by the Center for Digital Humanities Research of the Institute of Philology, Journalism and Intercultural Communication of the Southern Federal University together with the International Laboratory of Language Convergence of the Higher School of Economics and the Department of Humanities Research of the Southern Scientific Center of the Russian Academy of Sciences (SSC RAS)." The author gives a fairly detailed description of how the system works, describes the nuances, and comments on the subtleties: "in the Text Encoding Initiative system, there is an extensive list of tags for marking up a wide variety of information, including information external to the text, from the specifics of presenting information in different types of text (play, short story, novel or letter) to proper names and such social categories as social status, professional affiliation, etc. This kind of information creates difficulties for automatic text processing technologies, since it can be presented in different ways, but describing it explicitly with TEI markup reduces the complexity of computational processing. At the same time, any markup is a form of text interpretation, so the digital format is itself an interpretation of Chekhov's texts. 
The texts marked up in this way are publicly available on the project's website (http://chekhov-digital.sfedu.ru/), which additionally presents some digital research tools, such as semantic search, visualization tools, etc." The main process is then illustrated with the digital processing of A. P. Chekhov's story "Death of an Official". Beyond the purely technical details, the mechanism itself is also analyzed. The author tries to reveal the essence of the approach in full: for example, "already in this example it is clear that standard morphological markup will show only the specifics of language forms; therefore, a completely different approach is required for marking up semantics, one that takes into account the corresponding markers of properties and characteristics of the situation (typicality); properties of subjective perception, namely time and profession / social status; and the author's irony, built on the principle of contrast: the profession is a fine one in the hero's own estimation, but it confers a low social status and hints at the possibility of punishment," or "the use of specialized named entity recognition (NER) algorithms for the Russian language (for example, the SlovNet library, https://github.com/natasha/slovnet) makes it possible to automatically mark up the hero's surname, first name and patronymic, but information about the markers of the national stereotype, social status and character of the hero must be added separately. For example, using NER algorithms, character names are extracted from the text, for which the following tags can be used: Ivan Dmitrich Chervyakov," etc. The main task has been solved in the course of the work, and the goal has thus been achieved. In my opinion, it would be worthwhile to present a full analysis of the story: it is short, and a complete analysis would be both interesting and more thorough.
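The name markup mentioned in the passage quoted above can be illustrated with a minimal sketch. It assumes the parts of the character's name (Ivan Dmitrich Chervyakov) have already been extracted; they are hard-coded here rather than produced by SlovNet or any other NER tool. The sketch wraps them in the TEI elements persName, forename and surname. The helper build_pers_name and the ref value "#chervyakov" are hypothetical and are not taken from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical NER output for one mention of the character.
# In a real pipeline these parts would come from an NER tool; here they are hard-coded.
ner_parts = [
    ("forename", "Ivan"),
    ("patronym", "Dmitrich"),
    ("surname", "Chervyakov"),
]


def build_pers_name(parts, ref):
    """Wrap name parts in a TEI-style <persName> element (illustrative helper)."""
    pers = ET.Element("persName", {"ref": ref})
    for index, (kind, text) in enumerate(parts):
        if kind == "surname":
            child = ET.SubElement(pers, "surname")
        else:
            # Patronymics are encoded as a typed forename in this sketch.
            child = ET.SubElement(pers, "forename")
            if kind == "patronym":
                child.set("type", "patronym")
        child.text = text
        # Keep a single space between name parts, but not after the last one.
        if index < len(parts) - 1:
            child.tail = " "
    return pers


element = build_pers_name(ner_parts, "#chervyakov")
tei_fragment = ET.tostring(element, encoding="unicode")
print(tei_fragment)
```

A fragment of this kind records only what an NER step can supply. Information about national stereotype, social status or character would still have to be added by an expert, for example as further attributes or separate elements.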
Text markup, as the author notes, is a form of interpretation; one can partially agree with this and should take it into account. The material can serve as an example of an analytical procedure for evaluating text markup and the digital processing of an utterance. The style of the work is properly scholarly, and terms and concepts are used consistently. The work has an internal logic and an open progression of thought. For example, "the complexity of semantic markup is determined by many aspects, including the author's "technology", which includes the presence of figurative meanings in words and constructions; the impossibility of automatically recognizing irony that has no verbal means of expression; the "complexity" of artistic techniques, when irony is complemented by metaphor, contextual oxymoron, etc.; and the "stem stitch" technique, when one topic begins "inside" another and each new "stitch" captures both topics; etc. In addition, the "technology" of processing itself involves difficulties associated with the automatic identification of entities in the text and with the limitations of the formal techniques for marking them up and of the markup language itself, and it is also necessary to involve experts at different stages of markup, which requires researchers to have specialized knowledge in the field of digital philology." Formally, the results have been summed up, but the author stipulates that the problem requires further work. This, in my opinion, is a very good outcome: "the development of expert markup, i.e. the preservation of philological knowledge in machine-readable form, poses a number of tasks for researchers. For example, implementing an extended search across semantic editions involves marking up a wider range of significant elements, such as color / smell / taste; natural phenomena; space, etc., which will be of interest to philological researchers. 
But the question of the representation of these entities in the text remains quite complicated and requires additional research." The list of sources is sufficient, and citations and references comply with the journal's requirements. In view of the above, there is no reason to withhold the text from publication; I recommend the article "The Chekhov Digital Project: tasks and problems of implementing semantic markup of texts (on the example of A. P. Chekhov's story "Death of an Official")" for open publication in the journal "Litera".