Software systems and computational methods - rubric Systems analysis , search, analysis and information filtering

Abstract: The article is devoted to development of the universal corporate information system (IS) of maintenance of stages of the procedure "Advanced Product Quality Planning" (APQP), which would allow to enhance a production process of autocomponents on domestic enterprises and to increase quality of products. The feature of the offered IS is application of functional formulas of the description of the issued autocomponents instead of a product name that will allow to distinguish unambiguously details taking into account their functional and constructive and technological signs. The object of probe is process of information maintenance of stages of planning and design of automobile components for trucks. The subject of research are information communications between processes of the procedure APQP. Research methods: system approach, methodology of the description of business-processes DFD, methods of the system analysis, functional systematization, theory of algorithms. Scientific novelty of a research consists in application of methodology of functional systematization when developing an information system of support of the procedure APQP. Application of functional formulas of the description of details instead of names and codes allows to designate unambiguously the details necessary for the consumer projected, made and used in operation taking into account their key parameters. Thus, in case of receipt of claims, there is possible a fast information search about a detail and the subsequent effective work on improvement of quality of products.

Keywords: functional systematization, Advance planning of quality, Automobile component, Supplier, Maintenance of production processes, Information system, DFD-chart, DBMS, Computer program, taxon

DOI: 10.7256/2454-0714.2018.1.25297

Abstract: In this article the task of clustering operons (special units of genetic information) is solved. The authors describe its use for the identification of groups of operons with similar functions. The specifics of the open bases of operons used as sources of initial data for the study are considered. The authors describe the selection and preparation of data for clustering, the features of the clustering process, and its relationship with the approaches traditionally used for the analysis of natural languages. Based on the clustering performed, the quality and composition of the obtained groups is analyzed. To convert the raw data into vectors, the classical implementation of the word2vec algorithm and a number of features of the original data are used. The resulting representation is clustered by the DBScan algorithm based on the cosine distance. The novelty of the proposed method is associated with the use of non-standard algorithms for the initial data. The approach used effectively manifests itself when working with a large amount of data, does not require additional data markup and independently forms factors for clustering. The obtained results show the possibility of using the proposed approach for the implementation of services that allow comparative analysis of bacterial genomes.

Keywords: clustering, DBScan, word embeddings, word2vec, machine learning, methods, algorithms, operons, natural language processing, open access databases

DOI: 10.7256/2454-0714.2023.4.68680

Abstract: The subject of the research in this article are the features and regularities of the functioning of systems based on ChatGPT technologies, the knowledge of which makes it possible to formulate appropriate modifications of the Turing test, as well as the features and regularities of the formulation and use of the Turing test for systems based on ChatGPT technologies. The purpose of the study is to identify the features and patterns of functioning of systems based on the technologies of ChatGPT, as well as the features and patterns of formulation and use of the Turing test for systems based on the technologies of Chat GPT. As research methods, the method of social experiment was used, when during the study of a system based on Chat GPT technologies, certain questions were asked, answers were received, the analysis of which allowed us to conclude about the features of the "thinking" of systems based on ChatGPT technologies. In the course of the study, the following was found. Unlike human thinking, which is based on certain facts, the "thinking" of systems based on ChatGPT technologies, in some cases is not based on facts that take place in reality, often the user is given deliberately false information about facts and circumstances that take place in reality. In contrast to human thinking, which is usually systemic in nature, the "thinking" of systems based on ChatGPT technologies is disorderly and fragmentary. Systems based on ChatGPT technologies cannot admit their mistakes, and attempts to force systems based on ChatGPT technologies to critically comprehend their answers lead to a malfunction of these systems. The article also provides a Turing test developed by the author for ChatGPT, which made it possible to identify the features of the "thinking" of systems based on ChatGPT technologies.

Keywords: patterns, reflection, critical analysis, Chat GPT, artificial intelligence, technology, systems thinking, human thinking, Turing Test, computer science

DOI: 10.7256/2454-0714.2024.1.69900

Abstract: The study is dedicated to analyzing methods and tools for optimizing the performance of high-load systems using cloud, fog, and edge technologies. The focus is on understanding the concept of high-load systems, identifying the main reasons for increased load on such systems, and studying the dependency of the load on the system's scalability, number of users, and volume of processed data. The introduction of these technologies implies the creation of a multi-level topological structure that facilitates the efficient operation of distributed corporate systems and computing networks. Modern approaches to load management are considered, the main factors affecting performance are investigated, and an optimization model is proposed that ensures a high level of system efficiency and resilience to peak loads while ensuring continuity and quality of service for end-users. The methodology is based on a comprehensive approach, including the analysis of existing problems and the proposal of innovative solutions for optimization, the application of architectural solutions based on IoT, cloud, fog, and edge computing to improve performance and reduce delays in high-load systems. The scientific novelty of this work lies in the development of a unique multi-level topological structure capable of integrating cloud, fog, and edge computing to optimize high-load systems. This structure allows for improved performance, reduced delays, and effective system scaling while addressing the challenges of managing large data volumes and servicing multiple requests simultaneously. The conclusions of the study highlight the significant potential of IoT technology in improving production processes, demonstrating how the integration of modern technological solutions can contribute to increased productivity, product quality, and risk management.

Keywords: Technology integration, Internet of Things, Scalability, Performance optimization, Edge computing, Fog computing, Cloud computing, High-load systems, Data management, Service continuity

DOI: 10.7256/2454-0714.2020.1.31728

Abstract: The paper presents the results of evaluating the informative value of quantitative and binary signs to solve the problem of finding semantically close sentences (paraphrases). Three types of signs are considered in the article: those built on vector representations of words (according to the Word2Vec model), based on the extraction of numbers and structured information and reflecting the quantitative characteristics of the text. As indicators of information content, the percentage of paraphrases among examples with a characteristic, and the percentage of paraphrases with a attribute (for binary characteristics), as well as estimates using the accumulated frequency method (for quantitative indicators) are used. The assessment was conducted on the Russian paraphrase corps. The set of features considered in the work was tested as input for two machine learning models for defining semantically close sentences: reference vector machines (SVMs) and a recurrent neural network model. The first model accepts only the considered set of signs as input parameters, the second - the text in the form of sequences and the set of signs as an additional input. The quality of the models was 67.06% (F-measure) and 69.49% (accuracy) and 79.85% (F-measure) and 74.16% (accuracy), respectively. The result obtained in the work is comparable with the best results of the systems presented in 2017 at the competition for the definition of paraphrase for the Russian language (the second result for the F-measure, the third result for accuracy). The results proposed in the work can be used both in the implementation of search models for semantically close fragments of texts in natural language, and for the analysis of Russian-language paraphrases from the point of view of computer linguistics.

Keywords: accumulated frequencies, feature informativeness, support vector machine, neural network, paraphrase detection, text classification, semantic similarity, statistical evaluation, feature selection, machine learning

DOI: 10.7256/2454-0714.2023.4.68973

Abstract: The subject of this study are investment design management tools that allow evaluating the effectiveness of investment project options. The object of the study is digital products (software solutions) designed for automated efficiency assessment and selection of attractive projects for investment. The author has determined that the choice of the most profitable project for investment is the key task of the pre-investment stage of the investment process, while the presence of a large amount of information, the influence of external and internal weakly controlled factors, the state of uncertainty accompanying the investment process actualize the use of software products. Particular attention is paid to identifying and formalizing requirements for software products for risk analysis and evaluation of the effectiveness of investment projects, which will save time and financial resources and eliminate the influence of the human factor on the choice of a project for investment. The research methodology includes the use of a systematic approach to identifying tools and indicators for evaluating the effectiveness of investment decisions. The author conducted a comparative analysis of software products that act as tools for assessing the attractiveness of investment projects when choosing the most acceptable for the development of commercial activities of industrial enterprises. The results of the comparative analysis of domestic and foreign investment project management software solutions presented on the technology market made it possible to systematize programs, identify their strengths and weaknesses and formulate requirements for an optimal software package for analyzing and evaluating the effectiveness of investment projects of an industrial enterprise. The digital solution developed by the author for risk analysis and evaluation of the effectiveness of investment projects should have the following characteristics: functionality, reliability and stability, interface and usability, compatibility, price and licensing conditions, technical support. Based on the results of the study, the scope of application of the results of the comparative analysis of software products is determined – the further development of digital solutions for evaluating the effectiveness of investment projects, ensuring the effectiveness of the management process of investment design of enterprises.

Keywords: disadvantages of the software product, advantages of the software product, program characteristics, selection of investment projects, software product, efficiency evaluation methods, performance evaluation indicators, comparative analysis, system approach, Investment project

DOI: 10.7256/2454-0714.2020.3.33789

Abstract: The author considers a problem of automatic synthesis (induction) of the rules for transforming the natural language formulation of the problem into a semantic model of the problem. According to this model a program that solves this problem can be generated. The problem is considered in relation to the system of generation, recognition and transformation of programs PGEN ++. Based on the analysis of literary sources, a combined approach was chosen to solve this problem, within which the rules for transforming the natural language formulation into a semantic model of the problem are generated automatically, and the specifications of the generating classes and the rules for generating a program from the model are written manually by a specialist in a specific subject area. Within the framework of object-event models, for the first time, a mechanism for the automatic generation of recognizing scripts and related entities (CSV tables, XPath functions) was proposed. Generation is based on the analysis of the training sample, which includes sentences describing objects in the subject area, in combination with instances of such objects. The analysis is performed by searching for unique keywords and characteristic grammatical relationships, followed by the application of simple eliminative-inducing schemes. A mechanism for the automatic generation of rules for replenishing / completing the primary recognized models to full meaning ones is also proposed. Such generation is performed by analyzing the relations between the objects of the training sample, taking into account information from the specifications of the classes of the subject area. The proposed schemes have been tested on the subject area "Simple vector data processing", the successful transformation of natural language statements (both included in the training set and modified) into semantic models with the subsequent generation of programs solving the assigned tasks is shown.

Keywords: XPath-functions, domain-specific language, grammar parsing, training sample, eliminative induction, rules induction, code synthesis, program generation, natural-language processing, regular expressions

DOI: 10.7256/2454-0714.2017.1.20372

Abstract: In robotics complexes as in the living world, the main kind of sensation is vision. The first systems of technical (computer) vision used in the means of robotics copied the organs of vision of living organisms and have evolved in a following sequence: black and white monocular vision systems, color, stereoscopic and multi-angle systems with various options for hardware implementation. The article points out some aspects of the historical course of events in development of vision systems. The author analyses existing approaches to the implementation of vision systems in the tasks of robotics. The paper presents the formulation of the problem, describes the problems of designing vision systems and development of robotic complexes, shows modern ways of solving these problems. Methods of epistemology, epistemology and system analysis are applied. The issue of universal robotics in recent years is in the focus of research attention. There are many works devoted to this problem. However, the author decided to study this problem from different perspectives, applying not only philosophical and epistemological aspects, but also from the point of view of a systematic approach to the problem. A special contribution of the author in the study of the topic is the general systematization of the actual problems of introducing vision systems into robotic complexes and the definition of the main functions and purpose of such systems.

Keywords: appointment of vision systems, functions of vision systems, machine vision, vision systems, gnoseological aspects, robotic systems, system analysis, posing problems, control and processing, 3D

DOI: 10.7256/2454-0714.2020.3.33958

Abstract: The subject of the research is the process of collecting and preliminary preparation of data from heterogeneous sources. Economic information is heterogeneous and semi-structured or unstructured in nature. Due to the heterogeneity of the primary documents, as well as the human factor, the initial statistical data may contain a large amount of noise, as well as records, the automatic processing of which may be very difficult. This makes preprocessing dynamic input data an important precondition for discovering meaningful patterns and domain knowledge, and making the research topic relevant.Data preprocessing is a series of unique tasks that have led to the emergence of various algorithms and heuristic methods for solving preprocessing tasks such as merge and cleanup, identification of variablesIn this work, a preprocessing algorithm is formulated that allows you to bring together into a single database and structure information on time series from different sources. The key modification of the preprocessing method proposed by the authors is the technology of automated data integration.The technology proposed by the authors involves the combined use of methods for constructing a fuzzy time series and machine lexical comparison on the thesaurus network, as well as the use of a universal database built using the MIVAR concept.The preprocessing algorithm forms a single data model with the ability to transform the periodicity and semantics of the data set and integrate data that can come from various sources into a single information bank.

Keywords: time series, MIVAR, fuzzy time series, forecasting, clearing data, data unification, preprocessing, machine learning, data mining, knowledge base

DOI: 10.7256/2454-0714.2021.3.36564

Abstract: Monitoring and analysis of consumption of energy resources in various contexts, as well as measuring of parameters (indicators) in time are of utmost importance for the modern economy. This work is dedicated to examination and interpretation of the anomalies of collecting data on consumption of energy resources (on the example of gas consumption) in the municipal formation. Gas consumption is important for the socioeconomic sphere of cities. Unauthorized connections are the key reason for non-technological waste of the resource. The traditional methods of detection of stealing of gas are ineffective and time-consuming. The modern technologies of data analysis would allow detecting and interpreting the anomalies of consumption, as well as forming the lists for checking the objects for unauthorized connections. The author’s special contribution lies in application of the set of statistical methods aimed at processing and identification of anomalies in energy consumption of a municipal formation. It is worth noting that the use of such technologies requires the development of effective algorithms and implementation of automation and machine learning algorithms. The new perspective upon time-series data facilitates identification of anomalies, optimization of decision-making, etc. These processes can be automated. The presented methodology tested on time-series data that describes the consumption of gas can be used for a broader range of tasks. The research can be combined with the methods of knowledge discovery and deep learning algorithms.

Keywords: average, smoothing, municipality, gas consumption, energy consumption, search for anomalies, statistical analysis, unauthorized consumption, gas accounting, optimization

DOI: 10.7256/2454-0714.2020.4.34286

Abstract: In recent decades, the potential of analytics and data mining – the methodologies that extract valuable information from big data, transformed multiple fields of scientific research. Analytics has become a trend. With regards to education, these methodologies are called the learning analytics (LA) and educational data mining (EDM). Latterly, the use of learning analytics has proliferated due to four main factors: a significant increase in data quantity, improved data formats, achievements in the area of computer science, and higher complexity of available analytical tools. This article is dedicated to the description of building the model of decision support system (DSS) of a university based on educational data acquired from digital information and educational environment. The subject of this research is the development of DSS with application of learning analytics methods. The article provides a conceptual model of decision-making system in the educational process, as well as a conceptual model of the components of DSS component – forecasting subsystem. The peculiarity of forecasting subsystem model implies usage of learning analytics methods with regards to data sets of a higher educational institution, which contain the results of work of the digital information and educational environment, and include the characteristics of student activity. The main results of the conducted research is the examined and selected methods of clusterization and classification (KNN), the testing of which demonstrated palatable results. The author examined various methods of clusterization, among which k-prototypes method showed best results. The conclusion is made on favorable potential of application of the methods of learning analytics in Russian universities.

Keywords: risk group, forecasting methods, progress forecasting, electronic information and educational environment, educational analytics, classification, clustering, decision support, student progress, artificial intelligence

DOI: 10.7256/2454-0714.2017.4.24534

Abstract: The problem of the normalization of frequency-time correlation functions is considered and solved. The aim of the work is to create a methodology for calculating the coefficients for normalizing the time-frequency correlation functions and integrating them into the known computational algorithm. At the same time, the tasks were to ensure the possibility of normalizing each frequency component of the time-frequency correlation function independently and maintaining high performance of the original algorithm. The latter imposed restrictions on the application of the filtering operation in the time domain and the use of additional discrete Fourier transforms in the algorithm. To minimize the computational costs in calculating and rationing the time-frequency correlation functions, a technique was developed for calculating the normalizing coefficients from the samples of the complex signal spectrum. The main result of the work is a new algorithm for calculating the normalized time-frequency correlation function, which differs by an insignificant increase in computational complexity in comparison with the original algorithm. At the same time, the coefficients obtained can be used both for simultaneous normalization of all the frequency components of the time-frequency correlation function, which is necessary to ensure the independence of the result from the scale of the input signals, and for independent normalization of each of its frequency components. The latter is useful in solving problems of detecting weak correlated components in signal mixtures.

Keywords: computational scheme, correlator, normalization, spectral analysis, digital signal processing, correlation functions, time-frequency analysis, fast Fourier transform, signal detection, correlogram

DOI: 10.7256/2454-0714.2022.3.38841

Abstract: One of the main tasks in the management of technological processes is to reduce emergencies and failures of existing equipment. The statistical data obtained during the operation of machines and mechanisms require appropriate mathematical processing to analyze the dynamics of technological processes and establish relationships between deviations, influencing factors and failures. Regression and cluster analyses are convenient tools for processing these data. The failures of cavitation systems are an essential, and at the same time poorly illuminated topic in scientific periodicals. Cavitators are relatively common technical devices that allow maintaining the technological parameters of fuel oil in tank farms at the required level (viscosity, water content, adhesive properties). The practice of using cavitators on fuel oil farms of thermal power plants in the Kaliningrad region shows that these technical devices can fail relatively often. So, in case of disconnection or restriction of the supply of the required volumes of gas to the thermal power plant, reserves of fuel oil from the fuel park can be used. In turn, the failure of the cavitation system may lead to the impossibility of entering reserve fuel and, as a consequence, to the shutdown of power generation. Thus, the problem of ensuring energy security and the reliability of cavitation systems are closely interrelated. In this study, an array of accumulated statistical information on the parameters of the functioning of cavitators in fuel oil farms and the moments of failure is analyzed. Regression and cluster analyses were used to process the data array, which made it possible to determine the relationship between the types of failures and the influencing factors and to rank the weight of factors according to the degree of their impact on cavitation equipment. Based on the results of mathematical processing and data analysis, proposals have been developed to ensure greater technical reliability of cavitators, reorganize their maintenance system and reduce the number of failures.

Keywords: equipment reliability, equipment failures, analysis of statistical data, cavitation equipment, oil and gas equipment, k-clustering method, least squares method, cluster analysis, regression analysis, wear and tear

DOI: 10.7256/2454-0714.2017.4.24655

Abstract: The article explores the applicability of long short-term memory (LSTM) recurrent networks for the binary classification of text messages of the social network Twitter. A three-stage classification process has been designed, allowing a separate analysis of pictograms and verification of the text for neutrality. The accuracy of the classification of the emotional polarity of text messages using the LSTM network and vector representations of words was verified. The percentage of coincidences of vector representations of words with a training set of data is determined, which makes it possible to obtain an acceptable classification accuracy. The estimation of the learning speed of the LSTM network and the use of memory was carried out. To solve the task of classifying text messages, methods of processing natural language and machine learning using precedents are applied. The algorithmic base for processing text data from social networks, obtained as a result of the application of LSTM neural networks, has been optimized. The novelty of the proposed solution method is due to the implementation of pre-processing of messages, which allows to improve the accuracy of classification, and the use of the neural network configuration taking into account the specifics of text data of social networks.

Keywords: Twitter, word embeddings, social networks, LSTM networks, sentiment analysis, natural language processing, recurrent neural networks, text data preprocessing, reccurent network, binary classification

DOI: 10.7256/2454-0714.2015.1.14119

Abstract: The author studies the early stages of the software life cycle on the quality of which the quality of the result of software development essentially depends. An analysis of the experience of the practical application of the widely used waterfall (cascade), iterative and incremental models of software life cycle showed that they do not fully meet the needs of the practices. However, there is a possibility of the synthesis of a new model of the life cycle of software that combines all of these three models. The research methodology is based on the models of the software life cycle, structured systems analysis, software engineering, and information and logical modeling. The main conclusions of the study lies in the model of the software life cycle (for the stage of its development), presented in notation UML-diagrams, which consists of the stages of initiation of development, setting the increment, execution of the increment and completion of the development. Practical implementation of the developed model provides a reduction in the time required for software development and preparing the necessary project documentation.

Keywords: life cycle stages, structured systems analysis, software development, life cycle model, information and logical modeling, software engineering, program life cycle, software, software tools, system engineering

DOI: 10.7256/2454-0714.2015.1.66221

DOI: 10.7256/2454-0714.2024.1.69794

Abstract: The study addresses the crucial topic of designing and implementing smart systems in agricultural production, focusing on the development of a "Smart Greenhouse" utilizing neural networks. It thoroughly examines key technological innovations and their role in sustainable agriculture, emphasizing the collection, processing, and analysis of data to enhance plant growth conditions. The research highlights the efficiency of resource use, management of humidity, temperature, carbon dioxide levels, and lighting, as well as the automation of irrigation and fertilization. Special attention is given to developing adaptive algorithms for predicting optimal conditions that increase crop yield and quality while reducing environmental impact and costs. This opens new avenues for the sustainable development of the agricultural sector, promoting more efficient and environmentally friendly farming practices. Utilizing a literature review, comparative analysis of existing solutions, and neural network simulations for predicting optimal growing conditions, the study makes a significant contribution to applying artificial intelligence for greenhouse microclimate management. It explores the potential of AI in predicting and optimizing growing conditions, potentially leading to revolutionary changes in agriculture. The research identifies scientific innovations, including the development and testing of predictive algorithms that adapt to changing external conditions, maximizing productivity with minimal resource expenditure. The findings emphasize the importance of further studying and implementing smart systems in agriculture, highlighting their potential to increase yield and improve product quality while reducing environmental impact. In conclusion, the article assesses the prospects of neural networks in the agricultural sector and explores possible directions for the further development of "Smart Greenhouses".

Keywords: neural network, optimization algorithm, microclimate, hybrid neural network, deep learning, biotechnology, greenhouse, internet of things, Smart Greenhouse, Analysis of approaches

DOI: 10.7256/2454-0714.2018.4.28367

Abstract: With the proliferation of online forms design services for online surveys, the number of researchers using questionnaires in research practice has increased significantly. One of the problems that was present in the traditional form of the survey on paper and was transferred to online surveys is the problem of data reliability. Most researchers using online surveys have higher queries to automate research. They are not ready to make significant efforts to increase the reliability of the data. In this paper, we propose to consider an algorithm for automating the process of analyzing the reliability of questionnaire data. The proposed algorithm is based on the use of a sliding exam procedure for testing individual multidimensional observations obtained during an online survey. The main hypothesis underlying the developed method consists in the fact that the subordination of the questionnaire questions to some general topic leads to some latent connections between answers that are violated by random answers. A multi-dimensional statistical criterion was developed for testing personal data. The method is very simple to use and is available even for not sophisticated researchers.

Keywords: Internet service, nominal features, computer technology, sliding exam, criterion fquality of data, latent connections, multivariate statistical methods, online survey, data quality, questionnaire

DOI: 10.7256/2454-0714.2018.4.28301

Abstract: The subject of the research is methods and approaches for solving a class of key player search tasks (key player problem) applicable for identifying expert users in a certain subject area on social networks; a model for building social graphs from data selected from a social network; methods for constructing weighted oriented random graphs for model experiments and their comparative analysis; methods of cluster analysis of the ranking results of social network users; comparative analysis of various results of the identification of expert users in a given subject area. The research methods used in this work are based on system analysis methods, cluster analysis tools, graph theory, and social network analysis methods. To assess the performance of the proposed method, model experiments were carried out using a computer and experiments on real data. In the process of software implementation of the service, the methods of the theory of algorithms, the theory of data structures, and object-oriented programming were used to demonstrate the operability of the method. A method has been developed for identifying expert users on social networks in a given subject area, taking into account the quantitative data on the activity of these users. Unlike existing methods, users of a social graph can be ranked using two or more effective methods, which allows them to take advantage of these methods, and the method itself provides an opportunity to obtain additional information about users who are influenced by expert leaders, as well as potential hidden leaders public opinion.

Keywords: affinity propagation, Kendall-Wei ranking, Borgatti measure, opinion leader, social network, cluster analysis, directed graph, key players, social graph, user identification

DOI: 10.7256/2454-0714.2019.4.31797

Abstract: The aim of the study is to prepare for the analysis of poorly structured source data, their analysis, the study of the influence of data "pollution" on the results of regression analysis. The task of structuring data, preparing them for a qualitative analysis is a unique task for each specific set of source data and cannot be solved using a general algorithm, it will always have its own characteristics. The problems that may cause difficulties when working (analysis, processing, search) with poorly structured data are considered. Examples of poorly structured data and structured data that are used in the preparation of data for analysis are given. These algorithms for preparing weakly structured data for analysis are considered and described. The cleaning and analysis procedures on the data set were carried out. Four regression models were constructed and compared. As a result, the following conclusions were formulated: Exclusion from the analysis of various kinds of suspicious observations can drastically reduce the size of the population and lead to an unreasonable decrease in variation. At the same time, such an approach would be completely unacceptable if, as a result, important objects of observation are excluded from the analysis and the integrity of the population is violated. The quality of the constructed model may deteriorate in the presence of abnormal values, but may also improve due to them.

Keywords: statistics, big data, linear model, anomaly detection, regression analysis, data cleaning, semi-structured data, modelling, noise removal, econometrics

DOI: 10.7256/2454-0714.2014.2.11705

DOI: 10.7256/2454-0714.2014.2.65263

Abstract: the article presents a review of formal methods of text attribution. The problem of determining the authorship of texts is present in different field and is important for philologists, literary critics, historians, lawyers. In solving the problem of text attribution the main interest and the main complexity is in the analysis of syntactic, lexical/idiomatic and stylistic levels of text. In a sense, a narrower task is in the text sentiment-analysis (defining the tone of the text). Techniques for solving the task can be useful for identifying authorship of the text. Unfortunately, expert analysis of author’s style is complex and time consuming. It’s desirable to find new approaches, allowing at least partially automate experts’ work. Therefore the article pays special attention exactly to the formal methods of author’s identification and software implementation of such methods. Currently, algorithms of data compression, methods of mathematical statistics, probability theory, neural networks algorithms and cluster analysis algorithms are applied for text attribution. The article describes the most popular software systems for author’s style identification for Russian language. Author attempts to make a comparative analysis, identify features and drawbacks of the reviews approaches. Among the problems hindering researches in text attribution there are a problem of selecting linguostylistic parameters of the text and a problem of selecting sample texts. The author states that there is a need in further researches, aimed at finding new or improving existing methods of texts attribution, at finding new characteristics allowing to clearly separate author’s style, including cases of short texts and small number of sample texts.

Keywords: text attribution, defining authorship, formal text parameters, author’s style, text classification, machine learning, statistical analysis, computer linguistics, identification of author’s style, analysis of textual information

DOI: 10.7256/2454-0714.2015.2.15170

Abstract: The subject of the study is an automated functional state bioradiolokation control of pilot to ensure accountability of such an assessment in the process of operation of the aircraft in real time. The developed methods and algorithms of data processing for the first time enable the correct mathematical treatment of physiological signals that are not limited to the pilot motor activity in their registration by taking into account nonstationarity and intraspecific variability in processing signals. This, in turn, significantly increase the potential for non-contact monitoring of the status of the pilot. Methods used in the research: a systematic analysis, filtering radio signals, wavelet analysis, artificial neural networks, mathematical cybernetics, pattern recognition. The developed technique of monitoring of bioradiolokation of condition of pilot enables the diagnostics of dangerous (for reliable professional activity) states in terms of the activity of the cardiovascular and respiratory system. Implementing the results of the study provides the possibility of increasing the reliability of operation of aircraft through the use of non-contact monitoring of the state of flight personnel in order to improve the reliability of their professional activities and account evaluation of the current state circuits control the aircraft.

Keywords: monitoring of the state human, bioradiolokation monitoring, wavelet analysis of signals, artificial neural network, loop control of the aircraft, mathematical cybernetics, medical informatics, ergatic management system, pattern of signal, automated signal processing

DOI: 10.7256/2454-0714.2015.2.67102

Keywords: medical informatics, mathematical cybernetics, loop control of the aircraft, artificial neural network, wavelet analysis of signals, bioradiolokation monitoring, monitoring of the state human, ergatic management system, pattern of signal, automated signal processing

DOI: 10.7256/2454-0714.2013.3.63833

Abstract: The author suggests an approach to formalization and presentation of collective expertise knowledge in a form of conceptual models, based on a functional-oriented technology. Implementation of the conceptual model in form of base of knowledge enables the ability to use expertise knowledge autonomously in solving the tasks of a simulation model structure synthesis. A mechanism of simulation model generation is in consequent appliance of formal rules of base of knowledge to its declarative data. The base of knowledge contains three groups of inference procedures. The first group of procedures provide the selection of the declarative knowledge needed to solve specific task (specific simulation model synthesis) from the entire base. The second group of procedures generates the composition and structure of the simulation model. The third group of procedures forms the informational connections in the synthesized model. Appliance of the formal-oriented approach at the stage of formalization of expertise knowledge provides adequacy of the structure of synthesized model to the tasks of the subject area and can significantly improve efficiency of the use of expertise knowledge in the modeling and study of complex systems.

Keywords: expert knowledge, conceptual model, base of knowledge, formal synthesis, rules of inference, algorithm, structure, simulation model, system dynamics, complex system

DOI: 10.7256/2454-0714.2016.4.21438

Abstract: This article reviews the problems of automatic processing of web content. Since the speed of obsolescence of information in the global network is very high, the problem of prompt extraction of the necessary data from the Internet becomes more urgent. The research focuses on the web resources that contain text, unadapted to the automated processing. The subject of the research is a set of software and methods. A particular attention is paid to the categorization of ads placed on specialized websites. The authors also review practical aspects of the development of a universal architecture of information-gathering systems. The following methods were used during this study: analytical review of the main principles of development of systems of automated information gathering and analysis of natural languages. For obtaining practice-oriented methods of synthesis and analysis results were used. A special contribution of the authors of the study is in developing an automated system for collecting, processing and classification of the information contained on the web-site. The novelty of the research is to use a new approach to solve this problem by taking into account the semantics and structure characteristic for specific sites. The main conclusions of the study are the applicability and effectiveness of the classification method for solving this problem.

Keywords: machine learning, web robots, information collection, classification system, web-sites categorization, text analisis, parsing, data processing, crawling, big data

DOI: 10.7256/2454-0714.2016.4.68455