Cybernetics and programming

Methods for detecting automated data collection from web resources
Menshchikov Alexander Alexeevich

graduate student, Saint Petersburg State University of Information Technologies

197101, Russia, Saint Petersburg, Kronverkskii Prospekt, 49


Gatchin Yurii

Doctor of Technical Science

Professor, Saint Petersburg State University of Information Technologies

197101, Russia, Saint Petersburg, Kronverkskii Prospekt, 49




The article addresses the problem of automated data collection from web resources. The authors present a classification of detection methods that takes modern approaches into account, analyze existing methods for detecting and countering web robots, and examine the possibilities and limitations of combining these methods. According to the authors, there is currently no open web robot detection system suitable for use in real conditions, so developing an integrated system that combines a variety of methods, techniques, and approaches is an urgent task. To address this, the authors developed a software prototype of such a detection system and tested it on real data. The theoretical significance of the study lies in advancing this line of research in the domestic segment by building a web robot detection system based on the latest methods and improving on global best practices. Its applied significance lies in creating a basis for the development of in-demand, promising software.

Keywords: web-robots, information gathering, parsing, web robot detection, web security, information security, information protection, intrusion detection, intrusion prevention, weblogs analysis





This article is written in Russian. The full text is available in Russian.
