Предизвикателства при обхождането на интернет с цел извличане на данни

Challenges in Web Crawling for Data Collection

Georgi Cholakov 1), Emil Doychev 1), Svetla Koeva 2)
1) Plovdiv University „Paisii Hilendarski“, Faculty of Mathematics and Informatics
2) Institute for Bulgarian Language „Prof. Lyubomir Andreychin“, Bulgarian Academy of Sciences

https://doi.org/10.53656/math2024-1-1-cha

Abstract. The article presents the challenges of implementing a System for data retrieval and visualisation from the Internet, which crawls language resources in the Hugging Face repository and extracts the associated data. The data in the system is updated at regular intervals so that the dynamics of language resource creation can be tracked across different time periods. The article presents: a) an analysis of the available data and its structure; b) the method chosen for crawling the pages and extracting the data. The experience gained in overcoming these specific challenges can help solve similar problems of extracting data from the Internet, a task that arises often in various projects (including school projects).
Keywords: web crawling; automatic data extraction; linguistic datasets
