Dr. Georgi Cholakov, Assist. Prof. 1), Dr. Emil Doychev, Assoc. Prof. 1), Prof. Dr. Svetla Koeva 2)
1) University of Plovdiv “P. Hilendarsky”
Plovdiv, Bulgaria
2) Institute for Bulgarian Language
Bulgarian Academy of Sciences
https://doi.org/10.53656/math2023-5-3-sys
Abstract. The article presents a system that dynamically displays the availability of language datasets and language models found in large repositories such as Hugging Face. The goal of developing such a system is to demonstrate that, for languages other than English, the datasets and language models required for advancements based on or utilizing language technologies and artificial intelligence have only moderate or fragmented support. At the same time, the description of the system architecture introduces readers to easy-to-use tools such as Node-RED, MariaDB, and Grafana, which can be applied to a wide range of tasks, such as crawling and collecting data from the Internet, storing information in a database, and visualizing data in a clear and functional manner. Each of these tasks, as well as their combinations, can serve as the basis for student projects at the senior high school level.
Keywords: automatic data collection; data visualization;
language datasets; language models