Dr. Nadezhda Borisova, Assist. Prof.,
Dr. Elena Karashtranova, Assoc. Prof.
South-West University “Neofit Rilski” – Blagoevgrad (Bulgaria)
Abstract. The Internet serves billions of users and provides a wide variety of information resources. Much of this information is expressed in natural human language and requires an efficient approach to processing.
Natural language processing (NLP) refers to the ability of computers to analyze and understand the structure of human language. In NLP, linguistic knowledge is transformed into algorithms for solving specific problems. GATE is a widely used open-source software infrastructure that provides a framework and components for solving NLP tasks. The available GATE tools can be adapted to other languages and text processing tasks.
This article presents an approach for converting numeric data written as words in Bulgarian into digits. For this purpose, a configuration file for Bulgarian has been integrated into the general tool set of GATE, the open-source software for natural language processing. The aim of this work is to determine the exact numeric value of numbers written out in words in Bulgarian text, which can serve as a starting point for producing more complex annotations, such as monetary and measurement units.
Keywords: Natural language processing; Bulgarian grammar; GATE
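To make the conversion task concrete, the following is a minimal Python sketch of turning a Bulgarian number phrase into its integer value. It is an illustration only and not the GATE configuration described in the abstract: the function name bulgarian_words_to_number, the word tables, and the accumulate-and-multiply strategy are assumptions introduced here, and the sketch covers only a small vocabulary without checking that the phrase is grammatical.

```python
# Illustrative sketch (assumption): a small lookup-based converter for
# Bulgarian number words. It is NOT the GATE configuration from the article.

# Words that add their value to the group currently being built.
UNITS = {
    "нула": 0, "един": 1, "една": 1, "едно": 1, "два": 2, "две": 2,
    "три": 3, "четири": 4, "пет": 5, "шест": 6, "седем": 7,
    "осем": 8, "девет": 9, "десет": 10,
    "единадесет": 11, "дванадесет": 12, "тринадесет": 13,
    "четиринадесет": 14, "петнадесет": 15, "шестнадесет": 16,
    "седемнадесет": 17, "осемнадесет": 18, "деветнадесет": 19,
    "двадесет": 20, "тридесет": 30, "четиридесет": 40, "петдесет": 50,
    "шестдесет": 60, "седемдесет": 70, "осемдесет": 80, "деветдесет": 90,
    "сто": 100, "двеста": 200, "триста": 300, "четиристотин": 400,
    "петстотин": 500, "шестстотин": 600, "седемстотин": 700,
    "осемстотин": 800, "деветстотин": 900,
}

# Words that multiply the group built so far (singular and plural forms).
MULTIPLIERS = {
    "хиляда": 1_000, "хиляди": 1_000,
    "милион": 1_000_000, "милиона": 1_000_000,
}


def bulgarian_words_to_number(text: str) -> int:
    """Convert a Bulgarian number phrase to an integer.

    Raises ValueError on any word outside the small vocabulary above.
    """
    total = 0    # value of the groups already closed by a multiplier
    current = 0  # value of the group being built (0..999)
    for token in text.lower().split():
        if token == "и":  # the conjunction "and" carries no value
            continue
        if token in UNITS:
            current += UNITS[token]
        elif token in MULTIPLIERS:
            # A bare multiplier such as "хиляда" stands for 1 * 1000.
            total += max(current, 1) * MULTIPLIERS[token]
            current = 0
        else:
            raise ValueError(f"unknown number word: {token!r}")
    return total + current


if __name__ == "__main__":
    for phrase in ("двадесет и пет", "две хиляди и петнадесет"):
        print(phrase, "->", bulgarian_words_to_number(phrase))
```

Because the sketch only sums and multiplies known words, it accepts ill-formed phrases as well; a rule-based annotator would additionally have to locate number phrases in running text and reject invalid word sequences.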