Lexical Representation of Dense Numerical Vectors: Introducing LangVec

Simeon Emanuilov, Aleksandar Dimov
Sofia University “St. Kliment Ohridski” (Bulgaria)

https://doi.org/10.53656/math2024-3-1-lex

Abstract. High-dimensional numerical vectors are widely used in machine learning for searching and indexing data, but their meaning is often difficult for users to interpret. To address this, we introduce a novel method that transforms dense vectors into human-readable lexical representations using percentile-based mapping. At its core, the method maps words from a predefined or custom lexicon to vectors based on their relative local magnitudes. This enables intuitive visualization of the semantic similarities and differences between complex data points and allows for domain-specific interpretability. It also provides an easy way to deduplicate dense vectors (including near-duplicates) and can generate locality-aware, hash-like representations that can be used for efficient indexing and retrieval in various applications. The approach has been implemented in an open-source library called LangVec. The paper provides examples of LangVec usage and highlights its key applications, including semantic search, recommendation systems, and clustering of numerical data into a human-readable format.
Keywords: interpretable machine learning, vector representations, lexical mapping, semantic similarity, clustering, recommendation systems
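
The percentile-based mapping described in the abstract can be illustrated with a minimal sketch. This is not the LangVec API. The function names `fit_bin_edges` and `to_words`, the five-word lexicon, and the choice to pool all reference values into one distribution are assumptions made purely for illustration.

```python
import bisect
import random


def fit_bin_edges(reference_vectors, n_bins):
    """Return n_bins - 1 cut points splitting the pooled values into equal-frequency bins.

    Assumption: values from all reference vectors are pooled into one distribution.
    """
    values = sorted(v for vec in reference_vectors for v in vec)
    # Cut point k sits at roughly the k/n_bins quantile of the pooled values.
    return [values[(k * len(values)) // n_bins] for k in range(1, n_bins)]


def to_words(vector, edges, lexicon):
    """Map each element to the lexicon word whose percentile bin contains it."""
    return [lexicon[bisect.bisect_right(edges, x)] for x in vector]


# Hypothetical lexicon: one word per percentile bin, ordered from low to high.
LEXICON = ["tiny", "small", "medium", "large", "huge"]

# Synthetic reference data standing in for real embeddings.
rng = random.Random(0)
reference = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
edges = fit_bin_edges(reference, len(LEXICON))

vec = reference[0]
near_duplicate = [x + 1e-9 for x in vec]

# A vector becomes a short, readable word sequence.
print(" ".join(to_words(vec, edges, LEXICON)))
# Near-duplicates usually yield the same sequence, unless an element
# happens to sit right at a bin edge.
print(to_words(vec, edges, LEXICON) == to_words(near_duplicate, edges, LEXICON))
```

Because every element is reduced to the bin it falls into, small perturbations rarely change the output, which is what makes such a representation usable as a locality-aware, hash-like key for deduplication.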
