Tag: text embedding

AI Machine Learning & Data Science Natural Language Tech Research

Nomic Embed: The Inaugural Open-Source Long Text Embedding Model Outshining OpenAI’s Finest

In the new paper Nomic Embed: Training a Reproducible Long Context Text Embedder, a Nomic AI research team introduces nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data English text embedding model, capable of handling an extensive context length of 8192 tokens.
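Long-context embedders like this one ultimately collapse a token sequence, up to 8192 positions here, into a single fixed-size vector. As a minimal sketch of that pooling step (the function name, dimensions, and mean-pooling choice are illustrative assumptions, not Nomic's exact implementation):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Collapse a (seq_len, dim) matrix of per-token embeddings into one
    fixed-size vector, ignoring padding positions (mask == 0)."""
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # (dim,)
    count = mask.sum()                              # number of real tokens
    return summed / np.maximum(count, 1e-9)

# Toy example: a "document" padded to 8192 positions, of which 5000 are real tokens.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8192, 768))
mask = np.zeros(8192, dtype=int)
mask[:5000] = 1
doc_vector = mean_pool(tokens, mask)  # single vector of shape (768,)
```

Whatever the sequence length, the document is represented by one vector of the model's hidden dimension, which is what makes indexing and similarity search over long documents tractable.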

AI Machine Learning & Data Science Research

Microsoft’s E5 Text Embedding Model Tops the MTEB Benchmark With 40x Fewer Parameters

In the new paper Text Embeddings by Weakly-Supervised Contrastive Pre-training, a Microsoft research team introduces Embeddings from Bidirectional Encoder Representations (E5), a general-purpose text embedding model for tasks requiring a single-vector representation of texts, and the first model to surpass the BM25 baseline on the BEIR retrieval benchmark in a zero-shot setting.
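With a single-vector representation per text, zero-shot retrieval reduces to ranking passages by vector similarity to the query. A minimal sketch with toy 3-dimensional vectors (real models such as E5 produce vectors of several hundred dimensions; the passage names and values here are purely illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy query and passage "embeddings".
query = np.array([1.0, 0.0, 1.0])
passages = {
    "p1": np.array([1.0, 0.1, 0.9]),   # points in nearly the same direction as the query
    "p2": np.array([-1.0, 0.5, 0.0]),  # points away from the query
}

# Rank passages by similarity to the query, highest first.
ranked = sorted(passages, key=lambda k: cosine_similarity(query, passages[k]), reverse=True)
```

Here "p1" ranks above "p2" because its direction is closer to the query's, which is the scoring rule a dense retriever applies over an entire corpus, in contrast to BM25's term-overlap scoring.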