Starting with Finetuner 0.8.0,
we provide a suite of pre-trained text embedding models licensed under Apache 2.0.
These models have a variety of use cases, including information retrieval, semantic textual similarity, text reranking, and more.
The suite consists of the following models:
- jina-embedding-t-en-v1 [Hugging Face]: The fastest embedding model in the world, with only 14 million parameters.
- jina-embedding-s-en-v1 [Hugging Face]: A compact model with just 35 million parameters that performs lightning-fast inference while delivering impressive performance.
- jina-embedding-b-en-v1 [Hugging Face]: A 110-million-parameter model that performs fast inference and delivers better performance than our smaller models.
- jina-embedding-l-en-v1 [Hugging Face]: A relatively large model with 330 million parameters that performs single-GPU inference and delivers the best performance of the suite.
```python
import finetuner

# Load the small Jina embedding model and encode two paraphrased sentences.
model = finetuner.build_model('jinaai/jina-embedding-s-en-v1')
embeddings = finetuner.encode(
    model=model,
    data=['how is the weather today', 'What is the current weather like today?'],
)

# Paraphrases should yield a cosine similarity close to 1.
print(finetuner.cos_sim(embeddings[0], embeddings[1]))
```
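Since the models are published on the Hugging Face Hub, you can also use them without Finetuner. Here is a minimal sketch using the sentence-transformers library, assuming the sentence-transformers integration indicated on the model cards:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load the small Jina embedding model directly from the Hugging Face Hub.
model = SentenceTransformer('jinaai/jina-embedding-s-en-v1')

embeddings = model.encode([
    'how is the weather today',
    'What is the current weather like today?',
])
print(cos_sim(embeddings[0], embeddings[1]))
```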
Jina Embeddings is a suite of language models that have been trained using Jina AI’s Linnaeus-Clean dataset.
This dataset consists of 380 million query-document sentence pairs.
These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
The Linnaeus-Full dataset, from which the Linnaeus-Clean dataset is derived, originally contained 1.6 billion sentence pairs.
Each Jina embedding model can encode up to 512 tokens,
with any further tokens being truncated.
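If an input may exceed this limit, you can make the truncation explicit at tokenization time. A minimal sketch using the Hugging Face transformers tokenizer published alongside the models (an assumption; any compatible tokenizer works the same way):

```python
from transformers import AutoTokenizer

# Load the tokenizer published alongside the model on the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embedding-s-en-v1')

long_text = 'how is the weather today ' * 200  # far longer than 512 tokens
tokens = tokenizer(long_text, truncation=True, max_length=512)
print(len(tokens['input_ids']))  # capped at 512; everything beyond is discarded
```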
The models have different output dimensionalities, as shown in the table below:
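You can also verify a model's output dimensionality empirically. A minimal sketch reusing the Finetuner API from the example above, assuming encode returns one vector per input:

```python
import finetuner

# Encode one sentence and measure the length of the resulting vector.
model = finetuner.build_model('jinaai/jina-embedding-s-en-v1')
embeddings = finetuner.encode(model=model, data=['hello world'])
print(len(embeddings[0]))  # the model's output dimensionality
```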
If you find Jina Embeddings useful in your research, please cite the following paper:
```bibtex
@misc{günther2023jina,
  title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
  author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
  year={2023},
  eprint={2307.11224},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```