What is Embedding Distance Comparison in LangChain?
To measure semantic similarity (or dissimilarity) between a prediction and a reference label string, you can apply a vector distance metric to the two embedded representations using the embedding_distance evaluator.[1]
Note: This returns a distance score, meaning that the lower the number, the more similar the prediction is to the reference, according to their embedded representation.
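To make the "lower score means more similar" behaviour concrete, here is a minimal sketch (plain NumPy, not LangChain's internal code) of a cosine distance between two embedding vectors; the toy vectors stand in for real sentence embeddings:

import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine distance = 1 - cosine similarity; 0.0 means the vectors point the same way
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real sentence embeddings
close_pair = cosine_distance(np.array([0.9, 0.1, 0.0]), np.array([0.85, 0.15, 0.0]))
far_pair = cosine_distance(np.array([0.9, 0.1, 0.0]), np.array([0.0, 0.2, 0.9]))
print(close_pair, far_pair)  # the "closer" pair yields the smaller distance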
The available embedding distance implementations are listed below (a quick sketch of what each metric computes follows the list):
[<EmbeddingDistance.COSINE: 'cosine'>,
<EmbeddingDistance.EUCLIDEAN: 'euclidean'>,
<EmbeddingDistance.MANHATTAN: 'manhattan'>,
<EmbeddingDistance.CHEBYSHEV: 'chebyshev'>,
<EmbeddingDistance.HAMMING: 'hamming'>]
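For intuition, the following sketch (using scipy.spatial.distance, which is not what LangChain calls internally but computes the same standard metrics) shows what each option measures on a pair of toy vectors:

import numpy as np
from scipy.spatial import distance

a = np.array([0.2, 0.5, 0.3])
b = np.array([0.1, 0.6, 0.3])

print("cosine    ", distance.cosine(a, b))     # 1 - cosine similarity
print("euclidean ", distance.euclidean(a, b))  # straight-line (L2) distance
print("manhattan ", distance.cityblock(a, b))  # sum of absolute differences (L1)
print("chebyshev ", distance.chebyshev(a, b))  # largest per-coordinate difference
print("hamming   ", distance.hamming(a, b))    # fraction of coordinates that differ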
The whole implementation is shown below:
from langchain.evaluation import EmbeddingDistance, load_evaluator
from langchain_community.embeddings import HuggingFaceEmbeddings

def embedding_distance_evaluator():
    # The default evaluator embeds both strings with OpenAI embeddings and returns cosine distance
    evaluator = load_evaluator("embedding_distance")
    result = evaluator.evaluate_strings(prediction="I shall go", reference="I shan't go")
    print("result for evaluation ", result)
    result = evaluator.evaluate_strings(prediction="I shall go", reference="I will go")
    print("result for evaluation ", result)

    # List the supported distance metrics
    distances = list(EmbeddingDistance)
    print("Embedding distances ", distances)

    # Switch to Euclidean distance instead of the default cosine
    evaluator = load_evaluator(
        "embedding_distance", distance_metric=EmbeddingDistance.EUCLIDEAN
    )

    # Use a local HuggingFace embedding model instead of the default OpenAI embeddings
    embedding_model = HuggingFaceEmbeddings()
    hf_evaluator = load_evaluator("embedding_distance", embeddings=embedding_model)
    score = hf_evaluator.evaluate_strings(prediction="I shall go", reference="I shan't go")
    print("score from first HF evaluation ", score)
    score = hf_evaluator.evaluate_strings(prediction="I shall go", reference="I will go")
    print("score from second HF evaluation ", score)
References:
[1] https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/string/embedding_distance/