Monday, June 24, 2024

What Is Embedding Distance Comparison in LangChain



To measure semantic similarity (or dissimilarity) between a prediction and a reference label string, you can apply a vector distance metric to their two embedded representations using the embedding_distance evaluator. [1]


Note: This returns a distance score, meaning that the lower the number, the more similar the prediction is to the reference, according to their embedded representation.
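To make the "lower is more similar" idea concrete, here is a minimal sketch in plain Python that does not use LangChain at all. The names toy_embed and toy_evaluate_strings are hypothetical, and the word-count "embedding" is only a stand-in for a real embedding model; the sketch simply embeds both strings, takes a cosine distance, and returns it under a "score" key.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity, so lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def toy_evaluate_strings(prediction, reference):
    """Embed both strings, then report the distance between them as a score."""
    words = set(prediction.lower().split()) | set(reference.lower().split())
    vocabulary = sorted(words)
    a = toy_embed(prediction, vocabulary)
    b = toy_embed(reference, vocabulary)
    return {"score": cosine_distance(a, b)}


identical = toy_evaluate_strings("I shall go", "I shall go")    # same words
close = toy_evaluate_strings("I shall go", "I will go")         # two of three words shared
far = toy_evaluate_strings("I shall go", "you stay here")       # no words shared
print(identical, close, far)
```

The scores increase from the identical pair to the unrelated pair. A real embedding model captures meaning rather than word overlap, so its numbers will differ, but the reading of the score is the same.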



The available embedding distance metrics are:


[<EmbeddingDistance.COSINE: 'cosine'>,

 <EmbeddingDistance.EUCLIDEAN: 'euclidean'>,

 <EmbeddingDistance.MANHATTAN: 'manhattan'>,

 <EmbeddingDistance.CHEBYSHEV: 'chebyshev'>,

 <EmbeddingDistance.HAMMING: 'hamming'>]
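For reference, the five metrics listed above can be sketched in plain Python using their textbook definitions. This is an illustration, not LangChain's internal code; in particular, treating Hamming distance as the fraction of positions that differ is an assumption about normalization, and the vectors u and v are made up.

```python
import math


def cosine(a, b):
    """1 - cosine similarity: compares direction, ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean(a, b):
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Fraction of positions whose values differ (assumed normalization)."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


# Made-up example vectors standing in for two embeddings
u = [1.0, 2.0, 3.0]
v = [1.0, 0.0, 7.0]

metric_values = {
    "cosine": cosine(u, v),
    "euclidean": euclidean(u, v),
    "manhattan": manhattan(u, v),
    "chebyshev": chebyshev(u, v),
    "hamming": hamming(u, v),
}
for name, value in metric_values.items():
    print(name, value)
```

The metrics are on different scales, so scores are only comparable when they come from the same metric and the same embedding model.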


The complete implementation looks like this:



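from langchain.evaluation import EmbeddingDistance, load_evaluator
from langchain_community.embeddings import HuggingFaceEmbeddings


def embedding_distance_evaluator():
    # Default evaluator: OpenAI embeddings by default, so an OpenAI API key is needed
    evaluator = load_evaluator("embedding_distance")
    result = evaluator.evaluate_strings(prediction="I shall go", reference="I shan't go")
    print("result for evaluation ", result)
    result = evaluator.evaluate_strings(prediction="I shall go", reference="I will go")
    print("result for evaluation ", result)

    # List every supported distance metric
    distances = list(EmbeddingDistance)
    print("Embedding distances ", distances)

    # Select a different metric, here Euclidean distance, and evaluate with it
    evaluator = load_evaluator(
        "embedding_distance", distance_metric=EmbeddingDistance.EUCLIDEAN
    )
    result = evaluator.evaluate_strings(prediction="I shall go", reference="I will go")
    print("result for Euclidean evaluation ", result)

    # Swap in a Hugging Face embedding model (needs the sentence-transformers package)
    embedding_model = HuggingFaceEmbeddings()
    hf_evaluator = load_evaluator("embedding_distance", embeddings=embedding_model)
    score = hf_evaluator.evaluate_strings(prediction="I shall go", reference="I shan't go")
    print("score from first HF evaluation ", score)
    # Store the second result so the print below reports it rather than the first score
    score = hf_evaluator.evaluate_strings(prediction="I shall go", reference="I will go")
    print("score from second HF evaluation ", score)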



References:

https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/string/embedding_distance/
