Tuesday, October 11, 2022

AI/ML: spaCy package

en_core_web_sm is an English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.


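import spacy
from spacy.lang.en.examples import sentences  # sample English sentences bundled with spaCy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Run the whole pipeline on the first example sentence
doc = nlp(sentences[0])
print(doc.text)

# Print each token with its coarse part-of-speech tag and dependency label
for token in doc:
    print(token.text, token.pos_, token.dep_)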


The package name en_core_web_sm follows spaCy's naming convention of language, type, genre and size:

‘en’ is the language: the pipeline is built for English text.

‘core’ is the type: a general-purpose pipeline covering core NLP tasks such as part-of-speech tagging, dependency parsing, lemmatization and named entity recognition.

‘web’ is the genre, i.e. the kind of text the pipeline was trained on: written text from the web, such as blogs and comments.

‘sm’ is the size: small pipelines are faster and need less memory and disk space, but are somewhat less accurate. The alternatives are ‘md’ (medium) and ‘lg’ (large). Both are bigger and generally more accurate, and both include static word vectors, which ‘sm’ does not.
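Because the four parts are separated by underscores, a package name can be split mechanically. The helper below is hypothetical and is not part of spaCy. It is a small sketch of the convention described above, assuming every name has exactly four underscore-separated parts:

```python
from typing import NamedTuple


class PipelineName(NamedTuple):
    lang: str   # e.g. "en"
    type: str   # e.g. "core"
    genre: str  # e.g. "web"
    size: str   # e.g. "sm"


# Human-readable labels for the size codes mentioned above
SIZES = {"sm": "small", "md": "medium", "lg": "large"}


def parse_pipeline_name(name: str) -> PipelineName:
    """Split a name like 'en_core_web_sm' into its parts (hypothetical helper)."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected lang_type_genre_size, got {name!r}")
    return PipelineName(*parts)


info = parse_pipeline_name("en_core_web_sm")
print(info.lang, info.type, info.genre, SIZES.get(info.size, info.size))
# → en core web small
```

Swapping "sm" for "md" or "lg" changes only the `size` field. This matches the point above: the language, type and genre stay the same while the pipeline grows.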



references:

https://spacy.io/models/en#en_core_web_sm
