-- Living Mobile --: AI/ML What is Document Term Matrix

Tuesday, November 15, 2022

AI/ML What is Document Term Matrix

A document-term matrix represents text data in the form of a matrix. Each row of the matrix corresponds to a document (here, a sentence) from the data to be analyzed, and each column corresponds to a word in the vocabulary. Each cell of the matrix holds the number of times that word occurs in that document. Let’s understand it with an example.

import pandas as pd

from sklearn.feature_extraction.text import CountVectorizer

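# Illustrative sentences (placeholders; substitute your own text)

sentence1 = "I love cricket"

sentence2 = "I love football"

sentence3 = "Cricket and football are sports"

docs = [sentence1, sentence2, sentence3]

print(docs)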

vec = CountVectorizer()

X = vec.fit_transform(docs)

# The sparse count matrix can now be converted to a dense array and shown as a DataFrame
# (get_feature_names_out() requires scikit-learn 1.0 or later)

df = pd.DataFrame(X.toarray(), columns=vec.get_feature_names_out())

df.head()
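To make the counting explicit, here is a minimal sketch that builds the same kind of matrix by hand, using only the standard library. The function name build_dtm and the three sample sentences are assumptions made for illustration. The tokenizer simply lowercases and splits on whitespace, so its results can differ from CountVectorizer, which by default also strips punctuation and drops single-character tokens.

```python
from collections import Counter


def build_dtm(docs):
    """Return (vocabulary, rows) for a simple document-term matrix.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # Deliberately simple tokenizer: lowercase, then split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # A sorted vocabulary gives a stable column order.
    vocab = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document, one count per vocabulary term.
        rows.append([counts[term] for term in vocab])
    return vocab, rows


# Sample sentences invented for this sketch.
docs = [
    "the cat sat on the mat",
    "the dog sat",
    "cat and dog",
]

vocab, rows = build_dtm(docs)
print(vocab)
for row in rows:
    print(row)
```

Each printed row lines up with the vocabulary list, so the number in position i is how often the i-th word appears in that sentence.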


References:

https://analyticsindiamag.com/a-guide-to-term-document-matrix-with-its-implementation-in-r-and-python/
