Here are some ideas:
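- new_str = sentence.upper() so that beer and Beer are treated as the same word (if you want that).
- words = new_str.split() to make a list of the words in your string.
- unique_words = set(words) to get rid of duplicate words. (Avoid naming your variables str, list or set, because those names shadow Python built-ins.)
- Start with an empty word_list. Copy the first set into word_list. In the following steps, loop over the entries in each new set and add the ones that are not yet part of word_list:
    for word in unique_words:
        if word not in word_list:
            word_list.append(word)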
- Now you can make a multi-hot vector from your sentence: entry i is 1 if word_list[i] is one of the words in the sentence, else 0.
- Don’t forget to make your multi-hot vectors longer (additional zeros) if you add a word to word_list.
- Last step: make a matrix from your vectors.
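
The steps above could be put together roughly like this. It is only a sketch: the function name build_multi_hot and the two sample sentences are made up for illustration, and it builds the complete word_list first and the vectors afterwards, so the zero padding is not needed. If you build the vectors while word_list is still growing, you have to pad the earlier ones as described above.

```python
def build_multi_hot(sentences):
    """Return (word_list, matrix) for a list of sentence strings."""
    word_list = []   # vocabulary, in order of first appearance
    word_sets = []   # one set of upper-cased words per sentence

    for sentence in sentences:
        unique_words = set(sentence.upper().split())
        word_sets.append(unique_words)
        # sorted() only makes the vocabulary order reproducible
        for word in sorted(unique_words):
            if word not in word_list:
                word_list.append(word)

    # The vocabulary is complete here, so every vector gets its
    # full length right away and nothing has to be padded later.
    matrix = [[1 if word in words else 0 for word in word_list]
              for words in word_sets]
    return word_list, matrix


# Hypothetical sample input
word_list, matrix = build_multi_hot(["Beer is good", "beer and wine"])
print(word_list)
for row in matrix:
    print(row)
```

Each row of matrix belongs to one sentence and each column to one entry of word_list.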