Words in their raw string form can't be used as inputs to machine learning models. Tokenizing is the process of assigning a unique integer id to each word in your data set / vocabulary. You can build such a vocabulary by going over your data set and assigning an id to each unique word. You may reserve some of these ids as special tokens, e.g. a padding token or an unknown (out-of-vocabulary) token.
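As a rough sketch of this idea (the function names, reserved ids, and special tokens below are illustrative choices, not tied to any particular library):

```python
PAD_ID = 0   # reserved id for padding
UNK_ID = 1   # reserved id for out-of-vocabulary words

def build_vocab(sentences):
    """Assign an integer id to every unique word, reserving 0 and 1 for special tokens."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def tokenize(sentence, vocab):
    """Map each word to its id, falling back to the unknown token."""
    return [vocab.get(word, UNK_ID) for word in sentence.split()]

vocab = build_vocab(["the cat sat", "the dog ran"])
print(tokenize("the cat ran fast", vocab))  # unseen word "fast" maps to UNK_ID -> [2, 3, 6, 1]
```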
Given that your inputs will vary in length, you have to pad them with zeros (the padding id) so that each one matches the expected input length.
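Continuing the sketch above, padding a batch of token-id sequences to a common length might look like this (again, the names are illustrative):

```python
def pad_sequence(ids, max_len, pad_id=0):
    """Right-pad a list of token ids with the padding id up to max_len."""
    return ids + [pad_id] * (max_len - len(ids))

batch = [[2, 3, 6], [2, 5]]
max_len = max(len(ids) for ids in batch)
padded = [pad_sequence(ids, max_len) for ids in batch]
print(padded)  # [[2, 3, 6], [2, 5, 0]]
```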