We continue our work on sentiment analysis from Lecture 2. In part 1, I go over common ways of preprocessing text in machine learning: n-grams, stemming, stop words, WordNet, and part-of-speech tagging. In part 2, I introduce a common approach to k-nearest-neighbor classification of text; it is very similar to the vector space model with tf-idf encoding and cosine distance. Rough sketches of both ideas appear below.
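A minimal sketch of the preprocessing steps in Python with NLTK might look like the following (an illustrative example, not the lecture's own code; it assumes NLTK is installed, downloads the needed data packages, and uses a made-up sample sentence):

    import nltk
    from nltk.corpus import stopwords, wordnet
    from nltk.stem import PorterStemmer
    from nltk.util import ngrams

    # One-time downloads of the required NLTK data
    # (newer NLTK releases may also need "punkt_tab").
    for pkg in ("punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"):
        nltk.download(pkg)

    text = "The movies were surprisingly good, despite the slow opening scenes."

    # Tokenize, lowercase, and drop punctuation and stop words.
    tokens = [t.lower() for t in nltk.word_tokenize(text) if t.isalpha()]
    stop = set(stopwords.words("english"))
    content = [t for t in tokens if t not in stop]

    # Stemming collapses inflected forms ("movies" -> "movi").
    stemmer = PorterStemmer()
    stems = [stemmer.stem(t) for t in content]

    # Bigrams (n-grams with n = 2) preserve short word sequences.
    bigrams = list(ngrams(content, 2))

    # Part-of-speech tags over the full token sequence.
    pos_tags = nltk.pos_tag(nltk.word_tokenize(text))

    # WordNet lookup: synonyms of "good" across its synsets.
    synonyms = {l.name() for s in wordnet.synsets("good") for l in s.lemmas()}

    print(stems, bigrams, pos_tags, sorted(synonyms), sep="\n")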
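For part 2, here is a comparable sketch of k-nearest-neighbor text classification with tf-idf encoding and cosine distance, using scikit-learn (the tiny training set is invented for illustration; the lecture may assemble the same pieces differently):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    # A made-up toy sentiment dataset.
    train_docs = [
        "a wonderful, heartfelt film",
        "brilliant acting and a great script",
        "a dull, boring mess",
        "terrible pacing and a weak plot",
    ]
    train_labels = ["pos", "pos", "neg", "neg"]

    # tf-idf turns each document into a weighted term vector
    # (the vector space model) ...
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_docs)

    # ... and cosine distance compares those vectors, so the k most
    # similar training documents vote on the label.
    knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
    knn.fit(X_train, train_labels)

    X_test = vectorizer.transform(["a great, wonderful plot"])
    print(knn.predict(X_test))  # expected: ['pos']

Cosine distance is a natural fit here because it compares the direction of the tf-idf vectors rather than their length, so long and short documents are judged by which terms they emphasize, not by how many words they contain.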
Code and other helpful links: