How to train the Stanford NLP Sentiment Analysis tool

What is the significance and difference between each file?
Train.txt/Dev.txt/Test.txt ?

This is standard machine learning terminology. The training set is used to (surprise, surprise) train a model. The development set is used to tune any parameters the model has that are not learned during training (often called hyperparameters). Typically you pick a parameter value, train a model on the training set, and then check how well the trained model does on the development set. You then pick another parameter value and repeat. This procedure helps you find reasonable parameter values for your model.

Once this is done, you test how well the model does on the test set. The test set is unseen: your model has never encountered any of that data before. It is important to keep the test set separate from the training and development sets; otherwise you are effectively evaluating the model on data it has seen before, which overstates how well it really does.
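The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the "model" is a toy one-dimensional k-nearest-neighbour classifier, and the names `split_dataset`, `knn_predict` and `accuracy` are made up for this example. None of it is part of the Stanford tool.

```python
import random
from collections import Counter


def split_dataset(examples, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle once, then cut into three disjoint lists: train, dev, test."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_dev = int(len(items) * dev_frac)
    return (items[:n_train],
            items[n_train:n_train + n_dev],
            items[n_train + n_dev:])


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of examples in `data` that the k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


# Hypothetical toy data: one noisy feature; the label is 1 when the
# underlying clean value is positive.
rng = random.Random(42)
examples = []
for _ in range(300):
    value = rng.uniform(-1, 1)
    examples.append((value + rng.gauss(0, 0.3), int(value > 0)))

train, dev, test = split_dataset(examples)

# Tune the parameter k using the development set only.
best_k = max([1, 3, 5, 9, 15], key=lambda k: accuracy(train, dev, k))

# Touch the test set exactly once, with the value chosen above.
test_accuracy = accuracy(train, test, best_k)
print(best_k, test_accuracy)
```

The point of the design is that `test` never influences the choice of `best_k`, so the final number is an honest estimate of performance on unseen data.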

How would I train my own model with a raw, unparsed text file full of
tweets?

You can’t and you shouldn’t train using an unparsed set of documents. The entire point of the recursive deep model (and the reason it performs so well) is that it can learn from the sentiment annotations at every level of the parse tree. The sentence you have given above can be formatted like this:

(4 
    (4 
        (2 A) 
        (4 
            (3 (3 warm) (2 ,)) (3 funny)
        )
    ) 
    (3 
        (2 ,) 
        (3 
            (4 (4 engaging) (2 film)) (2 .)
        )
    )
)

Usually, a sentiment analyser is trained with document-level annotations. You only have one score, and this score applies to the document as a whole, ignoring the fact that the phrases in the document may express different sentiment. The Stanford team put a lot of effort into annotating every phrase in the document for sentiment, using labels from 0 (very negative) to 4 (very positive). For example, the word film on its own is neutral in sentiment: (2 film). However, the phrase engaging film is very positive: (4 (4 engaging) (2 film))
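To see the per-phrase labels concretely, here is a minimal Python sketch that reads the bracketed tree shown earlier and lists the sentiment label of every phrase in it. The helper names (`parse_tree`, `words`, `phrases`) are hypothetical and written for this example; they are not part of the Stanford tool, and the parser assumes well-formed input with integer labels.

```python
TREE = """
(4
    (4
        (2 A)
        (4
            (3 (3 warm) (2 ,)) (3 funny)
        )
    )
    (3
        (2 ,)
        (3
            (4 (4 engaging) (2 film)) (2 .)
        )
    )
)
"""


def parse_tree(text):
    """Parse '(label child ...)' or '(label word)' into (label, children)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = int(tokens[pos])
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested phrase
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def words(tree):
    """Yield the leaf words under a node, left to right."""
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from words(child)
        else:
            yield child


def phrases(tree):
    """Yield (label, phrase text) for every node, starting at the root."""
    yield tree[0], " ".join(words(tree))
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from phrases(child)


labelled = list(phrases(parse_tree(TREE)))
for label, phrase in labelled:
    print(label, phrase)

# The same tree collapsed onto a single line of text.
one_line = " ".join(TREE.split())
```

Every node contributes one labelled training signal, which is exactly what a single document-level score cannot provide. The last line collapses the tree onto one line because, as far as I know, the treebank files store one tree per line; check this against the files you actually have.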

If you have labelled tweets, you can use any other document-level sentiment classifier. The sentiment-analysis tag on Stack Overflow already has some very good answers, so I’m not going to repeat them here.

PS Did you label the tweets you have? All 1 million of them? If you did, I’d like to pay you a lot of money for that file 🙂
