On July 24th, Sara Rosenthal joined us at the HI offices for a discussion on sentiment analysis, the practice of systematically determining meaning and opinion in text. While her current research is still in the development phase, she brought up some great ideas and shared the types of tools she’s used in her work.

Sentiment analysis is a facet of a complicated and sometimes inexact science known as natural language processing (NLP). While NLP tools were once confined to computer science departments, they can now be increasingly found in the social sciences and digital humanities, as well as in a vast and expanding environment of business tools. To better understand how HI uses sentiment analysis to measure the influence of entertainment, the following is an introduction to some of the basics, along with links to advanced research.

The Lexical Approach

In an upcoming paper, Rosenthal and her advisor, Kathleen R. McKeown, set out to create a sophisticated sentiment detection system that can accurately distinguish between subjective and objective phrases in text. To accomplish this, Rosenthal chose a lexical approach: mapping words from sources of study, such as Twitter and LiveJournal, to dictionaries that have predefined parameters for categorizing the sentiment or meaning of each word.

There are a variety of dictionaries, or lexical databases, such as the Dictionary of Affect in Language (DAL), WordNet, and Wiktionary, and each one provides a different level of semantic mapping, or connections among words. These connections act like a thesaurus, reducing the distance between like concepts.

The DAL was created by Cynthia Whissell, and it was originally designed to measure a text source for ‘emotional meaning.’ It consists of 8,742 words, each with a compound score ranging from 1-13 on pleasantness, activeness, and imagery. The words are also evaluated as either objective or subjective. You can test it here. However, the DAL draws on source texts from the 1960s through the late 1990s, so it may miss more recent changes in language.
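To make the dictionary lookup concrete, here is a minimal sketch of scoring a sentence against a DAL-style lexicon. The three entries and their scores are invented for illustration and are not taken from the real DAL; `TOY_LEXICON` and `score_text` are hypothetical names.

```python
from statistics import mean

# Hypothetical lexicon in the style of the DAL. These scores are made up
# for illustration; they are NOT values from the real dictionary.
TOY_LEXICON = {
    "love": {"pleasantness": 3.0, "activeness": 2.5, "imagery": 1.8},
    "storm": {"pleasantness": 1.4, "activeness": 2.6, "imagery": 3.0},
    "quiet": {"pleasantness": 2.2, "activeness": 1.2, "imagery": 1.6},
}

def score_text(text, lexicon=TOY_LEXICON):
    """Average each dimension over the words that appear in the lexicon.

    Returns (scores, coverage), where coverage is the share of words
    that were found. scores is None when no word matched.
    """
    words = [w.strip(".,!?;:'\"").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return None, 0.0
    scores = {dim: mean(h[dim] for h in hits) for dim in hits[0]}
    return scores, len(hits) / len(words)

scores, coverage = score_text("I love the quiet storm!")
print(scores, coverage)
```

The coverage figure matters as much as the scores: when few words are found in the dictionary, the averages rest on very little evidence.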

WordNet differs from the DAL in that it is composed of word relationships. The relationships are defined through structural categories, such as hyponyms or antonyms, and groupings of words with closely related meanings are called synsets, short for synonym sets. For example, the word ‘harmony’ forms a synset with ‘congruity,’ as the two are treated as near-synonyms. This manner of grouping words has been further abstracted by the WordNet Domains project, which has sorted the synsets by topic. WordNet is free to download and is manually curated by a team at Princeton University that was, until recently, headed by the late George A. Miller, a scientist who helped to define the fields of psycholinguistics and cognitive science.

Teams of scientists created both the DAL and WordNet. Wiktionary, a lexicon also used in sentiment analysis, takes a different approach. Like Wikipedia, Wiktionary is an open source project and relies completely on the collaborative efforts of its users. As a result, it offers advantages mainly in the number of words listed and the strength of their connections.

Much like a map or globe, any one of these dictionaries provides a means for objectively categorizing words. Using these dictionaries increases the number of features that can be extracted from text, ideally without changing the message that’s being communicated.

The Machine Learning Approach

While the lexical approach can easily handle instances of polarity, or positive and negative expressions, how do researchers detect, say, sarcastic implications in otherwise sincere expressions? (Hint: #sarcasm is a good start.) What about the differences between formal and informal language? Or what happens when a significant number of words can’t be found in a dictionary, like emoticons or intentional misspellings?

To address these problems, researchers may turn to machine learning (ML). In sentiment analysis, the ML process involves training an algorithm to estimate the probability that any word (or combination of words) is associated with a positive or negative sentiment or emotional state.

Frameworks for building such algorithms include conditional random field modeling, Markov network modeling, and Bayesian network modeling, which are all used in NLP as well as in a variety of other statistical applications. These are known more generally as graphical models. Graphical models help explain relationships among interdependent variables, such as phrases or sentences, which is why they can be so useful for discerning meaning.

In the context of NLP, ML algorithms are usually directed towards features called n-grams. An n-gram is a contiguous sequence of n units (e.g. phonemes, syllables, or words) taken from a text; like the synsets of WordNet, it is a way of grouping units together. Once a text is broken up into n-grams, a researcher can apply a model to estimate the probability that a given n-gram is associated with some outcome.
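As a rough illustration of breaking a text into n-grams, the sketch below slides a window of n units across a token list and counts the results. The function name `ngrams` and the sample sentence are assumptions made for this example.

```python
from collections import Counter

def ngrams(units, n):
    """Return every contiguous run of n units, in order of appearance."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

# Word-level bigrams (n = 2) from a made-up sentence.
tokens = "not a good movie not a good plot".split()
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))

# The same function works on other units, e.g. character trigrams.
print(ngrams(list("good"), 3))
```

Counts like these are the raw features a model works with. Note that the bigram ('not', 'a') keeps the negation next to its neighbor, which single-word features would lose.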

ML models can be either discriminative, meaning they learn directly how to separate the categories in known data, or generative, meaning they model how the data in each category is produced and so can, in principle, create new data. There are advantages and disadvantages to either type. Though discriminative models, such as linear regressions, are generally thought of as more accurate, generative models (e.g. the naïve Bayes model) can be more useful with the small or incomplete data sets that researchers commonly encounter.
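For readers who want to see a generative model in miniature, here is a sketch of a naïve Bayes text classifier with add-one smoothing. It is a toy under stated assumptions, not the system used by Rosenthal or by HI: the four training sentences are invented, and `train_naive_bayes` and `predict` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Count labels and per-label word frequencies.

    labeled_docs is a list of (tokens, label) pairs.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(tokens, model):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                # Add-one smoothing keeps unseen word/label pairs above zero.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training data, for illustration only.
training = [
    ("great fun film".split(), "pos"),
    ("loved this great cast".split(), "pos"),
    ("dull boring film".split(), "neg"),
    ("hated this dull plot".split(), "neg"),
]
model = train_naive_bayes(training)
print(predict("great cast".split(), model))
print(predict("dull plot".split(), model))
```

The model is generative in the sense that it estimates how likely each word is within each category, then uses Bayes' rule to pick a category. With only four training sentences it still produces a usable guess, which is the small-data advantage mentioned above.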

An advantage of using any one of these models is that the machine does the heavy lifting, replacing the tedious hand-coding that is otherwise required to categorize text according to its meaning. But an algorithm can categorize only as well as it can identify the constituent parts of a novel phrase or word. Some researchers have combined the lexical and ML methods, performing the lexical analysis first and then using the results to improve their ML algorithm. However, one of the considerations of this framework (as with many ML models) is the level of human involvement or supervision over the machine learning process.

Supervised learning takes place when a human provides labeled examples from which a computer learns to categorize language data. Fitting the model to those examples is called training, and it is the method used in the majority of ML algorithms. However, it is labor intensive and requires known observations, or pre-set labels, before any experiments are run. More recently, researchers and companies have turned to workers on Amazon’s Mechanical Turk to provide these labels. In contrast, semi-supervised or unsupervised learning algorithms are able to act on unlabeled data and adjust dynamically as new patterns from the text source emerge. Of course, algorithms are not infallible, and unsupervised processes can perform less accurately as the complexity of the possible text patterns grows (which in turn is often a function of the size of the text, as an increase in the number of words may correlate with an increase in the number of connections among those words). To assess accuracy, researchers may employ an independent lexical approach to process the text, introduce topic modeling, or even crowdsource the process. When dealing with huge amounts of text, the key is striking a balance between accurate results and available resources.
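One simple way to use crowdsourced labels for checking accuracy is to take a majority vote per item and compare the model's output against it. The sketch below assumes that setup; the votes and predictions are invented, and `majority_label` and `accuracy` are hypothetical helper names.

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label, or None if there is no clear winner."""
    ranked = Counter(votes).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two labels
    return ranked[0][0]

def accuracy(predicted, gold):
    """Share of items with a usable gold label that were predicted correctly."""
    pairs = [(p, g) for p, g in zip(predicted, gold) if g is not None]
    if not pairs:
        return 0.0
    return sum(p == g for p, g in pairs) / len(pairs)

# Invented votes from three hypothetical workers per item (two on the last).
crowd_votes = [
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
    ["pos", "neg"],
]
gold = [majority_label(v) for v in crowd_votes]
model_output = ["pos", "pos", "neg"]
print(gold, accuracy(model_output, gold))
```

Items where the workers tie are dropped from the accuracy calculation rather than guessed at, which trades a smaller test set for cleaner labels.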


Creating a dictionary or teaching a machine requires immense amounts of written, organic language. Researchers need this data both to cull an extended, relevant vocabulary and to develop and test their models. Kanjoya, a sentiment analysis firm focused on understanding written emotion, created the Experience Project, wherein nearly 19 million user stories have been shared. The site provides an inspiring platform for users to connect, but also provides researchers with a rich assortment of personal, emotional writings.

However, differences in everything from subject matter to site architecture can affect the way in which users communicate. A trained algorithm might provide accurate results for one platform, but given the ever-growing ease for users to access new platforms, those results may quickly become outdated. Distinct differences may even exist across areas of one platform, such as the reviews on, rendering models of analysis on toaster reviews inaccurate when applied to books.

How does HI do Sentiment Analysis?

To bring it all back home—the NLP done at HI tends to be of the supervised machine learning variety. We work with platforms like Crimson Hexagon or AlchemyAPI, and use our expertise in the influence of entertainment to extract meaningful data for our evaluations and analysis. We’ve used NLP in much of our work with clients, as well as in our recent attempt to create a score of influence.

We’re always interested in hosting guest speakers or round table discussions at the Harmony Institute. If you would like to share your work or discuss collaboration potential, please feel free to contact us.

  1. Weston says:

    Brendan, Thanks for your insights! Your point is well taken about recall errors. The feature and model selection was done on a labeled corpus of 5,000 wall posts about 6 different topics. I haven’t looked at the relative performance for each of those topics but it seems obvious now that I need to do that. The problem is that if there is some classifier bias in different domains, we would need to train the model on hand-labeled data for each domain. If the number of topics we are interested in is small, this could be solved with crowdsourced labeling, as you suggest. But how would we scale to hundreds of thousands of topics (i.e. all words and bigrams, or all entities in Freebase)? That’s an interesting problem that warrants some thought.
