A major question facing documentary filmmakers today is how to measure the influence of their work. Documentary films seek social change but do so using different narrative styles across a variety of platforms. While viewership, returns, and reviews tell us which films are more popular, how can we determine which have had the most influence?

The advantage of quantifying influence is that it enables comparisons that generate insights into the meaning of effective change. In a business context, advertisers have turned to social media to identify influential people and groups that will amplify their message. In turn, an industry has formed to convert streams of social media data into indices of influence. While Klout currently dominates this space, its inability to account for sway beyond social media is a major flaw.

In an effort to address this shortcoming, we’ve set out to develop a new index of influence – the HI Score. We believe that this score will help quantify the influence of entertainment and identify strategies that facilitate effective impact and positive social change. Furthermore, by making this process as open as possible, we hope to avoid the lack of transparency inherent to metrics based on proprietary methods.

Our initial attempt at generating HI Scores combined data from Twitter, news media, and Google searches. As our test set, we selected the last three years of Oscar-nominated documentary films. This list provided us with comparable examples and allowed us to expand on some of our past work. Through a trial-and-error process, we arrived at a score from 1 to 100 (the higher the better) that estimates the influence of a documentary film over time. Below, you can see the cycle of HI Scores for two years of Oscar-nominated documentary films:

A key part of calculating the score was the creation of new tools to address obstacles in the data. One major hurdle was producing search queries that returned content relevant to each film. Documentaries with distinctive titles such as Paradise Lost 3 were relatively easy to isolate, but films with more common names like Undefeated were harder to filter. In this version of the HI Score, we simply exclude these data, but addressing this issue will be crucial in generating scores that enable comparisons across many cases.

Working with Crimson Hexagon, we used these queries to count the number of tweets per day mentioning a film, the tone of these tweets (on a positive/neutral/negative scale), and the number of unique followers who saw the tweets. We then entered the same queries into LexisNexis, searching for mentions of the films in television and major news outlets (see our Python code to structure the data here). Finally, to measure general interest in each film, we used Google Insights – a free service that tracks web searches over time. Though the site allows easy export of the data, we found this process tedious when working with many films. Using a package for R and some online help, we made an easy-to-use API.

Google Insights excels at generating data for highly specific searches. However, we also ran into problems with the way its output is formatted. Each term is given a score from 0 to 100 on a relative basis, so scores from two or more queries, or in our case film titles, cannot be compared directly. For now, we dealt with this issue by focusing on the degree of change over time rather than the absolute number of searches for a film.
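To make the idea of "degree of change" concrete, here is a minimal Python sketch. It is an illustration only, not the code we used: the function name `relative_change` and the sample numbers are hypothetical, and it assumes each film's search interest arrives as a list of 0 to 100 values, one per time period.

```python
def relative_change(series):
    """Fractional change from each period to the next.

    Returns None for a step whose starting value is 0, since the
    change relative to zero is undefined.
    """
    changes = []
    for prev, curr in zip(series, series[1:]):
        if prev == 0:
            changes.append(None)
        else:
            changes.append((curr - prev) / prev)
    return changes


# Hypothetical search-interest series for two films. Each is scaled
# against its own maximum, so the raw values are not comparable.
film_a = [10, 20, 40, 100]
film_b = [25, 50, 100, 50]

print(relative_change(film_a))
print(relative_change(film_b))
```

Because each step is divided by the previous value, the result does not depend on how a given series was scaled, which is what makes changes comparable across films even when the raw values are not.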

In calculating HI Scores, we tried as far as possible to ground our equation in theory. For instance, we placed greater emphasis on traditional media, since these outlets tend to have a larger audience. Furthermore, we assigned these media more weight over time, as their impact is arguably more prolonged than that of mentions on Twitter. Finally, to balance brief spikes in buzz against sustained interest, we calculated each film’s score as a moving average.
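The general shape of this calculation can be sketched in Python as below. This is not the actual HI Score equation: the weights, the window length, and the function names `moving_average` and `combine_signals` are all assumptions chosen for illustration, and the sketch assumes the three daily signals have already been scaled to a common range. It also omits the extra weight given to traditional media over time.

```python
def moving_average(values, window):
    """Trailing moving average; early points use the days available so far."""
    averaged = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def combine_signals(news, tweets, searches,
                    weights=(0.5, 0.25, 0.25), window=3):
    """Weighted daily sum of three signals, then smoothed.

    The default weights are hypothetical; they only reflect the idea
    that traditional media counts for more than Twitter or search.
    """
    w_news, w_tweets, w_searches = weights
    raw = [w_news * n + w_tweets * t + w_searches * s
           for n, t, s in zip(news, tweets, searches)]
    return moving_average(raw, window)


# Hypothetical daily signals for one film, each on a 0 to 100 scale.
news = [0, 100, 0]
tweets = [100, 0, 0]
searches = [0, 0, 100]

print(combine_signals(news, tweets, searches, window=2))
```

A longer window rewards sustained interest and dampens one-day spikes, while a shorter window tracks buzz more closely.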

The question of influence is crucial in measuring the positive impact of media. Until now, it has only been possible to measure effects with traditional indicators like viewers and revenue. The HI Score represents an early attempt to generate a more useful measure, one that can account for a much wider definition of influence. By standardizing this measure, the score will permit meaningful comparisons and give a picture of social return on investment. As we further refine the score, we hope it will become an indicator for media producers and funders interested in maximizing the impact of their work.

For a more technical overview of how we generate the HI Score, we encourage you to check out the data, documented code, and equation and tell us your thoughts!