Boom or Bust? Factors that Influence Box Office Revenue
INTRODUCTION
According to an empirical analysis by renowned film data researcher Stephen Follows, only 51% of Hollywood films turn a profit. In a high-stakes industry with hundred-million-dollar productions, world-famous celebrities, and powerful entertainment conglomerates, the pressure to succeed at the box office is enormous. Many variables, such as content quality, marketing, and the reputation of the director, contribute to a film's success at the box office, but which factors actually drive box office revenue for film studios?
Using R and a dataset of thousands of films released from 1990 to the present, taken from boxofficeguru.com, I conducted an exploratory analysis to answer this question. The dataset contained variables such as opening and closing dates, total gross, opening gross, number of theaters, distributor, and first-week percentages.
Historically, factors such as opening gross, release date, and distributor have been regarded as highly important to box office performance. This reputation stems from legal contracts with exhibitors, calendar cycles, and brand recognition. In this report, I focused on determining whether that importance is a myth or whether each of these factors truly plays a significant role in box office returns.
Opening gross is generally important to studios because, with each additional week a film stays in theaters, the studio takes a smaller cut of that film's ticket sales. Generally, studios take a 60% cut on opening weekend and the exhibitor receives the other 40%. For each additional week, the studio’s cut is usually reduced by 10%. This helps explain the widely accepted practice of exhibiting films in movie theaters for no more than one month. As a result, the share of total gross earned on opening weekend has increased drastically over the years, from 15.7% in the 1980s to 21.5% in the 1990s to 33.1% in the 2000s.
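To see why a front-loaded run favors the studio, the split described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: it reads the "10%" reduction as ten percentage points per week, the function name `studio_share` is hypothetical, and the weekly grosses are made-up numbers rather than values from the dataset.

```r
# Hypothetical revenue-split schedule: 60% on opening weekend, dropping by
# 10 percentage points for each additional week (an assumed reading of the rule).
studio_share <- function(week, opening_share = 0.60, weekly_drop = 0.10) {
  # pmax() keeps the share from going negative on very long runs
  pmax(opening_share - weekly_drop * (week - 1), 0)
}

weeks  <- 1:4
shares <- studio_share(weeks)

# Illustrative weekly grosses in millions of dollars (made-up values)
weekly_gross <- c(50, 25, 12, 6)

# Studio revenue per week is the gross multiplied by that week's share
studio_take <- weekly_gross * shares

print(data.frame(week = weeks, share = shares,
                 gross = weekly_gross, take = studio_take))
```

Under these assumptions, most of the studio's revenue comes from the first week, which is consistent with the incentive to maximize opening gross.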
Release date is also extremely important to studios today. Former 20th Century Fox studio head Tom Rothman claims that “it’s just as crucial to pick the right release dates as it is to select the right script and hire the right stars and filmmakers.” Factors such as school schedules and holidays may affect whether people have the time or desire to watch a movie in theaters.
Finally, many people accept the idea of the “big-six” film studios: Disney, 20th Century Fox, Warner Bros, NBCUniversal, Sony Pictures, and Paramount Pictures. Their brand recognition gives these studios an inherent competitive advantage over other studios. In this exploratory analysis, I seek to determine whether opening gross, release date, and distributor name actually play a significant role in box office returns.
METHODOLOGY
To collect the data, I copied box office figures for thousands of films from the film database on boxofficeguru.com and pasted them into an Excel spreadsheet. I then cleaned the data in three steps. First, I marked missing values of the “TotalGross” variable as NA. Second, I set films shown in fewer than 500 theaters to NA, because films with smaller audiences may confound the relationship between total gross and the other variables. Third, I set outliers in total gross (>400 million) and opening gross (>100 million) to NA because such films are rare. I used the complete.cases() function to remove all rows with NAs. I also engineered a new variable called “DistributorMain” that labels every distributor other than the big six, Lionsgate, MGM, and New Line as “Independent.” In the end, there were 2867 observations of five variables: Title, Opening, TotalGross, OpeningGross, and DistributorMain. I then used the R programming language to organize and visualize the data. The R code used to clean the data is shown below.
# Cleaning and segmenting the distributor data:
# any distributor not matched below keeps the default label "Independent"
filmdata$DistributorMain <- "Independent"
filmdata$DistributorMain[filmdata$Distributor == "Universal"] <- "Universal"
filmdata$DistributorMain[filmdata$Distributor == "Sony"] <- "Sony"
filmdata$DistributorMain[filmdata$Distributor == "Fox"] <- "Fox"
filmdata$DistributorMain[filmdata$Distributor == "Paramount"] <- "Paramount"
filmdata$DistributorMain[filmdata$Distributor == "Warner Bros."] <- "Warner Bros."
filmdata$DistributorMain[filmdata$Distributor == "Buena Vista"] <- "Disney"
filmdata$DistributorMain[filmdata$Distributor == "Lions Gate"] <- "LionsGate"
filmdata$DistributorMain[filmdata$Distributor == "MGM"] <- "MGM"
filmdata$DistributorMain[filmdata$Distributor == "New Line"] <- "NewLine"
# Cleaning data on total and opening gross:
# strip thousands separators, convert to numeric, and rescale to millions of dollars
filmdata$TotalGross <- as.numeric(gsub(",", "", filmdata$TotalGross))
filmdata$OpeningGross <- as.numeric(gsub(",", "", filmdata$OpeningGross))
filmdata$TotalGross <- filmdata$TotalGross / 1000000
filmdata$OpeningGross <- filmdata$OpeningGross / 1000000
# Removing flawed values: films shown in fewer than 500 theaters
# and films with no total gross
filmdata$NumOfThtr[filmdata$NumOfThtr < 500] <- NA
# as.numeric() above already turned blank entries into NA, so this line is only a safeguard
filmdata$TotalGross[filmdata$TotalGross == ""] <- NA
filmdata <- filmdata[complete.cases(filmdata$TotalGross), ]
filmdata <- filmdata[complete.cases(filmdata$NumOfThtr), ]
# Cleaning opening dates: keep only the month number
filmdata$Opening <- as.character(filmdata$Opening)
# Take the first two characters, then drop the slash left behind by single-digit months
filmdata$Opening <- substring(filmdata$Opening, 1, 2)
filmdata$Opening <- as.numeric(gsub("/", "", filmdata$Opening))
# Removing outliers (films with > $400 million in total gross or > $100 million in opening gross)
filmdata$TotalGross[filmdata$TotalGross > 400] <- NA
filmdata <- filmdata[complete.cases(filmdata$TotalGross), ]
filmdata$OpeningGross[filmdata$OpeningGross > 100] <- NA
filmdata <- filmdata[complete.cases(filmdata$OpeningGross), ]
# Keep only the five columns used in the analysis (selected by position)
filmdata <- filmdata[ , c(1, 2, 4, 5, 10)]
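The opening-date step is the least obvious part of the cleaning code, so it may help to try it on a few sample strings. The sketch below assumes the raw dates are stored in a month-first "M/D/YY" format; the sample values and the variable names `opening_raw`, `month_chr`, and `opening_month` are hypothetical.

```r
# Hypothetical opening dates; the "M/D/YY" format is an assumption about the raw data
opening_raw <- c("5/24/96", "11/3/00", "12/19/97", "7/4/03")

# The first two characters hold either a two-digit month ("11")
# or a one-digit month followed by a slash ("5/")
month_chr <- substring(opening_raw, 1, 2)

# Removing the slash leaves only the month number
opening_month <- as.numeric(gsub("/", "", month_chr))

print(opening_month)
```

This approach relies on the month appearing first in the date string; a day-first format would need a different extraction.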
DATA
RESULTS
The results strongly support the notion that opening gross is a good indicator of total gross. The correlation is 0.89, which implies a strong positive linear relationship between opening gross and total gross. Even as opening gross increases, the variability in total gross widens only slightly, and there are few notable outliers.
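For readers who want to reproduce this kind of check, a correlation and a simple linear fit can be computed as sketched below. The data frame `toy` is a made-up stand-in for the cleaned data, so its output will not match the 0.89 reported above; only the column names follow the cleaned dataset.

```r
# Toy stand-in for the cleaned data (values in millions of dollars, made up)
toy <- data.frame(
  OpeningGross = c(5, 12, 20, 33, 47, 60),
  TotalGross   = c(18, 40, 55, 110, 130, 190)
)

# Pearson correlation between opening gross and total gross
r <- cor(toy$OpeningGross, toy$TotalGross)

# Simple linear fit showing how total gross scales with opening gross
fit <- lm(TotalGross ~ OpeningGross, data = toy)

print(r)
print(coef(fit))
```

On the real data, the same calls would be applied to filmdata$OpeningGross and filmdata$TotalGross, and plot() on those two columns would show the scatter behind the correlation.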
The month of release also appears to matter. In the graph of total gross grouped by opening month, May, June, July, November, and December are the months whose films have the best box office returns. This makes sense because May through July falls during summer break for many moviegoers and November through December is the holiday season. Not only is the center higher during those months, but the spread is also wider. Box office hits released in these prime months earn considerably more than hits released in other months.
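A comparison of center and spread by month can be sketched as follows. The values in `toy` are made up for illustration and cover only four months; the column names Opening and TotalGross follow the cleaned dataset.

```r
# Toy data: opening month (1-12) and total gross in millions of dollars (made up)
toy <- data.frame(
  Opening    = c(1, 1, 5, 5, 7, 7, 12, 12),
  TotalGross = c(20, 35, 60, 150, 80, 210, 55, 120)
)

# Center (median) and spread (interquartile range) of total gross per opening month
month_median <- tapply(toy$TotalGross, toy$Opening, median)
month_iqr    <- tapply(toy$TotalGross, toy$Opening, IQR)

print(rbind(median = month_median, IQR = month_iqr))

# boxplot(TotalGross ~ Opening, data = toy) would draw the grouped boxplots
```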
Finally, although there is some difference between the boxplots of the big-six film studios (Disney, Warner Bros, Universal, Paramount, Sony, and Fox) and those of the other distributors, smaller and independent film studios generally perform decently at the box office in comparison. However, the smaller studios have fewer box office hits, and the spread for the big-six studios is much wider than the spread for smaller and independent studios like Lionsgate or MGM. This offers some evidence that the brand recognition of the big-six studios still carries clout in Hollywood.
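One way to summarize the studio comparison is to collapse the distributor labels into two groups and compare their medians and best performers. The sketch below uses made-up values; the `Group` column and the `big_six` vector are hypothetical helpers, not part of the original cleaning code.

```r
# Labels used for the big-six studios in the DistributorMain column
big_six <- c("Disney", "Fox", "Warner Bros.", "Universal", "Sony", "Paramount")

# Toy data: distributor label and total gross in millions of dollars (made up)
toy <- data.frame(
  DistributorMain = c("Disney", "Fox", "Sony", "Universal",
                      "LionsGate", "MGM", "Independent", "Independent"),
  TotalGross      = c(150, 90, 60, 200, 40, 55, 25, 70)
)

# Collapse the distributors into two groups (hypothetical helper column)
toy$Group <- ifelse(toy$DistributorMain %in% big_six, "Big six", "Other")

# Typical film (median) and biggest hit (max) in each group
group_median <- tapply(toy$TotalGross, toy$Group, median)
group_max    <- tapply(toy$TotalGross, toy$Group, max)

print(rbind(median = group_median, max = group_max))
```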