Darla Moore School of Business

Students get exposure to real-life data problems in class with Kaggle

May 11, 2017

Looking around the class, you wouldn’t know these students were presenting final results for a statistics project that has no right answer. Normally, college sophomores might find such a task daunting, but this group has taken a different type of course. As a result, they are calm, inquisitive and data savvy in a way that the faculty at the Darla Moore School of Business hope will be a growing trend.

Over the past year, the Moore School has rolled out a number of changes as part of its Undergraduate Excellence Initiative. Central to that effort is increasing rigor in the classroom, especially in statistics and data training. This particular class, a pilot section of the new Statistics for Business and Economics class (MGSC 291), might soon serve as the foundation for that training. This semester it explored more abstract, real-world problems using cases, projects and a data competition platform called Kaggle.

Launched in 2010, Kaggle serves as a platform for predictive modeling and analytics competitions. Companies post real-world data, and students and other data analysts from all over the world compete to produce the best models. The crowdsourcing approach to solving business analytics problems reflects the fact that there are many strategies that can be applied to any predictive modeling task.

Professor Joel Wooten, who teaches the class, has done research with Kaggle contests before and thought it would be a good way to integrate that real-world aspect into the classroom environment.

“Using these types of problems makes the class more ambiguous, and students have been given less direction,” he said, “but this mimics what they will see in their internships and jobs, so we’re giving them a better toolkit for tackling such tasks.”

For their final project, students were challenged to predict the selling prices of 1,500 houses. For these houses, however, they never actually saw the sales prices. Instead, they were given data on 79 variables for the homes, from square footage to proximity to a railroad line, as well as a second set of data (a training data set) on 1,400 different houses so they could explore how house characteristics influence price. They then submitted their predictions to Kaggle to see where they placed on the leaderboard. After observing their scores (given as root mean squared error, a measure of how far off their prices were from reality), teams went back to the data to improve their methodology, submit again and try to claim the top spot on the leaderboard.
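To make the scoring concrete, here is a minimal Python sketch of how a root mean squared error could be computed for a handful of price predictions. The function name `rmse` and the three sample prices are hypothetical, invented only for illustration; they are not taken from the class or from the competition data, and Kaggle's exact scoring details may differ.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of prices."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    # Square each prediction error, average the squares, then take the square root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sale prices and predictions (illustrative values only).
actual_prices = [200_000, 150_000, 320_000]
predicted_prices = [210_000, 140_000, 320_000]

score = rmse(predicted_prices, actual_prices)
print(f"RMSE: {score:,.2f}")
```

A lower score means the predictions sit closer to the true prices, and a perfect set of predictions would score zero. Because the errors are squared before averaging, one badly missed house hurts the score more than several small misses.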

“The project was so open-ended that we could solve it any way we saw fit,” said Abby Barker, a student in the class. “Before taking this class, my statistics courses always presented me with problems that I solved by more or less plugging some information into an equation. I think the lack of structure pushed my group to try new ideas. We were free to try all sorts of regressions and were even encouraged to pursue advanced tactics that were not taught in class.”

Several groups taught themselves Python or R, two programming languages well suited to data analysis, in order to implement some cutting-edge techniques from machine learning. Barker’s partner for the project, Kristin Shipley, explained, “Everyone in the class used very different approaches, and we were able to learn about those during weekly project debriefs. I learned about many ways to create statistical predictors, including simple Random Forest algorithms and Extreme Gradient Boosting models.”

So far, it seems the pilot has been a success. Although students admitted to being intimidated at first, by the end of the class they seemed more confident about their statistics skills overall.

“I believe that projects similar to the Kaggle project should really be what is guiding our education,” Shipley said. “Rather than learning and regurgitating material, we were able to apply what we learned in class and understand how it can affect statistical analysis in the real world.”

Barker agreed, saying, “As such an open-ended problem, I think this project is a good stepping stone into the business world. Future classes will benefit from such an experience as they make their way through the business school and prepare for their careers.”

As the world becomes more inundated with data, these students are showing that the business school’s focus seems spot on and that asking the right questions is actually the right answer.

By Madeleine Vath