Tackling a high-impact predictive modeling project can be intimidating, particularly for chief information officers and other upper administrators without extensive data science backgrounds. These projects are daunting at times, but they have the potential to be incredibly enlightening and empowering. The difference is in the approach, as I know from more than a decade of experience completing data projects with colleges and universities.
Here's what I've learned: those who lean in and partner with data scientists throughout the model development process can learn a lot more about their institutions than those who focus solely on final results. So how do you make a predictive modeling project transformative rather than score-centric and transactional? Embrace the journey.
Where Are You Headed?
First, you need to have a destination for your journey. The destination here refers to the questions your institution wants to answer with data, as well as potential actions that can be taken once the answers to those questions are known. Let's say you're aiming to predict freshman retention. You might formulate your destination as follows: "Which of our first-time, full-time, degree-seeking freshmen are at the highest risk of dropping or transferring out before their third term? What traits and activities are associated with students leaving the institution, and what potential interventions can help us retain more of our students?"
You might determine that for a retention model, you'll need academic, financial, and social data points for each student, such as high school and first-term college GPAs, number of credits attempted and earned, distance of campus from the student's home, unmet financial need, and institutional scholarship offers. This data is usually stored in a variety of institutional source systems, such as those used by your admissions, enrollment, and financial aid offices.
How Will You Get There?
Once you know where you're headed, the types of data you need to get there, and where that data is stored, you can get "road ready." This includes defining the rules of the road by getting your data governance principles and practices in place and packing for the trip by merging the data from multiple source systems into one summary record per student. You'll need a team that includes subject-matter experts from each area to help gather, document, and validate the data; data engineers to prepare the analytical data set and create a pipeline to automatically refresh it as the underlying data changes; and data scientists to conduct the analysis and build the predictive model.
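To make "one summary record per student" concrete, here is a minimal sketch in Python. The three source extracts, the field names, and the helper functions are all hypothetical, and a real pipeline would read from your actual admissions, enrollment, and financial aid systems rather than from hard-coded lists.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems, keyed by student ID.
admissions = [
    {"student_id": "S001", "hs_gpa": 3.4},
    {"student_id": "S002", "hs_gpa": 2.9},
]
enrollment = [
    {"student_id": "S001", "credits_attempted": 15, "credits_earned": 15},
    {"student_id": "S002", "credits_attempted": 15, "credits_earned": 9},
]
financial_aid = [
    {"student_id": "S001", "unmet_need": 1200},
    # S002 has no financial aid row; the check below makes that gap visible.
]


def build_summary_records(*sources):
    """Combine rows from several sources into one dict per student_id."""
    records = defaultdict(dict)
    for source in sources:
        for row in source:
            records[row["student_id"]].update(row)
    return dict(records)


def missing_fields(records, expected):
    """Return, per student, the expected fields that no source supplied."""
    gaps = {}
    for student_id, record in records.items():
        absent = sorted(set(expected) - set(record))
        if absent:
            gaps[student_id] = absent
    return gaps


summary = build_summary_records(admissions, enrollment, financial_aid)
gaps = missing_fields(
    summary,
    ["hs_gpa", "credits_attempted", "credits_earned", "unmet_need"],
)
```

The gap report is the kind of output your subject-matter experts can help interpret: a missing financial aid row might mean the student never applied for aid, or it might point to a data problem.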
To determine how to best assemble this team, you'll need to decide what kind of journey to take. There's the do-it-yourself (DIY) option, in which you handle all aspects of your trip; the guided tour with an experienced partner familiar with navigating the routes to your target destination; or the prepackaged trip where a travel agent handles everything for you. Which is the best fit will depend on your institution's unique needs, goals, and resources.
If you choose DIY for your predictive modeling journey, you'll be very hands-on. You'll hire the staff you need for a robust analytics program, ranging from engineers to data scientists. You will map out all of the analytics program's activities and tasks and assign each one to the most appropriate staff member. Your internal team will lead and manage all efforts related to your program. You'll also need to build or buy a vehicle—your data analytics platform. It will be a significant investment of time and money, but once you create your predictive modeling capability and unleash your data scientists on it, you will have your own in-house team and infrastructure, which is a great benefit to institutions able to afford and sustain such a model.
If the DIY approach is cost- or time-prohibitive for your institution, consider a guided tour or a prepackaged trip. With a tour, you will select a guide to customize a travel plan, help you along every step of your journey, offer suggestions for routes to take, and troubleshoot any difficulties that arise. This option bolsters your in-house team by adding the capacity and capabilities of additional data scientists and other experts who will work at your side. And because it is a partnership, you'll explore your data together, gain insight into how the model works, and maintain some flexibility if you want to make adjustments. The vehicle for the journey will vary depending on the partner (some offer a data analytics platform as part of their services, while others do not), but your guide will most likely show you how to get the most out of the vehicle so you can hit the road in the future with or without involvement from the guide.
For prepackaged trips, you'll share your destination and data for inclusion in a prebuilt algorithm that cannot be customized or altered. You'll get to your destination (a predictive score), but you won't necessarily know how you got there or what you missed along the way because you're removed from the process. However, outsourced journeys allow in-house staff to step back after they do the prep work and focus on other projects, which is what often attracts institutions to this option.
On the Road
When I was a data analyst at a university, we undertook a project using the prepackaged approach. We prepared and sent our summarized data set and got a predictive score in return, but because of our distance from the process, no one knew what that score really meant, what had gone into its determination, or how customized the predictive model was to our institution. This limited its usefulness for us at the time, and because the data set we sent was static, we couldn't generate scores for future cohorts without refreshing the data set ourselves and resubmitting it to the vendor. You'll want to weigh these additional factors as you consider this approach.
If you select the DIY or guided tour option, once you are prepped and ready to set off on your metaphorical road trip, you will be able to conduct what is called exploratory data analysis (EDA). Just as you can assess many routes to your destination and read reviews about potential stops along the way, data scientists explore many data combinations and evaluate multiple models to navigate you to your intended outcome: in this case, predicting the retention likelihood for each student. By exploring the patterns for past students with known outcomes, data scientists can identify key factors that correlate with retention and include those as inputs into the model.
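One of the simplest forms of EDA is comparing retention rates across groups of past students. The sketch below uses made-up records and arbitrary GPA bands chosen only for illustration. It is one small example of the kind of pattern-hunting described above, not a complete analysis.

```python
from collections import defaultdict

# Hypothetical past students with known outcomes.
past_students = [
    {"first_term_gpa": 3.6, "retained": True},
    {"first_term_gpa": 3.1, "retained": True},
    {"first_term_gpa": 2.4, "retained": True},
    {"first_term_gpa": 2.2, "retained": False},
    {"first_term_gpa": 1.7, "retained": False},
    {"first_term_gpa": 1.9, "retained": True},
]


def gpa_band(gpa):
    """Assign a GPA to a band; the cutoffs are illustrative assumptions."""
    if gpa >= 3.0:
        return "3.0 and above"
    if gpa >= 2.0:
        return "2.0 to 2.99"
    return "below 2.0"


def retention_rate_by(students, key_fn):
    """Return the share of retained students within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [retained, total]
    for student in students:
        group = key_fn(student)
        counts[group][0] += int(student["retained"])
        counts[group][1] += 1
    return {group: kept / total for group, (kept, total) in counts.items()}


rates = retention_rate_by(
    past_students, lambda s: gpa_band(s["first_term_gpa"])
)
for band, rate in sorted(rates.items()):
    print(f"{band}: {rate:.0%} retained")
```

With real data, the same grouping could be repeated for unmet need, credits earned, or any other field your team suspects is related to retention.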
Predictive modeling isn't magic, but in the same way that a navigation app can assess multiple routes to get you to your destination (including those on roads you never knew existed), a predictive model can simultaneously analyze many more fields in your data set than humans ever could. Even the best data analyst can only consider so many variables at a time, along with their relationships to the question you're trying to answer, but predictive modeling algorithms can pick up on patterns across dozens of variables to make a prediction, such as each freshman's likelihood of returning in the fall.
Identifying and interpreting these patterns will help you make institutional decisions on what policy changes might have the greatest impact and on which students can gain the most from particular interventions. For example, you will be able to say, "For this particular student, with each of his or her individual data points, here's a score indicating how likely the student is to be retained and the factors that led to that score, based on patterns identified in past students' data."
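Here is a rough sketch of what "a score and the factors that led to that score" can look like. It assumes a logistic-regression-style model, which is only one of many possible approaches, and the intercept and weights are invented for illustration. In a real project they would be learned from past students' data, and your data scientists may well choose a different algorithm.

```python
import math

# Illustrative values only; real weights are estimated from historical data.
INTERCEPT = -1.0
WEIGHTS = {
    "first_term_gpa": 0.9,
    "credits_earned_ratio": 1.5,
    "unmet_need_thousands": -0.2,
}


def score_student(features):
    """Return a retention probability and each factor's contribution.

    Contributions are on the log-odds scale (weight times value),
    sorted from largest to smallest absolute effect.
    """
    contributions = {
        name: WEIGHTS[name] * features[name] for name in WEIGHTS
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return probability, ranked


# A hypothetical student.
student = {
    "first_term_gpa": 2.0,
    "credits_earned_ratio": 0.6,
    "unmet_need_thousands": 5.0,
}
probability, factors = score_student(student)
print(f"Estimated retention likelihood: {probability:.0%}")
for name, contribution in factors:
    print(f"  {name}: {contribution:+.2f}")
```

A negative contribution, such as the one for unmet need in this made-up example, flags a factor pulling the score down, which is where a conversation about interventions can start.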
By being involved in the development of the model, you and your subject-matter experts can gain additional insights into those patterns at your institution. Here's an example: depending on your school's geography, knowing how far a student lives from campus might matter more for predicting retention than which state a student is from. If an institution near a state line only includes an indicator of whether a student lives in state or out of state, the model might miss out on patterns such as the difference in retention rates for out-of-state students who live just across the state line but close to campus and in-state students whose home is all the way across the state, much farther away. These connections can be made as a result of being close to the data exploration and model development.
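The distance example can be sketched with the haversine formula, a standard way to approximate straight-line distance between two latitude and longitude points. The campus and home coordinates below are hypothetical, and straight-line distance is itself a simplification, since driving distance or travel time may matter more at your institution.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in miles between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical campus near a state line, and two hypothetical students.
campus = (39.0, -94.6)
students = [
    {"id": "A", "in_state": False, "home": (39.0, -94.7)},
    {"id": "B", "in_state": True, "home": (39.0, -90.0)},
]

for student in students:
    student["miles_from_campus"] = haversine_miles(*student["home"], *campus)
    residency = "in state" if student["in_state"] else "out of state"
    print(
        f"Student {student['id']} ({residency}): "
        f"{student['miles_from_campus']:.0f} miles from campus"
    )
```

In this made-up case the out-of-state student lives only a few miles away while the in-state student lives a few hundred miles away, which is exactly the pattern an in-state indicator alone would hide.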
Embracing the Journey
Institutions would do well to remember what goes on in their classrooms. The greatest learning takes place when students are understanding concepts, laying bare biases, and interrogating assumptions en route to a new, deeper understanding of what's at stake. The value isn't in the grade but in the intellectual growth gained in the process of achieving that grade. Predictive modeling projects work the same way. Ultimately, it isn't the destination but the journey that offers the richest experience. Whether you take the DIY, guided tour, or prepackaged option, the goal is to understand your own community and institution and their priorities as well as you can so that you can make informed decisions. The work to arrive at that understanding is at least as impactful as any predictive score.
To extract the greatest value from your predictive modeling program, embrace each step along the way and get as close to the data science as your resources—your staff and your budget—allow. This is work that your data scientists want to do and are excited to take on if they're afforded the opportunity.
To view the original article, please refer to the following link: https://er.educause.edu/blogs/2020/4/more-than-a-score-extracting-the-greatest-value-from-predictive-modeling