If you’ve ever seen the movie Moneyball, you know that mathematics and statistics have not always been part of professional baseball. Sure, they were used to calculate batting averages and earned-run averages (ERA), but they weren’t used to build entire baseball teams. That changed when the Oakland Athletics began using empirical analysis to shape their roster in the late 1990s.

What followed revolutionized the way professional baseball rosters are constructed. Rather than rely completely on the subjective assessments of their scouts, Oakland’s General Manager Sandy Alderson and Assistant GM Billy Beane turned to the empirically based principles of “sabermetrics” to analyze and update their roster. This method allowed the team to choose players who, while underwhelming to scouts, had the “right” stats, such as on-base percentage (OBP), that the team needed.

While this didn’t make the A’s unbeatable, it did help them create a team that secured a (spoiler!) franchise-record twenty consecutive wins in 2002. Following the general success of the A’s sabermetrics-centric model, teams across MLB have adopted a largely data-driven approach to building and maintaining their rosters, sometimes choosing players based more on their data than on their apparent size and skill.

Creating the Optimal MLB Team

While uncommon for their era, Alderson and Beane’s data-driven methodology would find itself right at home in the modern day. In just over two decades, virtually every Major League Baseball team has incorporated some form of data analytics into its scouting and roster construction.

Consider the Boston Red Sox, whose Chief Baseball Officer Craig Breslow has a reputation as a data-driven leader. Upon taking the role, Breslow reoriented the team’s scouting department to align with, and rely more heavily on, the analytics department for player consideration and selection.

Other teams, like the Los Angeles Dodgers and Tampa Bay Rays, have used analytics-driven strategies to reinvigorate their roster with “undervalued” players, à la Moneyball. The Houston Astros relied heavily on their analytics infrastructure to build, develop, and train the roster that won their first ever World Series title in 2017 (and another in 2022). The New York Yankees use analytics for everything from developing game-specific strategies to designing a high-performing “torpedo bat.”

Regardless of the team, players, salaries, OBP, ERA, or other factors involved, each of these analytics-forward approaches ultimately has one thing in common: it brings these teams closer to optimization.

How Does Optimization Work?

On its face, optimization is the process of determining the best possible solution from a set of possible options. You make an optimization decision any time you pick which of your shirts is best suited for the day’s weather, which breakfast items will hold you over until lunch, or which route to work is likely to have the least traffic.

Optimization for complex decisions is a somewhat more technical process that combines logic and math. As a data science strategist at Gurobi Optimization, I put this math-driven approach at the core of my work. Modern solvers rely on mathematical optimization, an analytical approach that takes complex, multifaceted problems and uses algorithms to find the best possible answer. They do so by working from three main components defined specifically for each mathematical optimization problem:

  1. Objective Function: The end goal(s) you intend to achieve.
  2. Decision Variables: The items you’re able to control and change.
  3. Constraints: The rules and limitations you must follow.
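To make these three components concrete, here is a minimal sketch in Python. The numbers are invented purely for illustration, and the brute-force search over a tiny grid stands in for what a real solver does far more efficiently; it is not how a solver such as the Gurobi Optimizer works internally.

```python
from itertools import product

# Decision variables: whole-number quantities x and y that we control.
# The ranges below are illustrative bounds chosen for this toy example.
candidates = product(range(0, 4), range(0, 5))  # x in 0..3, y in 0..4


def objective(x, y):
    """Objective function: the quantity we want to make as large as possible."""
    return 3 * x + 2 * y


def feasible(x, y):
    """Constraints: rules that every acceptable solution must satisfy."""
    return x + y <= 4


# Keep only the candidates that obey the constraints, then pick the one
# with the highest objective value.
best = max(
    (pair for pair in candidates if feasible(*pair)),
    key=lambda pair: objective(*pair),
)
best_value = objective(*best)
print(best, best_value)
```

In this toy problem the search settles on x = 3 and y = 1, for an objective value of 11. Changing the objective or tightening a constraint changes the answer, which is the whole point of separating the three pieces.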

Optimization solvers, such as the Gurobi Optimizer, take these components, which together model your problem, and apply a series of algorithms to output unbiased and mathematically optimal solutions. The specific problem types involved are (pun intended) variable. The most common are Linear Programming (LP), Mixed-Integer Linear Programming (MILP), and Quadratic Programming (QP), among others. Each of these allows problems to be modeled in a different way, providing the flexibility to incorporate different mathematical and statistical functions into your optimization.
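One way to see why the problem type matters is to solve the same small budgeting problem twice: once allowing fractional choices (as an LP would) and once requiring all-or-nothing choices (as a MILP would). The sketch below uses made-up items and a made-up budget. It relies on two stand-in techniques rather than a real solver: for this single-budget problem, filling the budget in order of value per unit cost is known to be optimal in the fractional case, and exhaustive enumeration handles the all-or-nothing case. QP is not shown.

```python
from itertools import combinations

# Hypothetical items as (name, value, cost); the budget is also made up.
items = [("A", 60, 10), ("B", 100, 20), ("C", 120, 30)]
budget = 50


def best_fractional(items, budget):
    """LP-style version: each item may be taken in any fraction from 0 to 1."""
    total, remaining = 0.0, budget
    # Take items in order of value per unit of cost, highest first.
    for _, value, cost in sorted(items, key=lambda it: it[1] / it[2], reverse=True):
        fraction = min(1.0, remaining / cost)
        total += fraction * value
        remaining -= fraction * cost
        if remaining <= 0:
            break
    return total


def best_all_or_nothing(items, budget):
    """MILP-style version: each item is taken whole or not at all."""
    best = 0
    # Brute force over every subset; fine for three items, not for thousands.
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(cost for _, _, cost in subset) <= budget:
                best = max(best, sum(value for _, value, _ in subset))
    return best


lp_value = best_fractional(items, budget)
milp_value = best_all_or_nothing(items, budget)
print(lp_value, milp_value)
```

The fractional version reaches a value of about 240 by taking two thirds of item C, while the all-or-nothing version tops out at 220. Roster decisions are of the second kind, since you cannot sign two thirds of a player.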

This might sound a bit like Gen AI and machine learning (ML). While they certainly have their similarities—problem-solving capabilities, roots in algorithms and models—they’re not exactly the same. Gen AI and ML models assess past data and generate predictive answers based on this information. Optimization solutions are prescriptive in nature, providing the best course of action in regard to the given variables and constraints. Regardless, both optimization and AI/ML provide users with the means of enhancing their capacity to make more informed decisions. But what do they have to do with baseball?

Bringing Optimization to Fantasy Sports

It starts, interestingly enough, with fantasy sports. DraftKings offers a range of online Daily Fantasy Sports (DFS) competitions, one of which has players construct an MLB team roster and compete to score more points than their opponents. Like many other fantasy sports, these DFS competitions have spawned an entire cottage industry of information, analytics, and recommendations.

Dartmouth MBA student Adam Scharf saw this as an opportunity to use optimization to level the playing field of fantasy sports—especially for more casual fans who don’t live their lives in fantasy competitions. He developed Smart Roster, a tool that combines predictive analytics, natural language prompts, and optimization modeling to help players approach lineup building more strategically. Instead of manually selecting players, Smart Roster encodes roster construction as an optimization problem and uses a solver—in Scharf’s case, the Gurobi Optimizer—to evaluate roster variables and generate more favorable lineups for a given DFS matchup.

It does so by operating with the following components:

  1. Objective Function: Build a team that scores the most points.
  2. Decision Variables: Which players you choose to add to your roster.
  3. Constraints: Player position limits and salary cap.

Users can also add their own constraints, optimizing for players on preferred teams, players with low rostered percentages, and more. This flexibility allows the user to generate personalized rosters that are still grounded in statistical projections and remain mathematically optimal within the rules the user sets. According to Scharf, the goal of Smart Roster is to engage the casual DFS player. By making it easier to create personalized and optimized lineups, he believes that his solution will make the DFS experience more competitive and exciting.
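A user-defined constraint of this kind can be pictured as one more rule that every candidate lineup must pass. The sketch below is an assumption-laden illustration, not Smart Roster’s implementation: the pool, team labels, salary cap, and the helper names `optimize` and `at_least_one_from` are all hypothetical, and brute-force enumeration again stands in for a solver.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class Player(NamedTuple):
    name: str
    position: str
    team: str
    salary: int
    projected: float  # projected fantasy points (made-up values)


# Hypothetical pool; teams and figures are illustrative only.
POOL = [
    Player("Pitcher A", "P", "BOS", 9000, 22.0),
    Player("Pitcher B", "P", "LAD", 8000, 20.0),
    Player("Outfielder A", "OF", "NYY", 6000, 12.0),
    Player("Outfielder B", "OF", "BOS", 5000, 10.0),
    Player("Outfielder C", "OF", "LAD", 4000, 9.0),
]
SALARY_CAP = 20000
SLOTS = {"P": 1, "OF": 2}


def optimize(pool, extra_rules=()):
    """Return the highest-scoring lineup that satisfies the built-in
    constraints plus any user-supplied rules, or None if none exists."""
    def valid(lineup):
        return (
            Counter(p.position for p in lineup) == Counter(SLOTS)
            and sum(p.salary for p in lineup) <= SALARY_CAP
            and all(rule(lineup) for rule in extra_rules)
        )

    size = sum(SLOTS.values())
    candidates = (l for l in combinations(pool, size) if valid(l))
    return max(candidates, key=lambda l: sum(p.projected for p in l), default=None)


def at_least_one_from(team):
    """Build a user rule: the lineup must include a player from `team`."""
    return lambda lineup: any(p.team == team for p in lineup)


default_lineup = optimize(POOL)
custom_lineup = optimize(POOL, extra_rules=[at_least_one_from("LAD")])

print([p.name for p in default_lineup])
print([p.name for p in custom_lineup])
```

Here the unconstrained optimum contains no LAD players, so adding the preference swaps Outfielder B for Outfielder C and costs one projected point (43 instead of 44). That trade-off is typical: every extra constraint can only keep the optimal score the same or lower it, but the result is still the best lineup that honors the user’s preference.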

The Future of Optimized Sports

What might this mean for professional baseball? There will always be detractors arguing against the inclusion of analytics, those who think it’s making the game too technical and less entertaining. But analytics and optimization, while data-driven, will never render baseball entirely predictable.

Rather, optimization offers teams another competitive advantage. As in the example set by Smart Roster and DFS, it can help teams make their roster and strategy more compelling and exciting to watch. This doesn’t need to be restricted to OBP, either: teams can be optimized around player batting averages, runs batted in (RBIs), wins above replacement (WAR), any combination of these factors, and more. There’s still plenty of room for scouts and team executives to define what kind of team they want to build and optimize their roster accordingly.

To quote another beloved baseball film, “If you build it, they will come.” Field of Dreams has a point: if you build an optimized, exciting, and competitive team, the fans—and even foes—will come. And at the end of the day, their increased investment and attention will be a win for the teams, the league, and the game of baseball.