Throughout this past summer, one of the 10 most-viewed items on Miller-McCune.com has been a short story posted on April 1st titled "Baseball's Best Teams Are...." It contained mathematician Bruce Bukiet's predictions for the 2008 Major League Baseball season.
In 2000, the math professor at the New Jersey Institute of Technology created and copyrighted a mathematical model to compute the probability of any one team defeating any other team. He notes on his Web site that his system "beat the odds" for five of the past seven years.
So how did his system perform for 2008? Well, it was closer to an infield single than a grand slam.
"This year I did worse than usual," said Bukiet, whose passion for numbers and baseball is evident as he describes what happened. "But I went back and looked at how many games Sports Illustrated was off, and it was similar to mine. They're the experts, and I figure if I come close to the experts, it's not so bad."
Of the six divisions (three in each league), Bukiet predicted only one correctly: the American League West. His models showed the Angels running away with the division, which they did. Indeed, they did even better than he predicted, winning 100 games instead of the anticipated 92.
In the AL East, Bukiet predicted a tie between the Red Sox and the Yankees, who would have 98 wins apiece. In fact, the Tampa Bay Rays — who, in fairness, surprised almost everyone — took the division with 97 wins. Boston came in second with 95 wins and became the wild-card team, but the Yankees won only 89 games, coming in a very disappointing third and missing the playoffs.
Bukiet's projections were way off in the AL Central, which he predicted the Tigers would easily win. Instead, the Detroit team came in last, with 74 victories rather than the projected 96. The White Sox, which he predicted would come in third, ended up on top, with 89 wins (10 more than projected).
In the National League, Bukiet predicted the Mets winning the East (it was the Phillies) and the Rockies the West (the Dodgers ended up on top of that mediocre division). In the Central Division, Bukiet saw a close race. In fact, the Cubs made it look easy, winning 97 games (he predicted 83). The Milwaukee Brewers, his projected winners, ended up in second place and became the wild-card team.
Still, he did come out ahead on a more important statistic.
"During the season, each day, based on the betting payoffs you can find online, I put on my Web site whether or not you should bet on a team in a given game," he said. "I'm up this year by a tiny bit.
"During the season, there were just under 400 games I thought you should wager on, out of 2,400 — roughly one out of every six games. It looks like this'll be the sixth out of eight years that you will have made money if you followed my recommendations. You wouldn't have made a lot, but it's a neat way to show that, using math, you can beat the bookies' spread." (Ed. — Not that we're encouraging illegal wagering, mind you. It's all about solutions around here.)
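Bukiet doesn't spell out his betting rule. One common way to turn a model's win probability and a posted payoff into a yes-or-no call is to bet only when the expected profit is positive; that rule is assumed here and is not necessarily his. The sketch uses American money lines (such as +150 or -150), the usual format for baseball odds, and the function names are hypothetical:

```python
def profit_per_unit(moneyline):
    """Profit on a winning 1-unit bet at an American money line (assumed format)."""
    if -100 < moneyline < 100:
        raise ValueError("American money lines are <= -100 or >= +100")
    if moneyline > 0:
        return moneyline / 100      # +150: win 1.50 for every 1 staked
    return 100 / -moneyline         # -150: win about 0.67 for every 1 staked

def expected_value(p_win, moneyline):
    """Expected profit per unit staked, given the model's win probability."""
    return p_win * profit_per_unit(moneyline) - (1 - p_win)

def should_bet(p_win, moneyline):
    """Hypothetical rule: recommend a bet only when expected profit is positive."""
    return expected_value(p_win, moneyline) > 0

print(should_bet(0.45, 150))   # underdog the model likes more than the line does
print(should_bet(0.55, -150))  # favorite priced too steeply for the model
```

Under this rule most games get no recommendation, because the line usually sits close to the model's estimate; only the minority where the two disagree enough would qualify.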
Bukiet's model is a relatively simple one. It does not factor in such unpredictable occurrences as injuries or slumps or such unknown qualities as how much of an influence a new manager can have (which appears to be significant in the case of Joe Torre and the Dodgers). It also doesn't predict the batting averages of rookies, since he doesn't follow the minor leagues and has no past Major League record to go on for those players. (The surprising success of the Tampa Bay Rays is largely due to the impressive play of several rookies, including Evan Longoria.)
Here's how he describes his model:
"In baseball, almost every interaction in a game is a pitcher vs. a hitter. Yes, fielding matters; yes, base running matters. But it's pretty much those two guys (who determine what happens on a play).
"Another key aspect is that in baseball, there are only 25 different 'situations.' You can have zero, one or two outs and eight base-running combinations. So you can have no one out and a guy on first; no one out and men on first and second; etc. There are 24 of those situations, and three outs is the 25th.
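The count of 25 situations is easy to verify by brute force. This short Python sketch (the names are ours, not Bukiet's) lists every combination of outs and occupied bases:

```python
from itertools import product

# Each base is empty (0) or occupied (1), ordered (first, second, third).
BASES = list(product((0, 1), repeat=3))            # 8 base-running combinations

# Zero, one or two outs with each base combination: the 24 "live" situations.
LIVE_STATES = [(outs, bases) for outs in range(3) for bases in BASES]

# Three outs ends the inning and is the 25th situation.
INNING_OVER = "3 outs"
ALL_STATES = LIVE_STATES + [INNING_OVER]

print(len(BASES), len(LIVE_STATES), len(ALL_STATES))  # 8 24 25
```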
"That's much simpler than football, where you can be on 100 different yard lines. But complicating things is the fact that order matters. If you get a single and a home run, it's two runs; if you get a home run and then a single, it's one run. So any model has to take that into account.
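Why order matters can be shown with a toy Python calculation. It assumes a simplified rule in which every runner advances exactly as many bases as the batter (a single moves everyone up one base); `advance` and `runs_from` are hypothetical names:

```python
def advance(bases, hit):
    """Apply a hit worth `hit` bases (1 = single ... 4 = home run).

    Assumed rule: every runner, and the batter, moves exactly `hit` bases.
    `bases` is (first, second, third); returns (new_bases, runs_scored).
    """
    runners = [i + 1 for i, occupied in enumerate(bases) if occupied]
    runners.append(0)                     # the batter starts at home plate
    new_bases, runs = [0, 0, 0], 0
    for start in runners:
        end = start + hit
        if end >= 4:
            runs += 1                     # crossed home plate
        else:
            new_bases[end - 1] = 1
    return tuple(new_bases), runs

def runs_from(hits):
    """Total runs from a sequence of hits, starting with the bases empty."""
    bases, total = (0, 0, 0), 0
    for hit in hits:
        bases, runs = advance(bases, hit)
        total += runs
    return total

print(runs_from([1, 4]))  # single, then home run: 2
print(runs_from([4, 1]))  # home run, then single: 1
```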
"I consider six things a player can do. Batters almost always get an out, a walk, a single, double, triple or home run. There are errors (but they are rare enough that I don't factor them in). Based on his history, a given player has a 15 percent chance of getting a single, a 5 percent chance of getting a double, or whatever it is.
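A batter in this scheme reduces to six probabilities that sum to 1. In the Python sketch below, only the 15 percent single and 5 percent double rates come from Bukiet's example; the other four rates are invented so the profile adds up, and `check_profile` is a hypothetical helper:

```python
import math

OUTCOMES = ("out", "walk", "single", "double", "triple", "home_run")

# Hypothetical batter: only the 15% single and 5% double come from the article.
leadoff = {"out": 0.71, "walk": 0.05, "single": 0.15,
           "double": 0.05, "triple": 0.01, "home_run": 0.03}

def check_profile(profile):
    """A profile must cover exactly the six outcomes and sum to 1."""
    if set(profile) != set(OUTCOMES):
        raise ValueError("profile must list exactly the six outcomes")
    if not math.isclose(sum(profile.values()), 1.0):
        raise ValueError("outcome probabilities must sum to 1")

check_profile(leadoff)

# With the bases empty, a walk or a single leaves the batter on first.
p_runner_on_first = leadoff["walk"] + leadoff["single"]
print(round(p_runner_on_first, 2))  # 0.2
```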
"So if you put the guys in order, you can say, after the first player, there's a 20 percent chance there will be a guy on first. When the second guy comes up, whatever he does turns one of those 24 situations into another one. You cycle through the order until there's the probability you are out of the inning, or you have 27 outs and the game is over.
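The bookkeeping described here can be read as a Markov chain over the 25 situations, with a running tally of runs. A minimal Python sketch for one inning follows. It assumes simplified base running (on a hit every runner moves as many bases as the batter; on a walk only forced runners move; outs never advance anyone) and invented batting rates. A full-game version would also carry the lineup position from inning to inning until 27 outs; that part is left out. The names `transition` and `inning_run_distribution` are ours:

```python
from collections import defaultdict

HITS = {"single": 1, "double": 2, "triple": 3, "home_run": 4}

def transition(outs, bases, outcome):
    """Return (outs, bases, runs) after one plate appearance (assumed rules)."""
    if outcome == "out":
        return outs + 1, bases, 0               # outs never advance runners here
    first, second, third = bases
    if outcome == "walk":                       # only forced runners move
        scored = 1 if (first and second and third) else 0
        return outs, (1, int(first or second), int((first and second) or third)), scored
    hit, new_bases, scored = HITS[outcome], [0, 0, 0], 0
    for start in [i + 1 for i, on in enumerate(bases) if on] + [0]:  # runners, then batter
        end = start + hit
        if end >= 4:
            scored += 1
        else:
            new_bases[end - 1] = 1
    return outs, tuple(new_bases), scored

def inning_run_distribution(lineup, max_batters=40):
    """P(r runs) for one inning, starting with lineup[0], nobody on, nobody out."""
    live = {(0, (0, 0, 0), 0): 1.0}             # (outs, bases, runs so far) -> probability
    done = defaultdict(float)                   # runs in a finished inning -> probability
    for i in range(max_batters):
        profile = lineup[i % len(lineup)]       # cycle through the batting order
        nxt = defaultdict(float)
        for (outs, bases, runs), p in live.items():
            for outcome, q in profile.items():
                if q == 0:
                    continue
                outs2, bases2, scored = transition(outs, bases, outcome)
                if outs2 == 3:
                    done[runs + scored] += p * q    # the 25th situation: inning over
                else:
                    nxt[(outs2, bases2, runs + scored)] += p * q
        live = nxt
    return dict(done)   # probability still "live" after max_batters is negligible and dropped

# Hypothetical lineup of nine identical batters with invented rates.
batter = {"out": 0.71, "walk": 0.05, "single": 0.15,
          "double": 0.05, "triple": 0.01, "home_run": 0.03}
dist = inning_run_distribution([batter] * 9)
print(round(dist[0], 3))   # chance of a scoreless inning for this made-up lineup
```

Everything is additions and multiplications of probabilities, which matches Bukiet's point that no calculus is involved.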
"The math is all adding and multiplying and bookkeeping. There's no calculus. Nothing fancy."
Bukiet averages each batter's rates for those six outcomes over the past three years (or over one or two years for players relatively new to the majors). He doesn't calculate how well a given hitter does against a given pitcher; there isn't sufficient data for that.
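The article doesn't say exactly how the seasons are combined. One plausible approach, assumed here, is to pool the raw counts so that every plate appearance counts equally. The seasonal numbers and the name `pooled_profile` are hypothetical:

```python
OUTCOMES = ("out", "walk", "single", "double", "triple", "home_run")

def pooled_profile(seasons):
    """Turn up to the last three seasons of outcome counts into rates.

    Assumption: counts are pooled, so a full season outweighs a partial one.
    Players with only one or two seasons simply contribute fewer entries.
    """
    recent = seasons[-3:]
    totals = {o: sum(s.get(o, 0) for s in recent) for o in OUTCOMES}
    plate_appearances = sum(totals.values())
    if plate_appearances == 0:
        raise ValueError("no plate appearances to average")
    return {o: totals[o] / plate_appearances for o in OUTCOMES}

# Two hypothetical 100-plate-appearance seasons for a younger player.
season_1 = {"out": 70, "walk": 5, "single": 15, "double": 5, "triple": 1, "home_run": 4}
season_2 = {"out": 72, "walk": 5, "single": 15, "double": 5, "triple": 1, "home_run": 2}
profile = pooled_profile([season_1, season_2])
print(profile)
```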
"But for each pitcher, I have a number for how good they are at keeping people off the bases, compared to average. If the average pitcher lets on one out of every three batters, and a particular pitcher lets on 10 percent less, he'd have a pitcher number of 0.9. You multiply the batter's offense (by that number)."
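Bukiet's pitcher adjustment can be sketched in a few lines of Python. The sketch assumes that all five on-base outcomes are scaled by the same pitcher number and that the probability removed from them becomes outs; the batter's rates and the function names are hypothetical:

```python
def pitcher_number(on_base_allowed, league_average=1 / 3):
    """Ratio of a pitcher's on-base-allowed rate to the league average."""
    return on_base_allowed / league_average

def adjust_for_pitcher(profile, factor):
    """Scale each on-base outcome by the pitcher number; the remainder are outs.

    Assumption: walks, singles, doubles, triples and home runs all shrink
    (or grow) by the same factor.
    """
    adjusted = {o: p * factor for o, p in profile.items() if o != "out"}
    adjusted["out"] = 1.0 - sum(adjusted.values())
    if adjusted["out"] < 0:
        raise ValueError("pitcher number too large for this batter")
    return adjusted

# Hypothetical batter who reaches base 29 percent of the time.
batter = {"out": 0.71, "walk": 0.05, "single": 0.15,
          "double": 0.05, "triple": 0.01, "home_run": 0.03}
factor = pitcher_number(0.9 * (1 / 3))   # lets on 10 percent fewer than average
vs_ace = adjust_for_pitcher(batter, factor)
print(round(factor, 2), round(vs_ace["out"], 3))  # 0.9 0.739
```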
With all that in place, he crunches the numbers. "What comes out of it at the end is the probability that a given team will get a certain number of runs. When you compare that to the other team's probability of getting a certain number of runs, you end up with the probability of one team winning the game."
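Comparing the two run distributions comes down to a double sum over every pair of final scores. A minimal Python sketch, assuming the two teams' run totals are independent and, as a simplification Bukiet doesn't describe, splitting tied games (which would go to extra innings) evenly. The function name and toy distributions are hypothetical:

```python
def win_probability(runs_a, runs_b):
    """P(team A beats team B), given {runs: probability} for each team.

    Assumes independent run totals; ties are split 50/50 as a stand-in
    for extra innings.
    """
    p_win = p_tie = 0.0
    for a, pa in runs_a.items():
        for b, pb in runs_b.items():
            if a > b:
                p_win += pa * pb
            elif a == b:
                p_tie += pa * pb
    return p_win + 0.5 * p_tie

# Toy distributions: A scores 2 or 4 runs, B scores 3 or 4.
print(win_probability({2: 0.25, 4: 0.75}, {3: 0.5, 4: 0.5}))  # 0.5625
```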
So what went wrong this year?
"With the Yankees, half their starting pitchers got injured. I don't know what happened to the Detroit Tigers. Both Sports Illustrated and I said they should have won the AL Central, and they ended up in last place. The Tigers won 22 fewer games than they should have. Every single one of the starters had a batting average that was down a lot (from previous years) — 10, 20, 30, 40 points."
He is tweaking the model a bit for next season. "I had a high school student over the summer who got some data to make my model a little more realistic," he said. "In my model right now, if the batter hits a single, the guy on first only goes to second. In fact, sometimes they go to third, and sometimes they get thrown out. He was able to download information and process it in a way that I can put in a more advanced 'runner advancement' model."
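A "runner advancement" rule like the one Bukiet describes replaces a single certain outcome with a small probability distribution. In the Python sketch below, the three rates for the runner on first are invented for illustration (the article gives none), and the function names are hypothetical:

```python
# Hypothetical rates for what a runner on first does when the batter singles.
RUNNER_FROM_FIRST = {"stops_at_second": 0.68, "takes_third": 0.30, "thrown_out": 0.02}

def single_with_runner_on_first(outs, rates=RUNNER_FROM_FIRST):
    """Distribution over (outs, bases) after a single with only a runner on first.

    The batter always reaches first; bases are (first, second, third).
    """
    result = {
        (outs, (1, 1, 0)): rates["stops_at_second"],
        (outs, (1, 0, 1)): rates["takes_third"],
    }
    if outs + 1 == 3:
        result[(3, (0, 0, 0))] = rates["thrown_out"]          # inning over
    else:
        result[(outs + 1, (1, 0, 0))] = rates["thrown_out"]
    return result

def single_with_runner_on_first_simple(outs):
    """The current rule: the runner on first always stops at second."""
    return {(outs, (1, 1, 0)): 1.0}

print(single_with_runner_on_first(0))
print(single_with_runner_on_first_simple(0))
```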
Bukiet's favorite team, the Mets, did not make it into the postseason. But he is nevertheless busy calculating the odds of any given team making it to the next round. His charts are posted on his Web site.
As of Friday afternoon, he calculated that the Dodgers have an 89.7 percent chance of winning their division series, after beating the Cubs in the first two games. The odds similarly favor the Red Sox, Rays and Phillies.
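Turning per-game odds into series odds is a short recursion. The Python sketch below takes hypothetical per-game win probabilities (the article doesn't give Bukiet's) and is not meant to reproduce his 89.7 percent figure; `series_win_probability` is our name. A team up 2-0 in a best-of-five series needs one more win before its opponent gets three:

```python
from functools import lru_cache

def series_win_probability(game_probs, wins_needed, opp_wins_needed):
    """Chance of winning a series from the current standing.

    game_probs[i] is the (hypothetical) chance of winning the i-th remaining game.
    """
    @lru_cache(maxsize=None)
    def prob(i, wins_left, opp_left):
        if wins_left == 0:
            return 1.0        # series clinched
        if opp_left == 0:
            return 0.0        # series lost
        p = game_probs[i]
        return p * prob(i + 1, wins_left - 1, opp_left) + (1 - p) * prob(i + 1, wins_left, opp_left - 1)

    return prob(0, wins_needed, opp_wins_needed)

# Up 2-0 in a best-of-five, with a coin-flip chance in each remaining game.
print(series_win_probability([0.5, 0.5, 0.5], 1, 3))  # 0.875
```

Allowing a different probability for each game lets a sketch like this reflect home field and the scheduled starting pitchers.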
He promises to keep the chart updated daily, and apologized that it took until Friday afternoon to plug in all the numbers from Thursday's games. "I had to teach a class," he said with a laugh.
Isn't it annoying when life interferes with baseball?