In meetings with clients, hedge fund representatives present flashy charts and speak equal parts oracle and mad scientist. And for technical analysts who market themselves as the most technical of analysts, the mathematical jargon—“stochastic oscillators,” “Fibonacci ratios,” “Elliott wave,” “Golden ratio”—evinces a certain disarming beauty. “This mathematics is embedded in the structure of the universe,” Cynthia Kase, who runs a firm that employs “wave analysis” to predict oil prices, told Bloomberg News in 2012. “It is the language of God.”
Though much of this language is too gaudy to be embraced by sophisticated investors, there is a more subtle mathematical con that many, including editors at most of the top financial journals, overlook. The positive results that emerge from testing the performance of an investing algorithm on past market data, a process known as backtesting, can seem reliable and logical. And they sometimes are. But often, in practice, presentations of these results, though marketed as scientifically rigorous, conceal statistically insignificant methodologies.
A group of real mathematicians led by David H. Bailey, who spent much of his career at NASA and Lawrence Berkeley National Laboratory before departing recently for a research position in the computer science department at the University of California-Davis, has finally grown so affronted by the financial quackery that it has decided to formally call the bluff. In the May issue of the Notices of the American Mathematical Society, the researchers point out the responsibility mathematicians have historically assumed in correcting this kind of pseudo-scientific error:
Historically, scientists have led the way in exposing those who utilize pseudoscience to extract a commercial benefit. As early as the eighteenth century, physicists exposed the nonsense of astrologers. Yet mathematicians in the twenty-first century have remained disappointingly silent with regard to those in the investment community who, knowingly or not, misuse mathematical techniques such as probability theory, statistics, and stochastic calculus. Our silence is consent, making us accomplices in these abuses.
What many financial advisers and research papers are peddling as predictively profitable investments is, in many cases, the result of sloppy statistical legerdemain. Young researchers with few resources adjust their algorithms to the market data they’re working with through thousands or even millions of computer trials until the desired results, big profits, ripple to the surface.
What Bailey and his co-authors reveal is that the statistical significance of a backtested investment algorithm plummets as more and more trials are run to fit it to the sample data the analysts are working with. It’s called “overfitting.” The practice, which the authors consider nearly “pathological” in the industry, erodes the worth of performing the calculations altogether.
“What you end up doing is that the models that you derive or you select tend to just focus on idiosyncrasies of the data, and don’t have any real fundamental forward predictive power,” Bailey says. “And then furthermore, another result that we derive in the paper is that, in fact, under the very realistic assumption that the stock market has some degree of memory, in fact, an overfit strategy is actually somewhat likely to lose money rather than gain.”
In other words, the algorithm is trained so well around points in the working dataset that once it’s unleashed in a live marketplace, it flails. “By optimizing a backtest, the researcher selects a model configuration that spuriously works well [in sample] and consequently is likely to generate losses [out of sample],” the authors write in the paper.
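To make that failure mode concrete, here is a toy Python sketch (not the authors’ code; the moving-average crossover rules, the parameter grid, and the simulated data are invented purely for illustration). It backtests several hundred candidate rules on price data that is pure noise, keeps the one that looks best in sample, and then re-evaluates it out of sample, where the apparent edge typically evaporates.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated daily log-returns with no predictable structure: any "edge" a
# rule finds in the first half of the sample is an accident of that sample.
n_in, n_out = 1260, 1260                      # five years in, five years out
returns = rng.normal(0.0, 0.01, n_in + n_out)
prices = np.exp(np.cumsum(returns))

def crossover_pnl(fast, slow, start, end):
    """Summed P&L of a moving-average crossover rule over days [start, end)."""
    pnl = 0.0
    for t in range(max(start, slow), end):
        fast_ma = prices[t - fast:t].mean()
        slow_ma = prices[t - slow:t].mean()
        position = 1.0 if fast_ma > slow_ma else -1.0  # long or short one unit
        pnl += position * returns[t]                   # no costs, no slippage
    return pnl

# "Backtest" several hundred parameter pairs and keep the best in-sample
# performer, exactly the kind of unrestrained search the authors warn about.
candidates = [(f, s) for f in range(2, 30) for s in range(f + 5, 120, 5)]
best = max(candidates, key=lambda p: crossover_pnl(*p, 0, n_in))

print("best rule (fast, slow):", best)
print(f"in-sample P&L:     {crossover_pnl(*best, 0, n_in):+.2f}")
print(f"out-of-sample P&L: {crossover_pnl(*best, n_in, n_in + n_out):+.2f}")
```

Because the simulated returns contain no structure at all, any profit the selected rule shows in sample is an artifact of the search itself (the spurious “works well [in sample]” configuration the authors describe), while its out-of-sample result is left to chance.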
To preserve statistical significance, researchers must limit the number of trials they attempt. “For instance, just as an example, if you only have, say, five years of daily stock data for some security, then you better not try more than 45 models, or you’re almost certain to find one that looks at least one standard deviation better than neutral,” Bailey says. And for purposes of transparency, hedge funds should disclose the number of trials they used during the construction of their investment strategy.
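Bailey’s 45-model figure can be checked with a rough Monte Carlo sketch, shown below, under the simplifying assumption of independent Gaussian daily returns and zero true skill (an illustration of the effect, not the paper’s derivation). Picking the best of 45 skill-free strategies over five years of daily data tends to produce an annualized Sharpe ratio near one, which is a track record roughly one standard deviation better than neutral even though nothing real has been found.

```python
import numpy as np

rng = np.random.default_rng(0)
years, days_per_year, n_trials = 5, 252, 45
n_days = years * days_per_year

best_sharpes = []
for _ in range(2000):                    # repeat the whole exercise many times
    # 45 "strategies" whose daily returns are pure noise: none has any skill.
    daily = rng.standard_normal((n_trials, n_days))
    sharpe = daily.mean(axis=1) / daily.std(axis=1, ddof=1) * np.sqrt(days_per_year)
    best_sharpes.append(sharpe.max())    # report only the best-looking one

print(f"average annualized Sharpe of the best of {n_trials} skill-free trials: "
      f"{np.mean(best_sharpes):.2f}")
```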
Without protecting against overfitting, the numbers and endless manipulations can prove almost anything, according to a high-level financial services insider with a strong research background. “I mean, if someone asks me to predict the price of gold compared to the price of … the number of abortions in India, guess what?” says the industry source, who wished to remain anonymous. “We are going to get somewhere, somehow, some prediction between the two variables. And that’s the problem. The problem is that backtests are routinely published even at the top journals, where the conclusions are unsupported.”
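The gold-versus-abortions example is the familiar spurious-correlation trap, and it is easy to reproduce with no real data at all. The minimal sketch below pairs up simulated random walks, which stand in for any two trending but unrelated series, and counts how often they look strongly related over a finite sample:

```python
import numpy as np

rng = np.random.default_rng(7)
n_days, n_pairs = 1000, 2000

strong = 0
for _ in range(n_pairs):
    # Two independent random walks standing in for two unrelated series,
    # say a commodity price and some arbitrary social statistic.
    x = np.cumsum(rng.standard_normal(n_days))
    y = np.cumsum(rng.standard_normal(n_days))
    r = np.corrcoef(x, y)[0, 1]
    strong += abs(r) > 0.5               # a correlation that looks "strong"

print(f"{100 * strong / n_pairs:.0f}% of unrelated pairs show |corr| > 0.5")
```

A backtest that searches across many such series for “predictive” relationships will reliably turn up some, which is exactly why the number of trials behind a published result matters.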
When it comes to publishing in the Journal of Finance or the Journal of Financial Markets, the source says, the editors simply don’t have the mathematical knowledge necessary to vet some of the more complex and nuanced assertions. In the industry, the problem is even worse. “These are not easy concepts,” the source says. “And it requires some mathematical background and, as a result, investors are routinely presented with investment strategies that look mathematically sound and scientifically supported, but the evidence is just not able to confirm that.”*
According to the paper, proprietary software that financial advisers use is often designed to produce the best fits. “We strongly suspect that such backtest over-fitting is a large part of the reason why so many algorithmic or systematic hedge funds do not live up to the elevated expectations generated by their managers,” the authors write.
Of course, there are exceptions to the rule. At firms that employ hundreds of researchers, including many mathematics Ph.D.s, there is far more rigor. And the results show it. Bailey pointed to the hedge fund Renaissance Technologies, which has “very bright mathematicians” conducting “very scientifically rigorous” analysis. The firm’s setup, according to the other source, is proof that math, when used correctly, can produce unbelievable results. “The difference is that these [successful] firms work like laboratories,” the source says. “There is no hedge fund that has been more profitable in history than Renaissance Technologies, period.”
But in many other cases, according to Bailey and the source, the more rigorous methodologies are not being applied. Producing a reliable investment strategy takes a good deal of money and expertise, and most firms simply skimp. “This kind of work is very labor intensive, requires a lot of resources, a lot of focus and dedication,” the source says. “And what happens is that they take a couple of Ph.D.s and say, you know, ‘Invest the money.’ And guess what the Ph.D.s do? They say well, you know, we don’t have the resources for doing this right, but maybe—it’s kind of a lottery ticket. And they run the lottery ticket. They actually gamble, and very often, the gamble goes wrong.”
If many investors sign up with the same methodologically flawed fund, it could mean huge financial losses. It would also be hard to quantify the cascading effect of high-value investors losing trust in the market.
There could be even more dangerous external effects, outside of the economy. Though the losses wouldn’t represent a mathematical failure, it would likely be hard to convince investors who lost their retirement savings on a bad investment deal that they were duped by the marketing of bad math. Some might start doubting the credibility of the larger institutions of math and science altogether. And that may be the worst effect of all, according to the financial services source. “It may sound like a trivial concern,” he says. “But it took 500 years to kind of prove to society that reasoning and science can be helpful, and, as a result, now we live in a planet where the expectancy of life is much better. And most people live much better. What value does this have? I think that has a tremendous value. And if we now let people suffer because we allow people to believe that math and science is actually detrimental to their lives, then we are all going to suffer for that, too.”
Bailey suggests that a regulatory agency like FINRA could enforce some basic standards, such as limits on the number of trials allowed in these kinds of calculations, or at least mandatory disclosure of how many trials were run.
But education of investors is crucial too. In that vein, the researchers hope to develop software tools that could be used by both investors and firms to determine if overfitting is an issue. “Right now, these guys are being completely unchallenged,” the financial source says. “And as a result, they think that they can sell anything. That they just need to sound smart.” But if tools and education about this problem are more widely available, that won’t be the case. “What will happen is that the next time these guys go into a room, they will be confronted with hard questions,” he says. “[P]eople do not need to have a very strong background to recognize when someone is just kind of dodging questions.”
In the paper’s conclusion, the mathematicians cite a famous line from Enrico Fermi, a pioneering quantum physicist, about the shady practice of overfitting. “I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” Given how unruly a circus the financial industry really is, let’s hope the elephant trainers can be tamed.
*UPDATE — April 28, 2014: We originally referenced the Journal of Economic Studies, but our source has since clarified that he misspoke and intended to cite the Journal of Financial Markets.