Numerai: A New Cryptocurrency for Coordinating Artificial Intelligence on Numerai
One hour ago, 12,000 data scientists were issued 1 million crypto-tokens to incentivize the construction of an artificial intelligence hedge fund. Here’s why.
Benevolence of the broker
Markets work not because anyone is trying to make them work. No one trading in a market is trying to make that market efficient. The efficiency is merely a byproduct of the market participants believing in the value of the money abstraction, and then selfishly trying to get more of it.
A stockbroker would prefer a world of permanent inefficiency so he can make money from it. A hedge fund would prefer permanent information asymmetry where it can extract rent on data that no one else has. In the stock market, no one wants the market to be efficient.
The problem here is that self-interested market participants trading with all their might to earn more money are not in alignment over a goal. They are adversaries. They have no incentive to work together, to share knowledge, to share data or share code to improve the market. Finance is anti-collaborative, and that hinders progress big league.
You may think that the problem lies with the self-interest of market participants. A solution may be to curb and regulate self-interest. But regulating self-interest is morally abhorrent. The problem isn’t with self-interest. The problem is with money itself.
Specie and statecraft on the blockchain
Money was invented to solve the coincidence of wants problem and facilitate transactions. But as fiat currencies continue to lose relevance into the 21st century, cryptocurrency presents solutions far beyond money transfer. Cryptocurrency can now be used to incentivize cooperation in populations.
With cryptocurrency, money can now be software. There can be programmed rules for how money behaves. The ability to program money seems subtle, but small changes to the rules of money can have large effects on the behavior of the holders of that money. For a historical example of primitive money software influencing a population, see The Wörgl Experiment of 1932. For a modern example, consider how bitcoin incentivized thousands of people around the world to mine it.
The stock market presents a situation similar to the prisoner’s dilemma. The market would be better off if market participants collaborated, but rationally they don’t. Regular money simply does not incentivize them correctly. Regular money is too low-tech.
Imagine the prisoner’s dilemma in a world that exists entirely on a blockchain. Now suppose the prisoners are issued a cryptocurrency similar to normal money except for one small change: it is programmed to self-destruct whenever anyone goes to prison. By defining the money in this way, the prisoners’ fates are now financially bound. Prisoners in this scenario realize that if they don’t keep the other prisoner out of jail, they will lose all of their money with certainty.
This new cryptocurrency results in a world where citizens have a financial incentive to collaborate to keep each other out of jail. The prisoners are still motivated by self-interest but they now live in a universe where the money nudges them to collaborate in pursuit of that self-interest.
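The thought experiment above can be made concrete with a toy payoff table. The sketch below is only an illustration: the payoff numbers, the rule that someone is jailed whenever at least one prisoner betrays, and the names `BASE`, `payoff`, `best_response` and `is_equilibrium` are all assumptions chosen for the example, not part of Numerai's design.

```python
MOVES = ("silent", "betray")

# Hypothetical base payoffs for (my_move, other_move); higher is better.
# They follow the usual prisoner's dilemma ordering: 1 > 0 > -2 > -3.
BASE = {
    ("silent", "silent"): 0,    # nobody is jailed
    ("betray", "silent"): 1,    # I walk free with a reward, the other is jailed
    ("silent", "betray"): -3,   # I am jailed
    ("betray", "betray"): -2,   # both are jailed
}


def anyone_jailed(my_move, other_move):
    # Assumption of this toy game: a betrayal always sends someone to prison.
    return "betray" in (my_move, other_move)


def payoff(my_move, other_move, token_value):
    # The programmed money keeps its value only if nobody goes to prison.
    tokens = 0 if anyone_jailed(my_move, other_move) else token_value
    return BASE[(my_move, other_move)] + tokens


def best_response(other_move, token_value):
    # The move that maximizes my payoff given what the other prisoner does.
    return max(MOVES, key=lambda move: payoff(move, other_move, token_value))


def is_equilibrium(move_a, move_b, token_value):
    # Neither prisoner can do better by changing only their own move.
    return (best_response(move_b, token_value) == move_a
            and best_response(move_a, token_value) == move_b)


# With ordinary money (token_value=0), betraying a silent partner pays.
print(best_response("silent", token_value=0))
# With self-destructing tokens worth 2, staying silent becomes the best reply.
print(best_response("silent", token_value=2))
```

In this toy version, mutual silence becomes a stable outcome only when the tokens are worth more than the reward for betrayal, and mutual betrayal remains a possible outcome as well. The money nudges toward collaboration rather than forcing it.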
Last year, Numerai proposed a new kind of hedge fund, which allows any data scientist to build machine learning models on our data, and submit predictions to control the capital in our hedge fund.
Today, we are releasing a new money abstraction for Numerai. It begins a new commerce with our data scientists based on long-term alignment not possible with regular money.
It is a new cryptocurrency called Numeraire, and it makes collaboration compatible with self-interest.
Proof of intelligence
Data scientists collaborate on Numerai already. They share code. They share ideas on Slack. They write blog posts and tutorials. Numerai already has the spirit of a collaborative open software project. But the system design isn’t perfect.
It isn’t economically rational to tell your friends to join Numerai because it isn’t rational to help anyone beat you. There is a finite amount of bitcoin given away each week so the game is zero-sum.
So Numerai, like the market, has negative network effects — and that’s bad. Bitcoin facilitates the trade of dollars for machine intelligence on Numerai but this transaction clears the relationship and connection between Numerai and the data scientist because bitcoin and US dollars have little to do with Numerai. A Numerai data scientist has no economic incentive to tell his data scientist friends about Numerai. He would only be bringing in competition and making it harder for himself to earn bitcoin. But if every data scientist could benefit from the overall network improving then collaboration would become rational and the game would shift to positive-sum.
Today, Numerai issued 1,000,000 Numeraire crypto-tokens to our existing 12,000 data scientists based on their past performance in Numerai tournaments. There will be no crowdsale. Numeraire can be earned right now by competing in Numerai’s data science tournaments. In a sense, Numeraire is mined by data mining Numerai’s data, and submitting predictions is the proof of work.
With 1,000,000 Numeraire now issued, the data scientists on Numerai will all prefer those tokens to be worth more rather than less money. They are all incentivized to make them worth more. But a cryptocurrency without a compelling use case is merely a souvenir with no economic value. Numeraire’s economic value comes from its use inside Numerai.
On Numerai, data scientists can never lose money; they can only win bitcoin. But starting today, there is something to lose in order for there to be more to gain.
When a data scientist submits predictions to Numerai, those predictions are validated against historical data, and Numerai makes payouts based on how well the models performed on historical data. But Numerai cares much more about live performance in our hedge fund than backtest performance. Staking Numeraire is a way to incentivize live performance and completely disincentivize overfitting. The staking mechanism solves the biggest problem in quantitative finance; it is an economic forcing function to make backtest performance identical to live performance.
When a data scientist submits predictions, they will be able to stake Numeraire on those predictions. This involves sending Numeraire to Numerai’s smart contract on the Ethereum blockchain. After a period of time, the predictions are analyzed. If the predictions are accurate, the data scientist who staked Numeraire on them will earn money. If the predictions are poor, their Numeraire is permanently destroyed.
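As a rough illustration of the stake-then-settle flow described above, here is a minimal off-chain sketch. It does not model the actual Ethereum smart contract. The accuracy threshold (live logloss better than random guessing), the `payout_rate`, and the choice to return the stake on success are all assumptions made for the example, as are the names `Settlement` and `settle_stake`.

```python
import math
from dataclasses import dataclass

# Logloss of always predicting 0.5; used here as a hypothetical pass/fail line.
RANDOM_LOGLOSS = math.log(2)


@dataclass
class Settlement:
    returned_stake: float  # Numeraire handed back to the data scientist
    payout: float          # reward earned on top of the stake
    burned: float          # Numeraire permanently destroyed


def settle_stake(stake, live_logloss, payout_rate=0.1, threshold=RANDOM_LOGLOSS):
    """Settle one stake after the live results are known (toy rules)."""
    if stake < 0:
        raise ValueError("stake must be non-negative")
    if live_logloss < threshold:
        # Accurate predictions: assumed here to return the stake plus a reward.
        return Settlement(returned_stake=stake, payout=stake * payout_rate, burned=0.0)
    # Poor predictions: the staked Numeraire is destroyed.
    return Settlement(returned_stake=0.0, payout=0.0, burned=stake)


print(settle_stake(100.0, live_logloss=0.68))
print(settle_stake(100.0, live_logloss=0.70))
```

Under these assumed rules, a larger stake earns more when the model holds up on live data and destroys more when it does not, which is how the stake size expresses confidence.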
With Numeraire, there is now a way for data scientists to express confidence in their predictions the same way that traders do: by deciding how much to stake. Through our proposed staking mechanism, Numerai data scientists stand to gain by building models that perform well on live data, and stand to lose on models that overfit the past.
The value of Numeraire is connected to the stake payouts which will increase over time. Since Numeraire allows data scientists to earn more money by staking it, it is sensible to reason about its economic value. For example, the value of all Numeraire to a data scientist with a perfect model is the net present value of all the future stake payouts by Numerai.
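One way to write down the net present value mentioned above is the standard discounted sum. The symbols are assumptions introduced here for illustration: $P_t$ is the expected stake payout in period $t$, $r$ is a per-period discount rate, and $g$ is a constant growth rate of payouts.

```latex
V = \sum_{t=1}^{\infty} \frac{P_t}{(1 + r)^{t}}
\qquad\text{and, if } P_t = P_1 (1 + g)^{t-1} \text{ with } g < r, \qquad
V = \frac{P_1}{r - g}
```

The closed form on the right is the usual growing-perpetuity result. It only applies if payouts grow at a steady rate below the discount rate, which the text does not claim.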
Network effects in capital allocation
Nearly all of the most valuable companies throughout history were valuable through their strong network effects. If there is one motif in American economic history it is network effects. Every railroad made the railroad network more valuable, every telephone made the telephone network more valuable, and every Internet user made the Internet network more valuable.
But no hedge fund has ever harnessed network effects. Negative network effects are too pervasive in finance, and they are the reason that there is no one hedge fund monopoly managing all the money in the world. For perspective, Bridgewater, the biggest hedge fund in the world, manages less than 1% of the total actively managed money. Facebook, on the other hand, with its powerful network effects, has a 70% market share in social networking.
The most valuable hedge fund in the 21st century will be the first hedge fund to bring network effects to capital allocation.
We made a new film about network effects and Numeraire featuring Numerai investors Joey Krug (co-founder of Augur), Juan Benet (founder of IPFS and Filecoin), Andy Weissman and Fred Wilson at Union Square Ventures.
A New Abstraction
The stock market is inefficient with respect to new developments in machine learning because only a small fraction of the world's data scientists have access to its data. Numerai data scientists aren't traders or quants, and they don't want to be. They are experts in statistics, machine learning and artificial intelligence, working as geneticists, physicists, students and professors. They have specialized in building predictive models on data—any data. So we give them stock market data in its purest, most abstract form and let their machine learning algorithms discover its predictive structure.
Assembling a Super Intelligence
Numerai is not a search for the ‘best’ model; it is a platform to synthesize many different, uncorrelated models with many different characteristics. Data scientists compete on the leaderboard but models are ranked and rewarded based on their contribution to the meta model.
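To illustrate what "contribution to the meta model" could mean, here is a minimal sketch. It assumes an equal-weight average as the meta model and scores each model by how much the ensemble's logloss worsens when that model is left out. Both choices, and the names `meta_model` and `contributions`, are hypothetical; the text does not describe how Numerai actually combines or scores models.

```python
import math


def logloss(y_true, y_pred, eps=1e-15):
    # Mean negative log-likelihood of binary labels under predicted probabilities.
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def meta_model(predictions):
    # Equal-weight average of every model's prediction for each row.
    return [sum(col) / len(col) for col in zip(*predictions.values())]


def contributions(y_true, predictions):
    # Leave-one-out score: positive means the ensemble is worse without the model.
    # Needs at least two models so that something is left after removing one.
    full = logloss(y_true, meta_model(predictions))
    scores = {}
    for name in predictions:
        rest = {k: v for k, v in predictions.items() if k != name}
        scores[name] = logloss(y_true, meta_model(rest)) - full
    return scores


targets = [1, 0, 1, 0]
models = {
    "a": [0.9, 0.4, 0.6, 0.1],  # strong on rows 1 and 4
    "b": [0.6, 0.1, 0.9, 0.4],  # strong on rows 2 and 3
    "c": [0.5, 0.5, 0.5, 0.5],  # uninformative
}
print(contributions(targets, models))
```

In this toy data, models `a` and `b` make their mistakes on different rows, so each improves the average, while the uninformative model `c` drags it down and gets a negative score.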
A Proof of Intelligence
Nearly all of the most valuable companies and infrastructures throughout history were valuable through their strong network effects. But no hedge fund has ever harnessed network effects. Negative network effects are too pervasive in finance, and they are the reason that there is no one hedge fund monopoly managing all the money in the world. To make Numerai the first hedge fund with positive network effects, we issued a million crypto-tokens to our twelve thousand data scientists to incentivize coordination. Learn more in A New Cryptocurrency For Coordinating Artificial Intelligence, or read our white paper.
Numerai’s New Tournament to Crowdsource the Future of the Stock Market
The traditional crowdsourced machine learning tournament depends on a holdout dataset. The holdout data is some historical data known to the tournament organizer and unknown to the data scientists participating in the tournament. Data scientists’ submissions are graded and paid based on their ability to predict this holdout dataset. This creates an incentive to predict the holdout set as closely as possible, but there is no incentive to build models that generalize to the future. Data scientists are being rewarded to predict the past. This incentivizes overfitting, the primary enemy of data-driven endeavors.
In data science, logloss is a standard metric for measuring how good a set of predictions is. Data science competitions use logloss to rank competitors. On each submission, a data scientist is given a public logloss to indicate how well the predictions performed on the public holdout dataset. The major problem with this approach is that getting consistent feedback from the competition enables competitors to tailor their predictions to the feedback itself rather than solving the actual problem. This enables the overfitting that is incentivized by holdout dataset-based rewards.
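For readers unfamiliar with the metric, the following is a minimal sketch of binary logloss in plain Python. The function name and the clipping constant `eps` are choices made for this example; libraries such as scikit-learn provide an equivalent `log_loss` function.

```python
import math


def logloss(y_true, y_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


labels = [0, 1, 1, 0]
# Predicting 0.5 for every row scores ln(2), roughly 0.693, whatever the labels are.
print(logloss(labels, [0.5, 0.5, 0.5, 0.5]))
# Confident and correct predictions score lower, which is better.
print(logloss(labels, [0.1, 0.9, 0.8, 0.2]))
```

Lower is better, and a constant prediction of 0.5 gives a natural baseline: a model that scores above ln(2) is doing worse than not predicting at all.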
There are many attempts to mitigate the overfitting that data scientists are incentivized to achieve in this tournament format. Most approaches involve complicating the selection of holdout sets and diminishing the usefulness of the logloss reported to the data scientists. Rather than bring together thousands of data scientists to achieve a good logloss on the past, Numerai’s only interest is to predict the future.
To perfectly align incentives with data scientists, Numerai no longer has a holdout dataset or a leaderboard, either public or private. Rather than hide information from the data scientists, Numerai gives data scientists all known information. Instead of grading data scientists on a fixed set of past data, data scientists are graded on future data once it becomes known. Four weeks after a tournament begins, the actual outcome of what was being predicted is known. Data scientists are then ranked and paid both USD and Numeraire based solely on their ability to predict those four weeks. This makes the overfitting problem the direct adversary of the data scientists.
Figure: A paid round of the tournament, four weeks after the tournament began. The “Live Logloss” represents how well the model predicted those four weeks.
The above graph shows the logloss performance of an ensemble of the top user predictions in a round of the Numerai tournament. The orange line is the logloss expected of random predictions. Anything below it is a good prediction.
Here, all the data scientists’ predictions on the future for a round of the Numerai competition are compared against a backtest (test logloss). Their backtest logloss and their actual future outcome logloss (live logloss) are very similar, indicating the backtest was a good indication of out-of-sample, future performance.
Rather than devising increasingly complex methods of concealing information to combat overfitting, we’ve crowdsourced the overfitting problem itself. The above graphs show not only that data scientists successfully predicted the future, but that their future success was predictable. Predictable predictions can be leveraged infinitely.
To Better Predict the Future
Now that the data scientists in Numerai’s tournament are focused solely on generalization to the future, we’ve also released a new, human-readable feature to aid building models that are robust through time. The dataset now contains a column with time information that can be used to train models that strive for time-invariance.
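One common way to use a time column is to validate on the most recent periods only, so that a model is always judged on data that comes after its training data. The sketch below assumes each row is a dict with a hypothetical integer `era` key standing in for the time column; the actual column name and format are not given in the text.

```python
def split_by_era(rows, holdout_eras):
    """Hold out the latest `holdout_eras` time periods for validation."""
    eras = sorted({row["era"] for row in rows})
    if not 0 < holdout_eras < len(eras):
        raise ValueError("holdout_eras must leave at least one era for training")
    held_out = set(eras[-holdout_eras:])
    # Whole eras go to one side or the other, so no period leaks across the split.
    train = [row for row in rows if row["era"] not in held_out]
    valid = [row for row in rows if row["era"] in held_out]
    return train, valid


# Toy dataset: four eras with two rows each (feature values are made up).
rows = [{"era": era, "feature": era * 0.1 + i} for era in range(1, 5) for i in range(2)]
train, valid = split_by_era(rows, holdout_eras=1)
print(len(train), len(valid))  # 6 2
```

Splitting by whole eras rather than by random rows keeps the validation score closer to what the model would face on genuinely unseen future data.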
Numeraire, our new cryptocurrency to coordinate machine intelligence, will be the final economic incentive layer against overfitting.