Predicting the Price of Bitcoin

How would you build a model to do this?

Some great minds are not deterred by difficult questions. Alongside a multitudinous clan of hackers and anon blockchain renegades competing to do the same, one of our very own Berkeley & Blockchain Xcelerator graduates, Francesco Piccoli, has taken a recent crack at predicting the price of the world’s crowning digital currency.

Initially exploring this area during research in Berkeley’s Data-X program, Francesco took his expertise to work as a data scientist for a blockchain analytics company. He now heads their research effort to build a BTC price predictor.

There’s just so much open data that is not being used.

That is the first thing he said when we sat down. And while simple, it’s quite true. Just think about it.

“We like this stock,” “gamestonk,” “GME to the moon 🚀,” or “hodl” are memes to most. But to others, they are bytes of linguistic & cognitive data, relevant price information, injected into social networks and thus the minds of others, undoubtedly influencing Bitcoin’s plummets & surges. Attempts to quantify the strength and frequency of this data can be made by scraping comment counts, upvotes, and clicks. One can’t afford to dismiss this when attempting to predict a definitionally unpredictable security.

Normally traders and hedge funds use price and volume data. They find statistical trends in stock price and volume. These are fundamentals of stocks…

Can you do any fundamental or technical analysis when bitcoin shoots up from $35k to $43k after a single Elon Musk tweet? You can’t. You have to use social media data, not just the data traditionally used by investors.

Where do you start?

Well, Bitcoin fundamentals are different from traditional asset fundamentals. No shares, no P/E ratios, no balance sheets. The approach to modeling a price predictor for something so volatile thus strikes a balance between understanding financial markets and understanding how people make decisions.

The latter is especially important in markets where individuals drive large amounts of market volatility. In these markets, social behavior is at play. The behavioral economics view says we are not “rational agents,” mainly due to cognitive biases, overconfidence, overreaction, information bias, and many more faults in decision making. This view conflicts with many economic models, including the efficient market hypothesis.

Given the recent salience of social behavior in markets (e.g., the GameStop fiasco), I had to ask Francesco: what are you feeding your model?

Data from Wikipedia, that’s one route. How many people are just finding out about, say, Ethereum or Bitcoin? Then there’s Reddit data, which takes many different forms: data from /r/bitcoin, /r/eth, /r/wsb, etc. You have posts & comments, so taking the volume of these as an input into the model is the first step.

Beyond this is the use of natural language processing, developing a sentiment prediction between -1 (negative comment) and 1 (positive comment) to try and understand how people feel about this.

The last input is doing the same with a Twitter model. Anything from the past several years with a #Bitcoin hashtag. What does it all mean? There’s so much information, you can make multi-year correlations.
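The social inputs described above could be turned into daily features roughly as follows. This is a minimal sketch, not Francesco’s actual pipeline: the word lists, the function names `sentiment_score` and `daily_features`, and the sample comments are all hypothetical, and a real system would use a trained NLP model rather than a toy lexicon.

```python
import re
from collections import defaultdict

# Hypothetical toy word lists (assumption); a real system would use a trained NLP model.
POSITIVE = {"moon", "bullish", "hodl", "buy", "like"}
NEGATIVE = {"crash", "bearish", "dump", "sell", "scam"}


def sentiment_score(text):
    """Return a score in [-1, 1]: -1 if only negative words, 1 if only positive, 0 if neutral."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def daily_features(comments):
    """Aggregate (day, text) pairs into per-day comment volume and mean sentiment."""
    by_day = defaultdict(list)
    for day, text in comments:
        by_day[day].append(sentiment_score(text))
    return {
        day: {"volume": len(scores), "mean_sentiment": sum(scores) / len(scores)}
        for day, scores in by_day.items()
    }


# Made-up sample comments, for illustration only.
sample = [
    ("2021-02-08", "BTC to the moon"),
    ("2021-02-08", "time to sell before the crash"),
    ("2021-02-09", "hodl"),
]
features = daily_features(sample)
```

Each day then contributes two numbers, a volume and an average sentiment, which could sit alongside price and volume data as model inputs.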

Over the years, one can imagine tapping into periodic sentiments of hype and despair cycling like clockwork.

What about fundamentals?

So far, a set of weights generated from social network mining forms the core inputs to the architecture of Francesco’s machine learning models. But up until this point, all the information mentioned has been highly fallible speculation: commentary and hubris formed by flawed human heuristics.

What about the rigorous mathematical, algorithmic, and statistical modeling involved in such a prediction?

Transaction count, transaction value, fees paid, active addresses, hash rates, market capitalization, trading volume, stock-to-flow ratios, circulating supply and inflation indicators are just a few of the building blocks for a fundamental bitcoin analysis. If you wanted to go full-Citadel you could even start incorporating weather patterns.
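As one worked example from that list, the stock-to-flow ratio divides the existing supply by the new supply issued per year. The sketch below assumes round, illustrative numbers (a 6.25 BTC block subsidy, about 144 blocks per day, and roughly 18.6 million BTC in circulation); they are assumptions for the arithmetic, not live data, and the function name is hypothetical.

```python
def stock_to_flow(circulating_supply, annual_issuance):
    """Stock-to-flow: existing supply divided by new supply issued per year."""
    if annual_issuance <= 0:
        raise ValueError("annual_issuance must be positive")
    return circulating_supply / annual_issuance


# Illustrative round numbers (assumptions, not live data).
block_subsidy_btc = 6.25         # subsidy per block after the 2020 halving
blocks_per_day = 144             # one block roughly every 10 minutes
annual_issuance = block_subsidy_btc * blocks_per_day * 365  # 328,500 BTC per year
circulating_supply = 18_600_000  # rough early-2021 figure (assumed)

ratio = stock_to_flow(circulating_supply, annual_issuance)
print(f"stock-to-flow ratio: {ratio:.1f}")
```

A higher ratio means new issuance is small relative to what already exists, which is why the figure jumps after each halving.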

Brilliantly, these aspects are not missing from Francesco’s approach to prediction.

There are many more inputs. Past prices, volume, and AnChain data. I am also measuring the balances and info of different entities on the blockchain. Whales, for example, are addresses with massive bitcoin holding balances. They are huge forces in the market.
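A whale-related feature could be as simple as counting large holders and measuring their share of the tracked balance. The sketch below is an assumption about how such a feature might look, not a description of the actual model: the 1,000 BTC cutoff, the `whale_summary` name, and the address balances are all hypothetical.

```python
WHALE_THRESHOLD_BTC = 1_000  # hypothetical cutoff for calling an address a whale


def whale_summary(balances, threshold=WHALE_THRESHOLD_BTC):
    """Return the number of whale addresses and their share of the total balance."""
    whales = {addr: bal for addr, bal in balances.items() if bal >= threshold}
    total = sum(balances.values())
    share = sum(whales.values()) / total if total else 0.0
    return {"whale_count": len(whales), "whale_share": share}


# Made-up balances in BTC, for illustration only.
balances = {"addr_a": 5000.0, "addr_b": 12.5, "addr_c": 1500.0, "addr_d": 987.5}
summary = whale_summary(balances)
```

Tracking how these two numbers change from day to day is one way to capture whether large holders are accumulating or selling.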

All of these hold weight in the models. And there are several models being built concurrently.

Francesco laid out that last point with an added caveat:

The training problem here is that the past is not a reflection of the future, and these markets are especially vulnerable to irrational behavior.

Francesco’s predictor is backed by the blockchain analytics company he works for, whose data adds a secret sauce to this prediction effort. The company as a whole aims to bring transparency to the blockchain ecosystem by working on cybersecurity, ML, anti-money laundering, and KYC products. For years they have been collecting large datasets on both bitcoin fundamentals and social sentiments regarding cryptocurrencies as a whole.

After learning about the time and resources being put into this prediction, I had to ask Francesco one final question…

Can I go use the predictor?

To my utter surprise, he said yes. Seriously. The predictor Francesco built is actually available to use. While it is still a beta version intended to test market/business opportunities, I encourage readers to go play around with it. Again, his model not only scrapes networks for social sentiments but also combines this information with fundamental quantitative analysis.

The current prediction? Up 10.29% by tomorrow. Maybe it’s time to go buy…