Friday, May 20, 2022

Putting the Odds in Your Favor - Using Data Science to Identify the Bottom

Hello everyone! I come today with something that's really worth paying attention to and studying. I mean it. If you're a trader, set some time aside to review this, because you won't see it anywhere else, and the signal-to-noise ratio is high.

Remember last year when I kept hammering on the "regression analysis," and how that helped me nail the top, to the day? Well, I've done the same analysis, this time to try to identify the bottom.

Now this is a long post, so if you don't read all of it, I get it. But at least review the next two sections, and the chart.

https://www.tradingview.com/x/f7OkX5Pq/

"What is a regression analysis anyways? And why should I care?"

The idea is to find a best-fit equation that models a dependent variable (like Price) against one or more independent variables (like Time). It's one of those fundamental data science tools you would use to study any process. In this instance, we want to extrapolate into the future, so that we can make data-based predictions on when to buy/sell, at least in a macro sense.
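
If "regression" is a fuzzy term to you, here's roughly what it looks like in practice. This is just a minimal sketch in R, not my actual script - the data frame `btc`, its columns, and the simple log-linear model form are placeholders for illustration:

    # Minimal sketch: regress log(price) on time to get an exponential trend.
    # `btc` is a placeholder data frame with a Date column `date` and numeric `price`.
    btc$days <- as.numeric(btc$date - min(btc$date))   # time, in days since the first observation

    fit <- lm(log(price) ~ days, data = btc)           # best-fit line in log-price space
    summary(fit)                                       # coefficients, p-values, R-squared

    # Extrapolate one year past the last observation (exp() converts back from log-price).
    exp(predict(fit, newdata = data.frame(days = max(btc$days) + 365)))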

This isn't the only statistical tool available for price analysis, but I would dare say it's one of the most fundamentally important ones. This is Data Science 101, and when used correctly, it can be very powerful. Just to be clear: this isn't arbitrary, it isn't made up on the fly, and it isn't subjectively based on some interpretation of hand-drawn lines/patterns. It's fundamentally MORE important, and to be weighted higher, than any interpretation of moving averages, RSI, Bollinger Bands, or other "TA" signals.

That's not to say it's perfect or will definitely hold. All models are wrong, some are useful. But it does give us the best, (mostly) objective picture that can be gleaned by adhering to the principles and methodologies of one of the most important and fundamental data science tools that we have ... rather than the (mostly) subjective interpretations of TA.

I'm fairly confident that these solutions for the upper and lower bounds of Bitcoin price can be considered canonical (meaning I've correctly applied the methodology of regression analysis).

If you read nothing else, this is the distilled version of this chart/analysis

  • Blue Line - The regression for the blowoff tops. This is (basically) the same equation that I showed y'all last year. It uses the USD tops from 2010, 2011, 2013, and 2017 (circled in blue). Consider this to be the absolute highest possible price for Bitcoin at any point in the future. [Please note that I haven't updated the analysis to include Apr 14, 2021. I want to illustrate how well this called the top last year, without even including the Apr 2021 data.]

  • Red Line - This can be thought of in two ways: one, as the regression for the secondary capitulation lows during a bear market; or two, as the absolute lowest possible price for Bitcoin at any point in the future.

  • Orange Line - This models only the first capitulation lows. Remember, it's variable over time, so the further we move into the future, the higher the first capitulation low will be.

You can see that I've also provided the color-coded equations, and sample dates for future extrapolations. Each of those dates falls at the start of a quarter (Jan, Apr, Jul, Oct).

Now remember that these models don't give us a specific date for when we should expect to touch the lower line. All they do is give us an idea of what the lowest possible price might be, depending on which future date we're looking at.

Going deeper into understanding this model

The reason we have two lines on the bottom is simple: Bitcoin typically exhibits two capitulations during a bear market. You can clearly see that if we only model the first capitulation lows (orange), price still drops below the model.

Since the goal is to also provide a full lower-bound equation, beyond which price has never fallen, we actually have to run an iterative process. It looks like this:

  • Model a best-fit equation for all price data.

  • Then toss out anything above it, leaving only the lower portion.

  • Create another best-fit equation, and again, remove the data points above it.

A few iterations of this, and it quickly becomes apparent which points diverge the most. You can then filter out everything but the "lowest" (most divergent) points for each bear market, and run the regression that produces the red line on the bottom.
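
For the nerds, here's roughly what that filtering loop looks like in R. It's a simplified sketch of the idea, not the exact procedure - `btc` is a placeholder data frame with numeric `days` and `price`, and the iteration count is just illustrative:

    # Iteratively trim away everything above the current best fit, keeping the lower portion.
    lower <- btc
    for (i in 1:5) {
      fit_i <- lm(log(price) ~ days, data = lower)
      lower <- lower[residuals(fit_i) < 0, ]     # keep only points below the fit, then refit
    }

    # What's left are the most divergent (lowest) points; regressing on those
    # gives a lower-bound curve like the red line.
    fit_red <- lm(log(price) ~ days, data = lower)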

If this doesn't fully make sense to you, think of it this way: time is a factor. Hypothetically, if BTC sat at $30k for the next decade, it technically wouldn't be a lower low in terms of USD. But because that would be happening so far in the future, it would intuitively feel like a divergence from the exponential price history.

That's really what the red line represents - The largest divergences from the overall exponential price curve. Luckily, even the worst divergences are still best modeled with their own exponentially rising curve.

And that's why models like this are powerful. They capture an intuitive concept and yield a solid, objectively derived model that you can use for making informed decisions on when to sell the top and when to buy the bottom. Even if this model isn't 100% accurate (it's not, lol), it's VERY likely to get us in the ballpark.

But didn't people like Plan B, or charts like the Rainbow Log Regression, already do this??

Yes and no. If you really want to understand why this analysis is unique and reasonably correct (canonical), and theirs are not, it requires a bit more depth. Nerd territory alert ...

First and foremost, let's understand the limitations and risks of what we're trying to do here. Extrapolation is always difficult and prone to error. Even the best analysis will have some divergence, if for no other reason than the inevitability of noise. But moreover, we're attempting to model exponential data.

So any small procedural or methodological errors will quickly compound over time, producing significantly divergent results, to the point of diminishing usefulness.

Mathematicians and data scientists of the past century have developed a fairly well-defined set of mathematics and methodology to check your work, and ensure that you're in fact producing the best modeling equation that statistics can dictate, while also understanding the limitations and error range of the model. Here are some of those things:

  • Checking p-values to ensure your equation is statistically significant.

  • Examining residuals plots to ensure your model is sufficiently complex to capture all of the available signal.

  • Comparing adjusted r-squared between models to ensure that your equation is not overly complex and not just modeling random noise.

  • Running ANOVA between models to determine whether an improvement is statistically significant, and not just due to chance.

  • Understanding your dataset, identifying bad/noisy data and outliers, and knowing when it's appropriate to exclude data.

  • Understanding how to apply these concepts to answer specific questions and aspects about a dataset.
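
In R, most of these checks are one- or two-liners. Here's a rough sketch of which functions do which check - the `btc` data frame and the two candidate fits are placeholders, not my actual models:

    # Placeholder data frame `btc` with numeric `days` and `price`; two example candidate models.
    fit_simple  <- lm(log(price) ~ days, data = btc)
    fit_complex <- lm(log(price) ~ poly(days, 2), data = btc)

    summary(fit_simple)                 # p-values and adjusted R-squared
    plot(residuals(fit_simple))         # should look like pure noise: no drift, no leftover pattern
    anova(fit_simple, fit_complex)      # is the added complexity a statistically significant improvement?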

So what's the difference between what you're looking at here and guys like Plan B, the Rainbow Regression, or even Benjamin Cowen? It's clear that those guys didn't bother plotting their residuals, or else they would've seen the drift in their models. And residuals plots are probably one of the most fundamental checks you need to do to make sure you have a valid model. That tells me they probably didn't run any of the other statistical checks either, and are probably misapplying these tools in other ways.

Now I like Ben, and he's gotten the closest of anyone I've seen so far. He's not arrogant, and takes a humble, simple approach. If anyone knows him personally and feels like pointing him my way, feel free to do so. He will continue updating his model with new data, and it will have some level of usefulness; but until he improves the model itself, there is going to be a propensity for error at the edges, and the further you try to extrapolate, the worse it gets.

Contrast him with a guy like Plan B, who is kind of arrogant. He'll continue to issue ad hoc adjustments and post-hoc justifications (aka "black swans" and "unforeseen events") to try and externalize the blame for why his provably bad model continues diverging ... rather than going back to the drawing board and really diving deep into proper regression analysis.

One Last Verification Piece of This Model

One extra step that I took was to examine only the data from 2016 and earlier, and apply the same methodology to see how well this would have worked for the last bull/bear cycle in '17 and '18. In other words, pretending that we were back in time, say Jan 1, 2017, I wanted to see how accurate this same methodology would have been back then.

It was accurate to within about 7%, for both the top and the bottom. I'm sure I don't have to explain that on a process which rose 10,000% from the 2015 low to the 2017 high, and then dropped roughly 6x from that high to the 2018 capitulation ... this was phenomenally accurate. Nothing else I've seen comes remotely as close as this.
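
Here's roughly what that hold-out test looks like in code - again a hedged sketch with a placeholder `btc` data frame and a single trend line standing in for the full top/bottom methodology:

    # Pretend it's Jan 1, 2017: fit on the earlier data only, then extrapolate forward.
    # `btc` is a placeholder data frame with Date `date` and numeric `price`.
    btc$days  <- as.numeric(btc$date - min(btc$date))
    train     <- subset(btc, date < as.Date("2017-01-01"))
    fit_pre17 <- lm(log(price) ~ days, data = train)

    # Evaluate the extrapolation at a later date (e.g. near the Dec 2017 top)
    # and compare it to the price that actually printed there.
    at <- data.frame(days = as.numeric(as.Date("2017-12-17") - min(btc$date)))
    exp(predict(fit_pre17, newdata = at))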

And again, this same methodology was what I used when calling the Apr 14th top last year, and it was off by less than 1%.

Okay that's all folks

If anyone wants to do a deeper dive on my methodology, hit me up. We can do a call, screenshare, and I can share my RStudio scripts, and include as many Monero peeps as want to join. I know there's at least a couple of statistics nerds out there who can validate my work.

