ok, so firstly,
I found all of the papers through Google Search and Google Scholar. Google Scholar doesn't actually index every research paper, so you need to use both together to find them all. I searched with phrases like "predict bitcoin", "predict stock market", "predict forex" and terms related to those.
Next,
I only tested papers written in the past 8 years or so. Anything older is likely to have been heavily alpha-mined already, so we can probably ignore those ones altogether.
Then,
Anywhere the methodology was even slightly ambiguous, I tried every plausible permutation to capture what the authors may have meant. For example, one paper adds engineered features to the price data, then says "then we ran the data through our model". It's not clear whether that means the original data or the engineered data, so I tried both ways. This happens more often than you'd think!
THEN,
Anything that didn't work, I tried my own ideas with the data they were using or substituted one of their models with others that I knew of.
Now before we go any further, I should caveat that I was a profitable trader at multiple Tier-1 US banks, so I can say with confidence that I made a decent attempt at building whatever each author was trying to get at.
Oh, and one more thing. All of this work took about 7 months in total.
Right, let's jump in.
So with the papers: I found as many as I could, read through them, sorted them into categories, and then tested one category at a time, because a lot of the papers were essentially saying the same things.
Here are the categories:
- News text mining. Using NLP on headlines or the body of news articles as a signal.
- Social data. Twitter sentiment, Google search trends, Seeking Alpha. Again, some of these used NLP; for Google Trends they just used the raw data.
- Technical analysis & machine learning together. Most of these take the price, add TA features, then feed everything into an ML model.
- Other machine learning (as in, not using TA). Just the price plus some other engineered features.
- Analyst recommendations. Literally just taking the recommendations from banks/brokers and using those as the signal.
- Fundamental data. Ratios from the income statement/balance sheet, etc.
Results:
Literally every single paper was either p-hacked, overfit, or tested on a favourably selected subsample of data (I guess ultimately these are all flavours of the same thing), OR a few may have had a smidge of alpha, but as soon as you add transaction costs it all disappears.
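To make the transaction-cost point concrete, here's a toy sketch. Everything in it is made up for illustration: simulated daily returns, a noisy lagged "predictor", and a flat cost charged on every position change. The point is just the shape of the effect: a signal that flips position often gets eaten alive by costs.

```python
import numpy as np

def net_pnl(signal, returns, cost_per_trade=0.001):
    """Gross strategy P&L minus a flat cost on every position change."""
    positions = np.sign(signal)
    trades = np.abs(np.diff(positions, prepend=0.0))  # 2 on a flip, 1 on an entry
    gross = positions * returns
    return gross.sum(), (gross - trades * cost_per_trade).sum()

rng = np.random.default_rng(0)
rets = rng.normal(0.0002, 0.01, 5000)     # simulated daily returns
noise = rng.normal(0.0, 0.02, 5000)
signal = np.roll(rets, 1) + noise         # noisy lagged "predictor" that flips a lot
signal[0] = 0.0

gross, net = net_pnl(signal, rets)
print(f"gross: {gross:+.2f}  net after costs: {net:+.2f}")
```

Even a 10bps cost per trade adds up to a large drag when the position changes every few days, which is exactly how the marginal "alpha" in these papers vanishes.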
Every author that's been publicly challenged about the results of their paper says it stopped working due to "alpha decay" because they made their methodology public. The easiest way to test whether it was truly alpha decay or just overfitting is to reproduce the paper, then run it further back in time instead of further forwards: decay can only kill a strategy after publication, not before it. For the papers that I could reproduce, all of them failed regardless of whether you go back or forwards. :)
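Here's a sketch of that back-in-time test. Everything is a stand-in: the "paper's rule" is a generic moving-average crossover, the prices are simulated, and the sample window is arbitrary. The shape of the check is what matters: run the rule on the window before the paper's sample as well as the window after it, and compare.

```python
import numpy as np

def sharpe(rets):
    """Annualised Sharpe ratio of a daily return series."""
    rets = np.asarray(rets)
    return float(np.sqrt(252) * rets.mean() / (rets.std() + 1e-12))

def ma_crossover_returns(prices, fast=10, slow=50):
    """Stand-in for 'the paper's rule': long when the fast moving
    average is above the slow one, flat otherwise. No look-ahead:
    today's position uses yesterday's moving averages."""
    p = np.asarray(prices, dtype=float)
    fast_ma = np.convolve(p, np.ones(fast) / fast, mode="valid")
    slow_ma = np.convolve(p, np.ones(slow) / slow, mode="valid")
    fast_ma = fast_ma[-len(slow_ma):]              # align the two series
    pos = (fast_ma > slow_ma).astype(float)[:-1]   # yesterday's signal
    rets = np.diff(np.log(p))[-(len(slow_ma) - 1):]
    return pos * rets

rng = np.random.default_rng(1)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 6000)))

# pretend the paper's sample is days 3000-4500; test BEFORE it and AFTER it
for name, window in {"before": prices[:3000],
                     "in_sample": prices[3000:4500],
                     "after": prices[4500:]}.items():
    print(name, round(sharpe(ma_crossover_returns(window)), 2))
```

If the rule only "works" in the in-sample window and dies in both directions, that's overfitting, not alpha decay.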
Now, results from the two most popular categories were:
- *Social data.* A lot of research papers were extensions of, or based on, a paper by Johan Bollen called "Twitter mood predicts the stock market". It literally has 3,955 citations and is complete and utter horse shit; the paper is p-hacking to the extreme. Not only could I not reproduce the results, but given the number of sentiment indicators he uses, I regularly found correlations between sentiment and my data depending on how I engineered it. None of these correlations held over longer time periods. Every paper that derives from this one or cites it has the same issues.
- *Technical analysis & machine learning.* Every paper did something along these lines: take past price data for some asset (stocks, forex), then add technical-analysis indicators as "features". Then either they'd run those through a feature selector and put the best ones into a model, OR they'd dump the data straight into the model and afterwards report only the subset of instruments it "worked" on. None of these hold up if you k-fold test them or test on subsets of data outside the ones used in the paper. The results are always based on selecting favourable subsets of data.
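Both failure modes above (mining dozens of sentiment indicators, or dozens of TA features, until one "works") are the same multiple-comparisons trap, and it's easy to reproduce on pure noise. A minimal sketch, entirely simulated: generate 50 random "features", pick the one that correlates best with returns over the full sample, then see what happens when you select on the first half only and evaluate on the second half.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_features = 1000, 50
rets = rng.normal(0, 0.01, n_days)                 # pure noise: nothing is predictable
features = rng.normal(0, 1, (n_features, n_days))  # 50 made-up "indicators"

# WRONG: pick the best feature using ALL the data, then report its fit
corrs = [abs(np.corrcoef(f[:-1], rets[1:])[0, 1]) for f in features]
print(f"best full-sample |corr|: {max(corrs):.3f}")  # looks like signal

# RIGHT: pick on the first half, evaluate on the held-out second half
half = n_days // 2
train_corrs = [abs(np.corrcoef(f[:half - 1], rets[1:half])[0, 1]) for f in features]
best_tr = int(np.argmax(train_corrs))
oos = abs(np.corrcoef(features[best_tr][half:-1], rets[half + 1:])[0, 1])
print(f"that pick, out of sample: |corr| {oos:.3f}")  # typically far smaller
```

With 50 noise features you essentially always find one with an impressive-looking in-sample correlation; held-out data is what exposes it.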
The most frustrating paper:
I have true hate for the authors of this paper: "A deep learning framework for financial time series using stacked autoencoders and long-short term memory". It's probably the most complex AND the vaguest in terms of methodology, and after weeks of trying (and failing) to reproduce their results, I figured out that they were leaking future data into their training set (this also happens more often than you'd think).
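The most common version of this leak is preprocessing (scaling, denoising, smoothing) the whole series before the train/test split, so the training inputs are computed with statistics that include the future. A minimal illustration with simulated prices and plain z-scoring; the same logic applies to any full-series transform.

```python
import numpy as np

def zscore(x, mean, std):
    return (x - mean) / std

rng = np.random.default_rng(3)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 1000)))  # drifting series
split = 700

# LEAKY: normalising with statistics over the FULL series means the
# training inputs already encode where prices eventually end up
leaky_train = zscore(prices[:split], prices.mean(), prices.std())

# CLEAN: statistics computed from the training window only
clean_train = zscore(prices[:split], prices[:split].mean(), prices[:split].std())

# the clean version is centred; the leaky one is shifted by future knowledge
print(f"leaky train mean: {leaky_train.mean():+.3f}")
print(f"clean train mean: {clean_train.mean():+.3f}")
```

A model trained on the leaky inputs gets a free hint about the future price level, which is enough to produce spectacular (and fake) backtest results.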
The two positive take-aways that I did find from all of this research are:
- Almost every instrument is mean-reverting on short timeframes and trending on longer timeframes. This held true across most of the data I tested. Turning this into a strategy would be fairly easy and straightforward (although you have no guarantee that it'll continue to work in future).
- When we were in the depths of the Great Recession, almost every signal was bearish (Seeking Alpha contributors, news, Google Trends). If this holds in the next recession, using this data alone would give you a strategy that vastly outperforms the index over long time periods.
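The first take-away is easy to check on your own data. A minimal sketch (the prices here are a simulated random walk just so it runs; swap in a real series): measure the lag-1 autocorrelation of non-overlapping log returns at different horizons. Negative values suggest mean reversion at that horizon, positive ones suggest trending.

```python
import numpy as np

def return_autocorr(prices, horizon):
    """Lag-1 autocorrelation of non-overlapping log returns at `horizon` bars.
    Negative => mean-reverting at that horizon, positive => trending."""
    logp = np.log(np.asarray(prices, dtype=float))
    rets = np.diff(logp[::horizon])   # non-overlapping horizon-bar returns
    return float(np.corrcoef(rets[:-1], rets[1:])[0, 1])

# demo on a random walk, where the autocorrelation should hover near zero;
# on real instruments you'd look for negative at short horizons, positive at long
rng = np.random.default_rng(4)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 20000)))
for h in (1, 5, 20, 100):
    print(f"horizon {h:>3}: autocorr {return_autocorr(prices, h):+.3f}")
```

If the short-horizon number is reliably negative and the long-horizon one reliably positive across many instruments, you've reproduced the effect yourself rather than taking anyone's word for it.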
Hopefully, if you're getting into this space, this will save you an absolute tonne of time and effort.
So in conclusion, if you're building trading strategies, simple is good :)
One other thing I'd like to add: even the godfather of value investing, the late Benjamin Graham (Warren Buffett's mentor), used to test his strategies, even though he traded manually. So literally every investor needs to backtest, regardless of whether you're day-trading, long-term investing, or building trading algorithms.
EDIT: in case anyone wants to read more from me, I occasionally write on Medium (even though I'm not a good writer).