Monday, March 4, 2019

PGA: No Frills DFS Data - Honda Classic Recap & Discussion of Golf Metrics

So, this slate was fantastic.

I had a player pool of 22 guys and only 3 missed the cut, with another as an MDF. While I only had 1 guy in the top 5 this time, it was one of my most exposed players in Lucas Glover. I had 3 more at T9, so 4 of the top 11 guys, and a bunch more at T20 or better. I didn't have any lineups packed with the top 5, so I didn't have any huge individual scores, but when most lineups go 6/6 or 5/6 with a bunch of T20 or better players, it's always going to be a very good week despite not hitting yahtzee.

Again, to recap, here was my player pool in order of exposure.

T30 Justin Thomas

T4 Lucas Glover

MDF Graeme McDowell

T9 Sergio Garcia

T59 Zach Johnson

T36 Daniel Berger

T16 Michael Thompson

T59 Vaughn Taylor

T36 Gary Woodland

T51 Russell Knox

CUT Adam Scott

T20 Chesson Hadley

CUT Luke List

T16 Billy Horschel

T20 Brian Stuard

T36 Byeong Hun An

CUT Cameron Smith

T36 J.T. Poston

T9 Jason Kokrak

T9 Jim Furyk

T20 Matt Wallace

T20 Talor Gooch

My model once again pushed Furyk (it tends to really like him, Chez Reavie and Phil Mickelson), but this time it wasn't overboard about it. In the end I didn't use him in any of the purely model-driven lines. I did end up trusting the model when I created the "homer line," where I choose 1-2 guys I really want added in and exclude a few I'm already heavy on: I jammed in Adam Scott again and the model said to fill the lineup out with Furyk. I was pleasantly surprised with a T9 from the guy, and it will give me a little bit more faith when the model recommends him.

Now back to Adam Scott. This is why I limit my ability to directly construct a lineup to only 1 dart. The only things in Scott's favor were course history, tout coverage and Vegas odds. Everything else said he's a fine golfer but way overpriced, and since my model works rather holistically, all those things were already accounted for, so I already had a smattering of him out there. Yet I bought into the narrative and jammed him in there. I don't regret the decision, and I'd do it again. But this is exactly why I build a model: if I built my 10x GPP lineups by hand, I'd likely have gone with him in a lot more lineups because his narrative was very compelling. As for the other guys to miss the cut, Smith and List, I stand by those choices as well. Half the field needs to be cut, so even if everyone golfed the game of their lives, half of them would still miss the cut despite hitting peak form. It's kind of like how, if everyone went to an Ivy League school, we'd have Yale PhDs flipping burgers. In short, don't worry about it. Even the best golfers will miss the cut.

You may also recall the model was suggesting Ortiz and Blayne and I vetoed them because I didn't feel the data was reliable. They both missed the cut. I would have been about 1/3 exposed to each had I not manually sifted through and error checked my lineups, something I sometimes don't get a chance to do because I don't start running the model until near lock. It would have been disastrous had I not seen those unfamiliar names and decided to take a closer look.

My cash games went exceedingly well, as I chose one of my lineups that did fairly well to use in cash. I cashed in every 50/50 and double up (sometimes outright winning them) and won all but 2 of my h2hs. There's a good story here about why, even though I play most of my volume in cash, I go with only 1 lineup. There's one specific player I've been matching up with quite a bit. It started out in lower stakes, and I believe he's now tilted and trying to recover, because he keeps upping the stakes and I keep taking 'em. This past slate he posted a $100 h2h and I took it. He then matched up with me in another one for $5. He decided to go with 2 lineups: one of them performed pretty poorly, the other would have done very well in a GPP. Given how pleased I am writing about this, I bet you can imagine which one of those I lost and which I won. This is why I just create one cash lineup and stick with it, because I've been on his side of things in the past. If he wins both, it doesn't matter. If he loses both, it doesn't matter. If he loses the $5 and wins the $100, it doesn't matter... but if he loses the $100 and wins the $5, he goes on crazy monkey tilt.

It doesn't matter at all that, mathematically speaking, it makes no difference (so long as both lineups have equal assumed expectations). Emotions still run high in this. Splitting lineups is only fine if you're doing very high volume at level stakes (not 2 matchups with a 20x difference in size) and you look at the big picture rather than tracking the individual results. But nobody does this. We aren't androids: when you win you win, when you lose you lose. This is why, although I put way more in cash than GPP and a bad cash lineup can sink me, I'm still taking a binary approach with cash games. I'm not taking a 75% chance of indifference with a 25% chance of losing my god damn mind because the h2h that mattered was the one that failed. Fail like a stoic with a single cash lineup that gives 100% indifference.
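To put numbers on the 75%/25% split, here is a minimal sketch of the $100 and $5 h2h story. The assumptions are mine and purely illustrative: each lineup is a coin flip (p_win=0.5), the two lineups are independent, there is no rake, and a win nets exactly the stake. The function names are made up for this example.

```python
from itertools import product

STAKES = (100, 5)  # the two h2h sizes from the story above


def two_lineup_outcomes(p_win=0.5):
    """Each contest gets its own lineup; results assumed independent."""
    outcomes = []
    for big_win, small_win in product((True, False), repeat=2):
        prob = (p_win if big_win else 1 - p_win) * (p_win if small_win else 1 - p_win)
        net = (STAKES[0] if big_win else -STAKES[0]) + (STAKES[1] if small_win else -STAKES[1])
        outcomes.append((big_win, small_win, prob, net))
    return outcomes


def one_lineup_outcomes(p_win=0.5):
    """Same lineup in both contests: they are won or lost together."""
    total = sum(STAKES)
    return [(True, True, p_win, total), (False, False, 1 - p_win, -total)]


def expected_value(outcomes):
    """Probability-weighted net result across all outcomes."""
    return sum(prob * net for _, _, prob, net in outcomes)


def tilt_probability(outcomes):
    """Chance of losing the big h2h while winning the small one."""
    return sum(prob for big, small, prob, _ in outcomes if not big and small)


split = two_lineup_outcomes()
single = one_lineup_outcomes()
```

Under these assumptions both approaches have the same expected value, but only the split approach carries a 1-in-4 chance of the lose-the-$100, win-the-$5 result.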

Now then, some people have been asking me to go into more detail about the data that I use to create the lineups. I'll just reiterate that I'm never going to explain how the sausage is made. But I will be serving plenty of sausage and giving you a general idea of what animal it came from.

Today I'm going to talk specifically about how most of my research really demonstrates just how stupid most golf stats are. I really want to be 100% sure and am in the process of scraping an absurdly large database containing several decades of data. Since I'm doing this in my free time, it'll take some time before I parse and analyze everything. I don't want to make the very bold claims I already believe to be true without further studying the matter and really ensuring my thoughts are real and not the product of bad calculations or insufficient sample size. But what I've discovered thus far is that all those stats are just window dressing. Saying someone led the field in shots gained X is fundamentally no different than saying "they did well and had a good tournament." Things like shots gained track results, not process. So it's much like tracking wins and RBIs. Yes, the best hitters and the best pitchers in baseball often lead the league in those metrics, but we all know why they aren't good predictive tools.

For example, when my beloved Red Sox signed Dante Bichette in 2001, there was all this talk about him having led the major leagues in RBIs the past few seasons. He had his epic year two years earlier, driving in 133 runs, and the year before he got 90. While he was aging and slowing down, I distinctly remember a lot of confusion over why we signed this elite hitter but then used him in a platoon. I'd be at Fenway, and as the Red Sox lost, people would openly question the wisdom of having one of the best hitters in the game ride it out on the bench. This was 2 years before Moneyball was published, and while front offices knew the reality of the situation (third team in 2 years and out of the league after that season), the average hard core Red Sox fan would just scratch their head wondering why we didn't give Dante a little more of a chance to show he still had it.

I feel this is the situation today with golf and golf statistics. While what we have today is an improvement over the past, we take it for granted that it comes with the same authority as, say, wOBA or usage. We know that the winners won, but we don't know much else, and shots gained is more or less a fancy way to say someone did a better job. If someone gets a birdie on a par 4, their SG will improve by about... drumroll please... 1. So you could simply compare scores, i.e. look at end of tournament standings. Yes, there is definitely some nuance, and if I didn't feel like there was some actionable data out there I wouldn't bother with any of this. But I believe that way too much weight is put into this. Whether I'm right or wrong, I will follow up on this in much more detail once it's no longer a hunch but rather indisputable. The reason gathering this data is difficult is that it's restricted, which itself should be a bit of a red flag.
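The birdie arithmetic can be written out as a deliberately simplified sketch. It assumes the baseline is one field-average number per hole, which is my simplification and not the official per-shot calculation. The baselines and scores below are hypothetical.

```python
def hole_shots_gained(score, baseline):
    """Shots gained on one hole against an assumed field-average baseline."""
    return baseline - score


def round_shots_gained(scores, baselines):
    """Sum of hole-level values; equals total baseline minus total score."""
    return sum(hole_shots_gained(s, b) for s, b in zip(scores, baselines))


# Hypothetical three-hole stretch: a par 4, a par 3 and a par 5 where the
# field happens to average exactly par (an assumption for illustration).
baselines = [4.0, 3.0, 5.0]
player_a = [3, 3, 5]  # birdie on the par 4, then two pars
player_b = [4, 3, 6]  # two pars, then a bogey on the par 5

sg_a = round_shots_gained(player_a, baselines)
sg_b = round_shots_gained(player_b, baselines)
```

With this whole-hole baseline, ranking players by total shots gained gives the same order as ranking them by score, which is the point of the paragraph above. Any extra information would have to come from the per-shot breakdown.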

I'll also be reading "Every Shot Counts" soon, which is a book written by the creator of the Shots Gained (officially "strokes gained") metric. I really don't want to make any further sweeping judgements until I read the author's long and detailed explanation of the metric.

But really, we can all see the smoking gun for ourselves at https://registrations.pgatourhq.com/forms/shotlinkintel/ : the process they used to record shots gained has been shuttered. Even prior to them ghosting us, access to the statistics themselves was restricted, and you needed to apply for access. The Twitter account still exists, and it's like everyone vanished into thin air. The last tweet https://twitter.com/ShotLink/status/893531791297978368 was well over a year ago and was simply a picture of a golf course, as if nothing was about to change.

Also, the PGA still insists "All strokes gained statistics are calculated using ShotLink, the PGA TOUR's real-time scoring system powered by CDW." https://www.pgatour.com/news/2016/05/31/strokes-gained-defined.html Since it was discontinued such a long time ago, how exactly is it calculated now? Nowhere have I been able to find this information. I'm not talking conspiracies or anything. They could have a very good data collection system that's much improved over ShotLink. But the very notion that the PGA doesn't even bother telling anyone how the data is collected, and yet nobody is asking any questions, should tell you this isn't exactly the most objective market.

So basically, I'm very confused by Shots Gained as a metric. I can find very little information on it, and what I can find is out of date and contradictory and seems to imply it's little more than a nuanced version of looking at the final standings. I want to say it's bullshit, but I'm reserving final judgement and simply labeling it as sketchy for now.

So then we should look at results yeah? Yes, but this is largely what pricing is based upon, so not much of an edge there. So shall we look at ranking? Yes, let's take a look at OWGR.

When I first started with golf, I knew nothing and had nothing to base anything on other than seeing their pricing and recent point accumulations. Since Tiger Woods wasn't playing in that event, it was all entirely new names, just names I'd hear in passing while switching off ESPN as they were starting their golf coverage. So naturally, when I saw each golfer had a world ranking, I viewed that as a cheat sheet. From the very beginning, one of the formulas I've used to develop lineups was as simple as putting together the golfers within budget that collectively had the lowest aggregate world ranking number. Why am I suddenly speaking in such specifics, you ask? Because it's a horrible DFS metric and nobody else is doing it (I track GPP lineups to see what others are doing, and there are a few of these simpler formulas that pop up periodically, but this is not one of them), so it's not exactly as if disclosing this information will make my opponents that much stronger.
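The "lowest aggregate world ranking within budget" formula can be sketched as a brute-force search. This is a sketch and not my production code. The golfer names, salaries, ranks and the 50,000 cap are hypothetical placeholders, and the roster size of 6 matches the 6/6 lineups mentioned earlier.

```python
from itertools import combinations


def lowest_owgr_lineup(pool, salary_cap, roster_size=6):
    """Return (rank_sum, salary, lineup) with the lowest summed OWGR under the cap.

    Returns None if no combination fits under the cap.
    """
    best = None
    for lineup in combinations(pool, roster_size):
        salary = sum(g["salary"] for g in lineup)
        if salary > salary_cap:
            continue  # over budget, skip
        rank_sum = sum(g["owgr"] for g in lineup)
        if best is None or rank_sum < best[0]:
            best = (rank_sum, salary, lineup)
    return best


# Hypothetical pool: names, ranks and salaries are made up for illustration.
pool = [
    {"name": "Golfer A", "owgr": 1, "salary": 11500},
    {"name": "Golfer B", "owgr": 5, "salary": 10500},
    {"name": "Golfer C", "owgr": 12, "salary": 9500},
    {"name": "Golfer D", "owgr": 30, "salary": 8500},
    {"name": "Golfer E", "owgr": 59, "salary": 6500},
    {"name": "Golfer F", "owgr": 80, "salary": 7000},
    {"name": "Golfer G", "owgr": 150, "salary": 6800},
    {"name": "Golfer H", "owgr": 200, "salary": 6500},
]

rank_sum, salary, lineup = lowest_owgr_lineup(pool, salary_cap=50000)
names = [g["name"] for g in lineup]
```

Brute force is fine for a toy pool like this. A full field has far too many combinations for it, so a real version would need an optimizer.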

My OWGR lineup has in fact been the single worst performing in cash and the 2nd worst performing in GPPs of the dozens of lineup models I have. Thankfully, I don't play it because it's so bad, but I keep tracking it and recording how it would have performed, just for fun these days. The only lineup that performed worse than the OWGR lineup in GPPs, well, that one heavily factors in OWGR as well :). OWGR is just a terrible, terrible metric for DFS. Yes, it will give you the cream of the crop like the Dustin Johnsons, but you can never afford a lineup of Dustin Johnsons. You'll have to start digging deeper and pulling up min priced guys like Satoshi Kodaira, Mr. Bitcoin himself. He's someone who, if you've been reading my stuff, is the entire reason I stopped playing any lineup that had OWGR as a primary indicator.

Now Satoshi, despite being a pretty horrible DFS play most of the time, is a great example of everything wrong with OWGR. His FedEx Cup rank is currently 160 and has never been better than 93, but his world rank is perplexingly 59. In 2018, he played 18 tournaments and finished under par only twice. He missed more cuts than he made as well. I could be mistaken, but it seems that he got into some majors via a sponsor in 2017 and 2018 and managed to do alright in them. He also ended up winning one of the tournaments he played in last year.

When researching OWGR to figure out how it came about and how it is calculated, I learned a lot. Basically, it's nothing more than party planning. A golf course in Scotland wanted to figure out whom to invite to compete in their tournament and invented the system. It weighs the strength of the field very heavily when awarding points, and the strength of the field is, you guessed it, determined by people already ranked by the system. So if Dustin Johnson cloned himself and kept playing tournaments exclusive to him and his equally ranked clones, they'd forever hold onto the top rankings. If OWGR were an Excel sheet, the creator would get an error popup upon loading it up each day due to circular references. So Satoshi, I'm sure, is a great golfer, as anyone there should be, but his ranking is very artificially skewed upward because he managed to make the cut and finish around 50th in some really packed majors that had a lot of heavy hitters.

In fact, the ranking system is so completely absurd that any millionaire can get themselves world ranked pretty easily. You just need to do something like sponsor a Pro-Am at some odd but counted tour like the Alps Tour, invite the guys ranked 1st, 2nd and 3rd to compete, and fill out the rest of the field with toddlers and yourself. You would be assured a 4th place finish. Yet you didn't beat any of the top 3 golfers in the world. You just beat 100 toddlers. You still get the high ranking because the top 3 contribute 45, 37 and 32 points respectively to strength of field, which is greater than if you had a tournament of the golfers ranked 93rd through 200th playing. Finishing 4th behind the only 3 adults and beating 100 toddlers has the same impact as finishing 4th in a field of 107 of the greatest golfers in the world. http://www.owgr.com/about

Finishing 4th and beating 100 toddlers will grant you the same amount of points as finishing 20th at a major. That's how utterly stupid this rating system is. Obviously I'm using some extreme edge cases, it's very likely they would see through that scheme and not count it, but you get the idea of how inconsistent the system is. If you simply altered the PGA tour to the top 3 golfers and then a bunch of amateurs, those amateurs would soon arbitrarily be some of the highest rated in the world themselves, thus feeding itself.

This is why I call my OWGR model Ouroboros https://en.wikipedia.org/wiki/Ouroboros
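The toddler tournament can be sketched as a strength-of-field sum. The only rating values used are the three quoted above (45, 37 and 32 for the players ranked 1st, 2nd and 3rd). Treating unranked entrants as contributing nothing is my assumption, and the function name is made up for this example.

```python
# Rating values for the top three ranked players, as quoted in the post.
# Values for any other rank are deliberately left out rather than guessed.
RATING_VALUE = {1: 45, 2: 37, 3: 32}


def strength_of_field(world_ranks):
    """Sum the rating value of every entrant; None marks an unranked entrant.

    Unranked entrants are assumed to add nothing. Ranks missing from
    RATING_VALUE raise KeyError instead of being silently invented.
    """
    total = 0
    for rank in world_ranks:
        if rank is None:
            continue  # unranked: assumed to contribute zero
        total += RATING_VALUE[rank]
    return total


# Three stars, plus the sponsor and 100 toddlers (all unranked).
stacked_field = [1, 2, 3] + [None] * 101
stars_only = [1, 2, 3]

stacked_strength = strength_of_field(stacked_field)
stars_strength = strength_of_field(stars_only)
```

The field strength comes out the same whether or not the toddlers show up. The sum only cares who is in the field, not who the 4th place finisher actually had to beat.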

Dustin Johnson doesn't play defense. He isn't jumping out of the sand trap and blocking your approach shot. Him finishing in front of you has zero impact on how well you performed compared to him. Yet if you simply show up and play in enough events where he easily beats you, you'll end up with a solid world ranking. This is an absurd system. When I researched OWGR, I was simply shocked to learn that it started as some random guy's invitation list for a tournament, and because golf feels the need to be so full of tradition, they just made that the official world rankings.

Don't get me wrong, the top OWGR guys are all very good DFS plays because they are winners. However, after a certain point you're not dealing with anything at all reliable. I'm not sure at which point it gets diluted, but after a certain point, that metric becomes just as unstable as Bitcoin. I find it very amusing that the indicator that showed me the flaws with OWGR after a certain stage is named Satoshi. I'm also fully aware of how difficult it is to quantify something so intangible as golf. However, there's no doubt in my mind that there must be a significantly better manner than what is currently used.

But whether my hunch is right or wrong, we still have a system where the PGA actively pretends they are counting it all via a long since discontinued system, and yet nobody is asking anything about it. That's something everyone should be aware of as they set their lineups.

Good luck everyone. Will dive deeper into the shots gained after I get around to buying and reading the book and finally finish analyzing that data. I could very well come back here in two weeks apologizing for my ignorance that gave me the gall to question such genius. In the meantime, good luck grinding out there and I'll post again in a few days with my player pool for the next event.

