Great article as always, although it's strange to fathom that a 4-5 win season qualifies as a "good" season.
One idea regarding the plot: it almost feels like the x-axis should be flipped. It looks like the y-axis represents offense, so good offense = higher score = toward the top of the y-axis. Conversely, the x-axis represents defense, but good defense = lower Dim1 score = toward the left of the chart (hence "good games" go in the top left). Intuitively, it feels like good games should belong in the top right quadrant, and the axis labels could be 'offense' and 'defense' instead of Dim1 and Dim2 (since it's not quite apparent to me what a Dim1 score of -2.5 means).
Just a thought!
We're definitely grading on a curve as to what "The Good" is here, because a good set of grades for ~.500 Cal would look very different from a good set of grades for conference contenders Utah or Washington.
In fact, I originally used PFF's season-level data and the results were much rosier. I'm not sure why the season-level grades are much better than the average across the games. For example, PFF gives the 2023 Cal team a grade over 90 for running even though only a single game has a grade higher than 90.
That's a good point about the plot. I've added a paragraph before it to better explain how to interpret it. The tl;dr is that the axes represent the dimensions that capture the most and second-most variation in the data. That allows the plot to provide the most insight into what differentiates the data points. So there isn't a tangible interpretation of what a specific position in the plot means, but you can start to notice trends across different areas of the plot (like how offense tends to be better at the top than at the bottom). Interestingly, the axes can sometimes flip as more data is added, because the dimensions are recalculated to best visualize the new set of data. That happened earlier this year when the y-axis flipped a few games ago (see our first cluster plot for comparison: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8ede6c-cd96-41a2-b553-adbee09e8fab_3000x3000.png).
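For anyone curious how axes like these behave, here is a rough sketch in Python, assuming the dimensions come from something like a principal component analysis (the thread doesn't say which method or tool the plot actually uses). The grade table is made-up random data, and `pca_2d` and `orient` are hypothetical helper names for illustration only. It shows that flipping an axis doesn't change how much variation it captures, which is why a flip is harmless, and how an axis could be oriented so that "higher" lines up with a chosen grade.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the two directions of greatest variation."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:2]                          # top two directions
    scores = Xc @ components.T                   # plot coordinates (Dim1, Dim2)
    explained = (S[:2] ** 2) / np.sum(S ** 2)    # share of total variation
    return scores, components, explained

def orient(scores, components, X, feature_idx, axis):
    """Flip one axis if needed so it rises with the chosen column of X."""
    corr = np.corrcoef(scores[:, axis], X[:, feature_idx])[0, 1]
    scores, components = scores.copy(), components.copy()
    if corr < 0:
        scores[:, axis] *= -1
        components[axis] *= -1
    return scores, components

# Made-up data: 12 games x 4 grade columns (not real PFF grades).
rng = np.random.default_rng(0)
X = rng.normal(loc=65, scale=10, size=(12, 4))

scores, components, explained = pca_2d(X)

# Mirroring Dim1 leaves the spread along each axis unchanged.
flipped = scores * np.array([-1.0, 1.0])
same_spread = np.allclose(scores.var(axis=0), flipped.var(axis=0))

# Orient Dim1 so that higher values go with higher values of column 0.
oriented, oriented_components = orient(scores, components, X, feature_idx=0, axis=0)
```

Orienting the axes this way would put "better" in a consistent direction, but the axes would still be blends of all the grades rather than pure 'offense' and 'defense' scales.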