The post below, plus all of the talking heads you hear/see in the media, inspired me to actually study the numbers and really see - what the hell is wrong with C.C. Sabathia? My limited knowledge of baseball statgeekery tells me he's pretty damn unlucky (high BABIP, or batting average on balls in play). Of course, that assumes he still has the skills that won him the Cy Young last year, or even those of an average major league pitcher. BABIP is what this analysis is built around: his was .314 in 2007, and the major league average this year is .288. This is purely analytical: I haven't seen him throw once this year.
| Date | Opp | BAS | IP | H | SO | HR | BB | BABIP | WHIP | ER | B+W ER |
| 31-Mar | CHW | 0.222 | 5.33 | 6 | 7 | 2 | 3 | 0.308 | 1.69 | 5 | 5.0 |
| 5-Apr | @OAK | 0.251 | 5.33 | 6 | 2 | 1 | 4 | 0.278 | 1.88 | 4 | 4.7 |
| 11-Apr | OAK | 0.257 | 3.33 | 12 | 4 | 0 | 2 | 0.923 | 4.20 | 9 | 9.4 |
| 16-Apr | DET | 0.255 | 4 | 8 | 1 | 2 | 5 | 0.353 | 3.25 | 9 | 8.1 |
| 22-Apr | @KCR | 0.237 | 6 | 4 | 11 | 0 | 2 | 0.400 | 1.00 | 0 | -0.1 |
| 27-Apr | NYY | 0.254 | 8 | 4 | 8 | 1 | 1 | 0.158 | 0.63 | 1 | 0.8 |
| 3-May | KCR | 0.262 | 6.33 | 10 | 4 | 0 | 1 | 0.417 | 1.74 | 4 | 4.1 |
The field BAS is 'Batting Average Split' - what that team has hit at home or on the road (whichever applies to the game) so far in 2008. B+W ER is the output of a prediction formula fit on BAS and WHIP, with ER as the target (stat geeks: Rsq=.98 RsqA=.88). I haven't tried this formula on any other pitchers yet, or on more data, but it seems to work well with this set.
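A quick sanity check on those two fit numbers, as a rough sketch. It assumes RsqA is the standard adjusted R² (the post doesn't say how it was computed) and that the fit used the 7 starts in the table; the function name `adjusted_r2` is just for illustration.

```python
def adjusted_r2(r2, n, p):
    """Standard adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

R2 = 0.98       # Rsq quoted in the post
N_STARTS = 7    # rows in the game log (assumed to be the fitting set)

# Which predictor counts would turn Rsq = .98 into RsqA = .88?
matches = [p for p in range(1, 6)
           if round(adjusted_r2(R2, N_STARTS, p), 2) == 0.88]

for p in range(1, 6):
    print(p, round(adjusted_r2(R2, N_STARTS, p), 2))
print("predictor counts consistent with RsqA = .88:", matches)
```

Under those assumptions, the gap between the two numbers says the formula is carrying a lot of terms for seven data points, which is one more reason to try it on other pitchers before leaning on it.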
| Date | H | WHIP | ER | Ha | WHIPa | B+W ERa |
| 31-Mar | 6 | 1.69 | 5 | 4.3 | 1.37 | 2.0 |
| 5-Apr | 6 | 1.88 | 4 | 5.5 | 1.78 | 4.4 |
| 11-Apr | 12 | 4.20 | 9 | 3.7 | 1.72 | 4.1 |
| 16-Apr | 8 | 3.25 | 9 | 5.5 | 2.62 | 6.5 |
| 22-Apr | 4 | 1.00 | 0 | 2.9 | 0.81 | -1.3 |
| 27-Apr | 4 | 0.63 | 1 | 5.8 | 0.85 | 1.5 |
| 3-May | 10 | 1.74 | 4 | 6.9 | 1.25 | 3.4 |
Actually doing the work and regressing the BABIP in each of C.C.'s starts so far to the league average of .288, we get an adjusted number of hits (Ha), and from that an adjusted WHIP (WHIPa) and adjusted B+W ER (B+W ERa). However, we'd still be missing something.
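For anyone who wants to retrace the Ha and WHIPa columns, here is a minimal sketch. The exact formula isn't spelled out in the post, so this is one reading that happens to line up with the table: back out the balls in play from the listed BABIP, add the home runs back in, and charge the league .288 against that total. The helper names are hypothetical, and IP is written as outs (5.33 IP = 16 outs).

```python
LEAGUE_BABIP = 0.288

# (date, outs recorded, H, HR, BB, BABIP), copied from the game log above
starts = [
    ("31-Mar", 16, 6, 2, 3, 0.308),
    ("5-Apr", 16, 6, 1, 4, 0.278),
    ("11-Apr", 10, 12, 0, 2, 0.923),
    ("16-Apr", 12, 8, 2, 5, 0.353),
    ("22-Apr", 18, 4, 0, 2, 0.400),
    ("27-Apr", 24, 4, 1, 1, 0.158),
    ("3-May", 19, 10, 0, 1, 0.417),
]

def adjusted_hits(h, hr, babip):
    # Assumption: balls in play are recovered from the listed BABIP,
    # and the league rate is applied to balls in play plus home runs.
    balls_in_play = round((h - hr) / babip)
    return LEAGUE_BABIP * (balls_in_play + hr)

def adjusted_whip(ha, bb, outs):
    # Same WHIP definition as the table, (hits + walks) / IP, with Ha for H.
    return (ha + bb) / (outs / 3)

ha_column = []
whipa_column = []
for date, outs, h, hr, bb, babip in starts:
    ha = adjusted_hits(h, hr, babip)
    ha_column.append(round(ha, 1))
    whipa_column.append(adjusted_whip(ha, bb, outs))
    print(date, round(ha, 1), round(whipa_column[-1], 2))
```

Note that this reading also regresses the home runs along with everything else, so it is a sketch of the arithmetic rather than a textbook BABIP adjustment.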
| Date | IP | WHIP | ER | WHIPa | B+W ERa | IPa50 | WHIPa50 | B+W ERa50 | IPa75 | WHIPa75 | B+W ERa75 |
| 31-Mar | 5.33 | 1.69 | 5 | 1.37 | 2.0 | 5.3 | 1.37 | 2.0 | 5.7 | 1.29 | 1.2 |
| 5-Apr | 5.33 | 1.88 | 4 | 1.78 | 4.4 | 5.3 | 1.78 | 4.4 | 5.3 | 1.78 | 4.4 |
| 11-Apr | 3.33 | 4.20 | 9 | 1.72 | 4.1 | 4.7 | 1.23 | 2.9 | 5.3 | 1.08 | 2.5 |
| 16-Apr | 4 | 3.25 | 9 | 2.62 | 6.5 | 4.3 | 2.42 | 6.0 | 4.3 | 2.42 | 6.0 |
| 22-Apr | 6 | 1.00 | 0 | 0.81 | -1.3 | 6.0 | 0.81 | -1.3 | 6.0 | 0.81 | -1.3 |
| 27-Apr | 8 | 0.63 | 1 | 0.85 | 1.5 | 7.7 | 0.88 | 1.6 | 7.3 | 0.92 | 1.7 |
| 3-May | 6.33 | 1.74 | 4 | 1.25 | 3.4 | 6.7 | 1.19 | 3.3 | 7.0 | 1.13 | 3.3 |
Each one of those hits that "Shouldn't Be" dropping in should actually turn into an out. That should prolong the start (IP), albeit not necessarily at a 1:1 ratio. IPa50 is the adjusted number of IP if 50% of the "new outs" go into prolonging the start, and IPa75 is the same for 75%. At 50% his season total of IP shoots up to 40 from 32.
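The IPa50 and IPa75 columns can be retraced with a short sketch along these lines. The rounding rule is an assumption inferred from the table, not something the post states: the share of "new outs" is rounded down to whole outs before being added to the start. The function name is hypothetical, and Ha is taken from the adjusted table above.

```python
import math

# (date, outs recorded, H, Ha): 5.33 IP = 16 outs, Ha from the adjusted table
starts = [
    ("31-Mar", 16, 6, 4.3),
    ("5-Apr", 16, 6, 5.5),
    ("11-Apr", 10, 12, 3.7),
    ("16-Apr", 12, 8, 5.5),
    ("22-Apr", 18, 4, 2.9),
    ("27-Apr", 24, 4, 5.8),
    ("3-May", 19, 10, 6.9),
]

def adjusted_outs(outs, h, ha, share):
    # "New outs" are the hits that regression takes away (H - Ha).
    # Assumption: only whole outs extend the start, rounded down.
    # A lucky start (Ha > H) loses outs instead of gaining them.
    return outs + math.floor(share * (h - ha))

ipa50 = [round(adjusted_outs(o, h, ha, 0.50) / 3, 1) for _, o, h, ha in starts]
ipa75 = [round(adjusted_outs(o, h, ha, 0.75) / 3, 1) for _, o, h, ha in starts]

total_outs_50 = sum(adjusted_outs(o, h, ha, 0.50) for _, o, h, ha in starts)
total_outs_75 = sum(adjusted_outs(o, h, ha, 0.75) for _, o, h, ha in starts)

print("IPa50 per start:", ipa50, "season:", total_outs_50 / 3)
print("IPa75 per start:", ipa75, "season:", total_outs_75 / 3)
```

Working in outs rather than decimal innings avoids the usual trap of treating 5.33 as a plain decimal when adding up a season.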
| | IP | BB | K |
| 2008 | 32 | 18 | 37 |
| Adj1 | 40 | 22 | 45 |
| Adj2 | 43 | 24 | 48 |
The last bit we can adjust is the strikeouts and walks: hold his K and BB rates per inning constant over the additional 8 innings (Adj1). Those new Ks are outs too, so we have to add a few more IP for them, and then a few more Ks for those IP - the last line (Adj2) is basically an estimate (more so than the rest of this post).
| 2008 Season | ERA | K | BB | WHIP |
| Unlucky C.C. | 7.52 | 37 | 18 | 1.77 |
| Statistical C.C. | 3.95 | 48 | 24 | 1.36 |
Lady Luck is a bitch.