Monday, June 2, 2008

BABIP Update

Update to the post a few below, to see how that group of pitchers did in May:

Name Team Apr IP Apr ERA Apr WHIP Apr BABIP May ERA May WHIP
S.Downs TOR 13.3 3.38 1.65 0.351 0 0.64
J.Bergmann WAS 12.3 11.68 1.78 0.376 1.3 0.9
J.Valverde HOU 17 5.29 1.47 0.341 1.29 0.93
J.Hanrahan WAS 16.7 5.4 1.86 0.38 2.37 0.95
B.Logan CHI-A 12 3.75 1.5 0.357 0 1.08
C.C.Sabathia CLE 38.3 7.51 1.77 0.361 2.44 1.11
C.Billingsley LA 33.7 4.54 1.57 0.344 1.89 1.13
M.Delcarmen BOS 12.3 7.3 1.7 0.36 3.27 1.26
J.Burton CIN 14.7 4.3 1.43 0.36 2.6 1.44
P.Feliciano NY-N 10 0.9 1.8 0.344 3.75 1.44
R.Betancourt CLE 12.7 6.39 1.58 0.358 4.91 1.45
R.Ohlendorf NY-A 21 5.14 1.48 0.333 6.94 1.89
J.Albaladejo NY-A 10.3 5.23 1.45 0.356 DL DL

Tuesday, May 20, 2008

Colbert Celebrates Craft Beer Week

Thursday, May 8, 2008

Regression to the Mean

I have read several posts by various authors that all say the same thing - pitchers with a BABIP (Batting Average on Balls In Play) that is either too high or too low will 'regress to the mean', meaning their results will get better or worse. I've never seen a single one of these authors try to regress a pitcher or two and alter their stats - or rerank the whole league based on this "regression." There are other stats that account for fielding (FIP) or ballpark factors (ERA+) but none, to my knowledge, that even out BABIP. Keep in mind - major league pitchers are not all equal. They might appear that way, but it's mostly because the talent level is very high amongst all players (except for Bobby Ayala).

In most fantasy leagues, WHIP and ERA are stats to be concerned about. For the fits below, I took stats through May 7th and isolated just the starters (6+ GS).

The WHIP fit is close to perfect - it could be totally perfect if we substituted HR/9 for HR/FB (BABIP leaves home runs out, so we have to add the HRs back in) but then we wouldn't be able to 'regress' that stat.

ERA is trickier to model - at this point there are still some statistical outliers (yay Cliff Lee) that throw the fit off. Even excluding these guys, the predicted values can get out of control. In this fit I've added GB/FB.

The ML BABIP so far this year is .292, and the HR/FB % is 9.6. Here are your adjusted leaderboards:



Name WHIP WHIPf WHIPm
Cliff Lee 0.604 0.611 0.934
Roy Halladay 0.982 1.012 1.093
Dan Haren 0.992 0.977 1.094
Johan Santana 1.014 1.024 1.102
Tim Hudson 1.047 1.039 1.160
Jesse Litsch 1.35 1.346 1.177
Brett Myers 1.364 1.315 1.180
John Danks 1.038 1.036 1.186
Jake Peavy 1.027 1.029 1.196
* WHIPf - model fit; WHIPm - model adjusted to league average

Name ERA ERAf ERAm
Edinson Volquez 1.06 1.23 2.38
Johan Santana 2.91 2.81 2.51
Tim Lincecum 1.49 2.10 2.66
Roy Oswalt 5.33 5.08 2.72
Felix Hernandez 3.04 3.55 2.89
Jair Jurrjens 2.84 1.98 2.96
Jake Peavy 2.22 2.41 2.97
Cliff Lee 0.81 1.00 2.97
Brandon Webb 2.49 2.28 3.11
* ERAf - model fit; ERAm - model adjusted to league average

Cliff Lee is legitimately one of the top-10 pitchers in MLB, probably top-5. Jurrjens is a surprise - but the ERA formula doesn't look to fit him well.



Rank Name WHIPd
1 Cliff Lee 54.6%
2 Ryan Dempster 35.6%
3 Ben Sheets 34.8%
4 Shaun Marcum 34.0%
5 Adam Wainwright 24.6%
6 Tim Wakefield 19.5%
7 Brandon Webb 19.5%
8 Zack Greinke 17.3%
9 Jake Peavy 16.4%
10 Ian Snell -16.3%
58 Ian Snell -16.3%
59 Clay Buchholz -16.8%
60 Mark Redman -17.3%
61 Kevin Millwood -17.6%
62 Roy Oswalt -18.9%
63 Matt Chico -22.4%
64 Manny Parra -22.6%
65 C.C. Sabathia -25.4%
66 Andrew Miller -31.5%
67 Bronson Arroyo -33.9%



Rank Name ERAp ERAe
1 Cliff Lee 266.5% 0.23
2 Edinson Volquez 124.5% 0.16
3 Carlos Zambrano 111.3% 0.72
4 Zack Greinke 111.2% 0.69
5 Tim Lincecum 78.3% 0.41
6 Ben Sheets 63.7% 0.11
7 Adam Wainwright 52.3% 0.09
8 Tim Wakefield 50.2% 0.28
9 Ryan Dempster 50.1% 0.06
10 Vicente Padilla 39.9% 0.48
58 Jered Weaver -22.7% -0.06
59 Clay Buchholz -23.5% -0.08
60 Brett Myers -23.7% 0.11
61 Mark Buehrle -31.3% -0.05
62 Matt Chico -37.3% -0.05
63 Mark Redman -43.7% -0.26
64 Andrew Miller -44.4% -0.06
65 Roy Oswalt -49.0% -0.05
66 C.C. Sabathia -49.3% -0.09
67 Bronson Arroyo -56.3% 0.04
* Bold indicates the error might be pretty high in that prediction.

The most artificially inflated, and deflated, numbers in the game. Negative means you're playing better than ERA and WHIP indicate, and it looks like Bronson Arroyo is the poster child of screwedism thus far (due for the biggest correction in a good way). I'm pretty sure I haven't been reading any fantasy blogs telling you to go pick that dude up.
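WHIPd, ERAp and ERAe aren't defined anywhere in the post, but the numbers in the tables are consistent with simple relative differences: WHIPd and ERAp look like the percent change from the actual stat to the league-adjusted one, and ERAe looks like the relative gap between the model fit and the actual ERA. A minimal sketch under that assumption (the function names are made up for illustration; the rows are copied from the leaderboards above):

```python
def pct_change(actual, adjusted):
    """Percent change from the actual stat to the adjusted one."""
    return 100.0 * (adjusted - actual) / actual


def fit_error(actual, fitted):
    """Relative gap between the model fit and the actual stat (assumed ERAe)."""
    return (fitted - actual) / actual


# (name, actual ERA, ERAf, ERAm) copied from the ERA leaderboard
era_rows = [
    ("Edinson Volquez", 1.06, 1.23, 2.38),
    ("Tim Lincecum", 1.49, 2.10, 2.66),
    ("Roy Oswalt", 5.33, 5.08, 2.72),
]

for name, era, era_f, era_m in era_rows:
    # Positive: ERA should rise after adjustment; negative: it should fall.
    print(f"{name}: ERAp {pct_change(era, era_m):.1f}%  ERAe {fit_error(era, era_f):.2f}")

# Same idea for WHIP: Cliff Lee, actual 0.604 vs league-adjusted 0.934
print(f"Cliff Lee: WHIPd {pct_change(0.604, 0.934):.1f}%")
```

Read this way, a large positive number marks a pitcher whose line looks better than the adjusted model says it should, and a large negative number marks one who has been unlucky.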

Wednesday, May 7, 2008

So...

They must have read my blog and put Hanrahan in a pressure situation. Oz-obviously he blew it.

Stats don't predict whether you have a sack or not.

C.C. Sabathia Is One Unlucky S.O.B.

The post below, plus all of the talking heads you hear/see in the media, inspired me to actually study the numbers and really see - what the hell is wrong with C.C. Sabathia? My limited knowledge of baseball statgeekery tells me he's pretty damn unlucky (high BABIP). Of course, this assumes he still has the skills that won him the Cy Young last year, or even those of an average major league pitcher. His BABIP (the stat this analysis is built around) was .314 in 2007. The major league average this year is .288. This is purely analytical: I haven't seen him throw once this year.

Date Opp BAS IP H SO HR BB BABIP WHIP ER B+W ER
31-Mar CHW 0.222 5.33 6 7 2 3 0.308 1.69 5 5.0
5-Apr @OAK 0.251 5.33 6 2 1 4 0.278 1.88 4 4.7
11-Apr OAK 0.257 3.33 12 4 0 2 0.923 4.20 9 9.4
16-Apr DET 0.255 4 8 1 2 5 0.353 3.25 9 8.1
22-Apr @KCR 0.237 6 4 11 0 2 0.400 1.00 0 -0.1
27-Apr NYY 0.254 8 4 8 1 1 0.158 0.63 1 0.8
3-May KCR 0.262 6.33 10 4 0 1 0.417 1.74 4 4.1

The field BAS is 'Batting Average Split' - what that team has hit at home or on the road (whichever applies to that game) so far in 2008. B+W ER is the result of a prediction formula fit to BAS and WHIP with ER as the result (stat geeks: Rsq=.98 RsqA=.88). I haven't tried this formula on any other pitchers yet, or on more data, but it seems to work well with this set.
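For anyone wondering why Rsq and RsqA are so far apart: assuming RsqA means the usual adjusted R-squared, the gap comes from having only seven starts to fit. The sketch below uses the textbook formula; the number of predictors in the B+W ER formula is not stated in the post, so the loop just tries a few values.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than fitted parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# n = 7 starts in the game log; p is an ASSUMPTION, since the post doesn't give it.
for p in (2, 3, 4, 5):
    print(p, round(adjusted_r2(0.98, 7, p), 2))
```

With seven data points, every extra term in the formula eats a large share of the remaining degrees of freedom, which is one more reason to test it on other pitchers before trusting it.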

Date H WHIP ER Ha WHIPa B+W ERa
31-Mar 6 1.69 5 4.3 1.37 2.0
5-Apr 6 1.88 4 5.5 1.78 4.4
11-Apr 12 4.20 9 3.7 1.72 4.1
16-Apr 8 3.25 9 5.5 2.62 6.5
22-Apr 4 1.00 0 2.9 0.81 -1.3
27-Apr 4 0.63 1 5.8 0.85 1.5
3-May 10 1.74 4 6.9 1.25 3.4

Actually doing the work and regressing the BABIPs in C.C.'s games so far to the league average of .288, we can get an adjusted number of hits (Ha) and therefore WHIP and B+W ER. However, we'd still be missing something.
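The Ha and WHIPa columns can be reproduced from the game log with a few lines. This is a sketch of what the numbers appear to do, not a confirmed formula: the balls-in-play count is backed out of the listed BABIP, and the table values match when the league rate of .288 is applied to balls in play plus home runs. The function and variable names are made up for illustration.

```python
LEAGUE_BABIP = 0.288

# (date, outs recorded, H, HR, BB, BABIP) from the game log; 5.33 IP = 16 outs
starts = [
    ("31-Mar", 16, 6, 2, 3, 0.308),
    ("5-Apr", 16, 6, 1, 4, 0.278),
    ("11-Apr", 10, 12, 0, 2, 0.923),
    ("16-Apr", 12, 8, 2, 5, 0.353),
    ("22-Apr", 18, 4, 0, 2, 0.400),
    ("27-Apr", 24, 4, 1, 1, 0.158),
    ("3-May", 19, 10, 0, 1, 0.417),
]


def adjusted_line(outs, hits, hr, bb, babip, league=LEAGUE_BABIP):
    """Return (Ha, WHIPa) for one start after regressing BABIP to the league rate."""
    innings = outs / 3
    balls_in_play = (hits - hr) / babip       # back out the BABIP denominator
    # ASSUMPTION: the Ha column matches league * (balls in play + HR),
    # i.e. the home runs are regressed along with everything else.
    hits_adj = league * (balls_in_play + hr)
    whip_adj = (hits_adj + bb) / innings
    return hits_adj, whip_adj


for date, outs, hits, hr, bb, babip in starts:
    ha, whip_a = adjusted_line(outs, hits, hr, bb, babip)
    print(f"{date}: Ha {ha:.1f}  WHIPa {whip_a:.2f}")
```

A stricter version would leave home runs alone and use `league * balls_in_play + hr`, which gives a higher hit total on the days he was taken deep.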

Date IP WHIP ER WHIPa B+W ERa IPa50 WHIPa50 B+W ERa50 IPa75 WHIPa75 B+W ERa75
31-Mar 5.33 1.69 5 1.37 2.0 5.3 1.37 2.0 5.7 1.29 1.2
5-Apr 5.33 1.88 4 1.78 4.4 5.3 1.78 4.4 5.3 1.78 4.4
11-Apr 3.33 4.20 9 1.72 4.1 4.7 1.23 2.9 5.3 1.08 2.5
16-Apr 4 3.25 9 2.62 6.5 4.3 2.42 6.0 4.3 2.42 6.0
22-Apr 6 1.00 0 0.81 -1.3 6.0 0.81 -1.3 6.0 0.81 -1.3
27-Apr 8 0.63 1 0.85 1.5 7.7 0.88 1.6 7.3 0.92 1.7
3-May 6.33 1.74 4 1.25 3.4 6.7 1.19 3.3 7.0 1.13 3.3

Each one of those hits that "Shouldn't Be" dropping in should actually turn into an out. That should prolong the start (IP), albeit not necessarily at a 1:1 ratio. IPa50 is the adjusted number of IP if 50% of the "new outs" go into prolonging the start, and IPa75 is the same for 75%. At 50% his season total of IP shoots up to 40 from 32.
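The IPa50 and IPa75 columns work out if the "new outs" are counted in whole outs and rounded down. That rounding rule is an assumption inferred from the table, not something the post states. A minimal sketch, with H and Ha taken from the earlier table and innings stored as outs:

```python
import math

# (date, outs recorded, H, Ha) per start; 5.33 IP = 16 outs
starts = [
    ("31-Mar", 16, 6, 4.3),
    ("5-Apr", 16, 6, 5.5),
    ("11-Apr", 10, 12, 3.7),
    ("16-Apr", 12, 8, 5.5),
    ("22-Apr", 18, 4, 2.9),
    ("27-Apr", 24, 4, 5.8),
    ("3-May", 19, 10, 6.9),
]


def adjusted_outs(outs, hits, hits_adj, share):
    """Outs recorded if `share` of the hits removed by regression become extra outs."""
    # ASSUMPTION: whole outs, rounded down (a lucky start can lose outs too).
    new_outs = math.floor(share * (hits - hits_adj))
    return outs + new_outs


for share in (0.50, 0.75):
    total = sum(adjusted_outs(o, h, ha, share) for _, o, h, ha in starts)
    print(f"{int(share * 100)}%: {total} outs = {total / 3:.1f} IP")
```

Note the 27-Apr start goes the other way: his BABIP that day was below .288, so regression adds hits and the start gets shorter.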


IP BB K
2008 32 18 37
Adj1 40 22 45
Adj2 43 24 48

The last bit we can adjust is to keep the same K:BB ratio per IP - as well as adding in more IP for the number of Ks missing from the additional 8 innings (Adj1). Then we have to add a few more IP for those new Ks, and a few more Ks for those IP - the last line (Adj2) is basically an estimate (more so than the rest of this post).

2008 Season ERA K BB WHIP
Unlucky C.C. 7.52 37 18 1.77
Statistical C.C. 3.95 48 24 1.36

Lady Luck is a bitch.

Tuesday, May 6, 2008

K/BR

In my attempt to be really innovative, insightful, and dig my fantasy baseball team out of the gutter, I'm looking for An Edge. I guess that means I'm trying to find players that no one else sees coming. For some reason I find this easier to do with pitchers - and actually - when you look around the fantasy baseball Blog-O-Sphere you'll see that many people find they have similar insights. Knowing that no one in my fantasy league, with the exception of Donald, reads this blog, I'm essentially posting this here for a record to see if this shit has any merit whatsoever. I figure, hey, I do statistics for a living, why not?

One stat that many look at is BABIP (batting average on balls in play) which, interestingly enough, is mostly similar across pitchers but can vary more wildly for batters. Along with K:BB, this lets you 'see through' the False Golden Idol that is ERA.

So, I'll try a couple of things in this blog soon, trying to target people no one is talking about. First is K/BR, strikeouts per baserunner. The filters for the list below: K/BR>=0.5, IP>=10, and BABIP>=.330.

Name Team G IP ER HR K:BB GB/FB ERA WHIP K/9 BABIP K/BR
J.Hanrahan WAS 13 16.7 10 1 24:14 1.60 5.40 1.86 12.9 0.380 0.774
J.Bergmann WAS 3 12.3 16 5 12:2 0.56 11.68 1.78 8.8 0.376 0.545
C.C.Sabathia CLE 7 38.3 32 6 37:18 1.05 7.51 1.77 8.7 0.361 0.544
J.Burton CIN 13 14.7 7 3 21:5 1.30 4.30 1.43 12.9 0.360 1.000
M.Delcarmen BOS 16 12.3 10 2 12:5 2.00 7.30 1.70 8.8 0.360 0.571
R.Betancourt CLE 14 12.7 9 4 13:2 0.71 6.39 1.58 9.2 0.358 0.650
B.Logan CHI-A 14 12 5 0 9:3 0.63 3.75 1.50 6.8 0.357 0.500
J.Albaladejo NY-A 5 10.3 6 1 11:3 1.00 5.23 1.45 9.6 0.356 0.733
S.Downs TOR 14 13.3 5 1 14:7 2.38 3.38 1.65 9.5 0.351 0.636
C.Billingsley LA 8 33.7 17 2 44:21 1.38 4.54 1.57 11.8 0.344 0.830
P.Feliciano NY-N 17 10 1 1 9:6 1.22 0.90 1.80 8.1 0.344 0.500
J.Valverde HOU 15 17 10 4 22:6 0.58 5.29 1.47 11.6 0.341 0.880
R.Ohlendorf NY-A 11 21 12 1 21:9 1.53 5.14 1.48 9.0 0.333 0.677
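The K/BR column can be rebuilt from the other columns if "baserunners" means hits plus walks, which is just WHIP times IP. That reading is an assumption, but it matches the table to within rounding. A minimal sketch of the stat and the three filters (function names are made up for illustration):

```python
def k_per_baserunner(k, ip, whip):
    """Strikeouts per baserunner, with baserunners (H + BB) recovered as WHIP * IP."""
    return k / (whip * ip)


def is_target(k, ip, whip, babip):
    """The three filters from the post: K/BR >= 0.5, IP >= 10, BABIP >= .330."""
    return ip >= 10 and babip >= 0.330 and k_per_baserunner(k, ip, whip) >= 0.5


# (name, K, IP, WHIP, BABIP) copied from the table above
pitchers = [
    ("J.Hanrahan", 24, 16.7, 1.86, 0.380),
    ("C.C.Sabathia", 37, 38.3, 1.77, 0.361),
    ("J.Burton", 21, 14.7, 1.43, 0.360),
]

for name, k, ip, whip, babip in pitchers:
    print(f"{name}: K/BR {k_per_baserunner(k, ip, whip):.3f}  target={is_target(k, ip, whip, babip)}")
```

Because WHIP and IP are already rounded in the table, the result can differ from the listed K/BR in the third decimal place.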

Of the starters:
  • Bergmann was deactivated
  • Sabathia has almost 1HR per start but still has a good GB/FB ratio (probably due for a big correction), plus he's owned in my league by the dude in first
  • I drafted/own Billingsley.
Others:
  • Hanrahan looks due for a big correction and has great numbers otherwise. Too bad they don't use him in any sort of valuable position. He has pitched in zero wins this year.
    His average fastball in his last appearance (May 2) was 96.7mph, topping out at 98.
  • Burton isn't owned in my league either, and his K/BR is the highest of the bunch. Max fastball was at 94 in his last appearance.
  • Russ Ohlendorf only plays in losses and blowouts, and he has kind of slow stuff as well.
  • Scott Downs is replacing Brian Tallet (also of Toronto) on my roster. He pitches in more games that matter lately, has a 92mph FB, 83mph SL, and 75mph CB. He gets way more groundballs than flyballs, and has only given up 1 HR so far this year.
I'll try to revisit these guys in a month.

Wednesday, April 30, 2008

Mos Def: I do believe in the bigfoot

Tuesday, April 29, 2008

Proving the Devil Invented Evolution