Smart Football: More on evaluating the run game

The discussion surrounding evaluating the run game was great. I will have more to add, but I wanted to highlight some of the best commentary. First, I did want to say that my focus was generally on two aspects, and I don't think I made that clear.

One, I really am more interested in running games, or a team's ability to run, than I am in one runningback versus another. I definitely play fantasy football myself, but it's not the reason I get interested in football stats. Instead I want to know how good an offense is, and then secondarily how good a particular play is; whether Barry Sanders or Emmitt Smith is better is usually not a discussion I get into. As a result I don't mind so much that it's hard to disassociate how good a runningback is from how good the line is, or the faking, etc. From an evaluation perspective, if you can analyze one play being better than another, then you can pretty easily ask if it is scheme or execution, and thus concepts or players.

Second, I do prefer to focus on easily observable stats. Some of this is maybe my laziness, but that's one big appeal of yards per carry, even though I know it has little application on third down. (One yard could be a success if it converts for a first down, and eight yards could be a failure if it was third and ten -- but then what if the draw was a good call rather than an interception or a sack? I digress.) This preference is mainly aimed at seemingly interesting stats that would be a practical nightmare to compile, ones based on charting every play and then subjectively interpreting how many guys a back bounced off of, or his vision and cutbacks versus contact, etc. -- you get the idea.

Anyway, Bill Connelly of Football Outsiders (and RockMNation) had actually discussed this fairly recently:

Regular Varsity Numbers readers have probably become familiar with some of the basic VN concepts, namely PPP (Points Per Play) and the "+". PPP is a measure of explosiveness--the amount of Equivalent Points (EqPts) averaged per play. The "+" number compares an offense's output to the output expected against a given defense, and vice versa. With the "+" number, 100 is average, anything above 100 is good, and anything below 100 is bad.

Points Over Expected

Is there any way to use these concepts to come up with a good rushing measure? Of course! Meet POE (Points Over Expected), the collegiate stepchild of DYAR. Whereas a rusher's PPP+ would compare his EqPts output to what would be expected, and is therefore great for measuring an offense's overall effectiveness, POE is cumulative. It is a comparison of a rusher's total EqPts to the Expected EqPt total, subtracting the latter from the former.

POE = EqPts - Expected EqPts. . . .

Most Varsity Numbers measures, in one way or another, bounce output versus expected output. POE, a brother to PPP and cousin to S&P and S&P+, does just that. POE, which intends to evaluate both per-play and cumulative success, could also be used to evaluate receivers and tight ends, but that will be hard without good "pass intended for _____" data (some college play-by-plays record detailed information in this regard, others do not). Right now, it is an RB-only figure, but it is a pretty good one.

Not sure I entirely buy this as the best method (requires getting into the nitty gritty of FO's methods), but overall this is a good starting spot. It tends to reward the explosive players.
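
To make the bookkeeping concrete, here is a minimal sketch of the POE subtraction in Python. The per-carry EqPts and expected-EqPts values are invented for illustration, and nothing here reflects how FO actually derives expected EqPts against a given defense. The `plus_rating` ratio is likewise only my assumption about how a "100 is average" index could be formed.

```python
def poe(eq_pts, expected_eq_pts):
    """Points Over Expected: total EqPts minus total expected EqPts."""
    return sum(eq_pts) - sum(expected_eq_pts)


def plus_rating(eq_pts, expected_eq_pts):
    """Assumed form of a '+' index: actual / expected, scaled so 100 is average."""
    return 100.0 * sum(eq_pts) / sum(expected_eq_pts)


# Hypothetical per-carry values for one back (not real data).
actual = [0.10, 0.45, 0.05, 1.20, 0.20]
expected = [0.25, 0.30, 0.20, 0.40, 0.25]

print(f"POE: {poe(actual, expected):+.2f}")
print(f"Plus rating: {plus_rating(actual, expected):.1f}")
```

Note that most of this made-up back's surplus comes from the single 1.20 carry, which is the "rewards the explosive players" effect in miniature.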

Moving to the comments, a few highlights, though all were excellent. Brad said:

I don't think getting long runs is the only way a back can improve his average. He can also do so by getting less short gains.

Think of a back that gets 3 yds minimum on slightly over half of his carries and gets 6 yds on the rest. Then compare him to a back that loses a yard on a third of his carries, gets 3 yards on a third of his carries, and gains 10 yards on a third of his carries.

Both backs have a median rush of 3 yds, but the first back averages around 5 yds per carry while the second one averages only 4. However the second back clearly has more "Big play potential" because he gains 10 yards on 1/3 of his runs.

My point is that a back can improve his average vs median both by getting more long gains OR by having less short runs. Which of these two things that great backs do is a question for the data.

I should have conceptualized this better in the first place, because this helps explain why Reggie Bush has been such a mediocre rusher in the NFL. It's not his explosiveness (though he hasn't broken many very long runs), but his routine bad plays. It also is why Emmitt Smith and Barry Sanders are so hard to compare: Barry's stat line was full of negative plays and small gains, but checkered with the spectacular long runs. Emmitt Smith, the opposite. (And I don't think with Barry it was all just jump and bad blocking; it was also just his running style. Do you think he would have fit in well with the Denver Broncos "one-cut-and-go" philosophy? People say, "Oh, if he had played for them he would have had 3,000 yards," but I'm not so sure.)
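
Brad's two hypothetical backs are easy to check with a few lines of Python. This is only a sketch: the 51/49 split for "slightly over half" and the 33 carries per outcome are my own choices, not his.

```python
from statistics import mean, median

# Back A: 3 yards on slightly over half of his carries, 6 yards on the rest.
back_a = [3] * 51 + [6] * 49
# Back B: loses 1, gains 3, or gains 10, a third of the time each.
back_b = [-1] * 33 + [3] * 33 + [10] * 33

for name, runs in (("A", back_a), ("B", back_b)):
    print(f"Back {name}: mean {mean(runs):.2f}, median {median(runs)}")
```

Both medians come out at 3 yards while the means differ, with Back A landing a little under 4.5 rather than 5 (Brad makes the same correction in the comments). The point stands either way: the back with no 10-yard runs has the higher average.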

Tom points me to another good bit from Football Outsiders, this time by Mike Tanier, quoted at length:

The 4.0-4.1 yard average is an arithmetic mean: add up all the yards, divide by the attempts. The arithmetic mean is easily skewed by extremes in data. A 75-yard run can increase a starting running back's rushing average by several tenths of a point by the end of a season. This skewing always increases rushing averages: there are several 50+ yard rushes every year, but no 50+ yard losses on running plays.

We all know that a few big plays can make a mediocre running back's rushing average look great. But how much effect do long gains have on the league rushing average? The best way to see this is to break down every running play by distance. . . . The table reveals a surprising fact: the mean carry may yield four yards, but the median carry yields only three yards, and the data distribution is centered at two yards. . . .

Over 20 percent of running plays gain zero or one yards. Factor in losses, and over one-fourth of all runs result in negative or negligible yardage. The rushing average for the plays in the -4-to-10 yard range in 2005 was 2.95 yards per attempt. Long runs make up only about nine percent of all rushing plays, but they increase the league rushing average by over 40 percent. . . .

As a way of negating the importance of team strength as well as studying the contrasts between rushing styles, let's examine a pair of teammates from 2005.

Last season, Tatum Bell gained 920 yards and averaged 5.3 yards per carry. Mike Anderson gained 1,014 yards but averaged just 4.2 yards per carry. Despite the wide disparity in yards per carry, DVOA and DPAR ranked Anderson as the better back. Anderson was 37.0 points above replacement level, Bell 16.4. Anderson was 20.3 percent better than the average back, Bell just 7.6 percent.

Bell's rushing average was inflated by several long runs: he had a 68, 67, and 55 yard run in 2005, plus several 35-yard runs. Anderson's longest carry of the season was 44 yards, and that was his only run longer than 25 yards. We all know that Bell is a "home run threat" while Anderson is more consistent. But is it really fair to downgrade Bell because of his long runs? We're inclined to downgrade Bell somewhat because so much of his value is contained in a few plays. But is that really fair? After all, gaining four yards at a time is great and all, but big plays are pretty important, too. . . .

Anderson's yardage distribution is centered in the 2-3 yard range, while Bell's is centered in the 1-2 yard range, giving Anderson a full yard-per-play advantage on carry after carry. Bell's advantage, of course, is on runs of more than 10 yards. All but 6.5 percent of Anderson's runs gain from -4 to 10 yards, while 10.5 percent of Bell's runs are outside the chart (he only lost five yards on one play last season). Give them both 200 carries, and Bell will have eight more long runs than Anderson, and those runs will be longer than what Anderson can usually muster. But Anderson will gain an extra yard that Bell couldn't on dozens of other runs. . . .

Anderson's in-the-box mean was 3.36 yards per attempt, noting again that his "box" is larger. Bell's was just 2.67. What's interesting is that we tend to think of backs like Anderson as "ordinary" while backs with Bell's big-play potential are held in higher esteem. But Bell's rushing distribution is more in line with the league norms than Anderson's. He's very good, but his contributions are typical of what backs around the league provide. Anderson, at least in 2005, was the unique player, providing hard-to-get, down-in, down-out production.

The difference between Bell and Anderson suggests that "cloud of dust" backs are more valuable than "boom or bust" backs, but we must be careful when using cheesy labels. Our perceptions of a back's production profile are often way off. How would you classify Marshall Faulk in his prime? Probably as a boom-or-bust back, albeit one with lots of boom and only a little bust.

But Faulk's running distributions show that in his prime he was much more than a big-play machine. . . .

Faulk's in-the-box mean was 3.37, a very good figure. What's more, his "box" only included 86 percent of his runs. Faulk had seven 12-yard runs, six 16-yard runs, and three 18-yard runs in 2000, giving him a very high percentage of 11-20 yard runs. But what's most remarkable about his production was his ability to avoid no-gainers and his above-average totals in the 3-5 yard range. Fast, shifty Faulk was just as good at using his skills to gain a yard or two as he was at burning defenses for long gains.

By contrast, [Jonathan] Stewart's ability to avoid losses and pick up two or three yards couldn't offset his complete lack of big-play potential. At first glance, Stewart's distribution looks similar to Anderson's. But his in-the-box mean of 2.8 is over a half-yard lower. The differences are subtle -- Anderson is a little more likely to gain five or six yards and a little less likely to lose yardage -- but they add up over a few hundred carries. And Stewart, like Anderson, concentrated 95 percent of his carries in the -4-to-10 yard range, so he had few 10-20 yard bursts to increase his productivity. Stewart, like Anderson, was providing a unique skill, which is why he was able to stay in the league for several years. Unlike Anderson, he wasn't a great exemplar of that skill, and the Football Outsiders metrics took him to task for it. . . .

Teams don't generate rushing yards in three-, four-, or five-yard bursts. They gain it through punctuated equilibrium, waiting through dozens of minimal gains for a few big plays per game.

And those big plays aren't that big. We've focused on gains of ten or less in this article, ignoring the 10.5 percent or so of plays that yield more yardage. The vast majority of those runs gain 11-20 yards: 6.9 percent overall. Almost 25 percent of the rushing yardage gained in the NFL is generated on runs of 11-20 yards. There were 960 such runs last year: 30 per team, or just over two per team per game. Amazingly, nearly 10 percent of all rushing yardage is generated on runs of 30 or more yards, plays which occur about four times per year for a typical team.

These distribution breakdowns are so interesting that they might seduce us into making some wacky conclusions. Keep in mind that all of these averages and distribution patterns are situation dependent. . . .

Without further study, we shouldn't leap to grand conclusions. But we know this much: if we expect to gain four or five yards on every running play, we're going to be disappointed most of the time. No wonder passing totals have been creeping up for decades. If all a handoff gets you is two yards and a cloud of dust, you might as well throw the ball.

Lots going on here, but it mostly just reinforces what we know: Backs and teams have different styles, and it is not always easy to compare them; you want a guy who (a) does not lose yardage, (b) consistently gets positive yardage, and (c) is a big-play threat. They don't always come that way, so it is interesting that Tanier and FO conclude that the consistent back is simply better than the big-play threat. I'd like to see more to support that -- i.e., that the "dozens of first downs" or extra yards Anderson might have pulled down for the team were worth more than Tatum Bell's big plays. I'm not saying I disagree, but that it is interesting. That kind of conclusion could have troubling implications for a guy like, say, Barry Sanders, or more so Reggie Bush.
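
For anyone who wants to try Tanier's breakdown on their own play-by-play data, here is a rough Python sketch of the "in-the-box" idea: the mean over runs in the -4-to-10 range, plus the share of carries that fall outside it. The function name, the dictionary keys, and both 20-carry samples are hypothetical; only the -4-to-10 box comes from the excerpt.

```python
from statistics import mean, median


def run_profile(runs, box=(-4, 10)):
    """Summarize carries: overall mean/median, in-the-box mean, share outside the box."""
    low, high = box
    in_box = [y for y in runs if low <= y <= high]
    return {
        "mean": mean(runs),
        "median": median(runs),
        "in_box_mean": mean(in_box),
        "outside_share": 1 - len(in_box) / len(runs),
    }


# Two made-up backs with 20 carries each (not real play-by-play data).
steady = [2, 3, 3, 4, 3, 2, 5, 4, 3, 1, 4, 3, 6, 2, 3, 4, 5, 3, 2, 12]
boom = [0, 1, -2, 2, 1, 0, 3, 1, 2, -1, 1, 2, 0, 4, 1, 2, 35, 1, 48, 2]

for name, runs in (("steady", steady), ("boom", boom)):
    p = run_profile(runs)
    print(
        f"{name}: mean {p['mean']:.2f}, median {p['median']}, "
        f"in-box mean {p['in_box_mean']:.2f}, outside box {p['outside_share']:.0%}"
    )
```

In this toy data the "boom" back has the higher yards per carry but the lower median and in-the-box mean, the same pattern Tanier describes for Bell and Anderson.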

Chase of the PFR Blog points out marginal yards, and adds:

I looked at rushing yards over 3.0 yards per carry. However, as the author has implied, I've begun shifting my focus away from yards per carry.

Rushing first downs is a key part of evaluating a running game. Without play by play information, I'd want to focus on rushing first downs, rushing yards, rushing TDs and carries.

I think this is good; rushing first downs should be part of the evaluation. According to CFBStats, last season's top first-down teams in college football are an expected bunch:

1. Air Force
2. Tulsa
3. Navy
4. Nevada (tie)
4. Oklahoma State (tie)
6. Oregon
7. TCU
8. Florida
9. Oklahoma
10. Georgia Tech

As a side note, I do think yards per carry is most useful on first down, and CFB Stats (as well as the pro-football reference site) has a ready breakdown of rushing stats by down for all teams. For example, the yards per carry of the top 5 teams in the country last year, limited solely to first down, were:

1. Nevada 6.95
2. Louisiana-Lafayette 6.77
3. Florida 6.76
4. Navy 6.12 (1,843 yards)
5. Oregon 5.96 (1,676 yards)

Each team had over 1,600 yards on first down alone (everyone but Oregon had over 1,800, and Nevada over 2,000). And those averages -- yes, I just pasted that thing from FO saying you can't solely look at averages -- indicate that these teams had a lot of favorable down-and-distance situations to convert (Louisiana-Lafayette, the one seemingly strange entry, was in the top 15 of total offense last year despite not being a great throwing team).

In the end ... I have to think about this question some more. I think we're moving in the right direction, as, again, part of my motivation is to find handy and easy to use stats (thus one reason I dislike the idea of some kind of "running back efficiency rating" like they use with quarterbacks). I agree that the debate is going to be between styles of running game (or running back), as well as situation. I would imagine that teams like Oregon or Georgia Tech are going to have much different looking rushing distributions than, say, Wisconsin. But we're on our way down the path to the end.

End note: I'll be on vacation this week. I have a couple of posts set to go up, but otherwise I'll be out of pocket until next weekend/week. Cheers.

6 comments:

Doug said...: Excellent post, as always.

Just an observation from a statistician's perspective: One thing that shouldn't be overlooked is the degree of accuracy with which we can actually estimate a team's distribution of run lengths, based on the finite number of run plays in a season (and specifically the fairly small number of long runs).

When we talk about the distribution of a team's or a player's runs, what we're really doing is (1) using the distribution of runs that we observe for a team for the season to estimate the team's "true" run distribution, and then (2) using that estimate of the "true" run distribution to somehow quantify how good the team's run game is.

Most of the discussion so far has been about (2). But let's not forget about (1). And what I mean by that is, let's remember that when we're looking at things that don't happen very often, there's a ton of variability in the data we observe, and so any statements we try to make about those things aren't as precise as we probably think they are.

Specifically, this applies to the issue of long runs. If a certain running back has 10 carries of 15+ yards in a season, then we shouldn't be the least bit surprised to see that same back have, say 14 or 7 carries of 15+ yards the following season, even under exactly the same circumstances, because that's well within the range of variability we'd tend to see, statistically speaking (assuming a Poisson distribution, or a binomial distribution with small p, which is really the same thing). So while it's reasonable to say that we think that a back with 10 long runs in a season has better big-play ability than a back with only 7, we certainly can't make that statement with any sort of confidence that it's actually true. To put it a different way, while it's certainly true that the ability to break long runs is a repeatable skill (unlike, say, a team's ability to recover fumbles), it's unfortunately a repeatable skill about which we have a small amount of data for each team in a given season, which means it's difficult to say for sure how good teams, much less players, are at that skill.

So when we're designing a metric to evaluate the run game, I think it's necessary that it weights success on short runs somewhat more heavily than breaking the occasional long run, not because the former skill is more valuable, but because the latter skill is much more difficult to accurately estimate.; 8/02/2009 08:56:00 PM
Jay said...: One of my favorite quotes by a running back:
"If you need one yard, I'll get you three. If you need five yards, I'll get you three."
-Leroy Hoard; 8/03/2009 11:36:00 AM
Anonymous said...: Consistency is key. The most important quality for an offense is the ability to move the chains. An offense that gains five yards every play will score more points per drive and win more games than an offense that averages 10 yards per play. While a 35-yard run is wonderful, follow it with runs of 1, 2, and 2 and you've averaged 10 yards a carry and punted.

After the Colts' Edgerrin James came back from his knee surgery, he lacked explosiveness. He appeared less athletic than the best RBs in the NFL and no better than the mediocre. Hindered by a very poor run blocking offensive line, he shouldn't have been very effective.

But Indy's scheme (their unique stretch with play action threat kept the secondary back) gave James a chance to do something he did better than anyone in the game -- attack a seam with his pads really low. Without secondary support, defensive fronts were forced to race laterally to the outside. The Colt O-line merely maintained contact. Edge would find a little crease, cut hard and drive his pads through about knee high between defenders. Running hard and locked up with a blocker, defenders couldn't stop him from pushing forward for 2-3 yards after contact. He consistently put the Colts in makeable third-down situations. And of course, no one is more consistent picking up first downs than Peyton.

Break down film of those Colt teams and the running game looks pedestrian or worse. The blocking isn't good and James makes very few long runs. Yet, all they managed to do was provide the opportunity for the offense to keep moving the sticks and putting up points. Clearly they benefited from the defenses focusing to stop the pass. Still, their positive impact was far greater than any statistical measure is likely to capture.

stan

I think that something like FO's success rate with a focus on consistency will likely provide the best measure.; 8/03/2009 12:51:00 PM
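
The 35, 1, 2, 2 example in the comment above can be walked through the downs with a small Python sketch. The function is hypothetical and deliberately crude: it assumes a fresh first-and-10 after every conversion, assumes the offense always punts on fourth down, and ignores field position, penalties, and scoring.

```python
def drive_summary(gains, to_go=10):
    """Walk a sequence of gains through the downs.

    Returns (first_downs, yards, next_down). Stops when the offense reaches
    fourth down (assumed punt) or the list of plays runs out.
    """
    first_downs = 0
    yards = 0
    down, need = 1, to_go
    for gain in gains:
        if down == 4:
            break  # assumption: always punt on fourth down
        yards += gain
        if gain >= need:
            first_downs += 1
            down, need = 1, to_go
        else:
            down, need = down + 1, need - gain
    return first_downs, yards, down


boom = [35, 1, 2, 2]  # averages 10 yards a carry
steady = [5] * 8      # averages 5 yards a carry

print(drive_summary(boom))
print(drive_summary(steady))
```

Both sequences cover the same 40 yards, but the first stalls at fourth down after a single first down, while the second has moved the chains four times and still has the ball.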
Brad said...: I realized an error in my original post that does not affect the point in any way.

But the average of 3 ypc and 6 ypc is of course 4.5 not 5.; 8/04/2009 01:33:00 PM
Ken said...: Interesting point about 75-yard runs and the arithmetic mean. It just struck me that software packages like SAS have routines that identify outliers and influential points in data sets. It might be that the way to evaluate running games (and/or running backs) is not by trying to find a single statistic, but by an index that takes into consideration mean, standard deviation (smaller SD = more consistent production, ceteris paribus), and analysis of outliers. More positive outliers might be thought a good thing, but in any case it's worth knowing just how that offense got to a 5.0-yard average.; 8/05/2009 01:13:00 PM
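
The ingredients Ken lists (mean, standard deviation, outliers) can be pulled together without SAS. The Python sketch below flags long runs with the common 1.5 x IQR fence, which is my stand-in for whatever outlier routine he has in mind. The function name and the 11-carry sample are hypothetical.

```python
from statistics import mean, quantiles, stdev


def spread_summary(runs):
    """Mean, sample SD, and runs flagged as high outliers by the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(runs, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)
    return {
        "mean": mean(runs),
        "sd": stdev(runs),
        "high_outliers": sorted(y for y in runs if y > upper_fence),
    }


# Made-up carries: ten ordinary gains and one 75-yard run.
runs = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 75]
summary = spread_summary(runs)

print(f"mean {summary['mean']:.2f}, SD {summary['sd']:.2f}")
print(f"high outliers: {summary['high_outliers']}")
print(f"mean without them: {mean(y for y in runs if y not in summary['high_outliers']):.2f}")
```

Here the single 75-yarder is the only run flagged, and removing it drops the average from nearly 10 yards a carry to 3.3, which is exactly the "how did that offense get to its average" question.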
Dr Obvious said...: Still no discussion about success rate? It's not the end all, be all only stat you need to see to judge a running back, but I think it really captures what you are looking at with first down only above, except with all 4 downs. It's also easy to read, understand, and compare.; 8/05/2009 01:30:00 PM
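
For reference, a success-rate calculation takes only a few lines of Python. The thresholds used here (40 percent of the yards to go on first down, 60 percent on second, all of them on third or fourth) are an assumption patterned on commonly described definitions, not something taken from this post, so treat them as adjustable parameters.

```python
def is_success(down, to_go, gain, thresholds=(0.40, 0.60, 1.00, 1.00)):
    """True if the gain covers the required share of yards to go for that down.

    The default thresholds are an assumption, not taken from this post.
    """
    return gain >= thresholds[down - 1] * to_go


def success_rate(plays):
    """plays: iterable of (down, yards to go, yards gained) tuples."""
    plays = list(plays)
    return sum(is_success(*p) for p in plays) / len(plays)


# Hypothetical carries: (down, yards to go, yards gained).
carries = [(1, 10, 5), (2, 6, 3), (3, 3, 3), (1, 10, 2), (2, 8, 5), (3, 3, 1)]
print(f"Success rate: {success_rate(carries):.0%}")
```

This handles the third-down cases from the post the way you would hope: `is_success(3, 1, 1)` counts the one-yard conversion as a success, and `is_success(3, 10, 8)` counts the eight-yard gain on third and ten as a failure.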