Tuesday, June 13, 2006
Murder By Numbers
Roberts’ talk was hot for three reasons. First, it addressed rockism — which by the last day had become the unofficial theme of the conference — without resorting to any of the usual popist rhetoric or counter-arguments. Second, it was quite funny, and not just because of the PowerPoint. And third, it involved multi-colored graphs derived from spreadsheets — a labor-intensive, number-heavy approach that was well beyond the lyric parsing that makes up most rock writing.
But that’s the problem. Having been on the same panel as Roberts (delivering a talk called My JPOP Problem — and Yours), I greatly enjoyed his delivery. But re-reading the paper, which he has posted here, I found myself thinking that, amusing though it is, his argument ultimately doesn’t add up, despite having been so warmly received. If the stereotype of rock critics as former English majors is even partly true, then it shouldn’t be a surprise that innumeracy runs rampant in the field, and that my colleagues would accept Roberts’ analysis so uncritically. To quote a recent Dilbert strip, “And you know it’s accurate because I used math.”
So let’s look at the way Roberts used math, and discuss what he could have done but didn’t.
Before proceeding, I should mention that I’m not exactly neutral on the topic of the Rolling Stone guides. I wrote for three of the four volumes (only David McGhee has contributed to all four), and The New Rolling Stone Record Guide, which provided the grist for Roberts’ mill, was the first time anything I wrote ever wound up in a book. Roberts quotes me twice in his paper, but in neither instance were my words mocked or misrepresented. So this isn’t personal.
What bothers me about the essay is that Roberts promises a lot — for instance, that “[b]y understanding the unconscious biases of the editors, we can more fully understand what exactly we were taught, and how some of our pleasures came with the added baggage of shame” — and delivers little. His graphics may be colorful and his observations droll, but the points he makes are too often irrelevant or misleading. Worse, there are many larger points he could have made, but didn’t.
Fortunately, he describes this paper as “something I’m expanding/revising,” so perhaps a later version will address these issues.
Let’s start with the fact that the statistical analysis tells us nothing about “the unconscious biases” Roberts seeks to reveal. There is, for example, a bright, colorful pie chart breaking down the artists included in the second record guide by race. Because this follows by several paragraphs a sentence in which Roberts writes, “It didn’t cross my mind that of the 52 reviewers who appraised the records, only three were women, nor did I question how many black critics chimed in,” it’s easy to assume that the race chart reflects the bias of the mostly male, universally white reviewers.
But does it? One of the things the editors — Dave Marsh, who Roberts routinely castigates, and John Swenson, who Roberts largely ignores — make clear is that there wasn’t much filtering going on in choosing who got covered in the book. Unlike the Trouser Press or Spin record guides, which assiduously excluded albums that didn’t match their aesthetic criteria, the first two Rolling Stone record guides took the attitude that if it was in print and wasn’t jazz or classical, it got reviewed. (Later Rolling Stone guides were much more discriminating.) As such, Roberts’ pie chart is less a reflection of bias in the media than of bias in the recording industry — a different issue altogether.
Then there’s the notion of artistic bias. Roberts’ stats are derived from the star ratings. But because he sticks to the blunt tool of averaging (adding stars together, then dividing by the number of albums), the conclusions he derives are limited. Even so, he presents them as a sort of scientific insight — the Beatles, he reports, “were .8 of a star better than the Stones.” In other words, “You know it’s accurate because I used math.”
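For the sake of clarity, here’s a rough sketch of the kind of averaging Roberts relies on, using made-up star ratings (the real numbers would of course come from the guide itself):

```python
# Hypothetical star ratings for two catalogs -- NOT the guide's real data.
beatles = [5, 5, 4, 5, 3, 5]
stones = [5, 4, 3, 4, 3, 5]

def average(ratings):
    """Add the stars together, then divide by the number of albums."""
    return sum(ratings) / len(ratings)

# The "blunt tool": a single number per artist, with the spread discarded.
gap = average(beatles) - average(stones)
print(f"Difference: {gap:.1f} stars")
```

The point isn’t that the arithmetic is wrong — it’s that a single mean per artist throws away everything about how the ratings were distributed.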
If, on the other hand, Roberts calculated the standard deviation of the ratings in a particular genre, as well as the averages, he would have a much stronger tool for deducing evidence of bias. Standard deviation measures the spread within a collection of ratings. If the ratings are all close together, the deviation is small; if the ratings vary greatly, then the deviation is large. If, say, there were a large number of blues artists whose albums averaged 3.5 stars with little deviation, and a large number of MOR artists whose albums averaged 0.25 stars with little deviation, one might deduce that The New Rolling Stone Record Guide values blues more than MOR.
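Again with made-up numbers, here’s what that calculation would look like — a genre whose ratings cluster tightly versus one whose ratings are all over the map:

```python
import statistics

# Hypothetical ratings for two genres -- illustrative only.
blues = [3.5, 3.5, 4.0, 3.0, 3.5]  # clustered: small deviation
mor = [0.0, 5.0, 1.0, 4.0, 2.5]    # scattered: large deviation

for name, ratings in [("blues", blues), ("MOR", mor)]:
    mean = statistics.mean(ratings)
    spread = statistics.pstdev(ratings)  # population standard deviation
    print(f"{name}: mean {mean:.2f}, std dev {spread:.2f}")
```

Two genres can share the same average while one reflects consensus and the other reflects a shouting match; the deviation is what tells them apart.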
Or not. My wife, who is a mathematician, points out that “using means and standard deviations to look at these things depends on three things: 1) the album guide reviewers were randomly chosen from the population of pop reviewers; 2) when the editors assigned the review jobs, they did so randomly; and 3) there is a large set of data. On this third point, whether or not there are small-sample issues would be up to Roberts to determine; he has the data and he'd have to look in a stats book to learn how to test whether or not his data set is a small one.” Of course, the reviewers were not chosen at random, and while there was some element of chance in doling out the assignments — reviewers had some choice in their assignments, but that choice was limited by what had already been claimed by others — it hardly fits the classic statistics model.
She suggests using histograms to graph the ratings. (You can read about them here and here.) When a histogram graphs a genuinely random or unbiased collection of data, it produces something that looks like a bell curve. When the data points are loaded with extremes — for instance, a ranking of NBA salaries or of biased reviews — the result looks quite unlike a bell curve.
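You don’t even need plotting software to see the shape of the data. A crude text histogram of some hypothetical star ratings (again, not the guide’s actual numbers) makes the point:

```python
from collections import Counter

# Hypothetical star ratings, loaded toward the extremes -- illustrative only.
ratings = [0, 0, 0, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5]

counts = Counter(ratings)
for stars in range(6):
    # One '#' per album at each star level.
    print(f"{stars} stars | {'#' * counts.get(stars, 0)}")
```

A bell-shaped pile in the middle suggests unremarkable, unbiased grading; big stacks at zero and five suggest a book that deals in heroes and villains.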
Another approach Roberts might have taken would have been to look at the extreme ends of the rating system: the five-star and no-star reviews. He does make a stab at this, writing that, “Seventeen women released 5-star records — five percent of the 378 total masterpieces in the canon.” He also reports that the Rolling Stones’ “5-star ratio was a mere 19 percent.” But that doesn’t really tell us much.
But if one were to sort the artists who got five-star ratings by how many they got — one, two, three or more — then track that data by genre, the resulting map of “indispensable” albums would provide a fair representation of the Rolling Stone canon. Seeing how they’re distributed across genres would say a lot about what kind of music the reviewers valued most.
It would also be interesting to repeat the exercise with those albums deemed “worthless” — the no-star recordings. This would provide insights simple averaging can’t. After all, an artist whose sole release gets a bullet and one whose five-album catalog all get bullets would each end up with the same average (0 stars). But writing off five albums conveys a level of contempt well beyond what Roberts describes as acts being “swatted away like flies and flicked out the window.” Again, analyzing that data by genre would say a lot about the kinds of music the reviewers were incapable of respecting.
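Tallying those extremes by genre is trivial once the ratings are in a spreadsheet. A sketch, with invented (artist, genre, stars) records standing in for the guide’s data:

```python
from collections import defaultdict

# Invented records standing in for the guide's ratings -- illustrative only.
reviews = [
    ("Artist A", "blues", 5), ("Artist A", "blues", 5),
    ("Artist B", "MOR", 0), ("Artist B", "MOR", 0),
    ("Artist C", "rock", 5), ("Artist D", "MOR", 0),
]

five_star = defaultdict(int)
no_star = defaultdict(int)
for artist, genre, stars in reviews:
    if stars == 5:
        five_star[genre] += 1
    elif stars == 0:
        no_star[genre] += 1

print("5-star albums by genre:", dict(five_star))
print("0-star albums by genre:", dict(no_star))
```

Lay the two tallies side by side and you have a map of which genres the book canonized and which it held in contempt — far more revealing than any artist-by-artist average.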
There are problems with Roberts’ paper beyond the math. His prose manages to be both purple and clunky (“The opus that is The Rolling Stone Record Guide begins on page one…”), and he has the unfortunate habit of making his points through misleadingly selective quotes. For instance, he writes, “How did English-speaking bands line up against one another? Rolling Stone has an answer: ‘At their best, Chilliwack was the finest Canadian rock band, outrocking BTO and outwriting Burton Cummings.’” When he read that line at EMP, it got a big laugh, in part because people found it amusing to think anyone would rank the near-forgotten (in the US) Chilliwack above BTO. What the review (by long-time Globe and Mail reviewer Alan Neister, who really does know Canadian rock) went on to say was this: “But a lack of consistency kept it from international success, and only these albums remain in print.” A bit less risible when you get the whole thought, eh?
Roberts also tends to personify the views advanced by the book as reflecting the taste and will of Marsh, as if Swenson and the other reviewers were merely standing on the sidelines. While talking about the reviewers’ distaste for metal, he writes, “Marsh’s Hammer of Justice came down hard on Motorhead.” Unfortunately, the review he quotes is credited to Malu Halasa, not Dave Marsh.
Such sloppiness is repeated when Roberts turns his attention to the guide’s coverage of hip-hop. He writes, “By 1982, the Sugarhill Gang, Grandmaster Flash, the Tom Tom Club, Fab Five Freddy, Kurtis Blow and Trouble Funk had released 12-inches, but the Guide ignored them.” Well, no. First off, the book states in the introduction that it chose not to review singles (which is what 12-inches were). Secondly, although Kurtis Blow’s singles may have been ignored, his two albums weren’t. They are reviewed by Dave Marsh, on page 47.
The last point I’d like to make has to do with the notion that the Rolling Stone record and album guides were intended to reflect and propagate the official party line of what Roberts calls “the Empire of Rolling Stone.” Many of the contributors to the guides were not regular reviewers for the magazine, and I’m unaware of any effort during the compilation of those books to ensure that the guide rating matched the magazine rating. Hell, many of the rankings changed from volume to volume, sometimes drastically. There may have been biases at work in the reviews, but it’s probably overstating the case to ascribe them specifically to Dave Marsh or Rolling Stone.