Tuesday, June 13, 2006
Murder By Numbers
One of the bigger hits at the recent EMP Pop Conference was a paper by Randall Roberts called Dave Marsh-ing My Mellow: The Rolling Stone Record Guide and the Creation of the Canon. The talk, which was accompanied by a very ambitious PowerPoint presentation, attempted to use statistics to expose the biases that underlay the first two Rolling Stone guides (and by extension, the whole Rolling Stone rock aesthetic).
Roberts’ talk was hot for three reasons. First, it addressed rockism — which by the last day had become the unofficial theme of the conference — without resorting to any of the usual popist rhetoric or counter-arguments. Second, it was quite funny, and not just because of the PowerPoint. And third, it involved multi-colored graphs derived from spreadsheets — a labor-intensive, number-heavy approach that was well beyond the lyric parsing that makes up most rock writing.
But that’s the problem. Having been on the same panel as Roberts (delivering a talk called My JPOP Problem — and Yours), I greatly enjoyed his delivery. But re-reading the paper, which he has posted here, I found myself thinking that, amusing though it is, his argument ultimately doesn’t add up, despite having been so warmly received. If the stereotype of rock critics as former English majors is even partly true, then it shouldn’t be a surprise that innumeracy runs rampant in the field, and that my colleagues would accept Roberts’ analysis so uncritically. To quote a recent Dilbert strip, “And you know it’s accurate because I used math.”
So let’s look at the way Roberts used math, and discuss what he could have done but didn’t.
Before proceeding, I should mention that I’m not exactly neutral on the topic of the Rolling Stone guides. I wrote for three of the four volumes (only David McGhee has contributed to all four), and The New Rolling Stone Record Guide, which provided the grist for Roberts’ mill, was the first time anything I wrote ever wound up in a book. Roberts quotes me twice in his paper, but in neither instance were my words mocked or misrepresented. So this isn’t personal.
What bothers me about the essay is that Roberts promises a lot — for instance, that “[b]y understanding the unconscious biases of the editors, we can more fully understand what exactly we were taught, and how some of our pleasures came with the added baggage of shame” — and delivers little. His graphics may be colorful and his observations droll, but the points he makes are too often irrelevant or misleading. Worse, there are many larger points he could have made, but didn’t.
Fortunately, he describes this paper as “something I’m expanding/revising,” so perhaps a later version will address these issues.
Let’s start with the fact that the statistical analysis tells us nothing about “the unconscious biases” Roberts seeks to reveal. There is, for example, a bright, colorful pie chart breaking down the artists included in the second record guide by race. Because this follows by several paragraphs a sentence in which Roberts writes, “It didn’t cross my mind that of the 52 reviewers who appraised the records, only three were women, nor did I question how many black critics chimed in,” it’s easy to assume that the race chart reflects the bias of the mostly male, universally white reviewers.
But does it? One of the things the editors — Dave Marsh, whom Roberts routinely castigates, and John Swenson, whom Roberts largely ignores — make clear is that there wasn’t much filtering going on in choosing who got covered in the book. Unlike the Trouser Press or Spin record guides, which assiduously excluded albums that didn’t match their aesthetic criteria, the first two Rolling Stone record guides took the attitude that if it was in print and wasn’t jazz or classical, it got reviewed. (Later Rolling Stone guides were much more discriminating.) As such, Roberts’ pie chart is less a reflection of bias in the media than of bias in the recording industry — a different issue altogether.
Then there’s the notion of artistic bias. Roberts’ stats are derived from the star ratings. But because he sticks to the blunt tool of averaging (adding stars together, then dividing by the number of albums), the conclusions he derives are limited. Even so, he presents them as a sort of scientific insight — the Beatles, he reports, “were .8 of a star better than the Stones.” In other words, “You know it’s accurate because I used math.”
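To see just how blunt that tool is, here is a minimal sketch in Python of the kind of averaging Roberts performed. The ratings are invented for illustration; the guide’s actual numbers aren’t reproduced here.

```python
# Hypothetical star ratings for two catalogs -- NOT the guide's real figures.
beatles_stars = [5, 5, 5, 4, 5, 5, 4, 5]
stones_stars = [5, 4, 4, 3, 5, 4, 3, 4]

def average(stars):
    """Add the stars together, then divide by the number of albums."""
    return sum(stars) / len(stars)

gap = average(beatles_stars) - average(stones_stars)
print(f"Difference: {gap:.1f} of a star")  # one number, shorn of all context
```

The script dutifully prints a fraction-of-a-star gap, which is exactly as informative as the figure Roberts reports, which is to say, not very.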
If, on the other hand, Roberts calculated the standard deviation of the ratings in a particular genre, as well as the averages, he would have a much stronger tool for deducing evidence of bias. Standard deviation reflects the amount of variation within a collection of ratings. If the ratings are all close together, the deviation is small; if the ratings vary greatly, the deviation is large. If, say, there were a large number of blues artists whose albums averaged 3.5 stars with little deviation, and a large number of MOR artists whose albums averaged 0.25 stars with little deviation, one might deduce that The New Rolling Stone Record Guide values blues more than MOR.
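A sketch of what that genre-level comparison might look like, again with invented ratings standing in for the guide’s data:

```python
from statistics import mean, stdev

# Invented ratings standing in for the guide's actual data.
genre_ratings = {
    "blues": [3.5, 3.5, 4.0, 3.0, 3.5, 4.0, 3.5],
    "MOR": [0.0, 0.5, 0.0, 0.5, 0.0, 0.25, 0.5],
}

for genre, stars in genre_ratings.items():
    print(f"{genre}: mean = {mean(stars):.2f}, stdev = {stdev(stars):.2f}")

# A high mean with a small standard deviation suggests a genre the reviewers
# valued consistently; a similar mean with a large deviation suggests the
# reviewers held no genre-wide opinion at all.
```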
Or not. My wife, who is a mathematician, points out that “using means and standard deviations to look at these things depends on two things: 1) the album guide reviewers were randomly chosen from the population of pop reviewers; 2) when the editors assigned the review jobs, they did so randomly; and 3) there is a large set of data. On this third point, whether or not there are small-sample issues would be up to Roberts to determine; he has the data and he'd have to look in a stats book to learn how to test whether or not his data set is a small one.” Of course, the reviewers were not chosen at random, and while there was some element of chance in doling out the assignments — reviewers had some choice in their assignments, but that choice was limited by what had already been claimed by others — it hardly fits the classic statistics model.
She suggests using histograms to graph the ratings. (You can read about them here and here.) When a histogram graphs a collection of data free of systematic bias, it tends to produce something that looks like a bell curve. When the data points are loaded with extremes — for instance, a ranking of NBA salaries or of biased reviews — the result looks quite unlike a bell curve.
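Assuming the ratings had been typed into a simple list, the plot itself is only a few lines; matplotlib is one obvious choice of library for the job:

```python
import matplotlib.pyplot as plt

# Invented ratings; the real exercise would use all of the guide's entries.
ratings = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5]

# One bin per star level, from bullets (0) up to five stars.
plt.hist(ratings, bins=[0, 1, 2, 3, 4, 5, 6], edgecolor="black")
plt.xlabel("Stars awarded")
plt.ylabel("Number of albums")
plt.title("Distribution of guide ratings (hypothetical data)")
plt.show()
```

Lumps at the extremes, rather than a hump in the middle, would be the visible shape of a thumb on the scale.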
Another approach Roberts might have taken would have been to look at the extreme ends of the ratings system — the five-star and no-star reviews. He does make a stab at this, writing that “Seventeen women released 5-star records — five percent of the 378 total masterpieces in the canon.” He also reports that the Rolling Stones’ “5-star ratio was a mere 19 percent.” But neither figure really tells us much.
If, however, one were to sort the artists who got five-star ratings by how many they got — one, two, three or more — then track that data by genre, the resulting map of “indispensable” albums would provide a fair representation of the Rolling Stone canon. Seeing how those albums are distributed across genres would say a lot about what kind of music the reviewers valued most.
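A sketch of that sorting, with hypothetical (artist, genre, stars) records in place of the guide’s:

```python
from collections import Counter

# Hypothetical (artist, genre, stars) records -- not the guide's actual data.
reviews = [
    ("Artist A", "rock", 5), ("Artist A", "rock", 5), ("Artist A", "rock", 5),
    ("Artist B", "soul", 5),
    ("Artist C", "blues", 5), ("Artist C", "blues", 5),
    ("Artist D", "rock", 4),  # four stars: not part of the five-star canon
]

# Count five-star albums per (artist, genre) pair ...
five_star = Counter((a, g) for a, g, stars in reviews if stars == 5)

# ... then list artists by how many they earned: one, two, three or more.
for (artist, genre), n in five_star.most_common():
    print(f"{artist} ({genre}): {n} five-star album(s)")
```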
It would also be interesting to repeat the exercise with those albums deemed “worthless” — the no-star recordings. This would provide insights simple averaging can’t. After all, an artist whose sole release gets a bullet and one whose five-album catalog gets nothing but bullets would each end up with the same average (0 stars). But writing off five albums conveys a level of contempt well beyond what Roberts describes as acts being “swatted away like flies and flicked out the window.” Again, analyzing that data by genre would say a lot about the kinds of music the reviewers were incapable of respecting.
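A worked toy example of that pitfall, with two invented acts:

```python
# Two invented acts: one whose sole release got a bullet, one whose entire
# five-album catalog got bullets. The average can't tell them apart.
catalogs = {
    "one-and-done act": [0],
    "five-bullet act": [0, 0, 0, 0, 0],
}

for act, stars in catalogs.items():
    avg = sum(stars) / len(stars)
    bullets = stars.count(0)
    print(f"{act}: average = {avg} stars, bullets = {bullets}")

# Both average 0 stars; counting the bullets (1 vs. 5) recovers the
# difference in contempt that averaging erases.
```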
There are problems with Roberts’ paper beyond the math. His prose manages to be both purple and clunky (“The opus that is The Rolling Stone Record Guide begins on page one…”), and he has the unfortunate habit of making his points through misleadingly selective quotes. For instance, he writes, “How did English-speaking bands line up against one other? Rolling Stone has an answer: ‘At their best, Chilliwack was the finest Canadian rock band, outrocking BTO and outwriting Burton Cummings.’” When he read that line at EMP, it got a big laugh, in part because people found it amusing to think anyone would rank the near-forgotten (in the US) Chilliwack above BTO. What the review (by long-time Globe and Mail reviewer Alan Niester, who really does know Canadian rock) went on to say was this: “But a lack of consistency kept it from international success, and only these albums remain in print.” A bit less risible when you get the whole thought, eh?
Roberts also tends to treat the views advanced by the book as reflecting the taste and will of Marsh alone, as if Swenson and the dozens of other reviewers were merely standing on the sidelines. While discussing the reviewers’ distaste for metal, he writes, “Marsh’s Hammer of Justice came down hard on Motorhead.” Unfortunately, the review he quotes is credited to Malu Halasa, not Dave Marsh.
Such sloppiness is repeated when Roberts turns his attention to the guide’s coverage of hip-hop. He writes, “By 1982, the Sugarhill Gang, Grandmaster Flash, the Tom Tom Club, Fab Five Freddy, Kurtis Blow and Trouble Funk had released 12-inches, but the Guide ignored them.” Well, no. First off, the book states in the introduction that it chose not to review singles (which is what 12-inches were). Secondly, although Kurtis Blow’s singles may have been ignored, his two albums weren’t. They are reviewed by Dave Marsh, on page 47.
The last point I’d like to make has to do with the notion that the Rolling Stone record and album guides were intended to reflect and propagate the official party line of what Roberts calls “the Empire of Rolling Stone.” Many of the contributors to the guides were not regular reviewers for the magazine, and I’m unaware of any effort during the compilation of those books to ensure that the guide rating matched the magazine rating. Hell, many of the rankings changed from volume to volume, sometimes drastically. There may have been biases at work in the reviews, but it’s probably overstating the case to ascribe them specifically to Dave Marsh or Rolling Stone.
Comments:
JD- this is brilliant! an elementary stats lesson for music geeks! So when are you going to start running regression models?
On paper, it's often hard to tell if Roberts is being flip or not in several places. It certainly doesn't come close to being a serious statistical analysis, which, in the context of a five-star rating system, I can't see as being anything other than a curiosity anyway.
I'm also unconvinced about the "unconscious" bias angle. Marsh saying that The Slits are "for hardcore anglophilic ass-kissers only" sounds pretty overt to me. Perhaps Roberts really means "not stated vigorously enough at the beginning of the book"?
BTW, very much enjoy your pieces in The Globe, Mr. C. Even if you do like Whitesnake.
Thanks so much for the thoughtful analysis of this *thing* that I did. It's semi-shocking that the paper has been given the attention that it has, honestly, and incredibly gratifying (and a little horrifying) to see it examined with such thoroughness. I wrote the proposal one day *after* the deadline -- banged it out without much thought with what I was proposing to do. It was rather quickly greenlighted, and only then did I realize what I had gotten myself into.
I ended up logging probably 100 hours in my offtime (I'm a staff writer at the Riverfront Times) simply entering data, and could have spent much more. It was way way down to the wire, and the result was that I spent more time on data than I did on writing the damn thing, and to my eyes (and JD's), it shows. Which is why I'm revising it, and offering what I presented at the EMP as a place-holder until I finish (summer project, publish in fall) a final version.
What I presented was received well by the crowd, and worked well in the context of an oral presentation with PowerPoint. But examining it on paper, which J.D. rightfully did, reveals a slapped together collection of graphs and topics that most definitely lacks cohesion and a central argument (though most seem to have centered on the whole issue of *rockism* -- even though the word isn't uttered once in the piece). As published, it's a decent first draft, and your analysis will most definitely help me narrow the focus.
Anyway, I'll keep you posted on the final version.
And on a personal note, J.D., I appreciate you saying that I mentioned your name "without malice," because that's the truth. I've read your work for 20 years now, and have the utmost respect for what you've done. If anyone would have told me when I was first discovering the RS Record Guide that J.D. Considine would be actively engaged in a dialogue with something I wrote, I would have laughed it off. So, thanks.
My wife, who is a mathematician, points out that “using means and standard deviations ... depends on two things: 1) ... ; 2) ... 3) ...
Heh ...
Sorry.