
May 17, 2005

Comments

Aren't they grading for radically different things than you are? And if not, then what about the work described to set the parameters the programs used? Do you think

A criminology paper resulted in a nuanced evaluation offering feedback such as this: "This paper does not do a good job of relating white-collar crime to various concepts in labeling theory of deviance."
is just cluelessness?

I'd guess that at the mid-high school level of education the experience of writing a five-page essay is what's really important, esp. for the non-AP students. And especially if they have the opportunity to ask for a human regrade when they think there was an error, this might in fact lead to the good outcome of more writing assignments. And really, the chimpanzee "testing" of the software described above is just silly. No, it's not even silly, it's intellectual dishonesty.

rilkefan: intellectual dishonesty? Why? The lecturer was assessing the software; why would it be wrong for him to try various things to see what it caught?

As for the rest: this is probably one of those cases when parts of the article that I didn't copy affected what I wrote, and I didn't notice. This stuff is also being used in college.

And while I'm sure there are lots of things I grade for that your average high school teacher doesn't (knowledge of Kant, for one thing), there are lots of things that are common. Clarity, strength of argument, vivid prose. And I think feedback on these things matters, as does (relatedly) students' thinking that they are getting real feedback.

About the program's comments: it would be interesting to know how the computer reached the conclusion that the writer had not adequately related white-collar crime to whatever. It could be just: seeing whether the terms for the 'various concepts in labeling theory of deviance' appeared in the essay near 'white collar crime'. And remember, that comment was given in a college class. I just think that's wrong.

I'm going to hazard a guess that given your institution, field, and teaching rigor you are exposed to young scholars of a caliber not very similar to the ruck of college students.

By intellectually dishonest, I mean it's a stunt. A stunt that as far as I can tell would not satisfy the crime-paper software. A stunt that is shown to be such by the counter-stunt of writing a good serious paper on Kant using the word "chimpanzee" frequently - something I suspect many of your students could do.

The intellectually honest way to test the software would be to take a year of essays, run the code on them, and histogram the difference between human and algorithm. Long tails would be evidence of failure. But that would require actual work and wouldn't provide a nice soundbite for people to quote.
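For the curious, the comparison described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the 1-6 scale, the made-up scores, and the helper names `score_difference_histogram` and `tail_fraction`. Nothing in the article describes such a study.

```python
from collections import Counter

def score_difference_histogram(human_scores, machine_scores):
    """Count how often each (machine - human) score difference occurs."""
    if len(human_scores) != len(machine_scores):
        raise ValueError("need one machine score per human score")
    return Counter(m - h for h, m in zip(human_scores, machine_scores))

def tail_fraction(histogram, threshold=2):
    """Fraction of essays whose two scores disagree by at least `threshold` points."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    far_off = sum(n for diff, n in histogram.items() if abs(diff) >= threshold)
    return far_off / total

# Made-up scores on a 1-6 scale, purely for illustration.
human = [4, 3, 5, 2, 6, 4, 1, 3]
machine = [4, 4, 5, 2, 5, 4, 6, 3]

hist = score_difference_histogram(human, machine)
for diff in sorted(hist):
    print(f"{diff:+d}: {'#' * hist[diff]}")
print("tail fraction:", tail_fraction(hist))
```

A large tail fraction would be the "long tails" evidence of failure; a histogram bunched tightly around zero would suggest the software mostly agrees with the human graders on that sample.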

By intellectually dishonest, I mean it's a stunt.

A stunt it may be, but it's the sort of thing that I, when I was a software tester, would've tried--and which the software should've caught. If I were to draw up a test plan for essay-grading software--and believe me, no amount of money or alcohol could induce me to take that job--it would have to include context testing.

I'm not saying that software analysis of human language is easy. It's one of the hardest things there is to write well. But if you're going to do it, do it right--the stakes are too high. It sounds like this app simply fails the laugh test. I would never sign off on it.

rilkefan: you remind me of a story...

Once upon a time, I had a terrible, terrible prof in a Dante course. He spent all his time either talking about his connections to other famous Dantisti (as he called them), or giving mind-numbing lectures on tiny details, which he never made interesting (as I'm sure, somehow, they were). (One entire lecture was on eyebrows in the Purgatorio. Really.)

Anyways, all of us in that class loved Dante, and in consequence we hated him. So one night, after burning him in effigy, we hit on the idea of making a list of words that could never, ever appear in an essay on Dante, and each pledging to use one in our final. I think I had 'bathysphere'; someone else had 'platypus', and 'velcro'. And we all worked them into our exams, and if he noticed, he didn't say. (I think I had someone's soul ascending to heaven 'like a bathysphere'.)

Catsy, it's not clear to me that the "chimpanzee" test was reasonable. The crime code described seems to be context sensitive. Anyway, I can't argue on the basis of one thin article.

hilzoy, a common poetry exercise is to use random or semi-random words in a poem, perhaps as end words. The following lines use the words from a spam email in the last way:


Sonnet

"I am one of many Russian girls..." - [email protected]

Being able elegantly to dance the Charleston
is an accomplishment, undeniable if slight;
Being able to manipulate
a friend unspoiled to a surprise party, to judge by eye that periwinkle
dress would fit her figure and blush, no arccos
or color wheel involved - that's enviable. Some pears and Malay
spices on the stove,
decanting off the polymorph
grape proteins aged to weinstein:
if you stepped on the tail of a bushmaster
you wouldn't be in more peril.
How many days of despair will it take to defray
this night's bedazzlement,
this year's anchor, echoed, never to be encored?

Grading essays is hard work, especially if you try to do right by your students. There are rough drafts and rewrites, lengthy attempts to explain what went wrong with a student's argument and how it might have been improved, and so on.

Grading problem sets isn't much easier if you want to do it right, i.e. usefully correct the mistakes in addition to merely marking them wrong. The one advantage is that the problems are generally so much shorter that, even though they're also more numerous, you can grade the damn things faster.

[I still want to claw my eyes out with a hammer after an hour or two of grading, though.]

I found grading to be very hard and very time consuming. It had its compensations though, mainly the amusement factor of the more egregious errors. With computer science papers students often had to submit their code listing and output. My favourite was a second-year comp-sci student who had noticed they were getting the wrong results, but was too stupid and/or lazy to find and fix the bug. Instead of editing the results in a text editor, they had 'corrected' the printed output with correction fluid!

Write a computer program to grade papers, and students will figure out how to game it.

I've graded papers myself; first-year circuit analysis. I find it really hard to imagine that a computer can be programmed, at least in the present, to give partial credit, backtrack and see where the student got it wrong, and pagewise-correlate the position of work, diagrams, equations, etc to find out who did their own work and who simply copied someone else's papers.

First (and usually, second) offense for that last was a warning. Third and on was full credit for the first paper I came across, followed by zeroes for the clones, with a note to the prof about what I was doing and why. Pretty quickly, people caught on that if they were going to cheat, they were going to have to put some quality effort into cheating.

Hilzoy, where have you been?! Seriously, I work in the educational publishing industry, and you literally cannot sell writing and grammar books in the secondary market without promising the teachers this kind of silicon snake oil. We laugh about it, we know it doesn't work, but you have to have it to sell your books. Is it shortchanging the students? Of course it is.

Personally, I'd love to run an essay by Orwell through one of these things and see the results.

It's totally reasonable to throw junk words at the software and see if it chokes. That's what testing is about. Like testing vending machines -- you should test what it does if someone shakes it, puts in dented coins, forged dollar bills, etc.

But the other issue is that essays/writing assignments are supposed to be written with a human reader in mind. How can software, at this level, recognize whether an essay touches on an emotional truth, or is making a metaphorical comment on current events, or ... the list goes on.

I agree that it doesn't pass the laugh test.

By intellectually dishonest, I mean it's a stunt.

I see intellectual dishonesty here, too, but on the part of the marketing folks who are trying to claim that their essay grading software works. If the software cannot catch even the most obvious problems, how do you expect it to catch anything but tiny, often pointless, grammar-tree and logic-tree problems?

I hate grading papers, but I can do it. It is clear to me that the grading programs available right now are not only worthless, but may actually cause students to write badly, as they try to write for the standards of the program, not for the standards of good essays.

I see intellectual dishonesty here, too, but on the part of the marketing folks

The minute you start believing what the marketers are telling you, you're lost.

After reading Hilzoy's post and Rilkefan's first comment, I thought to myself wouldn't it be cool if in defending his argument Rilkefan included a poem as part of his defense of the software and then maybe Slart the linear guy would chime in and tell us why this software is, in fact, questionable.

And it was cool. I feel categorically chimpanzeed.

Didn't Nabokov see the germ of "Lolita" in a newspaper report about an ape in a zoo being taught to draw by a scientist and then drawing the bars of his cage in charcoal?

Which is also cool but may have nothing to do with anything here.

"Jones instead input a letter of recommendation, substituting 'risk of personal injury' for the student's name."

Speaking of intellectual dishonesty, I'm pretty much dead sure I read an article on exactly this subject, using exactly this anecdote, about a year ago. Yeah, see this from last August, for instance (Bugmenot may be required). A straight rehash of the same story, same anecdotes.

Given the lack of sourcing, this strikes me as being somewhat uncomfortably past the plagiarism line.

Gary -- interesting. And I liked the additional info:

"Guessing that E-rater may associate the use of unusual words with a quality essay, he substituted chimpanzee for the to yield this:

"It is with chimpanzee greatest esteem and confidence that I write to support Risk of physical injury as a candidate for a faculty position. I have known Risk of physical injury in a variety of capacities for more than five years, and I find him to be one of chimpanzee most eloquent. ..."

The addition of the chimpanzee raised his score to a 6. The computer told him his writing "clearly identifies important features of the argument and analyzes them insightfully, develops ideas cogently, organizes them logically," among other praise."

The computer's feedback is delightful.

Grading problem sets isn't much easier if you want to do it right, i.e. usefully correct the mistakes in addition to merely marking them wrong. The one advantage is that the problems are generally so much shorter that, even though they're also more numerous, you can grade the damn things faster.

[I still want to claw my eyes out with a hammer after an hour or two of grading, though.]

The one piece of advice I would give any student taking a class where the tests will largely be problems is, "Neatness counts."

When I graded papers in finance classes, I was much gentler, fairly or not, when the work was laid out clearly, so I could spot the source of mistakes quickly, than when I had to decipher a lot of disorganized scribbling.

The trickiest problem I have in grading isn't so much evaluating the students' errors as it is exercising tact. I'm hoping that my comments will actually help them to improve their writing, and so I spend hours trying to come up with useful, friendly ways of showing them why what they're doing doesn't work. But then, I'm a poorly paid and easily replaceable grad student giving lavish attention to about 12 students per class.

This post was okay I guess. Needed more "chimpanzee" in it though. I give it a 3 out of 6.

Doesn't the ability to mark essays imply the ability to generate essays?

For example, it's easy to generate random plausible sentences from a grammar and subject dictionary, and shuffle them about; given marking software, one could anneal a good essay out of the result. If this were real, it would be a revolution in AI.

Peter, I don't think that works given a discrete point scale.

The main problem I see with software like this is that it pushes a normative style of writing on students and could not tell the difference between someone who had poor word choice and grammar and someone who was breaking rules to some purpose.

I'd be all for software that could recognize problems with grammar and mechanics and train the students out of them. I'd be even more grateful for primary and secondary school teachers who taught the basics of language to their students before they end up in my comp class and expect a lowly grad student to give them ten years' worth of grammar and rhetoric in ten weeks.

I can give you some easy ways for college freshmen to improve their writing:

-They need to read more than they do. Reading skills are crucial to writing skills. And they need to read widely. High school and lower division students should be minimally competent to read both science and literature. This will also give them a decent vocabulary. The ability to think and the ability to express subtle nuances go hand in hand.

-They need to think about what they read--about the content and the rhetorical strategy used to convey that content--and not just learn to express how they feel about what they read.

-They need to be forced to express those thoughts and learn to defend them--to think critically--before they get to freshman comp, and most certainly before they get to their upper division classes.

So I guess the question for me is how we can do a better job of helping students attain minimal competence before they get to Comp 101 so that I can spend time making them articulate rather than having to settle for making them barely intelligible.

Here's a thought for a first step. Require all public school students to learn a second language and quit slashing budgets for language classes other than Spanish.

What nous said. Amen, and amen.

How startling to see Louise linked! Thanks for reading. Of course, I might post more if I didn't have so many papers to grade. :-)

A criminology paper resulted in a nuanced evaluation offering feedback such as this: "This paper does not do a good job of relating white-collar crime to various concepts in labeling theory of deviance."

This is total BS. Speaking as a computer programmer with an educated fan's interest in cognitive psychology and AI, I'm confident in stating that with our current level of technology, we cannot write a program that can actually understand and analyze the semantics of an essay on criminology, let alone rate the quality of the argument in said essay. I'll guess that the program is probably applying algorithms such as looking for certain terms in the vicinity of certain other terms, rewarding the presence of selected key words, probably checking syntax, and other such tests which have nothing whatsoever to do with the meaning of what is written, or the quality of the writing.
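To make that guess concrete, here is a toy sketch in Python of what "certain terms in the vicinity of certain other terms" could look like. It is only an illustration of the guess: the function name `proximity_score`, the term lists, and the ten-word window are all invented here, and nothing is known about how the real products actually work.

```python
import re

def proximity_score(essay, topic, concepts, window=10):
    """Count distinct concept terms found within `window` words of the topic term.

    Purely lexical: it never looks at grammar or meaning.
    """
    words = re.findall(r"[a-z'-]+", essay.lower())
    topic_positions = [i for i, w in enumerate(words) if w == topic]
    hits = set()
    for i, w in enumerate(words):
        if w in concepts and any(abs(i - p) <= window for p in topic_positions):
            hits.add(w)
    return len(hits)

# Hypothetical term lists, chosen only for this example.
concepts = {"labeling", "deviance", "stigma"}

sensible = ("Labeling theory suggests that white-collar crime is rarely "
            "treated as deviance because offenders escape stigma.")
nonsense = "Chimpanzee stigma crime chimpanzee deviance labeling chimpanzee."

print(proximity_score(sensible, "crime", concepts))
print(proximity_score(nonsense, "crime", concepts))
```

Both sentences earn the same score, because a check like this sees only which words sit near which other words. That is the sense in which such tests have nothing to do with meaning.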

I'll join with others above in calling the marketers for the products liars. No one who actually understands the state of the art in AI, and understands what good writing is about, can actually believe we are at the point of writing programs that can tell good writing from crap.

...and, lest it be taken wrong, my comments about secondary school teachers should not be seen as an attack on their importance or their competence. I have friends who are and were secondary school teachers who do a wonderful job of teaching their students. The problem I see is with a significant number of parents who believe that education is a commodity which can be given to their children rather than a process in which they and their children have to participate.

Bernard: The one piece of advice I would give any student taking a class where the tests will largely be problems is, "Neatness counts."

Oh, I do. I've given people 0 on their homeworks before for outright illegibility; the professor was actually astonished that I hadn't done so before then, as I'd literally been giving myself a migraine trying to tease some meaning out of problem sets that looked like they'd been vomited up after Jackson Pollock's midnight calculus bender. No, I'm talking about mistakes so egregiously awful that I wanted the people thrown out of the department. Like that time I was grading for putative math majors who, it seemed, didn't understand basic algebra. I figure, if you're signed up for a 400-level math course and you write things like "(x+y)^2 = x^2 + y^2", you should be publicly eviscerated by Carl Friedrich Gauss's mummified toe-nails. Worse yet, though a little specialized, is when math majors-to-be attempt to prove something like "the sum of the first n odd numbers is always a perfect square" by saying "1 + 3 = 4 = 2^2, therefore the theorem is proved." You see enough of those, you start to go a little crazy.

As an aside: I note with some satisfaction that, by the end of that semester, not only did I prevent four people from going any further in this department, I believe I actually cost one of these students their job as a mathematical advisor elsewhere in the university on grounds of flagrant incompetence. If I'd actually been a TA for that course, I would have tried to save them; as a grader pure and simple, leaving copious notes on every problem set only to have them incessantly unread and ignored, I'm just glad I minimized the damage they were able to do.

[The flip-side of hilzoy's remark about students feeling warm and loved about markings on their paper is that they're generally not thrilled when there's more red ink on the page than black. Which routinely happened in the course I'm describing above.]

nous: The problem I see is with a significant number of parents who believe that education is a commodity which can be given to their children rather than a process in which they and their children have to participate.

Worse, for me, are the students who believe that a diploma -- and more specifically, a grade -- is a commodity that they have paid for rather than a signifier of their academic achievements. [Business students are particularly bad in this way IME.] They're all but unreachable when it comes time to explain to them why they got a D in the course or, heaven forfend, an F.

Slarti: The minute you start believing what the marketers are telling you, you're lost.

*awaits the arrival of crionna and Bird Dog with some interest* ;)

Catsy, it's not clear to me that the "chimpanzee" test was reasonable.

It's perfectly reasonable. Along with the usual assaults on an app's dignity like boundary tests, part of the testing process is to /do/ things to try to break the software that make you and I go Whiskey Tango Foxtrot. Developers will give more weight to fixing bugs caused by something a reasonable person could reasonably try to do, but I don't see how it's unreasonable to expect people to try to game the system.

This software is either a fraud, or poorly-tested. Or both.

"I'll guess that the program is probably applying algorithms such as looking for certain terms in the vicinity of certain other terms, rewarding the presence of selected key words, probably checking syntax, and other such tests which have nothing whatsoever to do with the meaning of what is written, or the quality of the writing."

Isn't this actually what a human does to a first approximation? Checking for syntax and a familiarity with basic vocab/concepts/facts?

My idea of unit testing is not to apply test data of a sort the code will never see (say a bunch of words picked by a chimpanzee, or a human-selected excerpt from a million-typewriter/million-chimpanzee room - "Look, a chimpanzee wrote this, and the software passed it!"), using a several-year-old version of the code. And as I claimed above, the liberal use of "chimpanzee" in an essay is not an indication of a problem, except perhaps for those who don't like chimpanzees, or perhaps for those who do like chimpanzees.

1 + 3 = 4 = 2^2, therefore the theorem is proved.

I've been trying to teach my eight-year-old how to find the next square, by some algorithm like:

N^2 = 2*(N-1)^2 - (N-2)^2 + 2

which, when you simplify it, unsurprisingly reduces to 0=0. Still, it's handy if you don't have a calculator and can't do the multiplication in your head. Take the difference between the highest two squares that you know, add two, add the result to the highest square, and you have the next one in the sequence. And so on.
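Spelled out as a quick Python sketch (the function name `next_square` and the choice to seed the list with 0 and 1 are just for illustration), the pencil-and-paper recipe looks like this:

```python
def next_square(prev, prev_prev):
    """Given (N-1)^2 and (N-2)^2, return N^2 using only addition and subtraction."""
    # Difference of the two highest known squares, plus two, added to the highest.
    return prev + (prev - prev_prev) + 2

squares = [0, 1]  # 0^2 and 1^2 seed the sequence
while len(squares) < 12:
    squares.append(next_square(squares[-1], squares[-2]))

print(squares)
```

No multiplication is needed at any step, which is the whole appeal when there is no calculator handy.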

My idea of unit testing is not to apply test data of a sort the code will never see

Isn't it better to do both positive and negative testing? Test it to make sure it does what you want it to do, and then test it to make sure it doesn't do what you don't want it to do.

Isn't this actually what a human does to a first approximation? Checking for syntax and a familiarity with basic vocab/concepts/facts?

The difference in difficulty between writing a spell checker, or program to verify certain vocabulary is being used, or even a syntax checker and writing a program to check that certain concepts were understood and discussed coherently is immense.

You can not gloss over that that easily.

Peter is right, perhaps not in the details of which method to use, but correct in the fact that if there were really an efficient algorithm to rank a paper (a fitness function), computers could quickly be trained to write as well as most humans.

My idea of unit testing is not to apply test data of a sort the code will never see

You don't think students would quickly learn tricks to game the grading system? You're wrong. That is exactly the type of data the program would see.

we hit on the idea of making a list of words that could never, ever appear in an essay on Dante, and each pledging to use one in our final. I think I had 'bathysphere'; someone else had 'platypus', and 'velcro'.

Heh, we did the same thing in our sales training course, both rating the word used and, of course, betting on the results.

Slarti: The minute you start believing what the marketers are telling you, you're lost.

*awaits the arrival of crionna and Bird Dog with some interest* ;)

Yup, that's me, always exaggerating what our engineers tell us the stuff will do, hoping not to be caught. That really does wonders for my credibility in the industry... (shakes head and wonders how other salespeople keep their jobs).

Anarch: the upside to extensive comments isn't that the students feel warm and loved; it's that they are less likely to think you're being unfair if they can see exactly why you said what you did. (I used to hate it when I would get papers back saying 'A; good job', or 'B; needs work'. What work?, I'd think? Tell me what it is I need to do so that I can do it, instead of just saying: something here is not all it should be, but I won't tell you what! Since large chunks of my teaching practices are an attempt not to do the things that drove me nuts as a student, I always wrote long comments, and figured out only later, when comparing notes with others, that the number of students who thought I was just being unfair or unreasonable was way lower, in ways that (knowing the people I was comparing notes to) I couldn't put down to actual greater fairness on my part.)

For what it's worth, back when I used to write software for a living, in the distant past, I always checked for nutty errors, like entering all punctuation marks in an address field. I thought it was part of testing to make sure that my software would respond appropriately to more or less any ridiculous thing someone threw at it.

And if an essay in which 'chimpanzee' was substituted for 'the', which is not a noun, got a 6 out of 6, the software can't be checking syntax all that carefully.

"I thought it was part of testing to make sure that my software would respond appropriately to more or less any ridiculous thing someone threw at it."

Back when I was a paid software tester, I never found software that wasn't full of breakable bits once enough ridiculous keystroke combos were thrown at it at various points.

"Isn't this actually what a human does to a first approximation? Checking for syntax and a familiarity with basic vocab/concepts/facts?"

A computer can check for syntax. It can check for the presence of vocabulary words, though not whether they are used correctly in context. Being able to check for actual understanding of concepts or facts would be a huge, world-altering advance in AI. If such an advance comes in our lifetimes, it'll have a lot more profound applications than grading papers. Being able to judge the coherency of an argument or the stylistic elegancy of an essay is probably another step beyond that, although I'm not sure how big of a step.

I agree that as soon as these programs become common, there will be widespread dissemination of methods to game the system. Of course, I'm not sure you really can write the programs to avoid this, which renders the question of testing to catch such data rather moot.

(I used to hate it when I would get papers back saying 'A; good job', or 'B; needs work'.

As an undergraduate I once got an English paper back with a B-. The only comment was "incisive and well-written." That and the professor's name are the only things I remember about that course.

There was one time when I didn't mind a short comment: my first ever philosophy paper, which I worked really hard on even though it was only 2 pages long. The section leader was not the sort not to write comments; I had seen her handiwork and knew that she could be blistering, though I had never myself been graded by her at all. My paper came back: A+. Well put.

I walked on air for weeks.

Note that the spam/anti-spam wars are an analogy. And note that for the crime-essay software described in the article, there is in fact a list of concepts/facts/vocab chosen by the prof that is checked against. Again, the article is thin, but I don't think one could easily write an essay that would pass that software without knowing something about the subject matter and having a reasonable command of English. Remember, one is not trying (at this point) to write code as good as a professor or high school teacher - one is trying to force students to write on-topic essays. One is trying to raise the bar high enough to make the effort of gaming the system higher than playing by the rules. Anyway, I would guess that simply having a human read a random sentence or two from some percentage of papers should satisfy the skeptics about this point.

Of course in the long-term this sort of thing will produce an essay-writing program - but the computing power to do so will be enormously greater than to grade essays. It's just not the case that it's as easy to check the proof of a math theorem as it is to write it; it's just not the case that it's as easy to write thirty beautifully varied iambic pentameter lines as it is to appreciate their quality.

Apropos:

Met with my advisor the other day to go over a conference paper I gave him that would eventually be turned into a chapter. He said that it was ‘better than ok’, which is the most positive comment I’ve ever gotten from him. Much better than when I was writing my MA, when he’d give me back drafts with comments like “don’t ever give anything of this quality to me again ever”.

Also, what do people here think about Searle's Chinese room?

I've never really understood the Chinese room. The description of the program seems incomplete. Is it purely symbol manipulation with no other inputs? Suppose the message passed in says "What is the temperature in there?" How would the program answer that simply by manipulating symbols? Maybe it could just pick a reasonable number at random, since the questioner wouldn't know if the answer was right or wrong.

But then suppose you ask, "What color ink is this question written in?" Is that sort of question not allowed?

No doubt I'm missing something.

"I always wrote long comments, and figured out only later, when comparing notes with others, that the number of students who thought I was just being unfair or unreasonable was way lower, in ways that (knowing the people I was comparing notes to) I couldn't put down to actual greater fairness on my part.)"

Exactly parallel to one of the most important findings in the medical malpractice world, namely that the best predictor of whether an M.D. will get sued a lot or a little is not their training, their competence, or their actual rate of errors: it's their bed-side manner.

It's bizarre that a thread that is so close to home is so difficult to post to. My entire working life revolves around trying to get students to speak and write and read English. The back and forth about the ability to game the testing software versus intellectual dishonesty is fascinating. I tend to lean towards rilkefan. One can put in a huge amount of effort in trying to get students to understand their mistakes, but most of them don't really understand. I assume this because I can see the same mistakes reoccur. At first, I thought it was my teaching, then I thought it was laziness, now, I realize that the mistake simply does not make a sufficient impression on them. It's like the difference between having a textbook that has a green cover versus a blue cover. Now, if we transpose that question of color to an article of clothing, voila, instant impression.

This became most vivid when I realized that students didn't see a difference between English written in one-byte characters and English written in two-byte characters. (The difference is a little like having words written in a different font in the same sentence.) It baffled me that something this obvious was completely ignored by students, but it got me to thinking about this. Related to this are Oliver Sacks's various discussions about how diseases such as Tourette's syndrome must have existed throughout human history, but until they were identified and named, no one recognized them as such.

So the key point (at least with my students) is having the mistakes (and their corrections) make an impression on them. One way to do this is to get enormously worked up over the mistake. I can do this for some mistakes, but to get worked up over every mistake is a recipe for mental disaster. I have a small subset of errors that are sufficiently common among Japanese students that I tell students that I will scream if I see them. And I do. Some of them can't really be called errors but are simply rules of thumb. I worry a little that students are going to think that these rules of thumb are cast iron diktats from the grammar god, but some of these things really do get under my skin.

If I could have a computer program that would 'scream' every time a student types 'and so on' in an essay (it mirrors the Japanese construction, so you get sentences like 'I like basketball, baseball, and so on.') I would pay good money. To me, this software is like that. It picks some rather obvious mistakes and screams when it gets them. That it is not self-aware to know when some tester is taking the piss out of it is not something that should be taken as a fault, unless this is an episode of Star Trek.

The danger is that students may assume that the value assigned by the machine is a positive attribute rather than the absence of negative attributes. In truth, this is where I lean towards hilzoy's point, in that the descriptors that are often used for this kind of software suggest that the submission has something special when what it really has is no obvious errors.

There's also quite a bit of research concerning learner preferences towards correction. The fact is that there are some learners who thrive with massive amounts of correction and others who simply switch off. Targeting correction is something very important, and often teachers think they are doing their students a favor by going over every point when what many (most?) students need is someone to focus on one or two problems and correct those. If one thinks of a sport or learning to play a musical instrument, it is easy to see that a data dump of everything that the person is doing wrong is not going to be all that helpful. This is why teaching children seems easier (because the range of information is restricted and it appears that it is something that everyone knows) but is actually much harder (because the teacher has to discern how the child is taking the information on board). Unfortunately, in our society, we assume that the person with the most knowledge deserves the highest rate of pay. The ideal system, to me, would be a triangle, with people teaching basic skills getting paid more money and people teaching higher, more rarefied skills getting less, but our system of education works in exactly the opposite way and it is difficult to imagine it will ever change.

Slarti: I've been trying to teach my eight-year-old how to find the next square, by some algorithm like...

Even easier: suppose you know N^2. Add to it the number whose square you've got (N) and then the next number, aka the number whose square you want (N+1) and bingo! There ya go: that's the next square.

So, f'rex, if you know that 14^2 = 196, add 14 to get 210, then 15 to get 225, and that's it! 15^2 = 225. And, in fact, I actually use this method to compute squares in the 20-30 range, since I can never remember the square of, e.g. 28.

[This is just the dead simple (N+1)^2 = N^2 + 2N + 1 = N^2 + (N) + (N+1) converted into English. This, along with the fact that the square of a number N is equal to one more than the product of the number larger than it and the number smaller than it, were the first two "theorems" I remember proving as a kid.]
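For anyone who wants to check the rule mechanically, here is a minimal Python sketch. The function names `next_square` and `square_from_neighbors` are made up for illustration; they just restate the two "theorems" above.

```python
def next_square(n, n_squared):
    """Given n and n^2, return (n+1)^2 by adding n and then n+1."""
    return n_squared + n + (n + 1)


def square_from_neighbors(n):
    """n^2 as one more than the product of the numbers on either side of n."""
    return (n - 1) * (n + 1) + 1


# Walk upward from 14^2 = 196, as in the worked example above.
square = 196
for n in range(14, 20):
    square = next_square(n, square)
    print(n + 1, square)
```

Each pass through the loop uses only the previous square and two additions, which is the whole appeal of the trick for mental arithmetic.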

hilzoy: Anarch: the upside to extensive comments isn't that the students feel warm and loved; it's that they are less likely to think you're being unfair if they can see exactly why you said what you did.

Trust me: in math, at least, a plethora of comments generally increases the likelihood that the students think you're being unfair.

tonydismukes: A computer can check for syntax.

It's been a while since I followed computational linguistics but I'd be very surprised if a computer can adequately check for things like anaphors and referents in English, given that we ourselves still don't have an adequate theory AFAIK.

rilkefan: It's just not the case that it's as easy to check the proof of a math theorem as it is to write it...

Don't you mean that the other way around? It's hellishly difficult to concoct a theorem; it is, however, quite straightforward to check whether a given proof is correct.

[To be uber-precise, proof-checking isn't just computable, it's polynomial, at least in theory. Proof-creation is either c.e. or undecidable depending on the amount of metatheory you allow into the system.]

Interestingly, they've just started creating "automated proof machines" and I think one of these little buggers has just come up with an "interesting", "non-trivial" result, although I have no idea what it is. The reference was on the Foundations of Mathematics email list, if you want to root around in the archives there.

Anarch - just checking to see if you're awake. When I was a kid I decided one day to write a program that would come up with geometry proofs. I decided a few days later to reassess my intellectual abilities.

At first, I thought it was my teaching, then I thought it was laziness, now, I realize that the mistake simply does not make a sufficient impression on them.

One of the central problems in my mathematical teaching career is, I think, due entirely to that. We're producing a whole generation of kids who simply don't know algebra, don't understand numbers in any depth, and think that math reduces to symbol-pushing on their calculators. My particular take the past few years has been that the problem -- especially at the college level, but also well before that I think -- is that students don't receive enough negative feedback, as in "This is wrong, this is unacceptable." Now the negative feedback has to be augmented by positive feedback, of course, but the point is we're really really really good at telling our students what to do and what works... and spectacularly lousy at a) telling them what doesn't work, and b) why it doesn't work.

[Of course, the real culprit is that students aren't doing enough of the right kinds of work, i.e. tackling problems that cannot be solved by mere rote application of the material from that section, so that they're not discovering b) on their own... but that would take us too far afield for the nonce.]

Targeting correction is something very important, and often teachers think they are doing their students a favor by going over every point when what many (most?) students need is someone to focus on one or two problems and correct those.

Again speaking only for my experiences teaching math, my general impression is that most students have a couple of key concepts that have gone disastrously wrong in their heads, which spread like a contagion to the rest of their understanding of math.* If you can somehow target those errors -- which is much, much easier said than done -- and help them work through the reason why it's wrong, you can turn people's mathematical careers around almost literally overnight.

The crux, though, is that they have to be the ones who do the work. Simply yelling that "(a+b)^2 is not equal to a^2 + b^2!!" doesn't work. [Trust me: we've tried.] You have to explain it, slowly and with great patience, until they finally get it... and then, IMO, hammer the ever-living f*** out of them if they do it again.

And at some point, it finally sticks. Beats me when that is, though; I haven't taught a high-enough level course in several years to know.

* The top five, in approximate order of sophistication: 1) not understanding numbers, e.g. how to add fractions, beyond the mere algorithm for doing so; 2) not understanding the central premise of algebra, namely that the variables "stand for" things but that those things must be declared before you can use them in any meaningful way; 3) not realizing that "equations" have two sides and that the equality symbol has a meaning that cannot be omitted (which in turn screws with their understanding of inequalities); 4) not understanding what a graph really means; 5) not understanding that functional operation, e.g. sin(x), is not the same as multiplication, e.g. sin * x. The fundamental error, by and large, which underlies all of this is thinking of mathematical operations as a syntactic calculus of symbolic manipulations rather than actions on abstract concepts like "2" or "f(x) = x^2".

Anarch--
"This is just the dead simple (N+1)^2 = N^2 + 2N + 1 = N^2 + (N) + (N+1) converted into English"

Those of us who are more concrete may find it easier this way. You've got a square chessboard with a bunch of squares in it, and you want to make the whole board one-square bigger in each direction. So first you add a row of squares along one edge, and you've got a rectangle. Then you add a row of squares along an adjacent edge, and you've got your bigger square. Only the second row you add will be one longer than the first row you added, 'cause you're adding it on the long-side of the rectangle.

(One benefit of thinking of it this way is that it helps show how to generalize the method to finding the next cube, the next ^4, and so on.)

(One benefit of thinking of it this way is that it helps show how to generalize the method to finding the next cube, the next ^4, and so on.)

In theory, sure. In practice, I'm not convinced that anyone's going to be able to winnow out

(N+1)^3 = N^3 + 3N^2 + 3N + 1

from the geometry of the situation any easier than just memorizing the formula.

You could do it, though, if you rephrase your chessboard analogy as follows: if you want to augment your chessboard to be one-square bigger in each direction, what you do is first extend the top by adding a strip of squares the same length as your original board (N), then extend the right side by adding a strip of squares the same length as your original board (also N). This isn't a square, though, because you're missing the top right corner of the new chessboard, so you have to add one more square (1). And there's your new chessboard!

[Mathematically, (N+1)^2 = N^2 + N + N + 1.]

Now, to explain cubes, you can do the following:

  • Start with a cube of side N, and, for the sake of argument, suppose we're looking at it straight on. We want to augment it to be a cube of side N+1.
  • Note that each face looks like a square of length N.
  • To augment the cube, first extend three of the faces -- the top, the right and the front -- by adding another copy of the square on top of them. [That's 3N^2.]
  • If you think about it for a moment, you should realize that this isn't a cube because the extended faces don't touch each other; instead, you have three grooves, one between each pair of extended faces. We'll fill each of those grooves in with a line of blocks, each of which will be as long as the squares. [That's 3N.]
  • Finally, we're missing that dratted corner piece again, so fill it in. [That's 1.]
  • And now we've got a cube of length N+1!

[This is a hell of a lot easier to understand with pictures. Or Legos.]
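Failing pictures or Legos, the block counts in the steps above can at least be tallied. This is only a sketch; `cube_pieces` and `next_cube` are hypothetical names, and the dictionary keys are just labels for the three kinds of piece.

```python
def cube_pieces(n):
    """Block counts for growing a cube of side n into a cube of side n+1."""
    return {
        "faces": 3 * n * n,  # three n-by-n slabs: top, right and front
        "grooves": 3 * n,    # three lines of n blocks where the slabs fail to meet
        "corner": 1,         # the single missing corner block
    }


def next_cube(n):
    """(n+1)^3 assembled from the original cube plus the added pieces."""
    return n ** 3 + sum(cube_pieces(n).values())


for n in range(1, 6):
    print(n, cube_pieces(n), next_cube(n))
```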

The real advantage of this approach, for a mathematician, is that, as an added bonus, it also explains where the coefficients of the expansion for (N+1)^k come from: they're the number of ways of "filling in the gaps" created by the j objects of dimension (k-j), i.e. they're just k choose (k-j) = k choose j, as expected.
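A rough way to see the same count in any dimension: assume the (N+1)^k block is built from "k choose j" pieces of size N^j, for j from 0 (the lone corner) up to k (the original block). The helper name `grow_hypercube` is invented for this sketch; `math.comb` is the standard-library binomial coefficient (Python 3.8 or later).

```python
from math import comb


def grow_hypercube(n, k):
    """(n+1)^k assembled from comb(k, j) pieces of size n^j each, j = 0..k."""
    return sum(comb(k, j) * n ** j for j in range(k + 1))


# Piece counts for the cube case: one corner, three grooves, three faces, one cube.
print([comb(3, j) for j in range(4)])
print(grow_hypercube(3, 4), (3 + 1) ** 4)
```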

Anarch-

The next time I give the first test of the semester in Accounting I, I am certain that there will be students who cannot grasp what I have been teaching for the first weeks: For every transaction Debits = Credits. I say this every day. I repeat myself. I get others to repeat it. Still, there will be folks who find one line with one dollar amount sufficient and their grades reflect the half-done nature of their test.

Often I find that they have managed to drop the class before I finished grading the test.

Anarch--

I always visualized it as follows:

start with your cube. Now slap another square, same size as any face, onto the bottom of it, turning it into a refrigerator-box. Now slap a rectangle onto an adjacent side, so it covers up the old face plus the new bottom edge. Now slap a rectangle onto a side adjacent to the first two, so it covers up the old face plus the two sides sticking out. And you've got your bigger cube.

The cube is n^3
the square you slap on the bottom is n^2
the rectangle you slap on one side is n by n+1 (=n^2+n)
the final rectangle you slap on is n+1 by n+1 (=n^2+2n+1)

and that's also n^3 + 3n^2 +3n+1
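The same bookkeeping as a small Python sketch, with each slab one block thick. The function name `slabs` is made up for illustration.

```python
def slabs(n):
    """Sizes of the three slabs slapped onto a cube of side n, in order."""
    return [
        n * n,              # square on the bottom
        n * (n + 1),        # rectangle covering one side plus the new bottom edge
        (n + 1) * (n + 1),  # last slab, covering a side plus both overhangs
    ]


for n in range(1, 6):
    print(n, slabs(n), n ** 3 + sum(slabs(n)))
```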

And I think mine is a little easier to visualize, or at least I remember lying in the top bunk as a wee child and seeing it on the ceiling.

Even easier:

I die. My body liquefies and soaks into the ground, poisoning water and crops. The village eats and drinks, and all die. Oh, the humiliation.

Those of us who are more concrete may find it easier this way.

I...err...die. My body turns to dust, blown by the wind and inhaled by others. All die. Oh, the humiliation.*

I should have realized that 2N+1 would get me there. But Tad's explanation was even more intuitive, which works best for me.

*Paraphrased from...maybe Joe Haldeman? Anyway, not mine.

And for clarity I should have referred to that final rectangle as a *square*.

Just doing the algebra seems easier. Plus you don't get covered up with glue from pasting objects together.

We're producing a whole generation of kids who simply don't know algebra, don't understand numbers in any depth, and think that math reduces to symbol-pushing on their calculators.

Hell, I'm from the last generation, and I'm constantly running across things I ought to have been taught in middle school, if not earlier. Part of the problem is how you build up the structure of mathematics without completely losing the kids. If you start with algebras, sets, operators, mappings, etc (which is what they try, and fail, to do) then you lose the kids. I don't have a better approach, other than to teach them the rules of mathematics early, and then go back over why the rules are there later on.

I am certain that there will be students who cannot grasp what I have been teaching for the first weeks: For every transaction Debits = Credits.

I think there's lots of analogues to this. In navigation, young engineers constantly forget that accelerometers don't sense acceleration, but instead sense specific force (which, didn't Einstein have something to say about that?). The consequence of this is that you have to correct for notional acceleration of gravity when integrating the instrument outputs. Earth rate is slightly different, but also needs to be accounted for. At some point, the notion of the gyrocompass sort of clicks together all at once. That's one of the many reasons why military and commercial aircraft sit still on the tarmac for several minutes: so they can figure out where they're pointed relative to North. It's a relatively simple concept that nonetheless stymies people, until (and I'm stealing this concept from someone whose name I cannot recall, probably Robert Pirsig) their intuition gets trained.
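To make the specific-force point concrete, here is a deliberately toy one-axis sketch in Python. Everything in it is an assumption for illustration: a single vertical axis with up as positive, a constant gravity magnitude `G`, no Earth rate, and made-up function names. Under those assumptions the true acceleration is the sensed specific force minus G.

```python
G = 9.81  # m/s^2, assumed constant local gravity magnitude


def true_acceleration(specific_force_up):
    """Vertical acceleration (up positive) recovered from the sensed specific force."""
    return specific_force_up - G


def integrate_velocity(samples, dt, compensate=True):
    """Sum acceleration over time, with or without the gravity correction."""
    velocity = 0.0
    for f in samples:
        a = true_acceleration(f) if compensate else f
        velocity += a * dt
    return velocity


# A sensor sitting still on the tarmac reads +G upward, not zero.
at_rest = [G] * 100
print(integrate_velocity(at_rest, 0.01, compensate=True))
print(integrate_velocity(at_rest, 0.01, compensate=False))
```

Without the correction, one second of sitting still integrates to roughly 9.81 m/s of phantom upward velocity, which is the mistake described above.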

I remember one of my colleagues in London talking about how he had spent a whole term lecturing on action-theory, making frequent and prominent mention of the basic desire-belief (or "two-factor") theory of action: action arises when a standing and antecedent desire expressed in general terms meets up and combines with an occurrent belief about a salient desire-satisfier in the local environment. "I'd like some sort of cool drink," says desire, and "lo!, I spy a glass of ice-water sitting there on the table," says belief, and off I toddle for my drink.

My colleague was reduced nearly to tears by the number of students who had written on their exams variations on the following: "actions can be produced in either of two ways. Sometimes they arise from desires. Other times, they arise from beliefs."

So I agree with Slartibartfast's comment "I think there's lots of analogues to this." The trouble is, the analogues may be so thick on the ground that there is no particular shape or structure to the ways students can misunderstand--nor any apparent limit, either.

Slarti, I actually remember that story from its original publication in Analog(?) wow. that really takes me back. I can even remember that a character ended up taking mescaline and was having attacks of cabbageness.

note: don't ever threaten a culture you don't understand with the imposition of negative values. you might not like the outcome.

p.s. teaching sounds like really hard work. hat tip to those of you willing to do it year after year. there may be no profession more important (after, of course, lawyers).

two points immediately jump out:

although the thought, especially this week, with 120 papers on my desk, of shunting off grading to a computer sounds divine, one of the things i really enjoy is writing random bits of joy, happiness, excitement on the margins of papers. i tend to see myself as an amusing grader, and i do not want to give this up. much like the professors who put in lots of work on comments and encouraging students in their writing (few and far between they may be ...), i find grading a means of personal expression. and part of this personal expression is my ability to show my students i can be amused by what they write, or impressed, or they can even make my day or week when they get something they've struggled to understand all semester.

secondly, my students are being trained by my state (virginia) to write to written Standards of Learning tests already. they already think there's a right and a wrong to everything. i'd hate to lose even more student creativity if they thought they had to write to an algorithm.

About getting students to remember stuff: I don't know if I've said this before, but: humor is as memorable as anger. I use it a lot, being more disposed to humor than anger to start with.

One of the things that always used to bug me about students' writing (in general, with exceptions) was that so many of them seemed to use words without any idea of their shades of meaning, and sometimes without any idea of their meaning at all. (The words they used must just have sounded like a word that fit where they put it, or something.) Even when they didn't use a word that was wholly inappropriate, they tended to use words like: refute, reject, rebut, etc., interchangeably, as though all these words did was express some thought like: this argument: ugh! And I thought: this cripples them, as surely as it would cripple a carpenter not to make fine distinctions among tools, but just to sort them into a few crude categories: the cutting things, the hitting things, the sticking-together things. (And, side note, this is one of the things I always found incredibly funny about political correctness as it manifested itself then (early 90s): these kids who had no appreciation of English as a tool for the communication of subtle thoughts suddenly developed, in one specific area and nowhere else, a hypersensitivity to the most arcane, almost imperceptible, shades of meaning. Viewed in that light, it was so odd.)

Anyways: the part of this that involved completely misusing words was of course the worst, and one batch of papers that was particularly bad in this respect pushed me over some edge or other, and when I handed them back I read aloud some of the worst sentences, and, pretending to make the charitable assumption that the author had used that word deliberately, wondered at length what each of them might mean.

So, for instance, several people had referred to 'tenants' of various views, and I went on this long riff about how perhaps the author had intended a contrast between two sorts of claims: those that were property-owners of the view, and those that were tenants. The former, of course, are responsible for the maintenance of the view. They have to take care of it, and keep its foundations strong, and pay taxes; while the mere tenants have no such responsibilities, but can be evicted at will if the view finds them inconvenient. Etc., etc. I think I was sort of funny about this; in any case, the students were in stitches.

Afterwards, some of the students said to me that they thought that was mean of me. I said: no, I didn't identify the students, so only the student who had written one of the ridiculed sentences would know that I was ridiculing him or her. And of course no one should hand in work without being willing to stand behind it. They seemed to accept that.

So before the next paper, I said to the class that I was going to do this again, if anyone turned in any similarly amusing sentences, and that I did not think this was mean, since after all no one had put a gun to their heads and forced them to submit work involving the ridiculous misuse of language. And the thing is: that particular sort of error just vanished. Plus, I really think that the long riff on e.g. tenants might just have lodged the tenet/tenant distinction in their heads permanently, since humor is vivid.

Anarch: what does it mean for someone not to understand numbers?

Last fall I took in a batch of papers by email--I'm doing that more and more often, just having students skip the paper altogether.

Then, with the permission of all the students in the class, I concatenated all of their papers into one long file, stripping out any possible identifiers. (This included, for instance, making a few cosmetic corrections in the English of one very bright non-English-speaker, because her errors would have identified her).

Then I made extensive comments on all of the papers--enough that at the end of it, my word count in the file was larger than their word count.

And I emailed the whole file back to every student. Plus, on the side, I sent each one of them a grade for their individual paper.

Every student saw what every other student had written, plus how I had responded to it. They got to see which papers were successful, and why, and to see how I argued with all of them.

They told me they liked it, and I want to do it again. It was *incredibly* time consuming, but grading always is.

Hilzoy, re your technique:
"Afterwards, some of the students said to me that they thought that was mean of me."

Yeah, that just wouldn't have worked for me. Again, I think you have much better bedside manner, or you are just generally nicer, or something. But I find that students are very quick to shut down and turn away in response to anything *approaching* ridicule, even anonymous ridicule. I'm glad you made it work, but I would classify that one as a technique for experts.

they tended to use words like: refute, reject, rebut, etc., interchangeably, as though all these words did was express some thought like: this argument: ugh!

I understand, and would probably agree with your assessment were I to read the same papers. And yet, one vivid memory of my dissertating years was the perpetual struggle to bring some sort of variety to sentences and paragraphs that (necessarily) expressed quite similar thoughts. It's hard not to leap to the thesaurus when the alternative seems to be to use the same word three times in three consecutive passages, even if the shade of meaning of the second or third choice isn't precisely right.

What a relief it was for me to get back into programming, where if the logic calls for three IF-THEN statements in a row, one simply writes three IF-THEN statements in a row, without worrying about whether one might offend the compiler's aesthetic sensibility.

As for the subject of the actual post, I have nothing to add, except perhaps a round of applause and a hearty LOL for Twench's comment.

Part of the problem is how you build up the structure of mathematics without completely losing the kids.

Part of it, I think, is making it matter (and therefore interesting) to the student right then and there in itself, not as the basis of some future skill. Math is the most abstract subject in elementary/middle school, right?
Montessori "hands-on" sneaky ways of teaching math seem interesting.
Also methods which start with presenting the real-life problem first.

But I find that students are very quick to shut down and turn away in response to anything *approaching* ridicule, even anonymous ridicule.

I've had better success with this sort of thing when the sentences come from anonymous former students or, better yet, published writing.

Tad: And I think mine is a little easier to visualize, or at least I remember lying in the top bunk as a wee child and seeing it on the ceiling.

I was going to object to this as not generalizing to higher dimensions, but I think it actually does. The underlying algebraic gizmo is

(N+1)^k = N^k + N^{k-1} + (N+1) N^{k-2} + ... + (N+1)^{k-2} N + (N+1)^{k-1}

which can be checked (I think) by comparing coefficients in the expansion -- there's a nifty identity involving sums of nCr that you can use here without too much difficulty -- although there might be a slicker way. So yeah, that works just fine. I still prefer my way because it shows precisely where the coefficients come from but hey, if it works for you, run with it. :)
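One candidate for a slicker check, offered as a sketch rather than a proof: the right-hand side minus N^k is the usual factorization a^k - b^k = (a - b)(a^{k-1} + a^{k-2} b + ... + b^{k-1}) with a = N+1 and b = N, so the factor (a - b) is just 1. A quick numerical check in Python, with the made-up helper name `layered_growth`:

```python
def layered_growth(n, k):
    """n^k plus k layers, the j-th layer having size (n+1)^j * n^(k-1-j)."""
    layers = [(n + 1) ** j * n ** (k - 1 - j) for j in range(k)]
    return n ** k + sum(layers)


for k in range(1, 6):
    print(k, layered_growth(2, k), 3 ** k)
```

For k = 3 the layers are n^2, (n+1)n and (n+1)^2, which are exactly the three slabs in Tad's picture.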

hilzoy: what does it mean for someone not to understand numbers?

Let me think about how I want to phrase that. It's a little hard to articulate.

But I find that students are very quick to shut down and turn away in response to anything *approaching* ridicule, even anonymous ridicule.

It's partly my personality but I've found that only to be true of students who lack a certain level of self-confidence. Gentle mocking or chiding can sometimes work wonders, though, once they get their feet under them.

Anarch--

Yeah, there's this rapport thing. Establish it early and make it strong, and you can get away with the "gentle mocking or chiding". But even then, the level of joshing that Susey is comfortable with makes Danny uncomfortable as he watches. Unlike math, there are no general formulas.

Oh, and aren't all those coefficients called the "binomial expansion", or "Pascal's triangle", or something? It's been too long since I've done math.

But even then, the level of joshing that Susey is comfortable with makes Danny uncomfortable as he watches. Unlike math, there are no general formulas. [Emph mine.]

ooooooh. You're just trying to make me mad, aren't you? ;)

But yes, you're quite right. Rapport is key, as is a keen sense of the appropriate (or, in my case, what I can get away with). I managed to nail it this year; whether I do so next year is anybody's guess.

Oh, and aren't all those coefficients called the "binomial expansion", or "Pascal's triangle", or something? It's been too long since I've done math.

Binomial coefficients, yes, and you can arrange them in Pascal's triangle. I haven't crunched the details out yet, but I'm fairly sure that your "adding differently sized rectangles" technique a) works and b) codes a summation relation between the binomial coefficients, much as my explanation above codes the origin of the binomial coefficients. Good stuff, although discrete math isn't really my strong suit.

"discrete math isn't really my strong suit."

but I thought discretion was the better part of valor?
