The current "Able Danger" story is interesting, but I have resisted commenting on it for the same reason I don't comment on reports that bin Laden has been captured--initial reports on sensational subjects may differ greatly from final reports (see also "we found WMD"). But I was reading this comment thread at Crooked Timber and one of the comments really struck me--because it reflects a common complaint, and it crystallized for me why the complaint is often wrong.
Henry,
As I posted over at Drum’s, the thing to think about is this: British counterintelligence was consistently punked throughout the 30s, 40s, and 50s (and some fairly vital nuclear weapon secrets lost in the process) by consistently rejecting suspects/hypotheses because no matter what their preliminary investigations came up with everyone knew that people like Philby couldn’t be double agents. Too well-bred and all that. Except of course some of them were.
So the miners build a statistical analysis model (something I greatly distrust) to find bad guys, and out pop the names of a few Undersecretaries in critical positions. What do they do? Discard those names, of course, because “everyone knows” that “those people” couldn’t be suspects. Right.
And I would be reasonably certain that if we went back and ran various federal agencies’ and investment banks’ securities fraud models against the WorldCom data from 1995-2000 that Ebbers’ name would pop out as needing further investigation. For all we know it did at the time. Do you think anyone investigated as a result? As another commenter at Drum’s said, only pee-ons are targets for data mining.
I have lots of libertarian-type concerns about governmental data-mining. But the concern that some of the rich and powerful might get around it doesn't really bother me. I'm not rich or powerful, and I don't really have a hope of being such. But I take it as a given that the rich and powerful get around things all the time. If it happens with too much frequency it should be minimized whenever possible, but the fact that it happens at all is not a particularly good critique of the system as a whole. The fact that Kennedy wasn't charged with manslaughter for his youthful indiscretion at 37 in Chappaquiddick doesn't mean that prosecutorial discretion is necessarily a bad thing. The fact that OJ got off for murder doesn't mean the jury system is all bad (it didn't shake my faith too much, though other cases since have made me want to rethink the jury system; I just can't come up with something that is obviously better). The liberal equivalent would be--the fact that some people game the welfare system doesn't mean the whole thing is bad if it is well balanced otherwise. In short, that systems can be gamed to some extent is to be expected in the real world. Of course we should try to minimize the gaming, but its existence isn't a good critique of a system that does well for most people most of the time. So the question for data-mining shouldn't be "Will anyone ever be able to escape its net?" but rather "Is it useful enough for what we have to give up?" It isn't clear to me what the answer is, but I'm certain that asking the right question is a good first step.
Well, except for the fact that Saudis seem to be particularly connected to this whole terrorist thing. As a country, they are obscenely rich. They are, as a whole, exceedingly well connected - the classic "rich and powerful" in your exposition above. So, the excuse for gaming a system searching for terrorists you're presenting seems to have a big practical hole you can drive a truck through.
Just saying.
But hey. It was easier to invade Iraq, so I'm sure it'll be easier to data mine some other non-rich and non-powerful population as well.
Posted by: Hal | August 30, 2005 at 06:58 AM
Thank you, Mr. Holsclaw. Much to discuss here.
It certainly seems right to me that heuristic methods are valuable even when they do not produce 100% results. This is a familiar fact from e.g. disease-screening tests with varying levels of false positives and false negatives. You try to use the methods with the fewest false readings, and then maybe do follow-ups on the positives with a method that has a different sensitivity-bias, etc.
These common-sense thoughts are less welcome to some when the subject matter impinges on rights. "Better a hundred guilty should go free than one innocent be convicted" is sometimes a way of saying that you will accept a certain tolerable ratio between false negatives and false positives, but too often is a way of expressing the absolutist position that *no* level of false positives is acceptable at all. On principle! (In fact, I don't think I have *ever* heard that slogan used to say, with grudging acceptance, "eh, I can live with one false positive for every hundred false negatives.")
So one issue is the way that our acceptance of statistically-based heuristics is politically complicated by absolutism about rights.
I take it that the CT commenter is complaining about something slightly different, though. His claim is that even when the screening-method functions accurately, it is embedded within a decision-procedure that may throw away information. So his suggestion is *not* that the data-mining technique won't catch the wealthy and well-connected. He thinks it *would*, and possibly *did*. It was at the stage *after* the data-mining that those results were tossed out. This is more like the case of a doctor who gets back a positive result on a cancer-screening, and then doesn't tell the patient for some completely unrelated reason.
Here the question is not about data-mining per se, or any other method of information-gathering, but rather about how that info is used by the people in positions of power. Considered only in itself, data-mining techniques may be *less* likely to overlook people in power, and *less* likely to dismiss them from consideration prematurely. The dismissal comes at a later stage.
So the thought that the rich and powerful get around things may be orthogonal to the discussion of data-mining.
But on that topic there's a nice quote by Francis Bacon that I can't find after a few attempts on the web, which goes roughly as follows: the law is a netting of such a texture that small villains pass through its holes, and wealthy villains rip through its fabric, while only those of a middle stature are caught by it.
Oh--and the Kennedys are one of my least favorite dynastic families in America. These family dynasties never arise without severe damage to the country; we have got to nip this habit in the bud.
Posted by: Tad Brennan | August 30, 2005 at 08:06 AM
Exactly. No one has problems with false positives - it happens all the time in various diagnostic mechanisms and it's certainly something we can't get rid of. Tad's point about the dismissal *after* the fact is indeed the real point. The gaming isn't in the mechanism, but in the interpretation and execution once the mechanism has identified a subject.
And another thing that should be recognized is that the population is very small - we're literally talking about needles in very small haystacks.
In the population of murderers, a Kennedy here and an OJ there isn't a large percentage of the population. Likewise, we believe (and let's hope we're right) that false convictions are also a rare and low percentage of the population.
But in the realm of terrorists, the population is already quite tiny. So any removal after being identified has a greatly magnified effect. Gaming this system - after the fact, as Tad points out - can have a devastating effect on the actual results. What you're left with at the end seems to be largely false positives and trivial matters.
Which seems to be precisely the effects we've seen so far - i.e. visa violations and petty criminals being rounded up.
And - at least to my thinking - these effects aren't neutral. They have a rather bad effect on the very population we need to have trust us so we get good information and they waste the time of law enforcement which is desperately needed to track down the needles.
Posted by: Hal | August 30, 2005 at 09:17 AM
Sebastian, there are the rich and powerful and then there's the rich and powerful. Depends on which rich etc. you're talking about. Martha Stewart might disagree with you, and Michael Milken, and Dennis Kozlowski. If in using the word powerful you modify it to politically connected you hit much closer to home, in which case you don't even have to be rich. In archaic or pre-historic America the rich and powerful threw their protective wings around Alger Hiss, who wasn't rich but was powerful because of the reach of those wings. You might say that in that early dawn [really, how far back can modern memory go?] the primitive data mining of the period resulted in the type of decision or initial result that you mention in your post.
Posted by: johnt | August 30, 2005 at 10:13 AM
My only comment on Able Danger so far is here.
Posted by: Gary Farber | August 30, 2005 at 10:33 AM
I'm with Hal and Tad.
The reason I haven't commented on Able Danger is that, back in June, I was trapped in DC traffic, trying to get back to Baltimore, and got to listen to an hour-long interview with Weldon on the subject of what was then his latest conspiracy theory. I hadn't really paid attention to him before that -- so many congressmen, so little time -- but by the end of the hour I was convinced that he was a raving lunatic, and that if he ever came up with another conspiracy theory, especially one that turned on no one else having listened until he did, I should assume that it was nuts until it was proven conclusively. I am still waiting for that proof.
Posted by: hilzoy | August 30, 2005 at 11:55 AM
"I'm with Hal and Tad."
But not with what I said? It doesn't sound as if you aren't. (Perhaps your post was written before mine; impossible to tell without time stamps.)
Posted by: Gary Farber | August 30, 2005 at 12:00 PM
I'm with what Gary Farber said.
Posted by: Tad Brennan | August 30, 2005 at 12:01 PM
I'm with what Tad said about what Gary Farber said.
Posted by: Hal | August 30, 2005 at 12:20 PM
I need a Venn diagram to keep track of who is with who.
But ditto the above -- data-mining is neutral; it's the interpretation of the results where there is mischief.
Posted by: dmbeaster | August 30, 2005 at 12:48 PM
"I need a Venn diagram to keep track of who is with who."
Let's try to keep this thread separate from the polygamy thread, okay?
Posted by: Tad Brennan | August 30, 2005 at 01:14 PM
In lieu of an appropriate containment hierarchy, might I suggest an excellent blog on the subject of security and such?
Bruce Schneier is a recognized expert in these areas and is actually part of homeland security consultations and panels deciding such things as the terrorism watch list and other issues. His blog is an excellent resource and his opinion is likewise well worth the dollar...
And if I can slip in one more, I also strongly suggest John Robb's Global Guerrillas blog. Not precisely about data mining, but his networked and open source terrorism theories are gold, Jerry. Pure gold.
Posted by: Hal | August 30, 2005 at 01:16 PM
GF: (Perhaps your post was written before mine; impossible to tell without time stamps.)
As I've mentioned on two other threads: Until the time stamps return, why not enter the time of your post in the "Name" field?
See Below.
Or at the end of your post?
Or not.
Posted by: xanax 2:19 EDT | August 30, 2005 at 02:20 PM
"Bruce Schneier is a recognized expert in these areas and is actually part of homeland security consultations and panels deciding such things as the terrorism watch list and other issues."
He's also, as I've noted a jillion times on my blog, an old friendly acquaintance of mine. I've been out to dinner with him and Karen Cooper (his wife) and three or four other mutual friends on a variety of occasions over the years. Which is certainly a reason to read him, eh? (Bruce seems to be in every reporter in America's address file next to "computer security" or "security" in general these days, and well he should be.)
Truth be told, I've hung out far more with Karen online, though, in the past. (That sf fandom is just full of interesting people to meet; or it used to be, anyway, particularly in decades past.)
Posted by: Gary Farber | August 30, 2005 at 02:41 PM
The other issue with data-mining is what's going to be done with the database once someone with fewer morals gets a peek at it. The ACLU already has a few charming examples, like the banker who sat on the board of the local hospital...and called in the mortgages of terminally ill people after he saw their medical records.
Now, what do you think some fundamentalist might do upon learning that with a single search he could get the names and addresses of every person in the US who has a prescription for AZT?
I'd rather not see the infrastructure for a totalitarian state built while listening to the builders tell me they're only going to look for bad people. Bad can have an extraordinarily flexible definition.
Posted by: alex | August 31, 2005 at 12:09 PM