The Spam and the Spider
I just finished reading a great book called The Starfish and the Spider. The book does a really good job of illustrating how decentralized organizations trump centralized companies time and time again. Examples cited include the Apache web server, Napster, Alcoholics Anonymous, and Skype.
I’m still trying to figure out how I feel about some of their examples. The issue I have is how they group different types of decentralization. The authors associate decentralized organizations like Alcoholics Anonymous, Apache, and Linux with distributed platforms like Skype and Napster. There’s a big difference between an organization and a technology platform, and it doesn’t make sense to group the examples together.
That aside, I was most interested in how this theory applies to the problem of spam. I think spam is a classic example of an “evil” starfish organization like Al Qaeda. The more you fight it, the more virulent it becomes.
The authors cite three primary strategies for fighting decentralized organizations: challenge their ideology, centralize them, or decentralize yourself. I don’t think spammers have much of an ideology. They’re just trying to make a buck. There’s really no “cause” here. I don’t see much to centralize either. However, taking a decentralized approach to filtering spam seems to make a lot of sense.
Every major webmail provider has a spam box that allows you to mark spam. Surely the vast majority of spam hits most of the major webmail providers’ systems. If a person marks a particular address or piece of content as spam, it should get blacklisted in a shared database accessible to all email providers. This approach would basically create a distributed human computing engine to combat spam. Of course some people would occasionally mark non-spam messages (such as opt-in retail mailers) as spam, but statistically the wisdom of crowds would prevail and the system should be 99.9% accurate.
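To make the wisdom-of-crowds part concrete, here is a rough sketch of how a shared database might decide when to blacklist something. Everything in it is my own assumption for illustration: the class name SharedSpamReports, the min_reports and min_spam_ratio thresholds, and the idea of letting users also vote "not spam" are hypothetical, not how any real provider works.

```python
from collections import defaultdict


class SharedSpamReports:
    """Hypothetical shared store of user spam reports (illustration only)."""

    def __init__(self, min_reports=3, min_spam_ratio=0.9):
        # Both thresholds are made-up defaults, not measured values.
        self.min_reports = min_reports
        self.min_spam_ratio = min_spam_ratio
        self.spam_votes = defaultdict(set)  # key -> users who marked it spam
        self.ham_votes = defaultdict(set)   # key -> users who marked it not spam

    def report(self, key, user_id, is_spam):
        """Record one user's vote on a sender address or content key.

        Each user counts once per key; a later vote replaces an earlier one.
        """
        if is_spam:
            self.ham_votes[key].discard(user_id)
            self.spam_votes[key].add(user_id)
        else:
            self.spam_votes[key].discard(user_id)
            self.ham_votes[key].add(user_id)

    def is_blacklisted(self, key):
        """Blacklist only when enough distinct users agree it is spam."""
        spam = len(self.spam_votes.get(key, ()))
        ham = len(self.ham_votes.get(key, ()))
        if spam < self.min_reports:
            return False
        return spam / (spam + ham) >= self.min_spam_ratio


db = SharedSpamReports(min_reports=3, min_spam_ratio=0.75)
for user in ("u1", "u2", "u3"):
    db.report("pills@spammer.example", user, is_spam=True)

# One annoyed user marks an opt-in retail mailer as spam; three others disagree.
db.report("deals@shop.example", "u1", is_spam=True)
for user in ("u2", "u3", "u4"):
    db.report("deals@shop.example", user, is_spam=False)

print(db.is_blacklisted("pills@spammer.example"))
print(db.is_blacklisted("deals@shop.example"))
```

The point of the two thresholds is that a single mistaken report, like the retail mailer above, never blacklists anything on its own. It takes several distinct users who mostly agree.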
Most of the spam I get in my mailbox is highly repetitive. Almost all the messages have similar subject lines and content. I use Yahoo Mail to manage all my emails from multiple accounts. Why can’t the power of all Yahoo users be harnessed to filter my spam intelligently? Why can’t the power of all email users across all email providers be harnessed to filter spam intelligently?
You have companies like Symantec that keep a central database of all viruses. Why not a company that keeps a central database of all emails marked as spam across all email providers, which could, in turn, be licensed to each email provider? It seems like a win/win venture to me. Each new email company that signs up provides additional user-generated spam data, and the analysis of that data provides dramatically improved real-time filtering data to all webmail providers. As soon as an email gets marked as spam in one mailbox, it gets filtered as spam in all mailboxes. If Google can index and offer instant search results on all the content in the world, surely a company can index and offer instant filtering on all the spam in the world?
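Here is a minimal sketch of what "marked in one mailbox, filtered in all mailboxes" could look like, assuming the central service stores a fingerprint of each reported message rather than the message itself. The fingerprint function, the SpamIndex class, and the provider names are all hypothetical. The normalization (lowercasing, collapsing numbers and punctuation) is just one simple way I can imagine catching the repetitive messages I described above.

```python
import hashlib
import re


def fingerprint(subject, body):
    """Reduce a message to a hash that ignores case, numbers, and punctuation.

    Assumption: spam variants differ mostly in those details.
    """
    text = f"{subject}\n{body}".lower()
    text = re.sub(r"\d+", "0", text)               # collapse every digit run to "0"
    text = re.sub(r"[^a-z0]+", " ", text).strip()  # drop punctuation and extra spaces
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SpamIndex:
    """Hypothetical central index shared by every participating provider."""

    def __init__(self):
        self._reports = {}  # fingerprint -> set of providers that reported it

    def mark_spam(self, provider, subject, body):
        """Called when a user at any provider marks a message as spam."""
        self._reports.setdefault(fingerprint(subject, body), set()).add(provider)

    def is_spam(self, subject, body):
        """Called by any provider before delivering a message to an inbox."""
        return fingerprint(subject, body) in self._reports


index = SpamIndex()

# A user at one provider marks a message as spam...
index.mark_spam("provider_a", "Cheap MEDS 50% off!!!", "Order now: code 12345")

# ...and a slightly different copy arriving at another provider is caught.
print(index.is_spam("cheap meds 75% off", "order now code 99871"))
print(index.is_spam("Lunch tomorrow?", "Are we still on for noon?"))
```

An exact hash like this is easy for a spammer to dodge by changing a single word, so a real service would presumably need fuzzier matching. The sketch is only meant to show the shape of the shared lookup.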