bot classes

ok, after a couple days of robots.txt love, i now have much less crap in my logs. a good opportunity to see which bots are well-written. based on what i am seeing with /robots.txt, i am sure glad i blocked most of these festering piles of dung from my site.

not using conditional get while requesting /robots.txt

Only kinjabot, OnetSzukaj/5.0 and Seekbot/1.0 get this right. All other bots, including google and yahoo, do not. lame.
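for reference, a minimal sketch of what a well-behaved bot could do, assuming it remembers the ETag and Last-Modified values from its previous fetch. the function names and the `opener` parameter are made up for illustration; only the `urllib` calls are real. a 304 reply means "nothing changed", so no body is transferred.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers for a conditional GET."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_robots(url, etag=None, last_modified=None,
                 opener=urllib.request.urlopen):
    """Return (body, etag, last_modified); body is None on 304 Not Modified.

    `opener` is a hypothetical hook so the network call can be swapped out.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with opener(request, timeout=10) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # unchanged since the last fetch: keep using the cached copy
            return None, etag, last_modified
        raise
```

the bot stores the returned validators next to its cached copy and passes them back on the next request.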

requesting /robots.txt too often

The biggest offender is VoilaBot, checking /robots.txt every 5 minutes, every day. you gotta be kidding me. google and yahoo are not much better, you’d think they’d have figured out a way by now to communicate the state of /robots.txt across different crawlers. Other bots fare better by virtue of being less desperate.
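one way the sharing could look, as a rough sketch: every crawler worker asks a common cache instead of the site, and the cache refetches only when its copy is older than some maximum age. the class name, the `fetch` callable and the one-day default are assumptions for illustration, not how any of these bots actually work.

```python
import time


class RobotsCache:
    """Share one robots.txt copy per host among many crawler workers."""

    def __init__(self, fetch, max_age=24 * 60 * 60, clock=time.time):
        self._fetch = fetch        # hypothetical callable: host -> robots.txt text
        self._max_age = max_age    # seconds before a cached copy counts as stale
        self._clock = clock        # injectable so the behaviour can be tested
        self._entries = {}         # host -> (fetched_at, body)

    def get(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]        # fresh enough: no request hits the site
        body = self._fetch(host)
        self._entries[host] = (now, body)
        return body
```

this in-memory version only covers a single process. crawlers spread over many machines would need the same idea on top of a shared store.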

update: problems like this are economic opportunities.

safer browsing

here is what i am currently using to make browsing safer and less annoying:

  • Use Firefox (duh)
  • Don’t install the Flash plugin
  • Turn off “Allow sites to set cookies” and keep a small whitelist
  • Use NoScript to only allow javascript on a small number of sites
  • Install this hosts file to remove most advertising
  • Use TargetKiller to get rid of pages opening up in new windows
  • Disable Java

It’s amazing how much faster and more pleasant the web becomes if you take the garbage out.

someone got fired today

first, i get this email to my british airways account:

-------- Original Message --------
Subject: test email on TCRM
Date: Fri, 1 Jul 2005 11:34:46 +0100 (BST)
From: British Airways Executive Club 
Dear Mr Rothfuss,
It's warm in here but I'm having fun
test email
Yours Sincerely

inevitably followed by:

-------- Original Message --------
Subject: Email sent in error by British Airways
Date: Fri, 1 Jul 2005 17:17:50 +0100 (BST)
From: British Airways Executive Club 
Dear Mr Rothfuss,
You may have received an email titled "Test email on TCRM" this  afternoon, please accept my sincere apologies as this email
was sent in error whilst we were undertaking routine testing.
I would like to reassure you that we have now rectified this error.
Yours sincerely
Sarah Keyes
Loyalty Programmes Manager Europe

you gotta be more careful, afzal, even when it’s hot.

spamconference 2004

Interested in solving the spam problem? Come join us at the 2004 spam conference in sunny Cambridge, Massachusetts. Speakers at this intensive, one-day conference include many of the leading experts on spam. Whatever the answer is, odds are it’s here somewhere.
the conference is free of charge and runs january 16, 2004, from 9 am to 6 pm. i am especially interested in what paul graham and bill yerazunis have to say on spam.

under attack

a new threshold is reached. i received over 2 dozen comment spams in the last 24 hours. is it because a) my blog has a sufficiently high page rank or b) blogspam is taking off?
a) is unlikely since i have had my page rank for almost a year. it must be endemic then. we need a solution fast, and IP banning won’t cut it. maybe some kind of distributed bayesian filtering would work? or do we have to disallow anonymous comments? is this the onset of global identity systems for the blogosphere, as outlined in this proposal?
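to make the bayesian idea concrete, here is a toy sketch of the filtering half: a naive Bayes classifier over comment words with add-one smoothing. the class and method names are invented for illustration, and the "distributed" part (blogs pooling their word counts) is not shown.

```python
import math
import re
from collections import Counter


class CommentFilter:
    """Tiny naive Bayes classifier over comment words (add-one smoothing)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.words[label].update(self.tokens(text))
        self.docs[label] += 1

    def log_score(self, text, label):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.words[label].values())
        # smoothed class prior, so an untrained class never yields log(0)
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for word in self.tokens(text):
            # the extra +1 in the denominator reserves mass for unseen words
            score += math.log((self.words[label][word] + 1) / (total + vocab + 1))
        return score

    def is_spam(self, text):
        return self.log_score(text, "spam") > self.log_score(text, "ham")
```

since the model is just word counts per class, pooling data from several blogs could be as simple as adding their counters together, though that is speculation rather than a tested design.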


every once in a while i have to turn off my ad-filtering proxy because a site won’t render correctly. today i got the dumbest banner i have seen in a long time. it warns me that my computer has an IP address.


i went through nearly 200 (physical) mails today. things i hate:

  • unasked-for customer magazines
  • redundant account statements (hello web??)
  • bills-cum-advertising
  • “special” offers from lousy companies
  • “moving” pleas from dumb charities
  • “management” magazines

maybe 10 of the 200 were relevant. if i get my banks to stop sending me statements that are online anyway, they won’t have an opportunity to spam me with their “special offers”. funny how marginal businesses that should focus on transactional efficiency want to create “added value” through stupid customer magazines. i don’t want to hear from my bank. they are eating my margins with their posturing. time for some shakeout.