ok, after a couple days of robots.txt love, i now have much less crap in my logs. a good opportunity to see which bots are well-written. based on what i am seeing with /robots.txt, i am sure glad i blocked most of these festering piles of dung from my site. the two main offenses:

not using conditional get when requesting /robots.txt

requesting /robots.txt too often
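conditional get is not rocket science: send back the validators from last time, and a 304 means the file hasn't changed and the server doesn't even have to send a body. a minimal sketch of what that looks like (all names here are mine, not from any real crawler):

```python
def conditional_headers(cached_etag=None, cached_last_modified=None):
    # build validator headers from whatever we saved on the last fetch
    headers = {}
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    if cached_last_modified:
        headers["If-Modified-Since"] = cached_last_modified
    return headers

def handle_robots_response(status, body, cache):
    # 304: our cached copy is still good, the server sent no body at all
    if status == 304:
        return cache["body"]
    # 200: fresh copy -- remember it (and its validators) for next time
    if status == 200:
        cache["body"] = body
        return body
    # anything else: be conservative and keep using what we had
    return cache.get("body", "")
```

that's the whole trick. the cache dict stands in for wherever a real crawler would persist the etag, last-modified, and body between visits.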
the biggest offender is VoilaBot, checking /robots.txt every 5 minutes, every day. you gotta be kidding me. google and yahoo are not much better; you'd think they'd have figured out by now how to share the state of /robots.txt across their different crawlers. other bots fare better by virtue of being less desperate.
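sharing /robots.txt state across crawlers is just a cache with a ttl. a sketch, with made-up names and a made-up 24-hour ttl (a real crawler would make this a shared store, not a per-process dict):

```python
import time

ROBOTS_TTL = 24 * 60 * 60  # re-check at most once a day; pick your own number

class RobotsCache:
    # one shared cache, so ten crawler workers make one request, not ten
    def __init__(self, ttl=ROBOTS_TTL, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # host -> (fetched_at, rules)

    def get(self, host, fetch):
        now = self.clock()
        entry = self._entries.get(host)
        if entry and now - entry[0] < self.ttl:
            return entry[1]  # fresh enough: no request at all
        rules = fetch(host)  # stale or missing: hit the network once
        self._entries[host] = (now, rules)
        return rules
```

with this, a second worker asking for the same host inside the ttl window never touches the site. not exactly a research problem.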
update: problems like this are economic opportunities.