I run quite a small website and I'd like some analytics about the traffic it gets from regular users, while continuing to allow bots to scrape it. I don't care enough to invest in detecting bad-acting bots masquerading as real users (nor can I think of a reason my site would be the target of such bots).
My quick-and-dirty approach has been to look at all of the user agent strings I've received, eyeball which ones seem like bots, and put them in a manually maintained list (roughly like the sketch below). This seems error-prone, and I have to update the list whenever one of those bots changes its string or a new one discovers my site.
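For concreteness, this is approximately what I do today. It's a minimal sketch: the log file name and the entries in KNOWN_BOT_AGENTS are just placeholders for my hand-maintained list, and I'm assuming a combined-format access log where the user agent is the last quoted field.

```python
import re
from collections import Counter

# Hand-maintained list of substrings I've decided look like bots.
KNOWN_BOT_AGENTS = [
    "Googlebot",
    "bingbot",
    "DuckDuckBot",
    "AhrefsBot",
]

# In the combined log format, the user agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def is_known_bot(user_agent: str) -> bool:
    """Return True if the user agent matches any entry in my manual list."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in KNOWN_BOT_AGENTS)

human_hits = Counter()
with open("access.log") as log:
    for line in log:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1)
        if not is_known_bot(user_agent):
            human_hits[user_agent] += 1

for user_agent, count in human_hits.most_common(10):
    print(count, user_agent)
```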
Is there a better approach, in terms of the amount of upkeep and error-proneness?
I guess this also gets into a broader question: if there's no standard way for a good-acting bot to identify itself as such, why not? Have there been proposals to make this part of some spec?