How are You Preventing Mailto-Link Harvesting?
mixwhit asks: "In our ever-increasing effort against spam, we are now considering replacing all mailto: links on our website with something unharvestable (e.g. 'user (at) address', javascript mailto links, character entity evasion, etc.). Obviously this won't stop the spam, but it seems prudent to stop the harvesting so that the spam may slow down someday (year 2024 maybe?). What are others doing with this issue? We would prefer to preserve mailto link clickability, but also only want to make this adjustment once." One suggestion I would make is to put your email address in an image. People can read it, but harvesters won't be able to harvest it (unless they download the image for OCR). Any barrier you can place in front of the spammer, without blocking people honestly interested in communicating with you, is probably a good thing.
Mail form (Score:5, Insightful)
Re:Mail form (Score:2, Redundant)
Re:Mail form (Score:4, Informative)
Re:Mail form (Score:3, Interesting)
Re:Mail form (Score:3, Informative)
Ironic that, in order to stop spam to you, you would use the notoriously buggy and insecure formmail, turning your box into an open mail relay for spammers to use. Use a secure alternative (there are compatible versions, but really it's not hard to use MIME::Lite yourself). Matt has never fixed formmail to a satisfactory degree, and shows no inclination toward doing so.
If you roll your own, it'd
Beware of disability advocates (Score:4, Interesting)
I can see both sides of this. Can't say I know where to stand though.
Re:Beware of disability advocates (Score:4, Insightful)
It's important to remember that web pages are not always rendered visually.
Un-what? (Score:5, Informative)
What makes you think "user at mail dot foo dot com" is unharvestable? The web archives of all the development mailing lists at gcc.gnu.org use that scheme, and we still get spam to unique addresses used only for sending mail to those lists.
It's a handy technique, and useful, but it's certainly not foolproof.
Missing the point (Score:5, Insightful)
I have a few websites with my email address all over them, in mailto links. I "mask" the email very lightly, by escaping most of the characters, and it has worked beautifully.
Here is a webpage [rochester.edu] that will quickly convert your mailto link into a form that bots will miss.
Could a bot be written that would be able to harvest these email addresses? YES. But would it be worth the spammer's time to code it? NO, so it probably won't happen.
Put yourself in the spammer's shoes (or slime-covered bedroom slippers). Why would you want to go to a lot of work to build a bot that will harvest the email addresses of the very people you don't want to get your spam, because they will report you to spamcop, harass your ISP, and even hack your computer and post some very unattractive pictures of you [freewebsites.com] on the internet?
No, they want the chumps, and they want to find them without needing to check every webpage for dozens of patterns.
Re:Missing the point (Score:2)
You wish.
Just like the mailing list archives that cloak everyone's address as "foo AT bar DOT baz". They don't get harvested quite as frequently by the regular web-crawling bots, but they DO still get harvested, because someone notices that they can get a few hundred email addresses from that archive with a fairly small amount of programming.
As s
The other cost/benefit (Score:3, Interesting)
I think you're right as more websites use automated obfuscation; then the spammers need to decode it to get to their victims. But as long as most websites aren't doing what I'm doing, I know they don't want t
Re:The other cost/benefit (Score:2)
There are way too many ridiculously lazy people around these days.
Re:The other cost/benefit (Score:2)
Re:Missing the point (Score:2, Interesting)
I think that a partial solution is to speak about email addresses in a more casual form. For example, if my email address is foo@bar.biz.baz then I should tell people that they can contact me @ foo @ bar biz baz. You should have noticed 2 things.
Notice that there is no word, "dot", in there? That's because most people should already be able to figure it out on their own. If they can't then they shouldn't be usi
/.'s obfuscation is harvested so why not? (Score:2)
It is retarded to think that "fred at sheila dot com" won't get converted.
Once one has written one's harvester, it is prudent for one to inspect the results and tweak it.
It's for profit, not fun! If it is possible to increase the yield in *any* measure, it will be done by someone somewhere.
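To show how small that tweak is, here is a minimal sketch of a normalizer for the spelled-out form. It is an illustration only, not taken from any real harvester; the name `unmunge` and the exact patterns are assumptions, and it handles nothing beyond the plain words "at" and "dot". A reasonable use is as a self-check: if something this short recovers your address, the munging is not buying much.

```javascript
// Hypothetical helper: undo the common "user at host dot tld" spelling.
// Only the bare words "at" and "dot" (any case), surrounded by whitespace,
// are handled. On ordinary prose it would also mangle phrases like
// "look at this", so it is meant for short address-like strings.
function unmunge(text) {
  return text
    .replace(/\s+dot\s+/gi, ".")
    .replace(/\s+at\s+/gi, "@");
}

console.log(unmunge("fred at sheila dot com")); // fred@sheila.com
console.log(unmunge("foo AT bar DOT baz"));     // foo@bar.baz
```

Variants such as "(at)" or "[dot]" would need a few more patterns, which is the poster's point: each one is a small, one-time cost.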
Re:Missing the point (Score:3, Funny)
Server side scripting (Score:2, Informative)
Any method of munging the address must still be clickable within the visitor's browser. If it is clickable, it can be harvested. Javascript and html encoding may stop most of the bots, but bots exist that can slurp the address no matter how much javascript you wrap it in.
I use a PHP email form that never sends the address to the client accessing it. Short of hacking the server and looking at the PHP script in plain text, there is no way to harvest the address. I have no need to let the public know my a
Re:Server side scripting (Score:2)
simple js (Score:5, Informative)
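<script type="text/javascript">
<!--
// Build the address from fragments so it never appears whole in the page source.
var u = "sales";
var d = "example";
var t = "com";
var a = u + '@' + d + '.' + t;
document.write('<a href="mailto:' + a + '">' + a + '</a>');
//-->
</script>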
Re:simple js (Score:3, Interesting)
1) Randomize the variable names for u, d, t, and a
2) Randomize the position of var XX = XX statements.
This will defeat simple regex replacements, in case your site is big enough, with enough emails, that someone would want to write a simple regex to harvest it.
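The two steps above could be automated on the server. What follows is a rough sketch under assumptions: the function names are invented for illustration, and the generated code follows the `document.write` pattern from the parent post, with random variable names and the three fragment declarations in shuffled order.

```javascript
// Hypothetical generator for the document.write snippet.
// Names get a leading underscore so they can never collide with a
// JavaScript reserved word.
function randomName(used) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let name;
  do {
    name = "_";
    for (let i = 0; i < 6; i++) {
      name += letters[Math.floor(Math.random() * letters.length)];
    }
  } while (used.has(name));
  used.add(name);
  return name;
}

// Fisher-Yates shuffle on a copy of the input array.
function shuffle(items) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Returns JavaScript source text to place inside a <script> element.
function makeSnippet(user, domain, tld) {
  const used = new Set();
  const [u, d, t, a] = [0, 1, 2, 3].map(() => randomName(used));
  const decls = shuffle([
    `var ${u} = ${JSON.stringify(user)};`,
    `var ${d} = ${JSON.stringify(domain)};`,
    `var ${t} = ${JSON.stringify(tld)};`,
  ]);
  return [
    ...decls,
    `var ${a} = ${u} + '@' + ${d} + '.' + ${t};`,
    `document.write('<a href="mailto:' + ${a} + '">' + ${a} + '</a>');`,
  ].join("\n");
}

console.log(makeSnippet("sales", "example", "com"));
```

This only raises the cost of a pattern match. A harvester that actually executes the script sees the same link a browser does.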
Re:simple js (Score:2)
maybe, just maybe (Score:2, Insightful)
info@yourdomain
sales@yourdomain
help@yourdo
webmaster@yourdomain
postmaster@yourdomain
etc.etc.
Re:simple js (Score:2)
Re:simple js (Score:2)
Arms races are rarely effective.
Re:simple js (Score:2)
I agree it would be much slower, but the folks who sell harvesting tools are going to have to keep adding features to keep their customers coming back for upgrades. Unfortunately, the more people who protect themselves with this method, the more likely it will become part of harvesters.
The harvester would only share the crash risk a browser does. I haven't crashed in javascript code since the Mozilla guys fixed a stack recursion bu
Hiveware's Enkoder (Score:4, Informative)
Re:Hiveware's Enkoder (Score:4, Informative)
The other thing is, if you are using this, you'd be wise to change the string 'hiveware_enkoder' to something unique. The reason being, if spam harvesters really wanted to, they could recognize that string and have their own javascript engine [mozilla.org] run the script to get at the email address hidden inside. That's a lot of work, but not impossible. If the Hiveware system gains many users, it might be worthwhile for them.
There is a simpler one (Score:3, Informative)
Obfusticated Email Link Creator [tripod.com]
It does mixed dec and hex. Creates links like this [mailto]. But check the underlying code....
It's a Tripod site, so don't
Re:There is a simpler one (Score:2)
<a HREF="mailto:te%73t%40t%65%73%74%2E%63%6Fm" TITLE="mailto">this</a>
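The href above is the sample address with some characters swapped for %XX escapes. Here is a minimal sketch of producing that kind of link, assuming a plain ASCII address. The function names are invented for illustration, and this version escapes every character rather than a random mix.

```javascript
// Percent-encode every character of the address part of a mailto: URL.
// Assumes plain ASCII input; non-ASCII text would need UTF-8 byte escapes.
function percentEncodeAddress(address) {
  let out = "";
  for (const ch of address) {
    out += "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// Wrap the encoded address in a link with a label that is not the address.
function mailtoLink(address, label) {
  return `<a href="mailto:${percentEncodeAddress(address)}">${label}</a>`;
}

console.log(mailtoLink("test@test.com", "this"));
// <a href="mailto:%74%65%73%74%40%74%65%73%74%2E%63%6F%6D">this</a>
```

Any harvester that URL-decodes href values before matching will see straight through this, so it is a speed bump at best. It is worth checking that the mail clients your visitors use accept a fully escaped address.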
I use an image (Score:3, Insightful)
Meanwhile, I'm keeping an eye out for the next technology to replace email. IM was promising about five years ago, but went to hell faster than email.
Re:I use an image (Score:2)
Think before you do something like this, people. First, it's not Section 508 compliant (if your site needs to be), and second, it's just not nice to exclude a whole bunch of disabled people.
Use a form instead that mails you their input. It never reveals your email address, and it is accessible.
Fraid Not (Score:2)
Instead of "begging the question" it just "makes you want to ask".
Re:Fraid Not (Score:2)
Re:I use an image (Score:2)
Uhh... (Score:3, Informative)
Quoth the original message...
Err, doesn't this exactly not meet the given criteria? The guy wants links to be clickable. If you hide the image, you can only get as far as, say:
But that's just as easily harvestable as it would have been if you left the visible text as the plain address. What's the point?
It's the contents of the href attribute that need to be obscured, not the visible text (or image, or video clip, or whatever). You can't embed an image in the href text, so I don't see how this suggestion gains us anything at all.
---
The suggestion I like best is to encapsulate the address as HTML entities. Currently, this is enough to fend off the average address harvesting software, though if the practice catches on, I assume that the harvesters would start to take this into account -- at which point I don't know what the solution should be...
Barring that, it seems like the only way to provide an address will be to use literal text such as "write to us at foo at bar.com" and hope people just get it.
Alternatively, shy away from giving out your address, and provide a form where visitors can submit comments. This could allow you to filter out some of the incoming traffic (hint: if you're going to use "off the shelf" software for this, use NMS [sourceforge.net] instead of Matt Wright's ancient Formmail.PL script; it's much safer). Avoiding any publication of email addresses might piss Jakob Nielsen off, but under the circumstances I think it's probably a reasonable approach to the situation -- it's way too easy for a public address to get abused...
it works like this (Score:2)
<a href="wewillnevergethere.html" onclick="alert('myreal' + 'addy@site.com'); return false;">
<img src="pictureofemailaddy.png">
</a>
See, it works. Note that it is important to concatenate the email address, as I'm willing to bet mailto harvesters don't parse it out as being javascript. The extra obfuscation
Re:it works like this (Score:4, Interesting)
<a href="mailto:false@false.com" onmouseover="var a = 'in.com'; this.href = 'mailto:real@doma'+a;">email me</a>.
Re:it works like this (Score:2)
Great Idea (Score:2)
Re:Great Idea (Score:2)
And even if you could prevent automated harvesting, there are still people who'll do things like pay stay-at-home moms to harvest manually from mailing lists and archives.
Re:it works like this (Score:2)
Re:it works like this (Score:2)
Re:Uhh... (Score:4, Interesting)
Actually, you can.
data URL examples [mozilla.org]
Sick, eh?
Don't bother, it's too late (Score:2)
Re:Don't bother, it's too late (Score:5, Interesting)
Re:Don't bother, it's too late (Score:2)
Aren't most of the spams filled with random gibberish these days specifically targeting Bayesian filters? My Mozilla client filter was working better and better for awhile, but lately the trend has been reversing... anyhow, I disagree that it's the "only" way to go.
I think collaborative filtering (no link, I've read about it in the past but can't be bothered to look up a good example at the moment) will become a major tool. Also, why has nobody
Re:Don't bother, it's too late (Score:2)
That sounds good, until you find some Microsoft security hole has allowed a spammer to use your PC to send their filth for them. This approach would only DOS another of the spammer's victims (this includes the hapless ISP who didn't know they had a spammer as a customer, and all of that ISP's legitimate customers). That's worse than the blacklist vigilantes.
You're right, Bayesian filters are not the "only" way to go, but I think they'll prove to be the most effectiv
Amen, brother! (Score:2)
Security through obscurity is always a bad idea.
The trick is finding the right combination of tools to automatically reduce your spam to manageable levels. If I get just one or two pieces of spam a day, I'm happy.
How I do it... (Score:3, Informative)
The first technique I used (described here [ofdoom.com]) was a simple RXML macro, that defined a tag called <cloak>. It would check to see if the client was on a list of known robots. If the client was a robot, a graphic version of the email address would be returned. If the client looked like a normal browser, then the address would be entity encoded, and returned as a mailto link.
Shortly after I set that up, I realized that entity encoding was pretty much useless - that if a web browser can figure out the address, so can a spam bot.
My second attempt appears to be working well. I wrote a Roxen module called mailcloak [ofdoom.com] which takes addresses, and replaces them with a graphic link to a dynamically generated form to send an email to that address.
As an example, the code <mailcloak> maileater@ofdoom.com</mailcloak> would be replaced with a graphical version of the address maileater@ofdoom.com and a link to this [ofdoom.com] page.
It also has support for finding and cloaking bare addresses in pages, and I'll probably add support for rewriting mailto tags sometime in the next few weeks.
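The bare-address cloaking described above might look roughly like the sketch below. This is not the mailcloak module's code: the `/mailform` and `/mailimage` URLs, the numeric ids, and the address pattern are all assumptions made for illustration. The real address stays in a server-side table and never reaches the page.

```javascript
// Loose pattern for bare addresses in page text; deliberately simple.
const ADDRESS_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace each address with a link to a contact form plus an image of the
// address. `table` is a server-side array mapping id -> real address.
function cloakAddresses(html, table) {
  return html.replace(ADDRESS_PATTERN, (address) => {
    let id = table.indexOf(address);
    if (id === -1) {
      id = table.push(address) - 1; // push returns the new length
    }
    return `<a href="/mailform?id=${id}"><img src="/mailimage?id=${id}" alt="contact form"></a>`;
  });
}

const addressTable = [];
console.log(cloakAddresses("Write to maileater@ofdoom.com for details.", addressTable));
```

The form handler would then look the id up in the same table when it sends the message, so the only place the address exists in text form is on the server.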
Use a Form (Score:3, Informative)
Re:Use a Form (Score:2)
disposable email addresses (Score:2)
Might be interesting to try encoding the month and year into the email address, and change the address each month. That way you could get some measurements of how much those addresses are being harvested for spam. Who knows, maybe you'd find out October is a big spam harvesting month, when you get deluged with spam to me-oct2003@blahblahblah.com over Thanksgiving
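A small sketch of that idea, assuming the mail server accepts any `name-tag@domain` address for the mailbox (a catch-all or sub-addressing setup). The function names are invented for illustration.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Build the address to publish during the month that `date` falls in (UTC).
function datedAddress(local, domain, date) {
  const tag = MONTHS[date.getUTCMonth()] + date.getUTCFullYear();
  return `${local}-${tag}@${domain}`;
}

// Given the address a spam arrived at, recover the month it was published.
function harvestMonth(address) {
  const match = /-([a-z]{3}\d{4})@/.exec(address);
  return match ? match[1] : null;
}

const published = datedAddress("me", "blahblahblah.com", new Date(Date.UTC(2003, 9, 15)));
console.log(published);               // me-oct2003@blahblahblah.com
console.log(harvestMonth(published)); // oct2003
```

Counting incoming spam per tag then gives the month-by-month harvesting measurement the poster has in mind.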
Hivelogic Enkoder (Score:2)
The downside is that javascript is necessary to read any portion of my email address, and it only works if spambots refuse to execute arbitrary javascript. But in a year of use, I haven't had any problems with it, and my primary email address is remarkably spam-free. Nothing the spam filters can't handle anyway.
In message forums, etc, I just don
Re:Hivelogic Enkoder (Score:2, Informative)
#!/usr/bin/perl -w
use Socket; # Load socket functions
use CGI qw(:standard); # Load CGI standard functions
my $name = "harvestbait"; # yourname
my $domain = "example.org"; # yourdomain.tld
my $ipaddr = $ENV{'REMOTE_ADDR'}; # Get the requester's IP
$ipaddr = unpack 'H*', inet_aton($ipaddr); # Convert the IP to hex
my $date = `/bin/da
Lil' CGI thingy. (Score:2)
<a href="/x.cgi/mailto:abuse@localhost">mail me</a>
And then had x.cgi be a Perl script that generated an HTTP "Location" header to the real mailto: URL.
If I wanted more complexity, I'd substitute in whatever I felt like for the @ in the address, and have the Perl script un-do that. It's probably also doable in PHP, shells, TCL, or whatever. I like to
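Here is a minimal sketch of the redirect logic, written as a plain function that returns the CGI response headers. The "(at)" placeholder and the function name are assumptions, and the server is assumed to hand the script an already-decoded PATH_INFO such as `/mailto:abuse(at)localhost`.

```javascript
// Turn the extra path after x.cgi into a redirect to the real mailto: URL.
// Anything that is not a mailto: target, or that contains CR/LF (which
// would allow header injection), is refused.
function redirectHeaders(pathInfo, placeholder = "(at)") {
  const target = pathInfo.replace(/^\//, "");
  if (!target.startsWith("mailto:") || /[\r\n]/.test(target)) {
    return "Status: 400 Bad Request\r\n\r\n";
  }
  const url = target.split(placeholder).join("@");
  return `Status: 302 Found\r\nLocation: ${url}\r\n\r\n`;
}

console.log(JSON.stringify(redirectHeaders("/mailto:abuse(at)localhost")));
```

Whether a given browser follows a redirect to a mailto: URL is worth testing before relying on this. A harvester that follows redirects will also see the Location header.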
Use Flash! (Score:2)
You insensitive clod! (Score:2)
My PHP-based site has a form that allows people to email me. They never get my email address until I reply to them.
My previous site was only allowed [X]HTML, no PHP/ASP. To combat harvesters, I had in my XHTML:
Then, in an embedded JavaScript file (email.js) I had:
Unicode (Score:3, Informative)
Unicode actually works! (Score:3, Insightful)
If you want to convert your whole address, E-cloaker [codefoot.com] is a neat little free program for converting text to Unicode.
Not for Netscape 4 (Score:3, Insightful)
Put it in a table (Score:2)
One of my colleagues came up with the following the other day:
If you put your email address in a table with border, cellpadding and cellspacing all set to '0', then it will still be readable by humans. But the code to create the table will obfuscate the address enough that it won't be harvestable.
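One reading of the table trick is that the address is split across several cells, so the markup sits between the pieces. Here is a minimal sketch under that assumption; the function name and the chunk size are invented for illustration, and the address is assumed to contain no characters that need HTML escaping.

```javascript
// Split the address into short chunks and put each chunk in its own cell
// of a borderless table with no padding or spacing between cells.
function addressTable(address, chunkSize = 3) {
  const cells = [];
  for (let i = 0; i < address.length; i += chunkSize) {
    cells.push(`<td>${address.slice(i, i + chunkSize)}</td>`);
  }
  return `<table border="0" cellpadding="0" cellspacing="0"><tr>${cells.join("")}</tr></table>`;
}

console.log(addressTable("sales@example.com"));
```

The result is readable but not clickable, and a harvester that strips tags before matching gets the whole address back, so "won't be harvestable" should be taken as "won't match a naive pattern".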
Oh yeah been there done that (Score:2)
Re:Oh yeah been there done that (Score:2)
Here is what we do (Score:2, Interesting)
blocking mailbots (Score:2, Interesting)
Each page of my site checks against this text file so the mailbot gets a 403 page for almost all pages/sites that I host. To deal with false positives there is a mailto link on
stuff (Score:2)
a href="mailto:joeblow@[10.0.0.1]"
Substituting your IP address, of course. Maybe spam harvesting bots would fail to treat that as a valid address.
On another note, this is a CGI thing that looks interesting: Master Spambot Buster [willmaster.com].
Operation Barndoor (Score:2)
I'm one of the sysadmins for a CS department [earlham.edu] in a liberal arts college [earlham.edu]. I've been working with the web content admins off and on for a couple months as they prepare a system that will execute a Perl script to generate an image that will replace the e-mail address. The project is still in its infancy, but here's the URL to the description, and here's the URL to the current version [earlham.edu] of the project, in gzip'd tarball format.
Images are probably the easiest all around. (Score:2)
For folks who won't be able to handle the images, you could put some human decipherable text in the "ALT" or Title text of the image- e.g. jim@_REMOVE_ALL_OF_THIS_23421232_me.com.
Re:Images are probably the easiest all around. (Score:2)
Personally, I think this is a case of diminishing returns. Putting your email address in clear text on a mail
Re:Images are probably the easiest all around. (Score:2)
Re:Images are probably the easiest all around. (Score:2)
'
a+='lto:'
b+='@zocalo'
e=''
b+='.uk.com'
d=b
document.write(a+b+c+d+e)
}
escramble()
Re:Images are probably the easiest all around. (Score:2)
Re:Images are probably the easiest all around. (Score:2)
blind people (Score:3, Insightful)
Bring it on. (Score:2)
Sure, I drastically increase the number of spams I get, but popfile [sourceforge.net] takes care of them all.
Re: do you really want that? (Score:2)
unicode, base-64 encoded (Score:3, Informative)
& # 105;& # 032;& # 100;& # 111;& # 032;& # 105;& # 116;& # 032;& # 116;& # 104;& # 105;& # 115;& # 032;& # 119;& # 097;& # 121;
For the past three years or so, the spammers haven't caught on to this, and they are unlikely to do so given the few people who take the effort to put this measure into place.
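The sample line above is "i do it this way" written as decimal character references (with spaces added so it displays here). A minimal sketch of a converter that produces the same three-digit form; the function name is invented for illustration.

```javascript
// Convert each character to a zero-padded decimal character reference.
function toDecimalEntities(text) {
  return Array.from(text, (ch) =>
    "&#" + String(ch.codePointAt(0)).padStart(3, "0") + ";"
  ).join("");
}

console.log(toDecimalEntities("i do it this way"));
// &#105;&#032;&#100;&#111;&#032;&#105;&#116;&#032;&#116;&#104;&#105;&#115;&#032;&#119;&#097;&#121;
```

The same function can be applied to both the href value and the link text of a mailto link. Browsers decode the references before using either one, and so can any harvester that runs a real HTML parser.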
P.S. It's not just mailto links that are being harvested here. They'll scrape anything with an @ or a "at" or
Fight the problem, not the symptoms (Score:3, Interesting)
Also, don't munge [interhack.net].
Deal with the symptoms and the problems (Score:2)
Back when the 'net was young, and there was hope for stopping spam before it snowballed out of proportion, it was hoped that this naive "nip it in the bud" attitude might work. It hasn't. Spammers have proven as resilient as cockroaches, and more prolific.
Keep in mind who is paying
See my sig. (Score:2)
The nine domains for whom my email is the catch-all address receive an average of a hundred spams a day, but I don't see them, thanks to a Bayesian filter [sourceforge.net].
Any spammer who harvests the email address in my sig [mailto] just registers their latest spam so that I (and the dozen-odd other people who use the same filter) are that much less likely to see it.
Accessibility (Score:2)
The best method is to use a mailto form that allows you to receive the message but doesn't give away your address. That way you leave your site open and accessible to all users, but can protect your email address.
Re:Accessibility (Score:2)
In the year or so that the site's been up, I've not received a single bit of spam.
Sure, I had a couple of people input bogus information attempting to get the address from the results page, but that doesn't show them anything except a thank you message.
I can't seem to locate the link I originally used
I don't bother. (Score:2)
Of course, I take more care with other people's addresses; using mailto forms, intra-site private messaging systems, one-time-only addresses, that sort of thing. I also wrote a bit of PHP to munge email addresses (phps [aagh.net]/php [aagh.net]), but I don't actually use it.
You XHTML users better not be using these JS "solutions" which use document.write() by the way (that's HTML
My solution... (Score:2)
1) I have a mail form. It will only send to one mail address, it's not anything like formmail.pl.
2) I generate a unique email address with the IP address and time encoded in it. I actually could use spamgourmet to do this, but I've been doing things by hand because I want to collect some observations about how far a single address travels.
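Point 2 could be sketched as follows. This is an illustration, not the poster's code: the address layout (`name-hexip-hextime@domain`) and the function names are assumptions, only IPv4 is handled, and the mail server is assumed to deliver any such address to one mailbox.

```javascript
// Convert a dotted IPv4 address to eight hex digits, e.g. 10.0.0.1 -> 0a000001.
function ipToHex(ip) {
  const parts = ip.split(".");
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error("expected a dotted IPv4 address");
  }
  return parts.map((p) => Number(p).toString(16).padStart(2, "0")).join("");
}

// Build a one-off address that records who was shown it and when.
function taggedAddress(name, domain, ip, date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return `${name}-${ipToHex(ip)}-${seconds.toString(16)}@${domain}`;
}

// Recover the IP and time from an address that later receives spam.
function decodeTag(address) {
  const match = /-([0-9a-f]{8})-([0-9a-f]+)@/.exec(address);
  if (!match) return null;
  const ip = match[1].match(/../g).map((h) => parseInt(h, 16)).join(".");
  return { ip, date: new Date(parseInt(match[2], 16) * 1000) };
}

const shown = taggedAddress("harvestbait", "example.org", "10.0.0.1", new Date());
console.log(shown, decodeTag(shown));
```

Decoding the tag on incoming spam tells you which visit leaked the address and how long it took to start receiving mail.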
bad solution (Score:2)
One suggestion I would make is to put your email address in an image. People can read it
Unless they're blind! Yeah, yeah, no one cares about the blind, you insensitive clods.
After the harvest: broadcast seeding (Score:2)
But it mutates: aster@example.com, r@example.com, bob37, jenna624, etc. etc. Most of the spam we receive isn't to one of our known addresses. But we don't want to lock down all but a few (sales@, help@, webmaster@, orders@, myname@, hername@) so that we can help the poor sods who misspell "orders"
The solutions we offer here will not solve the.... (Score:2)
The issue isn't with the emails getting harvested. The issue is with a global infrastructure that uses an old policy of sending emails.
We need a new, improved protocol that does a level of authentication at the host/ISP level to say: this is a legitimate server with an email from an acceptable user. All ISPs should be held to a spam policy enforcement where, if a user violates the policy, they are automatically terminated, and their name, with evidence provided of course, is sent to a spammer list syst
Re:The solutions we offer here will not solve the. (Score:2)
I create uber mailto style 1, some time passes and I find it's no longer working. I create uber mailto style 2, some time passes and I find it's no longer working. I create uber mailto style 3, some time passes and I find it's no longer working. I create uber mailto style N, some time passes and I find it's no longer working.....
The solution
low tech solution (Score:2)
My suggestions (Score:2)
Secondly, anything you do to obscure a user's email address will eve
Any Standard is bad (Score:2)
Spam magnet (Score:2)
Write a small filter program on your site that stores all spam coming from th
that's a start.. (Score:2)
Start with "to contact so-and-so, click here". Have the user's name embedded in the email form and the second half you get from the server. So if the user was thomas@englishmuffin.com the web form would have a hidden input called loosername and its value would be thomas. Call it something different than loosername, but the idea is that you don't want to just say username. When the web form gets posted you can have it read a text file (this is what e
"block images from this server" (Score:4, Insightful)
Re:my method... entertaining and it works well (Score:2)
Re:Javascript mailto links... vulnerable? (Score:5, Informative)
Wait... this provides some nice opportunities to cause them a major headache by including malicious JavaScript code on a page only seen by a bot not following the robots exclusion protocol [robotstxt.org] (to prevent a "real" search engine spider from visiting the page) by linking to that page using some hidden link from your home page...
Re:Javascript mailto links... vulnerable? (Score:3, Interesting)
A lot of people do that with a malicious honeypot page. It just outputs X phony, but real-looking, mailto links, where X is a member of the set of Very Large Integers.
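A minimal sketch of such a honeypot page body. One deliberate departure from the "real-looking" part: the phony addresses here use the reserved `.invalid` top-level domain, so nothing generated can ever land in a real person's mailbox. The function names are invented for illustration.

```javascript
// Make one random address under a random .invalid host name.
function phonyAddress(rng) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  const pick = (n) => {
    let s = "";
    for (let i = 0; i < n; i++) {
      s += letters[Math.floor(rng() * letters.length)];
    }
    return s;
  };
  return `${pick(6)}@${pick(8)}.invalid`;
}

// Produce `count` mailto links, one per line. `rng` must return values
// in [0, 1), like Math.random.
function honeypotLinks(count, rng = Math.random) {
  const lines = [];
  for (let i = 0; i < count; i++) {
    const addr = phonyAddress(rng);
    lines.push(`<a href="mailto:${addr}">${addr}</a><br>`);
  }
  return lines.join("\n");
}

console.log(honeypotLinks(3));
```

A harvester that validates domains will discard these quickly, which is the price of not risking mail to real domains. The page should be excluded in robots.txt so that well-behaved crawlers never see it.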
(note to
Re:Unicode your email address (Score:2)
Re:JavaScript tricks (Score:2)
Re:Plug (Score:3, Informative)
I was able to use your form to send myself spam!
That's right.
I entered my e-mail address, a from address, and the mail went through.
Essentially, your web page is providing the equivalent of an open relay.
You need to remove the "mailto" field, as that allows the form to be used to send a message to anybody. Once that's gone, your form should be secure again.
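The fix can be sketched as a handler that never reads the recipient from the request at all. This is an illustration under assumptions: the field names, the `RECIPIENT` value, and the returned message shape are invented, and actual delivery is left to whatever mail library the site already uses.

```javascript
// The only recipient this form will ever send to. It lives on the server
// and is never taken from, or echoed back to, the client.
const RECIPIENT = "webmaster@example.com"; // hypothetical address

// Collapse CR/LF so submitted values cannot inject extra mail headers.
function stripNewlines(value) {
  return String(value || "").replace(/[\r\n]+/g, " ").trim();
}

// Build the outgoing message from submitted form fields.
// Any "mailto" or "to" field in `fields` is simply ignored.
function buildMessage(fields) {
  return {
    to: RECIPIENT,
    headers: {
      "Reply-To": stripNewlines(fields.from),
      Subject: stripNewlines(fields.subject),
    },
    body: String(fields.body || ""),
  };
}

console.log(buildMessage({
  mailto: "victim@example.net", // ignored
  from: "visitor@example.org",
  subject: "Hello",
  body: "Just saying hi.",
}));
```

With the recipient fixed, the form can still be abused to send junk to you, but not to relay it to third parties.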