Should You Break TOS Because Work Asks You?
An anonymous reader writes "My boss recently assigned me a project that was all his idea, with two basic flaws that would require me to break multiple web sites' Terms of Service (TOS). Part requires scraping most of the site, parsing the data and presenting it as our own without human intervention. While we're safe on copyright issues, clearly scraping like this is normally not allowed. At times it might also put a load on those sites. The other is, for lack of better words, a 'load balancing' part that requires using multiple free accounts instead of purchasing space and CPU time for less than $2,000 USD per month. The boss sees it as 'distributed' computing when in reality it's 'parasitic.'
My question is: am I wrong about the ethics? If I do need to walk, how best can I handle it without damaging my reputation and future employment opportunities?"
Hilarity ensues when... (Score:5, Informative)
...you build a system that closely relies on this nonstandard (and unsupported) method of getting information, they change it and it breaks.
Either by accident, or because they spot a load of particular access patterns from your address, figure out what's going on and intentionally break it.
Get yourself caught, get him blamed (Score:1, Informative)
Wait until he takes credit, then create some problems. Put in a user agent string identifying your company... have the distribution not quite equal... Just do a few things to tip off others and sabotage his brilli
Simple Programmer, No Consequences (Score:1, Informative)
"My question is am I wrong about the ethics? If I do need to walk how best can I handle it without damaging my reputation and future employment opportunities?"
Well, you can escape most of the "consequences" of doing something like this [basically none] by virtue of being a simple programmer.
"I was just following orders" is the way to go. Make sure you document everything, including your objections.
Real engineers on the other hand, do not have such a simple recourse.
Re:Short answer... "no". (Score:1, Informative)
well, depends if you need the salary to live on;
if so, keep the job, while you find a new one
Check with Compliance Officer/Department (Score:5, Informative)
Re:It's your job... (Score:3, Informative)
I don't have my copy of the ACM code of ethics
Well, look no further: The ACM Code of Ethics [acm.org]
Some sections relating to this issue would be:
1.1 Contribute to society (and human well-being.)
1.2 Avoid harm to others.
1.3 Be honest and trustworthy.
1.5 Honor property rights (including copyrights and patent.)
1.6 Give proper credit for intellectual property.
1.8 Honor confidentiality.
Re:You're Right, Of Course (Score:3, Informative)
When You Approach Your Boss (Score:1, Informative)
Don't emphasize the legal/ethical aspects of this issue. S/he won't be listening, and won't care. Emphasize the practical aspects of the issue - the site cutting you off, the possible problems of parasitic computing re the owners of those computers not being under your direct control. If that doesn't convince him/her, then discuss the legal problems.
Re:You're Right, Of Course (Score:4, Informative)
So the few times I've had someone ask me to do this sort of scraping, my response is usually that sure, fine, it works, but it's very easy to spot on the logs, and the information is very likely to become unavailable at unpredictable intervals.
Depends on how you do it. I tend to use tor and a random wait time between gets to bring down the data over a few hours (up to a few days) and in one instance, because the URLs were easily guessed, I randomized the list to make it seem as if the hits were going to pages all over the place. I was never banned for any scraping activity that I have done.
In the long run, it's usually pretty futile to scrape in the first place. When you're stealing content just to drive traffic, you tend to have a crappy site. The only time I ever did a professional scraping app that was "justified" and "legal", the victim was another business unit within the same corporation, and we had every right to the data that they "couldn't" compile for us.
It's not futile. Scraping provides a plethora of information in a useful format from places that aren't willing (or are unable) to provide data in the necessary format. I used scraped course schedule data from MnSCU to develop a weekly report that showed how many courses were filled at other area institutions. It was to our competitive advantage to have this information and, while it was publicly available, the system wouldn't provide it to us in the DW. I ended up using that data for a variety of reports beyond what I originally intended, and it would not have been possible otherwise.
While I wish that the data had been provided in a better format for my use, it wasn't and that's what made scraping necessary. Plus *I* was the one who got to determine what information I was allowed to glean from the data rather than whatever the system decided was appropriate for our needs.
Re:Why have an ethics dilemma at all? (Score:4, Informative)
This is a shortsighted view of the problem here.
"You're getting paid to do a job, and you're not going to be personally liable should anything go wrong anyway."
Incorrect. His boss isn't breaking the Terms of Service, he is. When the website in question terminates their access, guess who's gonna get the flak? The person who *implemented* the system, not the one who designed/thought of it, especially if they are non-technical and rely on lower-order technical beings to do things for them.
Take, for example, a situation that I regularly come across:
Boss: "It's okay, we'll just copy all these Microsoft CD's and save a fortune on licensing."
Boss's Boss: "Okay. You know best."
Boss (to underling): "Copy these CD's"
Underling in theory: "Okay". Underling in practice: "We *can't* do that."
When things go wrong, the underling in theory is going to get the blame here, because it's his area of expertise and he *wrote* the system that does it. I get people suggesting to me all the time that we could just install another license of Office that we don't own, or we can just copy CD's that have blatant copyright notices on them, or breach a Data Protection Act directive by doing X, or a million and one other things that I *know* we can't do. The people in charge of me barely understand the terms, let alone whether what they are doing is illegal. I have to sit and explain to my boss and my boss's boss why we can't do them. Trust me, if something got noticed, Underling in Theory would get sacked/sued every time.
"Are you really going to walk out of your job over violating the terms of service of a few web sites?"
Why not? I get asked to do all sorts of crap and I point it out and say no. If I *chose* to do it instead, then it's a different matter. But when I *refuse* to do something on legal or ethical grounds (we're not just talking ethics here - it also sounds like they have a "subscription" of some kind to the data that they are scraping, or that it's a competitors website) then if you *make* me, I will walk (been there, done that - I've turned down a good career move and more money in order to sleep at night - not that I was being asked to break the law, not that I was being asked to sell my children, but that I was being asked to do things that I didn't agree with [wasting money within a school on useless IT cruft and consultants while the kids didn't have books or paper]). I'll also report you to the BSA or whatever organisation I need to if you really press me, or the local press like I did in the above case (they didn't do anything with it, but I breathed a sigh of relief once I'd sent off the information to them - my part was done and I'd done good by myself - if the press decide to sit on something, that's on *their* conscience, not mine). You don't do illegal stuff if you're honest and your mortgage depends on a wage.
"It's not your job to worry about the ethics of the situation, that's probably not even your boss's job -- it's somewhere in your corporate legal department, the Board, or an Ethics or HR department perhaps."
Wrong. Because they won't even *know* what the problem is until it comes up in court and they have it explained to them in excruciating detail. However, someone who decides to do something that's part of their job, within their area of expertise and breaks a law (or even does something a bit stupid) that *they* should know about will get fired/sued by their own company once the shit hits the fan. So your boss *and* you might get sacked - you're still no better off and your employment reference is now a million times worse.
"just do what you're being paid to do and ask fewer questions."
It's sad that people think this is a good way to live. He's *being paid* to do his job. Which does not entail questioning his ethics or breaking Terms of Service (even if legally unenforceable) or anything else. His *job* is to stand up and say "Whoa, hold on, we can't do that". If he doesn't do that, he's not doing his job an
Re:Hilarity ensues when... (Score:3, Informative)
That's pretty common. John McCain had an issue with that earlier in his campaign when his MySpace page got hit [techcrunch.com]. The guy who did the original template wasn't keen on having his images hotlinked from such a high volume site and made a hilarious substitution (which was widely misreported as a "hacking" incident in the media).
The AC is dead on. If you depend on someone else's data, they are going to notice, and they are going to remove your access, or, worse, start feeding you crap.
Re:No dilemma (Score:3, Informative)
Copyright. The copyright holder has the right to do pretty much what they want with their own data. If that means putting up a notice that says "free to view, pay if you want to download", they can do that. Copy an image from a website and then upload it somewhere else, or put it in an advert, or print it out and stick it on your office wall. Chances are you just broke copyright law. You can't do this with anything copyrighted, no matter how easy it is to "technically" do it. Dilbert cartoons, youtube vids, Slashdot comments, it doesn't matter. If it's copyrighted, you *can't* do this.
Re:You're Right, Of Course (Score:5, Informative)
I'd advise against discussing it with HR. I've encountered the following situation: I talked to an HR manager about something that obviously should've remained confidential. However, that same HR manager was part of the management team and thus had two hats on. She proceeded to inform the management team, to my astonishment.
I've come to the conclusion that HR is just a staff department and owes allegiance to, you guessed it, the management team. Not you.
Terrible engineering (Score:3, Informative)
Re:Shit Falls Downhill (Score:3, Informative)
Then that's what he needs to tell his boss (and I agree with you). He needs to clearly inform his boss that it is probably illegal and opens his company up to (expensive) litigation and, more importantly, even if it doesn't get to litigation, the source site could make a change that renders their scraping efforts null and void. It needs to be put in a dollars-and-cents picture so that the boss realizes that the best (and only) solution is to pay the licensing fee. Doing otherwise will likely be more expensive and inconvenient. Any other depiction of the scenario won't matter to a boss that is only concerned with the bottom line.
And, if that doesn't work, polish your resume.
Re:You're Right, Of Course (Score:3, Informative)
Yeah, but unless you're running that list across a botnet, the IP addresses are a giveaway.
Even if you are running it across a botnet it's pretty easy to pick out the patterns using some pretty trivial statistical hacks...If you graph bot traffic it looks like a heartbeat; even if you randomize the access times they don't match "human" numbers (unless you add so much random that it ceases to be an efficient scraper...If you could hire a guy to browse the site and write down the data faster than you can scrape it, they beat you.)
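To make the "heartbeat" point concrete, here is a minimal sketch of one such trivial statistical check: the coefficient of variation of the gaps between requests from a single client. The function names and the 0.5 threshold are illustrative assumptions, not anything a particular site is known to use; the underlying assumption is that human browsing is bursty (a few quick clicks, then a long pause) while a scraper with bounded random waits stays comparatively regular.

```python
import statistics

def interarrival_cv(timestamps):
    """Coefficient of variation (stdev / mean) of the gaps between requests.

    `timestamps` are request times in seconds for one client.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 0.0  # all requests at the same instant: perfectly regular
    return statistics.pstdev(gaps) / mean_gap

def looks_automated(timestamps, cv_threshold=0.5):
    """Flag clients whose request spacing is suspiciously regular.

    cv_threshold is a hypothetical cut-off; a real site would tune it
    against its own traffic.
    """
    return interarrival_cv(timestamps) < cv_threshold

# A scraper waiting "about 10 seconds" with a little jitter.
jittered_bot = [0, 8, 20, 29, 40, 50]
# A person: a few quick clicks, a long read, a few more clicks.
bursty_human = [0, 2, 3, 60, 62, 300, 301]

print(looks_automated(jittered_bot))
print(looks_automated(bursty_human))
```

Widening the jitter enough to push the ratio past the threshold also stretches the total run time, which is the efficiency trade-off described above.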
I've never actually been banned for it either, but it's all a crapshoot. I used to work for a company that did GIS data and we smote scrapers on a near-hourly basis, and that one turned freak-nasty because when we found a really good scraper, we'd feed them 60% crap data, and with GIS it's not easy to tell good data from bad.
Things like posted schedules, imho, are the real legitimate use for scrapers. Those people want their data to get out, but they may lack the tools to put it out there.
Re:Who cares? (Score:1, Informative)
Would you *REALLY* build a business that relies on no one else noticing that you are scraping their site?
Like you said, 'they'd fuck you over in a minute'. What would stop them from doing that? Just filter out that site and load problem gone.
I would personally go back to the boss and say 'do you really want to rely on the goodwill of other web sites?' Sounds like the dude is a bit shady, so he will much more easily see the risk of someone messing him over.
I personally wouldn't build a system where it relies on the 'niceness' of others. Not because I am a nice guy. But because it is a fragile system where the data you depend on might go away.
Also, would you REALLY want to test those TOS in a court when their IT guy shows how you are stealing the data from them? They made a good faith effort to 'wall off' the data. They probably would have a good case. But more than likely they would just cut you off and ban your IP.
Re:You're Right, Of Course (Score:1, Informative)
I'm an admin for a pretty big site and we've had to deal with bots before. It's a bad business decision to rely on the misappropriation of another company's work for your commodity. A lot of the time, these bot writers are competing with us in the marketplace, so we have every reason to screw them goatse style.
First, it can easily expose you to copyright liability. As a p2p filesharing kid, you probably find it easy to evade liability, but when you're running a company, you're a sitting target, possibly with deep pockets.
Second, you're depending on another company to provide you with all your data and all they have to do is blacklist you, swap the content with gibberish for you, or merely update their page format. All of this would substantially set you back, and you have no legal recourse.
As for the free services bit, doing stuff like writing workaround bots to host an image stash across photobucket accounts instead of buying your own hosting is only asking to get screwed.
1) free services are not even likely to have 90% uptime, never mind 5-9s. As an admin, I wouldn't even consider anything less than 5-9s.
2) free services typically prohibit commercial use, and to access them, you have to actually click "accept" on their ToS. Many cases have found those are legally enforceable contracts. If you're intentionally writing code to abuse their expensive contracts and the cost difference is $2k/mo, you're talking at least $2k/mo in liability if they sue you, and possibly punitive damages for being a dick about it.
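For a sense of scale on the uptime figures in point 1, a short calculation converts an availability percentage into allowed downtime per year. This is plain arithmetic; the function name is just for illustration, and a non-leap year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year

def downtime_hours_per_year(availability):
    """Hours per year a service may be down at the given availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR

for label, availability in [("90%", 0.90), ("5-9s", 0.99999)]:
    hours = downtime_hours_per_year(availability)
    print(f"{label}: {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

At 90% that works out to roughly 876 hours, or about 36 days, of downtime a year; at five nines it is a little over five minutes.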
Re:You're Right, Of Course (Score:4, Informative)
Unless it's something about you, personally, then an HR employee has no requirement to keep it "confidential".
In other words, if you are talking about your health insurance, your personal information, etc., that's not general "company business" and shouldn't be spread around. But, if you bring them some information about someone doing something that could be detrimental to the business, they really do have an ethical requirement to pass that along. A good HR team would know not to bring your name into it if you are "snitching" until it was absolutely necessary, but sometimes that happens sooner than you would like.
What you were thinking of is a company ombudsman [wikipedia.org]. These people are somewhat like your "lawyer" within the company, and are there for the employees. What they would do is explain to you your options (go to management yourself, let them do it and respect your anonymity as far as possible, never divulge your name even if that means the complaint can't proceed, etc.), and then help you implement them.
Waste of money (Score:2, Informative)
My take on this is that, though your assignment has spawned an ethical question, the reality of the situation is economic.
Your boss believes that it will cost him less to "scrape" data from the website and use multiple free accounts than to simply pay for the data access. This may be true at first, but, ultimately, this is false.
On the off chance you've not scraped websites before, I'll tell you that this is extremely error prone. So, while this may work initially, you'll be constantly chasing down bugs in the process.
Based on your description I assume you'd be automatically logging into their systems before scraping the data. What if their login process changes? What if they restructure their website? What if they add a captcha to the registration and login process?
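To illustrate what a restructure does to a scraper, here is a minimal sketch using only Python's standard `html.parser`. The markup (`<td class="price">`), the class names and the `extract_cells` helper are all hypothetical; the point is only that the scraper has to check its own assumptions and fail loudly, because the site will never tell you when they stop holding.

```python
from html.parser import HTMLParser

class ScrapeLayoutError(Exception):
    """Raised when the page no longer has the structure the scraper expects."""

class CellCollector(HTMLParser):
    """Collects the text of every <td> carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self._in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and self.css_class in classes:
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data

def extract_cells(html, css_class="price", min_expected=1):
    """Return the cell texts, or raise if the expected layout is gone."""
    parser = CellCollector(css_class)
    parser.feed(html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    if len(cells) < min_expected or any(not c for c in cells):
        raise ScrapeLayoutError(
            f"expected at least {min_expected} non-empty td.{css_class} cells, "
            f"got {cells!r}"
        )
    return cells

old_layout = '<table><tr><td class="price">9.99</td></tr></table>'
new_layout = '<div class="price">9.99</div>'  # same data, restructured page

print(extract_cells(old_layout))
try:
    extract_cells(new_layout)
except ScrapeLayoutError as err:
    print("scraper broke:", err)
```

Every one of these checks is code you write and maintain yourself, for a page format somebody else controls.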
My point is, what your boss wants to do is, to use Steve Jobs's recent phrase, a bag of hurt.
I'll bet that given enough time, the cost to your company in terms of your salary to build and maintain this application will be greater than the cost to actually pay for the data and create a dependable connection.
Don't forget to factor in what it costs you when your users see bad data or error messages due to the process breaking.
It'll cost less to do it the right way. So forget the ethics of it and educate your boss on the economics of it.
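One way to put that argument in front of a boss is a small cost model he can fill in himself. The sketch below is only that: every parameter (build hours, upkeep hours, hourly rate, cost of breakage, monthly fee) is a placeholder to be replaced with real figures, and the result can come out either way depending on what goes in.

```python
def scraper_cost(months, build_hours, upkeep_hours_per_month, hourly_rate,
                 breakage_cost_per_month=0.0):
    """Cumulative cost of building and babysitting a scraper.

    breakage_cost_per_month is a rough allowance for bad data and error
    pages reaching users when the process breaks.
    """
    monthly = upkeep_hours_per_month * hourly_rate + breakage_cost_per_month
    return build_hours * hourly_rate + months * monthly

def paid_cost(months, monthly_fee):
    """Cumulative cost of simply paying for the service."""
    return months * monthly_fee

def compare(months, monthly_fee, **scraper_kwargs):
    """Return (scraper total, paid total) over the same period."""
    return scraper_cost(months, **scraper_kwargs), paid_cost(months, monthly_fee)
```

Calling `compare(24, monthly_fee=..., build_hours=..., upkeep_hours_per_month=..., hourly_rate=...)` with honest estimates gives the two totals side by side over two years.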
From the other side of the fence (Score:2, Informative)
You're asking the wrong question (Score:2, Informative)
What is relevant is that any feed you use that isn't backed up by a valid contract can and will disappear at random times, sometimes permanently, as well as contain data you weren't expecting and be missing data you were expecting.
Ask your boss how happy he'll be when the domain owner sells to a spammer and his scraped data is now "Male Enhancement" ads instead of weather data.
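A minimal sketch of the kind of sanity check such a feed forces on you, using the weather example. The field names (`station`, `temp_c`) and the plausible temperature range are assumptions made up for illustration; the idea is simply to reject records that are missing fields, carry fields you never asked for, or hold values that cannot be weather data.

```python
def validate_reading(record, required=("station", "temp_c"),
                     temp_range=(-90.0, 60.0)):
    """Return a list of problems found in one scraped record (empty means OK)."""
    problems = []
    for key in required:
        if key not in record or record[key] in (None, ""):
            problems.append(f"missing field: {key}")
    unexpected = sorted(set(record) - set(required))
    if unexpected:
        problems.append(f"unexpected fields: {unexpected}")
    temp = record.get("temp_c")
    if temp not in (None, ""):
        try:
            value = float(temp)
        except (TypeError, ValueError):
            problems.append(f"temp_c is not a number: {temp!r}")
        else:
            low, high = temp_range
            if not low <= value <= high:
                problems.append(f"temp_c out of range: {value}")
    return problems

print(validate_reading({"station": "KMSP", "temp_c": "21.5"}))
print(validate_reading({"station": "KMSP", "temp_c": "Male Enhancement"}))
```

A check like this catches the obvious garbage. It does nothing about plausible-looking wrong data, and only a contract gives you any recourse for that.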
Re:You're Right, Of Course (Score:5, Informative)
It's happened. ESPN connived a way to get to another site's private database [theregister.co.uk] and reported the data as its own. The website injected some fake data, which ESPN picked up and reported, and ESPN was caught.