


How Mainstream Can Code Scavenging Go?
The time-honored tradition of code scavenging has long been a way for new programmers to "break in" to a new language or task that they may not want to build from the ground up. The re-use of old code, cleaned up and tweaked to a new purpose, can help developers learn many useful skills and accomplish tasks quickly, especially for small tasks that aren't of vital importance. One blogger wondered if this process could be formalized and tools could be built to help foster and enable code scavenging on a mass level. Is this a viable option, or are there just too many things to consider?
IP Laws? (Score:4, Interesting)
Re: (Score:3, Funny)
Re:IP Laws? (Score:5, Insightful)
That's a great knee-jerk reaction. Without understanding the motivation behind the article, you assume that code scavenging means stealing other people's code. What they're really talking about is (legitimately) re-visiting code that you or other people have written, and then picking and modularizing bite-sized chunks. In other words, you would design a large program (mark I) and then go back and pick out useful parts, clean/debug them and have working modules (mark II) for the next project.
Also, for people who haven't read TFA, it's 9 short paragraphs long and barely an article. They talk about a "formal approach to code scavenging" without even coming close to explaining what exactly that MEANS.
Don't we call that "refactoring"? (Score:2)
You can then use those modules in other programs.
Re: (Score:2)
short answer: yes. and even before "refactoring" came into vogue, there were other names for it. hence, TFA is not really a FA.
Re:Don't we call that "refactoring"? (Score:5, Funny)
It leaves us only "the". Which is an article. Liar.
Re: (Score:2)
Re:IP Laws? (Score:5, Interesting)
Agreed. Reading TFS, I thought it was going to be yet another "we can make programming like Lego!" thing. (Which it ain't, and probably never will be. [joelonsoftware.com] Bonus reference: "Lego" is mentioned in the second paragraph of this article [wired.com] about Steve Jobs/NeXT/WebObjects from Wired. God bless Wired and their eternally fucked-up CMS that can't serve images for any story in the archive and, this week, shows the actual HTML code that should be formatting the Question-and-Answer portion of the article.)
Reading TFA, I really don't know much more than I did before. This is the best I could come up with:
So, code scavenging is... um, re-use? Can anyone make better sense of that than I can?
:-)
"In other words, they 'scavenge' the good bits and tweak them to a new purpose."
Um, no. You scavenge the pieces you need, not necessarily the good bits. Have you ever been looking for some code to parse phone numbers, and while looking at source, said "Hey! This looks like a great way to compare two lists!" Probably not. You're only looking for formatting code, so that's all you see, so that's all you get. Looking at source is not like looking at produce at the food store, where you can walk by the tomatoes and they catch your eye because they're perfectly ripe and really, really nice-looking.
Rather than searching Google, I think every good programmer should take the time to create a really good library. I don't mean take the time to write great code; I mean take the time to organize it into a proper library: make one clean, well-commented version; put things into variables ($tableName in queries instead of the actual table name, etc.); and pull code from that when you need it, rather than just copying and pasting from the last place you remember using it and then changing all the variable names, table names, etc.
I plan to make mine Real Soon Now.
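The library idea above can be sketched roughly as follows. This is only an illustration in Python using the standard sqlite3 module, not anyone's actual library: the helper name fetch_rows, the ALLOWED_TABLES whitelist, and the users table are all hypothetical. The table name becomes an argument (the $tableName idea) and is checked against a whitelist, because SQL placeholders can bind values but not identifiers.

```python
import sqlite3

# Hypothetical whitelist: table names cannot be bound as SQL parameters,
# so the reusable helper validates them before building the query.
ALLOWED_TABLES = {"users", "orders"}


def fetch_rows(conn, table_name, column, value):
    """Return all rows from table_name where column equals value."""
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table_name!r}")
    if not column.isidentifier():
        raise ValueError(f"bad column name: {column!r}")
    # Identifiers are interpolated only after validation; the value is bound.
    query = f"SELECT * FROM {table_name} WHERE {column} = ?"
    return conn.execute(query, (value,)).fetchall()


def demo():
    """Build a throwaway in-memory table and query it through the helper."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    rows = fetch_rows(conn, "users", "name", "bob")
    conn.close()
    return rows
```

Kept in one module, a helper like this is pulled in by import instead of being pasted and renamed for each new project.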
>> So, how quickly would you run afoul of Intellectual Property laws doing this?
> That's a great knee-jerk reaction.
No, that's just the first thing that popped into his head. (Pardon me if I'm putting words in your mouth, Mr. Gambit.) With that one sentence, he did not say (or imply) "The only people who would use this are thieves." He just put out that question for people to discuss. That topic came up here just a couple days ago. [slashdot.org] I highly recommend reading that discussion. There are some very good points; among them, that if you publish something with no licensing info, it is copyrighted to you by default. (In the US at least,
Re: (Score:2)
Another good point in comment "every good programmer should take the time to create a really good library" And this should go without saying, build your own library of templates and snippets you carry as long as you do coding, often saves a lot of time. Some companies which use SM/CM ( source
Re: (Score:3, Insightful)
These are the ideas that CPAN, PEAR and other code repositories are built on. So instead of trying to reinvent the idea, the author should have poked around a little more and learned more about what is available, as opposed to trying to formalize a "hackup job."
The parent makes a good point here, where you should take the time to build a clean library base to work with. If one should have a well structured infrastructure, and should they need to implement a certain feature, more often is the case, that a g
Re: (Score:3, Funny)
Good point. I think I'll use it the next time someone comments about code of mine that is overly complex and convoluted.
"You see, all the simple ways of doing it
My reaction was different (Score:2)
So I am not sure at all to what extent this code scavenging is sufficiently helpful to make formalize
Re: (Score:2)
Exactly! You can visualize these modular chunks of code as actual objects actually. Each object could have a series of methods that you could manipulate to make it do thing
Re: (Score:2)
It's just another made-up specialization field for software engineering academics who didn't quite cut it in the real compsci stuff... even they need to write their papers on something.
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
P.S. MSDN docs do often suck... luckily their IDE and Google results are OK.
P.P.S. A fi bun mi sensi.
Re: (Score:2)
Some code cannot be copyrighted. HTML, for instance, cannot be locked away and hidden in a vault because it relies on being there so the browser can render it at the time of the download. Thus, HTML is a great "scavenger" language, because it is easy to learn new techniques by poking around the "View Source" code.
Other code that cannot be copyrighted is copylefted. The GPL expressly guarantees that code can be used, studied, modified, and redistributed.
Now... formalizing a lesson plan to teach stude
Re: (Score:2)
Realistically, of course, there's the question of how far you can copyright many (small) HTML fragments and techniques anyway, but that's not the point you were making.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
You're highly unlikely to run afoul of trademark laws and absolutely guaranteed to run afoul of patent law in the US, or virtually guaranteed not to run afoul of patent law elsewhere. So you must have meant copyright. Why didn't you just say copyright? Lumping copyrights, patents and trademarks together is futile for most purposes and calling them "property" is dangerously misleading.
I've been thinking of scavenging code from SCO... (Score:5, Funny)
In other news (Score:2)
Vernor Vinge (Score:5, Interesting)
Re: (Score:2)
Re:Vernor Vinge (Score:5, Insightful)
It is one of the most misunderstood concepts about programming. A programmer fresh out of college knows how to write algorithms really well, but has no idea that there are architectural and design land mines waiting for them just down the development road.
Too many development shops will get an app "working" and think that is all there is to development, because no one has the depth of experience to look a year down the road and see that they will need to rewrite the entire app from scratch in order to make the simplest of changes.
I'm sorry, but if your program is neither extensible nor maintainable then you really haven't "succeeded" at anything. You've simply fooled yourself into thinking the process is simpler than it is while screwing your clients out of their development dollars.
Re: (Score:2)
I think you're wrong here.
In college, a lot of architectural designs have to be made, and made quickly. OK, they are for small projects but still those designs have to be made. Plus, you have the luxury of seeing what other people designed and how it work
Re: (Score:3, Insightful)
The method of assigning homework is to give 3 lines on what a program needs to do. For example, write an FTP client that you can use to download a file. The method of grading is if you can download a file or not. The decision on how to get the HW done with the least amount of time spent is an architectural challenge.
Which is what the parent hates: people who only think about getting the product out of the door and who sacrifice things like maintainability.
Even Firefox and Mozilla did a few complete rewrites of the various parts. Rewrites are part of programming, unfortunately. The nice thing about rewrites is that the programmers now have experience on how to do things better and are able to better compartmentalize the code. I wouldn't say rewrites are terrible things - though they do annoy management and people above to no end.
Firefox & Mozilla are prime examples of bad codebases. We all know the disaster that was the last rewrite in-house at Netscape.
_SOME_ rewrites are necessary and useful, but a company where the usual way to add a new feature is a rewrite of the whole software is doing something wrong.
Business Sense (Score:2)
Which, of course, makes good business sense. At least until customers start selecting on maintenance/improvement cost instead of initial development quote.
Re: (Score:2)
Thanks, I already know Python.
code repositories (Score:3, Informative)
we scavenge code online w/e, find it needs to be used by a lot of people
so we inherit the scavenged and put it in a nice module and tada!
this is nothing new
semi-formalized (Score:5, Informative)
Re: (Score:2)
http://snippets.dzone.com/tags/ruby [dzone.com]
http://snippets.dzone.com/tags/ruby/http [dzone.com]
http://snippets.dzone.com/tags/python/windows [dzone.com]
http://snippets.dzone.com/tags/rebol [dzone.com]
And so on..
It's called a "subroutine library" (Score:5, Funny)
The Web 2.0 crowd rediscovers subroutine libraries. Film at 11.
Re: (Score:1)
And to preempt those that like to pun: I mean the "DLL" kind. Not the "lots-o-books" kind.
Re: (Score:3, Funny)
Re: (Score:1, Insightful)
Re: (Score:3, Funny)
You gotta punch it up, PHB-ify it: "Reusable Enabling Action-Oriented Web Object Architecture Patterns".
Re: (Score:3, Funny)
Instead, he answered "uh... cut and paste?"
Re: (Score:2)
that's more just *using* it afaiac so I guess I would have been
stumped by that question too. Lisp can double as knitting patterns
and I guess Perl can be a decent approximation to line noise, Python
can sometimes be read as intensely obscure haikus but it's all a bit
of a stretch.
Re: (Score:2)
At the end of the day, it's all ones and zeros.
Google Code (Score:3, Informative)
Re:Google Code (Score:5, Informative)
http://www.google.com/codesearch [google.com]
forehead meet desk (Score:5, Funny)
And the first article suggests that trusting the code is an issue, because you didn't write it. Well let's see - it's short, and you just pasted it into your program. But you're not going to bother to read it? You fail. Seriously.
Re: (Score:2)
It might be a new idea for more people than you think. There is this ideal picture of people writing The Perfect Software from scratch, isolated from the rest of the world. And there is the other ideal, people assembling The Perfect Software
Re: (Score:2)
Don't know about you, but I don't have the legal background to figure out what happens to licenses for the code when you copy and paste others' code into your project.
Re: (Score:2)
Isn't this a library? (Score:3, Funny)
"Scavanging"? (Score:2, Funny)
Unlikely? (Score:4, Interesting)
Since..when? Recently I've picked up perl again, and I've found more than what I need to scavenge to make my own personal extensions to blosxom [blosxom.com] through google searches.
I mean, granted, it depends on your definition of a bite-size task, but it's a blanket statement no matter which way you spin it.
Eh? (Score:2)
This is a great idea! (Score:4, Funny)
Maybe the C++ language could do it. Then you could just
Hmm, CC++AN sounds pretty dumb. It'd never catch on. Oh well.
Re: (Score:2)
Oh, c'mon! It has one potential use - It sounds like a munged pr0n version of something for Perl [cpan.org]. That alone would make the effort worth it, no?
Today Slashdot jumped the shark. (Score:3, Funny)
Seriously. I'm starting to lose brain cells when I read the "articles" these days.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
It's already been done. (Score:2, Funny)
Re: (Score:2)
(and half a bazillion other projects... Hell, it's not like it's been going on since before Microsoft decided they needed a working TCP/IP stack for Windows 2000 or anything...)
A Brand New Idea (Score:1, Funny)
Wow (Score:5, Funny)
Ah, business versus pleasure (Score:1)
When it comes to real business requirements, things are a little milky. I own and operate a small business, and there are a few problems with plucking code from the ether. The first is the notion of hiring a programmer to not write their own code. Getting the job done, an
Re: (Score:2)
A while back, I asked one of my programmers to write a routine to dump out a Perl structure. I said I needed it in about a week. Lo and behold, it worked on-time and all was good.
What is wrong with Data::Dumper? Or rather did you need something that worked with Data::Dumper?
Turns out, as any Perl expert here knows, my programmer simply took the Dump module, which dumps perl structures. I wanted to have the dump be a nice dynamic javascript html table thing, and my programmer told me no -- or rather that it would require him to do it from scratch, and of course now, a month later, we didn't have the time.
Having done a lot of work with Perl, I would say that Data::Dumper is the wrong thing to use for that. TemplateToolkit, OTOH, would allow you to quite quickly generate HTML based on your data structure. A competent Perl programmer shouldn't require too much time to do that, esp. if it is in a different document.
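To make the point above concrete: turning a nested data structure into an HTML table is a short recursive walk. What follows is a minimal sketch in Python (not Perl, and not Template Toolkit's API); the function name to_html is made up for illustration. It also guards against self-referencing structures, a corner case that a quick hand-rolled traversal can easily miss.

```python
from html import escape


def to_html(value, _seen=None):
    """Render nested dicts/lists as HTML tables; scalars become escaped text."""
    seen = set() if _seen is None else _seen
    if isinstance(value, (dict, list)):
        if id(value) in seen:
            return "<em>(cycle)</em>"  # avoid infinite recursion
        seen = seen | {id(value)}  # track only containers on the current path
        if isinstance(value, dict):
            rows = [(escape(str(k)), to_html(v, seen)) for k, v in value.items()]
        else:
            rows = [(str(i), to_html(v, seen)) for i, v in enumerate(value)]
        body = "".join(f"<tr><th>{k}</th><td>{v}</td></tr>" for k, v in rows)
        return f"<table>{body}</table>"
    return escape(str(value))
```

A dumper aimed at debugging output and a renderer aimed at HTML share the traversal but little else, which is why asking for the second after the first was delivered amounts to new work.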
Re: (Score:2)
Part of the problem is that you are going to pay in
Re: (Score:2)
My question to you, as one who uses LSMB with the limitations that you've listed above, is very simple. I refuse, and prohibit my developers from using third-party modules for integrated features. Obviously, LWP, DBI, ImageMagick, some
Re: (Score:2)
My question to you, as one who uses LSMB with the limitations that you've listed above, is very simple. I refuse, and prohibit my developers from using third-party modules for integrated features. Obviously, LWP, DBI, ImageMagick, something PDF, and other extensions of the core don't count. I'm talking about integrated business features because that's my industry. If my industry were database management, then I wouldn't let them use DBI either. But in my world, database performance is not something on which I need a strangle-hold.
I have no idea why you would be worried about the database performance and DBI. If you are doing anything non-trivial, DBI is not going to cause you any headaches. Case in point-- most of our interactions with the db tend to use PostgreSQL for only a tiny portion of the total execution time of the page load.
Similarly, has anyone ever told you that premature optimization is the root of all evil?
My question is, when you reach something that LSMB can't do -- perhaps your cash register example is one -- what do you tell your clients? Do you say something to the effect of "sorry, that's not possible" or "we'll have to rebuild everything from scratch"? or do you say something like "it can't be done properly, but we can bolt-on this work-around"?
Ok, there are very few cases I will tell customers I won't do something. These generally involve cases where ac
Re: (Score:2)
Regarding the DBI, I agree, in my world, it's a barely noticeable delay. But in other not so unrelated industries, database access does become the bottle-neck, and DBI does a lot of unnecessary work when you can bolt down both ends of the connection. I find that to be the general way of things with e
Re: (Score:2)
For example, if you are inserting records one by one (o
Re: (Score:2)
I've built a platform that lets me write all languages in a single file. I got sick of splitting up the html, the sql, the javascript, and the css into multiple files because I hated having one business feature spread across multiple files. So my file split is a split of business features.
For example, one t
Re: (Score:2)
Neither of us find specs particularly useful. I don't find UML diagrams terribly useful (actually I generate UML diagrams *from* my work rather than do them first). However, a lot of stuff happens via a proposal process which ensures that various businesses in the community get to weigh in about how to ensure that a feature works well for everyone. Yes it costs the customer a little more in the short run but a *lot* less once others are helping to maintain the co
One more note (Score:2)
Ultimately, my point is that when you control every line of code, you aren't hampered by other people's restrictions. I would have been happy had my programmer written his own dumping code from scratch, but I also would have been happy had he started with the cpan dump code, learned from it, and created a derivative version. Hell, in this case, I'd have been happy if he had studied it to the point where he could have modified it easily.
Sadly here is where you are terribly wrong. The key to doing what you want to do is to understand what you want from the beginning, do the requirements analysis up front, etc. and so forth. There is a *huge* difference between dumping a Perl object (the way Data::Dumper does) and creating a nice HTML document from it.
Your programmer, if he was smart, grabbed the CPAN module, wrote a little wrapper around it, and integrated it into your application. If you had known what you wanted up front, he would ha
Re: (Score:2)
Never will a client ever know what they want ahead of time, and never will anyone want to wade through weeks or months of specification agreements. Also, their business is changing as the project grows and they become integrated around it. There's just no way to determine things ahead of time.
If your clients don't know what they need up front, then your development is going to be disorganized. You will have to charge your customers accordingly (and they will probably pay through the nose). FWIW, I don't get spec agreements from my customers either, but there are many times when we spend weeks ensuring that I understand what they need well enough to do it. It saves months of development time :-)
Secondly there is a *huge* difference between wanting to just dump a Perl structure and wanting to
Re: (Score:2)
I too am a developer. Writing a routine that traverses a perl structure is five lines of code -- ten if you want to go nuts. Any formatting can be shoved in and shoved out as quickly as a developer can type. Whether it's a bunch of tabs a
Re: (Score:2)
See, yours is the perspective that I get from everyone -- including my developers. But that's just it. I do have a package that huge, and flexible, and makes things much easier, and supports dozens of structures in any sort of layout I want. I call it "the developer".
I too am a developer. Writing a routine that traverses a perl structure is five lines of code -- ten if you want to go nuts. Any formatting can be shoved in and shoved out as quickly as a developer can type. Whether it's a bunch of tabs and line breaks to indent a tree, or td's and tr's to produce a table, it's an extra line or two of code. Making it dynamic is just adding id's and onclick's to the td's. So our ten lines becomes twenty. That's it.
Ok, IMO, this is a maintenance nightmare. BTW, traversing Perl structures can be tricky. I do have to do it on occasion to process various things (for example, ensure that certain objects are turned into strings prior to template processing) and yes, one can do recursion in 10 lines, but there will be corner cases that don't get handled properly. Perl being Perl, you can probably do it in fewer lines, but will drastic
Re: (Score:2)
Nothing I'm doing is difficult. Forty million if statements, basic iteration, rare recursion, and a whole lot of simple code. The trick is only knowing where everything is.
Think after t
Re: (Score:2)
See, I disagree with your last point. For as long as software developers aren't software engineers, then you have two things. First, you have programmers who can't see three steps ahead. Second, you have designers who have no idea how things are actually implemented. Third, you have wasted energy passing between the two.
What I meant was that I can out-design most programmers out there. However, I have met a lot of developers who can out-code me. There is a deceptively large gap between those skill sets. I have been burned by assuming that there was less of a gap than there is. I am guessing that you are in the latter camp.
Nothing I'm doing is difficult. Forty million if statements, basic iteration, rare recursion, and a whole lot of simple code. The trick is only knowing where everything is.
IMO, nearly every application which relies on if statements for its structure is a maintenance nightmare.
And now, they call you in for a mystery meeting -- my all time favourite; it means free business. So you walk in two hours later, and they want a feature. They ask only two things: when, and how much. I take thirty seconds of silence, and I give them an answer. I know every line of code, so I know what's involved. And I've done shit before, so I know how long it takes. After all, they're asking for something as simple as an isolated feature, or a wide-ranging integrated cross-system entanglement. Either way, it's either something I've done a dozen times, or something that I planned they might want down the line.
But do you understand what, exactly, they need? 99% of my time in requirements engineering
Re: (Score:2)
Really, and it's hard to say this without insulting myself, nothing I do is complicated. Sure there are lots of parts, but it's like a chain. You make itty bitty links. Each one the same as the previous one. Not one of them complex. And then it holds hundreds or thousands of pounds. I've simply become really adept at making links of
Re: (Score:2)
We prefer the second set. Those that we trust allow us to walk away, and trust that they know what they're doing, and will do it well. Now I see what my clients see in me. They make widgets. They don't make software. They don't want to make software. They don't want to learn how to make software, and they don't want to supervise someone else making software. They just don't want to get screwed -- especially the ones that have been screwed in the past. I get to convince them that any supplier can screw them, and that they need to trust me. Those that let me earn that trust get everything they want and more, without any work or worry.
With all due respect this doesn't sound like they are ceding business decisions to you. It sounds like you are essentially selling them a customized version of an off-the-shelf sort of product. This is sort of different.
Ok. In a previous post you mentioned web catalog software. In general, web catalog software is pretty straight-forward. However, where the real difficult issues come in have very little to do with software, building software, deciding what the software needs to do, etc. They are the s
Re: (Score:2)
Business concerns are fundamentally human issues. They are questions like "How many people do I need to hire to make sure that the content on the web site is correct?" "How do I go about ensuring that employees aren't stealing from my business?" and "How do I encourage customer loyalt
Re: (Score:2)
I have found the most scavenged code on earth (Score:1, Funny)
#include <iostream>
using namespace std;
int main()
{
cout << "Hello World!";
return 0;
}
oops (Score:2, Funny)
Microsoft Popfly? (Score:2, Informative)
http://www.popfly.ms/Overview/ [popfly.ms]
Compare to other engineering materials (Score:2)
Standardized and well-understood components save a vast amount of effort in other engineering fields and help produce results that are more easily verified to be good.
Why not apply the same approach to software engineering? Isn't that the greatest promise of open source?
Re: (Score:2)
That is very different than taking a bunch of code, stitching it together, and building a system out of it.
Re: (Score:2)
Not really, libraries are an implementation, not an ISO/ANSI/DIN/whatever standard, so their behavior is defined for the most part by implementation, not by specification. So you don't find an independent implementation of GTK or Qt or most other stuff, aside from a few libraries that clone other libraries (e.g. lesstif vs motif, Wine, etc.), but even there it's not following a defined standard but just cloning things as best as you can.
The only part of programming
Re: (Score:2)
Not really, libraries are an implementation, not an ISO/ANSI/DIN/whatever standard, so their behavior is defined for the most part by implementation, not by specification. So you don't find an independent implementation of GTK or Qt or most other stuff, aside from a few libraries that clone other libraries (e.g. lesstif vs motif, Wine, etc.), but even there it's not following a defined standard but just cloning things as best as you can.
Some libraries are indeed standardized (standard libraries for C for example). I guess the question is why one would want every library to be standardized. There is, however, a standard *process* for determining what people should put into libraries: Analyze the code, identify reusable routines, and put them in libraries.
I think that is a huge problem: not only are the standard libraries too small, but they are often also terribly buggy and outdated. String handling in C, for example, is total garbage: functions like gets() will cause buffer/heap overflows no matter what you do; you can't use that function correctly. And the alternatives like fgets() and such are cumbersome to use, so that everybody ends up building their own little string library. I think in a day and age where security updates are installed on a weekly basis it would really help a lot to extend the standards a little more often instead of just every ten years, so that often used parts are more easily available and don't have to be reinvented each and every time. This would also help a lot in making libraries more compatible with each other, since you wouldn't need to convert forward and backward between types that are basically identical in their design.
Ok, that is a fair statement: gets() really should be removed and replaced by something with a max size argument.
However, since standards bodies move at a glacial rate, why not create a
The Title (Score:2)
The reuse of old code... (Score:2, Informative)
Objects anyone? (Score:2)
Re: (Score:2)
Where objects help is in keeping related code together, but that is more syntactic sugar for stuff you would expect in a clean program anyway.
This reminds me of another book ... (Score:3, Interesting)
Re: (Score:3, Informative)
Extremely Simple example (Score:2)
int Tree(char ***Node1, char **data)
sort a-z
return height
int retrieve(char **result, char **searchfor, int length)
return found
For simple functions it works well, for open source code it works great.
Between those levels lies the question, why aren't you using their code anyway?
First-class Copy & Paste for Code Reuse (Score:2)
How many times... (Score:2)
code snippet?
Oblig Quote (Score:2)
Re:scavanging turds is mainstream (Score:5, Funny)
Indeed...we need a -5 Asshole.
Re: (Score:2)
A pro