Which Compiler to Extend for a Small Project? 89
Andreas(R) asks: "While planning the design of my small programming language, and would appreciate some lessons learned from experienced programmers which have already tried this. I was investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc, or bison. The programming language will be interpreted, object oriented and have higher order programming. Perl 1 seems like a decent starting point, as it's yacc based, and 5000 lines of code. Later versions of Perl are too large to get a good understanding of the whole program in a short period of time. Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL? What other small languages can be used instead? How do I go about designing a small programming language in practice, using what I already know about compiler theory?"
Holy cow! (Score:5, Insightful)
While this seems like beaucoup fun, I'd question the need to extend an existing language by altering the compiler. Towards that end, you might want to use LISP or Scheme, as language extension is built into the language. ( See what Paul Graham has to say about the subject) [yahoo.com]
Re:Holy cow! (Score:4, Informative)
Re:Holy cow! (Score:5, Interesting)
Re:Holy cow! (Score:2)
This is going to get me in trouble (Score:5, Informative)
The source is there. Use it as a base for your own program. Change it enough that you aren't blatantly copying it. Release it under your own license. If you like the GPL, then do that. Some of us like less restrictive licenses like BSD or the original (non-GPL compatible) Artistic License.
Definitely, go with Perl 1. From the look of it, it seems to have some pretty good foundations to build upon. Taking a look at where the language itself is now, there's much improvement needed in v1. It seems like a pretty good place to start.
Don't be restricted by licenses on small projects, much less ones that are essentially abandoned.
Re:This is going to get me in trouble (Score:4, Interesting)
I would add that (given time) that he may want to look at SmallTalk [smalltalk.org]also.
At least for inspiration.
Re:This is going to get me in trouble (Score:3, Informative)
but Re:This is going to get me in trouble (Score:5, Informative)
The GPL may grant you other privileges if you abide by its conditions. (Though AFAIK most PERL is licensed under the PERL Artisitic License which is more permissive than the GPL.)
Sam
Re:This is going to get me in trouble (Score:2)
-Sincerely, andreyw
Perl isn't just GPL. (Score:5, Informative)
Try a functional language for this (Score:5, Informative)
Functional languages, especially those with pattern matching primitives (like ML or Haskell), are really good at this kind of program -- in fact, writing compilers and interpreters is really their shining point. I highly recommend using one of these languages rather than C. Lots of undergraduate computer science classes (like 15-212 and 15-312 [cmu.edu] at my university) write interpreters in functional languages as part of the curriculum. You could try to find some course notes... I speak from experience when I say that this beats the hell out of mucking around in C or C++, not really knowing what you're doing.
Rewrite! (Score:5, Insightful)
If you have what it takes to write a new language, then you would be best starting from scratch. Read 3-5 codebases, make a list of the things you liked/didin't like and start out on you own. In the long run, having written the thing your self will give you the advantage. You will know intuitively how everyhting works and how extensions will fit in. That will give you a 2x advantage.
Don't be afraid to read over two other existing implementations as you go. Sharing ideas is very important.
An approach I have also taken is re-write a program in parts. You pick a major component, and replace it with what you need. This gives you testing check-points. The more often you get to a working state and test, the less time you will spend debugging overall. If you can look at perl 1 and determine you can add to it to make the parser a super-set of what you want you could start there. Then you can write something that interprets the byte-code output (the subset generated by your language) to what you want, and write your own interpreter. Then you can tackle replacing the parser and byte code generator... With flex and bison, that should be easy enough. But plan to replace the entire thing. Otherwise you will spend a lot of time reworking things that are not really what you want. If you discover a few gems along the way that you want to keep/port all the better, just don't take on any crutches.
Oh, I would recommed the STL if you are in a hurry. I don't know why functional is better, personally it just gave me a headache in school. But some people claim great results with it...
Finally, a real language must be able to compile itself. Or at least generate it's own byte-code in the case of interpreted langauges. Think about it! You could hack perl 1 to generate your byte code, and then write your parser in perl 1 and have a self compiling language.
Why functional is better (Example) (Score:3, Informative)
Among the many reasons that functional languages are superior for this kind of task is algebraic data types and pattern matching. Since C++ and Java don't even have sum types, the only way to simulate this is with a load of objects and "instanceof" stuff. For example, here's a simple calculator interpreter in ML:
datatype exp =
Int of int
| Plus of exp * exp
| Tim
Re:Why functional is better (Example) (Score:2)
Re:Why functional is better (Example) (Score:2)
Re:Why functional is better (Example) (Score:2)
While I admit it is not in the core of C++, Boost have some nice sum types (boost::variant and boost::any). Variant would allow you to do the same thing that you describe above, recursively descending through parse trees very simply. Though many features of Boost will apparently make it into the next C++ standard.
Re:Rewrite! (Score:3, Insightful)
Not necessarily.
A more accurate statement would be "a real general purpose language must be able to compile itself.".
Some languages fulfill narrow requirements that may not include compiling.
Adding compiling ability to such a language may make it less efficient for fulfilling its primary purpose.
Some examples of languages that would probably be made worse if self-compiling ability were added: SQL, APL, and most "descriptive" languages (e.g., HTML, XML, and ot
Two quick comments (Score:5, Insightful)
you might want to check out ANTLR [antlr.org].
Did you just say "I can only use GPL'd things. Python isn't GPL'd. Can I use it?" I'll assume you meant something like "I want something with a nice license. Does Python have a nice license?" instead. If that's what you meant, you should check out the Python license [python.org] for yourself. Summary: it's a nice license. It's a certified Open Source license, imposes fewer restrictions than the GPL and is compatible with the GPL.
Python definitely doesn't fit into 5000 lines of source, though, so Perl1 might be a better bet. PyPy [codespeak.net] is pretty cool, if you're looking for something smallish and Pythonish.
Sorry if you're looking for something else and these comments turn out to be totally useless.
More quick comments (Score:5, Informative)
I agree completely that only looking for GPL code is a mistaken approach, as the GPL is more restrictive than other licenses. You can almost always include BSD code in GPL, but not vice versa. In any case, the GPL is probably not an ideal license if you want your language to be very widely used.
Thirdly, if you want to look at some nice source code and an interesting way of doing things, have a look at Tcl. It's C sources (modulo the regexp package which came from somewhere else) are some of the nicest I have read. Beautifully commented, and very clear to read. Also, Tcl is a smart approach for a language designed to be extensible.
The article's author really needs to state his purpose, though... just for fun? To 'get famous'? For work? As a homework assignment?;-)
Re:More quick comments (Score:2)
If, OTOH, you want to encourage people to modify the compiler/interpreter and sell *that*, then the GPL might not be the best choice.
Re:More quick comments (Score:2)
Re:More quick comments (Score:2)
If what the article author wants is for companies to develop the compiler/interpreter itself and sell *that*, then yo
Re:Two quick comments (Score:2)
Object BF (Score:5, Funny)
Re:Object BF (Score:1)
Uh. (Score:3, Interesting)
How to implement an OO language on a Turing Machine: Do something equivalent to how OO is implemented in C, ie. store "function pointers" (tape addresses) along with the data and use indirection to call those every time a virtual method is needed.
Re:Uh. (Score:3, Interesting)
Quite interesting... (Score:2)
Re:Quite interesting... (Score:2)
Re:Uh. (Score:3, Insightful)
Practicality... (Score:2)
The GP was suggesting that OO was somehow not implementable on a TM and I felt compelled to correct him/her/it.
IIRC, it has also been shown that algorithms always stay within their "class" (P, NP, P-space, etc.) when transformed from a TC language to another TC language. So your observation about complexity is only half true in that an O(n) algorithm may turn into an O(n^10), but it will never turn into an exponential-time algorithm wh
Re:Practicality... (Score:1)
Re:Object BF (Score:1)
Re:Object BF (Score:2)
(Though in fact,
Add one to brainfuck giving brainfuck+1
would have the same effect (anything except +-,.>[] gets ignored by the compiler).
my $0.02 after a couple compiler classes (Score:5, Informative)
if you have never taken a compiler class before or written a compiler on your own, i suggest the following:
there are several "toy" grammars out there which allow you to do useful stuff (recursion, 'interesting' data structures [i.e. self referential], etc.) without wading through a lot of useless cruft (implementing huge amounts of runtime support, for example). i'd go with one of those. once you're comfortable that you can make one of these learning languages work, then try to hack one of your own.
this all said, good luck! i am by no means an expert on compiler construction (worked on a custom in-house scripting language as an intern a couple years back and had to take compiler classes to satisfy breadth requirements for my m.s. c.s.) but i do hope this is a little bit useful.
Re:my $0.02 after a couple compiler classes (Score:5, Interesting)
Re:my $0.02 after a couple compiler classes (Score:2)
If I read the ANTLR pages correctly, it will only permit you to use your language of choice if your language
Re:my $0.02 after a couple compiler classes (Score:2)
Re:my $0.02 after a couple compiler classes (Score:2)
Licenses (Score:4, Informative)
You seem to be concerned about the limitations the license puts on you. Python's license is less restrictive than the GPL - read about it (http://python.org/psf/license.html).
One of many things this means is that if you decide the Python License isn't restrictive enough for you, you can relicense the combination of Python plus your changes under the GPL, as long as you adhere to Python's license (leaving its copyright and other required information intact).
A small language for what? (Score:5, Informative)
If you just want to build a language that will teach you about programming languages, look at old fashioned Pascal [inf.ethz.ch] not Delphi or Kylix or even Turbo Pascal, but good old-fashioned Jensen and Wirth 1974 Pascal.
If you want to design a programming language, the best advice is to write some code in the proposed language. Remember Tony Hoare's rule, and keep it simple. Most programming languages, from Perl to Python to Java 5, suffer from being accumulations of features.
Have a look at Ruby, Modula-n, Oberon, and so forth.
If you're looking to learn lots about programming in general, think about things you want to do, and construct a lanaguage that does them. Icon is a nice example. Look at SNOBOL, if only because you'll appreciate the "five miles through the snow" stories we old farts tell.
Re:A small language for what? (Score:2)
But yeah, that's the tricky part. You need to both keep it simple and have what you need to do everything you want to do.
Of course, that's also why you need to ask "what do you want to do with the language?"
As far as formality goes, the problem -- and I speak as a confirmed and evangelical formal methodist -- with attempting to determine what you should include with a formal method is that formal methods can easily tell you if what you got is what you wanted, but they can't tell
Pascal and/or Basic are the standard (Score:1)
A few ideas (Score:5, Insightful)
1. small languages like lua [lua.org], io [iolanguage.com], and scheme [mit.edu] that are small in the built-in libraries and in the total distro. These three are great places to start- both are small, OOPish, allow higher-order programming by passing classes, objects, functions and methods as objects.
2. Then there are languages that are big in some ways, but small in syntax. Some of these are easier to extend than so-called "little languages." The reason is usually that their syntax is small, in an isolated place, easy to get at, and meant to be modified. The two best examples for this are Smalltalk [smalltalk.org] and Lisp. Both of these languages satisfy your other requirements and really kick ass for extention. Unlike the above languages, the so-called little-languages, most Smalltalk and Lisp dialects have big, useful libraries. Unlike a big fat language like perl or C++, having a useful library doesn't mean that the language is a huge pain in the ass to extend.
Both Lisp and Smalltalk have a number of implementations. I am a big fan of Squeak Smalltalk [squeak.org], though systems like Little Smalltalk [smalltalk.org] or even GNU Smalltalk [gnu.org] maybe worth checking out.
A lot of people here have bad feelings about Lisp-like languages. It's a shame, since Scheme, ISLISP (OpenLisp is a great implementation) and Common Lisp are all *very* powerful languages. You can be quite productive with them once you get over the part about whining about parens. But Lisp may very well be the best option here, there is a long history of people writing custom-syntaxes and language extensions. Look up Common Lisp macros- power almost beyond comprehension, a lot of fun to play with, and with an elegance all its own.
There are examples of people writing a C-like syntaxes [umin.ac.jp] for various Scheme implementations. IIRC, Gambit-C [umontreal.ca] (a Scheme to C compiler) comes with one. On Cliki, there are a bunch of other alternative Scheme syntaxes [tunes.org] listed.
To, one of the big advantages to using a language in the second category is that syntax extension/modification is done in the language itself, rather than in C. With that comes the familiarity of the language you're creating and the other benefits you gain by using a high-level language like Smalltalk or Common Lisp.
Just some thoughts...
Try a compiled Lisp (Score:1, Interesting)
As it happens, there already exists a class of languages that are strong at manipulating syntax trees, and at writing parsers. Several of them also support dynamic compilation, meaning that your language implementation can choose when to stop dicking with the syntax tree and instead
Scheme is the way to go (Score:1)
Reason Two, lots of nice literature available online. Check out Shriram Krishnamurthhi's text
Programming Languages: Applications and Interpretation on his website at http://www.cs.brown.edu/~sk [brown.edu]
Reason Three, its already conceptually an AST, so you can get involved in the more interesting work sooner
It depends on your goals (but usually Lisp) (Score:5, Informative)
I'm not really sure what your goal here is. Are you wanting to write a compiler for the sheer joy of writing a compiler? That's a good goal, for sure, and I recommend that every programmer write a compiler or two during their careers, or at least some interpreters.
On the other hand, maybe you want to extend an existing language because you need some specific language feature. Also a good goal, but I do want to caution you to evaluate existing languages first; you may find that some language does what you want, or makes it easy to write a language library that does what you want.
Or maybe you want to write an interpreter to script a bigger program. Then I'd say that you may be better off using something that's already there.
In the first case, if you want to learn how to write a compiler, I generally recommend writing Scheme, or some other simple Lisp. Scheme has advantages in that its parse tree representation is obvious (that's why Lisp looks like it does), the structure of an interpreter and of a compiler are quite similar, and it covers the fundamentals of compilers without burdening you with a bunch of cruft. (If this is your first compiler, you may want to leave out continuations and garbage collection for now.) The very excellent book Structure and Interpretation of Computer Programming [mit.edu] has what you need: chapter 4 is about writing interpreters (for Scheme, some modified Schemes, and Prolog), and chapter 5 is all about writing a compiler. All the code in SICP is in Scheme, but the book starts from the beginning with (+ 1 1).
SICP may be too academic for your taste, so you may prefer Paradigms of Artificial Intelligence Programming [norvig.com] instead. It uses Common Lisp, and has a little bit more practical feel that SICP. Chapter 22 is about writing a Scheme interpreter, and chapter 23 is about writing a Scheme compiler. Unlike SICP, PAIP doesn't cover the garbage collector. PAIP uses Common Lisp, and although it has enough of an introduction to be a "refresher" for somebody already familiar with Lisp, it's not really suitable for learning the language.
It's really simple to write Lisp in Lisp. Indeed, a month ago I wrote one for a comp.lang.lisp post [google.com], just to get a silly quine to work! That's why I keep talking about Lisp and Scheme: it's easy to do.
If Scheme isn't your thing, then Pascal may be a good alternative. Don't try to get fancy with heap allocation, pointers, objects, and other new add-ons; I'd start with plain old Wirth-designed Pascal, and get fancy later if you want. Pascal is really designed for a classroom setting. I have a dim memory of Ada being used for this purpose in some books, but I can't speak very authoratively there.
Of course, the definitive book on writing compilers is the Dragon Book [wikipedia.org]. But you may want to be familiar with some basic CS theory about FSMs [wikipedia.org] first.
Now, that's if you want to learn about compilers for the joy of learning about compilers. But what if you just need one particular language feature for your problem, and that's why you want to write a compiler? Well, then I'd suggest you make sure you've looked at a lot of different languages first. Some languages have surprising features that may let you write a small in-language library to do what you need, instead of needing to extend the compiler.
Lisp is a good candidate for this. John Foderaro once described Lisp as "a programmable programming language". You can alter the language to suit the problem at hand, instead of having to work the other way around. At work now, I use Lisp as a bridge between two very different programming languages because I can extend it in both directions to cover what I nee
Lua (Score:4, Informative)
http://www.lua.org [lua.org]
In fact, I recommend you not create a new language at all and just use Lua. It's a deceptively simple language that allows you to extend it through certain meta-constructs to become pretty much anything you need-- from simple data description to full Object-Orientation.
I once read through the entire dragon book with the intention of creating my own language; I gave it all up when I found Lua.
Good stuff.
Re:Lua (Score:2)
That's my case too. Lua is absolutely great!
for a long time, i've been looking at Lisp and Scheme for more expressive power than usual languages. (in fact, my first C program was a stupidly slow lisp interpreter). but the syntax is just too awkward.
i also looked at lazy evaluation languages, but couldn't wrap my mind on that model.
a few months ago i found Lua... and it's perfe
Re:Lua (Score:1)
Otherwise, I would recomment looking into Lisp or Haskell. There are nice books like Essentials of programming languages [amazon.com] dealing with Lisp/Scheme and how to implement a new language with them. No previous experience of Lisp/Scheme is required for this book.
Another book which is more technical and performance oriented is Lisp In Small Pieces [amazon.com].
Good luck.
Regards, Tommy
Lisp (Score:1)
OCaml (Score:3, Interesting)
That said, interpreted languages are stupidly easy to do, so much so that it's hard to learn much -- they forgive every mistake until long after it's too late to fix it. (Witness Perl.) A high-performance language is a Good Thing not particularly because the programs run faster, although that's a nice side benefit, but rather because it enforces a fundamental rigor. There's no faking performance. When you face that kind of problem squarely, God speaks to you, and if you listen carefully enough you can learn deep truths.
But that's not for everybody. Most people have had any desire to learn anything, deep or otherwise, beaten out of them. By all means go for easy comfort, the economic vitality of the nation depends on people like you. Forget I said anything. Garbage collection, bytecodes, regexps, yeah!
Have A Look At LLVM (Score:3, Interesting)
While you haven't provided enough details to comment in length, I do have some experience with what you're planning. A couple of years back I started a programming system (XPS) which was rather audacious in scope. After two years of working on it, I realized that I too needed a "back end" compilation system. I looked at various alternatives like GCC (too complex), research compilers (low quality), open source virtual machines like Mono (immature at the time). I was quite surprised when I looked at UIUC's LLVM compiler toolkit. I thought it would be just another half-baked compiler system from a University that never got finished when the Ph.D student left. Instead, I found a well designed, working, *toolkit* for compiler construction. While LLVM still lacks some features, its core is very solid and easily extensible. I've been working with it for a year now and its been quite a pleasure. Check it out at http://llvm.cs.uiuc.edu/ [uiuc.edu]
Re:Do not design a programming language. (Score:2, Insightful)
Re:No, stupid. (Score:1, Troll)
Just how is he supposed to aquire that knowledge. I guess the people you mentioned were born smart like yourself.
He didn't say he was creating the next big thing in compiler tech to replace all others. It sounds like he is in school trying to learn something new.
You must know my dad! He always said don't get wet until you know how to swim. Of course maybe this guy is jumping in the deep end and your just trying to save him.
Signed,
Clearly Stupid
Don't extend. Its overrated. (Score:5, Informative)
Honestly, its easier to write a recursive descent parser by hand for a programming language than you think, and interpreters are ridiculously easy unless you're worried about making it fast, which is way overrated too. It mattered with 640KB of RAM at 20MHz, but these days, its just stupid to care unless you notice its insanely slow.
First off, if you've not found this link: http://compilers.iecc.com/crenshaw/ [iecc.com], then I recommend you start with it. While its about writing a compiler, it really help make parsing much clearer.
Scheme is a good language to check out if you want to start with another design(a scheme interpreter can be written in a few hours, even in C, if you're slick, even if you're not, it would be short project to get 90%).
Some other reference material: Parsing Techniques [cs.vu.nl](free online). Also: Modern Compiler Design [cs.vu.nl] by the same guys and well worth the investment. Concepts, Techniques, and Models of Programming Languages [ucl.ac.be], teaches kernal theory of language design, and may open your mind to some other techniques you may not be aware of.
Checking out the archives on Lambda The Ultimate [lambda-the-ultimate.org] would be wise too. Also, if you're in Boston on December the 4th, you might check out the Lightweight Languages Workshop [mit.edu] at MIT.
Re:Don't extend. Its overrated. (Score:2)
Moreover getting some decent error messages out of RD parsers is easier.
Battery life (Score:3, Interesting)
It mattered with 640KB of RAM at 20MHz
How much CPU power is in inexpensive handheld devices again? I program for one that has 384 KB of RAM at 16.8 MHz. Wouldn't overclocking a handheld device drain the battery faster?
Use a simple parser (Score:3, Interesting)
Forget about using a Virtual Machine. That is nice for speed, but it requires a lot of work. Beter make an interpretter that interprets the abstract program tree. I once started doing this for a JavaScript interpretter, but I never came to finish it.
Parrot (Score:5, Informative)
I also recommend you to have a look at Lua [lua.org] which is a minimalistic yet beautiful language..
Compatible (Score:2)
Choose something that requires some extra work to get compatible with some other language.
Don't even bother too much to write an initial compiler, and go straight to the horrible task of making something compatible.
While you are making an own language, dealing with compability issues will lead you to more pitfalls than a straight clean 1st order approach of some language than.
Parrot! (Score:5, Informative)
Lots of people are doing that right now, and you could learn a lot from the included compilers for different languages, all in varying stages of completion.
Very good suggestion (Score:3, Insightful)
Don't be a pussy .. (Score:2)
Re:Don't be a pussy .. (Score:1)
machine code
Slacker. REAL programmers wave magnets over the disk to code.
Suggested reading/viewing (Score:2, Interesting)
People suggesting Lisp/Scheme: Sure, these languages are extensible, but any extension of Lisp/Scheme you can create with macros or whatever will still look like Lisp/Scheme. If the point of this exercise is to design one's own l