Which Compiler to Extend for a Small Project?

Posted by Cliff on Wednesday November 10, 2004 @11:10PM from the ground-floor-of-computer-language-construction dept.

Andreas(R) asks: "I am planning the design of my small programming language, and would appreciate some lessons learned from experienced programmers who have already tried this. I was investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc or bison. The programming language will be interpreted, object oriented and support higher-order programming. Perl 1 seems like a decent starting point, as it's yacc based and only 5000 lines of code. Later versions of Perl are too large to get a good understanding of the whole program in a short period of time. Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL? What other small languages can be used instead? How do I go about designing a small programming language in practice, using what I already know about compiler theory?"

This discussion has been archived. No new comments can be posted.

Holy cow! (Score:5, Insightful)

by sidecut ( 126820 ) writes: on Wednesday November 10, 2004 @11:15PM (#10783987) Homepage Journal

Can you give us a few more specifics on what the language will be used for? Will it be embedded? Database connected? Real-time? Interactive?
While this seems like beaucoup fun, I'd question the need to extend an existing language by altering the compiler. If extension is the goal, you might want to use LISP or Scheme, as language extension is built into the language. (See what Paul Graham has to say about the subject [yahoo.com].)

- Re:Holy cow! (Score:4, Informative)
  
  by sidecut ( 126820 ) writes: on Thursday November 11, 2004 @12:40AM (#10784550) Homepage Journal
  
  On the other hand, Paul Graham seems to like creating new programming languages [yahoo.com]. Very useful observations on how to go about creating a new language.
  
  - Re:Holy cow! (Score:5, Interesting)
    
    by xp ( 146294 ) writes: on Thursday November 11, 2004 @02:41AM (#10785171) Homepage Journal
    
    Or why don't you design a meta-language using which other languages can be designed -- a language that remains completely extensible -- something like MDef [sourceforge.net].
    
    - Re:Holy cow! (Score:2)
      
      by pkhuong ( 686673 ) writes:
      
      Oh, you mean Lisp? :p
This is going to get me in trouble (Score:5, Informative)

by Dancin_Santa ( 265275 ) writes: <DancinSanta@gmail.com> on Wednesday November 10, 2004 @11:21PM (#10784012) Journal

Look. The source code to Perl 1 is only 5000 lines long. The source code is open and available for anyone who wants to use it for research and investigation as well as for educational purposes. If we had to release all our source code because at one time in our lives we saw some source code that was covered by the GPL, none of us could ever get jobs programming.

The source is there. Use it as a base for your own program. Change it enough that you aren't blatantly copying it. Release it under your own license. If you like the GPL, then do that. Some of us like less restrictive licenses like BSD or the original (non-GPL compatible) Artistic License.

Definitely, go with Perl 1. From the look of it, it seems to have some pretty good foundations to build upon. Taking a look at where the language itself is now, there's much improvement needed in v1. It seems like a pretty good place to start.

Don't be restricted by licenses on small projects, much less ones that are essentially abandoned.

- Re:This is going to get me in trouble (Score:4, Interesting)
  
  by SpaceLifeForm ( 228190 ) writes: on Wednesday November 10, 2004 @11:32PM (#10784071)
  
  I can't argue your excellent points.
  I would add that, given time, he may want to look at Smalltalk [smalltalk.org] also.
  At least for inspiration.
  
- Re:This is going to get me in trouble (Score:3, Informative)
  
  by AuMatar ( 183847 ) writes:
  
  It's perfectly legit to read a GPL program and emulate its functionality. You don't need to copy it, and it's legal. The GPL is a copyright license; if you write your own program that acts the same, it isn't a copyright violation.
  - but Re:This is going to get me in trouble (Score:5, Informative)
    
    by samjam ( 256347 ) writes: on Thursday November 11, 2004 @05:15AM (#10785704) Homepage Journal
    
    If you do as the parent says and "use it as a base" and "change enough so you aren't blatantly copying it" then you are still copying it and restricted in distribution by copyright law.
    
    The GPL may grant you other privileges if you abide by its conditions. (Though AFAIK most Perl is licensed under the Perl Artistic License, which is more permissive than the GPL.)
    
    Sam
    
- Re:This is going to get me in trouble (Score:2)
  
  by andreyw ( 798182 ) writes:
  
  Your ideas are intriguing to me and I wish to subscribe to your newsletter.
  
  -Sincerely, andreyw
Perl isn't just GPL. (Score:5, Informative)

by hattmoward ( 695554 ) writes: on Wednesday November 10, 2004 @11:33PM (#10784078)

Also available under The Artistic License [perl.com]. You might like it over the GPL.

Try a functional language for this (Score:5, Informative)

by Tom7 ( 102298 ) writes: on Wednesday November 10, 2004 @11:34PM (#10784083) Homepage Journal

Perl might not be the best starting point if you want to learn something about designing a programming language. If you're thinking of using that, I'd say you're better off starting from scratch.

Functional languages, especially those with pattern matching primitives (like ML or Haskell), are really good at this kind of program -- in fact, writing compilers and interpreters is really their shining point. I highly recommend using one of these languages rather than C. Lots of undergraduate computer science classes (like 15-212 and 15-312 [cmu.edu] at my university) write interpreters in functional languages as part of the curriculum. You could try to find some course notes... I speak from experience when I say that this beats the hell out of mucking around in C or C++, not really knowing what you're doing. ;)

- Rewrite! (Score:5, Insightful)
  
  by acidrain ( 35064 ) writes: on Thursday November 11, 2004 @12:23AM (#10784429)
  
  Adapting something that is not really what you are making is just taking on a crutch. You get moving faster to start, but become dependent on something that slows you down.
  
  If you have what it takes to write a new language, then you would be best starting from scratch. Read 3-5 codebases, make a list of the things you liked/didn't like and start out on your own. In the long run, having written the thing yourself will give you the advantage. You will know intuitively how everything works and how extensions will fit in. That will give you a 2x advantage.
  
  Don't be afraid to read over two other existing implementations as you go. Sharing ideas is very important.
  
  An approach I have also taken is re-write a program in parts. You pick a major component, and replace it with what you need. This gives you testing check-points. The more often you get to a working state and test, the less time you will spend debugging overall. If you can look at perl 1 and determine you can add to it to make the parser a super-set of what you want you could start there. Then you can write something that interprets the byte-code output (the subset generated by your language) to what you want, and write your own interpreter. Then you can tackle replacing the parser and byte code generator... With flex and bison, that should be easy enough. But plan to replace the entire thing. Otherwise you will spend a lot of time reworking things that are not really what you want. If you discover a few gems along the way that you want to keep/port all the better, just don't take on any crutches.
  
  Oh, I would recommend the STL if you are in a hurry. I don't know why functional is better, personally it just gave me a headache in school. But some people claim great results with it...
  
  Finally, a real language must be able to compile itself. Or at least generate its own byte-code in the case of interpreted languages. Think about it! You could hack perl 1 to generate your byte code, and then write your parser in perl 1 and have a self compiling language.
  
  - Why functional is better (Example) (Score:3, Informative)
    
    by Tom7 ( 102298 ) writes:
    
    Oh, I would recommend the STL if you are in a hurry. I don't know why functional is better, personally it just gave me a headache in school.
    
    Among the many reasons that functional languages are superior for this kind of task are algebraic data types and pattern matching. Since C++ and Java don't even have sum types, the only way to simulate this is with a load of objects and "instanceof" stuff. For example, here's a simple calculator interpreter in ML:
    
    datatype exp =
    Int of int
    | Plus of exp * exp
    | Tim
    - Re:Why functional is better (Example) (Score:2)
      
      by n1ywb ( 555767 ) writes:
      
      Except it's ML so you're the only person on earth who can read it :) J/K don't hit me.
      - Re:Why functional is better (Example) (Score:2)
        
        by Tom7 ( 102298 ) writes:
        
        Slashdot of all places should be immune to this kind of argument. Don't you guys pride yourselves on picking technology (even underground technology) based on its merits, not its popularity or corporate backing?
    - Re:Why functional is better (Example) (Score:2)
      
      by TwistedSquare ( 650445 ) writes:
      
      Since C++ and Java don't even have sum types
      While I admit it is not in the core of C++, Boost have some nice sum types (boost::variant and boost::any). Variant would allow you to do the same thing that you describe above, recursively descending through parse trees very simply. Though many features of Boost will apparently make it into the next C++ standard.
  - Re:Rewrite! (Score:3, Insightful)
    
    by some guy I know ( 229718 ) writes:
    
    a real language must be able to compile itself.
    
    Not necessarily.
    A more accurate statement would be "a real general purpose language must be able to compile itself.".
    Some languages fulfill narrow requirements that may not include compiling.
    Adding compiling ability to such a language may make it less efficient for fulfilling its primary purpose.
    Some examples of languages that would probably be made worse if self-compiling ability were added: SQL, APL, and most "descriptive" languages (e.g., HTML, XML, and ot
Two quick comments (Score:5, Insightful)

by poincaraux ( 114797 ) writes: on Wednesday November 10, 2004 @11:37PM (#10784096)

I can't really tell when you're asking questions and when you're stating project requirements, but ..

I was investigating whether to start from an existing compiler and extend it. The compiler will be based on yacc, or bison.

you might want to check out ANTLR [antlr.org].

Perl also has the right license (GPL). Is Python out of the question for such a project, since it's not GPL?

Did you just say "I can only use GPL'd things. Python isn't GPL'd. Can I use it?" I'll assume you meant something like "I want something with a nice license. Does Python have a nice license?" instead. If that's what you meant, you should check out the Python license [python.org] for yourself. Summary: it's a nice license. It's a certified Open Source license, imposes fewer restrictions than the GPL and is compatible with the GPL.

Python definitely doesn't fit into 5000 lines of source, though, so Perl1 might be a better bet. PyPy [codespeak.net] is pretty cool, if you're looking for something smallish and Pythonish.

Sorry if you're looking for something else and these comments turn out to be totally useless.

- More quick comments (Score:5, Informative)
  
  by DavidNWelton ( 142216 ) writes: on Thursday November 11, 2004 @05:57AM (#10785816) Homepage
  
  A cool site for people interested in languages and their design is: http://lambda-the-ultimate.org/
  
  I agree completely that only looking for GPL code is a mistaken approach, as the GPL is more restrictive than other licenses. You can almost always include BSD code in GPL, but not vice versa. In any case, the GPL is probably not an ideal license if you want your language to be very widely used.
  
  Thirdly, if you want to look at some nice source code and an interesting way of doing things, have a look at Tcl. Its C sources (modulo the regexp package which came from somewhere else) are some of the nicest I have read. Beautifully commented, and very clear to read. Also, Tcl is a smart approach for a language designed to be extensible.
  
  The article's author really needs to state his purpose, though... just for fun? To 'get famous'? For work? As a homework assignment?;-)
  
  - Re:More quick comments (Score:2)
    
    by Khelder ( 34398 ) writes:
    
    If you're designing a new language, I don't see that the compiler and/or interpreter being GPL'ed would be a hindrance. For example, gcc is GPL'ed, but places no restrictions on what you do with code you compile using it.
    
    If, OTOH, you want to encourage people to modify the compiler/interpreter and sell *that*, then the GPL might not be the best choice.
    - Re:More quick comments (Score:2)
      
      by DavidNWelton ( 142216 ) writes:
      
      It has to do with economics. Languages are really something you want people to spread as widely as possible, because there are positive network externalities. And part of spreading things widely is also getting them used commercially...
      - Re:More quick comments (Score:2)
        
        by Khelder ( 34398 ) writes:
        
        I agree, but I think you missed my point: that if what he wants is for people to write programs in his language, choosing to use GPL for his compiler/interpreter needn't affect commercialization at all. I can write code, compile it with a GPLed compiler, gcc for example, and sell the resulting binary as a commercial product just fine, without having to release the source code for my program.
        
        If what the article author wants is for companies to develop the compiler/interpreter itself and sell *that*, then yo
- - Re:Two quick comments (Score:2)
    
    by poincaraux ( 114797 ) writes:
    
    That's a good point. If I could mod it up, I would .. my reply should get rated higher than an AC reply, though, so if the OP is still looking at this thread, maybe this'll catch his eye.
Object BF (Score:5, Funny)

by forkazoo ( 138186 ) writes: <wrosecrans@@@gmail...com> on Wednesday November 10, 2004 @11:44PM (#10784123) Homepage

You know, I was wondering, earlier today, if anybody was ever going to make an object oriented Brainfuck. Object BF (or BF++ if you prefer) would be a delightfully horrible language. And, since it is such a compact language (though I make no promises about programs written in it) you should have no problem acquainting yourself fully with the existing compiler in about the length of a coffee break.

- Re:Object BF (Score:1)
  
  by zrail ( 50290 ) writes:
  
  How would you suggest making a Turing Machine, for that is what BF is, object oriented?
  - Uh. (Score:3, Interesting)
    
    by warrax_666 ( 144623 ) writes:
    
    All Turing Complete languages (which BF is) are equivalent. That means that any OO Turing Complete language can be implemented in a non-OO Turing Complete language. Case in point: C++ compiles to assembly/machine code which is certainly not OO.
    
    How to implement an OO language on a Turing Machine: Do something equivalent to how OO is implemented in C, ie. store "function pointers" (tape addresses) along with the data and use indirection to call those every time a virtual method is needed.
    - Re:Uh. (Score:3, Interesting)
      
      by TwistedSquare ( 650445 ) writes:
      
      You may be interested in this guy [brown.edu]'s thoughts on the matter of Turing Completeness.
      - Quite interesting... (Score:2)
        
        by warrax_666 ( 144623 ) writes:
        
          ... but AFAICT it doesn't really counter anything I said about TC language equivalence. Granted, the languages might actually be supersets of (classical) TMs, but that's nitpicking... they're still computationally equivalent since their only non-TM mechanism is I/O and they would both possess equivalent I/O mechanisms.
        
          - Re:Quite interesting... (Score:2)
            
            by TwistedSquare ( 650445 ) writes:
            
            That's true in terms of OO-language X vs plain language X. I'm not sure if BF has I/O but it is all nitpicking. Glad you found it interesting :-)
    - Re:Uh. (Score:3, Insightful)
      
      by keesh ( 202812 ) * writes:
      
      They're only 'equivalent' at a really basic level. Sure, from an academic "what can be calculated?" POV they're the same (although *not* when time complexity is considered), but for any practical purpose they are not equivalent. Compare how long it takes to write, say, a program which adds up ten numbers read from stdin in BF or INTERCAL with how long it takes to write the same program in Ruby or Haskell...
      - Practicality... (Score:2)
        
        by warrax_666 ( 144623 ) writes:
        
        ... don't enter into it (to paraphrase a famous skit w/John Cleese). :)
        
        The GP was suggesting that OO was somehow not implementable on a TM and I felt compelled to correct him/her/it.
        
        IIRC, it has also been shown that algorithms always stay within their "class" (P, NP, P-space, etc.) when transformed from a TC language to another TC language. So your observation about complexity is only half true in that an O(n) algorithm may turn into an O(n^10), but it will never turn into an exponential-time algorithm wh
        
          - Re:Practicality... (Score:1)
            
            by zrail ( 50290 ) writes:
            
            I wasn't implying that it couldn't be done. I was just curious how someone might do it, as I had never thought about it before.
- Re:Object BF (Score:1)
  
  by PickyH3D ( 680158 ) writes:
  
  You could make your signature more nerdy by changing it to "three long days."
- Re:Object BF (Score:2)
  
  by shrykk ( 747039 ) writes:
  
  Uh, ++ increments twice in brainfuck. I think you mean bf+
  
  (Though in fact,
  Add one to brainfuck giving brainfuck+1
  would have the same effect: anything except +-,.<>[] gets ignored by the compiler.)
my $0.02 after a couple compiler classes (Score:5, Informative)

by blackcoot ( 124938 ) writes: on Wednesday November 10, 2004 @11:44PM (#10784126)
if you have never taken a compiler class before or written a compiler on your own, i suggest the following:
- while i encourage re-use, if your purpose is to learn how to write a compiler, don't extend someone else's. find a grammar for a language you're comfortable with (e.g. pascal) and start from there. you'll find that getting just plain pascal to compile properly will be quite a challenge. oo-ness just adds another layer of complexity to the compiler.
- to aid your debugging, you'll want to spend some time thinking up good ways of a) visualizing your parse tree, b) representing your IL in a human readable format, c) representing the entire state of your interpreter in human readable format. these just by themselves can be very challenging projects.
- acquire a copy of the dragon book [amazon.com]. this is only a starting point, you may also want to peruse some of andrew appel's compiler books (such as this one [amazon.com] or this one [amazon.com])
- lex/flex and bison/yacc are rather antiquated, you may want to check out terence parr's antlr [antlr.org] (formerly pccts) instead. this allows you to implement your compiler+interpreter in your language of choice, rather than being forced to use c. my compiler classes all required that we used lex/yacc, so that's what i did; however, i would have really liked to have the option of doing it all in java or c++.
- how you set up your symbol table will be a large determining factor in your success. i've used trees of hashtables in the past quite successfully (each node in the tree corresponding to a lexical context such that all symbols visible in the current scope are on the path between the present node and the root). i expect that extending this to support an OO language shouldn't be too hard. e.g: augment nodes representing a type to include back pointers to parent types. you will have to modify search to do lookups as appropriate.
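
to make the tree-of-hashtables idea above a bit more concrete, here is a minimal python sketch (python is only a convenient notation here, not what i actually used). the names `Scope`, `define`, `lookup` and `child` are made up for illustration; each node owns one dict, and lookup walks the path from the current node back to the root:

```python
from typing import Optional


class Scope:
    """One node in the scope tree; each node owns one hashtable (a dict)."""

    def __init__(self, name: str, parent: Optional["Scope"] = None):
        self.name = name
        self.parent = parent      # None for the root (global) scope
        self.symbols = {}         # identifier -> type name (hypothetical payload)

    def child(self, name: str) -> "Scope":
        # Entering a nested lexical context creates a new node below this one.
        return Scope(name, parent=self)

    def define(self, ident: str, type_name: str) -> None:
        if ident in self.symbols:
            raise KeyError(f"{ident!r} already defined in scope {self.name!r}")
        self.symbols[ident] = type_name

    def lookup(self, ident: str) -> Optional[str]:
        # Walk from the current node towards the root; the first hit wins,
        # so inner definitions shadow outer ones.
        scope = self
        while scope is not None:
            if ident in scope.symbols:
                return scope.symbols[ident]
            scope = scope.parent
        return None


global_scope = Scope("global")
global_scope.define("x", "integer")

proc = global_scope.child("procedure p")
proc.define("y", "real")

block = proc.child("inner block")
block.define("x", "boolean")      # shadows the global x

print(block.lookup("x"))  # boolean
print(proc.lookup("x"))   # integer
print(block.lookup("y"))  # real
print(block.lookup("z"))  # None
```

supporting an OO language along these lines would mean giving type nodes an extra pointer to their parent type and teaching `lookup` to follow it; that part is left out of the sketch.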
there are several "toy" grammars out there which allow you to do useful stuff (recursion, 'interesting' data structures [i.e. self referential], etc.) without wading through a lot of useless cruft (implementing huge amounts of runtime support, for example). i'd go with one of those. once you're comfortable that you can make one of these learning languages work, then try to hack one of your own.

this all said, good luck! i am by no means an expert on compiler construction (worked on a custom in-house scripting language as an intern a couple years back and had to take compiler classes to satisfy breadth requirements for my m.s. c.s.) but i do hope this is a little bit useful.
- Re:my $0.02 after a couple compiler classes (Score:5, Interesting)
  
  by Profane MuthaFucka ( 574406 ) writes: <busheatskok@gmail.com> on Thursday November 11, 2004 @12:38AM (#10784529) Homepage Journal
  
  Wow, all excellent ideas. I will add just one more: if you can possibly manage it, use a garbage collected language, or make use of the Boehm collector. If you're using a bison/yacc approach, the structure you're working in can make proper allocation and deallocation pretty complicated.
  
- Re:my $0.02 after a couple compiler classes (Score:2)
  
  by Haeleth ( 414428 ) writes:
  
  lex/flex and bison/yacc are rather antiquated, you may want to check out terence parr's antlr (formerly pccts) instead. this allows you to implement your compiler+interpreter in your language of choice, rather than being forced to use c. my compiler classes all required that we used lex/yacc, so that's what i did; however, i would have really liked to have the option of doing it all in java or c++.
  
  If I read the ANTLR pages correctly, it will only permit you to use your language of choice if your language
  - Re:my $0.02 after a couple compiler classes (Score:2)
    
    by blackcoot ( 124938 ) writes:
    
    apparently antlr does python (first entry) too. i stick by my claim that lex/bison are really antiquated (you have to do some majorly crufty and poorly documented stuff to make bison output thread safe, for example). lalr is a powerful approach to parsing. it's by no means the only approach, or, for that matter, necessarily the best approach. antlr is capable of compiling a superset of the languages that lalr tools (specifically yacc/bison) are capable of ( [antlr.org] see here [iecc.com]). does the average joe care about the dif
- Re:my $0.02 after a couple compiler classes (Score:2)
  
  by Khelder ( 34398 ) writes:
  
  I agree with all of the parent's comments, but wanted to put in another plug for antlr. I've used it for small projects and it works well. It has built-in support for traversing the parse tree, which lex/bison do not (at least, last time I used them), and it comes with a visualization of the parse tree, which is quite handy.
Licenses (Score:4, Informative)

by Mr.Ned ( 79679 ) writes: on Thursday November 11, 2004 @12:09AM (#10784304)

"Is Python out of the question for such a project, since it's not GPL?"

You seem to be concerned about the limitations the license puts on you. Python's license is less restrictive than the GPL - read about it (http://python.org/psf/license.html).

One of many things this means is that if you decide the Python License isn't restrictive enough for you, you can relicense the combination of Python plus your changes under the GPL, as long as you adhere to Python's license (leaving its copyright and other required information intact).

A small language for what? (Score:5, Informative)

by crmartin ( 98227 ) writes: on Thursday November 11, 2004 @12:11AM (#10784318)

You don't really say what sort of problems you're talking about or why you want to build the language.

If you just want to build a language that will teach you about programming languages, look at old-fashioned Pascal [inf.ethz.ch]: not Delphi or Kylix or even Turbo Pascal, but good old-fashioned Jensen and Wirth 1974 Pascal.

If you want to design a programming language, the best advice is to write some code in the proposed language. Remember Tony Hoare's rule, and keep it simple. Most programming languages, from Perl to Python to Java 5, suffer from being accumulations of features.

Have a look at Ruby, Modula-n, Oberon, and so forth.

If you're looking to learn lots about programming in general, think about things you want to do, and construct a language that does them. Icon is a nice example. Look at SNOBOL, if only because you'll appreciate the "five miles through the snow" stories we old farts tell.

- - Re:A small language for what? (Score:2)
    
    by crmartin ( 98227 ) writes:
    
    "Features" not "futures".
    
    But yeah, that's the tricky part. You need to both keep it simple and have what you need to do everything you want to do.
    
    Of course, that's also why you need to ask "what do you want to do with the language?"
    
    As far as formality goes, the problem -- and I speak as a confirmed and evangelical formal methodist -- with attempting to determine what you should include with a formal method is that formal methods can easily tell you if what you got is what you wanted, but they can't tell
Pascal and/or Basic are the standard (Score:1)

by rubee ( 826908 ) writes:

there are plenty of tiny implementations of either. if I were you I'd use a higher level parser/lexical analyzer generator than lex/flex/yacc/bison. also check out the 'Introduction to Compiler Construction With Unix' from prentice hall, sadly out of print, and 'Compilers' by Aho.
A few ideas (Score:5, Insightful)

by RevAaron ( 125240 ) writes: <revaaron AT hotmail DOT com> on Thursday November 11, 2004 @12:38AM (#10784538) Homepage

First, there are two kinds of small languages:
1. small languages like lua [lua.org], io [iolanguage.com], and scheme [mit.edu] that are small in the built-in libraries and in the total distro. These three are great places to start: all are small, OOPish, and allow higher-order programming by passing classes, objects, functions and methods as objects.

2. Then there are languages that are big in some ways, but small in syntax. Some of these are easier to extend than so-called "little languages." The reason is usually that their syntax is small, in an isolated place, easy to get at, and meant to be modified. The two best examples for this are Smalltalk [smalltalk.org] and Lisp. Both of these languages satisfy your other requirements and really kick ass for extension. Unlike the above languages, the so-called little-languages, most Smalltalk and Lisp dialects have big, useful libraries. Unlike a big fat language like perl or C++, having a useful library doesn't mean that the language is a huge pain in the ass to extend.

Both Lisp and Smalltalk have a number of implementations. I am a big fan of Squeak Smalltalk [squeak.org], though systems like Little Smalltalk [smalltalk.org] or even GNU Smalltalk [gnu.org] may be worth checking out.

A lot of people here have bad feelings about Lisp-like languages. It's a shame, since Scheme, ISLISP (OpenLisp is a great implementation) and Common Lisp are all *very* powerful languages. You can be quite productive with them once you get over the part about whining about parens. But Lisp may very well be the best option here: there is a long history of people writing custom syntaxes and language extensions. Look up Common Lisp macros: power almost beyond comprehension, a lot of fun to play with, and with an elegance all its own.

There are examples of people writing C-like syntaxes [umin.ac.jp] for various Scheme implementations. IIRC, Gambit-C [umontreal.ca] (a Scheme to C compiler) comes with one. On Cliki, there are a bunch of other alternative Scheme syntaxes [tunes.org] listed.

To me, one of the big advantages to using a language in the second category is that syntax extension/modification is done in the language itself, rather than in C. With that comes the familiarity of the language you're creating and the other benefits you gain by using a high-level language like Smalltalk or Common Lisp.

Just some thoughts...

Try a compiled Lisp (Score:1, Interesting)

by Anonymous Coward writes:

You're building a compiler? A compiler requires a parser in the front and a code generator in the rear, tied together by some sort of syntax tree. Various language constructs will require transformations upon this syntax tree.
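
As a rough illustration of that front end / syntax tree / back end split, here is a small sketch in Python (chosen only for brevity, not because it is the best tool for the job). Everything in it is an assumption made up for the example: the toy expression grammar, the tuple-shaped tree, constant folding as the sample tree transformation, and the three-instruction stack code with its tiny `run` loop.

```python
import re

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def tokenize(src):
    # Front end, step 1: numbers, names and punctuation.
    # Characters the pattern does not match are silently skipped.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*()]", src)


def parse(tokens):
    # Front end, step 2: recursive descent over the toy grammar
    #   expr := term (('+' | '-') term)*
    #   term := atom ('*' atom)*
    #   atom := NUMBER | NAME | '(' expr ')'
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return ("num", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = atom()
        while peek() == "*":
            advance()
            node = ("*", node, atom())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def fold(node):
    # Tree transformation: collapse operators whose operands are both constants.
    if node[0] in OPS:
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "num" and right[0] == "num":
            return ("num", OPS[node[0]](left[1], right[1]))
        return (node[0], left, right)
    return node


def codegen(node, out=None):
    # Back end: post-order walk emitting code for a stack machine.
    out = [] if out is None else out
    if node[0] == "num":
        out.append(("PUSH", node[1]))
    elif node[0] == "var":
        out.append(("LOAD", node[1]))
    else:
        codegen(node[1], out)
        codegen(node[2], out)
        out.append(("OP", node[0]))
    return out


def run(code, env):
    # A throwaway stack machine, only here to check the generated code.
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        elif instr == "LOAD":
            stack.append(env[arg])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()


tree = fold(parse(tokenize("x * (2 + 3) - 1")))
code = codegen(tree)
print(code)
print(run(code, {"x": 4}))  # 19
```

Note how much of the sketch is spent just getting text into a tree, which is the part the next paragraph argues you can largely skip.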
As it happens, there already exists a class of languages that are strong at manipulating syntax trees, and at writing parsers. Several of them also support dynamic compilation, meaning that your language implementation can choose when to stop dicking with the syntax tree and instead
Scheme is the way to go (Score:1)

by deusmorti ( 683366 ) writes:

Reason One: it's trivial to write a parser for the core language.

Reason Two: lots of nice literature is available online. Check out Shriram Krishnamurthi's text Programming Languages: Application and Interpretation on his website at http://www.cs.brown.edu/~sk [brown.edu]

Reason Three: it's already conceptually an AST, so you can get involved in the more interesting work sooner.
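
To back up Reason One, here is roughly what a reader for a parenthesized core syntax can look like. This is a sketch in Python rather than Scheme, the function names are made up, and it assumes a stripped-down syntax with no strings, quote shorthand or comments; a real Scheme reader has to handle more.

```python
def tokenize(text):
    # Parentheses are the only punctuation, so padding them with spaces
    # and splitting on whitespace is enough for this subset.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def atom(token):
    # Numbers become Python numbers; everything else stays a symbol (a str).
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token


def read(tokens):
    # Consume one expression from the front of the token list.
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ')'
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return atom(token)


def parse(text):
    tokens = tokenize(text)
    expr = read(tokens)
    if tokens:
        raise SyntaxError("trailing input after expression")
    return expr


print(parse("(define (square x) (* x x))"))
# ['define', ['square', 'x'], ['*', 'x', 'x']]
```

The nested list that comes out is already the tree you would work on, which is Reason Three in action.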
It depends on your goals (but usually Lisp) (Score:5, Informative)

by Piquan ( 49943 ) writes: on Thursday November 11, 2004 @01:58AM (#10785009)

I'm not really sure what your goal here is. Are you wanting to write a compiler for the sheer joy of writing a compiler? That's a good goal, for sure, and I recommend that every programmer write a compiler or two during their careers, or at least some interpreters.
On the other hand, maybe you want to extend an existing language because you need some specific language feature. Also a good goal, but I do want to caution you to evaluate existing languages first; you may find that some language does what you want, or makes it easy to write a language library that does what you want.
Or maybe you want to write an interpreter to script a bigger program. Then I'd say that you may be better off using something that's already there.
In the first case, if you want to learn how to write a compiler, I generally recommend writing Scheme, or some other simple Lisp. Scheme has advantages in that its parse tree representation is obvious (that's why Lisp looks like it does), the structure of an interpreter and of a compiler are quite similar, and it covers the fundamentals of compilers without burdening you with a bunch of cruft. (If this is your first compiler, you may want to leave out continuations and garbage collection for now.) The very excellent book Structure and Interpretation of Computer Programming [mit.edu] has what you need: chapter 4 is about writing interpreters (for Scheme, some modified Schemes, and Prolog), and chapter 5 is all about writing a compiler. All the code in SICP is in Scheme, but the book starts from the beginning with (+ 1 1).
SICP may be too academic for your taste, so you may prefer Paradigms of Artificial Intelligence Programming [norvig.com] instead. It uses Common Lisp, and has a little bit more practical feel than SICP. Chapter 22 is about writing a Scheme interpreter, and chapter 23 is about writing a Scheme compiler. Unlike SICP, PAIP doesn't cover the garbage collector. Although PAIP has enough of an introduction to Common Lisp to be a "refresher" for somebody already familiar with Lisp, it's not really suitable for learning the language.
It's really simple to write Lisp in Lisp. Indeed, a month ago I wrote one for a comp.lang.lisp post [google.com], just to get a silly quine to work! That's why I keep talking about Lisp and Scheme: it's easy to do.
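To give a feel for the scale involved, here is a minimal sketch of a Scheme-like evaluator. It is written in Python rather than Lisp, and it is not the code from the post linked above. The names `tokenize`, `parse`, `evaluate`, and `GLOBAL_ENV` are made up for illustration, and only integers, symbols, `if`, `define`, and `lambda` are supported.

```python
import operator


def tokenize(source):
    """Split source text into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front of the list and build a nested-list tree."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is a symbol


# Hypothetical starting environment: a handful of built-in procedures.
GLOBAL_ENV = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
}


def evaluate(expr, env):
    """Evaluate a parsed expression in the environment dict `env`."""
    if isinstance(expr, str):  # symbol: look it up
        return env[expr]
    if isinstance(expr, int):  # literal: evaluates to itself
        return expr
    head = expr[0]
    if head == "if":
        _, test, then, alt = expr
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "define":
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "lambda":
        _, params, body = expr
        # The new scope is built at call time, so later definitions are visible.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)  # ordinary procedure application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return func(*args)
```

With `env = dict(GLOBAL_ENV)`, calling `evaluate(parse(tokenize("(+ 1 1)")), env)` gives 2. The parse tree is just nested Python lists, which is the sense in which the parse tree representation is obvious.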
If Scheme isn't your thing, then Pascal may be a good alternative. Don't try to get fancy with heap allocation, pointers, objects, and other new add-ons; I'd start with plain old Wirth-designed Pascal, and get fancy later if you want. Pascal is really designed for a classroom setting. I have a dim memory of Ada being used for this purpose in some books, but I can't speak very authoritatively there.
Of course, the definitive book on writing compilers is the Dragon Book [wikipedia.org]. But you may want to be familiar with some basic CS theory about FSMs [wikipedia.org] first.
Now, that's if you want to learn about compilers for the joy of learning about compilers. But what if you just need one particular language feature for your problem, and that's why you want to write a compiler? Well, then I'd suggest you make sure you've looked at a lot of different languages first. Some languages have surprising features that may let you write a small in-language library to do what you need, instead of needing to extend the compiler.
Lisp is a good candidate for this. John Foderaro once described Lisp as "a programmable programming language". You can alter the language to suit the problem at hand, instead of having to work the other way around. At work now, I use Lisp as a bridge between two very different programming languages because I can extend it in both directions to cover what I nee

Lua (Score:4, Informative)

by rbright ( 54766 ) writes: on Thursday November 11, 2004 @02:00AM (#10785020)

I highly recommend Lua. It's a brilliant language. The code base is clean, portable, and easy to read.

http://www.lua.org [lua.org]

In fact, I recommend you not create a new language at all and just use Lua. It's a deceptively simple language that allows you to extend it through certain meta-constructs to become pretty much anything you need, from simple data description to full object orientation.

I once read through the entire dragon book with the intention of creating my own language; I gave it all up when I found Lua.

Good stuff.

- Re:Lua (Score:2)
  
  by Kz ( 4332 ) writes:
  
  I once read through the entire dragon book with the intention of creating my own language; I gave it all up when I found Lua.
  That's my case too. Lua is absolutely great!
  
  For a long time, I've been looking at Lisp and Scheme for more expressive power than the usual languages offer. (In fact, my first C program was a stupidly slow Lisp interpreter.) But the syntax is just too awkward.
  
  I also looked at lazy evaluation languages, but couldn't wrap my mind around that model.
  
  a few months ago i found Lua... and it's perfe
- Re:Lua (Score:1)
  
  by thallgren ( 122316 ) writes:
  
  There's the Io language [iolanguage.com] too. Prototype based, very small, looks very interesting to play with.
  Otherwise, I would recommend looking into Lisp or Haskell. There are nice books like Essentials of Programming Languages [amazon.com] dealing with Lisp/Scheme and how to implement a new language with them. No previous experience of Lisp/Scheme is required for this book.
  Another book which is more technical and performance oriented is Lisp in Small Pieces [amazon.com].
  Good luck.
  Regards, Tommy
Lisp (Score:1)

by avida ( 683037 ) writes:

Use Lisp. Lisp has been called the programmable programming language. Think XSLT for code. You can define your own mini-languages that work your way but are themselves valid Lisp and can use all the cool Lisp stuff.
OCaml (Score:3, Interesting)

by Markus Registrada ( 642224 ) writes: on Thursday November 11, 2004 @02:47AM (#10785200)

OCaml was pretty much designed specifically for this sort of thing. Every part of the language system accepts plug-ins for your private variants. If you can bear to use a parsing language that is far more powerful than yacc, you might consider it.
That said, interpreted languages are stupidly easy to do, so much so that it's hard to learn much -- they forgive every mistake until long after it's too late to fix it. (Witness Perl.) A high-performance language is a Good Thing not particularly because the programs run faster, although that's a nice side benefit, but rather because it enforces a fundamental rigor. There's no faking performance. When you face that kind of problem squarely, God speaks to you, and if you listen carefully enough you can learn deep truths.
But that's not for everybody. Most people have had any desire to learn anything, deep or otherwise, beaten out of them. By all means go for easy comfort, the economic vitality of the nation depends on people like you. Forget I said anything. Garbage collection, bytecodes, regexps, yeah!

Have A Look At LLVM (Score:3, Interesting)

by swagora ( 729659 ) writes: on Thursday November 11, 2004 @02:59AM (#10785246)

Andreas(R):

While you haven't provided enough details to comment at length, I do have some experience with what you're planning. A couple of years back I started a programming system (XPS) which was rather audacious in scope. After two years of working on it, I realized that I too needed a "back end" compilation system. I looked at various alternatives like GCC (too complex), research compilers (low quality), and open source virtual machines like Mono (immature at the time). I was quite surprised when I looked at UIUC's LLVM compiler toolkit. I thought it would be just another half-baked compiler system from a university that never got finished when the Ph.D. student left. Instead, I found a well designed, working *toolkit* for compiler construction. While LLVM still lacks some features, its core is very solid and easily extensible. I've been working with it for a year now and it's been quite a pleasure. Check it out at http://llvm.cs.uiuc.edu/ [uiuc.edu]

- Re:Do not design a programming language. (Score:2, Insightful)
  
  by radpole ( 39181 ) writes:
  
  You're right, no one should ever try anything new. All the smart people have already done everything anyway. Oh yeah, 640k is enough memory.
  - - Re:No, stupid. (Score:1, Troll)
      
      by radpole ( 39181 ) writes:
      
      Not quite clear yet.
      
      Just how is he supposed to acquire that knowledge? I guess the people you mentioned were born smart like yourself.
      
      He didn't say he was creating the next big thing in compiler tech to replace all others. It sounds like he is in school trying to learn something new.
      
      You must know my dad! He always said don't get wet until you know how to swim. Of course, maybe this guy is jumping in the deep end and you're just trying to save him.
      
      Signed,
      Clearly Stupid
Don't extend. It's overrated. (Score:5, Informative)

by Anonymous Coward writes: on Thursday November 11, 2004 @03:23AM (#10785329)

Honestly, it's easier to write a recursive descent parser by hand for a programming language than you think, and interpreters are ridiculously easy unless you're worried about making it fast, which is way overrated too. It mattered with 640KB of RAM at 20MHz, but these days, it's just stupid to care unless you notice it's insanely slow.
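
A hand-written recursive descent parser can be sketched in a few dozen lines. The grammar below (integer arithmetic with parentheses and unary minus) and the names `lex`, `Parser`, and `calculate` are assumptions made for illustration. Each grammar rule becomes one method, and this version evaluates as it parses instead of building a tree.

```python
import re

# One token is a run of digits or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def lex(text):
    """Turn the input string into a list of token strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it, or None at the end."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division keeps results integral
        return value

    def factor(self):
        # factor := NUMBER | "(" expr ")" | "-" factor
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -self.factor()
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def calculate(text):
    """Parse and evaluate a complete expression, rejecting trailing tokens."""
    parser = Parser(lex(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

For example, `calculate("1 + 2 * 3")` returns 7 and `calculate("(1 + 2) * 3")` returns 9. Operator precedence falls out of which method calls which, so no precedence table is needed.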

First off, if you've not found this link: http://compilers.iecc.com/crenshaw/ [iecc.com], then I recommend you start with it. While it's about writing a compiler, it really helps make parsing much clearer.

Scheme is a good language to check out if you want to start with another design (a Scheme interpreter can be written in a few hours, even in C, if you're slick; even if you're not, it would be a short project to get 90%).

Some other reference material: Parsing Techniques [cs.vu.nl] (free online). Also: Modern Compiler Design [cs.vu.nl] by the same guys and well worth the investment. Concepts, Techniques, and Models of Computer Programming [ucl.ac.be] teaches the kernel theory of language design, and may open your mind to some other techniques you may not be aware of.

Checking out the archives on Lambda The Ultimate [lambda-the-ultimate.org] would be wise too. Also, if you're in Boston on December the 4th, you might check out the Lightweight Languages Workshop [mit.edu] at MIT.

- Re:Don't extend. It's overrated. (Score:2)
  
  by marcovje ( 205102 ) writes:
  
  Moreover, getting decent error messages out of RD parsers is easier.
- Battery life (Score:3, Interesting)
  
  by tepples ( 727027 ) writes:
  
  It mattered with 640KB of RAM at 20MHz
  
  How much CPU power is in inexpensive handheld devices again? I program for one that has 384 KB of RAM at 16.8 MHz. Wouldn't overclocking a handheld device drain the battery faster?
Use a simple parser (Score:3, Interesting)

by Frans Faase ( 648933 ) writes: on Thursday November 11, 2004 @03:51AM (#10785437) Homepage

Consider using a parser that is easier and more powerful than yacc and lex. Have a look at IParse [planet.nl], a simple, small interpreting parser. The whole source is in a single 92 Kbyte file.
Forget about using a Virtual Machine. That is nice for speed, but it requires a lot of work. Better to make an interpreter that interprets the abstract program tree. I once started doing this for a JavaScript interpreter, but I never finished it.
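
Interpreting the abstract program tree directly can look roughly like this. It is a minimal sketch, assuming the tree is built from nested tuples with invented node kinds (`num`, `var`, `binop`, `assign`, `while`, `block`); it is not meant to reflect IParse's actual tree format, and `run`, `BINARY_OPS`, and `PROGRAM` are hypothetical names.

```python
import operator

# Hypothetical operator table for "binop" nodes.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<=": operator.le,
}


def run(node, env):
    """Evaluate one tree node against the variable table `env`."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "binop":
        _, op, left, right = node
        return BINARY_OPS[op](run(left, env), run(right, env))
    if kind == "assign":
        _, name, value = node
        env[name] = run(value, env)
        return env[name]
    if kind == "while":
        _, condition, body = node
        while run(condition, env):
            run(body, env)
        return None
    if kind == "block":
        result = None
        for statement in node[1]:
            result = run(statement, env)
        return result  # value of the last statement
    raise ValueError(f"unknown node kind: {kind!r}")


# Tree for: total = 0; i = 1; while (i <= 10) { total = total + i; i = i + 1; } total
PROGRAM = ("block", [
    ("assign", "total", ("num", 0)),
    ("assign", "i", ("num", 1)),
    ("while", ("binop", "<=", ("var", "i"), ("num", 10)), ("block", [
        ("assign", "total", ("binop", "+", ("var", "total"), ("var", "i"))),
        ("assign", "i", ("binop", "+", ("var", "i"), ("num", 1))),
    ])),
    ("var", "total"),
])
```

Calling `run(PROGRAM, {})` walks the tree and returns 55. There is no bytecode and no separate compilation step; the cost is that every loop iteration re-walks the same nodes, which is the speed trade-off against a virtual machine.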

Parrot (Score:5, Informative)

by Per Wigren ( 5315 ) writes: on Thursday November 11, 2004 @04:09AM (#10785493) Homepage

Maybe you should have a look at Parrot [parrotcode.org]? The CVS repository includes a lot of working and non-working language implementations [perl.org] which you could have a look at.

I also recommend having a look at Lua [lua.org], which is a minimalistic yet beautiful language.

Compatible (Score:2)

by marcovje ( 205102 ) writes:

Choose something that requires some extra work to get compatible with some other language.

Don't even bother too much to write an initial compiler, and go straight to the horrible task of making something compatible.

While you are making your own language, dealing with compatibility issues will lead you into more pitfalls than a straight, clean first-order approach to some language would.
Parrot! (Score:5, Informative)

by cyberkreiger ( 463962 ) writes: on Thursday November 11, 2004 @05:56AM (#10785812) Homepage

Write a compiler for Parrot [parrotcode.org].

Lots of people are doing that right now, and you could learn a lot from the included compilers for different languages, all in varying stages of completion.

- Very good suggestion (Score:3, Insightful)
  
  by turgid ( 580780 ) writes:
  
  FORTH [forth.org] is trivial to implement (in a few hundred lines rather than a few thousand) and can be compiled or interpreted. It is interactive, the parser is completely minimal (all tokens are separated by spaces with few exceptions) and the compiler/interpreter/system can be extremely compact. The code also runs relatively quickly. FORTH was fairly popular in the days of 8-bit micros and 16-bit minis for these reasons, and is still used in microcontrollers and workstation firmware [openbios.org].
Don't be a pussy .. (Score:2)

by arhar ( 773548 ) writes:

.. use the only REAL hardcore programming language remaining - machine code
- Re:Don't be a pussy .. (Score:1)
  
  by mopslik ( 688435 ) writes:
  
  machine code
  
  Slacker. REAL programmers wave magnets over the disk to code.
Suggested reading/viewing (Score:2, Interesting)

by C Joe V ( 582438 ) writes:

Original poster: What, pray tell, is "compiler theory"? I'm a little perplexed that someone who knows something about "compiler theory" is asking Slashdot how to write a compiler. Most of the answers you will get here are from people who don't really know any "compiler theory".
People suggesting Lisp/Scheme: Sure, these languages are extensible, but any extension of Lisp/Scheme you can create with macros or whatever will still look like Lisp/Scheme. If the point of this exercise is to design one's own l
