Are There Perl Optimization Guides?

ara818 asks: "I have written a 4,000-line personal web assistant using Perl. After getting everything to work I am now working on making the code run faster. The problem I keep running into though is that there are so many ways to do the same thing in Perl that I don't know which is faster. Right now, I am working on intuition but I'd really like a site or book that could give me at least a few pointers or some guidelines. Is there any such resource available for Perl, or for that matter, other popular programming languages?"

  • The mod_perl guide is an excellent resource to get you started. A lot of the performance tidbits are not specific to mod_perl.

    http://perl.apache.org/guide/performance.html [apache.org]

  • Programming Perl [oreilly.com] has a section called "Efficiency" in the "Other Oddments" chapter, which contains many useful pointers for time and space efficiency.
  • Aside from Programming Perl, which was mentioned earlier, another book that focuses on some more obscure optimizations is "Advanced Perl Programming", by Sriram Srinivasan. O'Reilly, of course.
  • I'm not sure how well-developed the thing is yet, but if it's in a workable state, would compiling some parts of your program help? (At least a bit?)
  • A lot of times, programs compute values redundantly. This is unnecessary and costly. The solution is to "memoize", or save the results of the computations in a local variable. Accessing a variable is a lot faster than recomputing its value!

    Also, be sure to avoid overloading the garbage collector. I know that a lot of Java (another garbage collected language) programs needlessly allocate String objects by using the concatenation operator. Find out a little more about how Perl's GC works and you'll probably see that your code is working it a bit too hard at least someplace.


    ~wog
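The memoization advice above can be sketched roughly as follows. This is only an illustration: slow_square is a hypothetical stand-in for any costly computation without side effects, and the cache here is a hash keyed by argument rather than a single local variable.

```perl
use strict;
use warnings;

my %cache;        # results already computed, keyed by argument
my $calls = 0;    # counts real computations, to show the cache working

# Hypothetical stand-in for any costly, side-effect-free computation.
sub slow_square {
    my ($n) = @_;
    $calls++;
    return $n * $n;
}

# Compute a value the first time it is requested, then reuse the stored result.
sub cached_square {
    my ($n) = @_;
    $cache{$n} = slow_square($n) unless exists $cache{$n};
    return $cache{$n};
}

cached_square(12) for 1 .. 1000;
print "computed $calls time(s)\n";   # computed 1 time(s)
```

The Memoize module can wrap an existing function in much the same way with memoize('slow_square'), if it is available in your Perl installation.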

  • I mean, Perl is reasonably fast most of the time. Is there a real need to optimize? That's the first question you need to ask yourself. If the answer is yes, figure out what's slowing it down. If the algorithms you're using are good, and yet the code is still too slow for acceptable performance in Perl, try to find a standard Perl module (or something on CPAN) that's written in C that does what you want. If that's not available either, write it in <favorite compiled language> (C, C++, Ada95, Objective C, whatever floats your boat) and call it from the shell (be careful about tainted paths, though) - or if you're ambitious, learn SWIG or XS and make a Perl module (then submit it to CPAN!)

  • by QBasic_Dude ( 196998 ) on Thursday June 08, 2000 @11:30AM (#1016770) Homepage
    This is mostly from Chapter 8 of Programming Perl.
    • Use hashes instead of linear searches. Instead of iterating over @keywords to see if $_ is a keyword, construct a hash with it:
      my %keywords;
      for (@keywords) {
          ++$keywords{$_};
      }
      Then test $keywords{$_} for a nonzero value to see if $_ is a keyword.
    • Consider using foreach, shift, or splice rather than subscripting.
    • Use use integer.
    • Avoid goto
    • Avoid printf if print will work
    • Avoid $&, $`, and $'
    • Avoid using eval on a string. Eval of a string forces recompilation every time the eval is executed. In particular, use symbolic references instead of using eval to construct variable names: ${$pkg . '::' . $varname} = &{ "fix_" . $varname}($pkg)
    • Avoid eval inside a loop. Put the loop into eval instead, to avoid redundant recompilations of the code.
    • Avoid run-time-compiled patterns, that is, /$pattern/. Use the /$pattern/o (once only) pattern modifier to avoid pattern recompilations when the pattern doesn't change over the life of the process. For patterns that change occasionally, you can use the fact that a null pattern refers back to the previous pattern, like this:
      "foundstring" =~ /$currentpattern/; # Dummy match, must succeed
      while (<>) {
          print if //;
      }
    • Short-circuit alternation is often faster than the corresponding regular expressions. So:
      print if /one-hump/ || /two/;
      is likely to be faster than:
      print if /one-hump|two/;
      at least for certain values of one-hump and two. This is because the optimizer likes to hoist certain simple matching operations up into higher parts of the syntax tree and do very fast matching with a Boyer-Moore algorithm. Complicated patterns defeat this.
    • Reject common cases early with next if inside a loop. As with simple regular expressions, the optimizer likes this. You can typically discard comment lines and blank lines even before you do a split or chop:
      while (<>) {
          next if /^#/;     # skip comment lines
          next if /^$/;     # skip blank lines
          chop;
          @line = split(/,/);
          # ... process @line here
      }
    • Avoid regular expressions with many quantifiers, or with big {m,n} numbers on parenthesized expressions.
    • Maximize length of any non-optional literal strings in regular expressions. This is counterintuitive, but longer patterns often match faster than shorter patterns. That's because the optimizer looks for constant strings and hands them off to a Boyer-Moore search, which benefits from longer strings. Compile your pattern with the -Dr debugging switch to see what Perl thinks the longest literal is.
    • Avoid expensive subroutine calls in tight loops.
    • Avoid getc; use sysread instead (for single-character I/O only).
    • Use readdir rather than <*>. To get all the non-dot files within a directory, say something like this:
      opendir(DIR, ".");
      @files = sort grep(!/^\./, readdir(DIR));
      closedir(DIR);
    • Avoid frequent substr on long strings
    • Use pack and unpack instead of multiple substr invocations.
    • Use substr as an lvalue rather than concatenating substrings.
    • Use s/// rather than concatenating substrings.
    • Use modifiers and equivalent and and or, instead of full-blown conditionals. Statement modifiers and logical operators avoid the overhead of entering and leaving a block. They can often be more readable too.
    • Use $foo = $a || $b || $c instead of:
      if ($a) { $foo = $a; }
      elsif ($b) { $foo = $b; }
      elsif ($c) { $foo = $c; }
    • Set default values with $pi ||= 3;
    • Don't test things you know won't match. Use last or elsif to avoid falling through to the next case in your switch statement.
    • Use special operators like study, logical string operations, unpack 'u' and pack '%' formats.
    • Beware of the tail wagging the dog. Misstatements resembling (<STDIN>)[0] and 0 .. 2000000 can cause Perl much unnecessary work. In accord with UNIX philosophy, Perl gives you enough rope to hang yourself.
    • Factor operations out of loops.
    • Slinging strings can be faster than slinging arrays.
    • my variables are normally faster than local variables.
    • tr/abc//d is faster than s/[abc]//g
    • Print with a comma separator may be faster than concatenating strings.
    • Prefer join("", ...) to a series of concatenated strings.
    • Split on a fixed string is generally faster than split on a pattern. That is, use split(/ /, ...) rather than split(/ +/, ...).
    • system("mkdir ...") may be faster on multiple directories if mkdir(2) isn't available.
    • Cache entries from passwd and group and so on.
    • Avoid unnecessary system calls.
    • Avoid unnecessary system() calls.
    • Keep track of your working directory rather than calling pwd each time.
    • Avoid shell metacharacters in commands -- pass lists to system and exec where appropriate.
    • Set the sticky bit on the Perl interpreter on machines without demand paging. chmod +t /usr/bin/perl
    • Using defaults doesn't make your program faster
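A few of the tips above (hash lookup instead of a linear search, tr///d instead of s///g, and rejecting uninteresting lines early) can be put together in a small sketch. The keyword list, the function names and the sample data are made up for illustration. The last function uses qr// to compile a run-time pattern once, which is a different technique from the /o modifier and the null-pattern trick described in the list.

```perl
use strict;
use warnings;

# Hypothetical keyword list, for illustration only.
my @keywords = qw(if else while for foreach);

# Build the lookup hash once; each later test is a single hash probe
# instead of a scan over @keywords.
my %is_keyword = map { $_ => 1 } @keywords;

sub is_keyword {
    my ($word) = @_;
    return exists $is_keyword{$word} ? 1 : 0;
}

# tr///d deletes the listed characters without using the regex engine.
sub strip_abc {
    my ($s) = @_;
    $s =~ tr/abc//d;
    return $s;
}

# Reject comment lines and blank lines before doing the split.
sub parse_lines {
    my @rows;
    for my $line (@_) {
        next if $line =~ /^#/;
        next if $line =~ /^$/;
        push @rows, [ split /,/, $line ];
    }
    return @rows;
}

# Compile a pattern supplied at run time once, then reuse it for every line.
sub grep_matching {
    my ($pattern, @lines) = @_;
    my $re = qr/$pattern/;
    return grep { $_ =~ $re } @lines;
}

print is_keyword('while') ? "keyword\n" : "not a keyword\n";
print strip_abc('aXbYcZ'), "\n";
```

Whether any of these is actually faster for a given program still has to be measured; the sketch only shows the shape of each idiom.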
    The same chapter also lists Space Efficiency, Programmer Efficiency, Maintainer Efficiency, Porter Efficiency, and User Efficiency. The sections contradict each other.

  • Do not use shift or an incrementing integer counter for loops unless necessary (perl integer math sucks and shift loads the garbage collector). Foreach is faster.

    Do not compare with substr; a regexp is faster. If anything can be formulated as a regexp, regexp it. That's what Perl is good at. Have fun.
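Claims like the ones above about foreach, shift and index counters are easy to check on your own machine. The following is a minimal sketch using the Benchmark module; the three sum_* functions and the test data are hypothetical, and the results will vary with the Perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Each variant copies its arguments first, so the copying cost
# is the same for all three.
sub sum_foreach {
    my @items = @_;
    my $sum = 0;
    $sum += $_ foreach @items;
    return $sum;
}

sub sum_shift {
    my @items = @_;
    my $sum = 0;
    while (@items) { $sum += shift @items }
    return $sum;
}

sub sum_index {
    my @items = @_;
    my $sum = 0;
    for (my $i = 0; $i < @items; $i++) { $sum += $items[$i] }
    return $sum;
}

my @data = (1 .. 5_000);   # hypothetical test data

# Run each variant 1000 times and print a comparison table.
cmpthese(1000, {
    'foreach' => sub { sum_foreach(@data) },
    'shift'   => sub { sum_shift(@data) },
    'index'   => sub { sum_index(@data) },
});
```

Passing a negative count such as -2 instead of 1000 asks Benchmark to run each variant for at least that many CPU seconds, which tends to give steadier numbers.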

  • This book spends a lot of pages down in the hardware level of optimization, but covers a lot of optimizations in other programming languages, particularly C. The author also discusses some general principles of optimization: Concentrate your optimization efforts on the code which is used the most, such as inner loops; consider alternative representations of your data, etc.
  • I just had a look at your code, and you're using the same "technologies" I used at first when I started a VHS covers database in September. Since then I've made it almost 10x faster, and I also use much less memory (you can have a look at the result: http://www.moviecovers.com/ [moviecovers.com]). The two main changes were:
    • I switched from CGI interface to Embperl environment (I could have used plain mod_perl instead, but I've found Embperl, which runs on top of mod_perl, to be easier to program with). Keeping compiled Perl script in memory gives a big speed boost.
    • I switched from Berkeley BTree databases to MySQL. It's much faster, simplifies locking problems, and reduces memory usage by a good factor. Optimizing SQL requests can also be much more rewarding than many Perl optimizations.
  • Write a good sub debug that prints its argument to a log file together with lots of timing info (like how much user and system time has elapsed).

    Sprinkle liberally through the code.

    Study the results. You'll find that often what you thought was slow isn't and what you thought was fast is slow. In one case a CGI I saw was taking 2 seconds to just load all the libs that were being required even though most were not needed!

    In another case we found that the fastest way to build a free text engine was to munge the input into a regex and use the builtin grep on the data (cut the search from 500ms to sub 100ms of CPU).

    Overall it's very application dependent - most of the optimizations (apart from the regex stuff) are about good coding practice as much as anything else.
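A sub debug along the lines described above might look something like this. It is only a sketch: the output format is invented, $DEBUG_FH is a hypothetical handle that defaults to STDERR and would be pointed at a log file in real use, and the counting loop is a stand-in for the code being measured.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $t0 = time();             # wall-clock start of the run
our $DEBUG_FH = \*STDERR;    # hypothetical default; use a log file handle in real use

# Print a message prefixed with elapsed wall-clock time and the
# user and system CPU time consumed by this process so far.
sub debug {
    my ($msg) = @_;
    my ($user, $system) = times;
    printf {$DEBUG_FH} "[wall %.3f user %.2f sys %.2f] %s\n",
        time() - $t0, $user, $system, $msg;
    return;
}

debug("start");
my %seen;
$seen{ $_ % 1000 }++ for 1 .. 200_000;   # stand-in for the code being measured
debug("after counting loop");
```

Comparing the numbers on consecutive log lines shows where the time goes between two debug calls.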
