
Advanced Job Scheduling?

Kagato asks: "I'm trying to make my company's Unix boxes more mission critical in the area of job scheduling. Scheduling jobs in Unix has been around since the dawn of time. On most systems you have 'cron' and 'at' to provide most of your scheduling needs. But outside the basic world of 'do this at such time' there are a slew of commercial products that handle dependencies, failure routes, monitoring, dependent notification, etc. Commercial products of this type have been around for years. Is there anything like this available in the GNU and Open Source worlds? I've been looking at Freshmeat, SourceForge and Google. I've found the pickings for advanced scheduling are pretty slim."
This discussion has been archived. No new comments can be posted.

  • Freshmeat? [freshmeat.net] SourceForge? [sf.net] or Google? [google.com] Oh you have? Crap.
  • I'm trying to make my company's Unix boxes more mission critical in the area of job scheduling.

    ...use of the phrase "mission critical" :-)

  • Mission critical? (Score:3, Insightful)

    by CoderDevo ( 30602 ) <coderdevo@hotmail.com> on Wednesday November 27, 2002 @03:08AM (#4765596) Homepage
    There may be no open source products out there that match the functionality of the currently available commercial scheduling products.

    So here's what you do. Get a dollar figure from management that represents just how "mission critical" job scheduling is at your work. That number becomes your scheduling budget.

    If that number is too low to buy software, then I guess scheduling isn't all that critical at your business after all.
  • Scripting (Score:1, Informative)

    by Anonymous Coward
    Some of the stuff you talked about wouldn't be difficult to script at all. Once you know what services you want to run and when, you can just script them to run and add all the error-catching stuff yourself. Each program has a startup script that fires it up, watches it for errors, etc., and reports back to the Master Poobah script. If something really goes wrong, the master script can page you at 2am. Too bad you wouldn't get paid for that kind of effort, other than your paycheck.
  • Job scheduling (Score:3, Interesting)

    by dhall ( 1252 ) on Wednesday November 27, 2002 @03:51AM (#4765678)
    Unfortunately I've found commercial job scheduling software to often be less reliable than cron/at jobs. There are a few nice features available in them that are not available in cron/at.

    1. The ability to list ancestor/descendant jobs: the first job(s) must complete before the next job is kicked off. Of course, you must break up your job into smaller components.

    2. Cross-platform scheduling: the ability to schedule jobs on more than one platform. I'm sure there are plenty of ways to schedule jobs to be kicked off on NT or whatnot, but what about the mainframe?

    3. Central log maintenance: if done correctly, it can keep the jobs in sync, which can be vitally important when you've got jobs that span your entire environment.

    I really wish there was a Unix-based solution that encompassed all of these. There's probably a good reason why there isn't an open source/"free" alternative for this process: the people who need it are less likely to use a free product. You're dealing with people so entrenched in archaic business practices that it is sometimes difficult to authorize the use of Perl in your environment without going through weeks of business justification.

    It's easy to set up an existing framework to work correctly on Unix. Computer Associates has one. At times it seems to be the most bass-ackwards implementation I have seen, but then I have to remember it was originally designed for the mainframe.
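A minimal sketch of features 1 and 3 from the list above, using nothing but a POSIX shell: jobs run strictly in order, a descendant never starts after a failed ancestor, and every step is appended to one shared log. The function names (run_chain, log_msg), the JOB_LOG variable and the default log path are assumptions made for illustration; they are not taken from any of the products mentioned.

```shell
#!/bin/sh
# Hypothetical central log location; override by setting JOB_LOG.
JOB_LOG="${JOB_LOG:-/tmp/jobchain.log}"

# log_msg TEXT...: append one timestamped line to the shared log.
log_msg() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$JOB_LOG"
}

# run_chain CMD...: run each command string in order and stop at the
# first failure, so descendant jobs never start after a failed ancestor.
run_chain() {
    for job in "$@"; do
        log_msg "START $job"
        if sh -c "$job"; then
            log_msg "OK    $job"
        else
            status=$?   # exit status of the failed job
            log_msg "FAIL  $job (exit $status)"
            return "$status"
        fi
    done
    return 0
}
```

Called from a single cron entry, something like run_chain '/opt/jobs/extract.sh' '/opt/jobs/load.sh' (hypothetical paths) would replace two separately timed crontab lines that merely hope the first job finishes in time. Cross-platform scheduling, feature 2, is outside what this sketch attempts.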

  • A few options (Score:5, Informative)

    by muleboy ( 123760 ) on Wednesday November 27, 2002 @04:11AM (#4765713)
    I have been looking into this lately, and here are the options I have found:

    • Condor [wisc.edu] - seems to be the best free as in beer scheduler, but it's not free as in speech.
    • OpenPBS [openpbs.org] - This one is sort of Free, but it is being developed by a company that doesn't seem so sure it likes it that way. The code goes BSD after a couple of years, and they've been doing that for several years, yet they don't make the old (now BSD) versions available, and they make you register just to download.
    • Sun GridEngine [sunsource.net] - Free, and it looks pretty sweet. I couldn't get it to work on Debian, but people on the mailing list said they were using it with Debian.
    • Globus Toolkit [globus.org] - Not so sure about this one.
    • Maui [supercluster.org] - Scheduler system for supercomputers
    • OSCAR [sourceforge.net] - Sweet project from IBM to put together all the best Free tools for clustering! They are using the Maui scheduler in their system.

    What I would really like to see is a HOWTO that gives a good overview of scheduling and clustering. Everything I have found so far is not so good.

    • Re:A few options (Score:2, Informative)

      by d^2b ( 34992 )

      Hmm, I thought about moderating the parent up, but surely the original poster will read _all_ of the answers :-)

      Anyway, I wanted to give a vote for OpenPBS. It works pretty well, and the code is moderately ok (i.e., I could sit down and add some new features).

      It is true that the license [openpbs.org] is not Open Source (whomever) compliant, but it only restricts your rights to redistribute commercially. For many people this is not an onerous restriction. Sun probably makes you register as well; they seem to like registration forms :->

      PBS can use the Maui scheduler as well. One thing that PBS does, that Condor does not, is support parallel jobs.

      Anyway, I don't hate it, which is more than I can say for a lot of software.

    • by crath ( 80215 )

      GNU Queue [gnuqueue.org] offers batch scheduling for clusters of computers; however, a cluster only needs to contain a single computer.

      One additional commercial tool we use where I work is Platform Computing's Load Sharing Facility [platform.com]. It works well, but it's expensive (read "overpriced") and I suggest you try something else first.

      • Oh, yeah. I forgot it, but not because I haven't looked at it. In fact, I looked at it first. From looking at the manual, though, GNU Queue doesn't seem to be in the same league as Condor, OpenPBS, and Maui. For simple scheduling tasks it looked good though.
  • by Zocalo ( 252965 ) on Wednesday November 27, 2002 @05:55AM (#4765929) Homepage
    It depends upon what level of sophistication you are after, of course, but I've never had any problems getting things like this working with that old UNIX standby: shell scripts.

    Basically, what you need to do is wrap the commands you are scheduling in a shell script and call that script from cron instead. The shell script then takes responsibility for any error handling, email/SMS/pager notifications, failover, or whatever, based on return codes, error messages, etc. I've usually found that for most sites it's possible to write a generic template script and a small set of support scripts for the notifications and whatnot that cover >75% of cron jobs with no major customisation beyond the exit code "case" statement and the command to be executed.
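The generic template described above might look roughly like this. It is a sketch, not the poster's actual script: run_wrapped and notify are hypothetical names, notify only prints to standard error where a real site would send mail or a page, and the meaning given to each exit code is an assumption that has to be adapted per command.

```shell
#!/bin/sh
# notify SEVERITY TEXT: placeholder for a real mail/SMS/pager support script.
notify() {
    echo "[$1] $2" >&2
}

# run_wrapped NAME CMD [ARG...]: run the command, capture its output,
# then choose an action from the exit code.
run_wrapped() {
    name=$1
    shift
    output=$("$@" 2>&1)
    status=$?
    case $status in
        0)  ;;  # success: stay quiet so cron sends no mail
        1)  notify warning "$name: exit 1: $output" ;;
        *)  notify critical "$name: exit $status: $output" ;;
    esac
    return "$status"
}
```

Per job, only the name, the command and the branches of the case statement change, which is the small amount of customisation the comment refers to.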

    • Yeah, but this can get to be pretty hairy if your needs are complex... I'm writing a Perl script now, made to be extensible by writing your own jobs as modules - it should be pretty easy to have those call existing shell scripts, etc. It's going to have different criticality levels and a conf file that defines dependencies and retry and state-change intervals, plus suppression of notifications on acknowledgement, automatic generation of a status page, persistent state across restarts, etc.

      Anyway, if it's code I feel doesn't suck, I'll make it available... I looked at the alternatives and didn't find precisely what I need. That is probably why there are so many options yet so many people who feel they aren't quite right - the needs can be really specific. I'm sure my solution will be great for some, but rotten for others.
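Two of the features listed in this comment, retry intervals and state that survives a restart, can be sketched in a few lines of shell rather than Perl. This is not the poster's script; retry_job and its state-file format (a single attempt counter) are assumptions made for illustration.

```shell
#!/bin/sh
# retry_job STATE_FILE MAX_TRIES DELAY_SECONDS CMD [ARG...]
# Runs CMD until it succeeds or MAX_TRIES attempts have been used.
# The attempt count is kept in STATE_FILE so it survives a restart.
retry_job() {
    state=$1
    max=$2
    delay=$3
    shift 3
    tries=0
    # Resume the count left behind by an earlier, interrupted run.
    if [ -f "$state" ]; then
        tries=$(cat "$state")
    fi
    while [ "$tries" -lt "$max" ]; do
        tries=$((tries + 1))
        echo "$tries" > "$state"
        if "$@"; then
            rm -f "$state"   # success clears the persistent state
            return 0
        fi
        if [ "$tries" -lt "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Once the attempts are exhausted the state file stays in place, so later invocations fail immediately until someone removes it. That is one crude way to model an acknowledgement step; dependencies and a status page would need considerably more than this.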

  • I've found there is very little I can't do with cron and scripting - in fact I, like many others, have cron jobs that check up on other system processes.

    This all worked well until I had cron die on a SCO box. I eventually figured out what job screwed it up but that screwed up everything else that cron managed and left me feeling rather uneasy about relying on cron (I mean, if it can be killed by an errant script...).

    So...I've been considering launching cron from init with the respawn option to ensure that it stays running. Does anyone see a problem with this?
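A caveat on the respawn idea above: init's respawn action restarts a process as soon as it exits, so it only suits programs that stay in the foreground. A cron daemon that detaches itself at startup would look to init as if it had died immediately. A hedged alternative is to let init respawn a small watchdog loop instead; the sketch below shows the check such a loop could make. ensure_running, the pid file path and the restart command in the usage note are hypothetical and differ between systems.

```shell
#!/bin/sh
# ensure_running PIDFILE CMD [ARG...]
# Does nothing if the PID recorded in PIDFILE is still alive;
# otherwise runs CMD to start (or restart) the daemon.
ensure_running() {
    pidfile=$1
    shift
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0   # process still exists, nothing to do
    fi
    "$@"
}
```

A foreground loop such as while :; do ensure_running /var/run/cron.pid /etc/init.d/cron start; sleep 60; done (both paths are placeholders) is the kind of process that respawn can supervise safely.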
  • $ man anacron

    ANACRON(8) Anacron Users' Manual ANACRON(8)

    NAME
    anacron - runs commands periodically

    SYNOPSIS
    anacron [-s] [-f] [-n] [-d] [-q] [-t anacrontab] [job] ...
    anacron -u [-t anacrontab] [job] ...
    anacron [-V|-h]

    DESCRIPTION
    Anacron can be used to execute commands periodically, with
    a frequency specified in days. Unlike cron(8), it does
    not assume that the machine is running continuously.
    Hence, it can be used on machines that aren't running 24
    hours a day, to control daily, weekly, and monthly jobs
    that are usually controlled by cron.

    When executed, Anacron reads a list of jobs from a
    configuration file, normally /etc/anacrontab (see
    anacrontab(5)). This file contains the list of jobs that
    Anacron controls. Each job entry specifies a period in
    days, a delay in minutes, a unique job identifier, and a
    shell command.

    For each job, Anacron checks whether this job has been
    executed in the last n days, where n is the period
    specified for that job. If not, Anacron runs the job's shell
    command, after waiting for the number of minutes specified
    as the delay parameter.

    After the command exits, Anacron records the date in a
    special timestamp file for that job, so it can know when
    to execute it again. Only the date is used for the time
    calculations. The hour is not used.

    When there are no more jobs to be run, Anacron exits.

    Anacron only considers jobs whose identifier, as specified
    in the anacrontab matches any of the job command-line
    arguments. The job arguments can be shell wildcard
    patterns (be sure to protect them from your shell with
    adequate quoting). Specifying no job arguments, is
    equivalent to specifying "*" (That is, all jobs will be
    considered).

    Unless the -d option is given (see below), Anacron forks
    to the background when it starts, and the parent process
    exits immediately.

    Unless the -s or -n options are given, Anacron starts jobs
    immediately when their delay is over. The execution of
    different jobs is completely independent.

    If a job generates any output on its standard output or
    standard error, the output is mailed to the user running
    Anacron (usually root).

    Informative messages about what Anacron is doing are sent
    to syslogd(8) under facility cron, priority notice. Error
    messages are sent at priority error.

    "Active" jobs (i.e. jobs that Anacron already decided to
    run and now wait for their delay to pass, and jobs that
    are currently being executed by Anacron), are "locked", so
    that other copies of Anacron won't run them at the same
    time.

    OPTIONS
    -f Force execution of the jobs, ignoring the
    timestamps.

    -u Only update the timestamps of the jobs, to the
    current date, but don't run anything.

    -s Serialize execution of jobs. Anacron will not
    start a new job before the previous one finished.

    -n Run jobs now. Ignore the delay specifications in
    the /etc/anacrontab file. This option implies -s.

    -d Don't fork to the background. In this mode,
    Anacron will output informational messages to
    standard error, as well as to syslog. The output of
    jobs is mailed as usual.

    -q Suppress messages to standard error. Only
    applicable with -d.

    -t anacrontab
    Use specified anacrontab, rather than the default

    -V Print version information, and exit.

    -h Print short usage message, and exit.

    SIGNALS
    After receiving a SIGUSR1 signal, Anacron waits for
    running jobs, if any, to finish and then exits. This can be
    used to stop Anacron cleanly.

    NOTES
    Make sure that the time-zone is set correctly before
    Anacron is started. (The time-zone affects the date).
    This is usually accomplished by setting the TZ environment
    variable, or by installing a /usr/lib/zoneinfo/localtime
    file. See tzset(3) for more information.

    FILES
    /etc/anacrontab
    Contains specifications of jobs. See anacrontab(5)
    for a complete description.

    /var/spool/anacron
    This directory is used by Anacron for storing
    timestamp files.
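
For reference, an /etc/anacrontab matching the four fields described above (period in days, delay in minutes, job identifier, shell command) could look like the following. The job identifiers and the use of run-parts are assumptions borrowed from common Linux distributions, not something the man page prescribes.

```
# period(days)  delay(minutes)  job-identifier  command
1               5               cron.daily      run-parts /etc/cron.daily
7               10              cron.weekly     run-parts /etc/cron.weekly
30              15              cron.monthly    run-parts /etc/cron.monthly
```

With entries like these, a machine that was switched off overnight would run the daily job five minutes after Anacron next starts, provided the job has not already run that day.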

