Advanced Job Scheduling?
Kagato asks: "I'm trying to make my company's Unix boxes more mission critical in the area of job scheduling. Scheduling jobs in Unix has been around since the dawn of time. On most systems you have 'cron' and 'at' to provide most of your scheduling needs. But outside the basic world of 'do this at such time' there are a slew of commercial products that handle dependencies, failure routes, monitoring, dependent notification, etc. Commercial products of this type have been around for years. Is there anything like this available in the GNU and Open Source worlds? I've been looking at Freshmeat, SourceForge and Google. I've found the pickings for advanced scheduling are pretty slim."
Have you tried? (Score:2, Funny)
Re:Have you tried? (Score:1)
Re:What commercial products? (Score:3, Informative)
ActiveBatch32 [advsyscon.com]
UC4 [uc4.com]
Unicenter Autosys Job Management [ca.com]
Control-M [bmc.com]
I wish this was a post back in August.
Good Luck!
Re:What commercial products? (Score:3, Informative)
UNIX dot COM has Flash? (Score:1)
UNIX no flash [unix.com]
What an interesting... (Score:2)
Mission critical? (Score:3, Insightful)
So here's what you do. Get a dollar figure from management that represents just how "mission critical" job scheduling is at your work. That number becomes your scheduling budget.
If that number is too low to buy software, then I guess scheduling isn't all that critical at your business after all.
Scripting (Score:1, Informative)
Job scheduling (Score:3, Interesting)
1. The ability to list ancestor/descendant jobs: the first job(s) must complete before the next job is kicked off. Of course, you must break up your job into smaller components.
2. Cross-platform scheduling: the ability to schedule jobs on more than one platform. I'm sure there are plenty of ways to schedule jobs to be kicked off on NT or whatnot, but what about the mainframe?
3. Central log maintenance: if done correctly, it can keep the jobs in sync, which can be vitally important when you've got jobs that span your entire environment.
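For a single Unix box, the first and third points can be roughed out in plain shell. What follows is only a minimal sketch, not what any commercial scheduler does: `run_chain`, `log` and `LOG_FILE` are made-up names, each job is assumed to be a command line that exits 0 on success, and cross-platform scheduling is not attempted at all.

```shell
#!/bin/sh
# Sketch: run a chain of jobs in order, stopping at the first failure.
# LOG_FILE, log and run_chain are hypothetical names for illustration.
LOG_FILE="${LOG_FILE:-/tmp/job_chain.log}"

log() {
    # One central log line per event, prefixed with a timestamp.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

run_chain() {
    # Each argument is one job (a command line). A job starts only
    # after every job listed before it has exited with status 0.
    for job in "$@"; do
        log "START $job"
        status=0
        sh -c "$job" || status=$?
        if [ "$status" -ne 0 ]; then
            log "FAIL  $job (exit $status)"
            return "$status"
        fi
        log "OK    $job"
    done
    return 0
}
```

Something like `run_chain "extract.sh" "transform.sh" "load.sh"` (placeholder script names) would then never start the load step unless the two steps before it succeeded, and every start, success and failure ends up in the one log file.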
I really wish there was a Unix-based solution that encompassed all of these. There's probably a good reason why there isn't an open source/"free" alternative for this process: the people who need it are less likely to use a free product. You're dealing with people so entrenched in archaic business practices that it is sometimes difficult to authorize the use of Perl in your environment without going through weeks of business justification.
It's easy to set up an existing framework to work correctly on Unix. Computer Associates has one. At times it seems to be the most bass-ackwards implementation I have seen, but then I have to remember it was originally designed for the mainframe.
A few options (Score:5, Informative)
What I would really like to see is a HOWTO that gives a good overview of scheduling and clustering. Everything I have found so far is not so good.
Re:A few options (Score:2, Informative)
Hmm, I thought about moderating the parent up, but surely the original poster will read _all_ of the answers :-)
Anyway, I wanted to give a vote for OpenPBS. It works pretty well, and the code is moderately ok (i.e., I could sit down and add some new features).
It is true that the license [openpbs.org] is not Open Source (whomever) compliant, but it only restricts your rights to redistribute commercially. For many people this is not an onerous restriction. Sun probably makes you register as well; they seem to like registration forms :->
PBS can use the MAUI scheduler as well. One thing that PBS does, that Condor does not, is support parallel jobs.
Anyway, I don't hate it, which is more than I can say for a lot of software.
You missed one: GNU Queue (Score:3, Informative)
GNU Queue [gnuqueue.org] offers batch scheduling for clusters of computers; however, a cluster only needs to contain a single computer.
One additional commercial tool we use where I work is Platform Computing's Load Sharing Facility [platform.com]. It works well, but it's expensive (read "overpriced") and I suggest you try something else first.
Re:You missed one: GNU Queue (Score:1)
Level of sophistication required? (Score:3, Insightful)
Basically, what you need to do is wrap the commands you are scheduling in a shell script and call that script from cron instead. The shell script then takes responsibility for any error handling, email/SMS/pager notifications, failover, or whatever, based on return codes, error messages, etc. I've usually found that for most sites it's possible to write a generic template script and a small set of support scripts that do the notifications and whatnot, which together cover >75% of cron jobs with no major customisation beyond the exit code "case" statement and the command to be executed.
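To make that concrete, a template along these lines is roughly what I mean. Treat it as a sketch only: `run_wrapped`, `notify` and `NOTIFY_CMD` are names invented for this example, the notification is just `echo` until you point it at your own mail or pager script, and which exit codes count as warnings is an assumption you would adjust per command.

```shell
#!/bin/sh
# Sketch of a generic cron wrapper. NOTIFY_CMD, notify and run_wrapped
# are hypothetical names; replace NOTIFY_CMD with a mail or pager script.
NOTIFY_CMD="${NOTIFY_CMD:-echo}"

notify() {
    # $1 = severity, $2 = message; delivery is delegated to NOTIFY_CMD.
    $NOTIFY_CMD "[$1] $2"
}

run_wrapped() {
    # Run the real command, capturing its output and exit status.
    output=$("$@" 2>&1) && status=0 || status=$?
    # The exit code "case" statement is the part customised per job.
    case "$status" in
        0) ;;                                            # success: stay quiet
        1) notify WARN "$* exited 1: $output" ;;         # assumed soft failure
        *) notify CRIT "$* exited $status: $output" ;;   # anything else
    esac
    return "$status"
}

# When called from cron with a command line, wrap that command.
if [ "$#" -gt 0 ]; then
    run_wrapped "$@"
fi
```

The crontab entry then names the wrapper followed by the real command, so the job itself stays unchanged and all the error handling lives in one place.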
Re:Level of sophistication required? (Score:2)
Anyway, if it's code I feel doesn't suck, I'll make it available... I looked at the alternatives and didn't find precisely what I need. That is probably why there are so many options yet so many people who feel they aren't quite right - the needs can be really specific. I'm sure my solution will be great for some, but rotten for others.
Protecting cron (Score:2)
This all worked well until I had cron die on a SCO box. I eventually figured out what job screwed it up but that screwed up everything else that cron managed and left me feeling rather uneasy about relying on cron (I mean, if it can be killed by an errant script...).
So...I've been considering launching cron from init with the respawn option to ensure that it stays running. Does anyone see a problem with this?
man anacron (Score:1)
ANACRON(8) Anacron Users' Manual ANACRON(8)
NAME
anacron - runs commands periodically
SYNOPSIS
anacron [-s] [-f] [-n] [-d] [-q] [-t anacrontab] [job]
anacron -u [-t anacrontab] [job]
anacron [-V|-h]
DESCRIPTION
Anacron can be used to execute commands periodically, with
a frequency specified in days. Unlike cron(8), it does
not assume that the machine is running continuously.
Hence, it can be used on machines that aren't running 24
hours a day, to control daily, weekly, and monthly jobs
that are usually controlled by cron.
When executed, Anacron reads a list of jobs from a
configuration file (see anacrontab(5)). This file contains
the list of jobs that Anacron controls. Each job entry
specifies a period in days, a delay in minutes, a unique
job identifier, and a shell command.
For each job, Anacron checks whether this job has been
executed in the last n days, where n is the period
specified for that job. If not, Anacron runs the job's shell
command, after waiting for the number of minutes specified
as the delay parameter.
After the command exits, Anacron records the date in a
special timestamp file for that job, so it can know when
to execute it again. Only the date is used for the time
calculations. The hour is not used.
When there are no more jobs to be run, Anacron exits.
Anacron only considers jobs whose identifier, as specified
in the anacrontab matches any of the job command-line
arguments. The job arguments can be shell wildcard
patterns (be sure to protect them from your shell with
adequate quoting). Specifying no job arguments, is
equivalent to specifying "*" (That is, all jobs will be
considered).
Unless the -d option is given (see below), Anacron forks
to the background when it starts, and the parent process
exits immediately.
Unless the -s or -n options are given, Anacron starts jobs
immediately when their delay is over. The execution of
different jobs is completely independent.
If a job generates any output on its standard output or
standard error, the output is mailed to the user running
Anacron (usually root).
Informative messages about what Anacron is doing are sent
to syslogd(8) under facility cron, priority notice. Error
messages are sent at priority error.
"Active" jobs (i.e. jobs that Anacron already decided to
run and now wait for their delay to pass, and jobs that
are currently being executed by Anacron), are "locked", so
that other copies of Anacron won't run them at the same
time.
OPTIONS
-f Force execution of the jobs, ignoring the
timestamps.
-u Only update the timestamps of the jobs, to the
current date, but don't run anything.
-s Serialize execution of jobs. Anacron will not
start a new job before the previous one finished.
-n Run jobs now. Ignore the delay specifications in
the anacrontab.
-d Don't fork to the background. In this mode,
Anacron will output informational messages to
standard error, as well as to syslog. The output of
jobs is mailed as usual.
-q Suppress messages to standard error. Only
applicable with -d.
-t anacrontab
Use specified anacrontab, rather than the default
-V Print version information, and exit.
-h Print short usage message, and exit.
SIGNALS
After receiving a SIGUSR1 signal, Anacron waits for
running jobs, if any, to finish and then exits. This can be
used to stop Anacron cleanly.
NOTES
Make sure that the time-zone is set correctly before
Anacron is started. (The time-zone affects the date).
This is usually accomplished by setting the TZ environment
variable, or by installing a
file. See tzset(3) for more information.
FILES
Contains specifications of jobs. See anacrontab(5)
for a complete description.
This directory is used by Anacron for storing
timestamp files.
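If you just want the flavour of the "has this job run in the last n days" check without installing anything, it can be approximated in shell. This is only a sketch of the idea in the man page, not Anacron's code: `job_is_due`, `mark_done`, `today_days` and `STAMP_DIR` are invented names, the stamp file is assumed to hold a day count since the epoch (which is not Anacron's own timestamp format), days are counted in UTC, and `date +%s` is assumed to be available, as it is with GNU and BSD date.

```shell
#!/bin/sh
# Sketch of the "has this job run in the last n days?" check.
# STAMP_DIR, today_days, job_is_due and mark_done are hypothetical names.
# Assumption: each stamp file holds a day count since the epoch, which
# is easy to compare but is not the format Anacron itself uses.
STAMP_DIR="${STAMP_DIR:-/tmp/job_stamps}"

today_days() {
    # Whole days since 1970-01-01 UTC; the hour is discarded.
    echo $(( $(date +%s) / 86400 ))
}

job_is_due() {
    # $1 = job identifier, $2 = period in days.
    stamp="$STAMP_DIR/$1"
    [ -f "$stamp" ] || return 0                 # never run before: due
    last=$(cat "$stamp")
    [ $(( $(today_days) - last )) -ge "$2" ]    # due once n days have passed
}

mark_done() {
    # Record today's day count once the job has exited.
    mkdir -p "$STAMP_DIR"
    today_days > "$STAMP_DIR/$1"
}
```

A caller would test `job_is_due backup 7`, run the job if that succeeds, and then call `mark_done backup`; the job name and period here are placeholders.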
what an interesting... (Score:1)
...way to use the phrase "mission critical" :-)