Ben is a simple and versatile batch job scheduler. It lets you run a queue of jobs, in parallel, on multiple machines. It comes as a single executable, ben, and requires no configuration, only a working ssh setup.
By default, ben relies on Unix-domain sockets and piggybacks on file permissions for security. It uses ssh forwarding under the hood (ssh -L and ssh -R) for networking. As an alternative, ben can also use TCP/IP sockets directly [1]. Ben is free software (GPLv3), written in C, with no dependencies.
Say we want to transcode a hundred mp4/H.264 videos (v00.mp4 to v99.mp4) into ogv/Theora (v00.ogv to v99.ogv). We want to run commands like:
$ ffmpeg -i v00.mp4 v00.ogv
Now, transcoding takes time, and we assume for this example that ffmpeg’s encoder is single-threaded (this is actually true for the Theora encoder in the current default build of ffmpeg). Therefore, we want to run multiple instances in parallel.
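On a single machine, bounded parallelism alone can be approximated with standard tools; the sketch below uses xargs -P, with echo standing in for the real ffmpeg invocation so it runs anywhere. What ben adds on top of this is a persistent queue, dynamic job submission, and multiple machines.

```shell
# Run up to 4 "transcodes" at once; echo stands in for ffmpeg.
# xargs -I {} substitutes each job name into the command string.
printf '%s\n' v00 v01 v02 v03 v04 v05 |
  xargs -P 4 -I {} sh -c 'echo "transcode {}.mp4 -> {}.ogv"'
```

Note that with -P the output lines may appear in any order, since the workers run concurrently.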
We can do it as follows. First, we start the ben server. It will maintain the job queue, dispatch jobs and handle coordination.
$ ben server -d
The -d option puts the server “in the background”, so that we get our command prompt back. If we omit it, log information is displayed in the terminal. Next, we start a client. The client will take jobs off the queue (first-in-first-out) and run them. The server and client are separate processes because, as we will see later, they can run on distinct machines.
$ ben client -n 4 -d
The -d option has the same meaning as for the server: it puts the client “in the background”. The -n 4 option specifies that we allow up to 4 jobs to run simultaneously. We are now ready to transcode. Let us start with the first six videos:
$ ben add -c 'ffmpeg -i $job.mp4 $job.ogv' -j v00 v01 v02 v03 v04 v05
The -c option specifies the command to run. Each command is run with the environment variable job set to the job name, so we can use $job to refer to it. The output of the jobs (the output of ffmpeg) is stored in files named after the jobs: stdout in $job.out and stderr in $job.log. So we will get 12 files: v00.out, v00.log, v01.out, v01.log, etc.
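The per-job environment and output capture can be imitated with plain shell. The snippet below is a hand-rolled simulation of how one job is run, not ben itself, with echo lines standing in for the real command:

```shell
# Simulate one job: set $job in the environment, run the command,
# and capture stdout in $job.out and stderr in $job.log.
job=v00
export job
sh -c 'echo "transcoding $job.mp4"; echo "demo warning" >&2' \
  > "$job.out" 2> "$job.log"
```

Afterwards, v00.out contains “transcoding v00.mp4” and v00.log contains “demo warning”. Note the single quotes around the command: they prevent the local shell from expanding $job, so the child process expands it from its environment, just as in the ben add example above.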
Here, we added all six jobs in a single command, but we could have
added them equivalently with multiple ben add
commands, for
example one at a time, or two by two. Jobs can be added to the queue at
any time. As soon as some client is available, it will run jobs from the
queue. In this example, the first four will start running immediately.
As soon as one of them is finished, the fifth will start.
It is often more convenient to store the job command and the job names in files instead of passing them as arguments to ben add. Many ben options have a capital-letter variant that lets you do exactly that. The above ben add command can thus be replaced by:
$ echo 'ffmpeg -i $job.mp4 $job.ogv' > command.sh
$ echo 'v00 v01 v02 v03 v04 v05' > job-list.txt
$ ben add -C command.sh -J job-list.txt
Of course, we want job-list.txt to contain all 100 videos, not just the first six. For example, we could generate the appropriate file as follows:
$ basename -s .mp4 v*.mp4 > job-list.txt
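To see what this produces, here is a self-contained run on dummy files. With GNU coreutils, basename -s strips the given suffix from each operand and prints one name per line:

```shell
# Dummy inputs, just to demonstrate job-list generation.
touch v00.mp4 v01.mp4 v02.mp4
basename -s .mp4 v*.mp4 > job-list.txt
cat job-list.txt
```

This prints v00, v01, v02, one per line (the newline-separated form, rather than the space-separated one used with echo above).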
Four parallel jobs is still not much. Assume that all the previous commands were run as user user-A on machine-A.host.com. Furthermore, we have ssh access as user user-B to a larger computer, machine-B.host.com. We can start a client there, either remotely:
[user-A@machine-A ~]$ ben client -r user-B@machine-B.host.com -n 48 -d
or by first logging into it, then contacting machine-A
from there:
[user-A@machine-A ~]$ ssh user-B@machine-B.host.com
[user-B@machine-B ~]$ ben client -f user-A@machine-A.host.com -n 48 -d
Note that the latter is more robust, because remote-starting a client requires ben to be installed on the remote host in a specific way [2]. Ben uses ssh under the hood, so a password may be prompted. We can now check that everything worked fine with the command ben nodes:
[user-A@machine-A ~]$ ben nodes
# node R P
0 machine-A 0 4
1 machine-B 0 48
2 machine-A: ben nodes - C
We can see that we have two clients connected, which can run up to 4 and 48 simultaneous jobs, respectively. The last line indicates the connection that ben nodes is currently using to communicate with the server; it is a “control” client that does not execute jobs. If we proceed to add the first six jobs, we will see that the clients get busy (the R column gives the number of running jobs).
[user-A@machine-A ~]$ ben nodes
# node R P
0 machine-A 3 4
1 machine-B 3 48
4 machine-A: ben nodes - C
We can also display the queue with the command ben list:
[user-A@machine-A ~]$ ben list
# id dir job S node duration
0 v00 r 0 machine-A >00:00:27
2 v02 r 0 machine-A >00:00:27
4 v04 r 0 machine-A >00:00:27
1 v01 r 1 machine-B >00:00:27
3 v03 r 1 machine-B >00:00:27
5 v05 r 1 machine-B >00:00:27
After adding jobs, one can remove them with ben rm. If we remove a job that is currently running, it is stopped and its (partial) output files are removed. If the job has already completed, its output is preserved.
One can dynamically change the maximum number of simultaneous jobs on a client with ben scale. If the number of currently running jobs on that client exceeds the new maximum, some jobs will be interrupted (and re-queued [3]), unless the --retire option is specified. In that case, the client will not accept new jobs until its load falls below the new maximum, but no running jobs will be interrupted. Similarly, ben kill, which disconnects a client, also has a --retire option, letting currently running jobs finish beforehand.
Another useful command is ben exec. It queues one special “sync” job per client. Each of these sync jobs can only run on its specified client, and cannot run simultaneously with any other job. Sync jobs are useful for scheduling code updates or recompilations, or for getting notified when all previously queued jobs have completed. Unlike ben add, ben exec blocks until the sync jobs have run, and it displays their output. However, it can be interrupted (for example with control+c) without affecting the sync jobs: only their output is discarded.
See the manual for more detailed info.