[Published in Open Source For You (OSFY) magazine, October 2013 edition.]
GNU Parallel is a tool for running jobs in parallel in a Bash environment. The job can be a single command or a script, with variable arguments. The simultaneous execution can occur on remote machines as well. Released under the GPLv3+ license, you can install it on Fedora using the following command:
$ sudo yum install parallel
After installation, you need to remove ’–tollef’ from the /etc/parallel/config file, if it is present. This option will be permanently removed in future releases.
GNU Parallel takes a command and a list of arguments for processing. The arguments are provided in the command line after the notation ’:::’, and the command is executed for each argument. For example:
$ parallel echo ::: alpha beta gamma
alpha
beta
gamma
You can pass multiple arguments to GNU parallel, and it will run the command for every combination of the input, as shown below:
$ parallel echo ::: 0 1 ::: 0 1
0 0
0 1
1 0
1 1
The order in the output may be different. The tool provides a number of replacement string options. The default string ‘{}’ represents the input:
$ parallel echo {} ::: /tmp
/tmp
The replacement string ‘{/}’ removes everything up to and including the last forward slash:
$ parallel echo {/} ::: /tmp/stdio.h
stdio.h
If you want to return the path only, use the ‘{//}’ string:
$ parallel echo {//} ::: /tmp/stdio.h
/tmp
The string ‘{.}’ removes any filename extension:
$ parallel echo {.} ::: /tmp/stdio.h
/tmp/stdio
The output of a GNU Parallel command may not necessarily be in the order in which the input arguments are listed. For example:
$ parallel sleep {}\; echo {} ::: 5 2 1 4 3
1
2
4
3
5
If you wish to enforce the order of execution, use the ’-k’ option, as shown below:
$ parallel -k sleep {}\; echo {} ::: 5 2 1 4 3
5
2
1
4
3
A test script, for example, may need to be run ‘N’ times for the same argument. This can be accomplished with the following code:
$ seq 10 | parallel -n0 echo "Hello, World"
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
The ’-n’ option represents the maximum number of arguments in the command line.
The commands that will get executed by GNU Parallel can be observed with the ’–dry-run’ option, as illustrated below:
$ parallel --dry-run -k sleep {}\; echo {} ::: 5 2 1 4 3
sleep 5; echo 5
sleep 2; echo 2
sleep 1; echo 1
sleep 4; echo 4
sleep 3; echo 3
The ’–eta’ option will give an estimate on the time it will take to complete a job:
$ parallel --eta -k sleep {}\; echo {} ::: 5 2 1 4 3
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 5s 1left 1500avg local:1/4/100%/1.0s
2
1
4
3
ETA: 1s 0left 1.00avg local:0/5/100%/1.0s
Suppose you have a large number of log files that you wish to zip and archive, you can run the gzip command in Parallel, as shown below:
$ parallel gzip ::: *.log
To unzip them all, you can use the following command:
$ parallel gunzip ::: *.gz
The ‘convert’ command is useful to transform image files. High resolution images can be scaled to a lower resolution using the following command:
$ convert -resize 512x384 file.jpg file_web.jpg
If you have a large number of files that you wish to resize, you can parallelize the task, as shown below:
$ find . -name '*.jpg' | parallel convert -resize 512x384 {} {}_web.jpg
GNU Parallel with wget can help in parallel downloads of large Linux kernel releases, as shown below:
$ parallel wget ::: www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.tar.xz \
www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.10.tar.xz
The URLs can also be stored in a text file (“input.txt”), and passed as an argument to Parallel:
$ parallel -a input.txt wget
The file “input.txt” contains:
https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.tar.xz
https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.10.tar.xz
The downloaded kernel images can also be extracted in Parallel:
$ find . -name \*.tar.xz | parallel tar xvf
A ‘for’ loop in a Bash script can be parallelised. In the following script, the file sizes of all the text files are printed:
#!/bin/sh
for file in `ls *.txt`; do
ls -lh "$file"
done | cut -d' ' -f 5
The parallelized version is as follows:
$ ls *.txt | parallel "ls -lh {}" | cut -d' ' -f 5
The number of CPUs and cores in your system can be listed with GNU Parallel:
$ parallel --number-of-cpus
1
$ parallel --number-of-cores
4
The ’-j’ option specifies the number of jobs to be run in parallel. If the value 0 is given, GNU Parallel will try to start as many jobs as possible. The ‘+ N’ option with ’-j’ adds N jobs to the CPU cores. For example:
$ find . -type f -print | parallel -j+2 ls -l {}
The input to GNU parallel can also be provided in a tabular format. Suppose you want to run ping tests for different machines, you can have a text file with the first column indicating the ping count, and the second column listing the hostname or the IP address. For example:
$ cat hosts.txt
1 127.0.0.1
2 localhost
You can run the tests in parallel using the following code:
$ parallel -a hosts.txt --colsep ' ' ping -c {1} {2}
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.074 ms
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.074/0.074/0.074/0.000 ms
PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=2 ttl=64 time=0.065 ms
--- localhost.localdomain ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.035/0.050/0.065/0.015 ms
GNU Parallel can also execute jobs on remote machines, for which you need to first test that ssh works:
$ SERVER1=localhost
$ ssh $SERVER1 echo "Eureka"
guest@localhost's password:
Eureka
You can then invoke commands or scripts to be run on SERVER1, as shown below:
$ parallel -S $SERVER1 echo "Eureka from " ::: $SERVER1
guest@localhost's password:
Eureka from localhost
Files can also be transferred to remote machines using the ’–transfer’ option. Rsync is used internally for the transfer. An example is shown below:
$ parallel -S $SERVER1 --transfer cat ::: /tmp/host.txt
guest@localhost's password:
1 127.0.0.1
2 localhost
Refer to the GNU Parallel tutorial and manual page for more options and examples.