Process Substitution - Linux Expert Tip
Process Substitution in Bash
So I am sure you have all heard of, and used, command substitution and parameter expansion, but what about process substitution?
Process substitution allows you to send the stdout of one process to the stdin of another process. Bah, you say! That is just piping stdout to stdin and can be done simply with pipes (|). What is so awesome about chaining together standard streams?
Yes, it is true that pipes can be used to pump the stdout of one process to the stdin of another, and that is trivial to implement, but what if you have a command that takes two inputs, not just one? Or what if you need to branch off processing into two different pipelines, i.e. duplicate the intermediate results on stdout and process them differently from then on?
Since a pipe only connects one stdout to one stdin, piping won't help. And what happens if the command reads from or writes to a file only, not stdin/stdout? In these cases you are left creating temporary files to hold the intermediate output for consumption by subsequent commands. But what if I told you there is an easier way?
Process substitution allows the stdout of one command to appear as a file for consumption by subsequent commands, or allows a command that expects to write to a file to write to the stdin of a subsequent command instead.
Here is an example without process substitution. The diff command takes two files as input and outputs the difference between them. Let's say we wanted to get the difference between two directories. We would need to do something like this:
ls -l /home/tux > file1
ls -l /home/tuxbk > file2
diff -u file1 file2
We couldn’t just pipe the commands together like so:
ls -l /home/tux | ls -l /home/tuxbk | diff -u
But we can do this all in one line with process substitution.
First let's look at the general process substitution syntax: <(command) reads from a command's stdout instead of a file, and >(command) writes to a process's stdin instead of a file. Note: there is no space between the angle bracket and the left parenthesis! Let's first look at a simple, and useless, example.
We can run wc as follows:
ls -l | wc
Outputs:
168 1512 13430
The above can be done as follows using process substitution:
wc <(ls -l)
Outputs:
168 1512 13430 /dev/fd/63
The results are the same, but notice the file descriptor path (/dev/fd/63) that appears in the output of the substitution version. You can think of process substitution as creating a temporary file to hold the intermediate results for consumption by a later command; this /dev/fd/63 entry is the file descriptor used to pass the results between the two processes.
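If you want to see exactly what bash substitutes, pass the substitution to echo, which simply prints its arguments (the exact file descriptor number may differ on your system):
echo <(ls -l)
Outputs:
/dev/fd/63
The subsequent command just opens that path like any ordinary file.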
So now let's look at something more interesting. I introduced the diff command above to display the changes between two directories. With process substitution we can now do this in one line:
diff -u <(ls -l /home/tux) <(ls -l /home/tuxbk)
Above we have replaced the two file arguments, which the diff command will read from, with file descriptors connected to the output of two processes.
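It is not just directory listings either; any command that expects file names as arguments can be fed this way. For example (file1 and file2 here are only placeholder names), to compare two files regardless of line order:
diff -u <(sort file1) <(sort file2)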
The power of process substitution can be further demonstrated with the join command. The join command takes two sorted files as input and joins lines from the first file to lines in the second file that match on the join field (the first field by default), discarding lines which do not match.
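As a tiny sketch of how join behaves (the data below is made up), joining two inputs on their first field:
join <(printf '1 alice\n2 bob\n') <(printf '1 admin\n3 guest\n')
Outputs:
1 alice admin
The line with key 1 appears in both inputs and is joined; keys 2 and 3 each appear in only one input, so they are discarded.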
Armed with this we could join the output of the top command to the output of the iotop command. Note that if you are using sudo with a password, you will need to prime the cache with the user's password to avoid an error when you are prompted on the first run. You will also have to install iotop if you don't have it already.
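One way to prime the cache (assuming the default sudo timestamp caching is in effect) is to validate your credentials up front:
sudo -v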
join <(top -b -n 1 | sed '1,7d' | sort -n) <(sudo iotop -P -b -n 1 | sed '1,3d' | sort -n)
The above command matches PIDs from top with PIDs from iotop and combines the output. There are simpler ways of doing this with sar, but it does illustrate the point quite nicely.
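So far we have only used the reading form, <(command). The writing form, >(command), suits commands that insist on writing their output to a file. Here is a rough sketch (the output path is just a placeholder, and bear in mind that bash does not wait for the substituted process to finish):
tar -cf >(gzip > /tmp/tux-home.tar.gz) /home/tux
Here tar believes it is writing an archive to a file, but that "file" is really gzip's stdin. Of course tar can compress on its own, but the same trick works for any command that only knows how to write to a file.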
And the best place to use process substitution is with the tee command. Traditionally we use tee to save a copy of the intermediate output to a file and then continue to process it down our pipeline, but with process substitution we can do even better!
With tee and process substitution we can split the output of a series of commands into two parallel streams of execution. First, a simple example of the tee command:
grep -i "error" /var/log/syslog | tee errors.txt | grep apache > apache.error
The above simply greps for the pattern "error", saves all matching lines to errors.txt, and then extracts just those error lines that relate to apache. But what if we also wanted to extract the mysql errors into a separate file as well as the apache errors? We can use process substitution:
grep -i "error" /var/log/syslog | tee >(grep mysql >mysql.error) | grep apache > apache.error
Here is another example, calculating MD5 and SHA-256 sums for every file in a directory:
find ./ -type f | tee >(xargs -n1 md5sum >md5sums.txt) | xargs -n1 sha256sum >sha256sums.txt
You will notice that inside the tee command's process substitution we have redirected stdout to a file (md5sums.txt). If you didn't do this, the stdout of the substituted command would be piped into the subsequent command as well - which means you would get a fork and join kind of pipeline!
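Here is a minimal sketch of that fork and join behaviour (the order of the two output lines is not guaranteed, because the substituted process runs asynchronously):
echo hello | tee >(tr a-z A-Z) | cat
This prints both hello and HELLO, because tr's stdout, like tee's, is connected to the pipe feeding cat.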
Isn't that awesome? Now go forth and do magic in the world!