Shell Scripting Tricks
I like writing shell scripts, even when they are not really
appropriate. I will attempt to document a number of tips and tricks
on this page. Unless otherwise noted, I will be writing about
POSIX-compatible shells, although much can probably be adapted to rc
without too much trouble. None of this is cutting edge or
particularly unknown, but it’s a handy list of solutions that weren’t
immediately obvious to me when I first encountered the problems they
solve.
Changing a file in-place
A common task in shell scripts is to change some file in place. A shell pipeline is not directly suitable, as for example
cat foo | grep bar > foo
will truncate the file foo
before running the pipeline, thus merely
deleting its contents. There’s a workaround exploiting shell
evaluation order, though:
(rm foo && grep bar > foo) < foo
When running the above, the shell first opens the file foo for
reading, then begins executing the subshell. rm foo unlinks the file
from the file system, but Unix file semantics mean that the shell
still has it open for reading (the data won't actually be erased
until all open file descriptors have been closed). When the shell
then evaluates grep bar > foo, foo will be a newly created file, and
thus not affect the contents of the “old” foo that is still open. You
could also use the sponge program from the moreutils package,
although I believe the above is perfectly elegant.
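A quick runnable demonstration of the trick (foo is a throwaway file
created just for the example):

```shell
# In-place filtering with the subshell trick: keep only the lines
# containing "bar".
printf 'one\nbar two\nthree\n' > foo
(rm foo && grep bar > foo) < foo
result=$(cat foo)
echo "$result"    # bar two
rm -f foo
```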
Both the subshell method and sponge suffer from the same problem,
namely that the contents of foo will be entirely replaced by the
output of the command, even if the command fails. Consider using
insitu, which can detect whether the command fails, and not truncate
in that case. If you want to be completely safe against data loss,
you need to produce all output to a temporary file and overwrite the
original at the end, though.
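The temporary-file approach can be sketched as follows (a minimal
version with an invented sample file; note that mv is only atomic
when the temporary file is on the same file system as the target, so
in practice you may want to create it in the target's directory):

```shell
# Safe in-place edit: write all output to a temporary file and only
# replace the original if the command succeeded.
printf 'one\nbar two\nthree\n' > data     # invented sample file
tmp=$(mktemp) &&
grep bar < data > "$tmp" &&
mv "$tmp" data
result=$(cat data)
echo "$result"    # bar two
rm -f data
```

If grep exits with a failure, mv never runs and the original file is
left untouched.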
Literal sublines in files
You often need to do something to those lines of a file that contain
some precise string. If you needed to delete such a line, you'd be
tempted to use sed, but then you will have to escape the string in
order to embed it within a regular expression. You might eye fgrep
-v, but if you don't need to delete the line, but just change it in
some way, that's still not good enough. My preference so far is to
bite the bullet and use the biggest hammer in the toolbox:
awk 'index($0,STR)!=0 { print }' 'STR=any string that you can dream of'
This exploits a slightly obscure Awk facility for defining variables
(in this case, STR) from the command line. (The action here just
prints the matching lines; put whatever you need inside the braces.)
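For example, deleting lines that contain a literal string full of
regex metacharacters needs no escaping at all, since index() does
plain substring matching (the sample input is invented):

```shell
# Drop lines containing the literal substring "a.*[b]" -- none of the
# regex metacharacters in it need escaping.
out=$(printf 'keep me\nhas a.*[b] in it\nalso keep\n' |
    awk 'index($0,STR)==0 { print }' 'STR=a.*[b]')
echo "$out"    # prints "keep me" and "also keep"
```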
Multiple pipes
Sometimes you need to start a program with specific file descriptors open. This is easy if you need them bound to files, as you can simply provide the optional file descriptor parameter to the input redirection operator:
prog 3</path/to/file 4</path/to/other_file
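To illustrate, here is a hypothetical helper that pairs up one line
from each of two descriptors; the function name and file names are
invented for the demonstration:

```shell
# Hypothetical helper that reads one line each from descriptors 3 and 4.
merge_pair() {
    read -r a <&3
    read -r b <&4
    printf '%s %s\n' "$a" "$b"
}

printf 'hello\n' > left.txt      # throwaway demonstration files
printf 'world\n' > right.txt
result=$(merge_pair 3<left.txt 4<right.txt)
echo "$result"    # hello world
rm -f left.txt right.txt
```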
In other cases, you need a program to continuously receive the output
of several other programs, but a standard Unix shell pipeline will
connect standard outputs to standard inputs, meaning that a process
cannot have more than a single piped input stream. You can work
around this by having producers write to named pipes (mkfifo), which
can be used like files as above, but this results in the hassle and
complexity of file system cleanup. An alternative is to exploit the
file descriptor duplication operator (<&) to copy the standard input
file descriptor before starting the next stage of the pipeline. The
result is that every stage of the pipeline is available on a
different file descriptor. A demonstration to clarify:
loop() {
    while true; do echo "$1"; sleep 1; done
}
loop foo | (loop bar | (loop baz | (cat <&3 & cat <&4 & cat <&5) 5<&0 0<&-) 4<&0 0<&-) 3<&0 0<&-
On the right-hand side of each pipe, the standard input stream is
copied to another file descriptor and then closed (with 0<&-), which
may be necessary to make reading from the new file descriptor
reliable, depending on whether any of the programs touch standard
input. The innermost program (cat <&3 & cat <&4 & cat <&5) has access
to the output of the three producing programs via file descriptors 3,
4 and 5. The parentheses are necessary to ensure the file redirection
operators apply in the proper subshells.
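A finite variant of the same plumbing may be easier to follow; the
consumers run sequentially here, so the output order is
deterministic:

```shell
# Two producers ("one" and "two") feed one consumer via fds 3 and 4.
out=$(echo one | (echo two | (cat <&3; cat <&4) 4<&0 0<&-) 3<&0 0<&-)
echo "$out"    # one, then two
```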
Of course, the above is rather ugly and cumbersome. The following shell function provides a nicer syntax, but requires you to define each step in the pipeline as a function by itself.
multipipe() {
    if [ $# -eq 1 ]; then
        $1
    else
        cmd=$1
        shift
        fd=$1
        shift
        $cmd | eval multipipe '"$@"' "$fd<&0" "0<&-"
    fi
}
loop() {
    while true; do echo "$1"; sleep 1; done
}
loopfoo() {
    loop foo
}
loopbar() {
    loop bar
}
loopbaz() {
    loop baz
}
body() {
    cat <&3 & cat <&4 & cat <&5
}
multipipe loopfoo 3 loopbar 4 loopbaz 5 body