Filename pattern matching via bash extended globbing and find regular expressions in shell scripts

I was confused that bash would expand some patterns nicely to match filenames on the command line, but the same pattern would not work from inside a shell script. For example, on the command line the following pattern nicely matches all files in the current directory that have an extension other than “Rnw” or “tex”:

ls *.!(Rnw|tex)

Run from a shell script, however, this would give the error:

line 5: syntax error near unexpected token `('

It turns out that on Ubuntu the shell option extglob is switched on for interactive shells (by the startup files they read), but not for the non-interactive shell that runs your script. You can see whether “extglob” is on like this, either from the command line or by putting the line into your script:

shopt extglob

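If it’s off and you want it on, put the following into your script on a line of its own, before any line that uses an extended pattern (bash parses a whole line before running any of it, so enabling the option on the same line as the pattern is too late):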

shopt -s extglob
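A minimal sketch of a complete script that uses the pattern from above. It turns extglob on only if it is off (shopt -q reports the state through its exit status without printing anything), and it works in a throwaway directory whose file names (paper.Rnw, paper.tex, paper.pdf, notes.txt) are made up for illustration:

```shell
#!/bin/bash
# Enable extended globbing unless it is already on. This has to run on an
# earlier line than any !(...) pattern, because bash checks it while parsing.
if ! shopt -q extglob; then
    shopt -s extglob
fi

# Throwaway directory with made-up sample files (illustration only).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch paper.Rnw paper.tex paper.pdf notes.txt

# All files that have an extension other than "Rnw" or "tex".
matches=( *.!(Rnw|tex) )
printf '%s\n' "${matches[@]}"

# Leave and remove the throwaway directory.
cd - >/dev/null || exit 1
rm -rf "$demo_dir"
```

Run with bash, this should print notes.txt and paper.pdf.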

An alternative (which you probably don’t want, because it changes more than globbing: an interactive bash also reads ~/.bashrc and expands aliases, for example) is to change the opening line of your script to read

#!/bin/bash -i

This runs your script in an interactive shell, so it should behave the same as on the command line. On a related note, a neat trick for debugging shell scripts is the -x option, which makes bash print each command before running it:

#!/bin/bash -x
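The same tracing can be limited to part of a script with set -x and set +x, the in-script equivalents of the -x option. A minimal sketch; the summing loop is only placeholder work:

```shell
#!/bin/bash
total=0

# From here on, bash prints each command to stderr before running it,
# prefixed with the contents of PS4 (by default "+ ").
set -x
for n in 1 2 3; do
    total=$((total + n))
done
# Stop tracing (the set +x command itself still appears in the trace).
set +x

echo "total=$total"
```

Without editing the script at all, running bash -x yourscript.sh (yourscript.sh being a placeholder name) gives the same full trace as putting -x on the shebang line.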

An alternative way to get files whose names match a certain pattern is, of course, to use find. Here, you can use simple globbing patterns with the -name option and combine several of them using -o, like this:

find . -name "*.tex" -o -name "*.txt"
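A caveat when combining -o with other tests: find joins adjacent tests with an implicit -and, which binds more tightly than -o. A minimal sketch, in a throwaway directory with made-up names, showing that -type f only covers both alternatives when they are grouped with escaped parentheses:

```shell
#!/bin/bash
# Throwaway directory: two regular files plus a directory named old.txt.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/old.txt"
touch "$demo_dir/blah.tex" "$demo_dir/blah.txt"

# Parsed as ( -type f -a -name "*.tex" ) -o -name "*.txt",
# so the directory old.txt is listed too.
ungrouped=$(find "$demo_dir" -type f -name "*.tex" -o -name "*.txt" | sort)

# Escaped parentheses make -type f apply to both -name tests.
# The quotes keep the shell from expanding the patterns itself.
grouped=$(find "$demo_dir" -type f \( -name "*.tex" -o -name "*.txt" \) | sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -rf "$demo_dir"
```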

Find also has a -regex option. If you use it, you probably want to add -regextype posix-extended to get behavior closer to the extended regular expressions you might know from, for example, grep -E or sed -E. Also watch out: find requires the regular expression to match the entire path it reports, including the leading directory (./ when searching from the current directory), not just the filename. Here is an example:

audrey:~/tmp/testdir$ ls
blah.tex  blah.txt
audrey:~/tmp/testdir$ find . -regex "blah\.tex"
audrey:~/tmp/testdir$ find . -regex "\./blah\.tex"
./blah.tex
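Since the pattern has to cover the whole path, a common way to match a name at any depth is to start the regex with .*/ so that it absorbs the directories. A minimal sketch, using a throwaway directory and a made-up subdirectory named sub:

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir sub
touch blah.tex sub/blah.tex

# The regex must match the whole path, so this only finds ./blah.tex.
# Single quotes keep the backslashes away from the shell.
top_only=$(find . -regex '\./blah\.tex')

# A leading .*/ matches any directory prefix, so blah.tex is found at any depth.
any_depth=$(find . -regex '.*/blah\.tex' | sort)

printf '%s\n' "$top_only" "$any_depth"

cd - >/dev/null || exit 1
rm -rf "$demo_dir"
```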

Here is an example of the difference between the default and the extended regex behavior:

audrey:~/tmp/testdir$ find . -regex "\./blah\.(tex|txt)"

finds nothing, but:

audrey:~/tmp/testdir$ find . -regextype posix-extended -regex "\./blah\.(tex|txt)"

finds both blah.tex and blah.txt.

You could alternatively get the desired behavior with the default regex by escaping the grouping parentheses and the “|” operator:

audrey:~/tmp/testdir$ find . -regex "\./blah\.\(tex\|txt\)"

But that starts getting really hard to read.
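A minimal sketch that runs the three variants side by side in a throwaway directory (the extra file blah.pdf is made up, to show that only the .tex and .txt files are picked up). It assumes GNU find, whose default -regex dialect is Emacs-style:

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch blah.tex blah.txt blah.pdf

# Default dialect: ( ) and | are literal characters here, so nothing matches.
default_style=$(find . -regex '\./blah\.(tex|txt)')

# POSIX extended: ( ) group and | separates the alternatives.
extended_style=$(find . -regextype posix-extended -regex '\./blah\.(tex|txt)' | sort)

# Default dialect again, with the grouping and alternation escaped.
escaped_style=$(find . -regex '\./blah\.\(tex\|txt\)' | sort)

printf 'default:\n%s\n' "$default_style"
printf 'posix-extended:\n%s\n' "$extended_style"
printf 'escaped:\n%s\n' "$escaped_style"

cd - >/dev/null || exit 1
rm -rf "$demo_dir"
```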

2 Responses to “Filename pattern matching via bash extended globbing and find regular expressions in shell scripts”

  1. Dragos Toader Says:

    # I tend to use -regextype posix-extended, e.g.
    find . -regextype posix-extended -regex "\./restore\.[0-9]{8}\.sh"

  2. Dragos Toader Says:

    # In your case, simplify more to
    find . -regextype posix-extended -regex "\./blah\.t(xt|ex)"
    # For more combinations
    find . -regextype posix-extended -regex "\./blah\.t[ex][tx]"
    # blah.tet
    # blah.tex
    # blah.txt
    # blah.txx