Pattern matching is the process of testing whether a given string belongs to a given set of strings, called a pattern. Remark supports two traditional forms of pattern matching: globs and regular expressions.
A glob defines a pattern by a string in which:
- `*` matches any number of characters (including none),
- `?` matches at most one character,
- `[seq]` matches any character in the string `seq`, and
- `[!seq]` matches any character not in the string `seq`.

For example, `?at.png` matches `at.png`, `bat.png`, and `cat.png`. Globs are commonly used for matching file names in file systems. They capture a reasonable range of patterns while remaining intuitive.
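A glob can be matched by translating it into an equivalent regular expression. The Python sketch below does this under the semantics listed above. In particular, it maps `?` to `.?` (at most one character), which differs from Python's standard `fnmatch` module, where `?` matches exactly one character. Characters inside `[seq]` are taken literally, following the wording above, so no `a-z` ranges are supported. `glob_to_regex` and `glob_matches` are hypothetical names for illustration, not part of Remark's API.

```python
import re


def glob_to_regex(glob: str) -> str:
    # Translate a glob into an equivalent Python regex, one character at a time.
    parts = []
    i = 0
    while i < len(glob):
        c = glob[i]
        if c == "*":
            parts.append(".*")  # any number of characters, including none
            i += 1
        elif c == "?":
            parts.append(".?")  # at most one character, as described above
            i += 1
        elif c == "[":
            end = glob.find("]", i + 1)
            seq = glob[i + 1:end] if end != -1 else ""
            negate = seq.startswith("!")
            body = seq[1:] if negate else seq
            if not body:
                # No closing ']' or an empty set: treat '[' as a literal character.
                parts.append(re.escape(c))
                i += 1
            else:
                # Characters of seq are taken literally (no ranges such as a-z).
                parts.append(("[^" if negate else "[") + re.escape(body) + "]")
                i = end + 1
        else:
            parts.append(re.escape(c))  # ordinary character, matched literally
            i += 1
    return "".join(parts)


def glob_matches(glob: str, text: str) -> bool:
    # fullmatch requires the pattern to cover the entire string.
    # DOTALL lets '*' and '?' match newlines as well ("anything").
    return re.fullmatch(glob_to_regex(glob), text, flags=re.DOTALL) is not None


print(glob_matches("?at.png", "at.png"))    # True
print(glob_matches("?at.png", "cat.png"))   # True
print(glob_matches("?at.png", "brat.png"))  # False
```

Translating globs into regexes in this way is one reason regular expressions can be described as the more powerful of the two forms: every glob in this sketch has a regex equivalent.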
A regular expression, or regex, defines a pattern by a string constructed using rules such as the following:
- `.` matches any single character except a newline,
- `E?` matches `E` at most one time,
- `E*` matches `E` any number of times (including zero),
- `E+` matches `E` at least one time,
- `AB` matches `A` followed by `B`,
- `A|B` matches either `A` or `B`,
- `(E)` matches `E` (parentheses group a subexpression),

where `E`, `A`, and `B` are regular expressions. The backslash `\` escapes the meta-characters; for example, `\.` matches a literal dot. This list is incomplete; Remark uses Python's regular expression syntax. For example, `(ab)*\.txt` matches `.txt`, `ab.txt`, `abab.txt`, and so on. In Remark, `\Z` is automatically appended to the end of the regex so that it must match the whole string. Regular expressions are strictly more powerful than globs, but they are also less intuitive.
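The whole-string behaviour can be mimicked with Python's `re` module. The sketch below is an assumption about how such anchoring might be done, not Remark's actual code, and `regex_matches` is a hypothetical helper. `re.match` anchors the match at the start of the string, and `\Z` anchors it at the end. The sketch wraps the pattern in a non-capturing group `(?:...)` before appending `\Z`: appended directly to `a|b`, the anchor would bind only to the `b` branch, so `a|b` would still accept `ax`. The last two calls also show why the dot is escaped: an unescaped `.` matches any character, not only a literal dot.

```python
import re


def regex_matches(pattern: str, text: str) -> bool:
    # re.match anchors at the start; the appended \Z anchors at the end.
    # The non-capturing group makes \Z apply to the whole pattern rather
    # than only to its last alternative.
    return re.match(r"(?:" + pattern + r")\Z", text) is not None


print(regex_matches(r"(ab)*\.txt", "abab.txt"))      # True
print(regex_matches(r"(ab)*\.txt", "abab.txt.bak"))  # False: \Z rejects the suffix
print(regex_matches("a|b", "ax"))                    # False
print(regex_matches(r"(ab)*.txt", "abXtxt"))         # True: unescaped '.' matches 'X'
print(regex_matches(r"(ab)*\.txt", "abXtxt"))        # False
```

For comparison, `re.match("a|b" + r"\Z", "ax")` returns a match object, because the unwrapped pattern reads as `a` or `b\Z`.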