Character classes match one of a group of characters.

| Item | Meaning |
|------|---------|
| `[…]` | Character class. Matches any one character between the square brackets. |
| `[^…]` | Negated character class. Matches any one character not between the square brackets. |

In the table “…” stands for any one or more characters.
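Python's `re` module uses the same bracket syntax, so both rows can be tried out there. Here is a minimal sketch, with Python standing in for Perl and an arbitrary sample word:

```python
import re

word = "regex"

# [aeiou]: any one character listed between the brackets
vowels = re.findall(r"[aeiou]", word)

# [^aeiou]: any one character NOT listed; the ^ must come right after [
consonants = re.findall(r"[^aeiou]", word)

print(vowels)      # ['e', 'e']
print(consonants)  # ['r', 'g', 'x']
```

Each match consumes exactly one character, which is why the two `e`s come back as separate matches.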

Classlike constructs

| Item | Matches |
|------|---------|
| `.` | Any character except a newline. In “dot matches all” mode (Perl's `/s` modifier) it also matches a newline. |
| `\C` | One byte, even in UTF-8; equivalent to a C `char`. Dangerous, do not use! (Removed in Perl 5.24.) |
| `\X` | A Unicode “combining character sequence”. Equivalent to `(?:\PM\pM*)`. |

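A rough sketch of the first and last rows, using Python's `re` as a stand-in for Perl. Here `re.DOTALL` plays the part of Perl's `/s`. Because `re` has no `\X`, the hypothetical helper `combining_sequences` walks the string by hand and follows the `(?:\PM\pM*)` definition: a non-mark followed by any number of marks. `\C` is deliberately left out.

```python
import re
import unicodedata

sample = "a\nb"

# By default . does not match a newline...
plain = re.findall(r"a.b", sample)
# ...but with re.DOTALL ("dot matches all") it does.
dotall = re.findall(r"a.b", sample, flags=re.DOTALL)


def combining_sequences(text: str) -> list:
    # Hypothetical helper approximating \X as (?:\PM\pM*):
    # a character whose general category is not M* starts a new unit,
    # and every following mark (Mn, Mc, Me) is attached to that unit.
    units = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and units:
            units[-1] += ch
        else:
            # Simplification: a mark at the very start becomes its own unit.
            units.append(ch)
    return units


# "e" + COMBINING ACUTE ACCENT (U+0301) + "a": three code points, two units
word = "e\u0301a"
print(len(re.findall(r".", word)))  # 3, because . sees each code point separately
print(len(combining_sequences(word)))  # 2: "e" + U+0301, then "a"
```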

Classic character class shortcuts
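
| Item | Matches |
|------|---------|
| `\d` | Digit. Equivalent to `[0-9]`. |
| `\D` | Nondigit. Equivalent to `[^0-9]`. |
| `\s` | Whitespace. Equivalent to `[ \f\n\r\t]`. |
| `\S` | Nonwhitespace. Equivalent to `[^ \f\n\r\t]`. |
| `\w` | Word character. Equivalent to `[a-zA-Z0-9_]`. |
| `\W` | Nonword character. Equivalent to `[^a-zA-Z0-9_]`. |

The bracketed equivalents assume ASCII text. When matching Unicode strings, `\d`, `\s` and `\w` (and therefore their negations) also cover non-ASCII digits, whitespace and word characters, and since Perl 5.18 `\s` also matches the vertical tab.
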
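Python's `re`, used here as a stand-in for Perl, draws the same ASCII/Unicode line. On `str` patterns `\d` and `\w` are Unicode-aware by default, and the `re.ASCII` flag (roughly what Perl's `/a` modifier does) narrows them back to `[0-9]` and `[a-zA-Z0-9_]`. A small sketch with made-up sample strings:

```python
import re

# "12" followed by ARABIC-INDIC DIGIT THREE (U+0663), a Unicode decimal digit
digits = "12\u0663"

unicode_digits = re.findall(r"\d", digits)                # all three match
ascii_digits = re.findall(r"\d", digits, flags=re.ASCII)  # only '1' and '2'
range_digits = re.findall(r"[0-9]", digits)               # same as re.ASCII

# \w behaves the same way: "é" (U+00E9) is a word character only in Unicode mode
word_unicode = re.findall(r"\w+", "caf\u00e9")
word_ascii = re.findall(r"\w+", "caf\u00e9", flags=re.ASCII)

# Uppercase shortcuts are the complements of the lowercase ones
non_digits = re.findall(r"\D+", "a1b22c")

print(ascii_digits, range_digits)  # ['1', '2'] ['1', '2']
print(word_ascii, non_digits)      # ['caf'] ['a', 'b', 'c']
```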
POSIX-style character classes

| Item | Matches |
|------|---------|
| `[:alnum:]` | Any alphanumeric character. Equivalent to `[[:alpha:][:digit:]]`. |
| `[:alpha:]` | Any Unicode letter. |
| `[:ascii:]` | Any ASCII character, i.e. from code 0 to code 127. |
| `[:cntrl:]` | Any control character. |
| `[:digit:]` | Any Unicode decimal digit. Equivalent to `\d` when using Unicode. |
| `[:graph:]` | Any alphanumeric or punctuation character. |
| `[:lower:]` | Any lowercase letter. |
| `[:print:]` | Any alphanumeric or punctuation character, or space. |
| `[:punct:]` | Any punctuation character. |
| `[:space:]` | Any Unicode space character. Equivalent to `\s` when using Unicode. |
| `[:upper:]` | Any uppercase letter. |
| `[:word:]` | Any alphanumeric character or underscore. |
| `[:xdigit:]` | Any hexadecimal digit. Equivalent to `[0-9a-fA-F]`. |

These character classes can only be used when constructing other classes, that is, inside a pair of square brackets. For example, `[[:digit:]]` is the POSIX equivalent of `\d`.
Perl extends these POSIX character classes by allowing you to negate them: prefix the class name with a `^`. For example, `[:^digit:]` is the negation of `[:digit:]`, so `[[:^digit:]]` matches any nondigit.
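Python's standard `re` module does not treat `[:digit:]` as a class name, so one way to experiment with the table is to expand the class names into ASCII ranges first. In the sketch below, `POSIX_ASCII` and `expand_posix` are hypothetical names. The expansions cover ASCII only (Perl's classes also match non-ASCII characters under Unicode rules), and negation is handled only in the simple `[[:^name:]]` form:

```python
import re
import string

# ASCII-only expansions of the POSIX class names (an assumption for this sketch)
POSIX_ASCII = {
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "alnum": "a-zA-Z0-9",
    "upper": "A-Z",
    "lower": "a-z",
    "xdigit": "0-9a-fA-F",
    "word": "a-zA-Z0-9_",
    "space": r" \t\n\r\f\v",
    "cntrl": r"\x00-\x1f\x7f",
    "punct": re.escape(string.punctuation),
    "graph": r"\x21-\x7e",
    "print": r"\x20-\x7e",
    "ascii": r"\x00-\x7f",
}


def expand_posix(pattern: str) -> str:
    # Hypothetical helper. A whole bracket holding one negated class,
    # e.g. [[:^digit:]], becomes a negated Python set such as [^0-9].
    pattern = re.sub(
        r"\[\[:\^(\w+):\]\]",
        lambda m: "[^" + POSIX_ASCII[m.group(1)] + "]",
        pattern,
    )
    # Any remaining [:name:] is replaced in place, so it can share the
    # outer brackets with other members, e.g. [[:alpha:][:digit:]].
    # Unknown names raise KeyError.
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)


print(expand_posix("[[:alpha:][:digit:]]"))                 # [a-zA-Z0-9]
print(re.findall(expand_posix("[[:digit:]]+"), "a12b345"))   # ['12', '345']
print(re.findall(expand_posix("[[:^digit:]]+"), "a12b345"))  # ['a', 'b']

# ASCII sanity check of one table row: [:graph:] is [:alnum:] plus [:punct:]
graph = {chr(c) for c in range(0x21, 0x7F)}
print(graph == set(string.ascii_letters + string.digits + string.punctuation))  # True
```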

Unicode properties

| Item | Matches |
|------|---------|
| `\p{PROP}` | Unicode property PROP. The braces are optional with one-character names, so `\pL` means `\p{L}`. |
| `\P{PROP}` | Any character not in `\p{PROP}`. The braces are optional with one-character names. |

Perl provides very complete support for Unicode properties, in fact more than I am willing to list here. If you want to know more, see Perl’s documentation, in particular perlunicode and perluniprops.
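Python's standard `re` module does not support `\p{…}` (the third-party `regex` package does). For General Category properties only, the standard `unicodedata` module is enough to sketch the idea. `has_property` and `select` below are hypothetical helpers, and long names such as `Letter` or script properties are not handled:

```python
import unicodedata


def has_property(ch: str, prop: str) -> bool:
    # Hypothetical helper emulating \p{PROP} for General Category short names.
    # A one-letter name such as "L" covers all its subcategories (Lu, Ll, ...),
    # much as \pL does in Perl.
    return unicodedata.category(ch).startswith(prop)


def select(text: str, prop: str, negate: bool = False) -> str:
    # Keep characters that have the property; negate=True mimics \P{PROP}.
    return "".join(ch for ch in text if has_property(ch, prop) != negate)


text = "\u0394x = 3\u20ac"  # "Δx = 3€"

print(select(text, "L"))                     # Δx (like \p{L})
print(select(text, "Lu"))                    # Δ (like \p{Lu})
print(select(text, "Sc"))                    # € (like \p{Sc}, currency symbols)
print(repr(select(text, "L", negate=True)))  # ' = 3€' (like \P{L})
```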