Character classes


Character classes match one of a group of characters.

- Basic structure
- Classlike constructs
- Classic character class shortcuts
- POSIX-style character classes
- Unicode properties

Basic structure

 

Item    Meaning
[…]     Character class. Matches any one character between the square brackets.
[^…]    Negated character class. Matches any one character not between the square brackets.

In the table, “…” stands for one or more characters.
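As a minimal sketch of the two forms above, the following uses Python's `re` module, which is an assumption made for illustration: the page describes Perl, but plain and negated bracket classes behave the same way in both. The sample word is made up.

```python
import re

# [aeiou] matches any one character listed between the brackets.
vowels = re.findall(r"[aeiou]", "character")

# [^aeiou] matches any one character NOT listed between the brackets.
consonants = re.findall(r"[^aeiou]", "character")

print(vowels)      # ['a', 'a', 'e']
print(consonants)  # ['c', 'h', 'r', 'c', 't', 'r']
```

Each match is exactly one character long; a class never matches more than one character unless a quantifier follows it.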

 

Classlike constructs

 

Item    Matches
.       Any character except a newline. In “dot matches all” mode, also matches a newline.
\C      One byte, even under UTF-8. Equivalent to a C char. (Dangerous, do not use!)
\X      Unicode “combining character sequence”. Equivalent to (?:\PM\pM*).
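The sketch below illustrates the dot and the idea behind \X, again in Python rather than Perl (an assumption for illustration). Python's `re.DOTALL` flag plays the role of “dot matches all” mode. Python's `re` has no \X, so `combining_sequences` is a hypothetical helper that approximates (?:\PM\pM*) by attaching each mark (general category M*) to the character before it. Unlike the regex, it keeps a leading mark with no base as its own item.

```python
import re
import unicodedata

text = "a\nb"
default_hits = re.findall(r".", text)            # the newline is skipped
dotall_hits = re.findall(r".", text, re.DOTALL)  # the newline is matched too

def combining_sequences(s):
    """Group each non-mark character with the marks that follow it."""
    groups = []
    for ch in s:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch   # a combining mark joins the previous group
        else:
            groups.append(ch)  # any other character starts a new group
    return groups

print(default_hits)                      # ['a', 'b']
print(dotall_hits)                       # ['a', '\n', 'b']
print(combining_sequences("e\u0301x"))   # ['e\u0301', 'x']
```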

 

Classic character class shortcuts

 

Item    Matches
\d      Digit. Equivalent to [0-9].
\D      Nondigit. Equivalent to [^0-9].
\s      Whitespace. Equivalent to [ \f\n\r\t].
\S      Nonwhitespace. Equivalent to [^ \f\n\r\t].
\w      Word character. Equivalent to [a-zA-Z0-9_].
\W      Nonword character. Equivalent to [^a-zA-Z0-9_].
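The shortcuts in this table can be tried out with the sketch below. It uses Python's `re` module as a stand-in for Perl (an assumption), with the `re.ASCII` flag so that the shortcuts stay limited to the ASCII equivalents listed in the table. One known difference: Python's \s also matches a vertical tab, which the sample string avoids.

```python
import re

sample = "Room 42_b,\tfloor 7"

digits = re.findall(r"\d", sample, re.ASCII)      # same as [0-9]
words = re.findall(r"\w+", sample, re.ASCII)      # same as [a-zA-Z0-9_]+
spaces = re.findall(r"\s", sample, re.ASCII)      # space and tab here
non_words = re.findall(r"\W+", sample, re.ASCII)  # same as [^a-zA-Z0-9_]+

print(digits)     # ['4', '2', '7']
print(words)      # ['Room', '42_b', 'floor', '7']
print(spaces)     # [' ', '\t', ' ']
print(non_words)  # [' ', ',\t', ' ']
```

Each uppercase shortcut is the complement of its lowercase partner, so on any single character exactly one of the pair matches.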

 

POSIX-style character classes

 

Item        Matches
[:alnum:]   Any alphanumeric. Equivalent to [[:alpha:][:digit:]].
[:alpha:]   Any Unicode letter.
[:ascii:]   Any ASCII character (i.e. from ASCII code 0 to ASCII code 127).
[:cntrl:]   Any control character.
[:digit:]   Any Unicode decimal digit. Equivalent to \d when using Unicode.
[:graph:]   Any alphanumeric or punctuation character.
[:lower:]   Any lowercase letter.
[:print:]   Any alphanumeric or punctuation character or space.
[:punct:]   Any punctuation character.
[:space:]   Any Unicode space character. Equivalent to \s when using Unicode.
[:upper:]   Any uppercase letter.
[:word:]    Any alphanumeric character or underscore.
[:xdigit:]  Any hexadecimal digit. Equivalent to [0-9a-fA-F].

 

These character classes can only be used when constructing other classes, that is, inside a pair of square brackets. For example, [[:digit:]] is the POSIX equivalent of \d.
Perl extends these POSIX character classes by allowing you to negate them by prefixing the class name with a ^. For example, [:^digit:] is a negated [:digit:].

Unicode properties

 

Item      Matches
\p{PROP}  Unicode property PROP. The braces are optional with one-character names.
\P{PROP}  Any character not in \p{PROP}. The braces are optional with one-character names.

 

Perl provides very complete support for Unicode properties, more than I am willing to list here. If you want to know more, see Perl’s documentation.
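To give a feel for what \p{PROP} and \P{PROP} select, here is a rough sketch in Python, whose `re` module has no \p support. It uses the standard `unicodedata` module instead, and `in_property` is a hypothetical helper that handles only general-category names such as L (letter) or Lu (uppercase letter). Perl's property support is far broader than this.

```python
import unicodedata

def in_property(ch, prop):
    """True if ch's general category starts with prop, e.g. 'L' or 'Lu'."""
    return unicodedata.category(ch).startswith(prop)

text = "\u00dcn\u00ef 3!"  # U-umlaut, n, i-diaeresis, space, 3, !

letters = [c for c in text if in_property(c, "L")]          # like \p{L} or \pL
non_letters = [c for c in text if not in_property(c, "L")]  # like \P{L} or \PL
uppercase = [c for c in text if in_property(c, "Lu")]       # like \p{Lu}

print(letters)      # ['Ü', 'n', 'ï']
print(non_letters)  # [' ', '3', '!']
print(uppercase)    # ['Ü']
```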