Seeing What's Matched by any Regular Expression

Matching Digits, Words and Whitespace in Perl

Perl provides a rich set of simple built-in character classes from which you can construct complex regular expressions.

\d  matches any single digit character
\D  matches any single non-digit character
\w  matches any single word character
     (alphanumeric, underscore, or certain accented characters)
\W  matches any single non-word character
     (anything other than alphanumeric, underscore, or certain accented characters)
\s  matches any single whitespace character
\S  matches any single non-whitespace character
\h  matches any single horizontal whitespace character
\H  matches any single character that is not horizontal whitespace
\v  matches any single vertical whitespace character
\V  matches any single character that is not vertical whitespace
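
As a rough illustration of the list above, the sketch below counts how many characters of a sample string each class matches. The helper name count_class and the sample string are made up for this example, and \h and \v assume Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_class is a hypothetical helper: it returns how many single
# characters in $text are matched by the character class $class.
sub count_class {
    my ($class, $text) = @_;
    my $count = () = $text =~ /$class/g;
    return $count;
}

# Sample input (an assumption for this sketch): letters, digits, an
# underscore, spaces, a tab and a newline.
my $sample = "ab_12 cd\t34\n";

for my $class ('\d', '\D', '\w', '\W', '\s', '\S', '\h', '\H', '\v', '\V') {
    printf "%-3s matches %2d of %d characters\n",
        $class, count_class($class, $sample), length $sample;
}
```

Each class and its upper-case complement should add up to the length of the string, since every character falls into exactly one of the two.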

At the conceptual level you can think of these as regular expressions. For example, recalling that any single character is a regular expression and that the union of regular expressions is a regular expression, we have:

\d = 0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9
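
The union can be written in Perl as an alternation, which gives a simple way to check the identity. The sketch below compares \d with the explicit alternation on every ASCII character; the helper name same_as_digit is hypothetical. Note that on non-ASCII input Perl's \d can also match digits from other scripts, so the identity is exact only for ASCII input or under the /a modifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The union written out as an alternation of ten single-character regexes.
my $union = qr/0|1|2|3|4|5|6|7|8|9/;

# same_as_digit is a hypothetical helper: true when \d and the explicit
# union give the same answer for a single character.
sub same_as_digit {
    my ($char) = @_;
    my $by_class = $char =~ /\A\d\z/         ? 1 : 0;
    my $by_union = $char =~ /\A(?:$union)\z/ ? 1 : 0;
    return $by_class == $by_union;
}

# Check every ASCII character (codes 0 to 127).
my @disagree = grep { !same_as_digit(chr $_) } 0 .. 127;
print @disagree ? "differences found\n" : "\\d and the union agree on ASCII\n";
```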

Since \d, \D, \w and so on are all regular expressions, we can concatenate them, group them, quantify them, form disjunctions of them (logical OR operations using |), and so on, to create arbitrarily complex regular expressions.
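
A minimal sketch of these operations combined in one pattern follows. The pattern, the sample strings and the helper first_match are all invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Concatenation, grouping, quantifiers and alternation in one pattern:
# either three digits, or one or more (word character, whitespace) pairs
# followed by a digit.
my $combined = qr/\d{3}|(?:\w\s)+\d/;

# first_match is a hypothetical helper returning the matched text,
# or undef when the pattern does not match.
sub first_match {
    my ($pattern, $text) = @_;
    return $text =~ /($pattern)/ ? $1 : undef;
}

print first_match($combined, "tel:555"), "\n";   # matched by the \d{3} branch
print first_match($combined, "a b 7"),   "\n";   # matched by the grouped branch
```

The non-capturing group (?: ... ) is used here so that the quantifier + applies to the whole pair \w\s rather than to \s alone.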

Seeing what's matched by any regular expression

Before we give examples of these and other regular expressions, we will mention three Perl variables that are particularly useful for debugging regular expression code.

When a regular expression matches an input string, the following three variables provide sub-strings of that input string.

$` provides the sub-string before the match.
$& provides the actual matching characters.
$' provides the sub-string after the match.

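These variables refer to the most recent successful match: they are undefined if no match has succeeded yet, and a failed match leaves the values from the previous successful match in place. The example script below shows their use.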
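
Because these variables are only meaningful right after a successful match, it is safest to read them inside the branch that tested the match. A minimal sketch of that habit, using a hypothetical helper split_on_match and made-up input strings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_on_match is a hypothetical helper: on success it returns the
# text before the match, the match itself and the text after it;
# on failure it returns an empty list.
sub split_on_match {
    my ($pattern, $text) = @_;
    return unless $text =~ $pattern;
    return ($`, $&, $');
}

my @parts = split_on_match(qr/\d+/, "abc 42 xyz");
print "pre=[$parts[0]] match=[$parts[1]] post=[$parts[2]]\n";

my @none = split_on_match(qr/\d+/, "no digits here");
print "no match, nothing returned\n" unless @none;
```

Copying the three values out immediately, as the helper does, avoids reading them later when another match may have changed them.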

Example Perl Script: Seeing Matches with $`, $& and $'

#!/usr/bin/perl
use strict;
use warnings;
 
#examples of Perl style regular expressions \d \s \w \W etc 
#www.review-pc.com/tutorials
 
my $input_to_be_matched="This is test data it has 123 three digits";
 
print("\ninput=This is test data it has 123 three digits");
 
print("\n".'regexp=\d\d\d');
if ($input_to_be_matched=~m/\d\d\d/)
{
    print("\npre  match \$`=$`");
    print("\nthe  match \$&=$&");
    print("\npost match \$'=$'");
}
 
print("\n\ninput=This is test data it has 123 three digits");
print("\n".'regexp=\d\D\w');
if ($input_to_be_matched=~m/\d\D\w/)
{
    print("\npre  match \$`=$`");
    print("\nthe  match \$&=$&");
    print("\npost match \$'=$'");
}
 
print("\n\ninput=This is test data it has 123 three digits");
print("\n".'regexp=as\s\d{1,9}\s\w{1,6}\s\w');
if ($input_to_be_matched=~m/as\s\d{1,9}\s\w{1,6}\s\w/)
{
    print("\npre  match \$`=$`");
    print("\nthe  match \$&=$&");
    print("\npost match \$'=$'");
}
 
print("\n\ninput=This is test data it has 123 three digits");
print("\n".'regexp=w{4}\s\w{4}');
# w{4} is four literal w characters (not \w), so this pattern does not match
if ($input_to_be_matched=~m/w{4}\s\w{4}/)
{
    print("\npre  match \$`=$`");
    print("\nthe  match \$&=$&");
    print("\npost match \$'=$'");
}
 
# .* is greedy; the match ends with a digit, one or more non-digits and a literal d
print("\n\ninput=This is test data it has 123 three digits");
print("\n".'regexp=test.*\d\D+d');
if ($input_to_be_matched=~m/test.*\d\D+d/)
{
    print("\npre  match \$`=$`");
    print("\nthe  match \$&=$&");
    print("\npost match \$'=$'");
}

The script result

As always, we give the script execute permission and run it with
./filename
When interpreting the result, remember that d is the literal character d, \d matches a digit, and \D matches a non-digit. The same relationship holds for w, \w and \W; for s, \s and \S; and so on. This is why the pattern w{4}\s\w{4} prints no match lines: w{4} requires four literal w characters, which the input does not contain.
www.review-pc.com $ ./perl-regular-expressions-pre-post.pl 

input=This is test data it has 123 three digits
regexp=\d\d\d
pre  match $`=This is test data it has 
the  match $&=123
post match $'= three digits

input=This is test data it has 123 three digits
regexp=\d\D\w
pre  match $`=This is test data it has 12
the  match $&=3 t
post match $'=hree digits

input=This is test data it has 123 three digits
regexp=as\s\d{1,9}\s\w{1,6}\s\w
pre  match $`=This is test data it h
the  match $&=as 123 three d
post match $'=igits

input=This is test data it has 123 three digits
regexp=w{4}\s\w{4}

input=This is test data it has 123 three digits
regexp=test.*\d\D+d
pre  match $`=This is 
the  match $&=test data it has 123 three d
post match $'=igits
www.review-pc.com $ 
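
To see the difference between w and \w directly, the sketch below runs the non-matching pattern from the output above next to the same pattern written with \w. The helper describe_match is hypothetical; the input string is the one used by the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = "This is test data it has 123 three digits";

# describe_match is a hypothetical helper that reports the three parts
# of a match, or says that the pattern did not match.
sub describe_match {
    my ($pattern, $text) = @_;
    return "no match" unless $text =~ $pattern;
    return "pre=[$`] match=[$&] post=[$']";
}

# w{4} asks for four literal w characters, which the input lacks.
print 'w{4}\s\w{4}  : ', describe_match(qr/w{4}\s\w{4}/, $input), "\n";

# \w{4} asks for four word characters, so "test data" matches.
print '\w{4}\s\w{4} : ', describe_match(qr/\w{4}\s\w{4}/, $input), "\n";
```

The second pattern matches at "test data" rather than at "This is" because "is" supplies only two word characters before the next space.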

Summary

  • \d matches any single digit character
  • \D matches any single non-digit character
  • \w matches any single word character (alphanumeric, underscore, or certain accented characters)
  • \W matches any single non-word character (anything other than alphanumeric, underscore, or certain accented characters)
  • \s matches any single whitespace character
  • \S matches any single non-whitespace character
  • \h matches any single horizontal whitespace character
  • \H matches any single character that is not horizontal whitespace
  • \v matches any single vertical whitespace character
  • \V matches any single character that is not vertical whitespace
  • $` provides the sub-string before the match
  • $& provides the actual matching characters
  • $' provides the sub-string after the match