No subject


Fri Apr 17 06:33:00 UTC 2009


 Non-bracketing delimiters use the same character fore and aft, but the
 four sorts of brackets (round, angle, square, curly) will all nest,
 which means that

 q{foo{bar}baz}

 is the same as

  'foo{bar}baz'
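The nesting rule is easy to check directly. A minimal sketch (the variable names are mine, not from perlop):

```perl
use strict;
use warnings;

# Bracketing delimiters nest, so the inner {bar} does not end the string.
my $nested = q{foo{bar}baz};
my $plain  = 'foo{bar}baz';

print $nested eq $plain ? "same\n" : "different\n";   # prints "same"
```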

Further, in "Gory details of parsing quoted constructs":

 When searching for single-character delimiters, escaped delimiters
 and "\\" are skipped. For example, while searching for terminating
 "/", combinations of "\\" and "\/" are skipped.  If the delimiters are
 bracketing, nested pairs are also skipped.  For example, while searching
 for closing "]" paired with the opening "[", combinations of "\\", "\]",
 and "\[" are all skipped, and nested "[" and "]" are skipped as well.
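Both halves of that rule can be seen in a small sketch. The test strings and variable names here are only illustrative:

```perl
use strict;
use warnings;

# "\/" is skipped while looking for the closing "/", so this matches "a/b".
my $slash = 'a/b' =~ m/a\/b/ ? 1 : 0;

# The nested [b] pair is skipped while looking for the closing "]", so the
# regexp is a[b]c: a character class containing only "b".
my $nested = 'abc' =~ m[a[b]c] ? 1 : 0;

print "$slash $nested\n";   # prints "1 1"
```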

This implies that m(() would be invalid syntax: an opening bracket without
a matching pair has to be escaped just to get past the delimiter search.
The escape is then stripped, so m(\() means m/(/. This is an invalid
regexp giving the error message above because '(' has special
significance in regexps. Similarly

>  vnix$ perl -ne 'print if m[\[]' </dev/null
>  Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE / at -e line
>  1.

is the same as m/[/, which is also an invalid regexp, so the error
message is correct.

It would seem that getting the opening bracket through as a regexp
literal similar to m/\(/ or m/\[/ would need double escaping when
the same bracket is the delimiter. However, further down in
"Gory details of parsing quoted constructs":

 The lack of processing of "\\" creates specific restrictions on the
 post-processed text.  If the delimiter is "/", one cannot get the
 combination "\/" into the result of this step.  "/" will finish the
 regular expression, "\/" will be stripped to "/" on the previous step,
 and "\\/" will be left as is.  Because "/" is equivalent to "\/" inside
 a regular expression, this does not matter unless the delimiter happens
 to be character special to the RE engine, such as in "s*foo*bar*",
 "m[foo]", or "?foo?"; or an alphanumeric char

 [...]

which is precisely the case here.

This can be worked around with the normal regexp quote escape \Q...\E,
so that m/\[/ becomes m[\Q\[\E]. The result can be easily confirmed with

 debugperl -Dr -e 'm[\Q\[\E]'

> Note that in the error message, the backslash is missing.
> 
> The closing square bracket works as expected:
> 
>  vnix$ perl -ne 'print if m[\]]' </dev/null

This corresponds to m/]/, which works because ']' isn't special in a
regexp on its own, so it needs no quoting.

> But with the rounded parens, the closing paren too is mishandled:
> 
>  vnix$ perl -ne 'print if m(\))' </dev/null
>  Unmatched ) in regex; marked by <-- HERE in m/) <-- HERE / at -e line
>  1.

Again, m/)/ is invalid. m(\Q\)\E) works.

> With square brackets, you get an error message even if there is an
> escaped pair:
> 
>  vnix$ perl -ne 'print if m[\[\]]' </dev/null
>  Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]/ at -e line
>  1.

This means m/[]/, which is an invalid regexp because the closing bracket
is part of the list and doesn't end the character class. From perlre.pod:

 If you want either "-" or "]" itself to be a member of a class, put
 it at the start of the list (possibly after a "^"), or escape it with
 a backslash.

The desired behaviour can be achieved with m[\Q[]\E] or m[\Q[\E\]].
The former works because of the nesting delimiter rule.
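A short sketch of the first form, using made-up test strings: the unescaped [] pair nests inside the delimiters, and \Q...\E then quotes it for the regexp engine.

```perl
use strict;
use warnings;

# The unescaped [] pair nests inside the m[...] delimiters, so the pattern
# text is \Q[]\E, which \Q turns into the regexp \[\].
my @hits = grep { m[\Q[]\E] } ('a[]b', 'a[b', 'ab');

print "@hits\n";   # prints "a[]b"
```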

> If you remove the escape from the closing square bracket, you still get
> the error:
> 
>  vnix$ perl -ne 'print if m[\[]]' </dev/null
>  Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE / at -e line
>  1.

That's because it means m/[/], so the regexp part is just the opening
bracket (and the trailing closing bracket would cause a syntax error
afterwards even if the regexp could be compiled).

> Also note that the error message lacks the closing square bracket.
> 
> With rounded brackets, matching pairs work as expected:
> 
>  vnix$ echo 'foo()' | perl -ne 'print if m(\(\))' 
>  foo()

That actually matches any string, just like m/()/: the regexp
is just an empty group.

 % echo 'foo' | perl -ne 'print if m(\(\))' 
 foo

For the sake of completeness, matching a literal '()' needs m(\Q()\E)
or m(\Q\(\)\E), and the former again works because of the nesting rule.
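The difference between the empty group and the quoted literal shows up when both are run over the same input. A rough sketch, with the input list and variable names chosen just for illustration:

```perl
use strict;
use warnings;

my @lines = ('foo()', 'foo');

# m/()/ is an empty capture group, so it matches every string.
my @empty_group = grep { m/()/ } @lines;

# \Q...\E quotes the nested () pair, giving the regexp \(\).
my @literal = grep { m(\Q()\E) } @lines;

print scalar(@empty_group), " ", scalar(@literal), "\n";   # prints "2 1"
```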

> Curly brackets and brokets work fine:
> 
>  vnix$ perl -ne 'print if m{\{}' </dev/null
> 
>  vnix$ perl -ne 'print if m<\<>' </dev/null

Neither is special in a regexp on its own, so they should work.

Please let me know if I can close this bug. The documentation could
always be better but I think things are working as documented. 
The 'gory details' title pretty much describes all the variations here :)

Cheers,
-- 
Niko Tyni   ntyni at debian.org





