r64525 - in /trunk/libregexp-grammars-perl: Changes MANIFEST META.yml README debian/changelog lib/Regexp/Grammars.pm t/grammar_polymorphism.t t/no_context.t t/no_context_counterlocal.t t/no_context_local.t
carnil at users.alioth.debian.org
carnil at users.alioth.debian.org
Thu Nov 4 08:25:36 UTC 2010
Author: carnil
Date: Thu Nov 4 08:23:13 2010
New Revision: 64525
URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=64525
Log:
New upstream release
Added:
trunk/libregexp-grammars-perl/t/grammar_polymorphism.t
- copied unchanged from r64523, branches/upstream/libregexp-grammars-perl/current/t/grammar_polymorphism.t
trunk/libregexp-grammars-perl/t/no_context.t
- copied unchanged from r64523, branches/upstream/libregexp-grammars-perl/current/t/no_context.t
trunk/libregexp-grammars-perl/t/no_context_counterlocal.t
- copied unchanged from r64523, branches/upstream/libregexp-grammars-perl/current/t/no_context_counterlocal.t
trunk/libregexp-grammars-perl/t/no_context_local.t
- copied unchanged from r64523, branches/upstream/libregexp-grammars-perl/current/t/no_context_local.t
Modified:
trunk/libregexp-grammars-perl/Changes
trunk/libregexp-grammars-perl/MANIFEST
trunk/libregexp-grammars-perl/META.yml
trunk/libregexp-grammars-perl/README
trunk/libregexp-grammars-perl/debian/changelog
trunk/libregexp-grammars-perl/lib/Regexp/Grammars.pm
Modified: trunk/libregexp-grammars-perl/Changes
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/libregexp-grammars-perl/Changes?rev=64525&op=diff
==============================================================================
--- trunk/libregexp-grammars-perl/Changes (original)
+++ trunk/libregexp-grammars-perl/Changes Thu Nov 4 08:23:13 2010
@@ -138,3 +138,16 @@
(and made it work around normal lookahead/capture problem)
* Fixed major bugs in <:arg> handling
+
+
+1.012 Wed Nov 3 20:24:36 2010
+
+ * Added RFC5322 example (thanks Tom and Abigail!)
+
+ * Added <:nocontext> and <:context> directives to optimize
+ away unwanted context substrings.
+
+ * Solved transitive inheritance problem
+ (grammars now fully polymorphic)
+
+ * Added NEXT:: namespace for generic polymorphism
Modified: trunk/libregexp-grammars-perl/MANIFEST
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/libregexp-grammars-perl/MANIFEST?rev=64525&op=diff
==============================================================================
--- trunk/libregexp-grammars-perl/MANIFEST (original)
+++ trunk/libregexp-grammars-perl/MANIFEST Thu Nov 4 08:23:13 2010
@@ -56,4 +56,8 @@
t/repop_ws.t
t/top_is_token.t
t/lookaheads.t
+t/grammar_polymorphism.t
+t/no_context.t
+t/no_context_counterlocal.t
+t/no_context_local.t
META.yml Module meta-data (added by MakeMaker)
Modified: trunk/libregexp-grammars-perl/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/libregexp-grammars-perl/META.yml?rev=64525&op=diff
==============================================================================
--- trunk/libregexp-grammars-perl/META.yml (original)
+++ trunk/libregexp-grammars-perl/META.yml Thu Nov 4 08:23:13 2010
@@ -1,6 +1,6 @@
--- #YAML:1.0
name: Regexp-Grammars
-version: 1.011
+version: 1.012
abstract: Add grammatical parsing features to Perl 5.10 regexes
license: ~
author:
Modified: trunk/libregexp-grammars-perl/README
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/libregexp-grammars-perl/README?rev=64525&op=diff
==============================================================================
--- trunk/libregexp-grammars-perl/README (original)
+++ trunk/libregexp-grammars-perl/README Thu Nov 4 08:23:13 2010
@@ -1,4 +1,4 @@
-Regexp::Grammars version 1.011
+Regexp::Grammars version 1.012
This module adds a small number of new regex constructs that can be used
within Perl 5.10 patterns to implement complete recursive-descent parsing.
Modified: trunk/libregexp-grammars-perl/debian/changelog
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/libregexp-grammars-perl/debian/changelog?rev=64525&op=diff
==============================================================================
--- trunk/libregexp-grammars-perl/debian/changelog (original)
+++ trunk/libregexp-grammars-perl/debian/changelog Thu Nov 4 08:23:13 2010
@@ -1,9 +1,8 @@
-libregexp-grammars-perl (1.011-2) UNRELEASED; urgency=low
+libregexp-grammars-perl (1.012-1) UNRELEASED; urgency=low
- # Please don't touch this module would like to update it myself
- # for the moment.
+ * New upstream release
- -- Salvatore Bonaccorso <carnil at debian.org> Fri, 29 Oct 2010 18:31:56 +0200
+ -- Salvatore Bonaccorso <carnil at debian.org> Thu, 04 Nov 2010 09:21:06 +0100
libregexp-grammars-perl (1.011-1) unstable; urgency=low
Modified: trunk/libregexp-grammars-perl/lib/Regexp/Grammars.pm
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/libregexp-grammars-perl/lib/Regexp/Grammars.pm?rev=64525&op=diff
==============================================================================
--- trunk/libregexp-grammars-perl/lib/Regexp/Grammars.pm (original)
+++ trunk/libregexp-grammars-perl/lib/Regexp/Grammars.pm Thu Nov 4 08:23:13 2010
@@ -7,7 +7,7 @@
use Scalar::Util qw< blessed >;
use Data::Dumper qw< Dumper >;
-our $VERSION = '1.011';
+our $VERSION = '1.012';
# Load the module...
sub import {
@@ -566,6 +566,9 @@
# Remove "private" captures (i.e. those starting with _)...
delete @{$curr_frame}{grep {substr($_,0,1) eq '_'} keys %{$curr_frame} };
+ # Remove "nocontext" marker...
+ my $nocontext = delete $curr_frame->{'~'};
+
# Build a clone of the current frame...
my $cloned_result_frame
= exists $curr_frame->{'='} ? $curr_frame->{'='}
@@ -585,6 +588,11 @@
}
}
+ # Remove capture if not requested...
+ if ($nocontext && ref $cloned_result_frame eq 'HASH' && keys %{$cloned_result_frame} > 1) {
+ delete $cloned_result_frame->{q{}};
+ }
+
# Nest a clone of current frame inside a clone of the caller frame...
my $cloned_caller_frame = {
%{$caller_frame//{}},
@@ -618,6 +626,9 @@
# Remove "private" captures (i.e. those starting with _)...
delete @{$curr_frame}{grep {substr($_,0,1) eq '_'} keys %{$curr_frame} };
+
+ # Remove "nocontext" marker...
+ my $nocontext = delete $curr_frame->{'~'};
# Clone the current frame...
my $cloned_result_frame
@@ -638,6 +649,11 @@
}
}
+ # Remove capture if not requested...
+ if ($nocontext && ref $cloned_result_frame eq 'HASH' && keys %{$cloned_result_frame} > 1) {
+ delete $cloned_result_frame->{q{}};
+ }
+
# Append a clone of current frame inside a clone of the caller frame...
my $cloned_caller_frame = {
%{$caller_frame},
@@ -664,6 +680,7 @@
# Namespace in which grammar inheritance occurs...
my $CACHE = 'Regexp::Grammars::_CACHE_::';
my $CACHE_LEN = length $CACHE;
+my %CACHE; #...for subrule tracking
# This code inserted at the start of every grammar regex
# (initializes the result stack cleanly and backtrackably, via local)...
@@ -674,11 +691,10 @@
# This code inserted at the end of every grammar regex
# (puts final result in %/. Also defines default <ws>, <hk>, etc.)...
-my $EPILOGUE = q{
- )(?{; $Regexp::Grammars::RESULT_STACK[-1]{""} //= $^N;
+my $EPILOGUE = q{)(?{; $Regexp::Grammars::RESULT_STACK[-1]{q{}} //= $^N;
local $Regexp::Grammars::match_frame = pop @Regexp::Grammars::RESULT_STACK;
delete @{$Regexp::Grammars::match_frame}{
- grep {substr($_,0,1) eq '_'} keys %{$Regexp::Grammars::match_frame}
+ '~', grep {substr($_,0,1) eq '_'} keys %{$Regexp::Grammars::match_frame}
};
if (exists $Regexp::Grammars::match_frame->{'='}) {
if (ref($Regexp::Grammars::match_frame->{'='}) eq 'HASH') {
@@ -699,6 +715,8 @@
(?<matchline> (?{; $Regexp::Grammars::RESULT_STACK[-1]{"="} = 1 + substr($_,0,pos) =~ tr/\n/\n/; }) )
)
};
+my $EPILOGUE_NC = $EPILOGUE;
+ $EPILOGUE_NC =~ s{ ; [^;]+ ;}{;}xms;
#=====[ MISCELLANEOUS PATTERNS THAT MATCH USEFUL THINGS ]========
@@ -1090,8 +1108,8 @@
}
sub _translate_subrule_call {
- my ( $construct, $alias, $subrule, $args, $savemode, $postmodifier,
- $debug_build, $debug_runtime, $valid_subrule_names_ref)
+ my ( $grammar_name, $construct, $alias, $subrule, $args, $savemode, $postmodifier,
+ $debug_build, $debug_runtime, $valid_subrule_names_ref, $nocontext)
= @_;
# Translate arg list, if provided...
@@ -1111,10 +1129,28 @@
}
# Transform qualified subrule names...
+ my $simple_subrule = $subrule;
+ my $start_grammar = (($simple_subrule =~ s{(.*)::}{}xms) ? $1 : "");
+ if ($start_grammar !~ /^NEXT$|::/) {
+ $start_grammar = caller(3).'::'.$start_grammar;
+ }
+
+ my @candidates = $start_grammar eq 'NEXT' ? _ancestry_of($grammar_name)
+ : _ancestry_of($start_grammar);
+
+ # Rename fully-qualified rule call, if to ancestor grammar...
+ RESOLVING:
+ for my $parent_class (@candidates) {
+ my $inherited_subrule = $parent_class.'::'.$simple_subrule;
+ if ($CACHE{$inherited_subrule}) {
+ $subrule = $inherited_subrule;
+ last RESOLVING;
+ }
+ }
+
+ # Replace package separators, which regex engine can't handle...
my $internal_subrule = $subrule;
$internal_subrule =~ s{::}{_88_}gxms;
- my $simple_subrule = $subrule;
- $simple_subrule =~ s{.*::}{}xms;
# Shortcircuit if unknown subrule invoked...
if (!$valid_subrule_names_ref->{$subrule}) {
@@ -1180,6 +1216,8 @@
# Translate to standard regex code...
return qq{(?:(?{;
local \@Regexp::Grammars::RESULT_STACK = (\@Regexp::Grammars::RESULT_STACK, {'\@'=>{$args}});
+ \$Regexp::Grammars::RESULT_STACK[-2]{'~'} = $nocontext
+ if \@Regexp::Grammars::RESULT_STACK >= 2;
$debug_pre})((?&$internal_subrule))(?{;
local \@Regexp::Grammars::RESULT_STACK = (
$save_code
@@ -1189,6 +1227,7 @@
sub _translate_rule_def {
my ($type, $qualifier, $name, $callname, $qualname, $body, $objectify, $local_ws) = @_;
+ $qualname =~ s{::}{_88_}gxms;
# Return object if requested...
my $objectification =
@@ -1209,8 +1248,7 @@
(?{;\$#{!}=delete(\$Regexp::Grammars::RESULT_STACK[-1]{'!'})//0;
delete(\$Regexp::Grammars::RESULT_STACK[-1]{'\@'});
})
- )
- )
+ ))
)
};
}
@@ -1218,7 +1256,8 @@
# Locate any valid <...> sequences and replace with native regex code...
sub _translate_subrule_calls {
- my ($grammar_spec,
+ my ($grammar_name,
+ $grammar_spec,
$compiletime_debugging_requested,
$runtime_debugging_requested,
$pre_match_debug,
@@ -1226,13 +1265,14 @@
$rule_name,
$subrule_names_ref,
$magic_ws,
+ $nocontext,
) = @_;
# Remember the preceding construct, so as to implement the ** operator...
my $prev_construct = q{};
my $prev_translation = q{};
- # Translate all other calls...
+ # Translate all other calls (MAIN GRAMMAR FOR MODULE)...
$grammar_spec =~ s{
(?<list_marker> (?<ws1> \s*+) \*\* (?<ws2> \s*+) )?
(?<construct>
@@ -1309,6 +1349,14 @@
debug \s* : \s* (?<cmd> run | match | step | try | off | on) \s*
)
|
+ (?<context_directive>
+ context \s* : \s*
+ )
+ |
+ (?<nocontext_directive>
+ nocontext \s* : \s*
+ )
+ |
(?<yadaerror_directive>
[.][.][.]
| [!][!][!]
@@ -1349,12 +1397,13 @@
(?<CHARSET> \[ \^?+ \\?+ \]?+ [^]]*+ \] )
(?<IDENT> [^\W\d]\w*+ )
(?<QUALIDENT> (?: [^\W\d]\w*+ :: )* [^\W\d]\w*+ )
- (?<LITERAL> (?&NUMBER) | (?&STRING) )
+ (?<LITERAL> (?&NUMBER) | (?&STRING) | (?&VAR) )
(?<NUMBER> [+-]? \d++ (?:\. \d++)? (?:[eE] [+-]? \d++)? )
(?<STRING> ' [^\\']++ (?: \\. [^\\']++ )* ' )
(?<ARGLIST> (?&PARENCODE) | \( \s* (?&ARGS)? \s* \) | (?# NOTHING ) )
(?<ARGS> (?&ARG) \s* (?: , \s* (?&ARG) \s* )* ,? )
- (?<ARG> : (?&IDENT) | (?&KEY) \s* => \s* (?&LITERAL) )
+ (?<ARG> (?&VAR) | (?&KEY) \s* => \s* (?&LITERAL) )
+ (?<VAR> : (?&IDENT) )
(?<KEY> (?&IDENT) | (?&LITERAL) )
)
}{
@@ -1410,18 +1459,22 @@
# Translate subrule calls of the form: <ALIAS=RULENAME>...
elsif ($+{alias_subrule_scalar}) {
_translate_subrule_call(
+ $grammar_name,
$curr_construct, $alias, $+{subrule}, $+{args}, 'scalar', $+{modifier},
$compiletime_debugging_requested,
$runtime_debugging_requested,
$subrule_names_ref,
+ $nocontext,
);
}
elsif ($+{alias_subrule_list}) {
_translate_subrule_call(
+ $grammar_name,
$curr_construct, $alias, $+{subrule}, $+{args}, 'list', $+{modifier},
$compiletime_debugging_requested,
$runtime_debugging_requested,
$subrule_names_ref,
+ $nocontext,
);
}
@@ -1437,35 +1490,43 @@
}
$pre . _translate_subrule_call(
+ $grammar_name,
$curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, $type, q{},
$compiletime_debugging_requested,
$runtime_debugging_requested,
$subrule_names_ref,
+ $nocontext,
)
. $post;
}
elsif ($+{self_subrule_scalar_nocap}) {
_translate_subrule_call(
+ $grammar_name,
$curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'noncapturing', $+{modifier},
$compiletime_debugging_requested,
$runtime_debugging_requested,
$subrule_names_ref,
+ $nocontext,
);
}
elsif ($+{self_subrule_scalar}) {
_translate_subrule_call(
+ $grammar_name,
$curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'scalar', $+{modifier},
$compiletime_debugging_requested,
$runtime_debugging_requested,
$subrule_names_ref,
+ $nocontext,
);
}
elsif ($+{self_subrule_list}) {
_translate_subrule_call(
+ $grammar_name,
$curr_construct, qq{'$+{subrule}'}, $+{subrule}, $+{args}, 'list', $+{modifier},
$compiletime_debugging_requested,
$runtime_debugging_requested,
$subrule_names_ref,
+ $nocontext,
);
}
@@ -1565,6 +1626,14 @@
$compiletime_debugging_requested, -$rule_name
);
}
+ elsif ($+{context_directive}) {
+ $nocontext = 0;
+ q{}; # Remove the directive
+ }
+ elsif ($+{nocontext_directive}) {
+ $nocontext = 1;
+ q{}; # Remove the directive
+ }
# There shouldn't be any other possibility...
else {
@@ -1685,6 +1754,16 @@
# Cache of rule/token names within defined grammars...
my %subrule_names_for;
+# Build list of ancestors for a given grammar...
+sub _ancestry_of {
+ my ($grammar_name) = @_;
+
+ return () if !$grammar_name;
+
+ use mro;
+ return map { substr($_, $CACHE_LEN) } @{mro::get_linear_isa($CACHE.$grammar_name, 'c3')};
+}
+
# Detect and translate any requested grammar inheritances...
sub _extract_inheritances {
my ($regex, $compiletime_debugging_requested, $derived_grammar_name) = @_;
@@ -1696,14 +1775,14 @@
my $orig_grammar_name = $+{base_grammar_name};
my $grammar_name = $orig_grammar_name;
if ($grammar_name !~ /::/) {
- $grammar_name = caller(2) . "::$grammar_name";
+ $grammar_name = caller(2).'::'.$grammar_name;
}
if (exists $user_defined_grammar{$grammar_name}) {
if ($compiletime_debugging_requested) {
_debug_notify( info =>
"Processing inheritance request for $grammar_name...",
- "",
+ q{},
);
}
@@ -1721,16 +1800,12 @@
}
}
- # Retrieve ancestors in C3 dispatch order and remove prefixes...
- use mro;
- my @ancestors = @{mro::get_linear_isa($CACHE.$derived_grammar_name, 'c3')};
- shift @ancestors;
- for my $ancestor (@ancestors) {
- substr($ancestor,0,$CACHE_LEN,q{});
- }
+ # Retrieve ancestors (but not self) in C3 dispatch order...
+ my (undef, @ancestors) = _ancestry_of($derived_grammar_name);
# Extract subrule names and implementations for ancestors...
my %subrule_names = map { %{$subrule_names_for{$_}} } @ancestors;
+ $_ = -1 for values %subrule_names;
my $implementation
= join "\n", map { $user_defined_grammar{$_} } @ancestors;
@@ -1821,6 +1896,13 @@
;
# Subdivide into rule and token definitions, preparing to process each...
+ # REWRITE THIS, USING (PROBABLY NEED TO REFACTOR ALL GRAMMARS TO REUSe
+ # THESE COMPONENTS:
+ # (?<PARAMLIST> \( \s* (?&PARAMS)? \s* \) | (?# NOTHING ) )
+ # (?<PARAMS> (?&PARAM) \s* (?: , \s* (?&PARAM) \s* )* ,? )
+ # (?<PARAM> (?&VAR) (?: \s* = \s* (?: (?&LITERAL) | (?&PARENCODE) ) )? )
+ # (?<LITERAL> (?&NUMBER) | (?&STRING) | (?&VAR) )
+ # (?<VAR> : (?&IDENT) )
my @defns = split m{
^ [^#\n]*? \K < (obj|)(rule|token) \s*+ :
\s*+ ((?:${IDENT}::)*+) (?: ($IDENT) \s*+ = \s*+ )?+
@@ -1853,7 +1935,7 @@
"No main regex specified before rule definitions.",
"Grammar will never match anything.",
"(Did you forget a <grammar:...> specification?)",
- "",
+ q{},
);
}
@@ -1870,6 +1952,11 @@
$grammar_name = caller(1) . "::$grammar_name";
}
$is_grammar = 1;
+
+ # Add subrule definitions to namespace...
+ for my $subrule_name (@subrule_names) {
+ $CACHE{$grammar_name.'::'.$subrule_name} = 1;
+ }
}
else {
state $dummy_grammar_index = 0;
@@ -1890,6 +1977,11 @@
# Add inherited subrule names to allowed subrule names;
@subrule_names{ keys %{$inherited_subrule_names} }
= values %{$inherited_subrule_names};
+
+ # Remove any top-level nocontext directive...
+ my $nocontext = ($main_regex =~ s{ < nocontext \s* : \s* > }{}gxms) ? 1
+ : ($main_regex =~ s{ < context \s* : \s* > }{}gxms) ? 0
+ : 0;
# If so, set up to save the grammar...
if ($is_grammar) {
@@ -1900,7 +1992,7 @@
if ($compiletime_debugging_requested) {
_debug_notify( info =>
"Processing definition of grammar $grammar_name...",
- "",
+ q{},
);
}
@@ -1924,8 +2016,8 @@
# Remember set of valid subrule names...
$subrule_names_for{$grammar_name}
= {
- map { ($_ => 1, $grammar_name.'::'.$_ => 1 ) }
- keys %subrule_names
+ map({ ($_ => 1) } keys %subrule_names),
+ map({ ($grammar_name.'::'.$_ => 1) } grep { !/::/ } keys %subrule_names),
};
}
else { #...not a grammar specification
@@ -1938,6 +2030,7 @@
# Any actual regex is processed first...
$regex = _translate_subrule_calls(
+ $grammar_name,
$main_regex,
$compiletime_debugging_requested,
$runtime_debugging_requested,
@@ -1945,7 +2038,8 @@
$post_match_debug,
q{}, # Expected...what?
\%subrule_names,
- 0 # Whitespace isn't magical
+ 0, # Whitespace isn't magical
+ $nocontext,
);
# Report how construct was interpreted, if requested to...
@@ -1958,16 +2052,12 @@
}
}
- # Build common prefix to fully qualified rulenames
- my $qual_prefix = $grammar_name.'::';
- $qual_prefix =~ s/::/_88_/g;
-
# Then iterate any following rule definitions...
while (@defns) {
# Grab details of each rule defn (as extracted by previous split)...
my ($objectify, $type, $qualifier, $name, $callname, $body) = splice(@defns, 0, 6);
$name //= $callname;
- my $qualified_name = $qual_prefix.$callname;
+ my $qualified_name = $grammar_name.'::'.$callname;
# Report how construct was interpreted, if requested to...
if ($compiletime_debugging_requested) {
@@ -1979,6 +2069,7 @@
# Translate any nested <...> constructs...
$body = _translate_subrule_calls(
+ $grammar_name,
$body,
$compiletime_debugging_requested,
$runtime_debugging_requested,
@@ -1987,6 +2078,7 @@
$callname, # Expected...what?
\%subrule_names,
$type eq 'rule', # Is whitespace magical?
+ $nocontext, # Start with the global nocontextuality
);
# Report how construct was interpreted, if requested to...
@@ -2075,14 +2167,15 @@
}
# Otherwise, aggregrate the final grammar...
else {
- return _complete_regex($regex.$inherited_rules, $pre_match_debug, $post_match_debug);
+ return _complete_regex($regex.$inherited_rules, $pre_match_debug, $post_match_debug, $nocontext);
}
}
sub _complete_regex {
- my ($regex, $pre_match_debug, $post_match_debug) = @_;
-
- return qq{$pre_match_debug$PROLOGUE$regex$EPILOGUE$post_match_debug};
+ my ($regex, $pre_match_debug, $post_match_debug, $nocontext) = @_;
+
+ return $nocontext ? qq{$pre_match_debug$PROLOGUE$regex$EPILOGUE_NC$post_match_debug}
+ : qq{$pre_match_debug$PROLOGUE$regex$EPILOGUE$post_match_debug};
}
1; # Magic true value required at end of module
@@ -2096,7 +2189,7 @@
=head1 VERSION
-This document describes Regexp::Grammars version 1.011
+This document describes Regexp::Grammars version 1.012
=head1 SYNOPSIS
@@ -2251,6 +2344,8 @@
<log: (?{ CODE }) > Explicitly add a message to the log
<minimize:> Simplify the result of a subrule match
<ws: PATTERN > Override automatic whitespace matching
+ <context:> Switch on context substring retention
+ <nocontext:> Switch off context substring retention
@@ -2593,7 +2688,9 @@
have keys: C<'noun'>, C<'verb'>, and C<'object'>.
In addition each result-hash has one extra key: the empty string. The
-value for this key is whatever string the entire subrule call matched.
+value for this key is whatever substring the entire subrule call matched.
+This value is known as the I<context substring>.
+
So, for example, a successful call to C<< <sentence> >> might add
something like the following to the current result-hash:
@@ -2611,7 +2708,7 @@
Note, however, that if the result-hash at any level contains I<only>
the empty-string key (i.e. the subrule did not call any sub-subrules or
save any of their nested result-hashes), then the hash is "unpacked"
-and just the matched substring itself if returned.
+and just the context substring itself is returned.
For example, if C<< <rule: sentence> >> had been defined:
@@ -2653,6 +2750,71 @@
}
+=head3 Turning off the context substring
+
+The context substring is convenient for debugging and for generating
+error messages but, in a large grammar, or when parsing a long string,
+the capture and storage of many nested substrings may quickly become
+prohibitively expensive.
+
+So Regexp::Grammars provides a directive to prevent context substrings
+from being retained. Any rule or token that includes the directive
+C<< <nocontext:> >> anywhere in the rule's body will not retain any
+context substring it matches...unless that substring would be the only
+entry in its result hash (which only happens within objrules and
+objtokens).
+
+If a C<< <nocontext:> >> directive appears I<before> the first rule or
+token definition (i.e. as part of the main pattern), then the entire grammar
+will discard all context substrings from every one of its rules
+and tokens.
+
+However, you can override this universal prohibition with a second
+directive: C<< <context:> >>. If this directive appears in any rule or
+token, that rule or token I<will> save its context substring, even if a
+global C<< <nocontext:> >> is in effect.
+
+This means that this grammar:
+
+ qr{
+ <Command>
+
+ <rule: Command>
+ <nocontext:>
+ <Keyword> <arg=(\S+)> ** <.ws>
+
+ <token: Keyword>
+ <Move> | <Copy> | <Delete>
+
+ # etc.
+ }x
+
+and this grammar:
+
+ qr{
+ <nocontext:>
+ <Command>
+
+ <rule: Command>
+ <Keyword> <arg=(\S+)> ** <.ws>
+
+ <token: Keyword>
+ <context:>
+ <Move> | <Copy> | <Delete>
+
+ # etc.
+ }x
+
+will behave identically (saving context substrings for keywords, but not
+for commands), except that the first version will also retain the global
+context substring (i.e. $/{""}), whereas the second version will not.
+
+Note that C<< <context:> >> and C<< <nocontext:> >> have no effect on,
+or even any interaction with, the various
+L<result distillation|"Result distillation"> mechanisms,
+which continue to work in the usual way when either or both of the
+directives is used.
+
=head2 Renaming subrule results
@@ -3793,14 +3955,6 @@
If you call a subrule using a fully qualified name (such as
C<< <List::Integral::Digit> >>), the grammar calls that
version of the rule, rather than the most-derived version.
-
-Note, however, that fully qualified subrule calls are I<not> (yet)
-polymorphically dispatched. This means that you cannot currently just
-specify the name some ancestor grammar (or C<SUPER> or C<NEXT> for that
-matter) and have the grammar find the correct inherited version of the
-subrule. At present you must specify the name of the exact ancestor
-grammar whose rule you want. This limitation will be removed in a
-future release.
=head2 Debugging named grammars
@@ -4019,9 +4173,10 @@
})
Note that you can also partially override the subrule return behaviour.
-Normally, the subrule returns the complete text it matched under the "empty
-key" of its result-hash. That is, of course, C<$MATCH{""}>, so you can
-override just that behaviour by directly assigning to that entry.
+Normally, the subrule returns the complete text it matched as its context
+substring (i.e. under the "empty key") in its result-hash. That is, of
+course, C<$MATCH{""}>, so you can override just that behaviour by
+directly assigning to that entry.
For example, if you have a rule that matches key/value pairs from a
configuration file, you might prefer that any trailing comments not be
More information about the Pkg-perl-cvs-commits
mailing list