Initial version in git
commit 8520d2b4bc
16 changed files with 2063 additions and 0 deletions
108 ChangeLog Normal file
@@ -0,0 +1,108 @@
2003-08-27  Amit Patel  <amitp@cs.stanford.edu>

        * *: (VERSION) Release 2.1.1

        * *: Added a test/ directory for test cases; I had previously put
        tests in the examples/ directory, which is a bad place to put
        them. Examples are useful for learning how Yapps works. Tests
        are for testing specific features of Yapps.

        * parsetree.py (Plus.update): Fixed a long-standing bug in which
        the FOLLOW set of 'a'+ would include 'a'. In theory this makes no
        practical difference because the 'a'+ rule eats up all the 'a'
        tokens anyway. However, it makes error messages a little bit more
        confusing because they imply that an 'a' can follow.

        * yappsrt.py (print_error): Incorporated the context object into
        the printing of error messages.

2003-08-12  Amit Patel  <amitp@cs.stanford.edu>

        * *: (VERSION) Release 2.1.0

        * parsetree.py: Improved error message generation. Instead of
        relying on the scanner to produce errors, the parser now checks
        things explicitly and produces errors directly. The parser has
        better knowledge of the context, so its error messages are more
        precise and helpful.

        * yapps_grammar.g: Instead of setting self.rule in the setup()
        method, pass it in the constructor. To make it available at
        construction time, pass it along as another attribute in the
        attribute grammar.

2003-08-11  Amit Patel  <amitp@cs.stanford.edu>

        * parsetree.py: Generated parsers now include a context object
        that describes the parse rule stack. For example, while parsing
        rule A, called from rule B, called from rule D, the context object
        will let you reconstruct the path D > B > A. [Thanks David Morley]

        * *: Removed all output when things are working
        properly; all warnings/errors now go to stderr.

        * yapps_grammar.g: Added support for A? meaning an optional A.
        This is equivalent to [A].

        * yapps2.py: Design - refactored yapps2.py into yapps2.py +
        grammar.py + parsetree.py. grammar.py is automatically generated
        from grammar.g. Added lots of docstrings.

2003-08-09  Amit Patel  <amitp@cs.stanford.edu>

        * yapps2.py: Documentation - added doctest tests to some of the
        set algorithms in class Generator.

        * yapps2.py: Style - removed "import *" everywhere.

        * yapps2.py: Style - moved to Python 2 -- string methods,
        list comprehensions, inline syntax for apply

2003-07-28  Amit Patel  <amitp@cs.stanford.edu>

        * *: (VERSION) Release 2.0.4

        * yappsrt.py: Style - replaced raising string exceptions
        with raising class exceptions. [Thanks Alex Verstak]

        * yappsrt.py: (SyntaxError) Bug fix - SyntaxError.__init__ should
        call Exception.__init__

        * yapps2.py: Bug fix - identifiers in grammar rules that had
        digits in them were not accessible in the {{python code}} sections
        of the grammar.

        * yapps2.py: Style - changed "b >= a and b < c" to "a <= b < c"

        * yapps2.py: Style - change "`expr`" to "repr(expr)"

2002-08-00  Amit Patel  <amitp@cs.stanford.edu>

        * *: (VERSION) Release 2.0.3

        * yapps2.py: Bug fix - inline tokens using the r"" syntax weren't
        treated properly.

2002-04-00  Amit Patel  <amitp@cs.stanford.edu>

        * *: (VERSION) Release 2.0.2

        * yapps2.py: Bug fix - when generating the "else" clause, if the
        comment was too long, Yapps was not emitting a newline. [Thanks
        Steven Engelhardt]

2001-10-00  Amit Patel  <amitp@cs.stanford.edu>

        * *: (VERSION) Release 2.0.1

        * yappsrt.py: (SyntaxError) Style - the exception classes now
        inherit from Exception. [Thanks Rich Salz]

        * yappsrt.py: (Scanner) Performance - instead of passing the set
        of tokens into the scanner at initialization time, we build the
        list at compile time. You can still override the default list per
        instance of the scanner, but in the common case, we don't have to
        rebuild the token list. [Thanks Amaury Forgeot d'Arc]
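The Plus.update entry above is easier to see with a toy calculation. The sketch below is illustrative only, not the Yapps implementation: for a clause 'a'+, the token 'a' may follow each repetition of the child, but only the outer context may follow the finished clause.

    # Minimal sketch of the intended FOLLOW-set behavior for child+ .
    def follow_sets_for_plus(child_first, outer_follow):
        # Inside the repetition, another iteration of the child may follow...
        follow_child = sorted(set(child_first) | set(outer_follow))
        # ...but the clause as a whole is only followed by the outer context.
        follow_plus = sorted(outer_follow)
        return follow_child, follow_plus

    assert follow_sets_for_plus(["'a'"], ["'c'"]) == (["'a'", "'c'"], ["'c'"])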
20 LICENSE Normal file
@@ -0,0 +1,20 @@
<http://www.opensource.org/licenses/mit-license.php>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
78 NOTES Normal file
@@ -0,0 +1,78 @@
[Last updated August 11, 2003]

Notes for myself:

Document the LINENO trick

Add a way to have a self-contained mode that doesn't require yappsrt?

Add a debugging mode that helps you understand how the grammar
is constructed and how things are being parsed

Optimize (remove) unused variables

Yapps produces a bunch of inline list literals. We should be able to
instead create these lists as class variables (but this makes it
harder to read the code). Also, 'A in X' could be written
'X.has_key(A)' if we can convert the lists into dictionaries ahead
of time.

Add a convenience to automatically gather up the values returned
from subpatterns, put them into a list, and return them

"Gather" mode that simply outputs the return values for certain nodes.
For example, if you just want all expressions, you could ask yapps
to gather the results of the 'expr' rule into a list. This would
ignore all the higher level structure.

Improve the documentation

Write some larger examples (probably XML/HTML)

EOF needs to be dealt with. It's probably a token that can match anywhere.

Get rid of old-style regex support

Use SRE's lex support to speed up lexing (this may be hard given that
yapps allows for context-sensitive lexers)

Look over Dan Connoly's experience with Yapps (bugs, frustrations, etc.)
and see what improvements could be made

Add something to pretty-print the grammar (without the actions)

Maybe conditionals? Follow this rule only if <condition> holds.
But this would be useful mainly when multiple rules match, and we
want the first matching rule. The conditional would mean we skip to
the next rule. Maybe this is part of the attribute grammar system,
where rule X<0> can be specified separately from X<N>.

Convenience functions that could build return values for all rules
without specifying the code for each rule individually

Patterns (abstractions over rules) -- for example, comma separated values
have a certain rule pattern that gets replicated all over the place

These are rules that take other rules as parameters.

    rule list<separator,element>:   {{ result = [] }}
        [ element {{ result.append(element) }}
          ( separator element {{ result.append(element) }}
          )*
        ] {{ return result }}

Inheritance of parser and scanner classes. The base class (Parser)
may define common tokens like ID, STR, NUM, space, comments, EOF,
etc., and common rules/patterns like optional, sequence,
delimiter-separated sequence.

Why do A? and (A | ) produce different code? It seems that they
should produce the very same code.

Look at everyone's Yapps grammars, and come up with larger examples
    http://www.w3.org/2000/10/swap/SemEnglish.g
    http://www.w3.org/2000/10/swap/kifExpr.g
    http://www.w3.org/2000/10/swap/rdfn3.g

Construct lots of erroneous grammars and see what Yapps does with them
(improve error reporting)
64 examples/calc.g Normal file
@@ -0,0 +1,64 @@
# This calculator supports the usual (numbers, add, subtract,
# multiply, divide), global variables (stored in a global variable in
# Python), and local variables (stored in an attribute passed around
# in the grammar).


globalvars = {}       # We will store the calculator's variables here

def lookup(map, name):
    for x, v in map:
        if x == name: return v
    if not globalvars.has_key(name): print 'Undefined (defaulting to 0):', name
    return globalvars.get(name, 0)

%%
parser Calculator:
    ignore:    "[ \r\t\n]+"
    token END: "$"
    token NUM: "[0-9]+"
    token VAR: "[a-zA-Z_]+"

    # Each line can either be an expression or an assignment statement
    rule goal:   expr<<[]>> END            {{ print '=', expr }}
                                           {{ return expr }}
               | "set" VAR expr<<[]>> END  {{ globalvars[VAR] = expr }}
                                           {{ print VAR, '=', expr }}
                                           {{ return expr }}

    # An expression is the sum and difference of factors
    rule expr<<V>>:   factor<<V>>        {{ n = factor }}
                    ( "[+]" factor<<V>>  {{ n = n+factor }}
                    |  "-"  factor<<V>>  {{ n = n-factor }}
                    )*                   {{ return n }}

    # A factor is the product and division of terms
    rule factor<<V>>: term<<V>>          {{ v = term }}
                    ( "[*]" term<<V>>    {{ v = v*term }}
                    |  "/"  term<<V>>    {{ v = v/term }}
                    )*                   {{ return v }}

    # A term is a number, variable, or an expression surrounded by parentheses
    rule term<<V>>:
                 NUM                      {{ return atoi(NUM) }}
               | VAR                      {{ return lookup(V, VAR) }}
               | "\\(" expr "\\)"         {{ return expr }}
               | "let" VAR "=" expr<<V>>  {{ V = [(VAR, expr)] + V }}
                 "in" expr<<V>>           {{ return expr }}
%%
if __name__ == '__main__':
    print 'Welcome to the calculator sample for Yapps 2.'
    print '  Enter either "<expression>" or "set <var> <expression>",'
    print '  or just press return to exit. An expression can have'
    print '  local variables:  let x = expr in expr'
    # We could have put this loop into the parser, by making the
    # `goal' rule use (expr | set var expr)*, but by putting the
    # loop into Python code, we can make it interactive (i.e., enter
    # one expression, get the result, enter another expression, etc.)
    while 1:
        try: s = raw_input('>>> ')
        except EOFError: break
        if not strip(s): break
        parse('goal', s)
    print 'Bye.'
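For reference, a sample session with the generated calculator. This is a hedged sketch: it assumes calc.py was produced by running yapps2.py on this file under Python 2, with yappsrt.py on the path and the string functions atoi/strip available to the actions above.

    # python calc.py   (illustrative transcript, derived from the actions above)
    #   >>> 1 + 2 * 3             prints: = 7
    #   >>> set x 10              prints: x = 10
    #   >>> let y = 2 in y * x    prints: = 20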
21 examples/expr.g Normal file
@@ -0,0 +1,21 @@
parser Calculator:
    token END: "$"
    token NUM: "[0-9]+"

    rule goal:   expr END         {{ return expr }}

    # An expression is the sum and difference of factors
    rule expr:   factor           {{ v = factor }}
               ( "[+]" factor     {{ v = v+factor }}
               |  "-"  factor     {{ v = v-factor }}
               )*                 {{ return v }}

    # A factor is the product and division of terms
    rule factor: term             {{ v = term }}
               ( "[*]" term       {{ v = v*term }}
               |  "/"  term       {{ v = v/term }}
               )*                 {{ return v }}

    # A term is either a number or an expression surrounded by parentheses
    rule term:   NUM              {{ return atoi(NUM) }}
               | "\\(" expr "\\)" {{ return expr }}
15 examples/lisp.g Normal file
@@ -0,0 +1,15 @@
# This parser can parse a simple subset of Lisp's syntax.

parser Lisp:
    ignore:    r'\s+'
    token NUM: r'[0-9]+'
    token ID:  r'[-+*/!@$%^&=.a-zA-Z0-9_]+'
    token STR: r'"([^\\"]+|\\.)*"'

    rule expr: ID        {{ return ('id',ID) }}
             | STR       {{ return ('str',eval(STR)) }}
             | NUM       {{ return ('num',int(NUM)) }}
             | r"\("
                  {{ e = [] }}                      # initialize the list
               ( expr    {{ e.append(expr) }} ) *   # put each expr into the list
               r"\)"     {{ return e }}             # return the list
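A hedged usage sketch (Python 2): assuming lisp.py was generated from this grammar by yapps2.py, the generated module exposes parse(rule, text), as the "# End -- grammar generated by Yapps" footer in grammar.py shows.

    import lisp
    tree = lisp.parse('expr', '(one "two" 3)')
    print tree   # expected: [('id', 'one'), ('str', 'two'), ('num', 3)]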
66 examples/xml.g Normal file
@@ -0,0 +1,66 @@
#!/usr/bin/python2

# xml.g
#
# Amit J. Patel, August 2003
#
# Simple (non-conforming, non-validating) parsing of XML documents,
# based on Robert D. Cameron's "REX" shallow parser. It doesn't
# handle CDATA and lots of other stuff; it's meant to demonstrate
# Yapps, not replace a proper XML parser.

%%

parser xml:
    token nodetext: r'[^<>]+'
    token attrtext_singlequote: "[^']*"
    token attrtext_doublequote: '[^"]*'
    token SP: r'\s'
    token id: r'[a-zA-Z_:][a-zA-Z0-9_:.-]*'

    rule node:
          r'<!--.*?-->'            {{ return ['!--comment'] }}
        | r'<!\[CDATA\[.*?\]\]>'   {{ return ['![CDATA['] }}
        | r'<!' SP* id '[^>]*>'    {{ return ['!doctype'] }}
        | '<' SP* id SP* attributes SP*   {{ startid = id }}
          ( '>' nodes '</' SP* id SP* '>' {{ assert startid == id, 'Mismatched tags <%s> ... </%s>' % (startid, id) }}
                                          {{ return [id, attributes] + nodes }}
          | '/\s*>'                       {{ return [id, attributes] }}
          )
        | nodetext                 {{ return nodetext }}

    rule nodes: {{ result = [] }}
        ( node  {{ result.append(node) }}
        ) *     {{ return result }}

    rule attribute: id SP* '=' SP*
        ( '"' attrtext_doublequote '"'   {{ return (id, attrtext_doublequote) }}
        | "'" attrtext_singlequote "'"   {{ return (id, attrtext_singlequote) }}
        )

    rule attributes: {{ result = {} }}
        ( attribute SP*  {{ result[attribute[0]] = attribute[1] }}
        ) *              {{ return result }}

%%

if __name__ == '__main__':
    tests = ['<!-- hello -->',
             'some text',
             '< bad xml',
             '<br />',
             '< spacey a = "foo" / >',
             '<a href="foo">text ... </a>',
             '<begin> middle </end>',
             '<begin> <nested attr=\'baz\' another="hey"> foo </nested> <nested> bar </nested> </begin>',
             ]
    print
    print '____Running tests_______________________________________'
    for test in tests:
        print
        try:
            parser = xml(xmlScanner(test))
            output = '%s ==> %s' % (repr(test), repr(parser.node()))
        except (yappsrt.SyntaxError, AssertionError), e:
            output = '%s ==> FAILED ==> %s' % (repr(test), e)
        print output
234 grammar.py Normal file
@@ -0,0 +1,234 @@
#!/usr/bin/python2
#
# grammar.py, part of Yapps 2 - yet another python parser system
# Copyright 1999-2003 by Amit J. Patel <amitp@cs.stanford.edu>
#
# This version of the Yapps 2 grammar can be distributed under the
# terms of the MIT open source license, either found in the LICENSE
# file included with the Yapps distribution
# <http://theory.stanford.edu/~amitp/yapps/> or at
# <http://www.opensource.org/licenses/mit-license.php>
#

"""Parser for Yapps grammars.

This file defines the grammar of Yapps grammars. Naturally, it is
implemented in Yapps. The grammar.py module needed by Yapps is built
by running Yapps on yapps_grammar.g. (Holy circularity, Batman!)

"""

import sys, re
import parsetree

######################################################################
def cleanup_choice(rule, lst):
    if len(lst) == 0: return Sequence(rule, [])
    if len(lst) == 1: return lst[0]
    return parsetree.Choice(rule, *tuple(lst))

def cleanup_sequence(rule, lst):
    if len(lst) == 1: return lst[0]
    return parsetree.Sequence(rule, *tuple(lst))

def resolve_name(rule, tokens, id, args):
    if id in [x[0] for x in tokens]:
        # It's a token
        if args:
            print 'Warning: ignoring parameters on TOKEN %s<<%s>>' % (id, args)
        return parsetree.Terminal(rule, id)
    else:
        # It's a name, so assume it's a nonterminal
        return parsetree.NonTerminal(rule, id, args)


# Begin -- grammar generated by Yapps
import sys, re
import yappsrt

class ParserDescriptionScanner(yappsrt.Scanner):
    patterns = [
        ('"rule"', re.compile('rule')),
        ('"ignore"', re.compile('ignore')),
        ('"token"', re.compile('token')),
        ('"option"', re.compile('option')),
        ('":"', re.compile(':')),
        ('"parser"', re.compile('parser')),
        ('[ \t\r\n]+', re.compile('[ \t\r\n]+')),
        ('#.*?\r?\n', re.compile('#.*?\r?\n')),
        ('EOF', re.compile('$')),
        ('ATTR', re.compile('<<.+?>>')),
        ('STMT', re.compile('{{.+?}}')),
        ('ID', re.compile('[a-zA-Z_][a-zA-Z_0-9]*')),
        ('STR', re.compile('[rR]?\'([^\\n\'\\\\]|\\\\.)*\'|[rR]?"([^\\n"\\\\]|\\\\.)*"')),
        ('LP', re.compile('\\(')),
        ('RP', re.compile('\\)')),
        ('LB', re.compile('\\[')),
        ('RB', re.compile('\\]')),
        ('OR', re.compile('[|]')),
        ('STAR', re.compile('[*]')),
        ('PLUS', re.compile('[+]')),
        ('QUEST', re.compile('[?]')),
        ('COLON', re.compile(':')),
    ]
    def __init__(self, str):
        yappsrt.Scanner.__init__(self,None,['[ \t\r\n]+', '#.*?\r?\n'],str)

class ParserDescription(yappsrt.Parser):
    Context = yappsrt.Context
    def LINENO(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'LINENO', [])
        return 1 + self._scanner.get_input_scanned().count('\n')

    def Parser(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Parser', [])
        self._scan('"parser"')
        ID = self._scan('ID')
        self._scan('":"')
        Options = self.Options(_context)
        Tokens = self.Tokens(_context)
        Rules = self.Rules(Tokens, _context)
        EOF = self._scan('EOF')
        return parsetree.Generator(ID,Options,Tokens,Rules)

    def Options(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Options', [])
        opt = {}
        while self._peek() == '"option"':
            self._scan('"option"')
            self._scan('":"')
            Str = self.Str(_context)
            opt[Str] = 1
        if self._peek() not in ['"option"', '"token"', '"ignore"', 'EOF', '"rule"']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['"option"', '"token"', '"ignore"', 'EOF', '"rule"']))
        return opt

    def Tokens(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Tokens', [])
        tok = []
        while self._peek() in ['"token"', '"ignore"']:
            _token = self._peek()
            if _token == '"token"':
                self._scan('"token"')
                ID = self._scan('ID')
                self._scan('":"')
                Str = self.Str(_context)
                tok.append( (ID,Str) )
            elif _token == '"ignore"':
                self._scan('"ignore"')
                self._scan('":"')
                Str = self.Str(_context)
                tok.append( ('#ignore',Str) )
            else:
                raise yappsrt.SyntaxError(_token[0], 'Could not match Tokens')
        if self._peek() not in ['"token"', '"ignore"', 'EOF', '"rule"']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['"token"', '"ignore"', 'EOF', '"rule"']))
        return tok

    def Rules(self, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Rules', [tokens])
        rul = []
        while self._peek() == '"rule"':
            LINENO = self.LINENO(_context)
            self._scan('"rule"')
            ID = self._scan('ID')
            OptParam = self.OptParam(_context)
            self._scan('":"')
            ClauseA = self.ClauseA(ID, tokens, _context)
            rul.append( (ID, OptParam, ClauseA) )
        if self._peek() not in ['"rule"', 'EOF']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['"rule"', 'EOF']))
        return rul

    def ClauseA(self, rule, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'ClauseA', [rule, tokens])
        ClauseB = self.ClauseB(rule, tokens, _context)
        v = [ClauseB]
        while self._peek() == 'OR':
            OR = self._scan('OR')
            ClauseB = self.ClauseB(rule, tokens, _context)
            v.append(ClauseB)
        if self._peek() not in ['OR', 'RP', 'RB', '"rule"', 'EOF']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['OR', 'RP', 'RB', '"rule"', 'EOF']))
        return cleanup_choice(rule, v)

    def ClauseB(self, rule, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'ClauseB', [rule, tokens])
        v = []
        while self._peek() in ['STR', 'ID', 'LP', 'LB', 'STMT']:
            ClauseC = self.ClauseC(rule, tokens, _context)
            v.append(ClauseC)
        if self._peek() not in ['STR', 'ID', 'LP', 'LB', 'STMT', 'OR', 'RP', 'RB', '"rule"', 'EOF']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['STR', 'ID', 'LP', 'LB', 'STMT', 'OR', 'RP', 'RB', '"rule"', 'EOF']))
        return cleanup_sequence(rule, v)

    def ClauseC(self, rule, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'ClauseC', [rule, tokens])
        ClauseD = self.ClauseD(rule, tokens, _context)
        _token = self._peek()
        if _token == 'PLUS':
            PLUS = self._scan('PLUS')
            return parsetree.Plus(rule, ClauseD)
        elif _token == 'STAR':
            STAR = self._scan('STAR')
            return parsetree.Star(rule, ClauseD)
        elif _token == 'QUEST':
            QUEST = self._scan('QUEST')
            return parsetree.Option(rule, ClauseD)
        elif _token not in ['"ignore"', '"token"', '"option"', '":"', '"parser"', 'ATTR', 'COLON']:
            return ClauseD
        else:
            raise yappsrt.SyntaxError(_token[0], 'Could not match ClauseC')

    def ClauseD(self, rule, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'ClauseD', [rule, tokens])
        _token = self._peek()
        if _token == 'STR':
            STR = self._scan('STR')
            t = (STR, eval(STR,{},{}))
            if t not in tokens: tokens.insert( 0, t )
            return parsetree.Terminal(rule, STR)
        elif _token == 'ID':
            ID = self._scan('ID')
            OptParam = self.OptParam(_context)
            return resolve_name(rule, tokens, ID, OptParam)
        elif _token == 'LP':
            LP = self._scan('LP')
            ClauseA = self.ClauseA(rule, tokens, _context)
            RP = self._scan('RP')
            return ClauseA
        elif _token == 'LB':
            LB = self._scan('LB')
            ClauseA = self.ClauseA(rule, tokens, _context)
            RB = self._scan('RB')
            return parsetree.Option(rule, ClauseA)
        elif _token == 'STMT':
            STMT = self._scan('STMT')
            return parsetree.Eval(rule, STMT[2:-2])
        else:
            raise yappsrt.SyntaxError(_token[0], 'Could not match ClauseD')

    def OptParam(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'OptParam', [])
        _token = self._peek()
        if _token == 'ATTR':
            ATTR = self._scan('ATTR')
            return ATTR[2:-2]
        elif _token not in ['"ignore"', '"token"', '"option"', '"parser"', 'COLON']:
            return ''
        else:
            raise yappsrt.SyntaxError(_token[0], 'Could not match OptParam')

    def Str(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Str', [])
        STR = self._scan('STR')
        return eval(STR,{},{})


def parse(rule, text):
    P = ParserDescription(ParserDescriptionScanner(text))
    return yappsrt.wrap_error_reporter(P, rule)

# End -- grammar generated by Yapps
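Because this module is itself a generated parser, it can be driven directly. A hedged sketch (Python 2, run from the source directory, with yappsrt.py present): parse('Parser', ...) returns the parsetree.Generator that yapps2.generate() feeds to generate_output().

    import grammar
    src = open('examples/expr.g').read()
    g = grammar.parse('Parser', src)   # parsetree.Generator, or None on a parse error
    if g: g.dump_information()         # print FIRST/FOLLOW sets for each rule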
645 parsetree.py Normal file
@@ -0,0 +1,645 @@
#!/usr/bin/python2
#
# parsetree.py, part of Yapps 2 - yet another python parser system
# Copyright 1999-2003 by Amit J. Patel <amitp@cs.stanford.edu>
#
# This version of the Yapps 2 Runtime can be distributed under the
# terms of the MIT open source license, either found in the LICENSE file
# included with the Yapps distribution
# <http://theory.stanford.edu/~amitp/yapps/> or at
# <http://www.opensource.org/licenses/mit-license.php>
#

"""Classes used to represent parse trees and generate output.

This module defines the Generator class, which drives the generation
of Python output from a grammar parse tree. It also defines nodes
used to represent the parse tree; they are derived from class Node.

The main logic of Yapps is in this module.
"""

import sys, re

######################################################################
INDENT = ' '*4
class Generator:

    # TODO: many of the methods here should be class methods, not instance methods

    def __init__(self, name, options, tokens, rules):
        self.change_count = 0
        self.name = name
        self.options = options
        self.preparser = ''
        self.postparser = None

        self.tokens = {}     # Map from tokens to regexps
        self.ignore = []     # List of token names to ignore in parsing
        self.terminals = []  # List of token names (to maintain ordering)
        for n, t in tokens:
            if n == '#ignore':
                n = t
                self.ignore.append(n)
            if n in self.tokens.keys() and self.tokens[n] != t:
                print >>sys.stderr, 'Warning: token %s defined more than once.' % n
            self.tokens[n] = t
            self.terminals.append(n)

        self.rules = {}   # Map from rule names to parser nodes
        self.params = {}  # Map from rule names to parameters
        self.goals = []   # List of rule names (to maintain ordering)
        for n,p,r in rules:
            self.params[n] = p
            self.rules[n] = r
            self.goals.append(n)

        self.output = sys.stdout

    def has_option(self, name):
        return self.options.get(name, 0)

    def non_ignored_tokens(self):
        return [x for x in self.terminals if x not in self.ignore]

    def changed(self):
        """Increments the change count.

        >>> t = Generator('', [], [], [])
        >>> old_count = t.change_count
        >>> t.changed()
        >>> assert t.change_count == old_count + 1
        """
        self.change_count = 1+self.change_count

    def set_subtract(self, a, b):
        """Returns the elements of a that are not in b.

        >>> t = Generator('', [], [], [])
        >>> t.set_subtract([], [])
        []
        >>> t.set_subtract([1, 2], [1, 2])
        []
        >>> t.set_subtract([1, 2, 3], [2])
        [1, 3]
        >>> t.set_subtract([1], [2, 3, 4])
        [1]
        """
        result = []
        for x in a:
            if x not in b:
                result.append(x)
        return result

    def subset(self, a, b):
        """True iff all elements of sequence a are inside sequence b

        >>> t = Generator('', [], [], [])
        >>> t.subset([], [1, 2, 3])
        1
        >>> t.subset([1, 2, 3], [])
        0
        >>> t.subset([1], [1, 2, 3])
        1
        >>> t.subset([3, 2, 1], [1, 2, 3])
        1
        >>> t.subset([1, 1, 1], [1, 2, 3])
        1
        >>> t.subset([1, 2, 3], [1, 1, 1])
        0
        """
        for x in a:
            if x not in b:
                return 0
        return 1

    def equal_set(self, a, b):
        """True iff subset(a, b) and subset(b, a)

        >>> t = Generator('', [], [], [])
        >>> a_set = [1, 2, 3]
        >>> t.equal_set(a_set, a_set)
        1
        >>> t.equal_set(a_set, a_set[:])
        1
        >>> t.equal_set([], a_set)
        0
        >>> t.equal_set([1, 2, 3], [3, 2, 1])
        1
        """
        if len(a) != len(b): return 0
        if a == b: return 1
        return self.subset(a, b) and self.subset(b, a)

    def add_to(self, parent, additions):
        "Modify _parent_ to include all elements in _additions_"
        for x in additions:
            if x not in parent:
                parent.append(x)
                self.changed()

    def equate(self, a, b):
        """Extend (a) and (b) so that they contain each others' elements.

        >>> t = Generator('', [], [], [])
        >>> a = [1, 2]
        >>> b = [2, 3]
        >>> t.equate(a, b)
        >>> a
        [1, 2, 3]
        >>> b
        [2, 3, 1]
        """
        self.add_to(a, b)
        self.add_to(b, a)

    def write(self, *args):
        for a in args:
            self.output.write(a)

    def in_test(self, expr, full, set):
        """Generate a test of (expr) being in (set), where (set) is a subset of (full)

        expr is a string (Python expression)
        set is a list of values (which will be converted with repr)
        full is the list of all values expr could possibly evaluate to

        >>> t = Generator('', [], [], [])
        >>> t.in_test('x', [1,2,3,4], [])
        '0'
        >>> t.in_test('x', [1,2,3,4], [1,2,3,4])
        '1'
        >>> t.in_test('x', [1,2,3,4], [1])
        'x == 1'
        >>> t.in_test('a+b', [1,2,3,4], [1,2])
        'a+b in [1, 2]'
        >>> t.in_test('x', [1,2,3,4,5], [1,2,3])
        'x not in [4, 5]'
        >>> t.in_test('x', [1,2,3,4,5], [1,2,3,4])
        'x != 5'
        """

        if not set: return '0'
        if len(set) == 1: return '%s == %s' % (expr, repr(set[0]))
        if full and len(set) > len(full)/2:
            # Reverse the sense of the test.
            not_set = [x for x in full if x not in set]
            return self.not_in_test(expr, full, not_set)
        return '%s in %s' % (expr, repr(set))

    def not_in_test(self, expr, full, set):
        """Like in_test, but the reverse test."""
        if not set: return '1'
        if len(set) == 1: return '%s != %s' % (expr, repr(set[0]))
        return '%s not in %s' % (expr, repr(set))

    def peek_call(self, a):
        """Generate a call to scan for a token in the set 'a'"""
        assert type(a) == type([])
        a_set = (repr(a)[1:-1])
        if self.equal_set(a, self.non_ignored_tokens()): a_set = ''
        if self.has_option('context-insensitive-scanner'): a_set = ''
        return 'self._peek(%s)' % a_set

    def peek_test(self, a, b):
        """Generate a call to test whether the next token (which could be any of
        the elements in a) is in the set b."""
        if self.subset(a, b): return '1'
        if self.has_option('context-insensitive-scanner'): a = self.non_ignored_tokens()
        return self.in_test(self.peek_call(a), a, b)

    def not_peek_test(self, a, b):
        """Like peek_test, but the opposite sense."""
        if self.subset(a, b): return '0'
        return self.not_in_test(self.peek_call(a), a, b)

    def calculate(self):
        """The main loop to compute the epsilon, first, follow sets.
        The loop continues until the sets converge. This works because
        each set can only get larger, so when they stop getting larger,
        we're done."""
        # First we determine whether a rule accepts epsilon (the empty sequence)
        while 1:
            for r in self.goals:
                self.rules[r].setup(self)
            if self.change_count == 0: break
            self.change_count = 0

        # Now we compute the first/follow sets
        while 1:
            for r in self.goals:
                self.rules[r].update(self)
            if self.change_count == 0: break
            self.change_count = 0

    def dump_information(self):
        """Display the grammar in somewhat human-readable form."""
        self.calculate()
        for r in self.goals:
            print '    _____' + '_'*len(r)
            print ('___/Rule '+r+'\\' + '_'*80)[:79]
            queue = [self.rules[r]]
            while queue:
                top = queue[0]
                del queue[0]

                print 'Rule', repr(top), 'of class', top.__class__.__name__
                top.first.sort()
                top.follow.sort()
                eps = []
                if top.accepts_epsilon: eps = ['(null)']
                print '     FIRST:', ', '.join(top.first+eps)
                print '    FOLLOW:', ', '.join(top.follow)
                for x in top.get_children(): queue.append(x)

    def generate_output(self):
        self.calculate()
        self.write(self.preparser)
        self.write("# Begin -- grammar generated by Yapps\n")
        self.write("import sys, re\n")
        self.write("import yappsrt\n")
        self.write("\n")
        self.write("class ", self.name, "Scanner(yappsrt.Scanner):\n")
        self.write("    patterns = [\n")
        for p in self.terminals:
            self.write("        (%s, re.compile(%s)),\n" % (
                repr(p), repr(self.tokens[p])))
        self.write("    ]\n")
        self.write("    def __init__(self, str):\n")
        self.write("        yappsrt.Scanner.__init__(self,None,%s,str)\n" %
                   repr(self.ignore))
        self.write("\n")

        self.write("class ", self.name, "(yappsrt.Parser):\n")
        self.write(INDENT, "Context = yappsrt.Context\n")
        for r in self.goals:
            self.write(INDENT, "def ", r, "(self")
            if self.params[r]: self.write(", ", self.params[r])
            self.write(", _parent=None):\n")
            self.write(INDENT+INDENT, "_context = self.Context(_parent, self._scanner, self._pos, %s, [%s])\n" %
                       (repr(r), self.params.get(r, '')))
            self.rules[r].output(self, INDENT+INDENT)
            self.write("\n")

        self.write("\n")
        self.write("def parse(rule, text):\n")
        self.write("    P = ", self.name, "(", self.name, "Scanner(text))\n")
        self.write("    return yappsrt.wrap_error_reporter(P, rule)\n")
        self.write("\n")
        if self.postparser is not None:
            self.write("# End -- grammar generated by Yapps\n")
            self.write(self.postparser)
        else:
            self.write("if __name__ == '__main__':\n")
            self.write(INDENT, "from sys import argv, stdin\n")
            self.write(INDENT, "if len(argv) >= 2:\n")
            self.write(INDENT*2, "if len(argv) >= 3:\n")
            self.write(INDENT*3, "f = open(argv[2],'r')\n")
            self.write(INDENT*2, "else:\n")
            self.write(INDENT*3, "f = stdin\n")
            self.write(INDENT*2, "print parse(argv[1], f.read())\n")
            self.write(INDENT, "else: print >>sys.stderr, 'Args: <rule> [<filename>]'\n")
            self.write("# End -- grammar generated by Yapps\n")

######################################################################
class Node:
    """This is the base class for all components of a grammar."""
    def __init__(self, rule):
        self.rule = rule  # name of the rule containing this node
        self.first = []
        self.follow = []
        self.accepts_epsilon = 0

    def setup(self, gen):
        # Setup will change accepts_epsilon,
        # sometimes from 0 to 1 but never 1 to 0.
        # It will take a finite number of steps to set things up
        pass

    def used(self, vars):
        "Return two lists: one of vars used, and the other of vars assigned"
        return vars, []

    def get_children(self):
        "Return a list of sub-nodes"
        return []

    def __repr__(self):
        return str(self)

    def update(self, gen):
        if self.accepts_epsilon:
            gen.add_to(self.first, self.follow)

    def output(self, gen, indent):
        "Write out code to _gen_ with _indent_:string indentation"
        gen.write(indent, "assert 0 # Invalid parser node\n")

class Terminal(Node):
    """This class stores terminal nodes, which are tokens."""
    def __init__(self, rule, token):
        Node.__init__(self, rule)
        self.token = token
        self.accepts_epsilon = 0

    def __str__(self):
        return self.token

    def update(self, gen):
        Node.update(self, gen)
        if self.first != [self.token]:
            self.first = [self.token]
            gen.changed()

    def output(self, gen, indent):
        gen.write(indent)
        if re.match('[a-zA-Z_][a-zA-Z_0-9]*$', self.token):
            gen.write(self.token, " = ")
        gen.write("self._scan(%s)\n" % repr(self.token))

class Eval(Node):
    """This class stores evaluation nodes, from {{ ... }} clauses."""
    def __init__(self, rule, expr):
        Node.__init__(self, rule)
        self.expr = expr

    def setup(self, gen):
        Node.setup(self, gen)
        if not self.accepts_epsilon:
            self.accepts_epsilon = 1
            gen.changed()

    def __str__(self):
        return '{{ %s }}' % self.expr.strip()

    def output(self, gen, indent):
        gen.write(indent, self.expr.strip(), '\n')

class NonTerminal(Node):
    """This class stores nonterminal nodes, which are rules with arguments."""
    def __init__(self, rule, name, args):
        Node.__init__(self, rule)
        self.name = name
        self.args = args

    def setup(self, gen):
        Node.setup(self, gen)
        try:
            self.target = gen.rules[self.name]
            if self.accepts_epsilon != self.target.accepts_epsilon:
                self.accepts_epsilon = self.target.accepts_epsilon
                gen.changed()
        except KeyError:  # Oops, it's nonexistent
            print >>sys.stderr, 'Error: no rule <%s>' % self.name
            self.target = self

    def __str__(self):
        return '%s' % self.name

    def update(self, gen):
        Node.update(self, gen)
        gen.equate(self.first, self.target.first)
        gen.equate(self.follow, self.target.follow)

    def output(self, gen, indent):
        gen.write(indent)
        gen.write(self.name, " = ")
        args = self.args
        if args: args += ', '
        args += '_context'
        gen.write("self.", self.name, "(", args, ")\n")

class Sequence(Node):
    """This class stores a sequence of nodes (A B C ...)"""
    def __init__(self, rule, *children):
        Node.__init__(self, rule)
        self.children = children

    def setup(self, gen):
        Node.setup(self, gen)
        for c in self.children: c.setup(gen)

        if not self.accepts_epsilon:
            # If it's not already accepting epsilon, it might now do so.
            for c in self.children:
                # any non-epsilon means all is non-epsilon
                if not c.accepts_epsilon: break
            else:
                self.accepts_epsilon = 1
                gen.changed()

    def get_children(self):
        return self.children

    def __str__(self):
        return '( %s )' % ' '.join(map(str, self.children))

    def update(self, gen):
        Node.update(self, gen)
        for g in self.children:
            g.update(gen)

        empty = 1
        for g_i in range(len(self.children)):
            g = self.children[g_i]

            if empty: gen.add_to(self.first, g.first)
            if not g.accepts_epsilon: empty = 0

            if g_i == len(self.children)-1:
                next = self.follow
            else:
                next = self.children[1+g_i].first
            gen.add_to(g.follow, next)

        if self.children:
            gen.add_to(self.follow, self.children[-1].follow)

    def output(self, gen, indent):
        if self.children:
            for c in self.children:
                c.output(gen, indent)
        else:
            # Placeholder for empty sequences, just in case
            gen.write(indent, 'pass\n')

class Choice(Node):
    """This class stores a choice between nodes (A | B | C | ...)"""
    def __init__(self, rule, *children):
        Node.__init__(self, rule)
        self.children = children

    def setup(self, gen):
        Node.setup(self, gen)
        for c in self.children: c.setup(gen)

        if not self.accepts_epsilon:
            for c in self.children:
                if c.accepts_epsilon:
                    self.accepts_epsilon = 1
                    gen.changed()

    def get_children(self):
        return self.children

    def __str__(self):
        return '( %s )' % ' | '.join(map(str, self.children))

    def update(self, gen):
        Node.update(self, gen)
        for g in self.children:
            g.update(gen)

        for g in self.children:
            gen.add_to(self.first, g.first)
            gen.add_to(self.follow, g.follow)
        for g in self.children:
            gen.add_to(g.follow, self.follow)
        if self.accepts_epsilon:
            gen.add_to(self.first, self.follow)

    def output(self, gen, indent):
        test = "if"
        gen.write(indent, "_token = ", gen.peek_call(self.first), "\n")
        tokens_seen = []
        tokens_unseen = self.first[:]
        if gen.has_option('context-insensitive-scanner'):
            # Context insensitive scanners can return ANY token,
            # not only the ones in first.
            tokens_unseen = gen.non_ignored_tokens()
        for c in self.children:
            testset = c.first[:]
            removed = []
            for x in testset:
                if x in tokens_seen:
                    testset.remove(x)
                    removed.append(x)
                if x in tokens_unseen: tokens_unseen.remove(x)
            tokens_seen = tokens_seen + testset
            if removed:
                if not testset:
                    print >>sys.stderr, 'Error in rule', self.rule+':'
                else:
                    print >>sys.stderr, 'Warning in rule', self.rule+':'
                print >>sys.stderr, ' *', self
                print >>sys.stderr, ' * These tokens could be matched by more than one clause:'
                print >>sys.stderr, ' *', ' '.join(removed)

            if testset:
                if not tokens_unseen:  # context sensitive scanners only!
                    if test == 'if':
                        # if it's the first AND last test, then
                        # we can simply put the code without an if/else
                        c.output(gen, indent)
                    else:
                        gen.write(indent, "else:")
                        t = gen.in_test('', [], testset)
                        if len(t) < 70-len(indent):
                            gen.write(' #', t)
                        gen.write("\n")
                        c.output(gen, indent+INDENT)
                else:
                    gen.write(indent, test, " ",
                              gen.in_test('_token', tokens_unseen, testset),
                              ":\n")
                    c.output(gen, indent+INDENT)
                test = "elif"

        if tokens_unseen:
            gen.write(indent, "else:\n")
            gen.write(indent, INDENT, "raise yappsrt.SyntaxError(_token[0], ")
            gen.write("'Could not match ", self.rule, "')\n")

class Wrapper(Node):
    """This is a base class for nodes that modify a single child."""
    def __init__(self, rule, child):
        Node.__init__(self, rule)
        self.child = child

    def setup(self, gen):
        Node.setup(self, gen)
        self.child.setup(gen)

    def get_children(self):
        return [self.child]

    def update(self, gen):
        Node.update(self, gen)
        self.child.update(gen)
        gen.add_to(self.first, self.child.first)
        gen.equate(self.follow, self.child.follow)

class Option(Wrapper):
    """This class represents an optional clause of the form [A]"""
    def setup(self, gen):
        Wrapper.setup(self, gen)
        if not self.accepts_epsilon:
            self.accepts_epsilon = 1
            gen.changed()

    def __str__(self):
        return '[ %s ]' % str(self.child)

    def output(self, gen, indent):
        if self.child.accepts_epsilon:
            print >>sys.stderr, 'Warning in rule', self.rule+': contents may be empty.'
        gen.write(indent, "if %s:\n" %
                  gen.peek_test(self.first, self.child.first))
        self.child.output(gen, indent+INDENT)

class Plus(Wrapper):
    """This class represents a 1-or-more repetition clause of the form A+"""
    def setup(self, gen):
        Wrapper.setup(self, gen)
        if self.accepts_epsilon != self.child.accepts_epsilon:
            self.accepts_epsilon = self.child.accepts_epsilon
            gen.changed()

    def __str__(self):
        return '%s+' % str(self.child)

    def update(self, gen):
        Wrapper.update(self, gen)
        gen.add_to(self.child.follow, self.child.first)

    def output(self, gen, indent):
        if self.child.accepts_epsilon:
            print >>sys.stderr, 'Warning in rule', self.rule+':'
            print >>sys.stderr, ' * The repeated pattern could be empty. The resulting parser may not work properly.'
        gen.write(indent, "while 1:\n")
        self.child.output(gen, indent+INDENT)
        union = self.first[:]
        gen.add_to(union, self.follow)
        gen.write(indent+INDENT, "if %s: break\n" %
                  gen.not_peek_test(union, self.child.first))

class Star(Wrapper):
    """This class represents a 0-or-more repetition clause of the form A*"""
    def setup(self, gen):
        Wrapper.setup(self, gen)
        if not self.accepts_epsilon:
            self.accepts_epsilon = 1
            gen.changed()

    def __str__(self):
        return '%s*' % str(self.child)

    def update(self, gen):
        Wrapper.update(self, gen)
        gen.add_to(self.child.follow, self.child.first)

    def output(self, gen, indent):
        if self.child.accepts_epsilon:
            print >>sys.stderr, 'Warning in rule', self.rule+':'
            print >>sys.stderr, ' * The repeated pattern could be empty. The resulting parser probably will not work properly.'
        gen.write(indent, "while %s:\n" %
                  gen.peek_test(self.follow, self.child.first))
        self.child.output(gen, indent+INDENT)

        # TODO: need to generate tests like this in lots of rules
        # TODO: do we need to do this only when it's a context-insensitive scanner?
        gen.write(indent, "if %s:\n" %
                  gen.not_peek_test(gen.non_ignored_tokens(), self.follow))
        gen.write(indent+INDENT, "raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(%s))\n" %
                  repr(self.first))
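The doctests embedded above (set_subtract, subset, equal_set, equate, in_test) double as a small regression suite; yapps2.py runs them under __main__, and they can also be run directly:

    import doctest, parsetree
    doctest.testmod(parsetree)   # silent when all embedded examples pass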
10 test/empty_clauses.g Normal file
@@ -0,0 +1,10 @@
# This parser tests the use of OR clauses with one of them being empty
#
# The output of --dump should indicate the FOLLOW set for (A | ) is 'c'.

parser Test:
    rule TestPlus: ( A | ) 'c'
    rule A: 'a'+

    rule TestStar: ( B | ) 'c'
    rule B: 'b'*
10 test/line_numbers.g Normal file
@@ -0,0 +1,10 @@
#
# The error messages produced by Yapps have a line number.
# The line number should take the Python code section into account.

# The line number should be 10.

%%

parser error_1:
    this_is_an_error;
17 test/option.g Normal file
@@ -0,0 +1,17 @@

%%

parser test_option:
    ignore:    r'\s+'
    token a:   'a'
    token b:   'b'
    token EOF: r'$'

    rule test_brackets: a [b] EOF

    rule test_question_mark: a b? EOF

%%

# The generated code for test_brackets and test_question_mark should
# be the same.
111 yapps2.py Executable file
@@ -0,0 +1,111 @@
#!/usr/bin/python2

#
# Yapps 2 - yet another python parser system
# Copyright 1999-2003 by Amit J. Patel <amitp@cs.stanford.edu>
#
# This version of Yapps 2 can be distributed under the
# terms of the MIT open source license, either found in the LICENSE file
# included with the Yapps distribution
# <http://theory.stanford.edu/~amitp/yapps/> or at
# <http://www.opensource.org/licenses/mit-license.php>
#

import sys, re

import yappsrt, parsetree

def generate(inputfilename, outputfilename='', dump=0, **flags):
    """Generate a grammar, given an input filename (X.g)
    and an output filename (defaulting to X.py)."""

    if not outputfilename:
        if inputfilename.endswith('.g'):
            outputfilename = inputfilename[:-2] + '.py'
        else:
            raise Exception('Must specify output filename if input filename is not *.g')

    DIVIDER = '\n%%\n'                 # This pattern separates the pre/post parsers
    preparser, postparser = None, None # Code before and after the parser desc

    # Read the entire file
    s = open(inputfilename,'r').read()

    # See if there's a separation between the pre-parser and parser
    f = s.find(DIVIDER)
    if f >= 0: preparser, s = s[:f]+'\n\n', s[f+len(DIVIDER):]

    # See if there's a separation between the parser and post-parser
    f = s.find(DIVIDER)
    if f >= 0: s, postparser = s[:f], '\n\n'+s[f+len(DIVIDER):]

    # Create the parser and scanner and parse the text
    scanner = grammar.ParserDescriptionScanner(s)
    if preparser: scanner.first_line_number = 1 + preparser.count('\n')
    parser = grammar.ParserDescription(scanner)
    t = yappsrt.wrap_error_reporter(parser, 'Parser')
    if t is None: return  # Failure
    if preparser is not None: t.preparser = preparser
    if postparser is not None: t.postparser = postparser

    # Check the options
    for f in t.options.keys():
        for opt,_,_ in yapps_options:
            if f == opt: break
        else:
            print >>sys.stderr, 'Warning: unrecognized option', f
    # Add command line options to the set
    for f in flags.keys(): t.options[f] = flags[f]

    # Generate the output
    if dump:
        t.dump_information()
    else:
        t.output = open(outputfilename, 'w')
        t.generate_output()

if __name__ == '__main__':
    import doctest
    doctest.testmod(sys.modules['__main__'])
    doctest.testmod(parsetree)

    # Someday I will use optparse, but Python 2.3 is too new at the moment.
    yapps_options = [
        ('context-insensitive-scanner',
         'context-insensitive-scanner',
         'Scan all tokens (see docs)'),
        ]

    import getopt
    optlist, args = getopt.getopt(sys.argv[1:], 'f:', ['help', 'dump', 'use-devel-grammar'])
    if not args or len(args) > 2:
        print >>sys.stderr, 'Usage:'
        print >>sys.stderr, '  python', sys.argv[0], '[flags] input.g [output.py]'
        print >>sys.stderr, 'Flags:'
        print >>sys.stderr, ('  --dump' + ' '*40)[:35] + 'Dump out grammar information'
        print >>sys.stderr, ('  --use-devel-grammar' + ' '*40)[:35] + 'Use the devel grammar parser from yapps_grammar.py instead of the stable grammar from grammar.py'
        for flag, _, doc in yapps_options:
            print >>sys.stderr, ('  -f' + flag + ' '*40)[:35] + doc
    else:
        # Read in the options and create a list of flags
        flags = {}
        use_devel_grammar = 0
        for opt in optlist:
            for flag, name, _ in yapps_options:
                if opt == ('-f', flag):
                    flags[name] = 1
                    break
            else:
                if opt == ('--dump', ''):
                    flags['dump'] = 1
                elif opt == ('--use-devel-grammar', ''):
                    use_devel_grammar = 1
                else:
                    print >>sys.stderr, 'Warning: unrecognized option', opt[0], opt[1]

        if use_devel_grammar:
            import yapps_grammar as grammar
        else:
            import grammar

        generate(*tuple(args), **flags)
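Note that yapps2.py binds `grammar` and `yapps_options` only inside the __main__ block, so importing it for programmatic use means supplying both first. A hedged sketch (Python 2, source directory on the path):

    import yapps2, grammar
    yapps2.grammar = grammar            # generate() resolves `grammar` as a global
    yapps2.yapps_options = []           # likewise for the option table
    yapps2.generate('examples/expr.g')  # writes examples/expr.py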
126
yapps_grammar.g
Executable file
126
yapps_grammar.g
Executable file
|
|
@ -0,0 +1,126 @@
|
|||
#!/usr/bin/python2
|
||||
#
|
||||
# grammar.py, part of Yapps 2 - yet another python parser system
|
||||
# Copyright 1999-2003 by Amit J. Patel <amitp@cs.stanford.edu>
|
||||
#
|
||||
# This version of the Yapps 2 grammar can be distributed under the
|
||||
# terms of the MIT open source license, either found in the LICENSE
|
||||
# file included with the Yapps distribution
|
||||
# <http://theory.stanford.edu/~amitp/yapps/> or at
|
||||
# <http://www.opensource.org/licenses/mit-license.php>
|
||||
#
|
||||
|
||||
"""Parser for Yapps grammars.
|
||||
|
||||
This file defines the grammar of Yapps grammars. Naturally, it is
|
||||
implemented in Yapps. The grammar.py module needed by Yapps is built
|
||||
by running Yapps on yapps_grammar.g. (Holy circularity, Batman!)
|
||||
|
||||
"""
|
||||
|
||||
import sys, re
|
||||
import parsetree
|
||||
|
||||
######################################################################
|
||||
def cleanup_choice(rule, lst):
|
||||
if len(lst) == 0: return Sequence(rule, [])
|
||||
if len(lst) == 1: return lst[0]
|
||||
return parsetree.Choice(rule, *tuple(lst))
|
||||
|
||||
def cleanup_sequence(rule, lst):
|
||||
if len(lst) == 1: return lst[0]
|
||||
return parsetree.Sequence(rule, *tuple(lst))
|
||||
|
||||
def resolve_name(rule, tokens, id, args):
|
||||
if id in [x[0] for x in tokens]:
|
||||
# It's a token
|
||||
if args:
|
||||
print 'Warning: ignoring parameters on TOKEN %s<<%s>>' % (id, args)
|
||||
return parsetree.Terminal(rule, id)
|
||||
else:
|
||||
# It's a name, so assume it's a nonterminal
|
||||
return parsetree.NonTerminal(rule, id, args)
|
||||
|
||||
%%
parser ParserDescription:
    option:      "context-insensitive-scanner"

    ignore:      "[ \t\r\n]+"
    ignore:      "#.*?\r?\n"
    token EOF:   "$"
    token ATTR:  "<<.+?>>"
    token STMT:  "{{.+?}}"
    token ID:    '[a-zA-Z_][a-zA-Z_0-9]*'
    token STR:   '[rR]?\'([^\\n\'\\\\]|\\\\.)*\'|[rR]?"([^\\n"\\\\]|\\\\.)*"'
    token LP:    '\\('
    token RP:    '\\)'
    token LB:    '\\['
    token RB:    '\\]'
    token OR:    '[|]'
    token STAR:  '[*]'
    token PLUS:  '[+]'
    token QUEST: '[?]'
    token COLON: ':'

    rule LINENO: # This is a pseudotoken. It matches nothing; returns the line number
        {{ return 1 + self._scanner.get_input_scanned().count('\n') }}

    rule Parser: "parser" ID ":"
                   Options
                   Tokens
                   Rules<<Tokens>>
                 EOF
                 {{ return parsetree.Generator(ID,Options,Tokens,Rules) }}

    rule Options: {{ opt = {} }}
                  ( "option" ":" Str {{ opt[Str] = 1 }} )*
                  {{ return opt }}

    rule Tokens:  {{ tok = [] }}
                  (
                    "token" ID ":" Str {{ tok.append( (ID,Str) ) }}
                  | "ignore" ":" Str   {{ tok.append( ('#ignore',Str) ) }}
                  )*
                  {{ return tok }}

    rule Rules<<tokens>>:
                  {{ rul = [] }}
                  ( LINENO
                    "rule" ID OptParam ":" ClauseA<<ID, tokens>>
                    # TODO: save LINENO somewhere?
                    {{ rul.append( (ID, OptParam, ClauseA) ) }}
                  )*
                  {{ return rul }}

    rule ClauseA<<rule, tokens>>:
                  ClauseB<<rule, tokens>>
                  {{ v = [ClauseB] }}
                  ( OR ClauseB<<rule, tokens>> {{ v.append(ClauseB) }} )*
                  {{ return cleanup_choice(rule, v) }}

    rule ClauseB<<rule, tokens>>:
                  {{ v = [] }}
                  ( ClauseC<<rule, tokens>> {{ v.append(ClauseC) }} )*
                  {{ return cleanup_sequence(rule, v) }}

    rule ClauseC<<rule, tokens>>:
                  ClauseD<<rule, tokens>>
                  ( PLUS  {{ return parsetree.Plus(rule, ClauseD) }}
                  | STAR  {{ return parsetree.Star(rule, ClauseD) }}
                  | QUEST {{ return parsetree.Option(rule, ClauseD) }}
                  |       {{ return ClauseD }} )

    rule ClauseD<<rule, tokens>>:
                  STR {{ t = (STR, eval(STR,{},{})) }}
                      {{ if t not in tokens: tokens.insert( 0, t ) }}
                      {{ return parsetree.Terminal(rule, STR) }}
                | ID OptParam {{ return resolve_name(rule, tokens, ID, OptParam) }}
                | LP ClauseA<<rule, tokens>> RP {{ return ClauseA }}
                | LB ClauseA<<rule, tokens>> RB {{ return parsetree.Option(rule, ClauseA) }}
                | STMT {{ return parsetree.Eval(rule, STMT[2:-2]) }}

    rule OptParam:
                  ATTR {{ return ATTR[2:-2] }}
                |      {{ return '' }}
    rule Str: STR {{ return eval(STR,{},{}) }}
%%
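
For orientation, here is a minimal grammar in the notation defined above:
token declarations, inline string tokens, {{...}} actions, and the postfix
* operator handled by ClauseC. This is an illustrative sketch, not one of
the files in this commit:

    parser Calculator:
        ignore:    "[ \t\n]+"
        token END: "$"
        token NUM: "[0-9]+"

        rule goal: expr END        {{ return expr }}
        rule expr: term            {{ v = term }}
                   ( "[+]" term    {{ v = v + term }} )*
                                   {{ return v }}
        rule term: NUM             {{ return int(NUM) }}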

234
yapps_grammar.py
Normal file

@@ -0,0 +1,234 @@
#!/usr/bin/python2
#
# grammar.py, part of Yapps 2 - yet another python parser system
# Copyright 1999-2003 by Amit J. Patel <amitp@cs.stanford.edu>
#
# This version of the Yapps 2 grammar can be distributed under the
# terms of the MIT open source license, either found in the LICENSE
# file included with the Yapps distribution
# <http://theory.stanford.edu/~amitp/yapps/> or at
# <http://www.opensource.org/licenses/mit-license.php>
#

"""Parser for Yapps grammars.

This file defines the grammar of Yapps grammars. Naturally, it is
implemented in Yapps. The grammar.py module needed by Yapps is built
by running Yapps on yapps_grammar.g. (Holy circularity, Batman!)

"""

import sys, re
import parsetree

######################################################################
def cleanup_choice(rule, lst):
    if len(lst) == 0: return parsetree.Sequence(rule)
    if len(lst) == 1: return lst[0]
    return parsetree.Choice(rule, *tuple(lst))

def cleanup_sequence(rule, lst):
    if len(lst) == 1: return lst[0]
    return parsetree.Sequence(rule, *tuple(lst))

def resolve_name(rule, tokens, id, args):
    if id in [x[0] for x in tokens]:
        # It's a token
        if args:
            print >>sys.stderr, 'Warning: ignoring parameters on TOKEN %s<<%s>>' % (id, args)
        return parsetree.Terminal(rule, id)
    else:
        # It's a name, so assume it's a nonterminal
        return parsetree.NonTerminal(rule, id, args)


# Begin -- grammar generated by Yapps
import sys, re
import yappsrt

class ParserDescriptionScanner(yappsrt.Scanner):
    patterns = [
        ('"rule"', re.compile('rule')),
        ('"ignore"', re.compile('ignore')),
        ('"token"', re.compile('token')),
        ('"option"', re.compile('option')),
        ('":"', re.compile(':')),
        ('"parser"', re.compile('parser')),
        ('[ \t\r\n]+', re.compile('[ \t\r\n]+')),
        ('#.*?\r?\n', re.compile('#.*?\r?\n')),
        ('EOF', re.compile('$')),
        ('ATTR', re.compile('<<.+?>>')),
        ('STMT', re.compile('{{.+?}}')),
        ('ID', re.compile('[a-zA-Z_][a-zA-Z_0-9]*')),
        ('STR', re.compile('[rR]?\'([^\\n\'\\\\]|\\\\.)*\'|[rR]?"([^\\n"\\\\]|\\\\.)*"')),
        ('LP', re.compile('\\(')),
        ('RP', re.compile('\\)')),
        ('LB', re.compile('\\[')),
        ('RB', re.compile('\\]')),
        ('OR', re.compile('[|]')),
        ('STAR', re.compile('[*]')),
        ('PLUS', re.compile('[+]')),
        ('QUEST', re.compile('[?]')),
        ('COLON', re.compile(':')),
    ]
    def __init__(self, str):
        yappsrt.Scanner.__init__(self,None,['[ \t\r\n]+', '#.*?\r?\n'],str)

class ParserDescription(yappsrt.Parser):
    Context = yappsrt.Context
    def LINENO(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'LINENO', [])
        return 1 + self._scanner.get_input_scanned().count('\n')

    def Parser(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Parser', [])
        self._scan('"parser"')
        ID = self._scan('ID')
        self._scan('":"')
        Options = self.Options(_context)
        Tokens = self.Tokens(_context)
        Rules = self.Rules(Tokens, _context)
        EOF = self._scan('EOF')
        return parsetree.Generator(ID,Options,Tokens,Rules)

    def Options(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Options', [])
        opt = {}
        while self._peek() == '"option"':
            self._scan('"option"')
            self._scan('":"')
            Str = self.Str(_context)
            opt[Str] = 1
        if self._peek() not in ['"option"', '"token"', '"ignore"', 'EOF', '"rule"']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['"option"', '"token"', '"ignore"', 'EOF', '"rule"']))
        return opt

    def Tokens(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Tokens', [])
        tok = []
        while self._peek() in ['"token"', '"ignore"']:
            _token = self._peek()
            if _token == '"token"':
                self._scan('"token"')
                ID = self._scan('ID')
                self._scan('":"')
                Str = self.Str(_context)
                tok.append( (ID,Str) )
            elif _token == '"ignore"':
                self._scan('"ignore"')
                self._scan('":"')
                Str = self.Str(_context)
                tok.append( ('#ignore',Str) )
            else:
                raise yappsrt.SyntaxError(_token[0], 'Could not match Tokens')
        if self._peek() not in ['"token"', '"ignore"', 'EOF', '"rule"']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['"token"', '"ignore"', 'EOF', '"rule"']))
        return tok

    def Rules(self, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Rules', [tokens])
        rul = []
        while self._peek() == '"rule"':
            LINENO = self.LINENO(_context)
            self._scan('"rule"')
            ID = self._scan('ID')
            OptParam = self.OptParam(_context)
            self._scan('":"')
            ClauseA = self.ClauseA(ID, tokens, _context)
            rul.append( (ID, OptParam, ClauseA) )
        if self._peek() not in ['"rule"', 'EOF']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['"rule"', 'EOF']))
        return rul

    def ClauseA(self, rule, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'ClauseA', [rule, tokens])
        ClauseB = self.ClauseB(rule, tokens, _context)
        v = [ClauseB]
        while self._peek() == 'OR':
            OR = self._scan('OR')
            ClauseB = self.ClauseB(rule, tokens, _context)
            v.append(ClauseB)
        if self._peek() not in ['OR', 'RP', 'RB', '"rule"', 'EOF']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['OR', 'RP', 'RB', '"rule"', 'EOF']))
        return cleanup_choice(rule, v)

    def ClauseB(self, rule, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'ClauseB', [rule, tokens])
        v = []
        while self._peek() in ['STR', 'ID', 'LP', 'LB', 'STMT']:
            ClauseC = self.ClauseC(rule, tokens, _context)
            v.append(ClauseC)
        if self._peek() not in ['STR', 'ID', 'LP', 'LB', 'STMT', 'OR', 'RP', 'RB', '"rule"', 'EOF']:
            raise yappsrt.SyntaxError(charpos=self._scanner.get_prev_char_pos(), context=_context, msg='Need one of ' + ', '.join(['STR', 'ID', 'LP', 'LB', 'STMT', 'OR', 'RP', 'RB', '"rule"', 'EOF']))
        return cleanup_sequence(rule, v)

    def ClauseC(self, rule, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'ClauseC', [rule, tokens])
        ClauseD = self.ClauseD(rule, tokens, _context)
        _token = self._peek()
        if _token == 'PLUS':
            PLUS = self._scan('PLUS')
            return parsetree.Plus(rule, ClauseD)
        elif _token == 'STAR':
            STAR = self._scan('STAR')
            return parsetree.Star(rule, ClauseD)
        elif _token == 'QUEST':
            QUEST = self._scan('QUEST')
            return parsetree.Option(rule, ClauseD)
        elif _token not in ['"ignore"', '"token"', '"option"', '":"', '"parser"', 'ATTR', 'COLON']:
            return ClauseD
        else:
            raise yappsrt.SyntaxError(_token[0], 'Could not match ClauseC')

    def ClauseD(self, rule, tokens, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'ClauseD', [rule, tokens])
        _token = self._peek()
        if _token == 'STR':
            STR = self._scan('STR')
            t = (STR, eval(STR,{},{}))
            if t not in tokens: tokens.insert( 0, t )
            return parsetree.Terminal(rule, STR)
        elif _token == 'ID':
            ID = self._scan('ID')
            OptParam = self.OptParam(_context)
            return resolve_name(rule, tokens, ID, OptParam)
        elif _token == 'LP':
            LP = self._scan('LP')
            ClauseA = self.ClauseA(rule, tokens, _context)
            RP = self._scan('RP')
            return ClauseA
        elif _token == 'LB':
            LB = self._scan('LB')
            ClauseA = self.ClauseA(rule, tokens, _context)
            RB = self._scan('RB')
            return parsetree.Option(rule, ClauseA)
        elif _token == 'STMT':
            STMT = self._scan('STMT')
            return parsetree.Eval(rule, STMT[2:-2])
        else:
            raise yappsrt.SyntaxError(_token[0], 'Could not match ClauseD')

    def OptParam(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'OptParam', [])
        _token = self._peek()
        if _token == 'ATTR':
            ATTR = self._scan('ATTR')
            return ATTR[2:-2]
        elif _token not in ['"ignore"', '"token"', '"option"', '"parser"', 'COLON']:
            return ''
        else:
            raise yappsrt.SyntaxError(_token[0], 'Could not match OptParam')

    def Str(self, _parent=None):
        _context = self.Context(_parent, self._scanner, self._pos, 'Str', [])
        STR = self._scan('STR')
        return eval(STR,{},{})


def parse(rule, text):
    P = ParserDescription(ParserDescriptionScanner(text))
    return yappsrt.wrap_error_reporter(P, rule)

# End -- grammar generated by Yapps
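
For orientation, the generated module is driven like this. A minimal sketch
(the file name calc.g is hypothetical), using the parse() helper defined
above; on a syntax error, yappsrt.wrap_error_reporter prints diagnostics
to stderr and parse() returns None:

    import sys
    import yapps_grammar

    src = open('calc.g').read()                # hypothetical grammar file
    gen = yapps_grammar.parse('Parser', src)   # 'Parser' is the start rule
    if gen is None:
        sys.exit(1)                            # errors were already reported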

304
yappsrt.py
Normal file

@@ -0,0 +1,304 @@
#
# Yapps 2 Runtime, part of Yapps 2 - yet another python parser system
# Copyright 1999-2003 by Amit J. Patel <amitp@cs.stanford.edu>
#
# This version of the Yapps 2 Runtime can be distributed under the
# terms of the MIT open source license, either found in the LICENSE file
# included with the Yapps distribution
# <http://theory.stanford.edu/~amitp/yapps/> or at
# <http://www.opensource.org/licenses/mit-license.php>
#

"""Run time libraries needed to run parsers generated by Yapps.

This module defines parse-time exception classes, a scanner class, a
base class for parsers produced by Yapps, and a context class that
keeps track of the parse stack.

"""

# TODO: it should be possible to embed yappsrt into the generated
# grammar to make a standalone module.

import sys, re

class SyntaxError(Exception):
    """When we run into an unexpected token, this is the exception to use"""
    def __init__(self, charpos=-1, msg="Bad Token", context=None):
        Exception.__init__(self)
        self.charpos = charpos
        self.msg = msg
        self.context = context

    def __str__(self):
        if self.charpos < 0: return 'SyntaxError'
        else: return 'SyntaxError@char%s(%s)' % (repr(self.charpos), self.msg)

class NoMoreTokens(Exception):
    """Another exception object, for when we run out of tokens"""
    pass

class Scanner:
    """Yapps scanner.

    The Yapps scanner can work in context sensitive or context
    insensitive modes. The token(i) method is used to retrieve the
    i-th token. It takes a restrict set that limits the set of tokens
    it is allowed to return. In context sensitive mode, this restrict
    set guides the scanner. In context insensitive mode, there is no
    restriction (the set is always the full set of tokens).

    """

    def __init__(self, patterns, ignore, input):
        """Initialize the scanner.

        Parameters:
          patterns : [(terminal, uncompiled regex), ...] or None
          ignore : [terminal,...]
          input : string

        If patterns is None, we assume that the subclass has
        defined self.patterns : [(terminal, compiled regex), ...].
        Note that the patterns parameter expects uncompiled regexes,
        whereas the self.patterns field expects compiled regexes.
        """
        self.tokens = [] # [(begin char pos, end char pos, token name, matched text), ...]
        self.restrictions = []
        self.input = input
        self.pos = 0
        self.ignore = ignore
        self.first_line_number = 1

        if patterns is not None:
            # Compile the regex strings into regex objects
            self.patterns = []
            for terminal, regex in patterns:
                self.patterns.append( (terminal, re.compile(regex)) )

    def get_token_pos(self):
        """Get the current token position in the input text."""
        return len(self.tokens)

    def get_char_pos(self):
        """Get the current char position in the input text."""
        return self.pos

    def get_prev_char_pos(self, i=None):
        """Get the previous position (one token back) in the input text."""
        if self.pos == 0: return 0
        if i is None: i = -1
        return self.tokens[i][0]

    def get_line_number(self):
        """Get the line number of the current position in the input text."""
        # TODO: make this work at any token/char position
        return self.first_line_number + self.get_input_scanned().count('\n')

    def get_column_number(self):
        """Get the column number of the current position in the input text."""
        s = self.get_input_scanned()
        i = s.rfind('\n') # may be -1, but that's okay in this case
        return len(s) - (i+1)

    def get_input_scanned(self):
        """Get the portion of the input that has been tokenized."""
        return self.input[:self.pos]

    def get_input_unscanned(self):
        """Get the portion of the input that has not yet been tokenized."""
        return self.input[self.pos:]

    def token(self, i, restrict=None):
        """Get the i'th token in the input.

        If i is one past the end, then scan for another token.

        Args:

          restrict : [token, ...] or None; if restrict is None, then any
          token is allowed. You may call token(i) more than once.
          However, the restrict set may never be larger than what was
          passed in on the first call to token(i).

        """
        if i == len(self.tokens):
            self.scan(restrict)
        if i < len(self.tokens):
            # Make sure the restriction is more restricted. This
            # invariant is needed to avoid ruining tokenization at
            # position i+1 and higher.
            if restrict and self.restrictions[i]:
                for r in restrict:
                    if r not in self.restrictions[i]:
                        raise NotImplementedError("Unimplemented: restriction set changed")
            return self.tokens[i]
        raise NoMoreTokens()

    def __repr__(self):
        """Print the last 10 tokens that have been scanned in"""
        output = ''
        for t in self.tokens[-10:]:
            output = '%s\n (@%s) %s = %s' % (output,t[0],t[2],repr(t[3]))
        return output

    def scan(self, restrict):
        """Should scan another token and add it to the list, self.tokens,
        and add the restriction to self.restrictions"""
        # Keep looking for a token, ignoring any in self.ignore
        while 1:
            # Search the patterns for the longest match, with earlier
            # tokens in the list having preference
            best_match = -1
            best_pat = '(error)'
            for p, regexp in self.patterns:
                # First check to see if we're ignoring this token
                if restrict and p not in restrict and p not in self.ignore:
                    continue
                m = regexp.match(self.input, self.pos)
                if m and len(m.group(0)) > best_match:
                    # We got a match that's better than the previous one
                    best_pat = p
                    best_match = len(m.group(0))

            # If we didn't find anything, raise an error
            if best_pat == '(error)' and best_match < 0:
                msg = 'Bad Token'
                if restrict:
                    msg = 'Trying to find one of '+', '.join(restrict)
                raise SyntaxError(self.pos, msg)

            # If we found something that isn't to be ignored, return it
            if best_pat not in self.ignore:
                # Create a token with this data
                token = (self.pos, self.pos+best_match, best_pat,
                         self.input[self.pos:self.pos+best_match])
                self.pos = self.pos + best_match
                # Only add this token if it's not in the list
                # (to prevent looping)
                if not self.tokens or token != self.tokens[-1]:
                    self.tokens.append(token)
                    self.restrictions.append(restrict)
                return
            else:
                # This token should be ignored ..
                self.pos = self.pos + best_match
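
# A minimal sketch (illustrative, not part of the original module) of
# driving the Scanner directly in context-insensitive style; note that
# ignored terminals must appear in patterns as well as in ignore.
if __name__ == '__main__':
    _s = Scanner([('[ ]+', '[ ]+'), ('NUM', '[0-9]+'), ('PLUS', '[+]')],
                 ['[ ]+'], '1 + 2')
    print _s.token(0)   # -> (0, 1, 'NUM', '1')
    print _s.token(1)   # -> (2, 3, 'PLUS', '+'); the space was skipped
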
class Parser:
    """Base class for Yapps-generated parsers.

    """

    def __init__(self, scanner):
        self._scanner = scanner
        self._pos = 0

    def _peek(self, *types):
        """Returns the token type for lookahead; if there are any args
        then the list of args is the set of token types to allow"""
        tok = self._scanner.token(self._pos, types)
        return tok[2]

    def _scan(self, type):
        """Returns the matched text, and moves to the next token"""
        tok = self._scanner.token(self._pos, [type])
        if tok[2] != type:
            raise SyntaxError(tok[0], 'Trying to find '+type+': '+', '.join(self._scanner.restrictions[self._pos]))
        self._pos = 1 + self._pos
        return tok[3]

class Context:
    """Class to represent the parser's call stack.

    Every rule creates a Context that links to its parent rule. The
    contexts can be used for debugging.

    """

    def __init__(self, parent, scanner, tokenpos, rule, args=()):
        """Create a new context.

        Args:
          parent: Context object or None
          scanner: Scanner object
          tokenpos: integer (scanner token position)
          rule: string (name of the rule)
          args: tuple listing parameters to the rule

        """
        self.parent = parent
        self.scanner = scanner
        self.tokenpos = tokenpos
        self.rule = rule
        self.args = args

    def __str__(self):
        output = ''
        if self.parent: output = str(self.parent) + ' > '
        output += self.rule
        return output
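
# A minimal sketch (illustrative, not part of the original module):
# contexts chain through their parents, so str() rebuilds the rule path
# that the generated parser methods pass along as _context.
if __name__ == '__main__':
    _outer = Context(None, None, 0, 'Parser')
    _inner = Context(_outer, None, 0, 'Rules', ([],))
    print str(_inner)   # -> 'Parser > Rules'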

def print_line_with_pointer(text, p):
    """Print the line of 'text' that includes position 'p',
    along with a second line with a single caret (^) at position p"""

    # TODO: separate out the logic for determining the line/character
    # location from the logic for determining how to display an
    # 80-column line to stderr.

    # Now try printing part of the line
    text = text[max(p-80, 0):p+80]
    p = p - max(p-80, 0)

    # Strip to the left
    i = text[:p].rfind('\n')
    j = text[:p].rfind('\r')
    if i < 0 or (0 <= j < i): i = j
    if 0 <= i < p:
        p = p - i - 1
        text = text[i+1:]

    # Strip to the right
    i = text.find('\n', p)
    j = text.find('\r', p)
    if i < 0 or (0 <= j < i): i = j
    if i >= 0:
        text = text[:i]

    # Now shorten the text
    while len(text) > 70 and p > 60:
        # Cut off 10 chars
        text = "..." + text[10:]
        p = p - 7

    # Now print the string, along with an indicator
    print >>sys.stderr, '> ',text
    print >>sys.stderr, '> ',' '*p + '^'

def print_error(input, err, scanner):
    """Print the error message, the parser stack, and the offending
    line of input, in a human-readable format."""
    # NOTE: this function assumes 80 columns :-(
    # Figure out the line number
    line_number = scanner.get_line_number()
    column_number = scanner.get_column_number()
    print >>sys.stderr, '%d:%d: %s' % (line_number, column_number, err.msg)

    context = err.context
    if not context:
        print_line_with_pointer(input, err.charpos)

    while context:
        # TODO: add line number
        print >>sys.stderr, 'while parsing %s%s:' % (context.rule, tuple(context.args))
        print_line_with_pointer(input, context.scanner.get_prev_char_pos(context.tokenpos))
        context = context.parent

def wrap_error_reporter(parser, rule):
    try:
        return getattr(parser, rule)()
    except SyntaxError, e:
        input = parser._scanner.input
        print_error(input, e, parser._scanner)
    except NoMoreTokens:
        print >>sys.stderr, 'Could not complete parsing; stopped around here:'
        print >>sys.stderr, parser._scanner