Commit MetaInfo

Revision: ea1a174e665a21e9a912a60fedbc773f9383c2af (tree)
Time: 2022-10-27 20:06:19
Author: Albert Mietus < albert AT mietus DOT nl >
Committer: Albert Mietus < albert AT mietus DOT nl >

Log Message

QuickNote: PEGEN (more)

Change Summary

Incremental Difference

diff -r 5f1b122c9ce7 -r ea1a174e665a CCastle/short/index.rst
--- a/CCastle/short/index.rst Sun Oct 23 22:47:01 2022 +0200
+++ b/CCastle/short/index.rst Thu Oct 27 13:06:19 2022 +0200
@@ -1,5 +1,5 @@
11 ================
2-Some short blogs
2+Some Quick Blogs
33 ================
44
55 .. toctree::
diff -r 5f1b122c9ce7 -r ea1a174e665a CCastle/short/pegen_parser.rst
--- a/CCastle/short/pegen_parser.rst Sun Oct 23 22:47:01 2022 +0200
+++ b/CCastle/short/pegen_parser.rst Thu Oct 27 13:06:19 2022 +0200
@@ -1,31 +1,39 @@
1+.. include:: /std/localtoc.irst
2+
13 ================
2-The PEGEN parser
4+QuickNote: PEGEN
35 ================
46
5-.. post:: 2022/10/23
7+.. post:: 2022/10/27
68 :category: CastleBlogs, rough
79 :tags: Grammar, PEG, DRAFT
810
9- To implement CCastle we need a parser (as part of ther compiler). Eventually, that parser will be writen in Castle;
10- but for now we kickstart it in python. Which has several packages that can assist us. As we like to use an PEG one,
11- there are a few options. `Arpeggio <https://textx.github.io/Arpeggio/2.0/>`__ is well known, and has some nice
12- options -- but can’t handle `left recursion <https://en.wikipedia.org/wiki/Left_recursion>`__ -- like most
13- PEG-parsers.
11+ To implement CCastle we need a parser, as part of the compiler. Eventually, that parser will be written in Castle. For
12+ now we kickstart it in Python, which has several packages that can assist us. As we like to use a PEG one, there
13+ are a few options. `Arpeggio <https://textx.github.io/Arpeggio/2.0/>`__ is well known, and has some nice options --
14+ but can’t handle `left recursion <https://en.wikipedia.org/wiki/Left_recursion>`__ -- like most PEG-parsers.
1415
15- Recently python uses a PEG parser, that supports `left recursion <https://en.wikipedia.org/wiki/Left_recursion>`__
16- (which is a recent development). That parser is also available as a (hardly documented) package: `pegen
17- <https://we-like-parsers.github.io/pegen/index.html>`__
16+ Recently, Python itself started using a PEG parser that supports `left recursion
17+ <https://en.wikipedia.org/wiki/Left_recursion>`__ (which is a recent development). That parser is also available as a
18+ package: `pegen <https://we-like-parsers.github.io/pegen/index.html>`__, but it is hardly documented.
1819
19- This blog is writen to remember some leassons learned when playing with in
20+ This blog is written to remember some lessons learned when playing with it, and as a kind of informal docs.
2021
2122
2223 Built-In Lexer
2324 ==============
2425
25-Pegen is specially writen for Python and use a specialized lexer; unlike most PEG-parser, that uses PEG for both. Pegen
26+Pegen is specially written for Python and uses a specialized lexer, unlike most PEG parsers, which use PEG for lexing too. Pegen
2627 uses the `tokenizer <https://docs.python.org/3/library/tokenize.html>`__ that is part of Python. This comes with some
2728 restrictions.
2829
30+.. hint::
31+
32+ This applies mostly when we use pegen as a module: ``python -m pegen ...``; that calls `simple_parser_main()`.
33+ |BR|
34+ When using it in code, by importing pegen (``from pegen.parser import Parser ...``), one has more options (not studied yet).
35+
36+
2937 Tokens
3038 ------
3139
@@ -36,26 +44,36 @@
3644 them differently; possible combined with other characters. Then, those will not be found; not the literal-strings as set
3745 in the grammar.
3846
47+.. note::
48+
49+ Pegen speaks about *(soft)* **keywords** for all kinds of literal terminals; even when they are more like operators
50+ than *words*.
51+
3952 .. warning::
4053
54+ When the grammar defines (literal) terminals (or keywords) --especially for operators-- make sure the lexer will not
55+ break them into predefined tokens!
56+ |BR|
57+ This will not give an error, but it does not work!
58+
4159 .. code-block:: PEG
4260
43- Left_arrow_BAD: '<-' ## This is WRONG, as ``<`` is sees as a token
61+ Left_arrow_BAD: '<-' ## This is WRONG, as ``<`` is seen as a token. And so, `<-` is never found
4462 Left_arrow_OKE: '<' '-' ## This is acceptable
4563
46-.. seealso:: https://docs.python.org/3/library/token.html for an overiew of the predefied tokens
64+.. seealso:: See https://docs.python.org/3/library/token.html for an overview of the predefined tokens
4765
4866 .. tip::
4967
5068 A quick trick to see how a file is split into tokens, use ``python -m tokenize [-e] filename.peg``.
5169 |BR|
52- Make sure you do not use string-literals that (eg) are composed of two tokens.
70+ Make sure you do not use string-literals that (eg) are composed of two tokens, like the above-mentioned ``<-``.
5371
5472
5573
56-.. sidebar:: Reserverd Names
74+.. sidebar:: Reserved
75+ :class: localtoc
5776
58- - start
5977 - showpeek
6078 - name
6179 - number
@@ -75,3 +93,42 @@
7593 The *GeneratedParser* inherits from and calls the base ``pegen.parser.Parser`` class and has methods for all
7694 rule-names. This implies some names should not be used as rule-names (in all cases) -- see the sidebar.
7795
96+
97+Meta Syntax (issues)
98+====================
99+
100+No: regexps
101+-----------
102+
103+PEGEN has **no** support for regular expressions, probably because it uses a custom lexer.
104+
105+Unordered Group starts a comment
106+--------------------------------
107+
108+PEGEN (or its lexer) uses the ``#`` to start a comment. This implies an **Unordered group** ``( sequence )#`` --as in
109+`Arpeggio <https://textx.github.io/Arpeggio/2.0/grammars/#grammars-written-in-peg-notations>`__-- is not recognized.
110+
111+A workaround is to use another character like ``@`` instead of the hash (``#``).
112+
113+
114+Result/Output
115+=============
116+
117+cmd-tool
118+--------
119+
120+The command-line tool ``python -m pegen ...`` only prints the parsed tree: a list (shown as ``[`` ... ``]``) with
121+sub-lists and/or `TokenInfo` namedtuples. Each `TokenInfo` has 5 elements: a token type (an int and its enum-name), the
122+token-string (that was parsed), the begin & end location (line- & column-number), and the full line that is being
123+parsed.
124+
125+No info about the matched grammar rule (e.g. the rule-name) is shown. Actually, that info is not part of the parsed tree.
126+
127+.. seealso:: This `structure is described <https://docs.python.org/3/library/tokenize.html?highlight=TokenInfo>`__ in
128+ the tokenize module; without specifying its name: TokenInfo.
129+
130+The parser
131+----------
132+
133+The GeneratedParser (and/or its base class: ``pegen.parser.Parser``) returns only (lists of) tokens from the tokenizer (an
134+OO wrapper around tokenize). And so, the same TokenInfo objects as described above.
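
Editorial sketch (not part of the commit): the token-splitting warning and the five `TokenInfo` fields described in the note above can be checked with Python's standard ``tokenize`` module alone. The helper name ``token_strings`` is hypothetical, and the sketch assumes pegen sees the same tokens that ``tokenize`` produces, as the note states.

```python
import io
import tokenize


def token_strings(source):
    """Return (type-name, string) pairs for the meaningful tokens in source.

    Hypothetical helper for illustration; it only uses the stdlib tokenizer.
    """
    readline = io.StringIO(source).readline
    # Layout tokens are skipped so that only names and operators remain.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    ]


# '<-' is not a Python operator, so the lexer emits '<' and '-' separately.
pairs = token_strings("a <- b\n")
print(pairs)

# Every token is a TokenInfo namedtuple with five fields.
first = next(tokenize.generate_tokens(io.StringIO("a <- b\n").readline))
print(first._fields)
```

If the assumption holds, a grammar literal such as ``'<-'`` can never match a single token, which is why the grammar has to spell it as ``'<' '-'``.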