Revision | 865669d7dc68aa11fe701358e427a65b1a4e4091 (tree) |
---|---|
Time | 2023-05-23 03:46:46 |
Author | Albert Mietus < albert AT mietus DOT nl > |
Committer | Albert Mietus < albert AT mietus DOT nl > |
Grammerly
@@ -11,7 +11,7 @@ | ||
11 | 11 | :category: Castle, Usage |
12 | 12 | :tags: Castle, Grammar |
13 | 13 | |
14 | - In Castle you can define a *grammar* directly in your code. The compiler will *translate* them into functions, using | |
14 | + In Castle, you can define *grammar*(s) directly in your code. The compiler will *translate* them into functions, using | |
15 | 15 | the built-in (PEG) **compiler-compiler** -- at least that is what it was called back in the days of *YACC*. |
16 | 16 | |
17 | 17 | How does one *use* that? And *why* should you? |
@@ -21,7 +21,7 @@ | ||
21 | 21 | ======================= |
22 | 22 | |
23 | 23 | A grammar is a collection of (parsing)-**rules** and optionally some *settings*. Rules are written in a mixture of EBNF |
24 | -and PEG meta-syntax. Let’s start with an simple example: | |
24 | +and PEG meta-syntax. Let’s start with a simple example: | |
25 | 25 | |
26 | 26 | .. code-block:: PEG |
27 | 27 |
@@ -33,132 +33,131 @@ | ||
33 | 33 | |
34 | 34 | |
35 | 35 | This basically defines that a ``castle_file`` is a sequence of ``import_line``\(s), ``interface``\(s), or |
36 | -``implementation``\(s); which are all “non-terminal(s)” -- see below. Each of those non-terminals are defined by more | |
37 | -rules. By example, an ``import_line`` starts with the ``IMPORT_stmt`` then comes either a ``STRING_literal`` or a | |
36 | +``implementation``\(s); which are all “non-terminal(s)” -- see below. All of those *’non-terminals’* are defined by more | |
37 | +rules. For example, an ``import_line`` starts with the ``IMPORT_stmt`` then comes either a ``STRING_literal`` or a | |
38 | 38 | ``qualID``, and it ends with a semicolon (`';'`). Likewise, the ``IMPORT_stmt`` is set to ‘import’ (literally). |
39 | 39 | |
40 | -As we see, a grammar contains non-terminals and terminals. Non-terminals are abstract and defined by grammar-rules, | |
41 | -containing both (other) non-terminals and terminals. Terminals are concrete: it are the things (tokens) you type when | |
40 | +As we see, the grammar contains non-terminals and terminals. Non-terminals are abstract and defined by grammar rules, | |
41 | +containing both (other) non-terminals and terminals. Terminals are concrete: they are the things (tokens) you type when | |
42 | 42 | programming. Some terminals are like constants like the semicolon at the end of ``import_line``, therefore they are |
43 | -quoted in the grammar (Notice, the is also a non quoted semicolon on each line, that is part of the syntax of grammar.) | |
43 | +quoted in the grammar (Notice, there is also a non-quoted semicolon on each line, which is part of the grammar syntax.) | |
44 | 44 | |BR| |
45 | -Other terminals are more like valuables, they have a value. The ``STRING_literal`` is a good example. It’s value is the | |
46 | -string itself. Similar for numbers and variable-names. | |
45 | +Other terminals are more like variables: they have a value. The ``STRING_literal`` is a good example. Its value is the | |
46 | +string itself. The same holds for numbers and variable names. | |
47 | 47 | |
48 | -In this example grammar, a ``qualID`` is a ``nameID`` *(a name that is used as ID, like in any programming language)*, | |
48 | +In this (example) grammar, a ``qualID`` is a ``nameID`` *(a name that is used as ID, like in any programming language)*, | |
49 | 49 | optionally followed by sub-names *(again like most languages: a dotted name, specifying a field (in a field, in |
50 | 50 | ...)*. In Castle, that name may start with a dot --which is a shorthand notation for “in the current namespace”. You can |
51 | -ignore that for know. | |
51 | +ignore that for now. | |
52 | 52 | |
53 | -Basically, grammers defines how one should read the input --a text--, or more formally: how to parse it. The result of | |
54 | -this parsing is twofold. It will check whether input conforms to the grammer; resulting in boolean, for the | |
55 | -mathematics under us. And it will translate a sequential (flat) text into a tree-structure; which typically much more | |
56 | -useful for a software-engineer. | |
53 | +A grammar defines how one (aka the compiler) should read the input --a text--, or more formally: how to parse it. The | |
54 | +result of this parsing is twofold. It will check whether the input conforms to the grammar; resulting in a boolean, for | |
55 | +the mathematicians among us. And it will translate a sequential (flat) text into a tree-structure; which is typically much | |
56 | +more useful for a software engineer. | |
57 | 57 | |BR| |
58 | -A well known example is this HTML-file. On disk it’s nothing but text, which is easy to store and to transfer. But when | |
59 | -send to your brouwer, it’s *parsed* to create the `DOM <https://nl.wikipedia.org/wiki/Document_Object_Model>`__; a | |
60 | -tree of the document, with sections, paragraphs, hyper-links, etc. By regarding it as a tree, it easy to describe | |
61 | -(e.g. with CSS) how arts should be shown: all headers have a background, the first row in a table is highlighed, | |
62 | -etc. | |
63 | 58 | |
59 | +A well-known example is this HTML file. On disk, it’s nothing but text, which is easy to store and transfer. But when | |
60 | +sent to your browser, it’s *parsed* to create the `DOM <https://nl.wikipedia.org/wiki/Document_Object_Model>`__; a tree | |
61 | +of the document, with sections, paragraphs, hyperlinks, etc. By regarding it as a tree, it becomes easy to select | |
62 | +parts and to describe (e.g. with CSS) how they should be shown: all headers have a background, the first row in a table is | |
63 | +highlighted, etc. | |
64 | 64 | |
65 | 65 | Parsing |
66 | 66 | ======= |
67 | 67 | Another well-known example is (the source of a) program. As code, it is just text. But the compiler will parse it into |
68 | -a parse-tree and/or an abstract-syntax-tree; which is build out of classes, methods, statements etc. | |
68 | +a parse tree and/or an “abstract syntax tree” (AST); which is built out of classes, methods, statements, etc. | |
69 | 69 | |BR| |
70 | 70 | But also your favorite IDE will *parse* it; to highlight the code, give tooltips, enable you to quickly navigate and |
71 | -refactor it and all those conviant features that make it your favorite editor. | |
72 | - | |
71 | +refactor it, and all those convenient features that make it your favorite editor. | |
73 | 72 | And even you are probably parsing text as part of your daily job. When you un-serialise data, you are (often) parsing |
74 | -text; when you read the configuration, you are (or should be ) parsing that text. Even a simple input of the user might | |
73 | +text; when you read the configuration, you are (or should be) parsing that text. Even a simple input from the user might | |
75 | 74 | need a bit of parsing. The text “42” is not the number :math:`42.0` -- you need to convert it; parse it. |
76 | 75 | |
77 | -There a many ways to *parse*. You do not need a full-fledged grammer to translate “42” into :math:`42` or | |
78 | -:math:`42.0`; a stdlib-function as ``atoi()`` or ``atof()`` will do. But how about handling complex numbers | |
79 | -(:math:`4+j2`) or fractions (:math:`\frac{17}{42}`)? | |
76 | +There are many ways to *parse*. A full-fledged grammar to translate (the text) ‘42’ into the int :math:`42` or | |
77 | +the float :math:`42.0` isn’t needed; a stdlib function such as ``atoi()`` or ``atof()`` will do. But how about handling | |
78 | +complex numbers (:math:`4+j2`) or fractions (:math:`\frac{17}{42}`)? | |
80 | 79 | |
81 | 80 | Non-parsing |
82 | 81 | ----------- |
83 | 82 | |
83 | 83 | As writing a proper parser used to be (too) hard, other similar (but simpler) techniques are often used, like `globbing |
85 | 84 | <https://en.wikipedia.org/wiki/Glob_(programming)>`__ *(``\*.Castle`` on the bash-prompt will result in all |
86 | -Castle-files)*. This is simple, and will is very simple cases. | |
85 | +Castle-files)*. This is simple and will do in very simple cases. | |
87 | 86 | |BR| |
88 | -Other try to use `regular-expressions <https://en.wikipedia.org/wiki/Regular_expression>`__ for parsing. RegExps are | |
89 | -indeed more powerfull then globing, and often used to highlight code. A pattern as ``//.*$`` can be used to highlight | |
90 | -(single-line) comment. It often works, but this simple pattern might match a piece of text *inside* a | |
91 | -multi-line-(doc)string -- which wrong. | |
87 | +Others try to use `regular expressions <https://en.wikipedia.org/wiki/Regular_expression>`__ for parsing. RegExps are | |
88 | +indeed more powerful than globbing and are often used to highlight code. A pattern such as ``//.*$`` can be used to highlight | |
89 | +(single-line) comments. It often works, but this simple pattern might match a piece of text *inside* a | |
90 | +multi-line-(doc)string -- which is wrong. | |
92 | 91 | |
93 | -To parse an input-text its not a sound solution; although I have seen cunning regular-expressions, that almost always | |
94 | -work. But *reg-exps* have not the same power as a grammar-- That is already proven halve a century ago and will not be | |
95 | -repeated here. | |
92 | +Those *tricks* aren’t a sound solution to parse generic input/text; although I have seen cunning RegExps that almost | |
93 | +(always) work. *Regular expressions* do not have the same power as grammars; that was proven half a century | |
94 | +ago and is not repeated here. | |
96 | 95 | |
97 | -Grammars are more powerfull | |
98 | -=========================== | |
96 | +Grammars are more powerful | |
97 | +========================== | |
99 | 98 | |
100 | -A grammar (even a simple one) is more powerfull. You can define the overal structure of the input and the sub-structure | |
101 | -of each *lump*. When a multi-line-string has no sub-structure, the parser will never find comments inside it. Nor other | |
102 | -way around; it simple is not hunting for it. | |
99 | +A grammar (even a simple one) is more powerful. You can define the overall structure of the input and the sub-structure | |
100 | +of each *lump*. When a multi-line-string has no sub-structure, the parser will never find comments inside it. Nor the other | |
101 | +way around; it simply is not hunting for it. | |
103 | 102 | |
104 | -As most programming-languages do not have build-in support for grammars, one has to resort to external tools. Like the | |
105 | -famous `YACC <https://en.wikipedia.org/wiki/Yacc>`__; developed in 197X. YACC will read a grammar-file, and generates | |
103 | +As most programming languages do not have built-in support for grammars, one has to resort to external tools. Like the | |
104 | +famous `YACC <https://en.wikipedia.org/wiki/Yacc>`__; developed in 197X. YACC reads a grammar-file and generates | |
106 | 105 | C-code that can be compiled and linked to your code. |
107 | 106 | |
108 | -Back then, writing compiler-compilers was a popular academic research exercise (YACC stand for: Yet Another Compiler | |
109 | -Compiler). It was great for compiler-designers, but clumsy to use for average developers: The syntax to write a grammar | |
107 | +Back then, writing compiler-compilers was a popular academic research exercise (YACC stands for: Yet Another Compiler | |
108 | +Compiler). It was great for compiler designers, but clumsy to use for average developers: The syntax to write a grammar | |
110 | 109 | was hard to grasp, with many pitfalls, the interface between your code and the parser was awkward (you had to call |
111 | 110 | ``yyparse()``; needed some globals; OO wasn't invented, no inheritance or data-hiding, which resulted in puzzling tricks |
112 | 111 | to use multiple parsers, etc). |
113 | 112 | |BR| |
114 | -Aside of that, more and better parsing strategies are developed; that is handles in another :ref:`blog <grammmar-code>`. | |
113 | +Aside from that, more and better parsing strategies have been developed; that is handled in another :ref:`blog <grammmar-code>`. | |
115 | 114 | |
116 | 115 | Unleash that power! |
117 | 116 | ------------------- |
118 | 117 | |
119 | -With those better parsing-algorithms, faster computers with a lot more memory and other inventions, writing grammars | |
120 | -has become more peaceful. Except that you still need an extra step, another sytax, as you still need to use an external | |
118 | +With those better parsing algorithms, faster computers with a lot more memory, and other inventions, writing grammars | |
119 | +has become more peaceful. Except that you still need an extra step, another syntax, as you still need to use an external | |
121 | 120 | tool. That sometimes isn’t maintained after a couple of years ... |
122 | 121 | |BR| |
123 | -The effect is, most developers don’t use grammars; they write parser-like code manually, or the settle for less optimal | |
124 | -result. Or are utterly not aware that grammer can provide another, better, easier solution. | |
122 | +The effect is, most developers don’t use grammars; they write parser-like code manually, or they settle for less optimal | |
123 | +results. Or they are utterly unaware that a grammar can provide another, better, easier solution. | |
125 | 124 | |
126 | 125 | With a few lines, you can define the structure of the input. Each rule is like a function: it has a name (the |
127 | -left-hand-side of the rule, so the part before the arrow), and an implementation; the part after the arrow. That | |
128 | -implementation “calls” other rules, like normal code. | |
126 | +‘left-hand side’ (LHS) of the rule, so the part before the arrow), and an implementation; the part after the | |
127 | +arrow. That implementation “calls” other rules, like normal code. | |
129 | 128 | |BR| |
130 | -When you call the “main rule function”, with the input-stream as input, that *file* is parsed, and the complete input is | |
131 | -ready to use; not more manual scanning and parsing. And when the file-structure is slightly updated, you just add a few | |
132 | -details to the grammer. | |
129 | +When you call the “main rule function”, with the input stream as input, that *file* is parsed, and the complete input is | |
130 | +ready to use; no more manual scanning and parsing. And when the file structure is slightly updated, you just add a few | |
131 | +details to the grammar. | |
133 | 132 | |
134 | -Castle has it build-in | |
133 | +Castle has it built-in | |
135 | 134 | ====================== |
136 | 135 | |
137 | 136 | Grammars make reading text easy. Define the structure, call the “main rule” and use the values. Castle makes that |
138 | 137 | simple! |
139 | 138 | |
140 | -.. use:: Castle has build-in grammers support | |
141 | - :ID: U_Grammers | |
142 | - | |
143 | - In Castle one can define a grammer directly into the source-code; as :ref:`grammmar-code`! | |
139 | +.. use:: Castle has built-in grammar support | |
140 | + :ID: U_Grammars | |
144 | 141 | |
145 | - And, like many other details, the language is hiding the nasty details of parsing-strategies. There is no need to | |
146 | - generating, compiling, and use that code, with external tools. All that clutter is gone. | |
142 | + In Castle one can define a grammar directly into the source code; as :ref:`grammmar-code`! | |
147 | 143 | |
148 | - .. tip:: The standard parsing-algorithm is PEG; but that is not an requirement. | |
144 | + And, like many other details, the language is hiding the nasty details of parsing strategies. There is no need to | |
145 | + generate, compile, and use that code, with external tools. All that clutter is gone. | |
149 | 146 | |
150 | - The syntax of grammers is quite generic, it’s the implementation of the Castle-compiler that implements the | |
151 | - parsing-strategy; it should supports PEG. But it is free to support others as well (with user-selectable | |
147 | + .. tip:: The standard parsing algorithm is PEG, but that is not a requirement. | |
148 | + | |
149 | + The syntax of grammars is quite generic, it’s the implementation of the Castle compiler that implements the | |
150 | + parsing strategy; it should support PEG. But it is free to support others as well (with user-selectable | |
152 | 151 | compiler-plugins). |
153 | 152 | |BR| |
154 | 153 | This is not unlike other compiler-options. |
155 | 154 | |
156 | -To use the grammar you simply call one of those rules as a function: pass the input (string) and it will return a | |
157 | -(generic) tree-structure. | |
158 | -When you simple like to verify the syntax is correct: use the tree as a boolean: when it not-empty the input is valid. | |
155 | +To use the grammar, you simply call one of those rules as a function: pass the input (string) and it will return a | |
156 | +(generic) tree structure. | |
157 | +When you only want to verify that the syntax is correct, use the tree as a boolean: when it is not empty, the input is valid. | |
159 | 158 | |BR| |
160 | -Typically however, you traverse that tree, like you do in many situations. | |
159 | +Typically, however, you traverse that tree, like you do in many situations. | |
161 | 160 | |
162 | 161 | To read that early configuration: parse the file and walk over the tree. Or use it “ala a DOM” by using Castle’s |
163 | -:ref:`matching-statements` to simply. Curious on how that works: continue reading in :ref:`grammmar-code`. | |
162 | +:ref:`matching-statements` to simply. Curious about how that works: continue reading in :ref:`grammmar-code`. | |
164 | 163 | Or skip to “Why there are :ref:`G2C-actions`”. |
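
The regular-expression pitfall described in the diffed text (the pattern ``//.*$`` also matching text *inside* a multi-line string) can be reproduced with a short Python sketch. This is only an illustration: the ``source`` snippet is a made-up, C-like fragment (not Castle syntax), and the names ``COMMENT``, ``source`` and ``matches`` are hypothetical.

```python
import re

# The single-line comment pattern quoted in the text.
# MULTILINE makes `$` match at the end of every line.
COMMENT = re.compile(r"//.*$", re.MULTILINE)

# Hypothetical input: one real comment, plus a URL inside a multi-line string.
source = '''x = 1;  // a real comment
doc = """
see http://example.org for details
""";
'''

# Collect everything the pattern would highlight as a "comment".
matches = [m.group(0) for m in COMMENT.finditer(source)]
for text in matches:
    print(repr(text))
```

The pattern reports two "comments", although only the first is one; the second is the tail of the URL inside the string. The regular expression has no notion of the string's structure, whereas a grammar rule for the multi-line string would consume that text as a single lump.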