# Syntax: A Casual and Formal Look

## Call Syntax

Function, macro, and operator calls differ very little from one another. Such calls can take only a handful of forms, though notably more than in most other languages (owing to, among other things, uniform function call syntax). Hence this section.

```
# The standard, unambiguous call.
routine(1, 2, 3, 4)
# The method call syntax equivalent.
1.routine(2, 3, 4)
# A block-based call. This is only really useful for macros taking in a body.
routine
  1
  2
  3
  4
# A parentheses-less call. This is only really useful for `print` and `dbg`.
# Only valid at the start of a line.
routine 1, 2, 3, 4
```

Binary operators have some special rules.

```
# Valid call syntaxes for binary operators. What can constitute a binary
# operator is constrained for parsing's sake. Whitespace is optional.
1 + 2
1+2
+ 1, 2 # Only valid at the start of a line. Also, don't do this.
+(1, 2)
```

As do unary operators.

```
# The standard call for unary operators. Postfix.
1?
?(1)
```

Method call syntax has a number of advantages, notably that it can be *chained*, acting as a natural pipe operator. Redundant parentheses can also be omitted.

```
# The following statements are equivalent:
foo.bar.baz
foo().bar().baz()
baz(bar(foo))
baz
  bar
    foo
baz bar(foo)
baz foo.bar
```
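
The equivalence between method call syntax and ordinary calls can be read as a rewrite that moves the receiver into the first argument position. The Python below is only a sketch of that idea, not Puck's parser: `desugar_chain` is a hypothetical helper, and it assumes each segment is a bare name or integer with at most one flat argument list (no nested parentheses, floats, or strings containing dots).

```python
import re

# One chain segment: a name (or integer) with an optional, flat argument list.
_SEGMENT = re.compile(r"\s*([A-Za-z_]\w*|\d+)\s*(?:\(([^()]*)\))?\s*")


def desugar_chain(source: str) -> str:
    """Rewrite `a.f(x).g` as `g(f(a, x))` (hypothetical helper)."""
    result = None
    for segment in source.split("."):
        match = _SEGMENT.fullmatch(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        name, raw_args = match.group(1), match.group(2)
        args = [a.strip() for a in (raw_args or "").split(",") if a.strip()]
        if result is not None:
            # The expression built so far becomes the first argument.
            args.insert(0, result)
        # Empty parentheses are redundant, so `foo()` normalizes to `foo`.
        result = f"{name}({', '.join(args)})" if args else name
    return result


print(desugar_chain("foo.bar.baz"))
print(desugar_chain("1.routine(2, 3, 4)"))
```

Under these assumptions, both `foo.bar.baz` and `foo().bar().baz()` normalize to the same nested call.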

## Indentation Rules

The tokens `=`, `then`, `do`, `of`, `else`, `block`, `const`, `block X`, and `X` (where `X` is an identifier) are *scope tokens*. They denote a new scope for their associated expressions (functions/macros/declarations, control flow, loops). The tokens `,`, `.` (notably not `...`), and all default binary operators (notably not `not`) are *continuation tokens*. An expression beginning or ending in one of them would otherwise always be a syntactic error, which is why they can safely signal that a statement continues across a line break.

Line breaks are treated as the end of a statement, with several exceptions.

```puck
pub func foo() =
  print "Hello, world!"
  print "This is from a function."

pub func inline_decl() = print "Hello, world!"
```

Indented lines following a line ending in a *scope token* are treated as belonging to a new scope: they form the body of the expression associated with that scope token.

Indentation is not obligatory after a scope token. Without it, however, the body of the associated expression is constrained to one line: following lines are never treated as an extension of the body, only of the expression associated with the original scope token. (This may change in the future.)

```puck
pub func foo(really_long_parameter: ReallyLongType,
another_really_long_parameter: AnotherReallyLongType) = # no indentation! this is ok
  print really_long_parameter # this line is indented relative to the first line
  print another_really_long_parameter
```

Lines following a line ending in a *continuation token* (or, additionally, `not` or `(`) are treated as a continuation of that line and can have any level of indentation (even negative). If they end in a scope token, however, the following lines must be indented relative to the indentation of the previous line.

```puck
let really_long_parameter: ReallyLongType = ...
let another_really_long_parameter: AnotherReallyLongType = ...

really_long_parameter
  .foo(another_really_long_parameter) # some indentation! this is ok
```

Lines *beginning* with a continuation token (or, additionally, `)`) are likewise treated as a continuation of the previous line and can have any level of indentation. If they end in a scope token, the following lines must be indented relative to the indentation of the previous line.

```puck
pub func foo() =
  print "Hello, world!"
pub func bar() = # this line is no longer in the above scope.
  print "Another function declaration."
```

Dedented lines *not* beginning or ending with a continuation token are treated as no longer in the previous scope, returning to the scope of the corresponding indentation level.

```puck
if cond then this
else that

match cond
of this then ...
of that then ...
```

A line beginning with a scope token is treated as attached to the previous expression.

```
# Technically allowed. Please don't do this.
let foo
= ...

if cond then if cond then this
else that

for i
in iterable
do ...

match foo of this then ...
of that then ...

match foo of this
then ...
of that then ...
```

This *can* lead to some ugly possibilities for formatting that are best avoided.

```
# Much preferred.

let foo =
  ...
let foo = ...

if cond then
  if cond then
    this
else that
if cond then
  if cond then this
else that

for i in iterable do
  ...
for i in iterable do ...

match foo
of this then ...
of that then ...
```

The indentation rules are complex, but the effect is such that long statements can be broken *almost* anywhere.
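
The continuation rules above can be sketched as a pass that merges physical lines into logical ones. This is an illustration in Python rather than Puck's actual lexer. The `OPERATORS` and `SCOPE_TOKENS` sets are assumed subsets chosen for the example, every function name is hypothetical, and the sketch ignores indentation and scope bodies entirely.

```python
# Assumed subsets of the token classes described above; the real sets are
# defined by the language, not by this sketch.
OPERATORS = {"+", "-", "*", "/", "&", "==", "!=", "<", ">", "<=", ">=",
             "and", "or", "xor", "in"}
SCOPE_TOKENS = {"=", "then", "do", "of", "else"}


def strip_comment(line):
    """Drop a trailing `#` comment (naive: ignores `#` inside strings)."""
    return line.split("#", 1)[0].strip()


def ends_open(text):
    """True if the line ends in a continuation token, `not`, or `(`."""
    if not text or text.endswith("..."):
        return False
    if text.endswith((",", ".", "(")):
        return True
    return text.split()[-1] in OPERATORS | {"not"}


def starts_attached(text):
    """True if the line begins with a continuation token, `)`, or a scope token."""
    if not text or text.startswith("..."):
        return False
    if text.startswith((",", ".", ")")):
        return True
    return text.split()[0] in OPERATORS | SCOPE_TOKENS


def join_logical_lines(lines):
    """Merge physical lines that continue one another into logical lines."""
    logical = []
    for raw in lines:
        text = strip_comment(raw)
        if not text:
            continue  # blank and comment-only lines are skipped
        if logical and (ends_open(logical[-1]) or starts_attached(text)):
            logical[-1] = f"{logical[-1]} {text}"
        else:
            logical.append(text)
    return logical


print(join_logical_lines(["if cond then this", "else that"]))
```

A line ending in `=` is deliberately not merged with the next one here: what follows a trailing scope token is a body, which a real implementation would handle through indentation tracking.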

## Expression Rules

First, a word on the distinction between *expressions* and *statements*. Expressions return a value. Statements do not. That is all.

There are some syntactic constructs unambiguously recognizable as statements: all declarations, modules, and `use` statements. There are no syntactic constructs unambiguously recognizable as expressions. Because calls returning `void` are treated as statements, and any expression that returns a type could possibly return `void`, no explicit distinction between expressions and statements is made in the parser, or anywhere else before type-checking.

Expressions can go almost anywhere. Our indentation rules above allow for it.

```
# Some different formulations of valid expressions.

if cond then
  this
else
  that

if cond then this
else that

if cond
then this
else that

if cond then this else that

let foo =
  if cond then
    this
  else
    that
```

```
# Some different formulations of *invalid* expressions.
# These primarily break the rule that everything following a scope token
# (ex. `=`, `do`, `then`) not at the end of the line must be self-contained.

let foo = if cond then
    this
  else
    that

let foo = if cond then this
  else that

let foo = if cond then this
else that

# todo: how to handle this?
if cond then if cond then that
else that

# shrimple
if cond then
  if cond then that
else that

# this should be ok
if cond then this
else that

match foo of
this then ...
of that then ...
```

## Reserved Keywords

The following keywords are reserved:
- variables: `let` `var` `const`
- control flow: `if` `then` `elif` `else`
- pattern matching: `match` `of`
- error handling: `try` `with` `finally`
- loops: `while` `do` `for` `in`
- blocks: `loop` `block` `break` `continue` `return`
- modules: `pub` `mod` `use` `as`
- functions: `func` `varargs`
- metaprogramming: `macro` `quote` `when`
- ownership: `lent` `mut` `ref` `refc`
- types: `type` `struct` `tuple` `union` `enum` `class`

The following keywords are not reserved, but liable to become so.
- `impl` `object` `interface` `concept` `auto` `effect` `case`
- `suspend` `resume` `spawn` `pool` `thread` `closure` `static`
- `cyclic` `acyclic` `sink` `move` `destroy` `copy` `trace` `deepcopy`

The following identifiers are in use by the standard prelude:
- logic: `not` `and` `or` `xor` `shl` `shr` `div` `mod` `rem`
- logic: `+` `-` `*` `/` `<` `>` `<=` `>=` `==` `!=` `is`
- async: `async` `await`
- types: `int` `uint` `float` `i\d+` `u\d+`
  - `f32` `f64` `f128`
  - `dec64` `dec128`
- types: `bool` `byte` `char` `str`
- types: `void` `never`
- strings: `&` (string append)

The following punctuation is taken:
- `=` (assignment)
- `.` (chaining)
- `,` (parameters)
- `;` (statements)
- `:` (types)
- `#` (comment)
- `@` (attributes)
- `_` (unused bindings)
- `|` (generics)
- `\` (string/char escaping)
- `()` (parameters, tuples)
- `[]` (generics, lists)
- `{}` (scope, structs)
- `""` (strings)
- `''` (chars)
- ``` `` ``` (unquoting)
- unused on qwerty: `~` `%` `^` `$`
  - perhaps leave `$` unused. but `~`, `%`, and `^` totally could be...

## A Formal Grammar

We now shall take a look at a more formal description of Puck's syntax.

Syntax rules are described in [extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus–Naur_form) (EBNF). However, most rules surrounding whitespace, scope, and line breaks are written as they would appear after a lexing step.

### Identifiers
```
Ident  ::= (Letter | '_') (Letter | Digit | '_')*
Letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff' # todo
Digit  ::= '0'..'9'
```
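
As a rough check of the `Ident` rule, it can be transcribed into a regular expression. The sketch below is in Python and makes one loud assumption: the `'\x80'..'\xff'` range (marked `todo` in the grammar) is read as the code points U+0080 to U+00FF rather than as raw bytes. `is_ident` is a hypothetical helper name.

```python
import re

# Ident ::= (Letter | '_') (Letter | Digit | '_')*
# Assumption: '\x80'..'\xff' is interpreted as code points, not bytes.
IDENT = re.compile(r"[A-Za-z_\x80-\xff][A-Za-z0-9_\x80-\xff]*")


def is_ident(text: str) -> bool:
    """True if the whole string matches the Ident rule (hypothetical helper)."""
    return IDENT.fullmatch(text) is not None


for candidate in ["foo_bar1", "_", "1abc", "foo-bar"]:
    print(candidate, is_ident(candidate))
```

Note that reserved keywords such as `let` also match this rule; telling them apart from ordinary identifiers is a separate step.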

### Literals
```
Int ::= '-'? (DecLit | HexLit | OctLit | BinLit)
Float ::= '-'? DecLit '.' DecLit
BinLit ::= '0b' BinDigit ('_'? BinDigit)*
OctLit ::= '0o' OctDigit ('_'? OctDigit)*
HexLit ::= '0x' HexDigit ('_'? HexDigit)*
DecLit ::= Digit ('_'? Digit)*
BinDigit ::= '0'..'1'
OctDigit ::= '0'..'7'
HexDigit ::= Digit | 'A'..'F' | 'a'..'f'
```
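
The literal rules translate fairly directly into regular expressions, which makes the placement of `_` separators easy to test. This is a Python sketch under the grammar as written above; `classify` is a hypothetical helper, not part of any Puck tooling.

```python
import re

# Each digit run allows at most one '_' between neighbouring digits.
DEC = r"[0-9](?:_?[0-9])*"
BIN = r"0b[01](?:_?[01])*"
OCT = r"0o[0-7](?:_?[0-7])*"
HEX = r"0x[0-9A-Fa-f](?:_?[0-9A-Fa-f])*"

INT = re.compile(rf"-?(?:{HEX}|{OCT}|{BIN}|{DEC})")
FLOAT = re.compile(rf"-?{DEC}\.{DEC}")


def classify(text: str):
    """Return 'Float', 'Int', or None for a candidate literal."""
    if FLOAT.fullmatch(text):
        return "Float"
    if INT.fullmatch(text):
        return "Int"
    return None


for candidate in ["1_000", "0xFF_ff", "3.14", "1__000", "1."]:
    print(candidate, classify(candidate))
```

As the grammar is written, a `Float` needs digits on both sides of the `.`, so `1.` and `.5` are rejected, as are doubled or trailing underscores.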

### Chars, Strings, and Comments
```
CHAR    ::= '\'' (PRINT - '\'' | '\\\'')* '\''
STRING  ::= SINGLE_LINE_STRING | MULTI_LINE_STRING
COMMENT ::= SINGLE_LINE_COMMENT | MULTI_LINE_COMMENT | EXPRESSION_COMMENT
SINGLE_LINE_STRING  ::= '"' (PRINT - '"' | '\\"')* '"'
MULTI_LINE_STRING   ::= '"""' (PRINT | '\n' | '\r')* '"""'
SINGLE_LINE_COMMENT ::= '#' PRINT*
MULTI_LINE_COMMENT  ::= '#[' (PRINT | '\n' | '\r' | MULTI_LINE_COMMENT)* ']#'
EXPRESSION_COMMENT  ::= '#;' SINGLE_STMT
PRINT ::= LETTER | DIGIT | OPR |
          '"' | '#' | "'" | '(' | ')' | # notably the dual of OPR
          ',' | ';' | '[' | ']' | '_' |
          '`' | '{' | '}' | ' ' | '\t'
```
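
Because `MULTI_LINE_COMMENT` is defined recursively, a single regular expression is not enough to recognize it; a depth counter is one common way to handle the nesting. The Python below is a sketch of that approach and not Puck's lexer. `skip_block_comment` is a hypothetical name, and the sketch does not validate that the characters inside the comment are `PRINT` characters.

```python
def skip_block_comment(source: str, start: int) -> int:
    """Return the index just past the `]#` closing the comment at `start`."""
    if not source.startswith("#[", start):
        raise ValueError("no block comment at this position")
    depth = 0
    i = start
    while i < len(source):
        if source.startswith("#[", i):
            depth += 1  # entering a (possibly nested) comment
            i += 2
        elif source.startswith("]#", i):
            depth -= 1  # leaving the innermost open comment
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")


text = "#[ outer #[ inner ]# still outer ]# x = 1"
print(repr(text[skip_block_comment(text, 0):]))
```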

### Values
```
Value ::= Int | Float | String | Char | Array | Tuple | Struct
Array  ::= '[' (Expr (',' Expr)*)? ']'
Tuple  ::= '(' (Ident '=')? Expr (',' (Ident '=')? Expr)* ')'
Struct ::= '{' Ident '=' Expr (',' Ident '=' Expr)* '}'
```

### Variables
```
Decl  ::= Let | Var | Const | Func | Type
Let   ::= 'let' Pattern (':' Type)? '=' Expr
Var   ::= 'var' Pattern (':' Type)? ('=' Expr)?
Const ::= 'pub'? 'const' Pattern (':' Type)? '=' Expr
Pattern ::= (Ident ('as' Ident)?) | Char | String | Number | Float |
            Ident? '(' Pattern (',' Pattern)* ')'
```

### Declarations
```
Func  ::= 'pub'? 'func' Ident Generics? Parameters? (':' Type)? '=' Body
Macro ::= 'pub'? 'macro' Ident Generics? Parameters? (':' Type)? '=' Body
Generics   ::= '[' Ident (':' Type)? (',' Ident (':' Type)?)* ']'
Parameters ::= '(' Ident (':' Type)? (',' Ident (':' Type)?)* ')'
```

All arguments to functions must have a type. This is resolved at the semantic level, however. (Arguments to macros may lack types. This signifies a generic node.)

### Types
```
TypeDecl ::= 'pub'? 'type' Ident Generics? '=' Type
Type     ::= TypeStruct | TypeTuple | TypeEnum | TypeUnion | SugarUnion |
             TypeClass | (Modifier* (Type | ('[' Type ']')))
TypeStruct ::= 'struct' ('[' Ident ':' Type (',' Ident ':' Type)* ']')?
TypeUnion  ::= 'union'  ('[' Ident ':' Type (',' Ident ':' Type)* ']')?
SugarUnion ::= '(' Ident ':' Type (',' Ident ':' Type)* ')'
TypeTuple  ::= 'tuple' ('[' (Ident ':')? Type (',' (Ident ':')? Type)* ']')?
TypeEnum   ::= 'enum'  ('[' Ident ('=' Expr)? (',' Ident ('=' Expr)?)* ']')?
TypeClass  ::= 'class' ('[' Signature (',' Signature)* ']')?
Modifier   ::= 'ref' | 'refc' | 'ptr' | 'lent' | 'mut' | 'const'
Signature  ::= Ident Generics? ('(' Type (',' Type)* ')')? (':' Type)?
```

### Control Flow
```
If     ::= 'if' Expr 'then' Body ('elif' Expr 'then' Body)* ('else' Body)?
When   ::= 'when' Expr 'then' Body ('elif' Expr 'then' Body)* ('else' Body)?
Try    ::= 'try' Body
           ('except' Ident ('as' Ident)? (',' Ident ('as' Ident)?)* 'then' Body)+
           ('finally' Body)?
Match  ::= 'match' Expr ('of' Pattern (',' Pattern)* ('where' Expr)? 'then' Body)+
While  ::= 'while' Expr 'do' Body
For    ::= 'for' Pattern 'in' Expr 'do' Body
Loop   ::= 'loop' Body
Block  ::= 'block' Ident? Body
Const  ::= 'const' Body
Quote  ::= 'quote' QuoteBody
```

### Modules
```
Mod ::= 'pub'? 'mod' Ident '=' Body
Use ::= 'use' Ident ('.' Ident)* ('.' ('[' Ident (',' Ident)* ']'))?
```

### Operators
```
Operator ::= 'and' | 'or' | 'not' | 'xor' | 'shl' | 'shr' |
             'div' | 'mod' | 'rem' | 'is' | 'in' | Opr+
Opr ::= '=' | '+' | '-' | '*' | '/' | '<' | '>' |
        '@' | '$' | '~' | '&' | '%' | '|' |
        '!' | '?' | '^' | '.' | ':' | '\\'
```
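
Read literally, the `Operator` rule accepts either one of the listed word operators or any non-empty run of `Opr` characters. A small Python sketch of that reading follows; `is_operator` is a hypothetical helper, and the check is purely lexical.

```python
import re

WORD_OPERATORS = {"and", "or", "not", "xor", "shl", "shr",
                  "div", "mod", "rem", "is", "in"}

# The characters listed under Opr, including the backslash.
OPR_CHARS = "=+-*/<>@$~&%|!?^.:\\"
SYMBOLIC = re.compile("[" + re.escape(OPR_CHARS) + "]+")


def is_operator(text: str) -> bool:
    """True if `text` is a word operator or a run of Opr characters."""
    return text in WORD_OPERATORS or SYMBOLIC.fullmatch(text) is not None


for candidate in ["+", "<=", "|>", "and", "foo", "+a"]:
    print(candidate, is_operator(candidate))
```

This reading also accepts tokens such as `=` and `.`, which the punctuation list reserves for assignment and chaining, so a real lexer would need to resolve those cases separately.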

### Calls and Expressions

This section is (quite) inaccurate due to complexities with respect to significant indentation. Proceed with caution.

```
Call ::= Ident ('[' Call (',' Call)* ']')? ('(' (Ident '=')? Call (',' (Ident '=')? Call)* ')')? |
         Ident Call (',' Call)* |
         Call Operator Call? |
         Call Body
Stmt ::= Let | Var | Const | Func | Type | Mod | Use | Expr
Expr ::= Block | Const | For | While | Loop | If | When | Try | Match | Call
Body ::= (Stmt ';')* Expr
```

---

References:
- [Statements vs. Expressions](https://www.joshwcomeau.com/javascript/statements-vs-expressions/)
- [Swift's Lexical Structure](https://docs.swift.org/swift-book/ReferenceManual/LexicalStructure.html)
- [The Nim Programming Language](https://nim-lang.github.io/Nim/manual.html)
- [Pietro's Notes on Compilers](https://pgrandinetti.github.io/compilers/)