# Syntax: A Casual and Formal Look

## Call Syntax

There is little difference between a function, macro, and operator call. There are only a few forms such calls can take, though notably more than in most other languages (due to, among other things, uniform function call syntax): hence this section.

```puck
# The standard, unambiguous call.
routine(1, 2, 3, 4)
# The method call syntax equivalent.
1.routine(2, 3, 4)
# A block-based call. This is only really useful for macros taking in a body.
routine
  1
  2
  3
  4
# A parentheses-less call. This is only really useful for `print` and `dbg`.
# Only valid at the start of a line.
routine 1, 2, 3, 4
```

Binary operators have some special rules.

```puck
# Valid call syntaxes for binary operators. What can constitute a binary
# operator is constrained for parsing's sake. Whitespace is optional.
1 + 2
1+2
+ 1, 2 # Only valid at the start of a line. Also, don't do this.
+(1, 2)
```

As do unary operators.

```puck
# The standard call for unary operators. Postfix.
1?
?(1)
```

Method call syntax has a number of advantages, notably that it can be *chained*, acting as a natural pipe operator. Redundant parentheses can also be omitted.

```puck
# The following statements are equivalent:
foo.bar.baz
foo().bar().baz()
baz(bar(foo))
baz
  bar
    foo
baz bar(foo)
baz foo.bar
```

## Indentation Rules

The tokens `=`, `then`, `do`, `of`, `else`, `block`, `const`, `block X`, and `X` (where `X` is an identifier) are *scope tokens*. They denote a new scope for their associated expressions (functions/macros/declarations, control flow, loops). The tokens `,`, `.` (notably not `...`), and all default binary operators (notably not `not`) are *continuation tokens*. An expression beginning or ending in one of them would always be a syntactic error.

Line breaks are treated as the end of a statement, with several exceptions.

```puck
pub func foo() =
  print "Hello, world!"
  print "This is from a function."

pub func inline_decl() = print "Hello, world!"
```

Indented lines following a line ending in a *scope token* are treated as belonging to a new scope: they form the body of the expression associated with that scope token.

Indentation is not obligatory after a scope token. However, omitting it necessarily constrains the body of the associated expression to one line: no following lines will be treated as an extension of the body, only of the expression associated with the original scope token. (This may change in the future.)

```puck
pub func foo(really_long_parameter: ReallyLongType,
another_really_long_parameter: AnotherReallyLongType) = # no indentation! this is ok
  print really_long_parameter # this line is indented relative to the first line
  print another_really_long_parameter
```

Lines following a line ending in a *continuation token* (and, additionally, `not` and `(`) are treated as a continuation of that line and can have any level of indentation (even negative). If they end in a scope token, however, the following lines must be indented relative to the indentation of the previous line.

```puck
let really_long_parameter: ReallyLongType = ...
let another_really_long_parameter: AnotherReallyLongType = ...

really_long_parameter
    .foo(another_really_long_parameter) # some indentation! this is ok
```

Lines *beginning* with a continuation token (and, additionally, `)`) are likewise treated as a continuation of the previous line and can have any level of indentation. If they end in a scope token, the following lines must be indented relative to the indentation of the previous line.

```puck
pub func foo() =
  print "Hello, world!"
pub func bar() = # this line is no longer in the above scope.
  print "Another function declaration."
```

Dedented lines *not* beginning or ending with a continuation token are treated as no longer in the previous scope, returning to the scope of the corresponding indentation level.

```puck
if cond then this
else that

match cond
of this then ...
of that then ...
```

A line beginning with a scope token is treated as attached to the previous expression.

```puck
# Technically allowed. Please don't do this.
let foo
= ...

if cond then if cond then this
else that

for i
in iterable
do ...

match foo of this then ...
of that then ...

match foo
of this
then ...
of that then ...
```

This *can* lead to some ugly possibilities for formatting that are best avoided.

```puck
# Much preferred.

let foo =
  ...
let foo = ...

if cond then
  if cond then
    this
else that
if cond then
  if cond then this
else that

for i in iterable do
  ...
for i in iterable do ...

match foo
of this then ...
of that then ...
```

The indentation rules are complex, but their effect is that long statements can be broken *almost* anywhere.

## Expression Rules

First, a word on the distinction between *expressions* and *statements*. Expressions return a value. Statements do not. That is all.

There are some syntactic constructs unambiguously recognizable as statements: all declarations, modules, and `use` statements. There are no syntactic constructs unambiguously recognizable as expressions. Because calls returning `void` are treated as statements, and any expression's return type could turn out to be `void`, the parser makes no explicit distinction between expressions and statements; nor does anything else before type-checking.

Expressions can go almost anywhere; the indentation rules above allow for it.

```puck
# Some different formulations of valid expressions.

if cond then
  this
else
  that

if cond then this
else that

if cond
then this
else that

if cond then this else that

let foo =
  if cond then
    this
  else
    that
```

```puck
# Some different formulations of *invalid* expressions.
# These primarily break the rule that everything following a scope token
# (e.g. `=`, `do`, `then`) not at the end of the line must be self-contained.

let foo = if cond then
  this
  else that

let foo = if cond then this
  else that

let foo = if cond then this
else that

# todo: how to handle this?
if cond then if cond then that
else that

# shrimple
if cond then
  if cond then that
else that

# this should be ok
if cond then this
else that

match foo of this then ...
  of that then ...
```

## Reserved Keywords

The following keywords are reserved:
- variables: `let` `var` `const`
- control flow: `if` `then` `elif` `else`
- pattern matching: `match` `of`
- error handling: `try` `with` `finally`
- loops: `while` `do` `for` `in`
- blocks: `loop` `block` `break` `continue` `return`
- modules: `pub` `mod` `use` `as`
- functions: `func` `varargs`
- metaprogramming: `macro` `quote` `when`
- ownership: `lent` `mut` `ref` `refc`
- types: `type` `struct` `tuple` `union` `enum` `class`

The following keywords are not reserved, but are liable to become so:
- `impl` `object` `interface` `concept` `auto` `effect` `case`
- `suspend` `resume` `spawn` `pool` `thread` `closure` `static`
- `cyclic` `acyclic` `sink` `move` `destroy` `copy` `trace` `deepcopy`

The following identifiers are in use by the standard prelude:
- logic: `not` `and` `or` `xor` `shl` `shr` `div` `mod` `rem`
- logic: `+` `-` `*` `/` `<` `>` `<=` `>=` `==` `!=` `is`
- async: `async` `await`
- types: `int` `uint` `float` `i[\d]+` `u[\d]+`
  - `f32` `f64` `f128`
  - `dec64` `dec128`
- types: `bool` `byte` `char` `str`
- types: `void` `never`
- strings: `&` (string append)

The following punctuation is taken:
- `=` (assignment)
- `.` (chaining)
- `,` (parameters)
- `;` (statements)
- `:` (types)
- `#` (comment)
- `@` (attributes)
- `_` (unused bindings)
- `|` (generics)
- `\` (string/char escaping)
- `()` (parameters, tuples)
- `[]` (generics, lists)
- `{}` (scope, structs)
- `""` (strings)
- `''` (chars)
- ``` `` ``` (unquoting)
- unused on qwerty: `~` `%` `^` `$`
  - perhaps leave `$` unused, but `~`, `%`, and `^` totally could be...

## A Formal Grammar

We now shall take a look at a more formal description of Puck's syntax.

Syntax rules are described in [extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus–Naur_form) (EBNF). However, most rules concerning whitespace, scope, and line breaks are written as they would appear after a lexing step.

### Identifiers

```
Ident  ::= (Letter | '_') (Letter | Digit | '_')*
Letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff' # todo
Digit  ::= '0'..'9'
```

### Literals

```
Int      ::= '-'? (DecLit | HexLit | OctLit | BinLit)
Float    ::= '-'? DecLit '.' DecLit
BinLit   ::= '0b' BinDigit ('_'? BinDigit)*
OctLit   ::= '0o' OctDigit ('_'? OctDigit)*
HexLit   ::= '0x' HexDigit ('_'? HexDigit)*
DecLit   ::= Digit ('_'? Digit)*
BinDigit ::= '0'..'1'
OctDigit ::= '0'..'7'
HexDigit ::= Digit | 'A'..'F' | 'a'..'f'
```

### Chars, Strings, and Comments

```
CHAR                ::= '\'' (PRINT - '\'' | '\\\'')* '\''
STRING              ::= SINGLE_LINE_STRING | MULTI_LINE_STRING
COMMENT             ::= SINGLE_LINE_COMMENT | MULTI_LINE_COMMENT | EXPRESSION_COMMENT
SINGLE_LINE_STRING  ::= '"' (PRINT - '"' | '\\"')* '"'
MULTI_LINE_STRING   ::= '"""' (PRINT | '\n' | '\r')* '"""'
SINGLE_LINE_COMMENT ::= '#' PRINT*
MULTI_LINE_COMMENT  ::= '#[' (PRINT | '\n' | '\r' | MULTI_LINE_COMMENT)* ']#'
EXPRESSION_COMMENT  ::= '#;' SINGLE_STMT
PRINT ::= LETTER | DIGIT | OPR |
          '"' | '#' | "'" | '(' | ')' | # notably the dual of OPR
          ',' | ';' | '[' | ']' | '_' |
          '`' | '{' | '}' | ' ' | '\t'
```

### Values

```
Value  ::= Int | Float | String | Char | Array | Tuple | Struct
Array  ::= '[' (Expr (',' Expr)*)? ']'
Tuple  ::= '(' (Ident '=')? Expr (',' (Ident '=')? Expr)* ')'
Struct ::= '{' Ident '=' Expr (',' Ident '=' Expr)* '}'
```

### Variables

```
Decl    ::= Let | Var | Const | Func | Type
Let     ::= 'let' Pattern (':' Type)? '=' Expr
Var     ::= 'var' Pattern (':' Type)? ('=' Expr)?
Const   ::= 'pub'? 'const' Pattern (':' Type)? '=' Expr
Pattern ::= (Ident ('as' Ident)?) | Char | String | Int | Float |
            Ident? '(' Pattern (',' Pattern)* ')'
```

### Declarations

```
Func       ::= 'pub'? 'func' Ident Generics? Parameters? (':' Type)? '=' Body
Macro      ::= 'pub'? 'macro' Ident Generics? Parameters? (':' Type)? '=' Body
Generics   ::= '[' Ident (':' Type)? (',' Ident (':' Type)?)* ']'
Parameters ::= '(' Ident (':' Type)? (',' Ident (':' Type)?)* ')'
```

All function parameters must have a type, even though the grammar above marks the annotation as optional: this is enforced at the semantic level rather than by the parser. (Parameters of macros may lack types. This signifies a generic node.)

### Types

```
TypeDecl   ::= 'pub'? 'type' Ident Generics? '=' Type
Type       ::= TypeStruct | TypeTuple | TypeEnum | TypeUnion | SugarUnion |
               TypeClass | (Modifier* (Type | ('[' Type ']')))
TypeStruct ::= 'struct' ('[' Ident ':' Type (',' Ident ':' Type)* ']')?
TypeUnion  ::= 'union' ('[' Ident ':' Type (',' Ident ':' Type)* ']')?
SugarUnion ::= '(' Ident ':' Type (',' Ident ':' Type)* ')'
TypeTuple  ::= 'tuple' ('[' (Ident ':')? Type (',' (Ident ':')? Type)* ']')?
TypeEnum   ::= 'enum' ('[' Ident ('=' Expr)? (',' Ident ('=' Expr)?)* ']')?
TypeClass  ::= 'class' ('[' Signature (',' Signature)* ']')?
Modifier   ::= 'ref' | 'refc' | 'ptr' | 'lent' | 'mut' | 'const'
Signature  ::= Ident Generics? ('(' Type (',' Type)* ')')? (':' Type)?
```

### Control Flow

```
If    ::= 'if' Expr 'then' Body ('elif' Expr 'then' Body)* ('else' Body)?
When  ::= 'when' Expr 'then' Body ('elif' Expr 'then' Body)* ('else' Body)?
Try   ::= 'try' Body
          ('with' Pattern (',' Pattern)* 'then' Body)+
          ('finally' Body)?
Match ::= 'match' Expr
          ('of' Pattern (',' Pattern)* ('where' Expr)? 'then' Body)+
While ::= 'while' Expr 'do' Body
For   ::= 'for' Pattern 'in' Expr 'do' Body
Loop  ::= 'loop' Body
Block ::= 'block' Ident? Body
Const ::= 'const' Body
Quote ::= 'quote' QuoteBody
```

### Modules

```
Mod ::= 'pub'? 'mod' Ident '=' Body
Use ::= 'use' Ident ('.' Ident)* ('.' ('[' Ident (',' Ident)* ']'))?
```

### Operators

```
Operator ::= 'and' | 'or' | 'not' | 'xor' | 'shl' | 'shr' |
             'div' | 'mod' | 'rem' | 'is' | 'in' | Opr+
Opr      ::= '=' | '+' | '-' | '*' | '/' | '<' | '>' |
             '@' | '$' | '~' | '&' | '%' | '|' |
             '!' | '?' | '^' | '.' | ':' | '\\'
```

### Calls and Expressions

This section is (quite) inaccurate due to complexities with respect to significant indentation. Read it with caution.

```
Call ::= Ident ('[' Call (',' Call)* ']')? ('(' (Ident '=')? Call (',' (Ident '=')? Call)* ')')? |
         Ident Call (',' Call)* |
         Call Operator Call? |
         Call Body
Stmt ::= Let | Var | Const | Func | Type | Mod | Use | Expr
Expr ::= Block | Const | For | While | Loop | If | When | Try | Match | Call
Body ::= (Stmt ';')* Expr
```
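
The lexical rules for identifiers, numeric literals, and operators are regular, so they can be checked mechanically. The Python sketch below is an illustration only, not part of any Puck implementation. It transcribes the `Ident`, `Int`, `Float`, `Operator`, and `Opr` rules into regular expressions, assumes ASCII-only letters (the `'\x80'..'\xff'` range is still marked todo), and classifies a single, already-separated token. The helper name `classify` is hypothetical. Strings, chars, comments, and indentation are out of scope.

```python
import re

# Illustrative transcription of the EBNF lexical rules (not Puck's real lexer).
# Letters are ASCII-only here; the grammar's '\x80'..'\xff' range is omitted.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

DEC_LIT = r"[0-9](?:_?[0-9])*"                 # DecLit ::= Digit ('_'? Digit)*
HEX_LIT = r"0x[0-9A-Fa-f](?:_?[0-9A-Fa-f])*"   # HexLit
OCT_LIT = r"0o[0-7](?:_?[0-7])*"               # OctLit
BIN_LIT = r"0b[01](?:_?[01])*"                 # BinLit

# Int ::= '-'? (DecLit | HexLit | OctLit | BinLit)
# Prefixed forms are listed first so that a prefix match (e.g. re.match)
# on "0x1F" would not stop after the leading "0".
INT = re.compile(rf"-?(?:{HEX_LIT}|{OCT_LIT}|{BIN_LIT}|{DEC_LIT})")

# Float ::= '-'? DecLit '.' DecLit
FLOAT = re.compile(rf"-?{DEC_LIT}\.{DEC_LIT}")

# Operator ::= word operators | Opr+
WORD_OPS = {"and", "or", "not", "xor", "shl", "shr", "div", "mod", "rem", "is", "in"}
OPR_PLUS = re.compile(r"[=+\-*/<>@$~&%|!?^.:\\]+")


def classify(token: str) -> str:
    """Return the name of the lexical rule the whole token matches, or 'invalid'."""
    if FLOAT.fullmatch(token):
        return "Float"
    if INT.fullmatch(token):
        return "Int"
    # Word operators such as `and` would also match Ident, so check them first.
    if token in WORD_OPS or OPR_PLUS.fullmatch(token):
        return "Operator"
    if IDENT.fullmatch(token):
        return "Ident"
    return "invalid"
```

For instance, `classify("1__000")` returns `"invalid"`, because `DecLit` permits at most one underscore between digits, while `classify("0xFF_FF")` returns `"Int"`.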