Syntax: A Casual and Formal Look

Call Syntax

Function, macro, and operator calls differ very little from one another. Such calls can take only a few forms, though notably more than in most other languages (due to, among other things, uniform function call syntax), hence this section.

# The standard, unambiguous call.
routine(1, 2, 3, 4)
# The method call syntax equivalent.
1.routine(2, 3, 4)
# A block-based call. This is only really useful for macros taking in a body.
routine
  1
  2
  3
  4
# A parentheses-less call. This is only really useful for `print` and `dbg`.
# Only valid at the start of a line.
routine 1, 2, 3, 4

Binary operators have some special rules.

# Valid call syntaxes for binary operators. What can constitute a binary
# operator is constrained for parsing's sake. Whitespace is optional.
1 + 2
1+2
+ 1, 2 # Only valid at the start of a line. Also, don't do this.
+(1, 2)

As do unary operators.

# The standard call for unary operators. Postfix.
1?
?(1)

Method call syntax has a number of advantages, notably that it can be chained, acting as a natural pipe operator. Redundant parentheses can also be omitted.

# The following statements are equivalent:
foo.bar.baz
foo().bar().baz()
baz(bar(foo))
baz
  bar
    foo
baz bar(foo)
baz foo.bar
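
The equivalence between a method chain and nested calls can be sketched outside of Puck. The Python below is an illustration only and not Puck's parser: `desugar_chain` is a hypothetical helper, and it assumes bare identifiers separated by dots, each optionally followed by empty parentheses.

```python
def desugar_chain(chain: str) -> str:
    """Rewrite a dotted chain such as 'foo.bar.baz' as nested calls.

    Hypothetical helper: arguments inside parentheses are not supported.
    """
    # Drop redundant empty parentheses: foo().bar() -> foo.bar
    names = [part[:-2] if part.endswith("()") else part
             for part in chain.split(".")]
    # The first name is the innermost argument; each later name wraps it.
    result = names[0]
    for name in names[1:]:
        result = f"{name}({result})"
    return result
```

Under these assumptions, both `foo.bar.baz` and `foo().bar().baz()` rewrite to `baz(bar(foo))`.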

Indentation Rules

The tokens `=`, `then`, `do`, `of`, `else`, `block`, `const`, `block X`, and `X` (where `X` is an identifier) are scope tokens. They denote a new scope for their associated expressions (functions/macros/declarations, control flow, loops). The tokens `,`, `.` (notably not `...`), and all default binary operators (notably not `not`) are continuation tokens. An expression beginning or ending in one of them would always be a syntactic error.

Line breaks are treated as the end of a statement, with several exceptions.

pub func foo() =
  print "Hello, world!"
  print "This is from a function."

pub func inline_decl() = print "Hello, world!"

Indented lines following a line ending in a scope token are treated as belonging to a new scope. That is, indented lines following a line ending in a scope token form the body of the expression associated with the scope token.

Indentation is not obligatory after a scope token. Without it, however, the body of the associated expression is constrained to one line: following lines are not treated as an extension of the body, only as an extension of the expression associated with the original scope token. (This may change in the future.)

pub func foo(really_long_parameter: ReallyLongType,
another_really_long_parameter: AnotherReallyLongType) = # no indentation! this is ok
  print really_long_parameter # this line is indented relative to the first line
  print another_really_long_parameter

Lines following a line ending in a continuation token (and, additionally, `not` and `(`) are treated as a continuation of that line and can have any level of indentation (even negative). If they end in a scope token, however, the following lines must be indented relative to the indentation of the previous line.

let really_long_parameter: ReallyLongType = ...
let another_really_long_parameter: AnotherReallyLongType = ...

really_long_parameter
  .foo(another_really_long_parameter) # some indentation! this is ok

Lines beginning with a continuation token (and, additionally, `)`) are likewise treated as a continuation of the previous line and can have any level of indentation. If they end in a scope token, the following lines must be indented relative to the indentation of the previous line.

pub func foo() =
  print "Hello, world!"
pub func bar() = # this line is no longer in the above scope.
  print "Another function declaration."

Dedented lines not beginning or ending with a continuation token are treated as no longer in the previous scope, returning to the scope of the corresponding indentation level.

if cond then this
else that

match cond
of this then ...
of that then ...

A line beginning with a scope token is treated as attached to the previous expression.

# Technically allowed. Please don't do this.
let foo
= ...

if cond then if cond then this
else that

for i
in iterable
do ...

match foo of this then ...
of that then ...

match foo of this
then ...
of that then ...

This can lead to some ugly possibilities for formatting that are best avoided.

# Much preferred.

let foo =
  ...
let foo = ...

if cond then
  if cond then
    this
else that
if cond then
  if cond then this
else that

for i in iterable do
  ...
for i in iterable do ...

match foo
of this then ...
of that then ...

The indentation rules are complex, but the effect is such that long statements can be broken almost anywhere.
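
As a rough illustration of the continuation rules, the Python sketch below joins physical lines into logical ones. It is not Puck's lexer: the operator lists are an assumed subset of the default binary operators, the function names are hypothetical, and comments, strings, and scope tokens are ignored entirely.

```python
SYMBOLS = (",", ".", "+", "-", "*", "/")  # assumed subset of continuation tokens
WORDS = ("and", "or", "xor")              # assumed subset of word operators

def ends_open(line: str) -> bool:
    """True if the line ends in a continuation token, `not`, or `(`."""
    text = line.rstrip()
    if text.endswith("..."):  # `...` is notably not a continuation token
        return False
    if text.endswith(SYMBOLS + ("(",)):
        return True
    words = text.split()
    return bool(words) and words[-1] in WORDS + ("not",)

def begins_open(line: str) -> bool:
    """True if the line begins with a continuation token or `)`."""
    text = line.lstrip()
    if text.startswith("..."):
        return False
    if text.startswith(SYMBOLS + (")",)):
        return True
    words = text.split()
    return bool(words) and words[0] in WORDS

def join_continuations(lines: list[str]) -> list[str]:
    """Merge each continuation line into the logical line before it."""
    joined: list[str] = []
    for line in lines:
        if joined and (ends_open(joined[-1]) or begins_open(line)):
            joined[-1] = joined[-1].rstrip() + " " + line.strip()
        else:
            joined.append(line)
    return joined
```

Indentation plays no part in this sketch, which mirrors the rule that continuation lines may have any level of indentation.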

Expression Rules

First, a word on the distinction between expressions and statements. Expressions return a value. Statements do not. That is all.

There are some syntactic constructs unambiguously recognizable as statements: all declarations, modules, and use statements. There are no syntactic constructs unambiguously recognizable as expressions. As calls returning void are treated as statements, and expressions that return a type could possibly return void, no explicit distinction between expressions and statements is made in the parser, or anywhere before type-checking.

Expressions can go almost anywhere. Our indentation rules above allow for it.

# Some different formulations of valid expressions.

if cond then
  this
else
  that

if cond then this
else that

if cond
then this
else that

if cond then this else that

let foo =
  if cond then
    this
  else
    that
# Some different formulations of *invalid* expressions.
# These primarily break the rule that everything following a scope token
# (ex. `=`, `do`, `then`) not at the end of the line must be self-contained.

let foo = if cond then
    this
  else
    that

let foo = if cond then this
  else that

let foo = if cond then this
else that

# todo: how to handle this?
if cond then if cond then that
else that

# shrimple
if cond then
  if cond then that
else that

# this should be ok
if cond then this
else that

match foo of
this then ...
of that then ...

Reserved Keywords

The following keywords are reserved:

  • variables: let var const
  • control flow: if then elif else
  • pattern matching: match of
  • error handling: try with finally
  • loops: while do for in
  • blocks: loop block break continue return
  • modules: pub mod use as
  • functions: func varargs
  • metaprogramming: macro quote when
  • ownership: lent mut ref refc
  • types: type struct tuple union enum class
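
A lexer would typically check each word against this list before treating it as an identifier. A minimal Python sketch of that lookup follows; the names `RESERVED` and `is_reserved` are hypothetical, and the set simply transcribes the list above.

```python
# Transcribed from the reserved keyword list, one category per line.
RESERVED = frozenset("""
    let var const
    if then elif else
    match of
    try with finally
    while do for in
    loop block break continue return
    pub mod use as
    func varargs
    macro quote when
    lent mut ref refc
    type struct tuple union enum class
""".split())

def is_reserved(word: str) -> bool:
    """True if `word` is one of the reserved keywords."""
    return word in RESERVED
```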

The following keywords are not reserved, but liable to become so.

  • impl object interface concept auto effect case
  • suspend resume spawn pool thread closure static
  • cyclic acyclic sink move destroy copy trace deepcopy

The following identifiers are in use by the standard prelude:

  • logic: not and or xor shl shr div mod rem
  • logic: + - * / < > <= >= == != is
  • async: async await
  • types: int uint float i\d+ u\d+
    • f32 f64 f128
    • dec64 dec128
  • types: bool byte char str
  • types: void never
  • strings: & (string append)

The following punctuation is taken:

  • = (assignment)
  • . (chaining)
  • , (parameters)
  • ; (statements)
  • : (types)
  • # (comment)
  • @ (attributes)
  • _ (unused bindings)
  • | (generics)
  • \ (string/char escaping)
  • () (parameters, tuples)
  • [] (generics, lists)
  • {} (scope, structs)
  • "" (strings)
  • '' (chars)
  • `` (unquoting)
  • unused on qwerty: ~ % ^ $
    • perhaps leave $ unused. but ~, %, and ^ totally could be...

A Formal Grammar

We now shall take a look at a more formal description of Puck's syntax.

Syntax rules are described in extended Backus–Naur form (EBNF). However, most rules surrounding whitespace, scope, and line breaks are written as they would appear after a lexing step.

Identifiers

Ident  ::= (Letter | '_') (Letter | Digit | '_')*
Letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff' # todo
Digit  ::= '0'..'9'
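
The identifier rule translates fairly directly into a regular expression. The Python sketch below is one possible reading, not Puck's implementation: it assumes the `'\x80'..'\xff'` range refers to the bytes of UTF-8 encoded source, so it matches against bytes. `is_ident` is a hypothetical name.

```python
import re

# Ident ::= (Letter | '_') (Letter | Digit | '_')*
# Assumption: '\x80'..'\xff' means any non-ASCII byte of UTF-8 source.
IDENT = re.compile(rb"[A-Za-z_\x80-\xff][A-Za-z0-9_\x80-\xff]*")

def is_ident(text: str) -> bool:
    """True if `text` matches the Ident rule under the byte-level reading."""
    return IDENT.fullmatch(text.encode("utf-8")) is not None
```

Under this reading any non-ASCII character is accepted as a letter, which is presumably what the todo on the Letter rule is about.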

Literals

Int ::= '-'? (DecLit | HexLit | OctLit | BinLit)
Float ::= '-'? DecLit '.' DecLit
BinLit ::= '0b' BinDigit ('_'? BinDigit)*
OctLit ::= '0o' OctDigit ('_'? OctDigit)*
HexLit ::= '0x' HexDigit ('_'? HexDigit)*
DecLit ::= Digit ('_'? Digit)*
BinDigit ::= '0'..'1'
OctDigit ::= '0'..'7'
HexDigit ::= Digit | 'A'..'F' | 'a'..'f'
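
To make the underscore rule concrete (at most one underscore, and only between two digits), here is a small Python sketch that validates a literal against the rules above and converts it to a value. The function names are hypothetical, and this is not how Puck itself parses literals.

```python
import re

DEC = r"[0-9](?:_?[0-9])*"  # DecLit ::= Digit ('_'? Digit)*
INT = re.compile(
    r"(?P<sign>-?)(?:"
    r"0b(?P<bin>[01](?:_?[01])*)|"
    r"0o(?P<oct>[0-7](?:_?[0-7])*)|"
    r"0x(?P<hex>[0-9A-Fa-f](?:_?[0-9A-Fa-f])*)|"
    rf"(?P<dec>{DEC}))"
)
FLOAT = re.compile(rf"-?{DEC}\.{DEC}")

def parse_int(text: str):
    """Return the integer value of `text`, or None if it is not an Int."""
    match = INT.fullmatch(text)
    if match is None:
        return None
    for name, base in (("bin", 2), ("oct", 8), ("hex", 16), ("dec", 10)):
        digits = match.group(name)
        if digits is not None:
            value = int(digits.replace("_", ""), base)
            return -value if match.group("sign") else value

def parse_float(text: str):
    """Return the float value of `text`, or None if it is not a Float."""
    if FLOAT.fullmatch(text) is None:
        return None
    return float(text.replace("_", ""))
```

Note that the grammar as written rejects `0x_ff`, `1__0`, and `1.` (a Float needs digits on both sides of the dot).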

Chars, Strings, and Comments

CHAR    ::= '\'' (PRINT - '\'' | '\\\'')* '\''
STRING  ::= SINGLE_LINE_STRING | MULTI_LINE_STRING
COMMENT ::= SINGLE_LINE_COMMENT | MULTI_LINE_COMMENT | EXPRESSION_COMMENT
SINGLE_LINE_STRING  ::= '"' (PRINT - '"' | '\\"')* '"'
MULTI_LINE_STRING   ::= '"""' (PRINT | '\n' | '\r')* '"""'
SINGLE_LINE_COMMENT ::= '#' PRINT*
MULTI_LINE_COMMENT  ::= '#[' (PRINT | '\n' | '\r' | MULTI_LINE_COMMENT)* ']#'
EXPRESSION_COMMENT  ::= '#;' SINGLE_STMT
PRINT ::= Letter | Digit | Opr |
          '"' | '#' | "'" | '(' | ')' | # notably the dual of Opr
          ',' | ';' | '[' | ']' | '_' |
          '`' | '{' | '}' | ' ' | '\t'
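
The MULTI_LINE_COMMENT rule is recursive, so multi-line comments nest. A lexer can handle this with a depth counter rather than actual recursion. The Python sketch below shows the idea; `skip_block_comment` is a hypothetical helper and makes no claim about Puck's actual lexer.

```python
def skip_block_comment(source: str, start: int) -> int:
    """Return the index just past the `]#` closing the comment at `start`.

    `source[start:]` must begin with `#[`. Returns -1 if the comment is
    never closed.
    """
    assert source.startswith("#[", start)
    depth = 0
    i = start
    while i < len(source):
        if source.startswith("#[", i):    # a nested comment opens
            depth += 1
            i += 2
        elif source.startswith("]#", i):  # the innermost comment closes
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1
```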

Values

Value ::= Int | Float | String | Char | Array | Tuple | Struct
Array  ::= '[' (Expr (',' Expr)*)? ']'
Tuple  ::= '(' (Ident '=')? Expr (',' (Ident '=')? Expr)* ')'
Struct ::= '{' Ident '=' Expr (',' Ident '=' Expr)* '}'

Variables

Decl  ::= Let | Var | Const | Func | Type
Let   ::= 'let' Pattern (':' Type)? '=' Expr
Var   ::= 'var' Pattern (':' Type)? ('=' Expr)?
Const ::= 'pub'? 'const' Pattern (':' Type)? '=' Expr
Pattern ::= (Ident ('as' Ident)?) | Char | String | Int | Float |
            Ident? '(' Pattern (',' Pattern)* ')'

Declarations

Func  ::= 'pub'? 'func' Ident Generics? Parameters? (':' Type)? '=' Body
Macro ::= 'pub'? 'macro' Ident Generics? Parameters? (':' Type)? '=' Body
Generics   ::= '[' Ident (':' Type)? (',' Ident (':' Type)?)* ']'
Parameters ::= '(' Ident (':' Type)? (',' Ident (':' Type)?)* ')'

All arguments to functions must have a type. This is resolved at the semantic level, however. (Arguments to macros may lack types. This signifies a generic node.)

Types

TypeDecl ::= 'pub'? 'type' Ident Generics? '=' Type
Type     ::= TypeStruct | TypeTuple | TypeEnum | TypeUnion | SugarUnion |
             TypeClass | (Modifier* (Type | ('[' Type ']')))
TypeStruct ::= 'struct' ('[' Ident ':' Type (',' Ident ':' Type)* ']')?
TypeUnion  ::= 'union'  ('[' Ident ':' Type (',' Ident ':' Type)* ']')?
SugarUnion ::= '(' Ident ':' Type (',' Ident ':' Type)* ')'
TypeTuple  ::= 'tuple' ('[' (Ident ':')? Type (',' (Ident ':')? Type)* ']')?
TypeEnum   ::= 'enum'  ('[' Ident ('=' Expr)? (',' Ident ('=' Expr)?)* ']')?
TypeClass  ::= 'class' ('[' Signature (',' Signature)* ']')?
Modifier   ::= 'ref' | 'refc' | 'ptr' | 'lent' | 'mut' | 'const'
Signature  ::= Ident Generics? ('(' Type (',' Type)* ')')? (':' Type)?

Control Flow

If     ::= 'if' Expr 'then' Body ('elif' Expr 'then' Body)* ('else' Body)?
When   ::= 'when' Expr 'then' Body ('elif' Expr 'then' Body)* ('else' Body)?
Try    ::= 'try' Body
           ('except' Ident ('as' Ident)? (',' Ident ('as' Ident)?)* 'then' Body)+
           ('finally' Body)?
Match  ::= 'match' Expr ('of' Pattern (',' Pattern)* ('where' Expr)? 'then' Body)+
While  ::= 'while' Expr 'do' Body
For    ::= 'for' Pattern 'in' Expr 'do' Body
Loop   ::= 'loop' Body
Block  ::= 'block' Ident? Body
Const  ::= 'const' Body
Quote  ::= 'quote' QuoteBody

Modules

Mod ::= 'pub'? 'mod' Ident '=' Body
Use ::= 'use' Ident ('.' Ident)* ('.' ('[' Ident (',' Ident)* ']'))?

Operators

Operator ::= 'and' | 'or' | 'not' | 'xor' | 'shl' | 'shr' |
             'div' | 'mod' | 'rem' | 'is' | 'in' | Opr+
Opr ::= '=' | '+' | '-' | '*' | '/' | '<' | '>' |
        '@' | '$' | '~' | '&' | '%' | '|' |
        '!' | '?' | '^' | '.' | ':' | '\\'
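
As a sanity check on the Operator rule, here is a Python sketch that recognizes operator tokens: either one of the word operators or a run of one or more Opr characters. The names are hypothetical and the code is only a transcription of the rule above.

```python
import re

WORD_OPERATORS = frozenset({"and", "or", "not", "xor", "shl", "shr",
                            "div", "mod", "rem", "is", "in"})
# Opr+ : one or more of the operator characters listed in the grammar.
SYMBOL_OPERATOR = re.compile(r"[=+\-*/<>@$~&%|!?^.:\\]+")

def is_operator(token: str) -> bool:
    """True if `token` matches the Operator rule as written."""
    return token in WORD_OPERATORS or SYMBOL_OPERATOR.fullmatch(token) is not None
```

Read literally, the rule also accepts tokens such as `=`, `.`, and `...`, which other parts of the syntax give their own meaning.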

Calls and Expressions

This section is (quite) inaccurate due to complexities with respect to significant indentation. Proceed with caution.

Call ::= Ident ('[' Call (',' Call)* ']')? ('(' (Ident '=')? Call (',' (Ident '=')? Call)* ')')? |
         Ident Call (',' Call)* |
         Call Operator Call? |
         Call Body
Stmt ::= Let | Var | Const | Func | Type | Mod | Use | Expr
Expr ::= Block | Const | For | While | Loop | If | When | Try | Match | Call
Body ::= (Stmt ';')* Expr
