diff options
author | JJ | 2024-01-02 05:53:00 +0000 |
---|---|---|
committer | JJ | 2024-01-02 05:53:00 +0000 |
commit | d91543bc481401f0b3db3abcec43dbf3acf167e6 (patch) | |
tree | 73eae7f801232ab4b783e7d84f0888c7405b8576 | |
parent | 3fb0a86c7cf24b5f3e0aa28ea97691152f42bb4a (diff) |
docs: finish overview prose, cleanups to syntax and types
-rw-r--r-- | README.md | 2 | ||||
-rw-r--r-- | docs/BASIC.md | 63 | ||||
-rw-r--r-- | docs/SYNTAX.md | 184 | ||||
-rw-r--r-- | docs/TYPES.md | 97 |
4 files changed, 186 insertions, 160 deletions
@@ -3,7 +3,7 @@ A place where I can make some bad decisions. Puck is an experimental, memory safe, structurally typed, interface-first, imperative programming language. -It aims to be clean and succinct while performant: inspired by the syntax and metaprogramming of [Nim](https://nim-lang.org/), the error handling of [Swift](https://www.swift.org/), the performance and safety guarantees of [Rust](https://www.rust-lang.org/), and the module system of [OCaml](https://ocaml.org/). +It aims to be clean and succinct while performant: inspired by the syntax and metaprogramming of [Nim](https://nim-lang.org/), the error handling of [Swift](https://www.swift.org/), the performance and safety guarantees of [Rust](https://www.rust-lang.org/), the async/await and comptime of [Zig](https://ziglang.org/), and the module system of [OCaml](https://ocaml.org/). <details> <summary><b>Example: Interfaces</b></summary> diff --git a/docs/BASIC.md b/docs/BASIC.md index ee8e68f..df4b22f 100644 --- a/docs/BASIC.md +++ b/docs/BASIC.md @@ -14,7 +14,20 @@ const compile_time = when linux: "linux" else: "windows" Variables may be mutable (`var`), immutable (`let`), or compile-time evaluated and immutable (`const`). Type annotations on variables and other bindings follow the name of the binding (with `: Type`), and are typically optional. Variables are conventionally written in `snake_case`. Types are conventionally written in `PascalCase`. -The type system is comprehensive, and complex enough to warrant delaying covering until the end. +The type system is comprehensive, and complex enough to warrant delaying full coverage of until the end. Some basic types are of note, however: +- `int`, `uint`: signed and unsigned integers + - `i8`/`i16`/`i32`/`i64`/`i128`: their fixed-size counterparts + - `u8`/`u16`/`u32`/`u64`/`u128`: their fixed-size counterparts +- `float`, `decimal`: floating-point numbers + - `f32`/`f64`/`f128`: their fixed-size counterparts + - `dec64`/`dec128`: their fixed-size counterparts +- `byte`: an alias to `u8`, representing one byte +- `chr`: an alias to `u32`, representing one Unicode character +- `bool`: defined as `union[false, true]` +- `array[T, S]`: primitive fixed-size (`S`) arrays +- `list[T]`: dynamic lists +- `str`: mutable strings. internally a `list[byte]`, externally a `list[chr]` +- `slice[T]`: borrowed "views" into the three types above Comments are declared with `#` and run until the end of the line. Documentation comments are declared with `##` and may be parsed by language servers and other tooling. @@ -35,9 +48,7 @@ pub yeet func pretty_print[T](value: T) = print!(value) ``` -Functions are declared with the `func` keyword. They take an (optional) list of generic parameters (in brackets), an (optional) list of parameters (in parentheses), and **must** be annotated with a return type if they return a type. Every (non-generic) parameter must be annotated with a parameter. Generic parameters may be each optionally annotated with a type functioning as a _constraint_. - -Every function parameter must be explicitly annotated with a type. Their type may also be prefixed with `mut` or `static`: denoting a *mutable* type (types are copied into functions and thus immutable by default), or a *static* type (known to the compiler at compile time, and usable in `const` exprs). +Functions are declared with the `func` keyword. They take an (optional) list of generic parameters (in brackets), an (optional) list of parameters (in parentheses), and **must** be annotated with a return type if they return a type. Every function parameter must be annotated with a type. Their type may optionally be prefixed with either `mut` or `static`: denoting a *mutable* type (types are copied into functions and thus immutable by default), or a *static* type (known to the compiler at compile time, and usable in `const` exprs). Generic parameters may each be optionally annotated with a type functioning as a _constraint_. <!-- Functions, constants, types, and modules may be optionally prefixed with a `pub` modifier denoting visibility outside the current scope / module. More on the module system later. --> @@ -62,9 +73,15 @@ Boolean logic and integer operations are standard and as one would expect out of The above operations are performed with *operators*, special functions that take a prefixed first argument and (often) a suffixed second argument. Custom operators may be implemented, but they must consist of only a combination of the symbols `=` `+` `-` `*` `/` `<` `>` `@` `$` `~` `&` `%` `|` `!` `?` `^` `\` for the purpose of keeping the grammar context-free. They are are declared identically to functions. -Term (in)equality is expressed with the `==` and `!=` operators. Type equality is expressed with `is`. Subtyping relations may be queried with `of`, which has the additional property of introducing new bindings in the current scope (more on this in the [types document](TYPES.md)). Membership of collections is expressed with `in`, and is overloaded for most types. +Term (in)equality is expressed with the `==` and `!=` operators. Type equality is expressed with `is`. Subtyping relations may be queried with `of`, which has the additional property of introducing new bindings in the current scope (more on this in the [types document](TYPES.md)). <!-- Membership of collections is expressed with `in`, and is overloaded for most types. --> ```puck +let phrase: str = "I am a string! Wheeee! ✨" +for c in phrase: + stdout.write(c) # I am a string! Wheeee! ✨ +for b in phrase.bytes(): + stdout.write(b.chr) # Error: cannot convert between u8 and chr +print phrase.last() # ✨ ``` String concatenation uses a distinct `&` operator rather than overloading the `+` operator (as the complement `-` has no natural meaning for strings). Strings are unified, mutable, internally a byte array, externally a char array, and are stored as a pointer to heap data after their length and capacity (fat pointer). Chars are four bytes and represent a Unicode character in UTF-8 encoding. Slices of strings are stored as a length followed by a pointer to string data, and have non-trivial interactions with the memory management system. More details can be found in the [type system overview](TYPES.md). @@ -79,17 +96,17 @@ All values in Puck must be handled, or explicitly discarded. This allows for con ```puck ``` -Exhaustive structural pattern matching is available with the `match`/`of` statement, and is particularly useful for the `struct` and `union` types. Branches of a `match` statement take a *pattern*, of which the unbound identifiers within will be injected into the branch's scope. Multiple patterns may be used for one branch provided they all bind the same identifiers of the same type. Branches may be *guarded* with the `where` keyword, which takes a conditional, and will necessarily remove the branch from exhaustivity checks. +Exhaustive structural pattern matching is available with the `match`/`of` statement, and is particularly useful for the `struct` and `union` types. `of` branches of a `match` statement take a *pattern*, of which the unbound identifiers within will be injected into the branch's scope. Multiple patterns may be used for one branch provided they all bind the same identifiers of the same type. Branches may be *guarded* with the `where` keyword, which takes a conditional, and will necessarily remove the branch from exhaustivity checks. <!-- todo: structural matching of lists and arrays --> -The `of` statement also stands on its own as a conditional for querying subtype equality. Used as a conditional in `if` statements, it retains the variable injection properties of its `match` counterpart. This allows it to be used as a compact <!-- and coherent --> alternative to `if let` statements in other languages. +The `of` statement also stands on its own as an operator for querying subtype equality. Used as a conditional in `if` statements or `while` loops, it retains the variable injection properties of its `match` counterpart. This allows it to be used as a compact <!-- and coherent --> alternative to `if let` statements in other languages. ```puck func may_fail: Result[T, ref Err] ``` -Error handling is done via a fusion of imperative `try`/`catch` statements and functional `Option`/`Result` types, with much syntactic sugar. Functions may `raise` errors, but should return `Option[T]` or `Result[T, E]` types instead by convention. <!-- Those that `raise` errors or call functions that `raise` errors without handling them must additionally be explicitly marked as `yeet`. This is purely to encourage safe error handling, and is not absolute - there will likely be several builtins considered safe by compiler magic.--> <!-- todo --> +Error handling is done via a fusion of imperative `try`/`catch` statements and functional `Option`/`Result` types, with much syntactic sugar. Functions may `raise` errors, but should return `Option[T]` or `Result[T, E]` types instead by convention. The compiler will note functions that `raise` errors, and force explicit qualification of them via `try`/`catch` statements. A bevy of helper functions and macros are available for `Option`/`Result` types, and are documented and available in the `std.options` and `std.results` modules (included in the prelude by default). Two in particular are of note: the `?` macro accesses the inner value of a `Result[T, E]` or propagates (returns in context) the `Error(e)`, and the `!` accesses the inner value of an `Option[T]` / `Result[T, E]` or raises an error on `None` / the specific `Error(e)`. Both operators take one parameter and so are postfix. (There is additionally another `?` postfix macro, taking in a type, as a shorthand for `Option[T]`) @@ -104,7 +121,7 @@ loop: Three types of loops are available: `for` loops, `while` loops, and infinite loops (`loop` loops). For loops take a binding (which may be structural, see pattern matching) and an iterable object and will loop until the iterable object is spent. While loops take a condition that is executed upon the beginning of each iteration to determine whether to keep looping. Infinite loops are infinite are infinite are infinite are infinite are infinite are infinite and must be manually broken out of. -There is no special concept of iterators: iterable objects are any object that implements the `Iter[T]` interface (more on those in [the type system document](TYPES.md)), that is, provides a `self.next()` function returning an Optional type. As such, iterators are first-class constructs. For loops can be thought of as while loops that unwrap the result of the `next()` function and end iteration upon a `None` value. While loops, in turn, can be thought of as infinite loops with an explicit conditional break. +There is no special concept of iterators: iterable objects are any object that implements the `Iter[T]` interface (more on those in [the type system document](TYPES.md)), that is, provides a `self.next()` function returning an `Option[T]`. As such, iterators are first-class constructs. For loops can be thought of as while loops that unwrap the result of the `next()` function and end iteration upon a `None` value. While loops, in turn, can be thought of as infinite loops with an explicit conditional break. The `break` keyword immediately breaks out of the current loop, and the `continue` keyword immediately jumps to the next iteration of the current loop. Loops may be used in conjunction with blocks for more fine-grained control flow manipulation. @@ -172,26 +189,34 @@ If such behavior is not desired, the `distinct` keyword forces explicit qualific Types, like functions, can be *generic*: declared with "holes" that may be filled in with other types upon usage. A type must have all its holes filled before it can be constructed. The syntax for generics in types much resembles the syntax for generics in functions, and *constraints* and the like also apply. ```puck -let myStruct = struct - a: int - b: int -let myTuple = tuple[int, b: int] -print myTuple.1 +type MyStruct = struct + a: str + b: str +type MyTuple = tuple[str, b: str] + +let a: MyTuple = ("hello", "world") +print a.1 # world ``` Struct and tuple types are declared with `struct[<fields>]` and `tuple[<fields>]`, respectively. Their declarations make them look similar at a glance, but they differ fairly fundamentally. Structs are *unordered*, and every field must be named. They may be constructed with `{}` brackets. Tuples are *ordered* and so field names are optional - names are just syntactic sugar for positional access. Tuples may be constructed with `()` parenthesis. -Puck's type system is *structural*, and there is no better example of what this entails than with structs... todo. This allows for power at the cost of clarity, zero boilerplate multiple inheritance, etc +I am undecided whether to allow *structural subtyping*: that is, `{a: Type, b: Type, c: Type}` being valid in a context expecting `{a: Type, b: Type}`. This has benefits (multiple inheritance with no boilerplate) but also downsides (obvious). It is worth noting that there is no concept of `pub` at a field level on structs - a type is either fully transparent, or fully opaque. This is because such partial transparency breaks with structural initialization (how could one provide for hidden fields?). An idiomatic workaround is to model the desired field structure with a public-facing interface. ```puck type Expr = union - Variable(int) - Abstraction() - Application() # much better + Literal(int) + Variable(str) + Abstraction(param: str, body: ref Expr) + Application(body: ref Expr, arg: ref Expr) ``` +Union types are composed of a list of *variants*. Each variant has a *tag* and an *inner type* the union wraps over. Before the inner type can be accessed, the tag must be pattern matched upon, in order to handle all possible values. These are also known as *sum types* or *tagged unions* in other languages. + +Union types are the bread and butter of structural pattern matching. Composed with structs and tuples, unions provide for a very general programming construct commonly referred to as an *algebraic data type*. +This is often useful as an idiomatic and safer replacement for inheritance. + ```puck pub type Iter[T] = interface next(mut Self): T? @@ -202,7 +227,7 @@ pub type Peek[T] = interface peek_nth(mut Self, int): T? ``` -Interface types function much as type classes in Haskell or traits in Rust do. They are not concrete types, and cannot be constructed - instead, their utility is via indirection, as parameters or as `ref` types, providing constraints that some concrete type must meet. They consist of a list of a list of function signatures, implementations of which must exist for the given type in order to compile. +Interface types function much as type classes in Haskell or traits in Rust do. They are not concrete types, and cannot be constructed - instead, their utility is via indirection, as parameters or as `ref` types, providing constraints that some concrete type must meet. They consist of a list of function signatures, implementations of which must exist for the given type in order to compile. Their major difference, however, is that Puck's interfaces are *implicit*: there is no `impl` block that implementations of their associated functions have to go under. If functions for a concrete type exist satisfying some interface, the type implements that interface. This does run the risk of accidentally implementing an interface one does not desire to, but the author believes such situations are few and far between, well worth the decreased syntactic and semantic complexity, and mitigatable with tactical usage of the `distinct` keyword. diff --git a/docs/SYNTAX.md b/docs/SYNTAX.md index 49fdd59..a1c8bfb 100644 --- a/docs/SYNTAX.md +++ b/docs/SYNTAX.md @@ -1,37 +1,46 @@ # Syntax: A Casual and Formal Look -... - -## A Formal Look - -We now shall take a look at a more formal description of Puck's syntax. Syntax rules are described in extended Backus–Naur form (EBNF) - but most rules surrounding whitespace, and scope, and line breaks, are modified to how they would appear after a lexing step (whitespace is removed, line breaks are normalized, scope is switched to use `{` and `}`). +## Reserved Keywords + +The following keywords are reserved: +- variables: `let` `var` `const` +- control flow: `if` `elif` `else` +- pattern matching: `match` `of` +- loops: `loop` `while` `for` `in` +- blocks: `block` `break` `continue` `return` +- functions: `func` `mut` `static` `varargs` +- modules: `pub` `mod` `use` `as` +- error handling: `try` `catch` `finally` +- metaprogramming: `macro` `quote` `when` +- types: `type` `distinct` `ref` +- types: `struct` `tuple` `union` `enum` `interface` +- reserved: + - `impl` `object` `class` `concept` `auto` `empty` `effect` `case` `nil` + - `suspend` `resume` `spawn` `pool` `thread` `closure` + - `cyclic` `acyclic` `sink` `move` `destroy` `copy` `trace` `deepcopy` + +## A Formal Grammar + +We now shall take a look at a more formal description of Puck's syntax. Syntax rules are described in [extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus–Naur_form) (EBNF): however, most rules surrounding whitespace, and scope, and line breaks, are modified to how they would appear after a lexing step. ### Identifiers ``` -IDENT ::= LETTER (LETTER | DIGIT | '_')* # todo: support _ -LETTER ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff' # todo -DIGIT ::= '0'..'9' +Ident ::= (Letter | '_') (Letter | Digit | '_')* +Letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff' # todo +Digit ::= '0'..'9' ``` ### Literals ``` -INT_LIT ::= '-'? (DEC_LIT | HEX_LIT | OCT_LIT | BIN_LIT) -BIN_LIT ::= '0b' BIN_DIGIT ('_'? BIN_DIGIT)* -OCT_LIT ::= '0o' OCT_DIGIT ('_'? OCT_DIGIT)* -HEX_LIT ::= '0x' HEX_DIGIT ('_'? HEX_DIGIT)* -DEC_LIT ::= DIGIT ('_'? DIGIT)* -BIN_DIGIT ::= '0'..'1' -OCT_DIGIT ::= '0'..'7' -HEX_DIGIT ::= DIGIT | 'A'..'F' | 'a'..'f' -``` - -### Operators -``` -OPERATOR ::= 'and' | 'or' | 'not' | 'xor' | 'shl' | 'shr' | # todo: more? - 'div' | 'mod' | 'rem' | 'is' | 'isnot' | OPR+ -OPR ::= '=' | '+' | '-' | '*' | '/' | '<' | '>' | # todo: more? - '@' | '$' | '~' | '&' | '%' | '|' | - '!' | '?' | '^' | '.' | ':' | '\\' +Int ::= '-'? (DecLit | HexLit | OctLit | BinLit) +Float ::= '-'? DecLit '.' DecLit +BinLit ::= '0b' BinDigit ('_'? BinDigit)* +OctLit ::= '0o' OctDigit ('_'? OctDigit)* +HexLit ::= '0x' HexDigit ('_'? HexDigit)* +DecLit ::= Digit ('_'? Digit)* +BinDigit ::= '0'..'1' +OctDigit ::= '0'..'7' +HexDigit ::= Digit | 'A'..'F' | 'a'..'f' ``` ### Chars, Strings, and Comments @@ -52,95 +61,90 @@ PRINT ::= LETTER | DIGIT | OPR | ### Values ``` -VALUE ::= INT_LIT | STRING | CHAR | LIST_DECL | ARRAY_DECL | TUPLE_DECL | STRUCT_DECL -LIST_DECL ::= '[' (EXPR (',' EXPR)*)? ']' -ARRAY_DECL ::= '[' (EXPR (',' EXPR)*)? ']' -TUPLE_DECL ::= '(' (IDENT '=')? EXPR (',' (IDENT '=')? EXPR)* ')' -STRUCT_DECL ::= '{' IDENT '=' EXPR (',' IDENT '=' EXPR)* '}' -# note: no union or enum. should struct exist? only in a structural system. +Value ::= Int | Float | String | Char | Array | Tuple | Struct +Array ::= '[' (Expr (',' Expr)*)? ']' +Tuple ::= '(' (Ident ':')? Expr (',' (Ident ':')? Expr)* ')' +Struct ::= '{' Ident ':' Expr (',' Ident ':' Expr)* '}' ``` ### Variables ``` -DECL ::= LET_DECL | VAR_DECL | CONST_DECL | FUNC_DECL | TYPE_DECL -LET_DECL ::= 'let' GROUP ANNOTATION? '=' EXPR -VAR_DECL ::= 'var' GROUP ANNOTATION? ('=' EXPR)? -CONST_DECL ::= 'pub'? 'const' GROUP ANNOTATION? '=' EXPR -GROUP ::= ('(' IDENT (',' IDENT)* ')') | IDENT +Decl ::= Let | Var | Const | Func | Type +Let ::= 'let' Pattern Annotation? '=' Expr +Var ::= 'var' Pattern Annotation? ('=' Expr)? +Const ::= 'pub'? 'const' Pattern Annotation? '=' Expr +Pattern ::= Char | String | Number | Float | Ident | '(' Pattern (',' Pattern)* ')' + Ident '(' Pattern (',' Pattern)* ')' ``` -### Functions +### Declarations ``` -FUNC_DECL ::= SIGNATURE '=' (EXPR | STMT) -SIGNATURE ::= 'pub'? ('pure' | 'yeet' | IDENT)? 'func' IDENT GENERICS? PARAMETERS? -PARAMETERS ::= '(' (PARAMETER (',' PARAMETER)?)? ')' -PARAMETER ::= ('var' | 'static')? IDENT ANNOTATION? -GENERICS ::= '[' IDENT ANNOTATION? (',' IDENT ANNOTATION?)* ']' -ANNOTATION ::= ':' TYPE_DESC +Func ::= 'pub'? 'func' Ident Generics? Parameters? Annotation? '=' Body +Macro ::= 'pub'? 'macro' Ident Generics? Parameters? Annotation? '=' Body +Generics ::= '[' Ident Annotation? (',' Ident Annotation?)* ']' +Parameters ::= '(' Ident Annotation? (',' Ident Annotation?)* ')' +Annotation ::= ':' Type ``` ### Types ``` -TYPE_DECL ::= 'pub'? 'type' IDENT GENERICS? '=' 'ref'? 'distinct'? TYPE_DESC -TYPE_DESC ::= TUPLE_TYPE | STRUCT_TYPE | UNION_TYPE | ENUM_TYPE | INTERFACE | IDENT | - (TYPE_DESC ('|' TYPE_DESC)+) -TUPLE_TYPE ::= 'tuple' '[' (IDENT ':')? TYPE_DESC (',' (IDENT ':')? TYPE_DESC)* ']' -STRUCT_TYPE ::= 'struct' '[' IDENT ANNOTATION (',' IDENT ANNOTATION)* ']' -UNION_TYPE ::= 'union' '[' IDENT ANNOTATION (',' IDENT ANNOTATION)* ']' -ENUM_TYPE ::= 'enum' '[' IDENT ('=' EXPR)? (',' IDENT ('=' EXPR)?)* ']' -FUNC_TYPE ::= 'func' GENERICS? PARAMETERS? ANNOTATION? -INTERFACE ::= 'interface' '[' SIGNATURE (',' SIGNATURE)* ('for' TYPE_DESC)? ']' +TypeDecl ::= 'pub'? 'type' Ident Generics? '=' Type +Type ::= StructType | TupleType | EnumType | UnionType | Interface | + (('distinct' | 'ref' | 'ptr' | 'mut' | 'static') (Type | ('[' Type ']'))?) +StructType ::= 'struct' ('[' Ident ':' Type (',' Ident ':' Type)* ']')? +UnionType ::= 'union' ('[' Ident ':' Type (',' Ident ':' Type)* ']')? +TupleType ::= 'tuple' ('[' (Ident ':')? Type (',' (Ident ':')? Type)* ']')? +EnumType ::= 'enum' ('[' Ident ('=' Expr)? (',' Ident ('=' Expr)?)* ']')? +Interface ::= 'interface' ('[' Signature (',' Signature)* ']')? +Signature ::= Ident Generics? ('(' Type (',' Type)* ')')? Annotation? ``` ## Control Flow ``` -IF_EXPR ::= 'if' EXPR '{' EXPR '}' ('elif' EXPR '{' EXPR '}')* 'else' '{' EXPR '}' -IF_STMT ::= 'if' EXPR '{' STMT '}' ('elif' EXPR '{' STMT '}')* ('else' '{' STMT '}')? -WHEN_EXPR ::= 'when' EXPR '{' EXPR '}' ('elif' EXPR '{' EXPR '}')* 'else' '{' EXPR '}' -WHEN_STMT ::= 'when' EXPR '{' STMT '}' ('elif' EXPR '{' EXPR '}')* ('else' '{' STMT '}')? -BLOCK_EXPR ::= 'block' IDENT? '{' EXPR '}' -BLOCK_STMT ::= 'block' IDENT? '{' STMT '}' -MATCH_EXPR ::= 'match' EXPR '{' - ('case' EXPR ('where' EXPR)? (',' EXPR ('where' EXPR)?)* '{' EXPR '}')+ '}' -MATCH_STMT ::= 'match' EXPR '{' - ('case' EXPR ('where' EXPR)? (',' EXPR ('where' EXPR)?)* '{' STMT '}')+ '}' -LOOP_STMT ::= 'loop' '{' STMT '}' -WHILE_STMT ::= 'while' EXPR '{' STMT '}' -FOR_STMT ::= 'for' GROUP 'in' EXPR '{' STMT '}' +If ::= 'if' Expr ':' Body ('elif' Expr ':' Body)* ('else' ':' Body)? +When ::= 'when' Expr ':' Body ('elif' Expr ':' Body)* ('else' ':' Body)? +Try ::= 'try' ':' Body + ('except' Ident ('as' Ident)? (',' Ident ('as' Ident)?)*) ':' Body)* + ('finally' ':' Body)? +Match ::= 'match' Expr ('of' Pattern (',' Pattern)* ('where' Expr)? ':' Body)+ +Block ::= 'block' Ident? ':' Body +Block ::= 'static' ':' Body +Loop ::= 'loop' ':' Body +While ::= 'while' Expr ':' Body +For ::= 'for' Pattern 'in' Expr Body ``` ## Modules ``` -IMPORT_STMT ::= 'import' IDENT_AS? - ('/' (IDENT_AS | '[' (IDENT_AS (',' IDENT_AS)*)? ']'))* -EXPORT_STMT ::= 'export' IDENT_AS? - ('/' (IDENT_AS | '[' (IDENT_AS (',' IDENT_AS)*)? ']'))* -MODULE_STMT ::= 'module' IDENT '{' STMT '}' -IDENT_AS ::= IDENT ('as' IDENT)? +Mod ::= 'pub'? 'mod' Ident ':' Body +Use ::= 'use' Ident ('/' Ident)* ('/' ('[' Ident (',' Ident)* ']'))? ``` -## Macros +### Operators ``` -MACRO_FUNC ::= IDENT '(' EXPR ')' -MACRO_BLOCK ::= IDENT '{' EXPR '}' # todo +Operator ::= 'and' | 'or' | 'not' | 'xor' | 'shl' | 'shr' | + 'div' | 'mod' | 'rem' | 'is' | 'in' | + Opr+ +Opr ::= '=' | '+' | '-' | '*' | '/' | '<' | '>' | + '@' | '$' | '~' | '&' | '%' | '|' | + '!' | '?' | '^' | '.' | ':' | '\\' ``` -## Calls, Statements, and Expressions +## Calls and Expressions ``` -OPERATION ::= EXPR OPERATOR EXPR -PREFIX ::= OPERATOR EXPR -SUFFIX ::= EXPR OPERATOR -APPLICATION ::= IDENT PARAMS? | IDENT EXPR | (APPLICATION | PREFIX | SUFFIX) '.' IDENT PARAMS? -PARAMS ::= '(' ((IDENT '=')? EXPR (',' (IDENT '=')? EXPR)*)? ')' +Call ::= Ident ('[' Call (',' Call)* ']')? ('(' (Ident '=')? Call (',' (Ident '=')? Call)* ')')? | + Ident Call (',' Call)* | + Call Operator Call? | + Call ':' Body +Expr ::= Let | Var | Const | Func | Type | Mod | Use | Block | Static | + For | While | Loop | If | When | Try | Match | Call +Body ::= Expr | ('{' Expr (';' Expr)* '}') ``` -``` -EXPR ::= IF_EXPR | WHEN_EXPR | BLOCK_EXPR | MATCH_EXPR | - MACRO_FUNC | MACRO_BLOCK | - APPLICATION | OPERATION | PREFIX | SUFFIX | - VALUE | (STMT EXPR) # todo -STMT ::= IF_STMT | WHEN_STMT | BLOCK_STMT | MATCH_STMT | - LOOP_STMT | WHILE_STMT | FOR_STMT | - IMPORT_STMT | EXPORT_STMT | MODULE_STMT | - ((APPLICATION | OPERATION | PREFIX | SUFFIX | DECL) ';') | (STMT+ STMT) -``` +--- + +References: +- https://www.joshwcomeau.com/javascript/statements-vs-expressions/ +- https://pgrandinetti.github.io/compilers/ +- https://docs.swift.org/swift-book/ReferenceManual/LexicalStructure.html +- https://nim-lang.github.io/Nim/manual.html diff --git a/docs/TYPES.md b/docs/TYPES.md index 0e463ec..f666ce8 100644 --- a/docs/TYPES.md +++ b/docs/TYPES.md @@ -6,25 +6,23 @@ Puck has a comprehensive static type system, inspired by the likes of Nim, Rust, Basic types can be one-of: - `bool`: internally an enum. -- `int`: integer number. x bits of precision by default. +- `int`: integer number. x bits of precision by default. <!-- - overflow into bigints for safety and ease of cryptographical code. --> - `uint`: same as `int`, but unsigned for more precision. - `i8`, `i16`, `i32`, `i64`, `i128`: specified integer size - `u8`, `u16`, `u32`, `u64`, `u128`: specified integer size - `float`: floating-point number. - `f32`, `f64`: specified float sizes +- `decimal`: precision decimal number. <!-- https://en.wikipedia.org/wiki/IEEE_754 --> + - `dec32`, `dec64`, `dec128`: specified decimal sizes - `byte`: an alias to `u8`. -- `char`: a *four-byte* value. For working with Unicode. An alias to `u32`. - - these are *packed* when part of a string: and so indexing directly into a string is a no-op. string access is O(n), swift-style. +- `char`: a distinct alias to `u32`. For working with Unicode. <!-- - these are *packed* when part of a string: and so indexing directly into a string is a no-op. string access is O(n), swift-style. --> - `str`: a string type. mutable. internally a byte-array: externally a char-array. -- `void`: an internal type designating the absence of a value. - - possibly, the empty tuple. then would `empty` be better? or `unit`? -- `never`: a type that denotes functions that do not return. - - distinct from returning nothing. - - the bottom type. +- `void`: an internal type designating the absence of a value. often elided. <!-- - possibly, the empty tuple. then would `empty` be better? or `unit`? --> +- `never`: a type that denotes functions that do not return. distinct from returning nothing. <!-- - the bottom type. --> -`bool`, `int`/`uint` and siblings, `float` and siblings, `byte`, and `char` are all considered **primitive types** and are _always_ copied (unless passed as `var`). More on when parameters are passed by value (copied) vs. passed by reference can be found in the [memory management document](MEMORY_MANAGEMENT.md). +`bool` and `int`/`uint`/`float` and siblings (and subsequently `byte` and `char`) are all considered **primitive types** and are _always_ copied (unless passed as mutable). More on when parameters are passed by value vs. passed by reference can be found in the [memory management document](MEMORY_MANAGEMENT.md). -Primitive types combine with `str`, `void`, and `never` to form the **basic types**. `void` and `never` will rarely be referenced by name: instead, the absence of a type typically implicitly denotes one or the other. Still, having a name is helpful in some situations. +Primitive types combine with `str`, `void`, and `never` to form **basic types**. `void` and `never` will rarely be referenced by name: instead, the absence of a type typically implicitly denotes one or the other. Still, having a name is helpful in some situations. ### integers @@ -39,34 +37,27 @@ Strings are: - prefixed with their length and capacity - automatically resize like a list -They are also quite complicated. Puck has full support for Unicode and wishes to be intuitive, performant, and safe, as all languages wish to be. Strings present a problem that much effort has been spent on in Swift and Rust (primarily) to solve. -## Container types +They are also quite complicated. Puck has full support for Unicode and wishes to be intuitive, performant, and safe, as all languages wish to be. Strings present a problem that much effort has been expended on in (primarily) Swift and Rust to solve. -Container types, broadly speaking, are types that contain other types. These exclude the types in [advanced types](#advanced-types). +## Abstract Types -### Container types +Abstract types, broadly speaking, are types described by their *behavior* rather than their *implementation*. They are more commonly know as abstract *data* types: which is confusingly similar to "algebraic data types", another term for the [advanced types](#advanced-types) they are built out of under the hood. We refer to them here as "abstract types" to mitigate some confusion. -Container types can be one-of: -- `array[S, T]`: Fixed-size arrays. Can only contain one type `T`. Of size `S` and cannot grow/shrink. - - Initialized in-place with `[a, b, c]`. -- `list[T]`: Dynamic arrays. Can only contain one type `T`. May grow/shrink dynamically. - - Initialized in-place with `[a, b, c]`. (this is the same as arrays!) <!-- Disambiguated from arrays in much the same way uints are disambiguated from ints. --> -- `slice[T]`: Slices. Used to represent a "view" into some sequence of elements of type `T`. - - Cannot be directly constructed. They are **unsized**. <!-- May be initialized from an array, list, or string, or may be used as a generic parameter on functions (more on that later). --> - - Slices cannot grow/shrink. Their elements may be accessed and mutated. As they are underlyingly a reference to an array or list, they **must not** outlive the data they reference. -- `str`: Strings. Complicated, they are alternatively `list[byte]` or `list[char]` depending on who's asking. In the context of iterable types, they are treated as `list[char]`. +### iterable types -All of these above types are some sort of sequence: and so have a length, and so can be _iterated_. -For convenience, a special `iterable` generic type is defined for use in parameters: that abstracts over all of the container types. This `iterable` type is also extended to any collection of a single type with a length. -- Is this redundant with `slice[T]`? todo +Iterable types can be one-of: +- `array[S, T]`: Fixed-size arrays. Can only contain one type `T`. Of a fixed size `S` and cannot grow/shrink, but can mutate. Initialized in-place with `[a, b, c]`. +- `list[T]`: Dynamic arrays. Can only contain one type `T`. May grow/shrink dynamically. Initialized in-place with `[a, b, c]`. (this is the same as arrays!) <!-- Disambiguated from arrays in much the same way uints are disambiguated from ints. --> +- `slice[T]`: Slices. Used to represent a "view" into some sequence of elements of type `T`. Cannot be directly constructed: they are **unsized**. Cannot grow/shrink, but their elements may be accessed and mutated. As they are underlyingly a reference to an array or list, they **must not** outlive the data they reference: this is non-trivial, and so slices interact in complex ways with the memory management system. <!-- possible syntax sugar: `[T]` --> +- `str`: Strings. Described above. They are alternatively treated as either `list[byte]` or `list[char]`, depending on who's asking. Initialized in-place with `"abc"`. -Elements of container types can be accessed by the `container[index]` syntax. Slices of container types can be accessed by the `container[lowerbound..upperbound]` syntax. Slices of non-consecutive elements can be accessed by the `container[a,b,c..d]` syntax, and indeed, the previous example expands to these. They can also be combined: `container[a,b,c..d]`. +These iterable types are commonly used, and bits and pieces of compiler magic are used here and there (mostly around initialization, and ownership) to ease use. All of these types are some sort of sequence: and implement the `Iter` interface, and so can be iterated (hence the name). -### Abstract data types +### other abstract types -There are an additional suite of related types: abstract data types. While falling under container types, these do not have a necessarily straightforward or best implementation, and so multiple implementations are provided. +Unlike the iterable types above, these abstract types do not have a necessarily straightforward or best implementation, and so multiple implementations are provided in the standard library. -Abstract data types can be one-of: +These abstract data types can be one-of: - `BitSet[T]`: high-performance sets implemented as a bit array. - These have a maximum data size, at which point the compiler will suggest using a `HashSet[T]` instead. - `AssocTable[T, U]`: simple symbol tables implemented as an association list. @@ -74,31 +65,26 @@ Abstract data types can be one-of: - `HashSet[T]`: standard hash sets. - `HashTable[T, U]`: standard hash tables. -Unlike iterable types, abstract data types are not iterable by default: as they are not ordered, and thus, it is not clear how they should be iterated. Despite this: for utility purposes, an `elems()` iterator based on a normalization of the elements is provided for `set` and `HashSet`, and `keys()`, `values()`, and `pairs()` iterators are provided for `table` and `HashTable` based on a normalization of the keys. This is deterministic to prevent user reliance on shoddy randomization, see Golang. +These abstract types do not have a natural *ordering*, unlike the iterable types above, and thus do not implement `Iter`. Despite this: for utility an `elems()` iterator based on a normalization of the elements is provided for `set` and `HashSet`, and `keys()`, `values()`, and `pairs()` iterators are provided for `table` and `HashTable` (based on a normalization of the keys). <!-- this is deterministic to prevent user reliance on shoddy randomization: see Go --> -## Parameter types +## Parameter Types Some types are only valid when being passed to a function, or in similar contexts. No variables may be assigned these types, nor may any function return them. These are monomorphized into more specific functions at compile-time if needed. Parameter types can be one-of: -- mutable: `func foo(a: mut str)`: Denotes the mutability of a parameter. Parameters are immutable by default. - - Passed as a `ref` if not one already, and marked mutable. -- static: `func foo(a: static str)`: Denotes a parameter whose value must be known at compile-time. Useful with `when` for writing generic code. +- mutable: `func foo(a: mut str)`: Marks a parameter as mutable (parameters are immutable by default). Passed as a `ref` if not one already. +- static: `func foo(a: static str)`: Denotes a parameter whose value must be known at compile-time. Useful in macros, and with `when` for writing generic code. - generic: `func foo[T](a: list[T], b: T)`: The standard implementation of generics, where a parameter's exact type is not listed, and instead statically dispatched based on usage. -- constrained: `func foo(a: str | int | float)`: A basic implementation of generics, where a parameter can be one-of several listed types. Makes for particularly straightforward monomorphization. - <!-- - Separated with the bitwise or operator `|` rather than the symbolic or `||` or a raw `or` to give the impression that there isn't a corresponding "and" operation (the `&` operator is preoccupied with strings). --> -- functions: `func foo(a: func (x, y: int): int)`: First-class functions. - - Functions may be prefixed with modifiers: one-of `pure`, `yeet`, `async`. - - Syntactic sugar is available: `func foo(a: (int, int) -> int)`. This is not usable with modifiers. -- interfaces: `func foo(a: Stack[int])`: User-defined duck typing. Equivalent to and more general than Rust's traits or Haskell's typeclasses. - - ex. for above: the `Stack[T]` interface specifies generic `push` and `pop` operations - - More on interfaces in the [interfaces section](#interfaces). -- built-in interface: `func foo(a: enum)`: Included, special interfaces for being generic over [advanced types](#advanced-types). - - Of note is how `slice[T]` functions: it is generic over `lists` and `arrays` of any length. - -These parameter types (except `static`) share a common trait: they are not *sized*. The exact type is not generally known until compilation - and in the case of interfaces, sometimes not even during compilation! As the size is not always rigorously known, problems arise when attempting to construct parameter types or compose them with other types: and so this is disallowed. Not all is lost, however, as they may still be used with *indirection* - detailed in the [section on reference types](#reference-types). +- constrained: `func foo(a: str | int | float)`: A basic implementation of generics, where a parameter can be one-of several listed types. The only allowed operations on such parameters are those shared by each type. Makes for particularly straightforward monomorphization. <!-- - Separated with the bitwise or operator `|` rather than the symbolic or `||` or a raw `or` to give the impression that there isn't a corresponding "and" operation (the `&` operator is preoccupied with strings). --> +- functions: `func foo(a: (int, int) -> int)`: First-class functions. All functions are first class - function declarations implicitly have this type, and may be bound in variable declarations. However, the function *type* is only terribly useful as a parameter type. +- slices: `func foo(a: slice[...])`: Slices of existing lists, strings, and arrays. Generic over length. These are references under the hood, may be either immutable or mutable (with `mut`), and interact non-trivially with Puck's [ownership system](MEMORY_MANAGEMENT.md). +- interfaces: `func foo(a: Stack[int])`: Implicit typeclasses. More in the [interfaces section](#interfaces). + - ex. for above: `type Stack[T] = interface[push(mut Self, T); pop(mut Self): T]` +- built-in interfaces: `func foo(a: struct)`: Included, special interfaces for being generic over [advanced types](#advanced-types). These include `struct`, `tuple`, `union`, `enum`, `interface`, and others. + +Several of these parameter types - specifically, slices, functions, and interfaces - share a common trait: they are not *sized*. The exact size of the type is not generally known until compilation - and in some cases, not even during compilation! As the size is not always rigorously known, problems arise when attempting to construct these parameter types or compose them with other types: and so this is disallowed. They may still be used with *indirection*, however - detailed in the [section on reference types](#reference-types). ### generic types @@ -181,8 +167,8 @@ type LinkedNode[T] = struct data: T let node = { - previous, next = None - data = 413 + previous: None, next: None + data: 413 } func pretty_print(node: LinkedNode[int]) = @@ -195,7 +181,7 @@ prints_data(node) ``` Structs are *structural* and so structs composed entirely of fields with the same signature (identical in name and type) are considered *equivalent*. -This is part of a broader structural trend in the type system, and is discussed in detail in [subtyping](#subtyping). +This is part of a broader structural trend in the type system, and is discussed in detail in the section on [subtyping](#subtyping). ### tuples @@ -360,6 +346,17 @@ Instead of giving in, the compiler instead disallows any non-initializations or todo: consider user-defined defaults (ex. structs) +### signatures and overloading + +Puck supports *overloading* - that is, there may exist multiple functions, or multiple types, or multiple modules, so long as they have the same *signature*. +The signature of a function / type / module is important. Interfaces, among other constructs, depend on the user having some understanding of what the compiler considers to be a signature. +So, it is stated here explicitly: +- The signature of a function is its name and the *types* of each of its parameters, in order. Optional parameters are ignored. Generic parameters are ??? + - ex. ... +- The signature of a type is its name and the number of generic parameters. + - ex. both `Result[T]` and `Result[T, E]` are defined in `std.results` +- The signature of a module is just its name. This may change in the future. + ### subtyping Mention of subtyping has been on occasion in contexts surrounding structural type systems, particularly the section on distinct types, but no explicit description of what the subtyping rules are have been given. |