diff options
Diffstat (limited to 'docs/TYPES.md')
-rw-r--r-- | docs/TYPES.md | 211 |
1 files changed, 108 insertions, 103 deletions
diff --git a/docs/TYPES.md b/docs/TYPES.md index e674739..0e463ec 100644 --- a/docs/TYPES.md +++ b/docs/TYPES.md @@ -1,6 +1,6 @@ # Typing in Puck -Puck has a comprehensive static type system. +Puck has a comprehensive static type system, inspired by the likes of Nim, Rust, and Swift. ## Basic types @@ -10,7 +10,6 @@ Basic types can be one-of: - `uint`: same as `int`, but unsigned for more precision. - `i8`, `i16`, `i32`, `i64`, `i128`: specified integer size - `u8`, `u16`, `u32`, `u64`, `u128`: specified integer size - - overflows into bigints for safety and ease of cryptographical code. - `float`: floating-point number. - `f32`, `f64`: specified float sizes - `byte`: an alias to `u8`. @@ -35,8 +34,8 @@ todo Strings are: - mutable -- internally a byte (uint8) array -- externally a char (uint32) array +- internally a byte array +- externally a char (four bytes) array - prefixed with their length and capacity - automatically resize like a list @@ -45,15 +44,15 @@ They are also quite complicated. Puck has full support for Unicode and wishes to Container types, broadly speaking, are types that contain other types. These exclude the types in [advanced types](#advanced-types). -### Iterable types +### Container types -Iterable types can be one-of: -- `array[S, T]`: Static arrays. Can only contain one type `T`. Of size `S` and cannot grow/shrink. - - Initialize in-place with `array(a, b, c)`. Should we do this? otherwise `[a, b, c]`. +Container types can be one-of: +- `array[S, T]`: Fixed-size arrays. Can only contain one type `T`. Of size `S` and cannot grow/shrink. + - Initialized in-place with `[a, b, c]`. - `list[T]`: Dynamic arrays. Can only contain one type `T`. May grow/shrink dynamically. - - Initialize in-place with `list(a, b, c)`. Should we do this? otherwise `@[a, b, c]`. + - Initialized in-place with `[a, b, c]`. (this is the same as arrays!) <!-- Disambiguated from arrays in much the same way uints are disambiguated from ints. --> - `slice[T]`: Slices. Used to represent a "view" into some sequence of elements of type `T`. - - Cannot be directly constructed. May be initialized from an array, list, or string, or may be used as a generic parameter on functions (more on that later). + - Cannot be directly constructed. They are **unsized**. <!-- May be initialized from an array, list, or string, or may be used as a generic parameter on functions (more on that later). --> - Slices cannot grow/shrink. Their elements may be accessed and mutated. As they are underlyingly a reference to an array or list, they **must not** outlive the data they reference. - `str`: Strings. Complicated, they are alternatively `list[byte]` or `list[char]` depending on who's asking. In the context of iterable types, they are treated as `list[char]`. @@ -68,9 +67,9 @@ Elements of container types can be accessed by the `container[index]` syntax. Sl There are an additional suite of related types: abstract data types. While falling under container types, these do not have a necessarily straightforward or best implementation, and so multiple implementations are provided. Abstract data types can be one-of: -- `set[T]`: high-performance sets implemented as a bit array. +- `BitSet[T]`: high-performance sets implemented as a bit array. - These have a maximum data size, at which point the compiler will suggest using a `HashSet[T]` instead. -- `table[T, U]`: simple symbol tables implemented as an association list. +- `AssocTable[T, U]`: simple symbol tables implemented as an association list. - These do not have a maximum size. However, at some point the compiler will suggest using a `HashTable[T, U]` instead. - `HashSet[T]`: standard hash sets. - `HashTable[T, U]`: standard hash tables. @@ -114,18 +113,20 @@ func length[T](a: T) = # lots of things use .len, but only a few called by this do. # throws a warning if exported for lack of specitivity. -func length[T: str | list](a: T) = +func length(a: str | list) = return a.len ``` The syntax for generics is `func`, `ident`, followed by the names of the generic parameters in brackets `[T, U, V]`, followed by the function's parameters (which may then refer to the generic types). -Generics are replaced with concrete types at compile time (monomorphization) based on their usage in functions within the main function body. +Generics are replaced with concrete types at compile time (monomorphization) based on their usage in function calls within the main function body. -Constrained generics have two syntaxes: the constraint can be directly on a parameter, leaving off the `[T]` box, or it may be defined within the box as `[T: int | float]` for easy reuse in the parameters. +Constrained generics have two syntaxes: the constraint can be defined directly on a parameter, leaving off the `[T]` box, or it may be defined within the box as `[T: int | float]` for easy reuse in the parameters. -## Reference types +Other constructions like modules and type declarations themselves may also be generic. -Types are typically constructed by value on the stack. That is, without any level of indirection: and so type declarations that recursively refer to one another, or involve unsized types (notably including parameter types and interfaces!), would not be allowed. However, Puck provides two avenues for indirection. +## Reference Types + +Types are typically constructed by value on the stack. That is, without any level of indirection: and so type declarations that recursively refer to one another, or involve unsized types (notably including parameter types), would not be allowed. However, Puck provides two avenues for indirection. Reference types can be one-of: - `ref T`: An automatically-managed reference to type `T`. This is a pointer of size `uint` (native). @@ -151,37 +152,22 @@ type UnsafeTree = struct right: ptr UnsafeTree ``` -The `ref` prefix may be placed at the top level of type declarations, or inside on a field of a structural type. `ref` types may often be more efficient when dealing with large data structures. They also provide for the usage of parameter types (except for `static` and `var`) within type declarations. +The `ref` prefix may be placed at the top level of type declarations, or inside on a field of a structural type. `ref` types may often be more efficient when dealing with large data structures. They also provide for the usage of unsized types (functions, interfaces, slices) within type declarations. -The compiler abstracts over `ref` types to provide optimization for reference counts: and so neither a distinction between `Rc`/`Arc`/`Box`, nor a `*` dereference operator is needed. Much care has been given to make references efficient and safe, and so `ptr` should be avoided if at all possible. The compiler will yell at you if you use it (or any other unsafe features). +The compiler abstracts over `ref` types to provide optimization for reference counts: and so a distinction between `Rc`/`Arc`/`Box` is not needed. Furthermore, access implicitly dereferences (with address access available via `.addr`), and so a `*` dereference operator is also not needed. Much care has been given to make references efficient and safe, and so `ptr` should be avoided if at all possible. The compiler will yell at you if you use it (or any other unsafe features). -The implementation of reference types is delved into in further detail in the [document on memory management](MEMORY_MANAGEMENT.md). +The implementation of `ref` is delved into in further detail in the [memory management document](MEMORY_MANAGEMENT.md). ## Advanced Types -The `type` keyword is used to declare custom data types. These are *algebraic*: they function by composition. - -Algebraic data types can be one-of: -- `tuple`: An ordered collection of types. Optionally named. +The `type` keyword is used to declare aliases to custom data types. These types are *algebraic*: they function by composition. Algebraic data types can be one-of: - `struct`: An unordered, named collection of types. May have default values. +- `tuple`: An ordered collection of types. Optionally named. - `enum`: Ordinal labels, that may hold values. Their default values are their ordinality. - `union`: Powerful matchable tagged unions a la Rust. Sum types. -- `interface`: Usage-based typeclasses. User-defined duck typing. -- `distinct`: a type that must be explicitly converted -- type aliases, declared as `type Identifier = Alias` - -### tuples - -Tuples are an *ordered* collection of either named or unnamed types. +- `interface`: Implicit typeclasses. User-defined duck typing. -They are declared with `tuple[Type, identifier: Type, ...]` and initialized with parentheses: `(413, "hello", value: 40000)`. - -They are exclusively ordered - named types within tuples are just syntax sugar for positional access. Passing a fully unnamed tuple into a context that expects a tuple with a named parameter is allowed so long as the types line up in order. - -```puck -``` - -Tuples are particularly useful for "on-the-fly" types. Creating type aliases to tuples is discouraged - structs are generally a better choice for custom type declarations. +There also exist `distinct` types: while `type` declarations define an alias to an existing or new type, `distinct` types define a type that must be explicitly converted to/from. This is useful for having some level of separation from the implicit interfaces that abound. ### structs @@ -211,12 +197,28 @@ prints_data(node) Structs are *structural* and so structs composed entirely of fields with the same signature (identical in name and type) are considered *equivalent*. This is part of a broader structural trend in the type system, and is discussed in detail in [subtyping](#subtyping). +### tuples + +Tuples are an *ordered* collection of either named and/or unnamed types. + +They are declared with `tuple[Type, identifier: Type, ...]` and initialized with parentheses: `(413, "hello", value: 40000)`. Syntax sugar allows for them to be declared with `()` as well. + +They are exclusively ordered - named types within tuples are just syntax sugar for positional access. Passing a fully unnamed tuple into a context that expects a tuple with a named parameter is allowed so long as the types line up in order. + +```puck +let grouping = (1, 2, 3) + +func foo: tuple[string, string] = ("hello", "world") +``` + +Tuples are particularly useful for "on-the-fly" types. Creating type aliases to tuples is discouraged - structs are generally a better choice for custom type declarations. + ### enums -Enums are *ordinal labels* that may have associated values. +Enums are *ordinal labels* that may have *associated values*. They are declared with `enum[Label, AnotherLabel = 4, ...]` and are never initialized (their values are known statically). -Enums may be accessed directly by their label, and are ordinal and iterable regardless of their associated value. +Enums may be accessed directly by their label, and are ordinal and iterable regardless of their associated value. They are useful in collecting large numbers of "magic values", that would otherwise be constants. ```puck type Keys = enum @@ -225,59 +227,63 @@ type Keys = enum B = "b" ``` -In the case of an identifier conflict (with other enum labels, or types, or...) they must be prefixed with the name of their associated type (separated by a dot). (this is standard for identifier conflicts: and is discussed in more detail in the [modules document](MODULES.md).) +In the case of an identifier conflict (with other enum labels, or types, or...) they must be prefixed with the name of their associated type (separated by a dot). This is standard for identifier conflicts: and is discussed in more detail in the [modules document](MODULES.md). ### unions -Unions are *tagged* type unions. They provide a high-level wrapper over an inner type that must be accessed via pattern matching. +Unions are *tagged* type unions. They provide a high-level wrapper over an inner type that must be safely accessed via pattern matching. -They are declared with `union[Variant: Type, ...]` and initialized with the name of a variant followed by its inner type constructor in brackets: `Square(side: 5)`. Tuple and struct types are special-cased to eliminate extraneous parentheses. +They are declared with `union[Variant(Type), ...]` and initialized with the name of a variant followed by its inner type constructor in brackets: `Square(side: 5)`. Tuples and structs are special-cased to eliminate extraneous parentheses. ```puck -type Value = uint64 -type Ident = string +type Value = u64 +type Ident = str type Expr = ref union - Literal: Value - Variable: Ident - Abstraction: struct[param: Ident, body: Expr] - Application: struct[body, arg: Expr] - Conditional: struct - cond, then_case, else_case: Expr + Literal(Value) + Variable(Ident) + Abstraction(param: Ident, body: Expr) + Application(body: Expr, arg: Expr) + Conditional( + condition: Expr + then_case: Expr + else_case: Expr + ) ``` -They take up as much space in memory as the largest variant, plus the size of the tag (typically one byte). +They take up as much space in memory as the largest variant, plus the size of the tag (one byte). #### pattern matching Unions abstract over differing types. In order to *safely* be used, their inner types must be accessed via *pattern matching*: leaving no room for type confusion. Pattern matching in Puck relies on two syntactic constructs: the `match` statement, forcing qualification and handling of all possible types of a variable, and the `of` statement, querying type equality while simultaneously binding new identifiers to underspecified portions of variables. ```puck -import std/tables +use std.tables -func eval(expr: Expr, context: var HashTable[Ident, Value]): Result[Value] +func eval(context: mut HashTable[Ident, Value], expr: Expr): Result[Value] match expr - of Literal(value): value - of Variable(ident): context.get(ident) - of Application{body, arg}: - if body of Abstraction{param, body as inner_body}: - context.set(param, eval(arg)) - inner_body.eval(context) + of Literal(value): Okay(value) + of Variable(ident): + context.get(ident).err("Variable not in context") + of Application(body, arg): + if body of Abstraction(param, body as inner_body): + context.set(param, context.eval(arg)?) # from std.tables + context.eval(inner_body) else: - Error(InvalidExpr) - of Conditional{cond, then_case, else_case}: - if eval(cond, context): - then_case.eval(context) + Error("Expected Abstraction, found {}".fmt(body)) + of Conditional(condition, then_case, else_case): + if context.eval(condition)? == "true": + context.eval(then_case) else: - else_case.eval(context) - of _: - Error(InvalidExpr) + context.eval(else_case) + of expr: + Error("Invalid expression {}".fmt(expr)) ``` -The match statement takes exclusively a list of `of` sub-expressions, and checks for exhaustivity; but the `variable of Type(binding)` syntax can be reused as a conditional, in `if` statements and elsewhere. Each branch of a match expression can have a *guard*: an arbitrary conditional that must be met in order for it to match. Guards are written as `where cond` and immediately follow the last pattern in an `of` branch, preceding the colon. +The match statement takes exclusively a list of `of` sub-expressions, and checks for exhaustivity. The `expr of Type(binding)` syntax can be reused as a conditional, in `if` statements and elsewhere. The `of` *operator* is similar to the `is` operator in that it queries type equality, returning a boolean. However, unbound identifiers within `of` expressions are bound to appropriate values (if matched) and injected into the scope. This allows for succinct handling of `union` types in situations where `match` is overkill. -`match` expressions and `of` operators have some special rules. When matching unions with an inner product type (structs or tuples), external extraneous parenthesis are elided. todo: others? +Each branch of a match expression can also have a *guard*: an arbitrary conditional that must be met in order for it to match. Guards are written as `where cond` and immediately follow the last pattern in an `of` branch, preceding the colon. ### interfaces @@ -287,44 +293,44 @@ The `interface` type is composed of a list of function signatures that refer to ```puck type Stack[T] = interface - func push(self: var Self, val: T) - func pop(self: var Self): T - func peek(self: Self): T + push(self: mut Self, val: T) + pop(self: mut Self): T + peek(self: Self): T func takes_any_stack(stack: Stack[int]) = # only stack.push, stack.pop, and stack.peek are available methods ``` -Differing from Rust, Haskell, and many others, there is no explicit `impl` block. If there exists a function that matches one of an interface's signatures, it is considered to match and the interface typechecks. This may seem strange and ambiguous - but again, static typing and uniform function call syntax help make this reasonable. The purpose of explicit `impl` blocks in ex. Rust is two-fold: to explicitly group together associated code, but primarily to provide a limited form of uniform function call syntax. As any function can be called as either `foo(a, b)` or `a.foo(b)`, `impl` blocks would only serve to group together associated code: which is better done with modules. +Differing from Rust, Haskell, and many others, there is no explicit `impl` block. If there exist functions for a type that satisfy all of an interface's signatures, it is considered to match and the interface typechecks. This may seem strange and ambiguous - but again, static typing and uniform function call syntax help make this a more reasonable design. The purpose of explicit `impl` blocks in ex. Rust is three-fold: to provide a limited form of uniform function call syntax; to explicitly group together associated code; and to disambiguate. UFCS provides for the first, the module system provides for the second, and the third is proposed to not matter. -Interfaces cannot be constructed because they are not sized. They serve purely as a list of valid operations on a type within a context: no information about their memory layout is relevant. The concrete type fulfilling an interface is known at compile time, however, and so there are no issues surrounding interfaces as parameters, just when attempted to be used as (part of) a concrete type. They can be used as part of a concrete type with *indirection*, however: `type Foo = struct[a: int, b: ref interface[...]]` is perfectly valid. +Interfaces cannot be constructed because they are **unsized**. They serve purely as a list of valid operations on a type within a context: no information about their memory layout is relevant. The concrete type fulfilling an interface is known at compile time, however, and so there are no issues surrounding interfaces as parameters, just when attempted to be used as (part of) a concrete type. They can be used as part of a concrete type with *indirection*, however: `type Foo = struct[a: int, b: ref interface[...]]` is perfectly valid. -Interfaces compose with [modules](MODULES.md) to offer fine grained access control. +Interfaces also *cannot* extend or rely upon other interfaces in any way. There is no concept of an interface extending an interface. There is no concept of a parameter satisfying two interfaces. In the author's experience, while such constructions are powerful, they are also an immense source of complexity, leading to less-than-useful interface hierarchies seen in languages like Java, and yes, Rust. -todo: I have not decided whether the names of parameters is / should be relevant, or enforcable, or present. I'm leaning towards them not being present. But if they are enforcable, it makes it harder to implicitly implement the wrong interface. Design notes to consider: https://blog.rust-lang.org/2015/05/11/traits.html - -### generic types +Instead, if one wishes to form an interface that *also* satisfies another interface, they must include all of the other interface's associated functions within the new interface. Given that interfaces overwhelmingly only have a handful of associated functions, and if you're using more than one interface you *really* should be using a concrete type, the hope is that this will provide explicitness. -Types, like functions, can be *generic*: defined for an unknown arbitrary type, monomorphized at compile time. Indeed, we have already seen them before: in the above interface example. The syntax much follows the syntax for generic functions. +<!-- While functions are the primary way of performing operations on types, they are not the only way, and listing all explicitly can be painful - instead, it can be desired to be able to *associate a type* and any field access or existing functions on that type with the interface. todo: i have not decided on the syntax for this yet. --> -```puck -``` - -### distinct types +Interfaces compose with [modules](MODULES.md) to offer fine grained access control. -Puck is a structurally typed language. If you have `type Foo = struct[a: int]` and `type Bar = struct[a, b: int]`, passing a term of type `Bar` into a context that expects `Foo` is allowed: even encouraged. This usage-based paradigm is a consistent underlying theme of Puck. +<!-- todo: I have not decided whether the names of parameters is / should be relevant, or enforcable, or present. I'm leaning towards them not being present. But if they are enforcable, it makes it harder to implicitly implement the wrong interface. Design notes to consider: https://blog.rust-lang.org/2015/05/11/traits.html --> -`distinct` types break this coercion convention. They are primarily useful for extended (type,\_) safety: a `type SqlStr = distinct str` provides an extra barrier to SQL injections, for example, or a `type Point = distinct struct[x, y: int]` can block confusion with a `type Rectangle = distinct struct[x, y: int]`. +### type aliases and distinct types -`distinct` types can still be *explicitly* converted to an equivalent type with `as`. +Any type can be declared as an *alias* to a type simply by assigning it to such. All functions defined on the original type carry over, and functions expecting one type may receive the other with no issues. -### type aliases +```puck +type Float = float +``` -Finally, any type can be declared as an *alias* to a type simply by assigning it to such. +It is no more than an alias. When explicit conversion between types is desired and functions carrying over is undesired, `distinct` types may be used. +```puck +type MyFloat = distinct float +let foo: MyFloat = MyFloat(192.68) ``` -type MyFloat = float -``` + +Types then must be explicitly converted via constructors. ## Errata @@ -339,19 +345,18 @@ But always explicitly initializing types is syntactically verbose, and so most t - `float`, etc: `0.0` - `char`: `'\0'` - `str`: `""` -- `void`, `never`: n/a +- `void`, `never`: unconstructable - `array[T]`, `list[T]`: `[]` - `set[T]`, `table[T, U]`: `{}` - `tuple[T, U, ...]`: `(default values of its fields)` - `struct[T, U, ...]`: `{default values of its fields}` -- `enum[One, Two, ...]`: **disallowed** +- `enum[One, Two, ...]`: `<first label>` - `union[T, U, ...]`: **disallowed** - `slice[T]`, `func`: **disallowed** - `ref`, `ptr`: **disallowed** -For `ref` and `ptr` types, however, this is trickier. There is no reasonable "default" for these types *aside from* null. -Instead of giving in, the compiler will instead disallow any non-initializations or other cases in which a default value would be inserted. -(`slice[T]` and `func` types are references under the hood and also behave in this way) +For unions, slices, references, and pointers, this is a bit trickier. They all have no reasonable "default" for these types *aside from* null. +Instead of giving in, the compiler instead disallows any non-initializations or other cases in which a default value would be inserted. todo: consider user-defined defaults (ex. structs) @@ -360,13 +365,13 @@ todo: consider user-defined defaults (ex. structs) Mention of subtyping has been on occasion in contexts surrounding structural type systems, particularly the section on distinct types, but no explicit description of what the subtyping rules are have been given. Subtyping is the implicit conversion of compatible types, usually in a one-way direction. The following types are implicitly convertible: -- `uint ==> int` -- `int ==> float` -- `uint ==> float` -- `string ==> list[char]` (the opposite no, use `pack`) -- `array[T; n] ==> list[T]` -- `struct[a: T, b: U, ...] ==> struct[a: T, b: U]` -- `union[A: T, B: U] ==> union[A: T, B: U, ...]` +- `uint` ==> `int` +- `int` ==> `float` +- `uint` ==> `float` +- `string` ==> `list[char]` (the opposite no, use `pack`) +- `array[T; n]` ==> `list[T]` +- `struct[a: T, b: U, ...]` ==> `struct[a: T, b: U]` +- `union[A: T, B: U]` ==> `union[A: T, B: U, ...]` ### inheritance |