# Typing in Puck Puck has a comprehensive static type system. ## Basic types Basic types can be one-of: - `bool`: internally an enum. - `int`: integer number. x bits of precision by default. - `uint`: same as `int`, but unsigned for more precision. - `i8`, `i16`, `i32`, `i64`, `i28`: specified integer size - `u8`, `u16`, `u32`, `u64`, `u128`: specified integer size - overflows into bigints for safety and ease of cryptographical code. - `float`: floating-point number. - `f32`, `f64`: specified float sizes - `byte`: an alias to `u8`. - `char`: a *four-byte* value. For working with Unicode. An alias to `u32`. - these are *packed* when part of a string: and so indexing directly into a string is a no-op. string access is O(n), swift-style. - `str`: a string type. mutable. internally a byte-array: externally a char-array. - `void`: an internal type designating the absence of a value. - possibly, the empty tuple. then would `empty` be better? or `unit`? - `never`: a type that denotes functions that do not return. - distinct from returning nothing. - the bottom type. `bool`, `int`/`uint` and siblings, `float` and siblings, `byte`, and `char` are all considered **primitive types** and are _always_ copied (unless passed as `var`). More on when parameters are passed by value (copied) vs. passed by reference can be found in the [memory management document](MEMORY_MANAGEMENT.md). Primitive types combine with `str`, `void`, and `never` to form the **basic types**. `void` and `never` will rarely be referenced by name: instead, the absence of a type typically implicitly denotes one or the other. Still, having a name is helpful in some situations. ### integers todo ### strings todo ## Container types Container types, broadly speaking, are types that contain other types. These exclude the types in [advanced types](#advanced-types). ### Iterable types Iterable types can be one-of: - `array[S, T]`: Static arrays. Can only contain one type `T`. Of size `S` and cannot grow/shrink. - Initialize in-place with `array(a, b, c)`. Should we do this? otherwise `[a, b, c]`. - `list[T]`: Dynamic arrays. Can only contain one type `T`. May grow/shrink dynamically. - Initialize in-place with `list(a, b, c)`. Should we do this? otherwise `@[a, b, c]`. - `slice[T]`: Slices. Used to represent a "view" into some sequence of elements of type `T`. - Cannot be directly constructed. May be initialized from an array, list, or string, or may be used as a generic parameter on functions (more on that later). - Slices cannot grow/shrink. Their elements may be accessed and mutated. As they are underlyingly a reference to an array or list, they **must not** outlive the data they reference. - `str`: Strings. Complicated, they are alternatively `list[byte]` or `list[char]` depending on who's asking. In the context of iterable types, they are treated as `list[char]`. All of these above types are some sort of sequence: and so have a length, and so can be _iterated_. For convenience, a special `iterable` generic type is defined for use in parameters: that abstracts over all of the container types. This `iterable` type is also extended to any collection of a single type with a length. - Is this redundant with `slice[T]`? todo Elements of container types can be accessed by the `container[index]` syntax. Slices of container types can be accessed by the `container[lowerbound..upperbound]` syntax. Slices of non-consecutive elements can be accessed by the `container[a,b,c..d]` syntax, and indeed, the previous example expands to these. They can also be combined: `container[a,b,c..d]`. ### Abstract data types There are an additional suite of related types: abstract data types. While falling under container types, these do not have a necessarily straightforward or best implementation, and so multiple implementations are provided. Abstract data types can be one-of: - `set[T]`: high-performance sets implemented as a bit array. - These have a maximum data size, at which point the compiler will suggest using a `HashSet[T]` instead. - `table[T, U]`: simple symbol tables implemented as an association list. - These do not have a maximum size. However, at some point the compiler will suggest using a `HashTable[T, U]` instead. - `HashSet[T]`: standard hash sets. - `HashTable[T, U]`: standard hash tables. Unlike iterable types, abstract data types are not iterable by default: as they are not ordered, and thus, it is not clear how they should be iterated. Despite this: for utility purposes, an `elems()` iterator based on a normalization of the elements is provided for `set` and `HashSet`, and `keys()`, `values()`, and `pairs()` iterators are provided for `table` and `HashTable` based on a normalization of the keys. This is deterministic to prevent user reliance on shoddy randomization, see Golang. ## Parameter types Some types are only valid when being passed to a function, or in similar contexts. No variables may be assigned these types, nor may any function return them. These are monomorphized into more specific functions at compile-time if needed. Parameter types can be one-of: - mutable: `func foo(a: var str)`: Denotes the mutability of a parameter. Parameters are immutable by default. - Passed as a `ref` if not one already, and marked mutable. - static: `func foo(a: static str)`: Denotes a parameter that's value must be known at compile-time. Useful with `when` for writing generic code. - generic: `func foo[T](a: list[T], b: T)`: The standard implementation of generics, where a parameter's exact type is not listed, and instead statically dispatched based on usage. - constrained: `func foo(a: str | int | float)`: A basic implementation of generics, where a parameter can be one-of several listed types. Makes for particularly straightforward monomorphization. - Separated with the bitwise or operator `|` rather than the symbolic or `||` or a raw `or` to give the impression that there isn't a corresponding "and" operation (the `&` operator is preoccupied with strings). - functions: `func foo(a: func (x, y: int): int)`: First-class functions. - Functions may be prefixed with modifiers: one-of `pure`, `yeet`, `async`. - Syntactic sugar is available: `func foo(a: (int, int) -> int)`. This is not usable with modifiers. - interfaces: `func foo(a: Stack[int])`: User-defined duck typing. Equivalent to and more general than Rust's traits or Haskell's typeclasses. - ex. for above: the `Stack[T]` interface specifies generic `push` and `pop` operations - More on interfaces in the [interfaces section](#interfaces). - built-in interface: `func foo(a: enum)`: Included, special interfaces for being generic over [advanced types](#advanced-types). - Of note is how `slice[T]` functions: it is generic over `lists` and `arrays` of any length. These parameter types (except `static`) share a common trait: they are not *sized*. The exact type is not generally known until compilation - and in the case of interfaces, sometimes not even during compilation! As the size is not always rigorously known, problems arise when attempting to construct parameter types or compose them with other types: and so this is disallowed. Not all is lost, however, as they may still be used with *indirection* - detailed in the [section on reference types](#reference-types). ### Generic types Functions can take a _generic_ type, that is, be defined for a number of types at once: ```puck func add[T](a: list[T], b: T) = return a.add(b) func length[T](a: T) = return a.len # monomorphizes based on usage. # lots of things use .len, but only a few called by this do. # throws a warning if exported for lack of specitivity. func length[T: str | list](a: T) = return a.len ``` The syntax for generics is `func`, `ident`, followed by the names of the generic parameters in brackets `[T, U, V]`, followed by the function's parameters (which may refer to the generic types). Generics are replaced with concrete types at compile time (monomorphization) based on their usage in functions within the main function body. Constrained generics have two syntaxes: the constraint can be directly on a parameter, leaving off the `[T]` box, or it may be defined within the box as `[T: int | float]` for easy reuse in the parameters. ## Reference types Types are typically constructed by value on the stack. That is, without any level of indirection: and so type declarations that recursively refer to one another, or involve unsized types (notably including parameter types and interfaces!), would not be allowed. However, Puck provides two avenues for indirection. Reference types can be one-of: - `ref T`: An automatically-managed reference to type `T`. This is a pointer of size `uint` (native). - `ptr T`: A manually-managed pointer to type `T`. (very) unsafe. ```puck type BinaryTree = ref struct left: BinaryTree right: BinaryTree type AbstractTree[T] = interface func left(self: Self): Option[AbstractTree[T]] func right(self: Self): Option[AbstractTree[T]] func data(self: Self): T type AbstractRoot[T] = struct left: ref AbstractTree[T] right: ref AbstractTree[T] # allowed, but unsafe & strongly discouraged type UnsafeTree = struct left: ptr UnsafeTree right: ptr UnsafeTree ``` The `ref` prefix may be placed at the top level of type declarations, or inside on a field of a structural type. `ref` types may often be more efficient when dealing with large data structures. They also provide for the usage of parameter types (except for `static` and `var`) within type declarations. The compiler abstracts over `ref` types to provide optimization for reference counts: and so neither a distinction between `Rc`/`Arc`/`Box`, nor a `*` dereference operator is needed. Much care has been given to make references efficient and safe, and so `ptr` should be avoided if at all possible. The compiler will yell at you if you use it (or any other unsafe features). The implementation of reference types is delved into in further detail in the [document on memory management](MEMORY_MANAGEMENT.md). ## Advanced Types The `type` keyword is used to declare custom data types. These are *algebraic*: they function by composition. Algebraic data types can be one-of: - `tuple`: An ordered collection of types. Optionally named. - `struct`: An unordered, named collection of types. May have default values. - `enum`: Ordinal labels, that may hold values. Their default values are their ordinality. - `union`: Powerful matchable tagged unions a la Rust. Sum types. - `interface`: Usage-based typeclasses. User-defined duck typing. - `distinct`: a type that must be explicitly converted - type aliases, declared as `type Identifier = Alias` ### tuples Tuples are an *ordered* collection of either named or unnamed types. They are declared with `tuple[Type, identifier: Type, ...]` and initialized with parentheses: `(413, "hello", value: 40000)`. ```puck ``` Tuples are particularly useful for "on-the-fly" types. Creating type aliases to tuples is discouraged - structs are generally a better choice for custom type declarations. ### structs Structs are an *unordered* collection of named types. They are declared with `struct[identifier: Type, ...]` and initialized with brackets: `{field: "value", another: 500}`. ```puck type LinkedNode[T] = struct previous, next: Option[ref LinkedNode[T]] data: T let node = { previous, next = None data = 413 } func pretty_print(node: LinkedNode[int]) = print node.data if node.next of Some(node): node.pretty_print() # structural typing! prints_data(node) ``` Structs are *structural* and so structs composed entirely of fields with the same signature (identical in name and type) are considered *equivalent*. This is part of a broader structural trend in the type system, and is discussed in detail in [subtyping](#subtyping). ### enums Enums are *ordinal labels* that may have associated values. They are declared with `enum[Label, AnotherLabel = 4, ...]` and are never initialized (their values are known statically). Enums may be accessed directly by their label, and are ordinal and iterable regardless of their associated value. ```puck type Keys = enum Left, Right, Up, Down A = "a" B = "b" ``` In the case of an identifier conflict (with other enum labels, or types, or...) they must be prefixed with the name of their associated type (separated by a dot). (this is standard for identifier conflicts: and is discussed in more detail in the [modules document](MODULES.md).) ### unions Unions are *tagged* type unions. They provide a high-level wrapper over an inner type that must be accessed via pattern matching. They are declared with `union[Variant: Type, ...]` and initialized with the name of a variant followed by its inner type constructor in brackets: `Square(side: 5)`. Tuple and struct types are special-cased to eliminate extraneous parentheses. ```puck type Value = uint64 type Ident = string type Expr = ref union Literal: Value Variable: Ident Abstraction: struct[param: Ident, body: Expr] Application: struct[body, arg: Expr] Conditional: struct cond, then_case, else_case: Expr ``` They take up as much space in memory as the largest variant, plus the size of the tag (typically one byte). #### pattern matching Unions abstract over differing types. In order to *safely* be used, their inner types must be accessed via *pattern matching*: leaving no room for type confusion. Pattern matching in Puck relies on two syntactic constructs: the `match` statement, forcing qualification and handling of all possible types of a variable, and the `of` statement, querying type equality while simultaneously binding new identifiers to underspecified portions of variables. ```puck import std/tables func eval(expr: Expr, context: var HashTable[Ident, Value]): Result[Value] match expr of Literal(value): value of Variable(ident): context.get(ident) of Application{body, arg}: if body of Abstraction{param, body as inner_body}: context.set(param, eval(arg)) inner_body.eval(context) else: Error(InvalidExpr) of Conditional{cond, then_case, else_case}: if eval(cond, context): then_case.eval(context) else: else_case.eval(context) of _: Error(InvalidExpr) ``` The match statement takes exclusively a list of `of` sub-expressions, and checks for exhaustivity; but the `variable of Type(x)` syntax can be reused as a conditional, in `if` statements and elsewhere. The `of` operator is similar to the `is` operator in that it queries type equality. However, unbound identifiers within `of` expressions are bound to appropriate values (if matched) and injected into the scope. When matching unions with an inner product type (structs or tuples), external extraneous parenthesis are elided. todo: guards, either `where` or `if` ### interfaces Interfaces can be thought of as analogous to Rust's traits, without explicit `impl` blocks and without need for the `derive` macro. The `interface` type is composed of a list of function signatures that refer to the special type `Self` that must exist for a type to be valid. Interfaces cannot be constructed and so are only of use as parameter types and the like, and so are always used in a *type conversion*. The special type `Self` is replaced with the type being converted at compile time in order to typecheck. They are declared with `interface[signature, ...]`. ```puck type Stack[T] = interface func push(self: var Self, val: T) func pop(self: var Self): T func peek(self: Self): T func takes_any_stack(stack: Stack[int]) = # only stack.push, stack.pop, and stack.peek are available methods ``` Differing from Rust, Haskell, and others, there is no explicit `impl` block. If there exists a function that matches one of an interface's signatures, it is considered to match and the interface typechecks. This may seem strange and ambiguous - but again, static typing and uniform function call syntax help make this fine and idiomatic. The purpose of explicit `impl` blocks in ex. Rust is two-fold: to group together associated code, but primarily to provide a limited form of uniform function call syntax. As any function can be called as either `foo(a, b)` or `a.foo(b)`, `impl` blocks would only serve to group together associated code: which is better done with modules. Interfaces cannot be constructed because they are not sized. They serve purely as a list of valid operations on a type within a context: no information about their memory layout is relevant. The concrete type fulfilling the interface is known at compile time, however, and so there are no issues surrounding interfaces as parameters, just when attempted to be used as (part of) a concrete type. They can be used as part of a concrete type with *indirection*, however: `type Foo = struct[a: int, b: ref interface[...]]` is perfectly valid. While functions are the primary way of performing operations on types, they are not the only way, and listing all explicitly can be painful - instead, it can be desired to be able to *associate a type* and any field access or existing functions on that type with the interface. todo: i have not decided on the syntax for this yet. Interfaces also compose with [the module system](MODULES.md) to offer fine grained crafting of types. ### distinct types Puck is a structurally typed language. If you have `type Foo = struct[a: int]` and `type Bar = struct[a, b: int]`, passing a term of type `Bar` into a context that expects `Foo` is allowed: even encouraged. This usage-based paradigm is a consistent underlying theme of Puck. `distinct` types break this coercion convention. They are primarily useful for extended (type,\_) safety: a `type SqlStr = distinct str` provides an extra barrier to SQL injections, for example, or a `type Point = distinct struct[x, y: int]` can block confusion with a `type Rectangle = distinct struct[x, y: int]`. `distinct` types can still be *explicitly* converted to an equivalent type with `as`. ### type aliases Finally, any type can be declared as an *alias* to a type simply by assigning it to such. ``` type MyFloat = float ``` ## Errata ### default values Puck does not have any concept of `null`: all values *must* be initialized. But always explicitly initializing types is syntactically verbose, and so most types have an associated "default value". **Default values**: - `bool`: `false` - `int`, `uint`, etc: `0` - `float`, etc: `0.0` - `char`: `'\0'` - `str`: `""` - `void`, `never`: n/a - `array[T]`, `list[T]`: `[]` - `set[T]`, `table[T, U]`: `{}` - `tuple[T, U, ...]`: `(default values of its fields)` - `struct[T, U, ...]`: `{default values of its fields}` - `enum[One, Two, ...]`: **disallowed** - `union[T, U, ...]`: **disallowed** - `slice[T]`, `func`: **disallowed** - `ref`, `ptr`: **disallowed** For `ref` and `ptr` types, however, this is trickier. There is no reasonable "default" for these types *aside from* null. Instead of giving in, the compiler will instead disallow any non-initializations or other cases in which a default value would be inserted. (`slice[T]` and `func` types are references under the hood and also behave in this way) todo: consider user-defined defaults (ex. structs) ### subtyping Mention of subtyping has been on occasion in contexts surrounding structural type systems, particularly the section on distinct types, but no explicit description of what the subtyping rules are have been given. Subtyping is the implicit conversion of compatible types, usually in a one-way direction. The following types are implicitly convertible: - `uint ==> int` - `int ==> float` - `uint ==> float` - `string ==> list[char]` (the opposite no, use `pack`) - `array[T; n] ==> list[T]` - `struct[a: T, b: U, ...] ==> struct[a: T, b: U]` - `union[A: T, B: U] ==> union[A: T, B: U, ...]` ### inheritance Puck is not an object-oriented language. Idiomatic design patterns in object-oriented languages are harder to accomplish and not idiomatic here. But, Puck has a number of features that somewhat support the object-oriented paradigm, including: - uniform function call syntax - structural typing / subtyping - interfaces ```puck type Building = struct size: struct[length, width: uint] color: enum[Red, Blue, Green] location: tuple[longitude, latitude: float] type House = struct size: struct[length, width: uint] color: enum[Red, Blue, Green] location: tuple[longitude, latitude: float] occupant: str func init(_: type[House]): House = { size: {length, width: 500}, color: Red location: (0.0, 0.0), occupant: "Barry" } func address(building: Building): str = let number = int(building.location.0 / building.location.1).abs let street = "Logan Lane" return number.str & " " & street # subtyping! methods! print House.init().address() func address(house: House): str = let number = int(house.location.0 - house.location.1).abs let street = "Logan Lane" return number.str & " " & street # overriding! (will warn) print address(House.init()) # abstract types! inheritance! type Addressable = interface for Building func address(self: Self) ``` These features may *compose* into code that closely resembles its object-oriented counterpart. But make no mistake! Puck is static first and functional somewhere in there: dynamic dispatch and the like are not accessible (currently).