aboutsummaryrefslogtreecommitdiff
path: root/docs/TYPES.md
blob: 5ea320a8e4e94c726ba0625824e93c1124f766e6 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
# Typing in Puck

> ! This section **needs a rewrite**. Proceed with low standards.

Puck has a comprehensive static type system, inspired by the likes of Nim, Rust, and Swift.

## Basic types

Basic types can be one-of:
- `bool`: internally an enum.
- `int`: integer number. x bits of precision by default.
  - `uint`: same as `int`, but unsigned for more precision.
  - `i[\d+]`, `u[\d+]`: arbitrarily sized integers
- `float`: floating-point number.
  - `f32`, `f64`: specified float sizes
- `decimal`: precision decimal number.
  - `dec32`, `dec64`, `dec128`: specified decimal sizes
- `byte`: an alias to `u8`.
- `char`: an alias to `u32`. For working with Unicode.
- `str`: a string type. mutable. packed: internally a byte-array, externally a char-array.
- `void`: an internal type designating the absence of a value. often elided.
- `never`: a type that denotes functions that do not return. distinct from returning nothing.

`bool` and `int`/`uint`/`float` and siblings (and subsequently `byte` and `char`) are all considered **primitive types** and are _always_ copied (unless passed as mutable). More on when parameters are passed by value vs. passed by reference can be found in the [memory management document](MEMORY_MANAGEMENT.md).

Primitive types, alongside `str`, `void`, and `never`, form **basic types**. `void` and `never` will rarely be referenced by name: instead, the absence of a type typically implicitly denotes one or the other. Still, having a name is helpful in some situations.

### integers

todo

### strings

Strings are:
- mutable
- internally a byte array
- externally a char (four bytes) array
- prefixed with their length and capacity
- automatically resize

They are also quite complicated. Puck has full support for Unicode and wishes to be intuitive, performant, and safe, as all languages wish to be. Strings present a problem that much effort has been expended on in (primarily) Swift and Rust to solve.

## Abstract Types

Abstract types, broadly speaking, are types described by their *behavior* rather than their *implementation*. They are more commonly know as abstract *data* types: which is confusingly similar to "algebraic data types", another term for the [advanced types](#advanced-types) they are built out of under the hood. We refer to them here as "abstract types" to mitigate some confusion.

### iterable types

Iterable types can be one-of:
- `array[T, size]`: Fixed-size arrays. Can only contain one type `T`. Of a fixed size `size` and cannot grow/shrink, but can mutate. Initialized in-place with `[a, b, c]`.
- `list[T]`: Dynamic arrays. Can only contain one type `T`. May grow/shrink dynamically. Initialized in-place with `[a, b, c]`. (this is the same as arrays!)
- `slice[T]`: Slices. Used to represent a "view" into some sequence of elements of type `T`. Cannot be directly constructed: they are **unsized**. Cannot grow/shrink, but their elements may be accessed and mutated. As they are underlyingly a reference to an array or list, they **must not** outlive the data they reference: this is non-trivial, and so slices interact in complex ways with the memory management system.
- `str`: Strings. Described above. They are alternatively treated as either `list[byte]` or `list[char]`, depending on who's asking. Initialized in-place with `"abc"`.

These iterable types are commonly used, and bits and pieces of compiler magic are used here and there (mostly around initialization, and ownership) to ease use. All of these types are some sort of sequence: and implement the `Iter` interface, and so can be iterated (hence the name).

### other abstract types

Unlike the iterable types above, these abstract types do not have a necessarily straightforward or best implementation, and so multiple implementations are provided in the standard library.

These abstract data types can be one-of:
- `BitSet[T]`: high-performance sets implemented as a bit array.
  - These have a maximum data size, at which point the compiler will suggest using a `HashSet[T]` instead.
- `AssocTable[T, U]`: simple symbol tables implemented as an association list.
  - These do not have a maximum size. However, at some point the compiler will suggest using a `HashTable[T, U]` instead.
- `HashSet[T]`: standard hash sets.
- `HashTable[T, U]`: standard hash tables.

These abstract types do not have a natural *ordering*, unlike the iterable types above, and thus do not implement `Iter`. Despite this: for utility an `elems()` iterator based on a normalization of the elements is provided for `set` and `HashSet`, and `keys()`, `values()`, and `pairs()` iterators are provided for `table` and `HashTable` (based on a normalization of the keys). <!-- this is deterministic to prevent user reliance on shoddy randomization: see Go -->

## Parameter Types

Some types are only valid when being passed to a function, or in similar contexts.
No variables may be assigned these types, nor may any function return them.
These are monomorphized into more specific functions at compile-time if needed.

Parameter types can be one-of:
- mutable: `func foo(a: mut str)`: Marks a parameter as mutable (parameters are immutable by default). Passed as a `ref` if not one already.
- constant: `func foo(a: const str)`: Denotes a parameter whose value must be known at compile-time. Useful in macros, and with `when` for writing generic code.
- generic: `func foo[T](a: list[T], b: T)`: The standard implementation of generics, where a parameter's exact type is not listed, and instead statically dispatched based on usage.
- constrained: `func foo(a: str | int | float)`: A basic implementation of generics, where a parameter can be one-of several listed types. The only allowed operations on such parameters are those shared by each type. Makes for particularly straightforward monomorphization. <!-- - Separated with the bitwise or operator `|` rather than the symbolic or `||` or a raw `or` to give the impression that there isn't a corresponding "and" operation (the `&` operator is preoccupied with strings). -->
- functions: `func foo(a: (int, int) -> int)`: First-class functions. All functions are first class - function declarations implicitly have this type, and may be bound in variable declarations. However, the function *type* is only terribly useful as a parameter type.
- slices: `func foo(a: slice[...])`: Slices of existing lists, strings, and arrays. Generic over length. These are references under the hood, may be either immutable or mutable (with `mut`), and interact non-trivially with Puck's [ownership system](MEMORY_MANAGEMENT.md).
- classes: `func foo(a: Stack[int])`: Implicit typeclasses. More in the [classes section](#classes).
  - ex. for above: `type Stack[T] = interface[push(mut Self, T); pop(mut Self): T]`
- built-in interfaces: `func foo(a: struct)`: Included, special interfaces for being generic over [advanced types](#advanced-types). These include `struct`, `tuple`, `union`, `enum`, `interface`, and others.

Several of these parameter types - specifically, slices, functions, and interfaces - share a common trait: they are not *sized*. The exact size of the type is not generally known until compilation - and in some cases, not even during compilation! As the size is not always rigorously known, problems arise when attempting to construct these parameter types or compose them with other types: and so this is disallowed. They may still be used with *indirection*, however - detailed in the [section on reference types](#reference-types).

### generic types

Functions can take a _generic_ type, that is, be defined for a number of types at once:

```puck
# fully generic. monomorphizes based on usage.
func add[T](a: list[T], b: T) = a.push(b)

# constrained generics. restricts possible operations to the intersection
# of defined methods on each type.
func length[T](a: str | list[T]) =
  a.len # both strings and lists have a `len` method

# alternative formulation: place the constraint on a generic parameter.
# this ensures both a and b are of the *same* type.
func add[T: int | float](a: T, b: T) = a + b
```

The syntax for generics is `func`, `ident`, followed by the names of the generic parameters in brackets `[T, U, V]`, followed by the function's parameters (which may then refer to the generic types). Generics are replaced with concrete types at compile time (monomorphization) based on their usage in function calls within the main function body.

Constrained generics have two syntaxes: the constraint can be defined directly on a parameter, leaving off the `[T]` box, or it may be defined within the box as `[T: int | float]` for easy reuse in the parameters.

Other constructions like type declarations themselves may also be generic over types. In the future, modules also may be generic: whether that is to be over types or over other modules is to be determined.

## Reference Types

Types are typically constructed by value on the stack. That is, without any level of indirection: and so type declarations that recursively refer to one another, or involve unsized types (notably including parameter types), would not be allowed. However, Puck provides several avenues for indirection.

Reference types can be one-of:
- `ref T`: An owned reference to a type `T`. This is a pointer of size `uint` (native).
- `refc T`: A reference-counted reference to a type `T`. This allows escaping the borrow checker.
- `ptr T`: A manually-managed pointer to a type `T`. (very) unsafe. The compiler will yell at you.

```puck
type BinaryTree = ref struct
  left: BinaryTree
  right: BinaryTree

type AbstractTree[T] = class
  func left(self: Self): Option[AbstractTree[T]]
  func right(self: Self): Option[AbstractTree[T]]
  func data(self: Self): T

type AbstractRoot[T] = struct
  left: ref AbstractTree[T]
  right: ref AbstractTree[T]

# allowed, but unsafe & strongly discouraged
type UnsafeTree = struct
  left: ptr UnsafeTree
  right: ptr UnsafeTree
```

The `ref` prefix may be placed at the top level of type declarations, or inside on a field of a structural type. `ref` types may often be more efficient when dealing with large data structures. They also provide for the usage of unsized types (functions, interfaces, slices) within type declarations.

The compiler abstracts over `ref` types to provide optimization for reference counts: and so a distinction between `Rc`/`Arc`/`Box` is not needed. Furthermore, access implicitly dereferences (with address access available via `.addr`), and so a `*` dereference operator is also not needed.


Much care has been given to make references efficient and safe, and so `ptr` should be avoided if at all possible. They are only usable inside functions explicitly marked with `#[safe]`.

The implementations of reference types are delved into in further detail in the [memory management document](MEMORY_MANAGEMENT.md).

## Advanced Types

The `type` keyword is used to declare aliases to custom data types. These types are *algebraic*: they function by *composition*. Such *algebraic data types* can be one-of:
- `struct`: An unordered, named collection of types. May have default values.
- `tuple`: An ordered collection of types. Optionally named.
- `enum`: Ordinal labels, that may hold values. Their default values are their ordinality.
- `union`: Powerful matchable tagged unions a la Rust. Sum types.
- `class`: Implicit type classes. User-defined duck typing.

All functions defined on the original type carry over. If this is not desired, the newtype paradigm is preferred: declaring a single-field `struct` and copying function declarations over.

Types may be explicitly to and from via the `Coerce` and `Convert` classes and provided `from` and `to` functions.

### structs

Structs are an *unordered* collection of named types.

They are declared with `struct[identifier: Type, ...]` and initialized with brackets: `{ field = "value", another = 500}`. Structs are *structural*: while the type system is fundamentally nominal, and different type declarations are treated as distinct, a struct object initialized with `{}` is usable in any context that expects a struct with the same fields.

```puck
type LinkedNode[T] = ref struct
  previous: Option[LinkedNode[T]]
  next: Option[LinkedNode[T]]
  data: T

let node = { # inferred type: LinkedNode[int], from prints_data call
  previous = None, next = None
  data = 413
}

func pretty_print(node: LinkedNode[int]) =
  print node.data
  if node.next of Some(node) then
    node.pretty_print()

# structural typing!
prints_data(node)
```

### tuples

Tuples are an *ordered* collection of either named and/or unnamed types.

They are declared with `tuple[Type, identifier: Type, ...]` and initialized with parentheses: `(413, "hello", value: 40000)`. Syntactic sugar allows for them to be declared with `()` as well.

They are exclusively ordered - named types within tuples are just syntactic sugar for positional access. Passing a fully unnamed tuple into a context that expects a tuple with a named parameter is allowed (so long as the types line up).

```puck
let grouping = (1, 2, 3)

func foo: tuple[str, str] = ("hello", "world")
dbg grouping.foo # prints '("hello", "world")'

func bar(a: (str, str)) = a.1
dbg grouping.bar # prints '"world"'
```

Tuples are particularly useful for "on-the-fly" types. Creating type declarations to tuples is discouraged - structs are generally a better choice, as they are fully named, support default values, and may have their layout optimized by the compiler.

### enums

Enums are *ordinal labels* that may have *associated values*.

They are declared with `enum[Label, AnotherLabel = 4, ...]` and are never initialized (their values are known statically). Enums may be accessed directly by their label, and are ordinal and iterable regardless of their associated value. They are useful in collecting large numbers of "magic values" that would otherwise be constants.

```puck
type Keys = enum
  Left, Right, Up, Down
  A = "a"
  B = "b"
```

In the case of an identifier conflict (with other enum labels, or types, or...) they must be prefixed with the name of their associated type (separated by a dot). This is standard for identifier conflicts: and is discussed in more detail in the [modules document](MODULES.md).

### unions

Unions are *tagged* type unions. They provide a high-level wrapper over an inner type that must be safely accessed via pattern matching.

They are declared with `union[Variant(Type), ...]` and initialized with the name of a variant followed by its inner type constructor in brackets: `Square(side: 5)`. Tuples and structs are special-cased to eliminate extraneous parentheses.

```puck
type Value = u64
type Ident = str
type Expr = ref union
  Literal(Value)
  Variable(Ident)
  Abstraction(param: Ident, body: Expr)
  Application(body: Expr, arg: Expr)
  Conditional(
    condition: Expr
    then_case: Expr
    else_case: Expr
  )
```

They take up as much space in memory as the largest variant, plus the size of the tag (one byte).

#### pattern matching

Unions abstract over differing types. In order to *safely* be used, their inner types must be accessed via *pattern matching*: leaving no room for type confusion. Pattern matching in Puck relies on two syntactic constructs: the `match` statement, forcing qualification and handling of all possible types of a variable, and the `of` statement, querying type equality while simultaneously binding new identifiers to underspecified portions of variables.

```puck
use std.tables

func eval(context: mut HashTable[Ident, Value], expr: Expr): Result[Value]
  match expr
  of Literal(value) then Okay(value)
  of Variable(ident) then
    context.get(ident).err("Variable not in context")
  of Application(body, arg) then
    if body of Abstraction(param, body as inner_body):
      context.set(param, context.eval(arg)?) # from std.tables
      context.eval(inner_body)
    else
      Error("Expected Abstraction, found {}".fmt(body))
  of Conditional(condition, then_case, else_case):
    if context.eval(condition)? == "true" then
      context.eval(then_case)
    else:
      context.eval(else_case)
  of expr then
    Error("Invalid expression {}".fmt(expr))
```

The match statement takes exclusively a list of `of` sub-expressions, and checks for exhaustivity. The `expr of Type(binding)` syntax can be reused as a conditional, in `if` statements and elsewhere.

The `of` *operator* is similar to the `is` operator in that it queries type equality, returning a boolean. However, unbound identifiers within `of` expressions are bound to appropriate values (if matched) and injected into the scope. This allows for succinct handling of `union` types in situations where `match` is overkill.

Each branch of a match expression can also have a *guard*: an arbitrary conditional that must be met in order for it to match. Guards are written as `where cond` and immediately follow the last pattern in an `of` branch, preceding `then`.

### classes

Classes can be thought of as analogous to Rust's traits: without explicit `impl` blocks and without need for the `derive` macro. Types that have functions defined on them fulfilling the class requirements implicitly implement the associated class.

The `class` type is composed of a list of function signatures that refer to the special type `Self` that must exist for a type to be valid. The special type `Self` is replaced with the concrete type at compile time in order to typecheck. They are declared with `class[signature, ...]`.

```puck
type Stack[T] = class
  push(self: mut Self, val: T)
  pop(self: mut Self): T
  peek(self: lent Self): lent T

func takes_any_stack(stack: Stack[int]) =
  # only stack.push, stack.pop, and stack.peek are available, regardless of the concrete type passed
```

Differing from Rust, Haskell, and many others, there is no explicit `impl` block. If there exist functions for a type that satisfy all of a class's signatures, it is considered to match and the class typechecks. This may seem strange and ambiguous - but again, static typing and uniform function call syntax help make this a more reasonable design. The purpose of explicit `impl` blocks in ex. Rust is three-fold: to provide a limited form of uniform function call syntax; to explicitly group together associated code; and to disambiguate. UFCS provides for the first, the module system provides for the second, and type-based disambiguation provides for the third, with such information exposed to the user via the language server protocol.

```puck
type Set[T] = class
  in(lent Self, T): bool
  add(mut Self, T)
  remove(mut Self, T): Option[T]

type Foo = struct
  a: int
  b: ref Set[int] # indirection: now perfectly valid
```

Classes cannot be constructed, as they are **unsized**. They serve purely as a list of valid operations on a type: no information about their memory layout is relevant. The *concrete type* fulfilling a class is known at compile time, however, and so there are no issues surrounding the use of classes as parameters, just when attempted to be used as (part of) a concrete type in ex. a struct. They can be used with *indirection*, however: as references are sized (consisting of a memory address).

```puck
## The Display class. Any type implementing `str` is printable.
## Any type that is Display must necessarily also implement Debug.
pub type Display = class
  str(Self): str
  dbg(Self): str

## The Debug class. Broadly implemented for every type with compiler magic.
## Types can (and should) override the generic implementations.
pub type Debug = class
  dbg(Self): str
```

Classes also *cannot* extend or rely upon other classes in any way, nor is there any concept of a parameter satisfying two classes. In the author's experience, while such constructions are powerful, they are also an immense source of complexity, leading to less-than-useful hierarchies seen in languages like Java, and yes, Rust. Instead, if one wishes to form an class that *also* satisfies another class, they must name a new class that explicitly includes all of the other class's associated functions. Given that classes in Puck overwhelmingly only have a small handful of associated functions, and if you're using more than one class you *really* should be using a concrete type: the hope is that this will provide for explicitness and reduce complexity.

Classes compose well with [modules](MODULES.md) to offer fine grained access control.

## Errata

### default values

Puck does not have any concept of `null`: all values *must* be initialized.
But always explicitly initializing types is syntactically verbose, and so most types have an associated "default value".

**Default values**:
- `bool`: `false`
- `int`, `uint`, etc: `0`
- `float`, etc: `0.0`
- `char`: `'\0'`
- `str`: `""`
- `void`, `never`: unconstructable
- `array[T]`, `list[T]`: `[]`
- `set[T]`, `table[T, U]`: `{}`
- `tuple[T, U, ...]`: `(default values of its fields)`
- `struct[T, U, ...]`: `{default values of its fields}`
- `enum[One, Two, ...]`: **disallowed**
- `union[T, U, ...]`: **disallowed**
- `slice[T]`, `func`: **disallowed**
- `ref`, `refc`, `ptr`: **disallowed**

For unions, slices, references, and pointers, this is a bit trickier. They all have no reasonable "default" for these types *aside from* null.
Instead of giving in, the compiler instead disallows any non-initializations or other cases in which a default value would be inserted.

todo: consider user-defined defaults (ex. structs)

### signatures and overloading

Puck supports *overloading* - that is, there may exist multiple functions, or multiple types, or multiple modules, with the same name - so long as they have a different *signature*.
The signature of a function/type/module is important. Classes, among other constructs, depend on the user having some understanding of what the compiler considers to be a signature. So we state it here explicitly:
- The signature of a function is its name and the *types* of each of its parameters, in order, ignoring optional parameters. Generic parameters are ???
  - ex. ...
- The signature of a type is its name and the number of generic parameters.
  - ex. both `Result[T]` and `Result[T, E]` are defined in `std.results`
- The signature of a module is just its name. This may change in the future.

### structural subtyping

Mention of subtyping has been on occasion in contexts surrounding structural type systems, particularly the section on distinct types, but no explicit description of what the subtyping rules are have been given.