Diffstat (limited to 'book/src/guides')
-rw-r--r-- book/src/guides/adding_languages.md 97
1 files changed, 58 insertions, 39 deletions
diff --git a/book/src/guides/adding_languages.md b/book/src/guides/adding_languages.md
index 5844a48e..51256288 100644
--- a/book/src/guides/adding_languages.md
+++ b/book/src/guides/adding_languages.md
@@ -1,45 +1,68 @@
# Adding languages
-## Submodules
+## Language configuration
-To add a new language, you should first add a tree-sitter submodule. To do this,
-you can run the command
-```sh
-git submodule add -f <repository> helix-syntax/languages/tree-sitter-<name>
-```
-For example, to add tree-sitter-ocaml you would run
-```sh
-git submodule add -f https://github.com/tree-sitter/tree-sitter-ocaml helix-syntax/languages/tree-sitter-ocaml
+To add a new language, you need to add a `language` entry to the
+[`languages.toml`][languages.toml] found in the root of the repository;
+this `languages.toml` file is included at compilation time, and is
+distinct from the `languages.toml` file in the user's [configuration
+directory](../configuration.md).
+
+```toml
+[[language]]
+name = "mylang"
+scope = "source.mylang"
+injection-regex = "^mylang$"
+file-types = ["mylang", "myl"]
+comment-token = "#"
+indent = { tab-width = 2, unit = "  " }
```
-Make sure the submodule is shallow by doing
-```sh
-git config -f .gitmodules submodule.helix-syntax/languages/tree-sitter-<name>.shallow true
+
+These are the available keys and descriptions for the file.
+
+| Key | Description |
+| ---- | ----------- |
+| `name` | The name of the language |
+| `scope` | A string like `source.js` that identifies the language. Currently, we strive to match the scope names used by popular TextMate grammars and by the Linguist library. Usually `source.<name>` or `text.<name>` in case of markup languages |
+| `injection-regex` | A regex pattern that will be tested against a language name to determine whether this language should be used for a potential [language injection][treesitter-language-injection] site. |
+| `file-types` | The file types of the language, for example `["yml", "yaml"]`. Extensions and full file names are supported. |
+| `shebangs` | The interpreters from the shebang line, for example `["sh", "bash"]` |
+| `roots` | A set of marker files to look for when trying to find the workspace root. For example `Cargo.lock`, `yarn.lock` |
+| `auto-format` | Whether to autoformat this language when saving |
+| `diagnostic-severity` | Minimum severity a diagnostic must have to be displayed (allowed values: `Error`, `Warning`, `Info`, `Hint`) |
+| `comment-token` | The token that starts a line comment, for example `#` |
+| `indent` | The indentation to use. Has sub-keys `tab-width` and `unit` |
+| `config` | Language server configuration |
+| `grammar` | The tree-sitter grammar to use (defaults to the value of `name`) |
+
+## Grammar configuration
+
+If a tree-sitter grammar is available for the language, add a new `grammar`
+entry to `languages.toml`.
+
+```toml
+[[grammar]]
+name = "mylang"
+source = { git = "https://github.com/example/mylang", rev = "a250c4582510ff34767ec3b7dcdd3c24e8c8aa68" }
```
-or you can manually add `shallow = true` to `.gitmodules`.
+Grammar configuration takes these keys:
-## languages.toml
+| Key | Description |
+| --- | ----------- |
+| `name` | The name of the tree-sitter grammar |
+| `path` | A path within the grammar directory which should be built. Some grammar repositories host multiple grammars (for example `tree-sitter-typescript` and `tree-sitter-ocaml`) in subdirectories. This key is used to point `hx --build-grammars` to the correct path for compilation. When omitted, the root of the grammar directory is used |
+| `source` | The method of fetching the grammar: a table with the schema defined below |
-Next, you need to add the language to the [`languages.toml`][languages.toml] found in the root of
-the repository; this `languages.toml` file is included at compilation time, and
-is distinct from the `language.toml` file in the user's [configuration
-directory](../configuration.md).
+When using a grammar from a git repository, `source` is a table with either
+these keys:
-These are the available keys and descriptions for the file.
+| Key | Description |
+| --- | ----------- |
+| `git` | A git remote URL from which the grammar should be cloned |
+| `rev` | The revision (commit hash or tag) which should be fetched |
-| Key | Description |
-| ---- | ----------- |
-| name | The name of the language |
-| scope | A string like `source.js` that identifies the language. Currently, we strive to match the scope names used by popular TextMate grammars and by the Linguist library. Usually `source.<name>` or `text.<name>` in case of markup languages |
-| injection-regex | regex pattern that will be tested against a language name in order to determine whether this language should be used for a potential [language injection][treesitter-language-injection] site. |
-| file-types | The filetypes of the language, for example `["yml", "yaml"]` |
-| shebangs | The interpreters from the shebang line, for example `["sh", "bash"]` |
-| roots | A set of marker files to look for when trying to find the workspace root. For example `Cargo.lock`, `yarn.lock` |
-| auto-format | Whether to autoformat this language when saving |
-| diagnostic-severity | Minimal severity of diagnostic for it to be displayed. (Allowed values: `Error`, `Warning`, `Info`, `Hint`) |
-| comment-token | The token to use as a comment-token |
-| indent | The indent to use. Has sub keys `tab-width` and `unit` |
-| config | Language server configuration |
+Or a `path` key with an absolute path to a locally available grammar directory.
## Queries
@@ -51,18 +74,14 @@ gives more info on how to write queries.
> NOTE: When evaluating queries, the first matching query takes
precedence, which is different from other editors like neovim where
-the last matching query supercedes the ones before it. See
+the last matching query supersedes the ones before it. See
[this issue][neovim-query-precedence] for an example.
## Common Issues
-- If you get errors when building after switching branches, you may have to remove or update tree-sitter submodules. You can update submodules by running
- ```sh
- git submodule sync; git submodule update --init
- ```
-- Make sure to not use the `--remote` flag. To remove submodules look inside the `.gitmodules` and remove directories that are not present inside of it.
+- If you get errors when running after switching branches, you may have to update the tree-sitter grammars. Run `hx --fetch-grammars` to fetch the grammars and `hx --build-grammars` to build any out-of-date grammars.
-- If a parser is segfaulting or you want to remove the parser, make sure to remove the submodule *and* the compiled parser in `runtime/grammar/<name>.so`
+- If a parser is segfaulting or you want to remove the parser, make sure to remove the compiled parser in `runtime/grammar/<name>.so`.
- The indents query is `indents.toml`, *not* `indents.scm`. See [this](https://github.com/helix-editor/helix/issues/114) issue for more information.
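The grammar tables in the new text show only the git-based `source`. As a sketch of how the other documented options fit together, the fragment below pairs a `[[language]]` entry (with the optional `grammar` key) with a grammar built from a local checkout via `source = { path = ... }`, and a second grammar selected from a repository subdirectory with the top-level `path` key. All names, paths, and the subdirectory (`mylang`, `mylang_embedded`, `/home/user/src/tree-sitter-mylang`, `embedded`) are hypothetical placeholders, not entries from the repository's `languages.toml`.

```toml
# Hypothetical language entry; `grammar` is optional and defaults to `name`.
[[language]]
name = "mylang"
scope = "source.mylang"
file-types = ["myl"]
roots = []             # no workspace root marker files
grammar = "mylang"

# Grammar taken from a local directory instead of a git remote.
# `path` inside `source` must be absolute; this one is a placeholder.
[[grammar]]
name = "mylang"
source = { path = "/home/user/src/tree-sitter-mylang" }

# Grammar stored in a subdirectory of a multi-grammar repository,
# selected with the top-level `path` key (subdirectory name is hypothetical).
[[grammar]]
name = "mylang_embedded"
path = "embedded"
source = { git = "https://github.com/example/mylang", rev = "a250c4582510ff34767ec3b7dcdd3c24e8c8aa68" }
```

After editing `languages.toml`, `hx --build-grammars` compiles any out-of-date grammars, as described in the Common Issues list.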