**Problems:**
- `vim.treesitter.language.inspect()` returns duplicate
symbol names, sometimes up to 6 of one kind in the case of `markdown`
- The list-like `symbols` table can have holes and is thus not even a
valid msgpack table anyway, mentioned in a test
**Solution:** Return symbols as a map, rather than a list, where field
names are the names of the symbol. The boolean value associated with the
field encodes whether or not the symbol is named.
Note that anonymous nodes are surrounded with double quotes (`"`) to
prevent potential collisions with named counterparts that have the same
identifier.
This commit also marks `child_containing_descendant()` as deprecated
(per upstream's documentation), and uses `child_with_descendant()` in
its place. Minimum required tree-sitter version will now be `0.24`.
An implication of this current approach is that `NVIM_API_LEVEL` should be
bumped when a new Lua function is added.
TODO(future): add a lint check which requires `@since` on all new functions.
ref #25416
**Problem:** The documentation for `TSNode` and `TSTree` methods is
incomplete from the LSP perspective. This is because they are written
directly to the vimdoc, rather than in Lua and generated to vimdoc.
**Solution:** Migrate the docs to Lua and generate them into the vimdoc.
This requires breaking up the `treesitter/_meta.lua` file into a
directory with a few different modules.
This commit also makes the vimdoc generator slightly more robust with
regard to sections that have multiple help tags (e.g. `*one* *two*`)
Problem: No clear way to check whether parsers are available for a given
language.
Solution: Make `language.add()` return `true` if a parser was
successfully added and `nil` otherwise. Use explicit `assert` instead of
relying on thrown errors.
Problem: Language names are only registered for filetype<->language
lookups when parsers are actually loaded; this means users cannot rely
on `vim.treesitter.language.get_lang()` or `get_filetypes()` to return
the correct value when language and filetype coincide and always need to
add explicit fallbacks.
Solution: Always return the language name as valid filetype in
`get_filetypes()`, and default to the filetype in `get_lang()`. Document
this behavior.
For context, see https://github.com/neovim/neovim/pull/24738. Before
that PR, Nvim did not correctly handle captures with quantifiers. That
PR made the correct behavior opt-in to minimize breaking changes, with
the intention that the correct behavior would eventually become the
default. Users can still opt-in to the old (incorrect) behavior for now,
but this option will eventually be removed completely.
BREAKING CHANGE: Any plugin which uses `Query:iter_matches()` must
update their call sites to expect an array of nodes in the `match`
table, rather than a single node.
Problem: Installing treesitter parser is hard (harder than
climbing to heaven).
Solution: Add optional support for wasm parsers with `wasmtime`.
Notes:
* Needs to be enabled by setting `ENABLE_WASMTIME` for tree-sitter and
Neovim. Build with
`make CMAKE_EXTRA_FLAGS=-DENABLE_WASMTIME=ON
DEPS_CMAKE_FLAGS=-DENABLE_WASMTIME=ON`
* Adds optional Rust (obviously) and C11 dependencies.
* Wasmtime comes with a lot of features that can negatively affect
Neovim performance due to library and symbol table size. Make sure to
build with minimal features and full LTO.
* To reduce re-compilation times, install `sccache` and build with
`RUSTC_WRAPPER=<path/to/sccache> make ...`
This is identical to `named_node_for_range` except that it includes
anonymous nodes. This maintains consistency in the API because we
already have `descendant_for_range` and `named_descendant_for_range`.
Co-authored-by: Ilia Choly <ilia.choly@gmail.com>
Co-authored-by: Jose Pedro Oliveira <jose.p.oliveira.oss@gmail.com>
Co-authored-by: Maria José Solano <majosolano99@gmail.com>
Co-authored-by: zeertzjq <zeertzjq@outlook.com>
Problem: `has-ancestor?` is O(n²) for the depth of the tree since it iterates over each of the node's ancestors (bottom-up), and each ancestor takes O(n) time.
This happens because tree-sitter's nodes don't store their parent nodes, and the tree is searched (top-down) each time a new parent is requested.
Solution: Make use of new `ts_node_child_containing_descendant()` in tree-sitter v0.22.6 (which is now the minimum required version) to rewrite the `has-ancestor?` predicate in C to become O(n).
For a sample file, decreases the time taken by `has-ancestor?` from 360ms to 6ms.
* fix(treesitter): enforce lowercase language names
Problem: On case-insensitive file systems (e.g., macOS), `has_parser`
will return `true` for uppercase aliases, which will then try to inject
the uppercase language unsuccessfully.
Solution: Enforce and assume parser names to be lowercase when
resolving language names.
Problem: Injecting languages for file redirects (e.g., in bash) is not
possible.
Solution: Add `@injection.filename` capture that is piped through
`vim.filetype.match({ filename = node_text })`; the resulting filetype
(if not `nil`) is then resolved as a language (either directly or
through the list maintained via `vim.treesitter.language.register()`).
Note: `@injection.filename` is a non-standard capture introduced by
Helix; having two editors implement it makes it likely to be upstreamed.
Problem:
`TSNode:_rawquery()` is complicated, has known issues and the Lua and
C code is awkwardly coupled (see logic with `active`).
Solution:
- Add `TSQueryCursor` and `TSQueryMatch` bindings.
- Replace `TSNode:_rawquery()` with `TSQueryCursor:next_capture()` and `TSQueryCursor:next_match()`
- Do more stuff in Lua
- API for `Query:iter_captures()` and `Query:iter_matches()` remains the same.
- `treesitter.c` no longer contains any logic related to predicates.
- Add `match_limit` option to `iter_matches()`. Default is still 256.
Problem: Not all standard treesitter groups are documented.
Solution: Document them all (without relying on fallback); add default
link for new `*.builtin` groups to `Special` and `@keyword.type` to
`Structure`. Remove `@markup.environment.*` which only made sense for
LaTeX.
- Added `@inlinedoc` so single use Lua types can be inlined into the
functions docs. E.g.
```lua
--- @class myopts
--- @inlinedoc
---
--- Documentation for some field
--- @field somefield integer
--- @param opts myOpts
function foo(opts)
end
```
Will be rendered as
```
foo(opts)
Parameters:
- {opts} (table) Object with the fields:
- somefield (integer) Documentation
for some field
```
- Marked many classes with with `@nodoc` or `(private)`.
We can eventually introduce these when we want to.
Problem:
The documentation flow (`gen_vimdoc.py`) has several issues:
- it's not very versatile
- depends on doxygen
- doesn't work well with Lua code as it requires an awkward filter script to convert it into pseudo-C.
- The intermediate XML files and filters makes it too much like a rube goldberg machine.
Solution:
Re-implement the flow using Lua, LPEG and treesitter.
- `gen_vimdoc.py` is now replaced with `gen_vimdoc.lua` and replicates a portion of the logic.
- `lua2dox.lua` is gone!
- No more XML files.
- Doxygen is now longer used and instead we now use:
- LPEG for comment parsing (see `scripts/luacats_grammar.lua` and `scripts/cdoc_grammar.lua`).
- LPEG for C parsing (see `scripts/cdoc_parser.lua`)
- Lua patterns for Lua parsing (see `scripts/luacats_parser.lua`).
- Treesitter for Markdown parsing (see `scripts/text_utils.lua`).
- The generated `runtime/doc/*.mpack` files have been removed.
- `scripts/gen_eval_files.lua` now instead uses `scripts/cdoc_parser.lua` directly.
- Text wrapping is implemented in `scripts/text_utils.lua` and appears to produce more consistent results (the main contributer to the diff of this change).
Query patterns can contain quantifiers (e.g. (foo)+ @bar), so a single
capture can map to multiple nodes. The iter_matches API can not handle
this situation because the match table incorrectly maps capture indices
to a single node instead of to an array of nodes.
The match table should be updated to map capture indices to an array of
nodes. However, this is a massively breaking change, so must be done
with a proper deprecation period.
`iter_matches`, `add_predicate` and `add_directive` must opt-in to the
correct behavior for backward compatibility. This is done with a new
"all" option. This option will become the default and removed after the
0.10 release.
Co-authored-by: Christian Clason <c.clason@uni-graz.at>
Co-authored-by: MDeiml <matthias@deiml.net>
Co-authored-by: Gregory Anders <greg@gpanders.com>
Document that the `start` and `stop` parameters in
`Query:iter_captures()` and `Query:iter_matches()` are optional.
The tree-sitter lib has been bumped up to 0.20.9, so we also no longer
need "Requires treesitter >= 0.20.9".
- `TSQuery`: userdata object for parsed query.
- `vim.treesitter.Query`: renamed from `Query`.
- Add a new field `lang`.
- `TSQueryInfo`:
- Move to `vim/treesitter/_meta.lua`, because C code owns it.
- Correct typing for `patterns`, should be a map from `integer`
(pattern_id) to `(integer|string)[][]` (list of predicates or
directives).
- `vim.treesitter.QueryInfo` is added.
- This currently has the same structure as `TSQueryInfo` (exported
from C code).
- Document the fields (see `TSQuery:inspect`).
- Add typing for `vim._ts_parse_query()`.
Problem: Currently default color scheme defines most of treesitter
highlight groups. This might be an issue for users defining their own
color scheme as it breaks the "fallback property" for some of groups.
Solution: Define less default treesitter groups; just enough for default
color scheme to be useful. That is:
- All first level groups (`@character`, `@string`, etc.).
- All `@xxx.builtin` groups as a most common subgroup.
- Some special cases (links/URLs, `@diff.xxx`, etc.).
Docs for treesitter would benefit from including more real-world and
practical examples of queries and usages, rather than hypothetical ones
(e.g. names such as "foo", "bar"). Improved examples should be more
user-friendly and clear to understand.
In addition, align the capture names in some examples with the actual
ones being used in the built-in query files or in the nvim-treesitter
plugin, e.g.:
- `@parameter` -> `@variable.parameter`
- `@comment.doc.java` -> `@comment.documentation.java`
- etc.
Problem: Sharing queries with upstream and Helix is difficult due to
different capture names.
Solution: Define and document a new set of standard captures that
matches tree-sitter "standard captures" (where defined) and is closer to
Helix' Atom-style nested groups.
This is a breaking change for colorschemes that defined highlights based
on the old captures. On the other hand, the default colorscheme now
defines links for all standard captures (not just those used in bundled
queries), improving the out-of-the-box experience.
Problem: Some lines in the generated vim doc are overflowing, not
correctly wrapped at 78 characters. This happens when docs body contains
several consecutive 'inline' elements generated by doxygen.
Solution: Take into account the current column offset of the last line,
and prepend some padding before doc_wrap().
Improve error messages for `:InspectTree`, when no parsers are available
for the current buffer and filetype. We can show more informative and
helpful error message for users (e.g., which lang was searched for):
```
... No parser available for the given buffer:
+... no parser for 'custom_ft' language, see :help treesitter-parsers
```
Also improve the relevant docs for *treesitter-parsers*.
Problem: Not all default highlight groups show their actual colors.
Solution: Refactor `vimhelp.lua` and apply it to all relevant lists (UI
groups, syntax groups, treesitter groups, LSP groups, diagnostic groups).
PROBLEM: `vim.treesitter.get_node()` does not recognize the `lang` in
the option table. This option was used in somewhere else, for instance,
`vim.treesitter.dev` (for `inspect_tree`) but was never implemented.
SOLUTION: Make `get_node()` correctly use `opts.lang` when getting a
treesitter parser.