Problem:
A region managed by an injected parser may shrink after re-running the
injection query. If the updated region goes out of the range to be
parsed, then the corresponding tree will remain outdated, possibly
retaining the nodes that shouldn't exist anymore. This results in
outdated highlights.
Solution:
Re-parse an invalid tree if its region intersects the range to be
parsed.
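
The check can be modelled roughly as follows. This is a simplified Python
sketch rather than the actual Lua change: rows are half-open `(start, end)`
pairs, and the names `intersects` and `trees_to_reparse` are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open row range: [start, end)


def intersects(a: Range, b: Range) -> bool:
    """Return True when two half-open ranges share at least one row."""
    return a[0] < b[1] and b[0] < a[1]


def trees_to_reparse(regions: List[List[Range]], valid: List[bool],
                     parse_range: Range) -> List[int]:
    """Indices of invalid trees whose region touches the range being parsed."""
    return [
        i for i, region in enumerate(regions)
        if not valid[i] and any(intersects(r, parse_range) for r in region)
    ]
```

A tree that is invalid but lies entirely outside the parsed range is left
alone until a later parse reaches it.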
- Remove some unused fields
- Prefix classes with `vim.`
- Move around some functions so the query stuff is at the top.
- Improve type hints
- Rework how hl_cache is implemented
Problem:
Treesitter highlighter's on_line was iterating all the parsed trees,
which can be quite a lot when injection is used. This may slow down
scrolling and cursor movement in big files with many comment injections
(e.g., lsp/_meta/protocol.lua).
Solution:
In on_win, collect trees inside the visible range, and use them in
on_line.
NOTE:
This optimization depends on the correctness of on_win's botline_guess
parameter (i.e., it's always greater than or equal to the line numbers
passed to on_line). The documentation does not guarantee this, but I
have never noticed a problem so far.
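
A minimal Python model of the idea, not the highlighter's real code. The
`Highlighter` class and its tuple-based trees are assumptions made for
illustration; only the `on_win`/`on_line` split and `botline_guess` come from
the description above.

```python
from typing import List, Tuple

Tree = Tuple[str, int, int]  # (name, first_row, last_row), rows inclusive


class Highlighter:
    def __init__(self, trees: List[Tree]) -> None:
        self.trees = trees
        self.visible: List[Tree] = []

    def on_win(self, topline: int, botline_guess: int) -> None:
        """Cache the trees that overlap the visible rows, once per redraw."""
        self.visible = [
            t for t in self.trees
            if t[1] <= botline_guess and t[2] >= topline
        ]

    def on_line(self, row: int) -> List[str]:
        """Consult only the cached trees instead of every parsed tree."""
        return [name for name, first, last in self.visible
                if first <= row <= last]
```

If `on_line` were ever called with a row beyond `botline_guess`, trees covering
that row would be missing from the cache, which is the dependency the NOTE
describes.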
* Collect on_bytes and flush at the invocation of the scheduled callback
to take account of commands that trigger multiple on_bytes.
* More accurately track movement of folds so that foldexpr returns
reasonable values even when the scheduled computation is not run yet.
* Start computing folds from the line above (+ foldminlines) the changed
lines to handle the folds that are removed due to the size limit.
* Shrink folds that end at the line at which another fold starts to
assign proper level to that line.
* Use level '=' for lines that are not computed yet.
Problem:
`LanguageTree:for_each_tree` calls itself for child nodes, so when we
call `for_each_tree` inside `for_each_tree`, this quickly leads to an
exponential number of tree calls.
Solution:
Use `pairs(child:trees())` directly in this case. We don't need the
extra callback for each child, as this is already handled by the outer
`for_each_tree` call.
When first opened, the tree-sitter inspector traverses all of the nodes
in the buffer to calculate an array of nodes. This traversal is done
only once, and _all_ nodes (both named and anonymous) are included.
Toggling anonymous nodes in the inspector only changes how the tree is
drawn in the buffer, but does not affect the underlying data structure
at all.
When the buffer is traversed and the list of nodes is calculated, we
don't know whether anonymous nodes will be displayed in the
inspector. Thus, we cannot determine during traversal where to
put closing parentheses. Instead, this must be done when drawing.
When we draw, the tree structure has been flattened into a single array,
so we lose parent-child relationships that would otherwise make
determining the number of closing parentheses straightforward. However,
we can instead rely on the fact that a delta between the depth of a node
and the depth of the successive node _must_ mean that more closing
parentheses are required:
    (foo
      (bar)
    (baz) ↑
          │
          └ (bar) and (baz) have different depths, so (bar) must have an
            extra closing parenthesis
This does not depend on whether or not anonymous nodes are displayed and
so works in both cases.
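
The depth-delta rule can be sketched as below. This is an illustrative Python
model, not the inspector's Lua code; `draw` and the `(name, depth)` tuples are
hypothetical stand-ins for the flattened node array.

```python
from typing import List, Tuple


def draw(nodes: List[Tuple[str, int]]) -> List[str]:
    """Render a depth-first flattened list of (name, depth) pairs."""
    lines = []
    for i, (name, depth) in enumerate(nodes):
        # After the last node, everything up to the root must be closed.
        next_depth = nodes[i + 1][1] if i + 1 < len(nodes) else 0
        # Deeper successor: it is a child, so leave this node open.
        # Same depth: close this node only.
        # Shallower successor: also close each ancestor that ends here.
        closing = max(depth - next_depth + 1, 0)
        lines.append(("  " * depth) + "(" + name + (")" * closing))
    return lines
```

The function only looks at the list it is given, so filtering nodes out before
drawing does not change how the parentheses are counted.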
Problem: Only injections under the top level tree are found.
Solution: Iterate through all trees to find injections. When two
injections are contained within the same node in the parent tree, prefer
the injection with the larger byte length.
When parsing with a range, languagetree looks up injections and adds
them if needed. This explicitly invalidates the parser, making `is_valid`
report `false` both when including and excluding children.
This is an attempt to describe desired behaviour of `is_valid` in tests,
with what ended up being a single line change to satisfy them.
Problem: Visual highlight is inconsistent on a folded line with
treesitter foldtext.
Solution: Don't add the Folded highlight as it is already in the background.
This is incorrect in the following scenario:
1. The language tree is Lua > Vim > Lua.
2. An edit simultaneously wipes out the `_regions` of all nodes, while
taking the Vim injection off-screen.
3. The Vim injection is not re-parsed, so the child Lua `_regions` is
still `nil`.
4. The child Lua is assumed, incorrectly, to occupy the whole document.
5. This causes the injections to be parsed again, resulting in Lua > Vim
> Lua > Vim.
6. Now, by the same process, Vim ends up with its range assumed over the
whole document. Now the parse is broken and results in broken
highlighting and poor performance.
It should be fine to instead treat an unparsed node as occupying
nothing (i.e. effectively non-existent), since either:
- The parent was just parsed, hence defining `_regions`
- The parent was not just parsed, in which case this node doesn't need
to be parsed either.
Also, the name `has_regions` is confusing; it seems to simply
mean the opposite of "root" or "full_document". However, this PR does
not touch it.
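
A rough Python model of the intended behaviour, under the assumption that
`has_regions` is false only for the root (full-document) tree. The function
name `included_regions` and the `WHOLE_DOCUMENT` sentinel are illustrative, not
taken from the patch.

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]
WHOLE_DOCUMENT: Range = (0, float("inf"))


def included_regions(regions: Optional[List[List[Range]]],
                     has_regions: bool) -> List[List[Range]]:
    """Regions a tree is considered to occupy."""
    if not has_regions:
        # Root tree: it always spans the whole document.
        return [[WHOLE_DOCUMENT]]
    if regions is None:
        # Injected tree whose parent has not been parsed yet:
        # treat it as occupying nothing instead of everything.
        return []
    return regions
```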
Memoizes a function, using a custom function to hash the arguments.
Private for now until:
- There are other places in the codebase that could benefit from this
(e.g. LSP), but might require other changes to accommodate.
- Invalidation of the cache needs to be controllable. Using weak tables
is an acceptable invalidation policy, but it shouldn't be the only
one.
- I don't think the story around `hash_fn` is completely thought out. We
may be able to have a good default hash_fn by hashing each argument,
so basically a better 'concat'.
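
For illustration, a minimal Python sketch of memoization keyed by a custom hash
function. The names `memoize` and `concat_hash` and the argument order are
assumptions, not the private API itself, and this sketch never evicts entries,
so it does not model the weak-table invalidation mentioned above.

```python
from typing import Any, Callable, Dict, Hashable


def memoize(hash_fn: Callable[..., Hashable],
            fn: Callable[..., Any]) -> Callable[..., Any]:
    """Cache results of fn, keyed by hash_fn applied to the arguments."""
    cache: Dict[Hashable, Any] = {}

    def wrapper(*args: Any) -> Any:
        key = hash_fn(*args)
        if key not in cache:
            cache[key] = fn(*args)
        return cache[key]

    return wrapper


def concat_hash(*args: Any) -> str:
    """A 'concat'-style key: join the repr of every argument."""
    return "\x00".join(repr(a) for a in args)
```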
Problem:
With incremental injection parsing, injected languages' parsers parse
only the relevant regions and store the result in _trees with the index
of the corresponding region. Therefore, there can be holes in _trees.
Solution:
* Use generic table functions where appropriate.
* Fix type annotations and docs.
Problem:
It doesn't make much sense to flatten each region (= list of ranges).
This coincidentally worked for regions with a single range.
Solution:
Custom function for combining regions.
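
A small Python illustration of the difference, assuming that "combining" means
concatenating the ranges of several regions into one region. Both function
names are hypothetical, and ranges are shortened to `[start_row, end_row]`.

```python
from typing import Any, List


def flatten_all(value: List[Any]) -> List[Any]:
    """Flatten every nesting level, which also tears ranges apart."""
    out: List[Any] = []
    for item in value:
        if isinstance(item, list):
            out.extend(flatten_all(item))
        else:
            out.append(item)
    return out


def combine_regions(regions: List[List[List[int]]]) -> List[List[int]]:
    """Concatenate regions into one region, keeping each range intact."""
    return [rng for region in regions for rng in region]
```

Flattening a region with one range happens to yield something shaped like a
range, which is why the old behaviour appeared to work in that case.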
Problem
---
If a highlighter query returns a significant number of predicate
non-matches, the highlighter will scan well past the end of the window.
Solution
---
In the iterator returned from `iter_captures`, accept an optional
parameter `end_line`. If no parameter is provided, the behavior is
unchanged, hence this is a non-invasive tweak.
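
The effect can be modelled in Python as below. `CaptureIter` and its tuple
layout are hypothetical; the real iterator is a Lua closure, and this sketch
only shows why an `end_line` bound stops the scan early.

```python
from typing import List, Optional, Tuple

Capture = Tuple[int, str, bool]  # (row, name, passes_predicates)


class CaptureIter:
    def __init__(self, captures: List[Capture]) -> None:
        self.captures = captures  # assumed sorted by row
        self.pos = 0              # number of captures examined so far

    def next(self, end_line: Optional[int] = None) -> Optional[Tuple[int, str]]:
        """Return the next matching capture, or None when done."""
        while self.pos < len(self.captures):
            row, name, passes = self.captures[self.pos]
            if end_line is not None and row > end_line:
                return None  # past the requested range: stop scanning
            self.pos += 1
            if passes:
                return (row, name)
        return None
```

With many predicate non-matches after the window, a bounded call examines only
the rows up to `end_line`, while an unbounded call walks the whole list.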
Fixes: #25113 nvim-treesitter/nvim-treesitter#5057
The name for_each_child is misleading and caused bugs.
After #25111, #25115, there are no more usages of `for_each_child` in Nvim.
In the future if we want to restore this functionality we can consider a
generalized vim.traverse(node, key, visitor) function.
Problem:
Folds are opened when the visible range changes even if there are no
modifications to the buffer, e.g., when using zM for the first time. If
the parsed tree was invalid, on_win re-parses and gets empty tree
changes, which triggers fold updates.
Solution:
Don't update folds in on_changedtree if there are no changes.
`LanguageTree:parse` is recursive, and calls
`LanguageTree:for_each_child`, which is also recursive.
That means that, starting from the third level (child of child of root),
nodes will be parsed twice.
Which then means that if the tree is N layers deep, there will be ~2^N
parses even if the branching factor is 1.
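
The blow-up can be reproduced with a toy model. This Python sketch is not the
LanguageTree code; `LangTree`, `chain`, and `total_parses` are hypothetical
helpers that only mimic a recursive parse layered on a recursive visitor.

```python
from typing import Callable, Optional


class LangTree:
    """A toy language tree with at most one child (branching factor 1)."""

    def __init__(self, child: Optional["LangTree"] = None) -> None:
        self.child = child
        self.parse_count = 0

    def for_each_child(self, fn: Callable[["LangTree"], None]) -> None:
        """Recursive visitor: calls fn on every descendant."""
        if self.child is not None:
            fn(self.child)
            self.child.for_each_child(fn)

    def parse_recursive_visitor(self) -> None:
        """Recursive parse on top of a recursive visitor (problematic)."""
        self.parse_count += 1
        self.for_each_child(lambda c: c.parse_recursive_visitor())

    def parse_direct_children(self) -> None:
        """Recursive parse that only descends into direct children."""
        self.parse_count += 1
        if self.child is not None:
            self.child.parse_direct_children()


def chain(depth: int) -> LangTree:
    """Build a chain of `depth` (>= 1) nested trees and return the root."""
    node = LangTree()
    for _ in range(depth - 1):
        node = LangTree(node)
    return node


def total_parses(root: LangTree) -> int:
    """Sum the parse counts along the chain."""
    total, node = 0, root
    while node is not None:
        total += node.parse_count
        node = node.child
    return total
```

In this model the third level is the first to be parsed twice, and the total
doubles with each extra layer.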
Now, why was the tree deepening with each character inserted? And why
did this only regress in #24647? These are mysteries for another time.
Fixes: #25104
Problem:
* The guessed botline might be smaller than the actual botline e.g. when
there are folds and the user is typing in insert mode. This may result
in incorrect treesitter highlights for injections.
* botline can be larger than the last line number of the buffer, which
results in errors when placing extmarks.
Solution:
* Take a more conservative approximation. I am not sure if it is
sufficient to guarantee correctness, but it seems to be good enough
for the case mentioned above.
* Clamp it to the last line number.
Co-authored-by: Lewis Russell <me@lewisr.dev>
Problem:
With treesitter fold, InsertLeave can be slow, because a single session
of insert mode may schedule multiple fold updates in on_bytes and
on_changedtree.
Solution:
Don't create duplicate autocmds.
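
A minimal Python model of the deduplication, assuming a pending flag guards
registration. `FoldScheduler` and its methods are hypothetical and stand in for
the one-shot InsertLeave autocmd.

```python
from typing import Callable, List


class FoldScheduler:
    def __init__(self) -> None:
        self.pending = False
        self.callbacks: List[Callable[[], None]] = []  # registered autocmds
        self.updates = 0

    def request_update(self) -> None:
        """Called from on_bytes/on_changedtree while in insert mode."""
        if self.pending:
            return  # an update is already queued for InsertLeave
        self.pending = True
        self.callbacks.append(self._run)

    def _run(self) -> None:
        self.pending = False
        self.updates += 1

    def insert_leave(self) -> None:
        """Fire and clear every queued one-shot callback."""
        callbacks, self.callbacks = self.callbacks, []
        for cb in callbacks:
            cb()
```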
According to the `:h TSNode` docs, `TSNode:sexpr()` and
`TSNode:has_error()` are also part of the `TSNode` class, but they weren't
documented in `treesitter/_meta.lua`.
Add the missing fields so the types match `:h TSNode`.
Problem:
Treesitter highlighting is slow for large files with lots of injections.
Solution:
Only parse injections we are going to render during a redraw cycle.
---
- `LanguageTree:parse()` will no longer parse injections by default and
now requires an explicit range argument to be passed.
- `TSHighlighter` now parses injections incrementally during on_win
callbacks for the line range being rendered.
- Plugins which require certain injections to be parsed must run
`parser:parse({ start_row, end_row })` before using the tree.
* feat(treesitter): add injection language fallback
Problem: injection languages are often specified via aliases (e.g.,
filetype or in upper case), requiring custom directives.
Solution: include lookup logic (try as parser name, then filetype, then
lowercase) in LanguageTree itself and remove `#inject-language`
directive.
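
The lookup order can be sketched as below. This is a Python illustration only:
`resolve_language`, the `parsers` set, and the `filetype_to_lang` mapping are
hypothetical stand-ins for whatever LanguageTree consults.

```python
from typing import Dict, Optional, Set


def resolve_language(alias: str, parsers: Set[str],
                     filetype_to_lang: Dict[str, str]) -> Optional[str]:
    """Resolve an injection language alias to an available parser name."""
    # 1. Try the alias as a parser name.
    if alias in parsers:
        return alias
    # 2. Try the alias as a filetype.
    lang = filetype_to_lang.get(alias)
    if lang in parsers:
        return lang
    # 3. Try the lowercased alias.
    lower = alias.lower()
    if lower in parsers:
        return lower
    return None
```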
Co-authored-by: Lewis Russell <me@lewisr.dev>