Problem:
A region managed by an injected parser may shrink after re-running the
injection query. If the updated region goes out of the range to be
parsed, then the corresponding tree will remain outdated, possibly
retaining the nodes that shouldn't exist anymore. This results in
outdated highlights.
Solution:
Re-parse an invalid tree if its region intersects the range to be
parsed.
When parsing with a range, languagetree looks up injections and adds
them if needed. This explicitly invalidates parser, making `is_valid`
report `false` both when including and excluding children.
This is an attempt to describe desired behaviour of `is_valid` in tests,
with what ended up being a single line change to satisfy them.
This is incorrect in the following scenario:
1. The language tree is Lua > Vim > Lua.
2. An edit simultaneously wipes out the `_regions` of all nodes, while
taking the Vim injection off-screen.
3. The Vim injection is not re-parsed, so the child Lua `_regions` is
still `nil`.
4. The child Lua is assumed, incorrectly, to occupy the whole document.
5. This causes the injections to be parsed again, resulting in Lua > Vim
> Lua > Vim.
6. Now, by the same process, Vim ends up with its range assumed over the
whole document. Now the parse is broken and results in broken
highlighting and poor performance.
It should be fine to instead treat an unparsed node as occupying
nothing (i.e. effectively non-existent). Since, either:
- The parent was just parsed, hence defining `_regions`
- The parent was not just parsed, in which case this node doesn't need
to be parsed either.
Also, the name `has_regions` is confusing; it seems to simply
mean the opposite of "root" or "full_document". However, this PR does
not touch it.
Memoizes a function, using a custom function to hash the arguments.
Private for now until:
- There are other places in the codebase that could benefit from this
(e.g. LSP), but might require other changes to accommodate.
- Invalidation of the cache needs to be controllable. Using weak tables
is an acceptable invalidation policy, but it shouldn't be the only
one.
- I don't think the story around `hash_fn` is completely thought out. We
may be able to have a good default hash_fn by hashing each argument,
so basically a better 'concat'.
Problem:
With incremental injection parsing, injected languages' parsers parse
only the relevant regions and stores the result in _trees with the index
of the corresponding region. Therefore, there can be holes in _trees.
Solution:
* Use generic table functions where appropriate.
* Fix type annotations and docs.
Problem:
It doesn't make much sense to flatten each region (= list of ranges).
This coincidentally worked for region with a single range.
Solution:
Custom function for combining regions.
The name for_each_child is misleading and caused bugs.
After #25111, #25115, there are no more usages of `for_each_child` in Nvim.
In the future if we want to restore this functionality we can consider a
generalized vim.traverse(node, key, visitor) function.
`LanguageTree:parse` is recursive, and calls
`LanguageTree:for_each_child`, which is also recursive.
That means that, starting from the third level (child of child of root),
nodes will be parsed twice.
Which then means that if the tree is N layers deep, there will be ~2^N
parses even if the branching factor is 1.
Now, why was the tree deepening with each character inserted? And why
did this only regress in #24647? These are mysteries for another time.
Fixes: #25104
Problem:
Treesitter highlighting is slow for large files with lots of injections.
Solution:
Only parse injections we are going to render during a redraw cycle.
---
- `LanguageTree:parse()` will no longer parse injections by default and
now requires an explicit range argument to be passed.
- `TSHighlighter` now parses injections incrementally during on_win
callbacks for the line range being rendered.
- Plugins which require certain injections to be parsed must run
`parser:parse({ start_row, end_row })` before using the tree.
* feat(treesitter): add injection language fallback
Problem: injection languages are often specified via aliases (e.g.,
filetype or in upper case), requiring custom directives.
Solution: include lookup logic (try as parser name, then filetype, then
lowercase) in LanguageTree itself and remove `#inject-language`
directive.
Co-authored-by: Lewis Russell <me@lewisr.dev>
When an injection has not set include children, make sure not to add
the injection if no ranges are determined.
This could happen when there is an injection with a child that has the
same range as itself. e.g. consider this Makefile snippet
```make
foo:
$(VAR)
```
Line 2 has an injection for bash and a make variable reference. If
include-children isn't set (default), then there is no range on line 2
to inject since the variable reference needs to be excluded.
This caused the language tree to return an empty range, which the parser
now interprets to mean the full buffer. This caused makefiles to have
completely broken highlighting.
* docs(lua): teach lua2dox how to table
* docs(lua): teach gen_vimdoc.py about local functions
No more need to mark local functions with @private
* docs(lua): mention @nodoc and @meta in dev-lua-doc
* fixup!
Co-authored-by: Justin M. Keyes <justinkz@gmail.com>
---------
Co-authored-by: Justin M. Keyes <justinkz@gmail.com>
- Add bindings to Treesitter ts_parser_set_logger and ts_parser_logger
- Add logfile with path STDPATH('log')/treesitter.c
- Rework existing LanguageTree loggin to use logfile
- Begin implementing log levels for vim.g.__ts_debug
When injections are added or removed make sure to:
- invoke 'changedtree' callbacks for when new trees are added.
- invoke 'changedtree' callbacks for when trees are invalidated
- redraw regions when languagetree children are removed
Problem:
Codebase inconsistently binds vim.api onto a or api.
Solution:
Use api everywhere. a as an identifier is too short to have at the
module level.
Never return the changes an only notify them using the `on_changedtree`
callback.
It is not guaranteed for a plugin that it'll be the first one to call
`tree:parse()` and thus get the changes.
Closes#19915
Problem:
gen_vimdoc.py / lua2dox.lua does not support @defgroup or \defgroup
except for "api-foo" modules.
Solution:
Modify `gen_vimdoc.py` to look for section names based on `helptag_fmt`.
TODO:
- Support @module ?
https://github.com/LuaLS/lua-language-server/wiki/Annotations#module
Problem:
Help tags like vim.treesitter.language.add() are confusing because
`vim.treesitter.language` is (thankfully) not a user-facing module.
Solution:
Ignore the "fstem" when generating "treesitter" tags.
Problem:
Treesitter injections are slow because all injected trees are invalidated on every change.
Solution:
Implement smarter invalidation to avoid reparsing injected regions.
- In on_bytes, try and update self._regions as best we can. This PR just offsets any regions after the change.
- Add valid flags for each region in self._regions.
- Call on_bytes recursively for all children.
- We still need to run the query every time for the top level tree. I don't know how to avoid this. However, if the new injection ranges don't change, then we re-use the old trees and avoid reparsing children.
This should result in roughly a 2-3x reduction in tree parsing when the comment injections are enabled.
Problem:
vim.treesitter does not know how to map a specific filetype to a parser.
This creates problems since in a few places (including in vim.treesitter itself), the filetype is incorrectly used in place of lang.
Solution:
Add an API to enable this:
- Add vim.treesitter.language.add() as a replacement for vim.treesitter.language.require_language().
- Optional arguments are now passed via an opts table.
- Also takes a filetype (or list of filetypes) so we can keep track of what filetypes are associated with which langs.
- Deprecated vim.treesitter.language.require_language().
- Add vim.treesitter.language.get_lang() which returns the associated lang for a given filetype.
- Add vim.treesitter.language.register() to associate filetypes to a lang without loading the parser.