Problem:
Treesitter injections are slow because all injected trees are invalidated on every change.
Solution:
Implement smarter invalidation to avoid reparsing injected regions.
- In on_bytes, try and update self._regions as best we can. This PR just offsets any regions after the change.
- Add valid flags for each region in self._regions.
- Call on_bytes recursively for all children.
- We still need to run the query every time for the top level tree. I don't know how to avoid this. However, if the new injection ranges don't change, then we re-use the old trees and avoid reparsing children.
This should result in roughly a 2-3x reduction in tree parsing when the comment injections are enabled.
Problem:
vim.treesitter does not know how to map a specific filetype to a parser.
This creates problems since in a few places (including in vim.treesitter itself), the filetype is incorrectly used in place of lang.
Solution:
Add an API to enable this:
- Add vim.treesitter.language.add() as a replacement for vim.treesitter.language.require_language().
- Optional arguments are now passed via an opts table.
- Also takes a filetype (or list of filetypes) so we can keep track of what filetypes are associated with which langs.
- Deprecated vim.treesitter.language.require_language().
- Add vim.treesitter.language.get_lang() which returns the associated lang for a given filetype.
- Add vim.treesitter.language.register() to associate filetypes to a lang without loading the parser.
Problem: Some injections (like markdown) allow specifying arbitrary
language names for code blocks, which may be lead to errors when
looking for a corresponding parser in runtime path.
Solution: Validate that the language name only contains alphanumeric
characters and `_` (e.g., for `c_sharp`) and error otherwise.
Extend the capabilities of is_os to detect more platforms such as
freebsd and openbsd. Also remove `iswin()` helper function as it can be
replaced by `is_os("win")`.
This is essentially a convenience wrapper around the `pending()`
function, similar to `skip_fragile()` but more general-purpose.
Also remove `pending_win32` function as it can be replaced by
`skip(iswin())`.
fix(treesitter): get_captures_at_position returns metadata
Return the full `metadata` table for the capture instead of just the
priority.
Further cleanup of related docs.
The @error capture is used for tree-sitter's ERROR node, which indicates
a parsing error -- which can be quite frequent (and jarring) while typing.
Users can still manually `hi link @error Error` in their config.
This removes the support for defining links via
vim.treesitter.highlighter.hl_map (never documented, but plugins did
anyway), or the uppercase-only `@FooGroup.Bar` to `FooGroup` rule.
The fallback is now strictly `@foo.bar.lang` to `@foo.bar` to `@foo`,
and casing is irrelevant (as it already was outside of treesitter)
For compatibility, define default links to builting syntax groups
as defined by pre-existing color schemes
Problem:
1. CI logs have too many (40+) logs mentioning SIGHUP:
```
WRN 2022-06-18T16:05:47.075 T3568.22499.0/c deadly_signal:177: got signal 1 (SIGHUP)
WRN 2022-06-18T16:05:47.273 T3569.91095.0/c deadly_signal:177: got signal 1 (SIGHUP)
WRN 2022-06-18T16:05:47.651 T3570.59545.0/c deadly_signal:177: got signal 1 (SIGHUP)
```
2. TS parser test still sometimes fails on BSD CI.
3. remote_spec test fails too often.
Solution:
1. Log deadly signals at INFO level. It hasn't been helpful in CI, and
for local troubleshooting it's reasonable to adjust the loglevel as
needed.
2. Adjust the TS parser test again. ref #18911
3. Skip the remote_spec test. The `--remote` feature was merged before
it was fully formed and needs to be revisited.
The "first run" has high variability. Looks like the test failures
correlate with 545dc82c1b
, which makes sense because that improves "first run" performance.
So the `1000*` factor of this test could be adjusted to e.g. `300*` or `500*`.
ref https://github.com/neovim/neovim/pull/16945
When trying to load a language parser, escape the value of
the language.
With language injection, the language might be picked up from the
buffer. If this value is erroneous it can cause `nvim_get_runtime_file`
to hard error.
E.g., the markdown expression `~~~{` will extract '{' as a language and
then try to get the parser using `parser/{*` as the pattern.
Based on https://github.com/neovim/neovim/pull/14445
This extends `vim.treesitter.query.get_node_text` to return the text
that spans a node's range even if start_row ~= end_row.
For the case of Clojure and other Lisp syntax highlighting, it is
necessary to create huge regexps consisting of hundreds of symbols with
the pipe (|) character. To make things more difficult, these Lisp
symbols sometimes consists of special characters that are themselves
part of special regexp characters like '*'. In addition to being
difficult to maintain, it's performance is suboptimal.
This patch introduces a new predicate to perform 'source' matching in
amortized constant time. This is accomplished by compiling a hash table
on the first use.
Add support for default start and end row when omitted in the
query:iter_captures and query:iter_matches functions.
When the start and end row values are omitted, the values of the given
node is used. The end row value is incremented by 1 to include the node end
row in the match.
Updated tests and docs accordingly.