The removes the previous restriction that nvim_buf_set_extmark()
could not be used to highlight arbitrary multi-line regions
The problem can be summarized as follows: let's assume an extmark with a
hl_group is placed covering the region (5,0) to (50,0) Now, consider
what happens if nvim needs to redraw a window covering the lines 20-30.
It needs to be able to ask the marktree what extmarks cover this region,
even if they don't begin or end here.
Therefore the marktree needs to be augmented with the information covers
a point, not just what marks begin or end there. To do this, we augment
each node with a field "intersect" which is a set the ids of the
marks which overlap this node, but only if it is not part of the set of
any parent. This ensures the number of nodes that need to be explicitly
marked grows only logarithmically with the total number of explicitly
nodes (and thus the number of of overlapping marks).
Thus we can quickly iterate all marks which overlaps any query position
by looking up what leaf node contains that position. Then we only need
to consider all "start" marks within that leaf node, and the "intersect"
set of that node and all its parents.
Now, and the major source of complexity is that the tree restructuring
operations (to ensure that each node has T-1 <= size <= 2*T-1) also need
to update these sets. If a full inner node is split in two, one of the
new parents might start to completely overlap some ranges and its ids
will need to be moved from its children's sets to its own set.
Similarly, if two undersized nodes gets joined into one, it might no
longer completely overlap some ranges, and now the children which do
needs to have the have the ids in its set instead. And then there are
the pivots! Yes the pivot operations when a child gets moved from one
parent to another.
marktree.c was originally constructed as a "generic" datatype,
to make the prototyping of its internal logic as simple as possible
and also as the usecases for various kinds of extmarks/decorations was not yet decided.
As a consequence of this, various extra indirections and allocations was
needed to use marktree to implement extmarks (ns/id pairs) and
decorations of different kinds (some which is just a single highlight
id, other an allocated list of virtual text/lines)
This change removes a lot of indirection, by making Marktree specialized
for the usecase. In particular, the namespace id and mark id is stored
directly, instead of the 64-bit global id particular to the Marktree
struct. This removes the two maps needed to convert between global and
per-ns ids.
Also, "small" decorations are stored inline, i.e. those who
doesn't refer to external heap memory anyway. That is highlights (with
priority+flags) are stored inline, while virtual text, which anyway
occurs a lot of heap allocations, do not. (previously a hack was used
to elide heap allocations for highlights with standard prio+flags)
TODO(bfredl): the functionaltest-lua CI version of gcc is having
severe issues with uint16_t bitfields, so splitting up compound
assignments and redundant casts are needed. Clean this up once we switch
to a working compiler version.
As per #14236, performing extmark cleanup in a certain namespace does
not guarantee removing all the extmarks inside given namespace.
The issue resides within the tree node removal method and results in
a couple of rare edge cases.
To demonstrate what causes this bug, I'll give an example covering one
of the edge cases.
=== AN EXAMPLE ===
(A) (B) (C) (D) (E)
--------- --------- --------- --------- ---------
<0, 1> <0, 1> <0, 1> <0, 1> <0, 1>
<0, 2> <0, 2> <0, 2> <0, 2> <0, 2>
<0, 3> <0, 3> <0, 3> <0, 3> <0, 3>
<0, 4> <0, 4> <0, 4> <0, 4> <0, 4>
<0, 5> <0, 5> <0, 5> <0, 5> <0, 5>
<0, 6> <0, 6> <0, 6> <0, 6> <0, 6>
<0, 7> <0, 7> <0, 7> <0, 7> <0, 7>
<0, 8> <0, 8> <0, 8> <0, 8> <0, 8>
<0, 9> <0, 9> * * <0, 9> * <0, 9>
[0, 10] * [0, 10] <0, 9> [0, 11] [0, 11]
[0, 11] [0, 11] [0, 11] [0, 12] [0, 12] *
[0, 12] [0, 12] [0, 12] [0, 13] [0, 13]
[0, 13] [0, 13] [0, 13] [0, 14] [0, 14]
[0, 14] [0, 14] [0, 14] [0, 15] [0, 15]
[0, 15] [0, 15] [0, 15] [0, 16] [0, 16]
[0, 16] [0, 16] [0, 16] [0, 17] [0, 17]
[0, 17] [0, 17] [0, 17] [0, 18] [0, 18]
[0, 18] [0, 18] [0, 18] [0, 19] [0, 19]
[0, 19] [0, 19] [0, 19] [0, 20] [0, 20]
[0, 20] [0, 20] [0, 20]
DIAGRAM EXPLANATION
* Every column is a state of the marktree at a certain stage.
* To make it simple, I don't draw the whole tree. What you see are
2 leftmost parent nodes ([0, 10], [0, 20]) and their children placed
in order `MarkTreeIter` would iterate through. From top to bottom.
* Numbers on this diagram represent extmark coordinates. Relative
positioning and actual mark IDs used by the marktree are avoided
for simplicity.
* 2 types of brackets around coordinates represent 2 different
extmark namespaces (`ns_id`s).
* '*' shows iterator position.
ACTUAL EXPLANATION
Let's assume, we have two sets of extmarks from 2 different plugins:
* Plugin1: <0, 1-9>
* Plugin2: [0, 10-20]
1. Plugin2 calls
`vim.api.nvim_buf_clear_namespace(buf_handle, ns_id, 0, -1)`
to clear all its extmarks which results in `extmark_clear` call.
2. The iteration process goes on ignoring extmarks with irrelevant
`ns_id` from Plugin1, until it reaches [0, 10], entering state (A).
3. At the end of cleaning up process, `marktree_del_itr` gets called.
This function is supposed to remove given node and, if necessary,
restructure the tree. Also, move the iterator to the next node.
The bug occurs in this function.
4. The iterator goes backwards to the node's last child, to put it
in the place of its deleted parent later. (B)
5. The parent node is deleted and replaced with its child node. (C)
6. Since now this node has 8 children, which is less than
`MT_BRANCH_FACTOR - 1`, it get's merged with the next node. (D)
7. Finally, since at (B) the iterator went backward, it goes forward
twice, skipping [0, 11] node, causing this extmark to persist,
causing the bug. (E)
ANALYSIS AND SOLUTION
The algorithm works perfectly when the parent node gets replaced by
its child, but no merging occurs. I.e. the exact same diagram,
but without the (D) stage. If not for (D), it would iterate to <0, 9>
and then to [0, 11]. So, iterating twice makes sense. The actual problem
is in (C) stage, because the iterator index isn't adjusted and still
pointing to no longer existent node. So my solution is to adjust
iterator index after removing the child node.
More info: https://github.com/neovim/neovim/pull/14719
This is inspired by Atom's "marker index" data structure to efficiently
adjust marks to text insertions deletions, but uses a wide B-tree
(derived from kbtree) to keep the nesting level down.