marktree.c was originally constructed as a "generic" datatype,
to make the prototyping of its internal logic as simple as possible
and also as the usecases for various kinds of extmarks/decorations was not yet decided.
As a consequence of this, various extra indirections and allocations was
needed to use marktree to implement extmarks (ns/id pairs) and
decorations of different kinds (some which is just a single highlight
id, other an allocated list of virtual text/lines)
This change removes a lot of indirection, by making Marktree specialized
for the usecase. In particular, the namespace id and mark id is stored
directly, instead of the 64-bit global id particular to the Marktree
struct. This removes the two maps needed to convert between global and
per-ns ids.
Also, "small" decorations are stored inline, i.e. those who
doesn't refer to external heap memory anyway. That is highlights (with
priority+flags) are stored inline, while virtual text, which anyway
occurs a lot of heap allocations, do not. (previously a hack was used
to elide heap allocations for highlights with standard prio+flags)
TODO(bfredl): the functionaltest-lua CI version of gcc is having
severe issues with uint16_t bitfields, so splitting up compound
assignments and redundant casts are needed. Clean this up once we switch
to a working compiler version.
As per #14236, performing extmark cleanup in a certain namespace does
not guarantee removing all the extmarks inside given namespace.
The issue resides within the tree node removal method and results in
a couple of rare edge cases.
To demonstrate what causes this bug, I'll give an example covering one
of the edge cases.
=== AN EXAMPLE ===
(A) (B) (C) (D) (E)
--------- --------- --------- --------- ---------
<0, 1> <0, 1> <0, 1> <0, 1> <0, 1>
<0, 2> <0, 2> <0, 2> <0, 2> <0, 2>
<0, 3> <0, 3> <0, 3> <0, 3> <0, 3>
<0, 4> <0, 4> <0, 4> <0, 4> <0, 4>
<0, 5> <0, 5> <0, 5> <0, 5> <0, 5>
<0, 6> <0, 6> <0, 6> <0, 6> <0, 6>
<0, 7> <0, 7> <0, 7> <0, 7> <0, 7>
<0, 8> <0, 8> <0, 8> <0, 8> <0, 8>
<0, 9> <0, 9> * * <0, 9> * <0, 9>
[0, 10] * [0, 10] <0, 9> [0, 11] [0, 11]
[0, 11] [0, 11] [0, 11] [0, 12] [0, 12] *
[0, 12] [0, 12] [0, 12] [0, 13] [0, 13]
[0, 13] [0, 13] [0, 13] [0, 14] [0, 14]
[0, 14] [0, 14] [0, 14] [0, 15] [0, 15]
[0, 15] [0, 15] [0, 15] [0, 16] [0, 16]
[0, 16] [0, 16] [0, 16] [0, 17] [0, 17]
[0, 17] [0, 17] [0, 17] [0, 18] [0, 18]
[0, 18] [0, 18] [0, 18] [0, 19] [0, 19]
[0, 19] [0, 19] [0, 19] [0, 20] [0, 20]
[0, 20] [0, 20] [0, 20]
DIAGRAM EXPLANATION
* Every column is a state of the marktree at a certain stage.
* To make it simple, I don't draw the whole tree. What you see are
2 leftmost parent nodes ([0, 10], [0, 20]) and their children placed
in order `MarkTreeIter` would iterate through. From top to bottom.
* Numbers on this diagram represent extmark coordinates. Relative
positioning and actual mark IDs used by the marktree are avoided
for simplicity.
* 2 types of brackets around coordinates represent 2 different
extmark namespaces (`ns_id`s).
* '*' shows iterator position.
ACTUAL EXPLANATION
Let's assume, we have two sets of extmarks from 2 different plugins:
* Plugin1: <0, 1-9>
* Plugin2: [0, 10-20]
1. Plugin2 calls
`vim.api.nvim_buf_clear_namespace(buf_handle, ns_id, 0, -1)`
to clear all its extmarks which results in `extmark_clear` call.
2. The iteration process goes on ignoring extmarks with irrelevant
`ns_id` from Plugin1, until it reaches [0, 10], entering state (A).
3. At the end of cleaning up process, `marktree_del_itr` gets called.
This function is supposed to remove given node and, if necessary,
restructure the tree. Also, move the iterator to the next node.
The bug occurs in this function.
4. The iterator goes backwards to the node's last child, to put it
in the place of its deleted parent later. (B)
5. The parent node is deleted and replaced with its child node. (C)
6. Since now this node has 8 children, which is less than
`MT_BRANCH_FACTOR - 1`, it get's merged with the next node. (D)
7. Finally, since at (B) the iterator went backward, it goes forward
twice, skipping [0, 11] node, causing this extmark to persist,
causing the bug. (E)
ANALYSIS AND SOLUTION
The algorithm works perfectly when the parent node gets replaced by
its child, but no merging occurs. I.e. the exact same diagram,
but without the (D) stage. If not for (D), it would iterate to <0, 9>
and then to [0, 11]. So, iterating twice makes sense. The actual problem
is in (C) stage, because the iterator index isn't adjusted and still
pointing to no longer existent node. So my solution is to adjust
iterator index after removing the child node.
More info: https://github.com/neovim/neovim/pull/14719
This is inspired by Atom's "marker index" data structure to efficiently
adjust marks to text insertions deletions, but uses a wide B-tree
(derived from kbtree) to keep the nesting level down.