Use the grapheme break algorithm from utf8proc to support grapheme
clusters from recent unicode versions.
Handle variant selector VS16 turning some codepoints into double-width
emoji. This means we need to use ptr2cells rather than char2cells when
possible.
Specifically, functions that are run in the context of the test runner
are put in module `test/testutil.lua` while the functions that are run
in the context of the test session are put in
`test/functional/testnvim.lua`.
Closes https://github.com/neovim/neovim/issues/27004.
This is the command invoked repeatedly to make the changes:
:%s/^\(.*\)|\%(\*\(\d\+\)\)\?$\n\1|\%(\*\(\d\+\)\)\?$/\=submatch(1)..'|*'..(max([str2nr(submatch(2)),1])+max([str2nr(submatch(3)),1]))/g
Problem: buffer text with composing chars are converted from UTF-8
to an array of up to seven UTF-32 values and then converted back
to UTF-8 strings.
Solution: Convert buffer text directly to UTF-8 based schar_T values.
The limit of the text size is now in schar_T bytes, which is currently
31+1 but easily could be raised as it no longer multiplies the size
of the entire screen grid when not used, the full size is only required
for temporary scratch buffers.
Also does some general cleanup to win_line text handling, which was
unnecessarily complicated due to multibyte rendering being an "opt-in"
feature long ago. Nowadays, a char is just a char, regardless if it consists
of one ASCII byte or multiple bytes.
The 'arabicshape' feature of vim is a transformation of unicode text to
make arabic and some related scripts look better at display time. In
particular the content of a cell will be adjusted depending on the
(original) content of the cells just before and after it.
This is implemented by the arabic_shape() function in nvim. Before this
commit, shaping was invoked in four different contexts:
- when rendering buffer text in win_line()
- in line_putchar() for rendering virtual text
- as part of grid_line_puts, used by messages and statuslines and
similar
- as part of draw_cmdline() for drawing the cmdline
This replaces all these with a post-processing step in grid_put_linebuf(),
which has become the entry point for all text rendering after recent
refactors.
An aim of this is to make the handling of multibyte text yet simpler.
One of the main reasons multibyte chars needs to be "parsed" into
codepoint arrays of composing chars is so that these could be inspected
for the purpose of shaping. This can likely be vastly simplified in many
contexts where only the total length (in bytes) and width of composed
char is needed.
Previously, a screen cell would occupy 28+4=32 bytes per cell
as we always made space for up to MAX_MCO+1 codepoints in a cell.
As an example, even a pretty modest 50*80 screen would consume
50*80*2*32 = 256000, i e a quarter megabyte
With the factor of two due to the TUI side buffer, and even more when
using msg_grid and/or ext_multigrid.
This instead stores a 4-byte union of either:
- a valid UTF-8 sequence up to 4 bytes
- an escape char which is invalid UTF-8 (0xFF) plus a 24-bit index to a
glyph cache
This avoids allocating space for huge composed glyphs _upfront_, while
still keeping rendering such glyphs reasonably fast (1 hash table lookup
+ one plain index lookup). If the same large glyphs are using repeatedly
on the screen, this is still a net reduction of memory/cache
consumption. The only case which really gets worse is if you blast
the screen full with crazy emojis and zalgo text and even this case
only leads to 4 extra bytes per char.
When only <= 4-byte glyphs are used, plus the 4-byte attribute code,
i e 8 bytes in total there is a factor of four reduction of memory use.
Memory which will be quite hot in cache as the screen buffer is scanned
over in win_line() buffer text drawing
A slight complication is that the representation depends on host byte
order. I've tested this manually by compling and running this
in qemu-s390x and it works fine. We might add a qemu based solution
to CI at some point.
Note for external UIs: Nvim can now emit multiple "redraw" event batches
before a final "flush" event is received. To retain existing behavior,
clients should make sure to update visible state at an explicit "flush"
event, not just the end of a "redraw" batch of event.
* Get rid of copy_object() blizzard in the auto-generated ui_event layer
* Special case "grid_line" by encoding screen state directly to
msgpack events with no intermediate API events.
* Get rid of the arcane notion of referring to the screen as the "shell"
* Array and Dictionary are kvec_t:s, so define them as such.
* Allow kvec_t:s, such as Arrays and Dictionaries, to be allocated with
a predetermined size within an arena.
* Eliminate redundant capacity checking when filling such kvec_t:s
with values.
It is perfectly fine and expected to detach from the screen just by
the UI disconnecting from nvim or exiting nvim. Just keep detach() in
screen_basic_spec, to get some coverage of the detach method itself.
This avoids hang on failure in many situations (though one could argue
that detach() should be "fast", or at least "as fast as resize",
which works in press-return already).
Never use detach() just to change the size of the screen, try_resize()
method exists for that specifically.
closes#8466closes#8664
Regression by 0d7daaad98.
- Fix length comparison.
- Fix loop(s) which iterated over all fields of array `pcc` even if it
was not filled up (try unicode 0x9f as statusline character).
Note about the tests:
- To input unicode with more than two hex digits you can use <C-v>U...:
a + U+fe20: a︠
a + U+fe20 + U+fe21: a︠︡