_How to lay out, shape, and render text in complex or mixed scripts._

Table of Contents

- [Overview](#overview)
- [General Process](#general-process)
- [Internals](#internals)
- [Example](#example)

## Overview

### Key Points
As described in [Using Enhanced Font Interfaces](Using-Enhanced-Font-Interfaces),
we tuned and enhanced MiniGUI's font interfaces in version 4.0.0 in order to
support complex or mixed scripts.

But this is only a small part of the whole story. To show a paragraph of
text in complex or mixed scripts, we must implement a complete set of APIs
that conform to the Unicode standards and specifications, including the
Unicode Bidirectional Algorithm (UAX #9), Unicode Line Breaking Algorithm
(UAX #14), Unicode Normalization Forms (UAX #15), Unicode Script Property
(UAX #24), Unicode Text Segmentation (UAX #29), Unicode Vertical Text
Layout (UAX #50), and so on.

On the basis of full support for Unicode, we provide APIs for laying out,
shaping, and rendering text in complex or mixed scripts.

This document describes how to use these APIs to lay out, shape, and render
text in complex or mixed scripts.

Before we continue, let's clarify a few terms and concepts.

A `script` is a set of letters used to write one or more human languages,
for example, Latin, Han, Arabic, Hangul, and so on.

A `writing system` is the specific writing convention of a language or
script. For example, Latin letters are written horizontally from left to
right, with lines running from top to bottom, while traditional Chinese is
written vertically from top to bottom, with columns running from right to
left.

The scripts of common languages, such as English and modern Chinese, are
`standard scripts`. In these scripts, there is a one-to-one relationship
between an encoded character (e.g. the code point U+4E2D, which is encoded
as the bytes 0xE4 0xB8 0xAD in UTF-8) and the glyph (中) that represents it.
Characters in standard scripts are always written horizontally from left to
right, with lines running from top to bottom.

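To make the difference between a code point and its encoded bytes concrete, here is a minimal C sketch that encodes one code point as UTF-8. The helper `utf8_encode` is hypothetical and is not part of MiniGUI; it only illustrates the standard UTF-8 bit layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into buf (4 bytes are enough).
 * Returns the number of bytes written, or 0 for an invalid code point
 * (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
    assert(buf != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                          /* surrogates are not characters */
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For example, `utf8_encode(0x4E2D, buf)` returns 3 and fills `buf` with `0xE4 0xB8 0xAD`, the UTF-8 bytes for 中. The glyph is selected by the code point, not by these bytes.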
A `glyph` is a set of data that represents a specific character in a
visual or printable form. In a computer, a glyph may be stored as a bitmap
or as vector path data.

Generally, a `font` contains many glyphs, as bitmaps or vector path data,
for the characters of a specific language or script, or of a few similar
languages or scripts. For example, most fonts for East Asian markets today
contain glyphs for almost all Chinese (both traditional and simplified),
Japanese, and Korean characters.

Systems and applications that handle standard scripts do not need to
distinguish between character processing and glyph processing. When
working with text in a standard script, it is usually most convenient to
think only in terms of character processing, or simply text processing:
that is, the sequential rendering of the glyphs that represent the input
character codes in logical order.

The term `complex script` refers to any writing system that requires some
degree of character reordering and/or glyph processing to display, print,
or edit. In other words, these are scripts for which rendering the nominal
glyphs of the code points in Unicode logical order does not result in
acceptable text. Such scripts, examples of which are Arabic and the
numerous Indic scripts descended from the Brahmi writing system, are
generally identifiable by their morphographic characteristics: the shape
or position of a glyph changes according to its relationship to the
surrounding glyphs. Note that such processing is not optional; it is
essential for rendering text in these scripts correctly. Additional glyph
processing to achieve appropriately sophisticated typography may be
desirable beyond the minimum required to make the text readable.

The situation becomes more complicated when we process text in
`mixed scripts`, that is, text that contains characters from different
scripts, for example, Latin, Arabic, and Chinese.

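Character reordering is one of the processing steps mentioned above. The following C sketch shows the idea on a mixed Latin/Arabic string under heavy assumptions: the paragraph is left-to-right, only letters in the Arabic block (U+0600..U+06FF) count as right-to-left, and spaces between two such letters join their run. The functions are hypothetical; a conforming implementation follows the full Unicode Bidirectional Algorithm (UAX #9), and the contextual shaping of Arabic letters is a separate step not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy rule: only letters in the Arabic block are right-to-left. */
static int is_rtl(uint32_t cp)
{
    return cp >= 0x0600 && cp <= 0x06FF;
}

/* Returns one past the last RTL letter of the run starting at i; spaces
 * between two RTL letters belong to the run, trailing spaces do not. */
static size_t rtl_run_end(const uint32_t *s, size_t i, size_t n)
{
    size_t end = i, j = i;

    while (j < n) {
        if (is_rtl(s[j]))
            end = ++j;
        else if (s[j] == ' ')
            j++;
        else
            break;
    }
    return end;
}

/* Converts logical order to visual (left-to-right display) order for a
 * left-to-right paragraph by reversing every right-to-left run. */
void toy_visual_order(const uint32_t *logical, uint32_t *visual, size_t n)
{
    size_t i = 0, k;

    assert((logical != NULL && visual != NULL) || n == 0);
    while (i < n) {
        if (!is_rtl(logical[i])) {
            visual[i] = logical[i];        /* LTR and neutral characters stay put */
            i++;
            continue;
        }
        size_t j = rtl_run_end(logical, i, n);   /* the RTL run is [i, j) */
        for (k = i; k < j; k++)
            visual[i + (j - 1 - k)] = logical[k];
        i = j;
    }
}
```

For example, the logical sequence `a`, space, ALEF, BEH, space, TEH, space, `b` comes out in visual order as `a`, space, TEH, space, BEH, ALEF, space, `b`.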
## General Process

To lay out, shape, and render text in mixed scripts, first call the
`GetUCharsUntilParagraphBoundary` function to convert a multi-byte string
to a Unicode string under the specified white space rule, breaking rule,
and transformation rule. For example, you can use this function to convert
a general C string in UTF-8 or GB18030 to a Uchar32 string. To use as
little memory as possible, you can call the `CreateLogFontForMChar2UChar`
function to create a dummy logfont object for this purpose.

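The conversion step can be pictured with the following C sketch. It is hypothetical and much simpler than `GetUCharsUntilParagraphBoundary`: it handles only UTF-8, treats `'\n'` as the only paragraph boundary, replaces malformed sequences with U+FFFD, and applies none of the white space, breaking, or transformation rules. The 32-bit output plays the role of a Uchar32 string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest code point each sequence length may encode; anything lower
 * is an overlong (invalid) form. Indexed by the number of extra bytes. */
static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };

/* Decodes UTF-8 bytes into 32-bit code points until the end of the input,
 * the first '\n' (consumed but not stored), or until cap code points have
 * been written. Malformed sequences become U+FFFD. Returns the number of
 * bytes consumed and stores the number of code points in *nr_out. */
size_t utf8_paragraph_to_ucs(const unsigned char *s, size_t len,
                             uint32_t *out, size_t cap, size_t *nr_out)
{
    size_t i = 0, n = 0;

    assert(s != NULL && out != NULL && nr_out != NULL);
    while (i < len && n < cap) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t extra, k;

        if (c == '\n') {                   /* paragraph boundary */
            i++;
            break;
        }
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else {                             /* stray continuation or bad byte */
            out[n++] = 0xFFFD;
            i++;
            continue;
        }

        /* Collect the continuation bytes (10xxxxxx). */
        for (k = 1; k <= extra; k++) {
            if (i + k >= len || (s[i + k] & 0xC0) != 0x80)
                break;
            cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
        }
        if (k <= extra) {                  /* truncated sequence */
            out[n++] = 0xFFFD;
            i += k;
            continue;
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        out[n++] = cp;
        i += extra + 1;
    }
    *nr_out = n;
    return i;
}
```

Because the function returns the number of bytes consumed, a caller can invoke it repeatedly to walk through a multi-paragraph string one paragraph at a time.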
If the text is in simple scripts, like Latin or Chinese, you can call the
`GetGlyphsExtentPointEx` function to lay out the paragraph. This function
returns a glyph string that fits in a line with the specified maximal
extent and rendering flags. Then call the `DrawGlyphStringEx` function to
draw the glyph string at the specified position in a DC.

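The core idea behind fitting a glyph string into a line can be sketched in C as a greedy loop over glyph advances. `fit_glyphs` is hypothetical; it ignores break opportunities, letter spacing, and the rendering flags that `GetGlyphsExtentPointEx` takes into account, and simply stops at the first glyph that would exceed the maximal extent.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many leading glyphs fit within max_extent, given the
 * advance of each glyph in pixels; *extent (if not NULL) receives the
 * total width of those glyphs. */
size_t fit_glyphs(const int *advances, size_t nr_glyphs, int max_extent,
                  int *extent)
{
    size_t i;
    int total = 0;

    assert(advances != NULL || nr_glyphs == 0);
    for (i = 0; i < nr_glyphs; i++) {
        if (total + advances[i] > max_extent)
            break;                         /* glyph i would overflow */
        total += advances[i];
    }
    if (extent != NULL)
        *extent = total;
    return i;
}
```

For example, with advances `{8, 8, 16, 8}` and a maximal extent of 30, `fit_glyphs` returns 2 and reports an extent of 16.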
If the text is in complex and/or mixed scripts, such as Arabic, Thai, and
the Indic scripts, you should first create a TEXTRUNS object by calling the
`CreateTextRuns` function, then initialize a shaping engine for laying out
the text.

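To give an idea of what a text runs object holds, the following C sketch splits a string of 32-bit code points into runs of a single script. Everything here is an assumption for illustration: `toy_script_of` classifies only a few hard-coded code point ranges instead of using the Unicode Script property (UAX #24), neutral characters such as spaces and digits simply join the neighboring run, and a real implementation may also split runs by other attributes such as text direction or font.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A toy subset of scripts (hypothetical). */
typedef enum { SCRIPT_COMMON, SCRIPT_LATIN, SCRIPT_ARABIC, SCRIPT_HAN } toy_script;

/* Classifies a code point by a few hard-coded ranges. */
static toy_script toy_script_of(uint32_t cp)
{
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return SCRIPT_LATIN;
    if (cp >= 0x0600 && cp <= 0x06FF)
        return SCRIPT_ARABIC;
    if (cp >= 0x4E00 && cp <= 0x9FFF)
        return SCRIPT_HAN;
    return SCRIPT_COMMON;                  /* spaces, digits, punctuation, ... */
}

typedef struct {
    size_t start;                          /* index of the first character */
    size_t len;                            /* number of characters */
    toy_script script;
} toy_run;

/* Splits ucs[0..n) into maximal single-script runs. Neutral (COMMON)
 * characters extend the current run; a leading neutral run adopts the
 * first real script that follows. Returns the number of runs (<= cap). */
size_t split_script_runs(const uint32_t *ucs, size_t n,
                         toy_run *runs, size_t cap)
{
    size_t i, nr = 0;

    assert(ucs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        toy_script s = toy_script_of(ucs[i]);

        if (nr > 0) {
            toy_run *last = &runs[nr - 1];
            if (last->script == SCRIPT_COMMON)
                last->script = s;          /* adopt the first real script */
            if (s == SCRIPT_COMMON || s == last->script) {
                last->len++;
                continue;
            }
        }
        if (nr == cap)
            break;                         /* out of room */
        runs[nr].start = i;
        runs[nr].len = 1;
        runs[nr].script = s;
        nr++;
    }
    return nr;
}
```

For example, `Hi`, a space, two Arabic letters, a space, and `中文` yield three runs: Latin (3 characters, including the space), Arabic (3), and Han (2).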
MiniGUI provides two types of shaping engine. One is the basic shaping
engine; the corresponding function is `InitBasicShapingEngine`. The other
is the complex shaping engine, which is based on HarfBuzz; the
corresponding function is `InitComplexShapingEngine`. The latter gives
better shaping results.

After this, call `CreateLayout` to create a layout object for laying out
the text, then call `LayoutNextLine` to lay out the lines one by one.

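The one-line-at-a-time model can be illustrated with a small C sketch. `toy_layout` and `toy_layout_next_line` are hypothetical and much simpler than the layout object created by `CreateLayout`: they assume one byte per character, a fixed advance per character, and break opportunities only at spaces, which may hang past the line end. Passing the maximal extent on each call is a choice of this sketch; it lets each line have a different width.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature layout object: the text, one advance per
 * character, and a cursor marking where the next line starts. */
typedef struct {
    const char *text;      /* one byte per character, for simplicity */
    const int *advances;   /* advance of each character, in pixels */
    size_t len;            /* number of characters */
    size_t pos;            /* start of the next line */
} toy_layout;

typedef struct {
    size_t start;          /* index of the first character */
    size_t len;            /* number of characters, trailing spaces included */
    int extent;            /* width in pixels, trailing spaces excluded */
} toy_line;

/* Lays out the next line within max_extent. Returns 1 and fills *line,
 * or returns 0 when all characters have been laid out. */
int toy_layout_next_line(toy_layout *lo, int max_extent, toy_line *line)
{
    size_t i, brk = 0;     /* brk: line length up to the last space seen */
    int w = 0, w_content = 0, w_at_brk = 0;

    assert(lo != NULL && line != NULL);
    if (lo->pos >= lo->len)
        return 0;

    for (i = lo->pos; i < lo->len; i++) {
        if (lo->text[i] == ' ') {
            /* A break opportunity; spaces may hang past the line end. */
            brk = i - lo->pos + 1;
            w_at_brk = w_content;
            w += lo->advances[i];
            continue;
        }
        /* Stop at the first non-space character that would overflow,
         * but always place at least one character on a line. */
        if (w + lo->advances[i] > max_extent && i > lo->pos)
            break;
        w += lo->advances[i];
        w_content = w;
    }

    line->start = lo->pos;
    if (i < lo->len && brk > 0) {
        line->len = brk;                   /* break after the last space */
        line->extent = w_at_brk;
    } else {
        line->len = i - lo->pos;           /* end of text, or a word too long */
        line->extent = w_content;
    }
    lo->pos += line->len;
    return 1;
}
```

With advances of 10 pixels per character and a maximal extent of 60, "the quick fox" is laid out as `"the "`, `"quick "`, and `"fox"`, with extents of 30, 50, and 30 pixels.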
You can render the laid-out lines by calling the `DrawLayoutLine` function.

Finally, call `DestroyLayout` and `DestroyTextRuns` to destroy the layout
object and the text runs object.

Before rendering the laid-out glyphs, you can also call `GetLayoutLineRect`
to get the rectangle of a line, or call `CalcLayoutBoundingRect` to get the
bounding rectangle of a whole paragraph.

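Assuming the bounding rectangle of a paragraph is simply the union of its non-empty line rectangles, the computation looks like the following C sketch. `toy_rect` and `union_line_rects` are hypothetical, right and bottom are treated as exclusive, and `CalcLayoutBoundingRect` may compute its result differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rectangle type; right and bottom are exclusive here. */
typedef struct { int left, top, right, bottom; } toy_rect;

/* Computes the bounding rectangle of a paragraph from its line
 * rectangles. Returns 0 (leaving *bound untouched) if no line is
 * non-empty, otherwise stores the union in *bound and returns 1. */
int union_line_rects(const toy_rect *lines, size_t n, toy_rect *bound)
{
    size_t i;
    int found = 0;
    toy_rect r = { 0, 0, 0, 0 };

    assert(bound != NULL && (lines != NULL || n == 0));
    for (i = 0; i < n; i++) {
        const toy_rect *l = &lines[i];
        if (l->right <= l->left || l->bottom <= l->top)
            continue;                      /* skip empty lines */
        if (!found) {
            r = *l;
            found = 1;
            continue;
        }
        if (l->left < r.left)     r.left = l->left;
        if (l->top < r.top)       r.top = l->top;
        if (l->right > r.right)   r.right = l->right;
        if (l->bottom > r.bottom) r.bottom = l->bottom;
    }
    if (found)
        *bound = r;
    return found;
}
```

For lines `{10, 0, 110, 20}` and `{0, 20, 90, 40}` the union is `{0, 0, 110, 40}`: the first line sets the right edge, the second the left and bottom edges.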
## Internals

These new APIs provide a very flexible implementation for your apps to
process complex scripts. The implementation is derived from the
LGPL-licensed Pango, but we optimized and simplified the original
implementation in the following respects:

* We split the layout process into two stages. In the first stage, we get
  the text runs (Pango items), which remain constant for subsequent layouts
  with different parameters. In the second stage, we create a layout object
  for a set of specific layout parameters and generate the lines one by one
  for the caller. This is useful for an app like a browser: when the width
  or height of the output rectangle changes, it can reuse the text runs
  instead of regenerating them.

* We use MiniGUI's font names for the font attributes of text, and leave
  font selection and glyph generation to MiniGUI's logfont module. This
  greatly simplifies the layout process.

* We always use Uchar32 strings for the whole layout process, so the code
  and the structures are clearer than in the original implementation.

* We provide two shaping engines for rendering text: a basic shaping engine
  and a complex shaping engine based on HarfBuzz. The former can be used
  for some simple applications.

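The benefit of the two-stage design can be shown with a small C sketch: the text is analyzed once, and the analysis is reused for every output width. Both functions are hypothetical. `analyze_breaks` stands in for the first stage (here it only records that a line may break after a space), and `count_lines` stands in for the second stage, laying out fixed-width characters greedily.

```c
#include <assert.h>
#include <stddef.h>

/* Stage 1 (run once per text): record whether a line may break after
 * each character. Returns the number of characters analyzed. */
size_t analyze_breaks(const char *text, unsigned char *can_break_after,
                      size_t cap)
{
    size_t n = 0;

    assert(text != NULL && can_break_after != NULL);
    while (n < cap && text[n] != '\0') {
        can_break_after[n] = (text[n] == ' ');
        n++;
    }
    return n;
}

/* Stage 2 (run once per output width): count the lines needed for
 * max_extent, using only the stage-1 data and a fixed character width. */
size_t count_lines(const unsigned char *can_break_after, size_t n,
                   int char_w, int max_extent)
{
    size_t pos = 0, lines = 0;

    assert(can_break_after != NULL || n == 0);
    while (pos < n) {
        size_t i, brk = 0;
        int w = 0;

        for (i = pos; i < n; i++) {
            if (w + char_w > max_extent && i > pos)
                break;                     /* character i does not fit */
            w += char_w;
            if (can_break_after[i])
                brk = i + 1;               /* latest break opportunity */
        }
        /* Break at the last opportunity if the text overflowed,
         * otherwise (end of text, or no opportunity) at i. */
        pos = (i < n && brk > pos) ? brk : i;
        lines++;
    }
    return lines;
}
```

For "the quick fox" with 10-pixel characters, a single call to `analyze_breaks` serves `count_lines` at widths of 60 (3 lines), 100 (2 lines), and 130 (1 line); nothing is reanalyzed when the width changes.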
## Example

TBC...