How to lay out, shape, and render text in complex or mixed scripts.
Table of Contents
Overview
As described in Using Enhanced Font Interfaces, in order to support complex or mixed scripts, we tuned and enhanced MiniGUI's font interfaces in version 4.0.0.
But this is only a small part of the whole story. To show a text paragraph in complex/mixed scripts, we must implement the complete APIs which conforming to Unicode standards and specifications, including the Unicode Bidirectional Algorithm (UAX#9), Unicode Line Breaking Algorithm (UAX#14), Unicode Normalization Forms (UAX#15), Unicode Script Property (UAX#24), Unicode Text Segmentation (UAX#29), Unicode Vertical Text Layout (UAX#50), and so on.
On the basis of full support for Unicode, we provides APIs for laying out, shaping, and rendering the text in complex or mixed scripts. This document describes how to use these APIs.
Before we continue, let's clarify a few terms and concepts first.
A script
means a set of letters of one human language, for example, Latin,
Han, Arabic, Hangul, and so on.
The writing system
means the specific writing convention of a
language/script. For example, we write letters in Latin from left to right
horizontally then from top to bottom vertically, while we write traditional
Chinese words from top to bottom vertically then from right to left
horizontally.
The common languages, such as English and modern Chinese, are
standard scripts
. In these scripts, there is a one-to-one relationship
between an encoded character (e.g. 0x4E2D in UTF-8) and the glyph (中)
that represents it. And we write the characters in standard scripts
always from left to right horizontally then from top to bottom vertically.
A glyph
is a set of data which represents a specific character in a
visual or printable form. In computer, a glyph may be a bitmap or a
vector path data.
Generally, one font
contains a lot of glyphs in bitmap or vector path data
for the characters in a specific language or script or a few similar
languages or scripts. For example, today, most fonts for East Asia markets
contain almost all glyphs for Chinese (both traditional and simplified),
Japanese, and Korea characters.
Systems and applications that handle the standard scripts do not need to make a distinction between character processing and glyph processing. In working with text in a standard script, it is most often convenient to think only in terms of character processing, or simply text processing: that is, the sequential rendering of glyphs representing character codes as input in logical order.
The term complex script
refers to any writing system that requires
some degree of character reordering and/or glyph processing to display,
print or edit. In other words, scripts for which Unicode logical order
and nominal glyph rendering of codepoints do not result in acceptable text.
Such scripts, examples of which are Arabic and the numerous Indic scripts
descended from the Brahmi writing system, are generally identifiable by
their morphographic characteristics: the changing of the shape or position
of glyphs as determined by their relationship to each other. It should be
noted that such processing is not optional, but is essential to correctly
rendering text in these scripts. Additional glyph processing to render
appropriately sophisticated typography may be desirable beyond the minimum
required to make the text readable.
The situation becomes more complicated when we process a text in
mixed scripts
. That is, the text contains characters from different
scripts, for example, Latin, Arabic, and Chinese.
General Process
To lay out, shape, and render a text in mixed scripts, you should call
GetUCharsUntilParagraphBoundary
function first to convert
a multi-byte string to a Unicode string under the specified white space
rule, breaking rule, and transformation rule. For example, converting a
general C string in UTF-8 or GB18030 to a Uchar32 string by calling this
function. You can call CreateLogFontForMChar2UChar
function to create
a dummy logfont object for this purpose in order to expense a minimal
memory.
If the text is in simple scripts, like Latin or Chinese, you can call
GetGlyphsExtentPointEx
function to lay out the paragraph. This function
returns a glyph string which can fit in a line with the specified
maximal extent and rendering flags. After this, you call
DrawGlyphStringEx
function to draw the glyph string to the
specific position of a DC.
If the text is in complex and/or mixed scripts, like Arabic, Thai,
and Indic, you should create a TEXTRUNS object first by calling
CreateTextRuns
function, then initialize the shaping engine for
laying out the text.
MiniGUI provides two types of shaping engine. One is the basic
shaping engine. The corresponding function is InitBasicShapingEngine
.
The other is called complex shaping engine, which is based on HarfBuzz.
The corresponding function is InitComplexShapingEngine
. The latter
one can give you a better shaping result.
After this, you should call CreateLayout
to create a layout object
for laying out the text, then call LayoutNextLine
to lay out the lines
one by one.
You can render the laid out lines by calling DrawLayoutLine
function.
Finally, you call DestroyLayout
and DestroyTextRuns
to destroy
the layout object and text runs object.
Before rendering the glyphs laid out, you can also call GetLayoutLineRect
to get the line rectangle, or call CalcLayoutBoundingRect
to get
the bounding rectangle of one paragraph.
Internals
These new APIs provide a very flexible implementation for your apps to process the complex scripts. The implementation is derived from LGPL'd Pango, but we optimize and simplify the original implementation in the following respects:
-
We split the layout process into two stages. We get the text runs (Pango items) in the first stage, and the text runs will keep as constants for subsequent different layouts. In the second stage, we create a layout object for a set of specific layout parameters, and generates the lines one by one for the caller. This is useful for an app like browser, it can reuse the text runs if the output width or height changed, and it is no need to re-generate the text runs because of the size change of the output rectangle.
-
We use MiniGUI's fontname for the font attributes of text, and leave the font selection and the glyph generating to MiniGUI's logfont module. In this way, we simplify the layout process greatly.
-
We always use Uchar32 string for the whole layout process. So the code and the structures are clearer than original implementation.
-
We provide two shaping engines for rendering the text. One is a basic shaping engine and other is the complex shaping engine based on HarfBuzz. The former can be used for some simple applications.