... | ... | @@ -3,9 +3,10 @@ _How to lay out, shape, and render text in complex or mixed scripts._ |
|
|
Table of Contents
|
|
|
|
|
|
- [Overview](#overview)
|
|
|
- [General Process](#general-process)
|
|
|
* [Text in Standard Scripts](#text-in-standard-scripts)
|
|
|
* [Text in Complex Scripts](#text-in-complex-scripts)
|
|
|
- [APIs for Text Processing](#apis-for-text-processing)
|
|
|
* [New APIs Conforming to Unicode 12.0](#new-apis-conforming-to-unicode-120)
|
|
|
* [Processing Text in Standard Scripts](#processing-text-in-standard-scripts)
|
|
|
* [Processing Text in Complex Scripts](#processing-text-in-complex-scripts)
|
|
|
- [Internals](#internals)
|
|
|
- [Restrictions](#restrictions)
|
|
|
|
... | ... | @@ -78,9 +79,57 @@ The situation becomes more complicated when we process a text in |
|
|
`mixed scripts`. That is, the text contains characters from different
|
|
|
scripts, for example, Latin, Arabic, and Chinese.
|
|
|
|
|
|
## General Process
|
|
|
|
|
|
### Text in Standard Scripts
|
|
|
## APIs for Text Processing
|
|
|
|
|
|
### New APIs Conforming to Unicode 12.0
|
|
|
|
|
|
- New types:
|
|
|
1. `Uchar32`: the Unicode code point value of a Unicode character.
|
|
|
1. `Achar32`: the abstract character index value under a certain
|
|
|
charset/encoding. Under Unicode charset or encodings, the
|
|
|
abstract character index value will be identical to the Unicode
|
|
|
code point, i.e., Achar32 is equivalent to Uchar32 in this
|
|
|
situation.
|
|
|
1. `Glyph32`: the glyph index value in a device font. Note that
|
|
|
a Glyph32 value is always bound to a specific logfont object.
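
As a rough illustration of how encoded text becomes a sequence of code point values, here is a minimal, standalone sketch that decodes UTF-8 bytes. It does not use MiniGUI: the local `Uchar32` typedef is an assumption (a 32-bit unsigned integer), and `utf8_decode_one` and `utf8_to_uchar32` are hypothetical helper names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 32-bit unsigned code point type. The name mirrors the
 * document's Uchar32, but this typedef is local to the sketch. */
typedef uint32_t Uchar32;

/* Hypothetical helper: decode one UTF-8 sequence from s (at most len bytes).
 * Stores the code point in *uc and returns the number of bytes consumed,
 * or 0 if the sequence is truncated or malformed. */
size_t utf8_decode_one(const unsigned char *s, size_t len, Uchar32 *uc)
{
    /* Smallest code point that legitimately needs n bytes (index = n). */
    static const Uchar32 min_value[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    Uchar32 cp;

    if (len == 0)
        return 0;

    if (s[0] < 0x80)                { n = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (len < n)
        return 0;                   /* truncated sequence */

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (cp < min_value[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *uc = cp;
    return n;
}

/* Hypothetical helper: decode up to max code points from a UTF-8 buffer.
 * Stops at the first malformed sequence; returns the number decoded. */
size_t utf8_to_uchar32(const unsigned char *text, size_t len,
                       Uchar32 *out, size_t max)
{
    size_t count = 0;

    while (len > 0 && count < max) {
        size_t used = utf8_decode_one(text, len, &out[count]);
        if (used == 0)
            break;
        text += used;
        len -= used;
        count++;
    }
    return count;
}
```

Under a Unicode encoding such as UTF-8, the values produced this way are what the list above calls both the abstract character index and the code point. A glyph index, by contrast, cannot be derived from the text alone, because it depends on the font in use.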
|
|
|
|
|
|
- New functions to determine the Unicode character properties:
|
|
|
1. `UCharGetCategory` for getting the general category of
|
|
|
a Unicode character.
|
|
|
1. `UCharGetBreakType` for getting the breaking type of
|
|
|
a Unicode character.
|
|
|
1. `UStrGetBreaks` for getting the breaking types of
|
|
|
a Unicode string.
|
|
|
1. `UCharGetBidiType` for getting the bidi type of
|
|
|
a Unicode character.
|
|
|
1. `UStrGetBidiTypes` for getting the bidi types of a Unicode
|
|
|
character string.
|
|
|
1. `UCharGetBracketType` for getting the bracketed character of a
|
|
|
Uchar32 character.
|
|
|
1. `UStrGetBracketTypes` for getting the bracketed characters of a
|
|
|
Uchar32 string.
|
|
|
1. `UCharGetMirror` for getting the mirrored character of a Uchar32
|
|
|
character.
|
|
|
1. `UCharGetJoiningType` for getting the joining type of a Uchar32
|
|
|
character.
|
|
|
1. `UStrGetJoiningTypes` for getting the joining types of a Uchar32
|
|
|
string.
|
|
|
1. `UBidiGetParagraphDir` for getting the base paragraph direction
|
|
|
of a single paragraph.
|
|
|
1. `UBidiGetParagraphEmbeddingLevels` for getting the bidi embedding
|
|
|
levels of a paragraph.
|
|
|
1. `UBidiReorderLine`, `UBidiShapeMirroring`, `UBidiJoinArabic`,
|
|
|
`UBidiShapeArabic`, and `UBidiShape` for doing bidi-aware
|
|
|
mirroring, joining, and shaping.
|
|
|
1. `UCharGetScriptType` for getting the script type of a Uchar32
|
|
|
character.
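
To make two of these properties more concrete, the following standalone sketch shows the idea behind a base paragraph direction (the first character with a strong direction decides) and a mirrored-character lookup. It is not MiniGUI's implementation and does not call the functions listed above. The names `ParaDir`, `strong_dir`, `guess_paragraph_dir`, and `lookup_mirror` are hypothetical, the character ranges are a deliberately tiny subset of the Unicode data, and the handling of directional isolates required by the full bidirectional algorithm is omitted.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Uchar32;   /* assumption: local 32-bit code point type */

/* Hypothetical direction type used only in this sketch. */
typedef enum { DIR_LTR, DIR_RTL, DIR_NEUTRAL } ParaDir;

/* Toy classification: only a few ranges of strong characters are covered.
 * A real implementation consults the full Unicode bidi class data. */
ParaDir strong_dir(Uchar32 uc)
{
    if ((uc >= 'A' && uc <= 'Z') || (uc >= 'a' && uc <= 'z'))
        return DIR_LTR;                     /* basic Latin letters */
    if (uc >= 0x4E00 && uc <= 0x9FFF)
        return DIR_LTR;                     /* CJK unified ideographs */
    if (uc >= 0x05D0 && uc <= 0x05EA)
        return DIR_RTL;                     /* Hebrew letters */
    if (uc >= 0x0621 && uc <= 0x064A)
        return DIR_RTL;                     /* Arabic letters */
    return DIR_NEUTRAL;                     /* digits, punctuation, others */
}

/* The first strong character decides the base direction of the paragraph.
 * Returns DIR_NEUTRAL when no strong character is found, so the caller
 * can apply its own default. */
ParaDir guess_paragraph_dir(const Uchar32 *text, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        ParaDir d = strong_dir(text[i]);
        if (d != DIR_NEUTRAL)
            return d;
    }
    return DIR_NEUTRAL;
}

/* A handful of mirrored pairs; the complete list is in the Unicode
 * character database. */
static const Uchar32 mirror_pairs[][2] = {
    { 0x0028, 0x0029 },   /* ( ) */
    { 0x003C, 0x003E },   /* < > */
    { 0x005B, 0x005D },   /* [ ] */
    { 0x007B, 0x007D },   /* { } */
    { 0x00AB, 0x00BB },   /* guillemets */
};

/* Stores the mirrored counterpart of uc in *mirrored and returns 1,
 * or returns 0 (leaving *mirrored untouched) if uc is not in the table. */
int lookup_mirror(Uchar32 uc, Uchar32 *mirrored)
{
    size_t i;

    for (i = 0; i < sizeof(mirror_pairs) / sizeof(mirror_pairs[0]); i++) {
        if (uc == mirror_pairs[i][0]) { *mirrored = mirror_pairs[i][1]; return 1; }
        if (uc == mirror_pairs[i][1]) { *mirrored = mirror_pairs[i][0]; return 1; }
    }
    return 0;
}
```

In this sketch a paragraph that starts with an opening parenthesis and a digit, followed by an Arabic letter, is classified as right-to-left, because the parenthesis and the digit carry no strong direction.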
|
|
|
|
|
|
MiniGUI also provides some utilities/helpers for Unicode character
|
|
|
conversion, such as from lower case to upper case or from single width to
|
|
|
full width. Please see the MiniGUI API reference document for a detailed
|
|
|
description.
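
The single-width to full-width conversion mentioned above can be sketched in a few lines for the ASCII range. This is a standalone illustration and not the MiniGUI helper itself: `to_full_width` and `to_upper_ascii` are hypothetical names, and the case conversion covers only ASCII letters, whereas a complete conversion relies on the Unicode case mapping data.

```c
#include <stdint.h>

typedef uint32_t Uchar32;   /* assumption: local 32-bit code point type */

/* Map a single-width ASCII character to its full-width form.
 * U+0021..U+007E map to U+FF01..U+FF5E (a fixed offset of 0xFEE0), and
 * the space maps to the ideographic space U+3000. Any other code point
 * is returned unchanged. */
Uchar32 to_full_width(Uchar32 uc)
{
    if (uc == 0x0020)
        return 0x3000;
    if (uc >= 0x0021 && uc <= 0x007E)
        return uc + 0xFEE0;
    return uc;
}

/* ASCII-only lower-to-upper conversion; characters outside a..z are
 * returned unchanged. */
Uchar32 to_upper_ascii(Uchar32 uc)
{
    if (uc >= 'a' && uc <= 'z')
        return uc - ('a' - 'A');
    return uc;
}
```

The two helpers compose naturally: converting `a` to upper case and then to full width gives U+FF21, the full-width capital A.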
|
|
|
|
|
|
### Processing Text in Standard Scripts
|
|
|
|
|
|
If you can determine the text to process is in standard scripts, you
|
|
|
can still use MiniGUI GDI functions (`TextOut`, `DrawText`, and so on)
|
... | ... | @@ -88,7 +137,7 @@ to show the text. For more information, please refer to: |
|
|
|
|
|
<http://www.minigui.com/doc-api-ref-minigui-sa-4.0.0/html/group__text__output__fns.html>
|
|
|
|
|
|
### Text in Complex Scripts
|
|
|
### Processing Text in Complex Scripts
|
|
|
|
|
|
Before dealing with the bidirectional algorithm and the glyph shaping,
|
|
|
we first need to divide the text into paragraphs and get the relevant
|
... | ... | @@ -218,9 +267,8 @@ please refer to: |
|
|
|
|
|
Note that the white space rule, character transformation rule, word breaking rule,
|
|
|
line breaking policy, and layout flags used by the new APIs conform
|
|
|
to the CSS Text Module Level 3 specification:
|
|
|
|
|
|
<https://www.w3.org/TR/css-text-3/>
|
|
|
to the specifications of [CSS Text Module Level 3]
|
|
|
and [CSS Writing Modes Level 3].
|
|
|
|
|
|
## Internals
|
|
|
|
... | ... | @@ -268,3 +316,5 @@ flag is supported, and the basic shaping engine can not support scripts |
|
|
which need to re-position the glyphs, such as Indic and Tibetan. For this
|
|
|
purpose, you must use the complex shaping engine with OpenType fonts.
|
|
|
|
|
|
[CSS Text Module Level 3]: https://www.w3.org/TR/css-text-3/
|
|
|
[CSS Writing Modes Level 3]: https://www.w3.org/TR/css-writing-modes-3/