diff --git a/doc/concept.md b/doc/concept.md
new file mode 100644
index 0000000..e70b426
--- /dev/null
+++ b/doc/concept.md
@@ -0,0 +1,1814 @@
+## Complex rich text layout
+
+Complex text means a sequence of characters containing mixed scripts, languages and directionality, potentially requiring sophisticated analysis and transformations for proper selection and display. Rich text consists of spans describing various formatting styles, including font families, font styles, font sizes, colors, decorations, spacing and other properties. What follows is a walk-through of the complex rich text layout process as it will be implemented for this project.
+
+For this purpose, the string “We _will_ لقاء في **09:35** في ال 🏖️” (which translates to “We will meet at 09:35 at the (beach)”) has been chosen because it is short enough to admit manual analysis yet contains text that showcases complex requirements.
+
+This document uses a series of tables to describe the sample text sequence as it is transformed by the layout process. It concludes with a set of mockup renders showing the desired result.
+
+
+## Basic text analysis
+
+The API that will be implemented begins with constructing a “builder” object with the full paragraph of text for layout, followed by applying range-based attributes for each desired style. For the sample text provided, the builder can be constructed like so:
+
+
+```
+let mut builder = text.new_text_layout("We will لقاء في 09:35 في ال 🏖️");
+```
+
+
+Here, `text` is an instance of a type implementing the piet `Text` trait. Given the source paragraph, it is possible to complete a font-independent analysis which includes script determination, BiDi level resolution and Unicode boundary detection. The following table shows the result of such an analysis for the sample text.
+
+
+##### Table 1. Properties of basic text analysis
+
+| Index | Byte Offset | Character | Codepoint | Script | BiDi Level | Boundary |
+| --- | --- | --- | --- | --- | --- | --- |
+| 0 | 0 | W | U+0057 | Latin | 0 | word |
+| 1 | 1 | e | U+0065 | Latin | 0 | |
+| 2 | 2 | | U+0020 | Common | 0 | word |
+| 3 | 3 | w | U+0077 | Latin | 0 | line |
+| 4 | 4 | i | U+0069 | Latin | 0 | |
+| 5 | 5 | l | U+006C | Latin | 0 | |
+| 6 | 6 | l | U+006C | Latin | 0 | |
+| 7 | 7 | | U+0020 | Common | 0 | word |
+| 8 | 8 | ل | U+0644 | Arabic | 1 | line |
+| 9 | 10 | ق | U+0642 | Arabic | 1 | |
+| 10 | 12 | ا | U+0627 | Arabic | 1 | |
+| 11 | 14 | ء | U+0621 | Arabic | 1 | |
+| 12 | 16 | | U+0020 | Common | 1 | word |
+| 13 | 17 | ف | U+0641 | Arabic | 1 | line |
+| 14 | 19 | ي | U+064A | Arabic | 1 | |
+| 15 | 21 | | U+0020 | Common | 1 | word |
+| 16 | 22 | 0 | U+0030 | Common | 2 | line |
+| 17 | 23 | 9 | U+0039 | Common | 2 | |
+| 18 | 24 | : | U+003A | Common | 2 | word |
+| 19 | 25 | 3 | U+0033 | Common | 2 | word |
+| 20 | 26 | 5 | U+0035 | Common | 2 | |
+| 21 | 27 | | U+0020 | Common | 1 | word |
+| 22 | 28 | ف | U+0641 | Arabic | 1 | line |
+| 23 | 30 | ي | U+064A | Arabic | 1 | |
+| 24 | 32 | | U+0020 | Common | 1 | word |
+| 25 | 33 | ا | U+0627 | Arabic | 1 | line |
+| 26 | 35 | ل | U+0644 | Arabic | 1 | |
+| 27 | 37 | | U+0020 | Common | 0 | word |
+| 28 | 38 | 🏖️ | U+1F3D6 | Common | 0 | line |
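The Index, Byte Offset and Codepoint columns of Table 1 follow directly from the UTF-8 encoding of the paragraph. As a minimal sketch (the helper name `index_table` is hypothetical and not part of the planned API), they can be derived with the standard library alone. Script, BiDi level and boundary analysis need Unicode data tables and are not attempted here. If the string literal carries a variation selector (U+FE0F) after the emoji, it shows up as one extra row that Table 1 does not list.

```rust
/// Returns one `(index, byte offset, character)` triple per codepoint.
/// `index_table` is a hypothetical helper used only for illustration.
fn index_table(text: &str) -> Vec<(usize, usize, char)> {
    text.char_indices()
        .enumerate()
        .map(|(index, (offset, ch))| (index, offset, ch))
        .collect()
}

fn main() {
    let text = "We will لقاء في 09:35 في ال 🏖️";
    for (index, offset, ch) in index_table(text) {
        // Prints the index, the byte offset and the codepoint in U+XXXX form.
        println!("{index:>2} {offset:>2} U+{:04X}", ch as u32);
    }
}
```

Because Arabic letters occupy two bytes each in UTF-8, the byte offsets advance by two within the Arabic words, which is why the Index and Byte Offset columns diverge from index 9 onward.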
+
+
+## Style application
+
+Styles are applied by assigning text attributes to ranges within the text provided to the builder. According to the sample text as rendered above, the word “will” should receive an italic style while the time “09:35” should be rendered with a bold weight. The analysis table shows that the appropriate ranges (in byte offsets with inclusive start and exclusive end) are 3..7 and 22..27, respectively. These can be applied with the appropriate methods on the builder:
+
+
+```
+text.new_text_layout("We will لقاء في 09:35 في ال 🏖️")
+    .range_attribute(3..7, FontStyle::Italic)
+    .range_attribute(22..27, FontWeight::BOLD)
+```
+
+
+A truncated table describing style properties applied to the desired ranges is shown below.
+
+
+##### Table 2. Style properties
+
+| Byte Offset | Character | Style |
+| --- | --- | --- |
+| 3 | w | italic |
+| 4 | i | italic |
+| 5 | l | italic |
+| 6 | l | italic |
+| 22 | 0 | bold |
+| 23 | 9 | bold |
+| 24 | : | bold |
+| 25 | 3 | bold |
+| 26 | 5 | bold |
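The byte ranges passed to `range_attribute` do not have to be counted by hand. As a rough sketch using only the standard library (the helper `byte_range_of` is hypothetical, not part of piet or of the planned API), the half-open byte range of a substring can be looked up like this. For the sample text it should reproduce the ranges 3..7 and 22..27 quoted above.

```rust
use std::ops::Range;

/// Returns the half-open byte range of the first occurrence of `needle`.
/// `byte_range_of` is a hypothetical helper used only for illustration.
fn byte_range_of(text: &str, needle: &str) -> Option<Range<usize>> {
    let start = text.find(needle)?;
    Some(start..start + needle.len())
}

fn main() {
    let text = "We will لقاء في 09:35 في ال 🏖️";
    // The word to italicize and the time to embolden.
    println!("{:?}", byte_range_of(text, "will"));
    println!("{:?}", byte_range_of(text, "09:35"));
}
```

Working in byte offsets rather than character indices matters here: the time starts at character index 16 but at byte offset 22, because the eight preceding Arabic letters take two bytes each.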
+
+
+## Itemization
+
+This stage breaks the paragraph into runs that are suitable for shaping, selects appropriate fonts and maps each character to its nominal glyph identifier.
+
+The following table shows the sample text broken into items with assigned fonts and nominal glyph identifiers. Items are delineated by alternating colors and numbered in the Item column.
+
+
+##### Table 3. Itemized text
+
+| Index | Byte Offset | Character | Font | Glyph ID | Item |
+| --- | --- | --- | --- | --- | --- |
+| 0 | 0 | W | Segoe UI | 58 | 0 |
+| 1 | 1 | e | Segoe UI | 72 | 0 |
+| 2 | 2 | | Segoe UI | 3 | 0 |
+| 3 | 3 | w | Segoe UI Italic | 90 | 1 |
+| 4 | 4 | i | Segoe UI Italic | 76 | 1 |
+| 5 | 5 | l | Segoe UI Italic | 79 | 1 |
+| 6 | 6 | l | Segoe UI Italic | 79 | 1 |
+| 7 | 7 | | Segoe UI | 3 | 2 |
+| 8 | 8 | ل | Segoe UI | 2341 | 3 |
+| 9 | 10 | ق | Segoe UI | 2339 | 3 |
+| 10 | 12 | ا | Segoe UI | 2317 | 3 |
+| 11 | 14 | ء | Segoe UI | 2311 | 3 |
+| 12 | 16 | | Segoe UI | 3 | 3 |
+| 13 | 17 | ف | Segoe UI | 2338 | 3 |
+| 14 | 19 | ي | Segoe UI | 2347 | 3 |
+| 15 | 21 | | Segoe UI | 3 | 3 |
+| 16 | 22 | 0 | Segoe UI Bold | 19 | 4 |
+| 17 | 23 | 9 | Segoe UI Bold | 28 | 4 |
+| 18 | 24 | : | Segoe UI Bold | 29 | 4 |
+| 19 | 25 | 3 | Segoe UI Bold | 22 | 4 |
+| 20 | 26 | 5 | Segoe UI Bold | 24 | 4 |
+| 21 | 27 | | Segoe UI | 3 | 5 |
+| 22 | 28 | ف | Segoe UI | 2338 | 5 |
+| 23 | 30 | ي | Segoe UI | 2347 | 5 |
+| 24 | 32 | | Segoe UI | 3 | 5 |
+| 25 | 33 | ا | Segoe UI | 2317 | 5 |
+| 26 | 35 | ل | Segoe UI | 2341 | 5 |
+| 27 | 37 | | Segoe UI | 3 | 6 |
+| 28 | 38 | 🏖️ | Segoe UI Emoji | 3130 | 7 |
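The item boundaries in Table 3 can be reproduced with a very simple rule: start a new item whenever the BiDi level or the selected font changes. The sketch below illustrates that rule only. The `CharInfo` type, the numeric font identifiers and the helpers are hypothetical, and treating level and font as the only break criteria is an assumption that happens to match this sample. A real itemizer would also consider script, language and other properties.

```rust
/// Hypothetical per-character input to itemization.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct CharInfo {
    level: u8, // resolved BiDi level (Table 1)
    font: u8,  // 0 = regular, 1 = italic, 2 = bold, 3 = emoji (Table 3)
}

/// Assigns an item number to each character, starting a new item whenever
/// the level or the font differs from the previous character.
fn itemize(chars: &[CharInfo]) -> Vec<usize> {
    let mut items = Vec::with_capacity(chars.len());
    let mut current = 0;
    for (i, info) in chars.iter().enumerate() {
        if i > 0 && *info != chars[i - 1] {
            current += 1;
        }
        items.push(current);
    }
    items
}

/// Expands `(count, value)` pairs into one value per character.
fn expand(runs: &[(usize, u8)]) -> Vec<u8> {
    runs.iter()
        .flat_map(|&(count, value)| std::iter::repeat(value).take(count))
        .collect()
}

/// Levels and fonts of the 29 sample characters, transcribed from the tables.
fn sample() -> Vec<CharInfo> {
    let levels = expand(&[(8, 0), (8, 1), (5, 2), (6, 1), (2, 0)]);
    let fonts = expand(&[(3, 0), (4, 1), (9, 0), (5, 2), (7, 0), (1, 3)]);
    levels
        .into_iter()
        .zip(fonts)
        .map(|(level, font)| CharInfo { level, font })
        .collect()
}

fn main() {
    // One item number per character, comparable with the Item column.
    println!("{:?}", itemize(&sample()));
}
```

Under this rule the space at index 7 becomes its own item: it shares the regular font with the Arabic word that follows but keeps level 0, while the preceding italic run uses a different font.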
+
+
+## Shaping
+
+This stage applies font, script and language specific processing rules, transforming each glyph run using substitutions and positioning adjustments.
+
+The following table shows the results of shaping, including substitutions and computed advances. Substituted glyphs are underlined. These substitutions provide the appropriate forms for Arabic cursive joining.
+
+
+##### Table 4. Shaped glyph identifiers and advance widths.
+
+| Index | Byte Offset | Character | Glyph ID | Advance |
+| --- | --- | --- | --- | --- |
+| 0 | 0 | W | 58 | 16.11 |
+| 1 | 1 | e | 72 | 9.41 |
+| 2 | 2 | | 3 | 4.92 |
+| 3 | 3 | w | 90 | 12.95 |
+| 4 | 4 | i | 76 | 4.66 |
+| 5 | 5 | l | 79 | 4.66 |
+| 6 | 6 | l | 79 | 4.66 |
+| 7 | 7 | | 3 | 4.92 |
+| 8 | 8 | ل | <u>2286</u> | 6.00 |
+| 9 | 10 | ق | <u>2284</u> | 9.02 |
+| 10 | 12 | ا | <u>2247</u> | 4.50 |
+| 11 | 14 | ء | 2311 | 7.50 |
+| 12 | 16 | | 3 | 4.92 |
+| 13 | 17 | ف | <u>2280</u> | 9.02 |
+| 14 | 19 | ي | <u>2293</u> | 15.02 |
+| 15 | 21 | | 3 | 4.92 |
+| 16 | 22 | 0 | 19 | 10.34 |
+| 17 | 23 | 9 | 28 | 10.34 |
+| 18 | 24 | : | 29 | 4.88 |
+| 19 | 25 | 3 | 22 | 10.34 |
+| 20 | 26 | 5 | 24 | 10.34 |
+| 21 | 27 | | 3 | 4.92 |
+| 22 | 28 | ف | <u>2280</u> | 9.02 |
+| 23 | 30 | ي | <u>2293</u> | 15.02 |
+| 24 | 32 | | 3 | 4.92 |
+| 25 | 33 | ا | 2317 | 4.50 |
+| 26 | 35 | ل | 2341 | 12.02 |
+| 27 | 37 | | 3 | 4.92 |
+| 28 | 38 | 🏖️ | 3130 | 24.70 |
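One way to see which glyphs shaping replaced is to compare the nominal glyph identifiers from Table 3 with the shaped identifiers from Table 4. The sketch below does this for item 3 (indices 8 through 15). The helper `substituted` is hypothetical, and it assumes a one-to-one mapping between characters and glyphs, which holds in this sample but not in general, since shaping may also form ligatures or decompose glyphs.

```rust
/// Returns the positions at which the shaped glyph differs from the nominal
/// one. Assumes both slices describe the same characters, one glyph each.
fn substituted(nominal: &[u16], shaped: &[u16]) -> Vec<usize> {
    nominal
        .iter()
        .zip(shaped)
        .enumerate()
        .filter(|(_, (n, s))| n != s)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Item 3 of the sample text (indices 8..=15), transcribed from the tables.
    let nominal = [2341, 2339, 2317, 2311, 3, 2338, 2347, 3];
    let shaped = [2286, 2284, 2247, 2311, 3, 2280, 2293, 3];
    // Positions are relative to the start of the item.
    println!("{:?}", substituted(&nominal, &shaped));
}
```

Glyphs that are already in the correct form, such as the isolated hamza at index 11 and the spaces, keep their nominal identifiers.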
+
+
+## Line breaking and reordering
+
+The final stage performs line breaking based on the boundary analysis shown in Table 1. The runs in each line are then reordered according to the procedure specified by the Unicode bidirectional algorithm.
+
+The following table shows the final sequence of glyphs, in proper order and with fully specified positions. Bidirectional runs are reordered and right-to-left runs are reversed according to the Unicode rules.
+
+
+##### Table 5. Final glyph layout.
+
+| Index | Byte Offset | Character | Glyph ID | Position |
+| --- | --- | --- | --- | --- |
+| 0 | 0 | W | 58 | 0.00, 20.00 |
+| 1 | 1 | e | 72 | 16.11, 20.00 |
+| 2 | 2 | | 3 | 25.52, 20.00 |
+| 3 | 3 | w | 90 | 30.44, 20.00 |
+| 4 | 4 | i | 76 | 43.39, 20.00 |
+| 5 | 5 | l | 79 | 48.05, 20.00 |
+| 6 | 6 | l | 79 | 52.70, 20.00 |
+| 7 | 7 | | 3 | 57.36, 20.00 |
+| 26 | 35 | ل | 2341 | 62.28, 20.00 |
+| 25 | 33 | ا | 2317 | 74.30, 20.00 |
+| 24 | 32 | | 3 | 78.80, 20.00 |
+| 23 | 30 | ي | 2293 | 83.72, 20.00 |
+| 22 | 28 | ف | 2280 | 98.73, 20.00 |
+| 21 | 27 | | 3 | 107.75, 20.00 |
+| 16 | 22 | 0 | 19 | 112.67, 20.00 |
+| 17 | 23 | 9 | 28 | 123.02, 20.00 |
+| 18 | 24 | : | 29 | 133.36, 20.00 |
+| 19 | 25 | 3 | 22 | 138.23, 20.00 |
+| 20 | 26 | 5 | 24 | 148.58, 20.00 |
+| 15 | 21 | | 3 | 158.92, 20.00 |
+| 14 | 19 | ي | 2293 | 163.84, 20.00 |
+| 13 | 17 | ف | 2280 | 178.86, 20.00 |
+| 12 | 16 | | 3 | 187.88, 20.00 |
+| 11 | 14 | ء | 2311 | 192.80, 20.00 |
+| 10 | 12 | ا | 2247 | 200.30, 20.00 |
+| 9 | 10 | ق | 2284 | 204.80, 20.00 |
+| 8 | 8 | ل | 2286 | 213.81, 20.00 |
+| 27 | 37 | | 3 | 219.81, 20.00 |
+| 28 | 38 | 🏖️ | 3130 | 224.73, 20.00 |
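The row order of Table 5 follows from the BiDi levels in Table 1 by the reversal rule (L2) of the Unicode bidirectional algorithm: from the highest level down to the lowest odd level, every contiguous sequence of characters at that level or higher is reversed. The sketch below applies that rule to a single line and then accumulates advances into x positions. The function names are hypothetical, and line breaking, the handling of trailing whitespace and vertical positioning are left out.

```rust
/// Computes the visual order of logical indices from resolved BiDi levels by
/// repeatedly reversing runs, from the highest level to the lowest odd level.
fn visual_order(levels: &[u8]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..levels.len()).collect();
    let max = levels.iter().copied().max().unwrap_or(0);
    let min_odd = match levels.iter().copied().filter(|l| l % 2 == 1).min() {
        Some(level) => level,
        None => return order, // no right-to-left content, nothing to reverse
    };
    for level in (min_odd..=max).rev() {
        let mut i = 0;
        while i < order.len() {
            if levels[order[i]] >= level {
                let start = i;
                while i < order.len() && levels[order[i]] >= level {
                    i += 1;
                }
                order[start..i].reverse();
            } else {
                i += 1;
            }
        }
    }
    order
}

/// Accumulates advances (indexed logically) into x positions in visual order.
fn positions(order: &[usize], advances: &[f32]) -> Vec<f32> {
    let mut x = 0.0_f32;
    order
        .iter()
        .map(|&i| {
            let pos = x;
            x += advances[i];
            pos
        })
        .collect()
}

fn main() {
    // BiDi levels of the 29 sample characters, transcribed from Table 1.
    let levels: Vec<u8> = [(8usize, 0u8), (8, 1), (5, 2), (6, 1), (2, 0)]
        .iter()
        .flat_map(|&(count, level)| std::iter::repeat(level).take(count))
        .collect();
    let order = visual_order(&levels);
    println!("{:?}", order);
    // Positions of the first three glyphs ("We "), using advances from Table 4.
    println!("{:?}", positions(&order[..3], &[16.11, 9.41, 4.92]));
}
```

For the sample levels, the level 2 run holding the time is reversed once on its own and once more as part of the surrounding level 1 run, so the digits end up reading left to right inside the right-to-left text. Positions accumulated from the advances in Table 4 agree with Table 5 only up to rounding of the tabulated values.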
+
+
+## Mockups
+
+The following are renders representing complex rich text that will be produced using the layout process described in this document.
+
+
+##### Mockup of the sample text.
+
+![mockup1](mockup1.png)
+
+
+
+##### A more sophisticated example from an earlier proof of concept.
+
+![mockup2](mockup2.png)