WackoWiki: Typografica Revisited

https://wackowiki.org/doc     Version: 8 (03.07.2024 08:02)

see Project Typografica[link1]

In its current form Typografica handles only Latin and Cyrillic text. With Unicode support it is turned off by default, because it has no rule sets for non-Latin languages.

Rule sets for non-Latin languages can be added and applied in the corresponding context.

1. Typografica

static
dynamic

Transforms straight double quotes (") into either English quotes or angle quotes when enabled.

Which quotes are chosen should, however, be determined by the provided context, which can be the
  1. text language
  2. page language
  3. user language
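
The context lookup above could be sketched as follows. This is only an illustration: the function name resolve_quote_lang is hypothetical, and the precedence (text language first, then page language, then user language) is an assumption, since the notes list the contexts without ranking them.

```php
<?php
/**
 * Pick the language that decides the quote style.
 * Hypothetical helper: the first non-empty candidate wins, in the
 * assumed order text language, page language, user language.
 */
function resolve_quote_lang(?string $text_lang, ?string $page_lang, ?string $user_lang, string $fallback = 'en'): string
{
    foreach ([$text_lang, $page_lang, $user_lang] as $lang) {
        if ($lang !== null && $lang !== '') {
            return strtolower($lang);
        }
    }

    // No context available at all: use the fallback.
    return $fallback;
}
```

For example, resolve_quote_lang(null, 'de', 'en') picks the page language 'de' because no text language is given.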



Elar2000[link2]:

It took me some time to read about the quotes in various languages as there are not only Latin and Cyrillic quotes, there are lots of quote types that are default for various keyboard layouts:
https://en.wikipedia.org/wiki/Quotation_mark
https://en.wikipedia.org/wiki/Keyboard_layout

Additional options for Typografica may include moving native quote symbols (and maybe some other native symbols) into lang.XX.php variables for certain languages, to ensure proper typography in various languages.
At the same time the regexps in Typografica should evolve to work with the lang.XX.php variables. This should be done carefully, taking performance into consideration.
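
A minimal sketch of that idea, assuming each language contributes one opening and one closing quote. The QUOTE_PAIRS table and the apply_native_quotes function are hypothetical; in the proposal the values would come from the lang.XX.php files instead of a single array.

```php
<?php
// Hypothetical per-language quote pairs (opening, closing).
// In the proposal these would be variables in lang.XX.php.
const QUOTE_PAIRS = [
    'en' => ['“', '”'],
    'de' => ['„', '“'],
    'es' => ['«', '»'],
    'ru' => ['«', '»'],
];

function apply_native_quotes(string $text, string $lang): string
{
    if (!array_key_exists($lang, QUOTE_PAIRS)) {
        return $text; // no rule set for this language: leave text untouched
    }

    [$open, $close] = QUOTE_PAIRS[$lang];

    // Match a straight-quoted run without nested quotes or line breaks.
    $result = preg_replace('/"([^"\r\n]+)"/u', $open . '$1' . $close, $text);

    // preg_replace returns null on error (for example invalid UTF-8).
    return $result ?? $text;
}
```

A single pattern is reused for every language here, so adding a language costs one array entry rather than one more regexp, which is one way to keep the performance concern in check.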


Turn off English quotes for non-English Latin-script pages (for example Spanish or German):
diff --git a/src/formatter/class/typografica.php b/src/formatter/class/typografica.php
index beeb383..ce607bb 100644
--- a/src/formatter/class/typografica.php
+++ b/src/formatter/class/typografica.php
@@ -222,7 +222,7 @@
         }

         // 1. English quotes (\p{Latin} only)
-        if ($this->settings['quotes'])
+        if ($this->settings['quotes'] && $this->options['lang'] == 'en')
         {
             $data    = str_replace('""',  '""',  $data);
             $data    = str_replace('"."', '"."', $data);

2. Quotes

Input  Name                                 Sign    Example
english quotes
       left double quotation mark           U+201C  “simple data”
       right double quotation mark          U+201D
       double low quotation mark            U+201E  „simple data‟
       double high reversed quotation mark  U+201F
angle quotes
«      left guillemet                       U+00AB  «simple data»
»      right guillemet                      U+00BB

  1. English quotes: \p{Latin} and English only
  2. angle quotes: \p{Cyrillic} and \p{Latin} only?
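
The two rules above could be sketched with Unicode script properties. This is an assumption-laden illustration, not the Typografica implementation: quote_by_script is a hypothetical name, angle quotes are applied to Cyrillic content only (the open question about \p{Latin} is left aside), and English quotes are applied to Latin content only when the language is 'en'.

```php
<?php
/**
 * Replace straight-quoted runs depending on the script of the quoted text.
 * Illustrative only: anything that matches neither rule is left unchanged.
 */
function quote_by_script(string $text, string $lang = 'en'): string
{
    $result = preg_replace_callback(
        '/"([^"\r\n]+)"/u',
        static function (array $m) use ($lang): string {
            // Any Cyrillic letter inside the quotes selects angle quotes.
            if (preg_match('/\p{Cyrillic}/u', $m[1]) === 1) {
                return '«' . $m[1] . '»';
            }
            // Latin content gets English quotes for English text only.
            if ($lang === 'en' && preg_match('/\p{Latin}/u', $m[1]) === 1) {
                return '“' . $m[1] . '”';
            }
            return $m[0];
        },
        $text
    );

    return $result ?? $text;
}
```

With the language set to 'de', a quoted Latin word keeps its straight quotes, which mirrors the intent of the diff shown earlier.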

3. Tests


one "english" two "русский" three

"english"

"русский"

<text>



(123) 567-890

- dash
-- long dash

(c)
(r)
(tm)
(p)
+-
^C
^F


4. Settings

pass lang as option -> page_lang -> user_lang (English quotes, etc.)

%%(notypo) %% wrapper -> page setting to turn it off (?)

(c), (r), (p) are rather problematic because this syntax may conflict with inline counters or lists.
Suggestion: disable all three options by default

4.1. Defaults

dashglue is turned off by default (see formatter/typografica.php)

public array $settings	= [
	'inches'	=> 1, // convert inches into &quot;
	'apostroph'	=> 0, // apostrophe converter
	'laquo'		=> 1, // angle quotes
	'quotes'	=> 1, // English quotes
	'dash'		=> 1, // (150) - middle dash
	'emdash'	=> 1, // (151) - long dash by two minus
	'(c)'		=> 1, // special characters, as you know
	'(r)'		=> 1,
	'(tm)'		=> 1,
	'(p)'		=> 1,
	'+-'		=> 1,
	'degrees'	=> 1, // degree character
	'[--]'		=> 1, // indents like $Indent*
	'dashglue'	=> 1, // dash glue
	'wordglue'	=> 1, // word glue
	'spacing'	=> 1, // comma and spacing, exchange
	'phones'	=> 1, // phone number processing
	'html'		=> 0  // HTML tags ban
];	

5. Special symbols

The main function is auto-correction: typed character sequences are replaced with the proper typographic characters. Pay attention to the angle quotes (guillemets) and to the handling of minus signs. Minuses are converted to dashes only if surrounded by delimiters (spaces, line breaks, tabs).

Input          Output       HTML
"русский"      «русский»    « »
"english"      “english”    &#147; &#148;
- (minus)      &#150;       middle dash
-- (2 minus)   &#151;       long dash
(c)            ©            ©
(r)            ®            <sup>®</sup>
(p)            §            §
+-             ±            ±
^C             °C           °C
^F             °F           °F
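
The replacements in the table, including the rule that a minus becomes a dash only between delimiters, could be sketched like this. The function name replace_symbols is hypothetical, and the sketch emits plain Unicode characters rather than the HTML output listed in the table.

```php
<?php
function replace_symbols(string $text): string
{
    // Double minus first, so the single-minus rule cannot consume it.
    // A minus counts only with whitespace (or the text edge) on both sides.
    $text = preg_replace('/(?<=\s|^)--(?=\s|$)/u', "\u{2014}", $text) ?? $text; // long dash
    $text = preg_replace('/(?<=\s|^)-(?=\s|$)/u', "\u{2013}", $text) ?? $text;  // middle dash

    // strtr() replaces each sequence once and never rescans its own output.
    return strtr($text, [
        '(c)' => '©',
        '(r)' => '®',
        '(p)' => '§',
        '+-'  => '±',
        '^C'  => '°C',
        '^F'  => '°F',
    ]);
}
```

Hyphens inside words and phone numbers such as 567-890 are left alone, because they have no whitespace on either side.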

6. Macros

Paragraph indent ("red line")
Small indent: <-> (inserts /z.gif, 25 pixels wide)
Large indent: <-> (inserts /z.gif, 50 pixels wide)

7. Heuristics


Line breaks
When enabled, this heuristic replaces line feeds with <br>. It is also possible to enable replacement of double line breaks only (as an alternative to paragraphs).

Prepositions and &nbsp;
A bold heuristic: it assumes that short words (1-3 characters) should not be separated from the word that follows them. It therefore replaces the separator after a short word with a non-breaking space.
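
A minimal sketch of this short-word rule, assuming that "separator" means a single space or tab and that only letters count as word characters. The name glue_short_words is hypothetical, and the sketch inserts the U+00A0 character instead of the &nbsp; entity.

```php
<?php
function glue_short_words(string $text): string
{
    // A standalone word of 1-3 letters, followed by one space or tab and
    // then another letter or digit: swap the whitespace for U+00A0.
    $result = preg_replace(
        '/(?<![\p{L}\p{N}])(\p{L}{1,3})[ \t](?=[\p{L}\p{N}])/u',
        '${1}' . "\u{00A0}",
        $text
    );

    return $result ?? $text;
}
```

The lookbehind keeps the rule from firing on the last letters of a longer word, so "important news" is unchanged while "to the market" is glued together.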

Dashes and &nbsp;
Wraps all words joined by dashes in <nobr> tags.
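
As a rough illustration, assuming "words joined by dashes" means runs of letters or digits connected by single hyphens (the function name glue_dashed_words is hypothetical):

```php
<?php
function glue_dashed_words(string $text): string
{
    // One or more hyphen-joined parts, not touching other word characters
    // or hyphens on either side, are wrapped as a unit.
    $result = preg_replace(
        '/(?<![\p{L}\p{N}-])([\p{L}\p{N}]+(?:-[\p{L}\p{N}]+)+)(?![\p{L}\p{N}-])/u',
        '<nobr>$1</nobr>',
        $text
    );

    return $result ?? $text;
}
```

A free-standing minus between spaces is not affected, since it has no word characters directly attached to it.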

Commas and spaces
Corrects author carelessness: removes stray spaces before commas and periods.
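
This cleanup could be sketched as a single substitution. The name tidy_punctuation is hypothetical, and the pattern is deliberately naive: it would also alter text such as "see .htaccess", which a real rule set would need to guard against.

```php
<?php
function tidy_punctuation(string $text): string
{
    // Drop spaces or tabs that directly precede a comma or a period.
    $result = preg_replace('/[ \t]+([,.])/u', '$1', $text);

    return $result ?? $text;
}
```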

Prevent HTML tags
Prevents the use of HTML tags and special sequences &xxxx; in the text.