Typografica Revisited

1. Typografica


Typografica is applied in two situations:

  • once when saving a page (body_r) or a file description
  • when used in a handler or action such as files

Transforms straight double quotes (") into either English quotes or angle quotes when enabled.

However, which quotes are chosen should be determined by the provided context, which can be the

  1. text language
  2. page language
  3. user language
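
The lookup order above can be sketched as a simple fallback chain. This is only an illustration: the function name resolve_typo_lang() and the 'en' fallback are hypothetical, and it assumes the numbered order (text, page, user) is also the priority order.

```php
<?php
// Hypothetical helper, not part of Typografica: returns the first
// non-empty language code, checking text, then page, then user language.
function resolve_typo_lang(?string $text_lang, ?string $page_lang, ?string $user_lang, string $fallback = 'en'): string
{
    foreach ([$text_lang, $page_lang, $user_lang] as $lang) {
        if ($lang !== null && $lang !== '') {
            return strtolower($lang);
        }
    }

    // Assumption: fall back to English when no context is available.
    return $fallback;
}
```

The resolved code could then be passed to Typografica as the lang option.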


It took me some time to read about the quotes used in various languages. There are not only Latin and Cyrillic quotes; many other quote types are the default for various keyboard layouts.

Additional options for Typografica may include moving native quote symbols (and maybe some other native symbols) to lang.XX.php variables for certain languages, to ensure proper typography in each language.
At the same time, the regexps in Typografica should evolve to work with the lang.XX.php variables. This should be done carefully, taking performance into consideration.
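
A minimal sketch of what such per-language quote variables might look like. The array name $quote_pairs and the function quote_for_lang() are hypothetical, and the selection of languages is only an example. The lookup is a plain array access, so it adds no regex cost.

```php
<?php
// Hypothetical per-language quote pairs, as they might be defined
// in the lang.XX.php files (opening sign, closing sign).
$quote_pairs = [
    'en' => ["\u{201C}", "\u{201D}"], // “ ”
    'de' => ["\u{201E}", "\u{201C}"], // „ “
    'es' => ["\u{00AB}", "\u{00BB}"], // « »
    'ru' => ["\u{00AB}", "\u{00BB}"], // « »
];

// Wraps $text in the quote pair of $lang; unknown languages
// fall back to the English pair (an assumption of this sketch).
function quote_for_lang(string $text, string $lang, array $pairs): string
{
    [$open, $close] = $pairs[$lang] ?? $pairs['en'];

    return $open . $text . $close;
}
```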

Turn off English quotes for non-English Latin-script pages, e.g. Spanish or German:

diff --git a/src/formatter/class/typografica.php b/src/formatter/class/typografica.php
index beeb383..ce607bb 100644
--- a/src/formatter/class/typografica.php
+++ b/src/formatter/class/typografica.php
@@ -222,7 +222,7 @@
         // 1. English quotes (\p{Latin} only)
-        if ($this->settings['quotes'])
+        if ($this->settings['quotes'] && $this->options['lang'] == 'en')
             $data    = str_replace('""',  '""',  $data);
             $data    = str_replace('"."', '"."', $data);


Input  Name                                 Sign    Example
english quotes
       left double quotation mark           U+201C  “simple data”
       right double quotation mark          U+201D
       double low quotation mark            U+201E  „simple data‟
       double high reversed quotation mark  U+201F
angle quotes
«      left guillemet                       U+00AB  «simple data»
»      right guillemet                      U+00BB

  1. english quotes: \p{Latin} and English only
  2. angle quotes: \p{Cyrillic} and \p{Latin} only?
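
The two rules above could be sketched as follows. This is not the Typografica implementation. The function name typo_quotes_by_script() is hypothetical, and the sketch assumes that angle quotes apply to quoted text starting with a Cyrillic letter, and English quotes to text starting with a Latin letter on English pages only.

```php
<?php
// Hypothetical sketch: choose the quote style by the script of the
// first quoted character and by the language option.
function typo_quotes_by_script(string $data, string $lang): string
{
    // Angle quotes: quoted text that starts with a Cyrillic letter.
    $data = preg_replace('/"(\p{Cyrillic}[^"]*)"/u', '«$1»', $data) ?? $data;

    // English quotes: Latin-script text, English pages only.
    if ($lang === 'en') {
        $data = preg_replace('/"(\p{Latin}[^"]*)"/u', '“$1”', $data) ?? $data;
    }

    return $data;
}
```

With $lang set to 'de', the Latin-script quotes are left untouched while the Cyrillic ones are still converted.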

3. Tests

one "english" two "русский" three




(123) 567-890

- dash
-- long dash


4. Settings

pass lang as option -> page_lang -> user_lang (English quotes, etc.)

%%(notypo) %% wrapper -> page setting to turn it off (?)

(c), (r), (p) are rather problematic because this syntax may conflict with its use as an inline counter or list marker.
Suggestion: disable all three options by default
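
The conflict can be illustrated with a small sketch. The function typo_symbols() is hypothetical and only mimics a plain string replacement controlled by the '(c)', '(r)' and '(p)' settings keys.

```php
<?php
// Hypothetical sketch: replace (c), (r), (p) only when the matching
// setting is enabled.
function typo_symbols(string $data, array $settings): string
{
    $map = [
        '(c)' => "\u{00A9}", // copyright sign
        '(r)' => "\u{00AE}", // registered sign
        '(p)' => "\u{00A7}", // section sign
    ];

    foreach ($map as $input => $symbol) {
        if (!empty($settings[$input])) {
            $data = str_replace($input, $symbol, $data);
        }
    }

    return $data;
}
```

With '(c)' enabled, the inline list "see (a), (b) and (c)" loses its third marker. With the options disabled, the text stays intact.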

4.1. Defaults

dashglue is turned off by default (see formatter/typografica.php)

public array $settings	= [
	'inches'	=> 1, // convert inches into &quot;
	'apostroph'	=> 0, // apostrophe converter
	'laquo'		=> 1, // angle quotes
	'quotes'	=> 1, // English quotes
	'dash'		=> 1, // (150) - middle dash
	'emdash'	=> 1, // (151) - long dash by two minus
	'(c)'		=> 1, // special characters, as you know
	'(r)'		=> 1,
	'(tm)'		=> 1,
	'(p)'		=> 1,
	'+-'		=> 1,
	'degrees'	=> 1, // degree character
	'[--]'		=> 1, // indents like $Indent*
	'dashglue'	=> 1, // dash glue
	'wordglue'	=> 1, // word glue
	'spacing'	=> 1, // comma and spacing, exchange
	'phones'	=> 1, // phone number processing
	'html'		=> 0  // HTML tags ban
];
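
The note that dashglue is turned off, while the class array lists it as 1, suggests that the defaults are overridden elsewhere. A sketch of such an override, assuming a hypothetical helper typo_settings() and a shortened defaults array:

```php
<?php
// Shortened copy of the defaults, for illustration only.
$defaults = ['quotes' => 1, 'laquo' => 1, 'dashglue' => 1];

// Hypothetical helper: applies overrides to the defaults and
// ignores keys that are not known settings.
function typo_settings(array $defaults, array $overrides): array
{
    return array_replace($defaults, array_intersect_key($overrides, $defaults));
}

// Example: turn dashglue off, as the formatter is said to do.
$settings = typo_settings($defaults, ['dashglue' => 0]);
```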

5. Special symbols

The main function is the automatic replacement of characters. Pay attention to the angle quotes (guillemets) and to the handling of minus signs. Minus signs are converted to dashes only if surrounded by delimiters (spaces, line breaks, tabs).
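
The delimiter rule for minus signs could be sketched like this. The function name typo_dashes() is hypothetical. The sketch emits the Unicode characters U+2013 and U+2014 instead of numeric HTML references, which is an assumption of this example.

```php
<?php
// Hypothetical sketch: convert minus signs to dashes only when they
// are surrounded by whitespace (spaces, line breaks, tabs).
function typo_dashes(string $data): string
{
    // Double minus first, so it is not treated as two single minuses.
    $data = preg_replace('/(?<=\s)--(?=\s)/', "\u{2014}", $data) ?? $data;
    $data = preg_replace('/(?<=\s)-(?=\s)/', "\u{2013}", $data) ?? $data;

    return $data;
}
```

A hyphen inside a word such as "well-known" has no whitespace around it and is therefore left alone.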

Input         Output                 HTML
"русский"     «русский»              « »
"english"     “english”              &#147; &#148;
- (minus)     middle dash (en dash)  &#150;
-- (2 minus)  long dash (em dash)    &#151;
(c)           ©                      ©
(r)           ®                      <sup>®</sup>
(p)           §                      §
+-            ±                      ±
^C            °C                     °C
^F            °F                     °F

6. Macros

Paragraph indent (red line)
Small indent: <-> (inserts /z.gif with a length of 25 pixels)
Large indent: <-> (inserts /z.gif with a length of 50 pixels)

7. Heuristics

Line breaks
When enabled, this heuristic replaces line feeds with <br>. You can also enable the replacement of double line breaks only (as an alternative to paragraphs).
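
A rough sketch of both modes. The function name typo_linebreaks() and the $double_only flag are hypothetical, and the output chosen for a double line break (<br><br>) is an assumption.

```php
<?php
// Hypothetical sketch: replace line feeds with <br>, or only runs of
// two or more line feeds when $double_only is set.
function typo_linebreaks(string $data, bool $double_only = false): string
{
    // Normalize Windows and old Mac line endings first.
    $data = str_replace(["\r\n", "\r"], "\n", $data);

    if ($double_only) {
        return preg_replace('/\n{2,}/', "<br><br>\n", $data) ?? $data;
    }

    return str_replace("\n", "<br>\n", $data);
}
```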

Prepositions and &nbsp;
A bold heuristic assumes that short words (1-3 characters) should not be separated from the word that follows them. It therefore replaces the separator after a short word with a non-breaking space.
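
As a sketch, the heuristic could be expressed with one regular expression. The function name typo_wordglue() is hypothetical, and the sketch only handles a single space as the separator.

```php
<?php
// Hypothetical sketch: glue short words (1-3 letters) to the next
// word with a non-breaking space.
function typo_wordglue(string $data): string
{
    // (?<!\S)  the short word starts at the beginning or after whitespace
    // (?=\S)   and is followed by a space and then a non-space character
    return preg_replace('/(?<!\S)(\p{L}{1,3}) (?=\S)/u', '$1&nbsp;', $data) ?? $data;
}
```

Because \p{L} matches any letter, the same rule covers Cyrillic prepositions as well.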

Dashes and &nbsp;
Wraps all words joined by dashes in <nobr> tags.
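
A minimal sketch of the wrapping, assuming that "words" means runs of letters and digits. The function name typo_dashglue() is hypothetical.

```php
<?php
// Hypothetical sketch: wrap words joined by hyphens in <nobr> tags
// so that they are not broken across lines.
function typo_dashglue(string $data): string
{
    // One word, followed by one or more "-word" parts.
    return preg_replace('/[\p{L}\p{N}]+(?:-[\p{L}\p{N}]+)+/u', '<nobr>$0</nobr>', $data) ?? $data;
}
```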

Commas and spaces
Corrects the author's carelessness: removes stray spaces before commas and periods.
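
A sketch of the cleanup. The function name typo_spacing() is hypothetical. The lookahead, which requires whitespace or the end of the text after the punctuation mark, is an addition of this sketch to avoid touching strings like ".htaccess".

```php
<?php
// Hypothetical sketch: remove spaces and tabs before a comma or
// period that is itself followed by whitespace or the end of the text.
function typo_spacing(string $data): string
{
    return preg_replace('/[ \t]+([,.])(?=\s|$)/', '$1', $data) ?? $data;
}
```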

Prevent HTML tags
Prevents the use of HTML tags and special sequences &xxxx; in the text.
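
One possible way to implement such a ban is to escape the markup rather than strip it. This is an assumption, since the notes do not say which approach Typografica takes. The function name typo_ban_html() is hypothetical; htmlspecialchars() is the standard PHP function.

```php
<?php
// Hypothetical sketch: neutralize HTML tags and &xxxx; sequences by
// escaping &, < and >. Quotes are left alone (ENT_NOQUOTES) so that
// the quote conversion can still see them.
function typo_ban_html(string $data): string
{
    return htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8');
}
```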