WackoWiki Markup

Also available in Deutsch, Español and Русский.


This rather long article explains how the "Wacko markup" appeared and why it looks the way it does now. If you do not know what "Wacko markup" is or, more importantly, what a "wiki" is, first read a few articles of the documentation on the WackoWiki project site, in particular the article on the wiki concept.



1. About the Wacko markup

In short, the Wacko markup is a way of formatting text that uses a minimum of special characters and distorts the "source text" as little as possible, yet yields clean HTML after automatic conversion.


Wacko markup is sufficient for laying out complete pages, large and small. Besides the traditional modifiers for "italic", "bold", "underlined" and "strikethrough" text, it provides markup for lists (including multilevel ones), headings, tables, pictures, quotations and much more.


Wacko markup is used in the WackoWiki wiki engine and in other projects by various developers. To embed Wacko markup in your own project, you only need to download and integrate the following components:

  • Wacko formatter, a PHP class that turns Wacko markup into HTML;
  • Visual editor for wikis. This is an optional, but extremely convenient addition to the Wacko formatter.

The source text of this article is written in Wacko markup and is available for review.

2. Basic principles

Although we inherited the original Wacko markup from the WakkaWiki project (previously available at http://wakkawiki.de, but now unavailable), it was built on clear principles that determined its further success.


Thus, although the markup rules took shape in an "evolutionary" way, their development was still guided by a small set of principles that helped the markup remain convenient and natural.


The principles were devised to achieve the following goals:

  • reduce the likelihood of "accidental triggering" (i.e., when ordinary text is perceived as part of the markup);
  • keep the markup intuitive and easy to remember;
  • protect the HTML surrounding the formatted text from possible typos by the author.

2.1. Semantics should be passed through the syntax

Any wiki markup is based on the WYTIWYG principle (What You Think Is What You Get), so each Wacko markup rule tries to reflect its meaning (semantics) in its appearance (syntax). Since the entire layout is done with plain text -- symbols that can be typed from the keyboard -- we have tried to find combinations that reflect the meaning:

  • **bold** -- asterisks for bold text, which stands out from the surrounding text;
  • //italics// -- slashes suggest the slant of italic text;
  • --strikethrough-- and __underlining__ respectively;
  • lists look the same as we are used to writing them in text input fields.

Here we have moved away from the classical wiki markup with "only 5 apostrophes on each side".

2.2. Paired characters

As you may have noticed, all the markup examples above use paired characters. Thanks to this principle, text "simply pasted from some other editor" does not require any special preprocessing -- it is unlikely to contain characters "paired" in this way.


In some cases runs of three, four, or even five or six identical characters are used, namely to mark headings of different levels. The more characters, the deeper the heading level.

2.3. Markup is in contact with the internal text

You may be one of those people who always write a dash as "two minuses" (like this: --). The authors certainly are. What is more, a great many texts contain a double minus, both as a dash and in other roles.


The principle that markup must be "in contact" with the text it encloses prevents such accidental transformation of "plain text" into markup. The principle is not applied to all markup rules, only to those whose paired character may well appear on its own in the "source text" (for example, strikethrough).


That is, --this is how it works--, whereas -- if we do it like this -- the text will not be struck through.


Note that this rule, like the previous one and the next one, is aimed at minimizing "accidental triggering" of markup, i.e. at eliminating situations where ordinary characters in the text are taken for markup.
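The "contact" rule can be illustrated with a short regular expression. This is only a sketch: the pattern and the <s> tag are assumptions made for the example, not the expression used by the real Wacko formatter, and Python's re module stands in for PCRE.

```python
import re

# Hypothetical pattern: the opening "--" must be immediately followed by a
# non-space character and the closing "--" immediately preceded by one.
# "[^\n]" also keeps the match inside a single line.
STRIKE = re.compile(r"--(?=\S)([^\n]+?)(?<=\S)--")


def strike(text: str) -> str:
    """Wrap '--text--' in <s> tags; free-standing double minuses stay as they are."""
    return STRIKE.sub(r"<s>\1</s>", text)


print(strike("--which is how it works--"))
print(strike("-- we do so-- --"))
```

A double minus surrounded by spaces never satisfies both lookarounds, so a dash typed as "two minuses" is left untouched. The sketch ignores corner cases such as "word--word" that a production rule would have to handle.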

2.4. Line break -- the end of a text fragment

Unlike, say, OpenWiki markup, a line break in the "source text" of Wacko markup is taken literally as a line break (two line breaks start a new paragraph). We therefore assume that a line break marks the end of some finished piece of text, such as a sentence or even a paragraph.


Most Wacko markup rules that change the font style of text (bold, italic, underline, etc.), as well as headings, only work within a fragment of text bounded by line breaks. This principle solves three problems at once:

  • it reduces the likelihood of "accidental triggering";
  • it makes the markup easier to read (because the "beginning" and "end" of a span are never far apart);
  • it confines "font styling" to small text fragments, which has a positive effect on how the result reads.
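A minimal sketch of line-bounded font rules, assuming a small table of markers and HTML tags chosen for the illustration (the real formatter's rules and output tags may differ). Because the pattern cannot cross a line break, an opening marker with no partner on the same line produces no tags at all.

```python
import re

# Hypothetical rule table: marker -> HTML tag (the tag names are assumptions).
RULES = [("**", "strong"), ("//", "em"), ("__", "u")]


def inline(text: str) -> str:
    """Apply paired-character font rules, each limited to a single line."""
    for marker, tag in RULES:
        m = re.escape(marker)
        # "[^\n]+?" stops at the end of the line, so a pair never spans lines.
        pattern = re.compile(rf"{m}(?=\S)([^\n]+?)(?<=\S){m}")
        text = pattern.sub(rf"<{tag}>\1</{tag}>", text)
    return text


print(inline("**bold** and //italic//"))
print(inline("**not\nbold**"))
```

The sketch does not protect URLs, so two addresses containing "//" on one line would confuse it; a real formatter has to deal with that separately.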

2.5. "Broken" markup should not break HTML around text

The above principles let us solve one very important task: "protecting" the HTML around the formatted text from possible formatting errors. Thanks to the "contact" rule, we can almost always tell which part of the markup is the "beginning" and which is the "end", and so insert both HTML tags at once (or neither). For those Wacko markup rules where such "pairing" is not so easily achieved, every effort should be made to keep the result "airtight". And, as we can see, this has succeeded.

2.6. There must be a way to write exactly what I want to write

Any markup must provide a way to "escape", i.e. to have the markup rules ignored so that a text fragment passes through the formatter unchanged. This article is itself an example of where such "escaping" is necessary.

2.7. Minimum switching of the keyboard layout

Last but not least is a principle that would hardly have occurred to an English-speaking author, but that largely determines the resulting convenience for Russian-speaking users.


The essence of the principle is to choose markup rules so that most of them (especially the frequently used ones) can be typed without leaving the Russian keyboard layout. At the same time, we should not forget about equal accessibility for the English-speaking audience.


Thus, the original link syntax [[WikiLink Textual description]] gained a twin, ((WikiLink Text on link)), which can be typed in both the English and the Russian layout. Many wiki markup rules have this valuable property.

3. Markup rules

By applying these principles, we developed and improved the original set of rules, carefully studying different wiki engines along the way. If I dared to brag, I would say that the final wiki markup incorporated the best of the different markup rules, remaining slim and simple. Oh, to hell with modesty! As a result of our work, our markup has incorporated the best of all the different markup rules, remaining slim and simple.


This chapter briefly describes how we arrived at the key markup rules.

3.1. Single-line markup

The single-line rules are those that use symbols common in "plain text" and that apply some "local" styling to the text:

  • bold, italics, underlining
  • small and monospace text
  • headings (which make no sense when split across lines)
  • links (whose text is also better kept "short")
  • superscripts and subscripts (which only work within one word)

3.2. Multiline markup

The rest of the markup rules work on fragments spanning several lines, mainly because their styling is often applied to large pieces of text.


Strikethrough, for example.
Or
a quote that captures the entire paragraph.

3.3. Markup cancellation

Markup cancellation is implemented in two ways:

  • block cancellation with the "paired characters" "" (two double quotes);
  • escaping (placing a special character before the sequence to be cancelled) with the symbol ~ (tilde).

The first method makes it relatively easy to disable markup for a large piece of text; the second is convenient when you want to disable markup for only one or two words, or even a single "paired character". In addition, tilde escaping also lets us display the double quotes themselves, as in the example above.


If you open the source of this article, you will find that markup cancellation is very common in it -- otherwise we could not have shown any markup examples in their unprocessed form.
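One common way to implement both kinds of cancellation is to cut the escaped fragments out before any other rule runs and to put them back afterwards. The sketch below assumes that the tilde protects the run of non-space characters that follows it; the function names and the placeholder format are invented for the example and are not taken from the Wacko formatter.

```python
import re

# ""...""  protects a whole block; ~xyz protects the following run of
# non-space characters (an assumption made for this sketch).
ESCAPED = re.compile(r'""(.*?)""|~(\S+)', re.DOTALL)


def protect(text):
    """Replace escaped fragments with numbered placeholders."""
    saved = []

    def stash(match):
        literal = match.group(1) if match.group(1) is not None else match.group(2)
        saved.append(literal)
        return f"\x00{len(saved) - 1}\x00"

    return ESCAPED.sub(stash, text), saved


def restore(text, saved):
    """Put the protected fragments back, untouched by any markup rule."""
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], text)


protected, saved = protect('""**raw**"" and **bold** and ~**x**')
formatted = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", protected)
print(restore(formatted, saved))
```

HTML escaping of the restored fragments is left out to keep the example short; a real formatter would need it.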

3.4. References and images

Wacko markup automatically turns URLs in the text into hyperlinks. If the URL clearly points to an image (for example, it ends with .jpg or .gif), an image is inserted into the formatted text using the <img> tag instead. This is convenient because you can paste links from the clipboard without much thought and still get a decent result.


For a link with a description there are two bracket styles, with several variants of use:
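A rough sketch of such automatic link detection, assuming a deliberately simple URL pattern and an extension list in which only .jpg and .gif come from the text above; the generated HTML is likewise an assumption, not the formatter's actual output.

```python
import re
from html import escape

# Simplified URL pattern, an assumption for this sketch.
URL = re.compile(r"https?://[^\s<>\"]+")
# .jpg and .gif are named in the article; the other extensions are assumed.
IMAGE_EXT = (".jpg", ".jpeg", ".gif", ".png")


def autolink(text: str) -> str:
    """Turn bare URLs into links, or into <img> tags when they look like images."""

    def render(match):
        url = match.group(0)
        safe = escape(url, quote=True)
        if url.lower().endswith(IMAGE_EXT):
            return f'<img src="{safe}" alt="">'
        return f'<a href="{safe}">{safe}</a>'

    return URL.sub(render, text)


print(autolink("see http://example.org/a.gif now"))
print(autolink("go to http://example.org/page"))
```

The pattern would swallow punctuation directly following a URL, so a production version needs extra care at the end of a sentence.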

  • ((URL Text description))
  • [[URL Text description]] -- same, but less convenient
  • ((Wiki Reference With Spaces == Text Description)) -- allows you to insert spaces in the "left side" of a wiki link (also works in square brackets)
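The three variants above could be parsed along these lines. The function names are hypothetical, and falling back to the target when no description is given is an assumption of the sketch.

```python
import re

# Both bracket styles are accepted: ((...)) and [[...]].
LINK = re.compile(r"\(\((.+?)\)\)|\[\[(.+?)\]\]")


def parse_link(inner: str):
    """Split the inside of a link into (target, description)."""
    if "==" in inner:
        # "==" lets the target itself contain spaces.
        target, _, text = inner.partition("==")
        return target.strip(), text.strip()
    target, _, text = inner.strip().partition(" ")
    # Assumption: with no description, the target doubles as the link text.
    return target, text.strip() or target


def find_links(source: str):
    """Return (target, description) for every bracketed link in the source."""
    return [parse_link(m.group(1) or m.group(2)) for m in LINK.finditer(source)]


print(find_links("((HomePage Start)) and [[Wiki Reference With Spaces == Text Description]]"))
```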

The wiki link concept makes it easier to insert links "within the site", and Wacko markup takes this into account as well. All links that are not "internal" are automatically rendered as "external"; the same happens with email addresses.


Image descriptions are written very much like link descriptions:

  • ((picture.jpg Description)) sets the title of a picture;
  • ((URL small_picture.jpg)) makes a picture link to the URL.

3.5. List structures


How would a person most naturally write lists, numbered and unnumbered, as well as nested?
**Probably something like this:**
  * element of the list-1
  * element of the list-2
  * sublist
    * element of the sublist-1
    * element of the sublist-2
  * continuation of the list
  
  1. Numbered list one
  2. Second element
    1. as well as with a nested list
    6. Numbering may be confusing


What you want to get:

  • element of the list-1
  • element of the list-2
  • sublist
    • element of the sublist-1
    • element of the sublist-2
  • continuation of the list

  1. Numbered list one
  2. Second element
    1. as well as with a nested list
    2. Numbering may be confusing

That is how the list rules in Wacko markup are organized. This example shows well why all list bullets are shifted to the right by a couple of characters: it avoids confusion with bold text.


The indent is set to two characters, following the principle of "paired characters": a pair of leading spaces almost never occurs in ordinary text. Some other wiki formats use five to six spaces -- you can imagine what a third nesting level would look like in a <textarea>!


The original markup from c2.com uses a tab character, which seems ill-considered to us: in most interfaces Tab is used for navigation, and in some browsers it is impossible to type a tab character into an input field at all.


Numbering in numbered lists is automatic, which lets you easily delete items, add new ones and move them around the hierarchy. If you need a numbered list that starts with number three, begin it with 1.#3 Like this.

  3. Like this.
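The list rules described in this section (two spaces per nesting level, automatic numbering, an optional #N start value) can be sketched as follows. The function renders to plain text rather than HTML to stay short, and everything in it is an illustration under those assumptions, not the formatter's code.

```python
import re

# Two spaces per level, then "*" or "N.", an optional "#start", a space, the text.
ITEM = re.compile(r"^((?:  )+)(\*|\d+\.)(?:#(\d+))? (.*)$")


def render_list(source: str) -> str:
    """Renumber numbered items automatically and draw bullets as '•'."""
    counters = {}  # nesting depth -> last number used at that depth
    out = []
    for line in source.splitlines():
        m = ITEM.match(line)
        if not m:
            counters.clear()  # any non-list line ends all open lists
            out.append(line)
            continue
        depth = len(m.group(1)) // 2
        # Returning to a shallower level resets the numbering of deeper ones.
        for deeper in [d for d in counters if d > depth]:
            del counters[deeper]
        if m.group(2) == "*":
            counters.pop(depth, None)
            label = "•"
        else:
            number = int(m.group(3)) if m.group(3) else counters.get(depth, 0) + 1
            counters[depth] = number
            label = f"{number}."
        out.append("  " * depth + f"{label} {m.group(4)}")
    return "\n".join(out)


print(render_list("  1. Numbered list one\n  2. Second element\n    1. nested\n    6. renumbered"))
```

The number typed by the author is ignored unless a #N start value is given, which is what makes deleting and reordering items painless.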

3.6. Table markup

Tables are one of the hardest parts of any markup. The best solution we found is a set of rules whose markup imitates the vertical grid lines of a table. This is easier than drawing the entire grid with hyphens, pluses and vertical bars, and still gives a table that is fully readable in the "source text". More information about table markup, with examples, can be found in the Wacko markup syntax description.


Unfortunately, table markup is the most "vulnerable" part of Wacko markup: in some cases it makes it possible to "break" the structure of the page around the table. This is a small price that we decided to pay for the clarity of the syntax and the speed of parsing.

4. Markup development

First of all, the rules described above are only the most important ones, not all of them. If something seems to be missing, first check the documentation for a more complete description. If it turns out that the existing rules really are not enough, the markup should be extended. This is how most of our markup rules appeared.

4.1. When should we change the existing markup?

Never.


That is, never when the question is not about introducing new rules but about changing existing ones. A change in markup behavior that seems useful to you can seriously inconvenience other people who are used to the markup as it is. Imagine if on some sites <b> worked as <a> and vice versa.


Of course, if some rule seriously gets in your way within your particular project, you may consider replacing it. But when replacing or removing a rule, think about how this will be received by everyone accustomed to the standard markup!

4.2. When is it worthwhile to introduce new markup entities?

As rarely as possible.


Wacko markup has the concept of "highlighters": plug-in entities that can be added easily without taking up the "primary markup space", i.e. paired characters. The concept of highlighters is described below.


If you introduce a new "paired character" without thinking it through properly, you force your users to do the thinking instead. Users can write Wacko markup without hesitation only because the developers thought long and hard about the consequences of each new rule.


If you want to expand your markup, it is best to contact the authors of this format -- maybe they already have an idea or they will be happy to accept yours. Coordinated action is always more effective.

5. Complex transformations and service functions

The Wacko markup design, borrowed from WakkaWiki and extended by the authors, provides two ways to significantly enhance the capabilities of the markup and to embed complex functionality into pages.


These are the two ways we recommend first to those who want to extend the markup rules. Among other things, using them does not require a detailed study of the source code of the Wacko formatter class.

5.1. The concept of Highlighters

Highlighters are a concept that lets the developer implement complex logic for converting a block of text, and lets the user easily hand a block of text over to that logic.


In the body of Wacko markup, a highlighter is used like this: %%(html)<div class="test">testdiv</div>...%% -- i.e., the "paired character" %% is used, with the name of the highlighting function that should receive the contents of the block given in parentheses.


The developer writes these functions, which take a text variable, transform it in some way and return the result.


As a rule, this syntax is used for various kinds of text "coloring": syntax highlighting, markup of chat logs and emails, insertion of arbitrary blocks of text.


This concept is a way to extend the markup easily, and to do so:

  • without using additional "paired characters";
  • without knowing the details of the formatter's work.
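The idea can be sketched as a registry of functions keyed by the name in parentheses. The registry, the decorator, the "upper" highlighter and the <pre> fallback for unnamed blocks are all hypothetical; in WackoWiki itself highlighters are written in PHP.

```python
import re
from html import escape

HIGHLIGHTERS = {}  # highlighter name -> function(text) -> html


def highlighter(name):
    """Register a function as the highlighter called `name`."""
    def register(func):
        HIGHLIGHTERS[name] = func
        return func
    return register


@highlighter("upper")  # a made-up highlighter, for illustration only
def upper(text):
    return escape(text.upper())


# %%(name)body%%; the "(name)" part is optional.
BLOCK = re.compile(r"%%(?:\((\w+)\))?(.*?)%%", re.DOTALL)


def apply_highlighters(source: str) -> str:
    def render(match):
        name, body = match.group(1), match.group(2)
        func = HIGHLIGHTERS.get(name)
        if func is None:
            # Assumed fallback: show the block as preformatted, escaped text.
            return f"<pre>{escape(body)}</pre>"
        return func(body)

    return BLOCK.sub(render, source)


print(apply_highlighters("a %%(upper)hi <b>%% z"))
```

Adding a new highlighter then means registering one more function; no new paired characters are needed and the parsing code stays untouched.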

5.2. Concept of Actions

Actions are a concept that lets the developer implement complex functionality (perhaps even requiring a dialog with the user), and lets the user easily make this functionality available at the right place on a page.


In the body of Wacko markup, an "action" is used like this: {{changes page="/Dev"}} -- the "paired character" {{ followed by the name of the "action" and additional parameters.


The developer writes these functions in the same way as "highlighters".


A significant part of WackoWiki's page-related functionality (building a directory, lists of recent changes and comments, search) is implemented with "actions".


This concept is a way to "embed" some complex functionality into a page, without getting into the details of the Wacko Formatter's work.
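A sketch of how an action call might be parsed and dispatched. The registry, the parameter grammar (name="value" pairs) and the placeholder output of "changes" are assumptions for the example; the real action produces an actual list of recent changes.

```python
import re

ACTION = re.compile(r'\{\{(\w+)((?:\s+\w+="[^"]*")*)\s*\}\}')
PARAM = re.compile(r'(\w+)="([^"]*)"')

ACTIONS = {}  # action name -> function(**params) -> html


def action(name):
    """Register a function as the action called `name`."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


@action("changes")
def changes(page="/"):
    # Placeholder body: only reports which page it was asked about.
    return f"<p>recent changes under {page}</p>"


def run_actions(source: str) -> str:
    def render(match):
        func = ACTIONS.get(match.group(1))
        if func is None:
            return match.group(0)  # unknown actions are left as written
        params = dict(PARAM.findall(match.group(2)))
        return func(**params)

    return ACTION.sub(render, source)


print(run_actions('x {{changes page="/Dev"}} y'))
```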

5.3. Interwiki -- modification of links markup

There is also a way to extend the link syntax by adding rules like ((repository:address)) or ((repository:address Link text)). In this case the address consists of two parts: the "repository" and the "name within this repository". For example, for LiveJournal the address will look like this: ((lj:mendokusee)), and for a link to the translation of an English word you can write ((lingvo:fascinating)).


This concept (like much of our markup) is inherited from wikis, where it was originally intended for linking to pages on other wiki nodes. As we have found, it is also very useful for other, less specific purposes.
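Resolving such an address amounts to a table lookup. In the sketch below the prefixes come from the examples above, while the URL templates point at example.org placeholders and are purely hypothetical.

```python
from urllib.parse import quote

# Prefix -> URL template. The URLs are placeholders, not the real services.
INTERWIKI = {
    "lj": "https://lj.example.org/users/{}",
    "lingvo": "https://dict.example.org/translate/{}",
}


def resolve(link: str):
    """Turn 'repository:address' into a URL, or return None if it is not interwiki."""
    prefix, sep, address = link.partition(":")
    template = INTERWIKI.get(prefix)
    if not sep or template is None:
        return None
    return template.format(quote(address))


print(resolve("lj:mendokusee"))
print(resolve("HomePage"))
```

Because unknown prefixes simply fall through, ordinary wiki links and plain URLs are unaffected by the extension.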


However, there are other ways to implement a similar modification. The following markup formats are used on different sites:

  • [[ljuser romochka]] (source)
  • [[xpointUser280 Vladimir Palant]] (source)
  • [[php ereg_replace]] (source)

The interwiki variant (links through a colon) seems more correct to us, if only because it is used in many systems and is a de facto standard. Besides, interwiki does not conflict with the usual [[Link Description]].

6. Technical details

This article is not the place to learn the details of how the Wacko formatter works; however, some of the main points are worth mentioning.


First, the formatter itself is built on regular expressions (namely Perl Compatible Regular Expressions) and works on the principle of "recursive parsing". This means that it tries to break the text into the "largest" markup fragments and then calls itself to parse inside them. This, in turn, means that you cannot nest a markup rule inside itself: for example, **example **nested** markup** will not work. But you have to agree that this is not really necessary.


Secondly, the formatter works in several passes for the best protection against "accidental triggering". The first pass uses a reduced set of rules that lets you "escape" text fragments, protecting them from markup. The same pass extracts all "highlighters", which keeps other markup from overlapping with them. The second pass processes all the remaining rules.


Thirdly, the formatter allows the formatted result to be cached. Because markup may contain functionality that requires interaction with the user, as well as links to pages (which may be rendered differently for different users), what is cached is a "semi-finished result" in which only the rules for links and "actions" remain unprocessed. The last formatter pass, the fastest of all and needing no recursion, runs on this "semi-finished result" when the page is displayed: it renders the links and inserts the "actions".


If you are interested in the deeper aspects of formatting, you are welcome to read the formatter code.
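The recursive principle can be shown with a toy formatter. It uses Python's re module instead of PCRE, knows only two rules, and skips HTML escaping, so it should be read as an illustration of the idea rather than as a model of the real class.

```python
import re

# Two sample rules; the tag names are assumptions.
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), "strong"),
    (re.compile(r"//(.+?)//"), "em"),
]


def fmt(text: str) -> str:
    """Find the leftmost markup fragment, then recurse into it and past it."""
    best = None
    for pattern, tag in RULES:
        m = pattern.search(text)
        if m and (best is None or m.start() < best[0].start()):
            best = (m, tag)
    if best is None:
        return text
    m, tag = best
    inner = fmt(m.group(1))      # parse inside the fragment
    rest = fmt(text[m.end():])   # then continue after it
    return f"{text[:m.start()]}<{tag}>{inner}</{tag}>{rest}"


print(fmt("**bold //both// end**"))
print(fmt("**example **nested** markup**"))
```

The second call shows why a rule cannot be nested inside itself: the first closing pair of asterisks ends the fragment, so the text is split into two separate bold spans instead of one nested one.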

7. Caveats

This article was written to explain how and why the rules of Wacko markup were invented and where the "paired characters" came from. We would like to hope that new projects adopting Wacko markup will follow the principles that underlie it.


If you have any questions, do not hesitate to ask them in the comments or to contact the authors of the article directly.


With respect, WackoWiki team and the authors of the article:

8. References

  1. The WackoWiki Project
  2. Formatter for the described markup
  3. Visual Editor for the described markup
  4. WakkaWiki -- the historical roots of WackoWiki
  5. OpenWiki -- another historical root of the Wacko markup syntax
  6. Description of wiki concepts
  7. Wacko-Markup Syntax
  8. Examples of classic markup on c2.com
  9. WikiName -- The concept of wiki links