Talk:VisualEditor/Design/Software overview

Table start/table end templates

3
Yair rand (talkcontribs)
Trevor Parscal (WMF) (talkcontribs)

Our goal is for the answer to be no: it will not be disallowed, because all existing Wikitext should still work. However, the editor will still not know that the top, mid and bottom templates are in any way related, so the user experience will suffer greatly compared to using templates with an encapsulated model. Eventually we want the template call/transclusion user interface to be customizable using on-wiki template parameter definitions and gadget code, but the editor will still need templates to be encapsulated for these user interfaces to make sense to the user.

This post was posted by Trevor Parscal (WMF), but signed as Trevor Parscal.

G.Hagedorn (talkcontribs)

Another note on "Template Encapsulation": Yes, I agree that the hierarchical way is preferable, but I can provide (anecdotal) evidence that we attempted this design a few years ago (around MediaWiki version 1.14) and ran into a wall: the parser simply did not allow templates to have that much content and threw error messages. We ended up with a Key Start / Lead / Key End structure (random example: http://species-id.net/w/index.php?title=Fannia_postica&action=edit ).

This may be one reason why Wikipedia table templates often use the serialized version, given that some of the lists generated with them are long.

Also, I would like to point out that, for all SMW users, Semantic Forms strongly favors the serialized version: multiple records (1:n relations) can be conveniently edited that way. While WMF does not use SMW yet, it is very widely used elsewhere.

Thus, while I agree on the preference for hierarchical display, I wonder whether it is truly impossible to find a solution, a kind of virtual hierarchy. I am thinking of extending existing templates with the ability to call a "start nesting" parser function, which nests all further content until the call of some "end nesting" function (or the end of the page). This would greatly smooth the transition for existing templates.
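
To make the "virtual hierarchy" idea concrete, here is a minimal sketch of how a flat run of template calls could be grouped into a tree. It is only an illustration: the `nest` function, the `Node` class and the rule that a stray end marker is ignored are assumptions of this sketch, not an existing parser function.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def nest(calls, starts, ends):
    """Group a flat list of template names into a tree.

    A name in `starts` opens a nesting level, a name in `ends` closes the
    innermost open level, and anything else becomes a leaf. Levels still
    open at the end of the page are closed implicitly. The end marker
    itself is not stored in the tree.
    """
    root = Node("page")
    stack = [root]
    for name in calls:
        if name in ends:
            if len(stack) > 1:  # ignore a stray end marker (assumption)
                stack.pop()
            continue
        node = Node(name)
        stack[-1].children.append(node)
        if name in starts:
            stack.append(node)
    return root


# Hypothetical page using the Key Start / Lead / Key End structure.
page = ["Intro", "Key Start", "Lead", "Lead", "Key End", "Footer"]
tree = nest(page, starts={"Key Start"}, ends={"Key End"})
```

With this grouping, the two `Lead` calls end up as children of `Key Start`, so an editor could present them as one unit even though the wikitext stays serialized.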

Reply to "Table start/table end templates"

Constraints

(talkcontribs)

I think you're making some serious mistakes there. I was also thinking about such a live editor and semantic autoformatting, and instead of starting to hack (OK, I did, and rewrote Preprocessor_DOM in JavaScript) I pondered a lot about parsing and editing. I would have loved to join the hackathon, but I had to study for my exams.

At first I also thought about a top-down document model, but I quickly came to the conclusion that this is only doable for very, very simple pages. An autoformatter that sees an unclosed table/div/whatever never knows what's hidden in the following templates. A live parser/autoformatter/semantic lexer has to use a bottom-up model, just like the current parser. The steps would be:

  1. Getting the XML-like tag hooks, comments and inclusion handlers (what to do if malformed? Currently: run to the end)
  2. Parsing headings, templates and template arguments
  3. Expanding templates
  4. Parsing wikitext into tables/blocks/images/whatever and doing text annotations
  5. Tidying the generated HTML for output

The current parser does the first two steps together; semantically they could be divided. I'm not sure about the fourth step. I haven't dived into the source code yet, so maybe I'm writing nonsense about that.
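
A small sketch of why templates have to be expanded before block structure is parsed: a table that looks unclosed in the raw text can turn out to be closed by a later template. The template store, the `expand` helper and the line-based table check are simplifications invented for this example; they are not how the real preprocessor works.

```python
import re

# Hypothetical template store; names and bodies are made up.
TEMPLATES = {
    "table start": "{|",
    "table end": "|}",
}


def expand(text, templates):
    """Replace each parameterless {{name}} call with the template body."""
    def substitute(match):
        name = match.group(1).strip().lower()
        # Unknown templates are left exactly as they were written.
        return templates.get(name, match.group(0))
    return re.sub(r"\{\{([^{}|]+)\}\}", substitute, text)


def unclosed_tables(text):
    """Count table openers that have no matching closer."""
    depth = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("{|"):
            depth += 1
        elif stripped.startswith("|}") and depth > 0:
            depth -= 1
    return depth


source = "{|\n|-\n| first\n{{table end}}"
before = unclosed_tables(source)                     # table looks unclosed
after = unclosed_tables(expand(source, TEMPLATES))   # closed after expansion
```

Looking only at the raw text, a top-down formatter would see one open table; only after step 3 does it become clear that the table is closed.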

My conclusion is that a semantic lexer has to start at the bottom, while an autoformatter or editing transaction needs to run back down from the top (the generated result). Anything else would narrow the syntax possibilities that are required.

Of course, I think it's right to have the document-block-annotatedText model as a data format for saving pages, with the possibility of rendering to HTML4, HTML5, PDF, RSS etc., for quickly generating cached content and, most of all, for creating diffs. But for editing we will have to go deeper into wikitext, which has to stay as uncomfortable as it is today, and templates should not be a part of the DOM.

Trevor Parscal (WMF) (talkcontribs)

You have some great points and have clearly thought this out. Most of what you are focusing on has to do with the parser, so you might want to get involved over here. One thing I will say though is that it's important to remember that there are, and will always be, many edge cases that aren't being addressed. What we hope to do is meet in the middle, between supporting exotic cases and content being reformed. While it may not be reasonable for us to support every imaginable edge case, it is quite reasonable for us to provide alternative solutions to the use cases that are causing the edge cases. With careful consideration and research, these alternative solutions can serve the use case and the editor software equally well. It's important to keep a sense of balance in this work, not diving too deep on edge cases, and also not pretending there are none. Hopefully you can help User:Brion VIBBER and others who are focused on the parser to keep that balance and contribute your expertise.

This post was posted by Trevor Parscal (WMF), but signed as Trevor Parscal.

Reply to "Constraints"

Various output formats

2
P858snake (talkcontribs)
Structured content blocks containing annotated text can provide a way to represent WikiText in a sufficiently abstract manner, allowing WikiText to be parsed, modified and rendered back into WikiText without loss of information, as well as rendered into a variety of formats including a variety of styles of HTML, such as HTML4 or HTML5, a simplified form of HTML for mobile devices, or non HTML formats such as PDF or plain text.

Imho, it would be quite interesting to give users, including programs, a choice of formats to select from for various purposes, including somewhat selective outputs like sections, the table of contents, resource description formats, references for quotations (aka the current Special:Cite), etc. - For instance, if one wants to quote from an article when writing a paper, one could ask for a section in LaTeX or .rtf format, paste it into their work, open a footnote, copy the appropriate BibTeX entry from the page's Special:Cite page, close the footnote, and be done without having to worry about converting formats.

Moving comment from page into LQT discussion (Peachey88).
Trevor Parscal (WMF) (talkcontribs)

This, and many other use cases, should be easily supported since the official structure will be in a generic and easy-to-convert format (we are calling it WikiDom, but it's just an ordered map tree that's easily encoded into JSON).
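
As a rough illustration of what "an ordered map tree encoded into JSON" could look like, and how one tree can feed several output formats, here is a sketch. The field names (`type`, `children`, `text`, `level`) and both renderers are assumptions made for this example; they are not the actual WikiDom schema.

```python
import json
from html import escape

# Hypothetical document shape; not the real WikiDom schema.
doc = {
    "type": "document",
    "children": [
        {"type": "heading", "level": 2, "text": "Output formats"},
        {"type": "paragraph", "text": "One tree, many renderings."},
    ],
}


def to_html(node):
    """Render the tree as a simple HTML fragment."""
    if node["type"] == "document":
        return "".join(to_html(child) for child in node["children"])
    if node["type"] == "heading":
        return "<h{0}>{1}</h{0}>".format(node["level"], escape(node["text"]))
    if node["type"] == "paragraph":
        return "<p>{}</p>".format(escape(node["text"]))
    raise ValueError("unknown node type: " + node["type"])


def to_plain(node):
    """Render the same tree as plain text, one block per paragraph."""
    if node["type"] == "document":
        return "\n\n".join(to_plain(child) for child in node["children"])
    return node["text"]


# The tree survives a JSON round trip unchanged.
encoded = json.dumps(doc)
decoded = json.loads(encoded)
```

Each additional format (LaTeX, BibTeX and so on) would then be one more function over the same tree rather than a new parser.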

This post was posted by Trevor Parscal (WMF), but signed as Trevor Parscal.

Reply to "Various output formats"

“lines”

115.152.227.60 (talkcontribs)

What’s a line, and why is it being introduced here? How is a heading limited to one line, while a paragraph is not? There’s no such concept in HTML.

The idea that an element contains an element or content, but not both, is also novel and in need of basic explanation.

Jdforrester (WMF) (talkcontribs)

I believe that the first of these is a basic assumption of wikitext; the second is a modelling assumption.

Catrope (talkcontribs)

The second bit is mostly a modeling assumption for simplicity. It basically means there is a hierarchy of what can contain what:

  • branch nodes can have children that are either branch nodes or content branches, but can't contain content directly. Examples of branch nodes are tables and lists
  • content branches can have children, but those children must be content. Examples of content branches are paragraphs, headings and pre's
  • content nodes "are" content, and can't have children. Examples are text nodes (plain or annotated text), images and br's

This means that some things that are legal in HTML are not legal in our model. For instance, in the HTML that we get, it's common for <li>'s to contain text directly. In our model, that's represented as a list item containing a paragraph containing a text node.
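
The containment rules above can be sketched as a small validator plus a normalizer that wraps stray content in a paragraph, which is what happens to text sitting directly inside a list item. The type names and the dictionary layout are assumptions of this sketch, not the editor's actual node classes.

```python
# Illustrative type names; the real model's node types may differ.
BRANCH = {"document", "list", "listItem", "table"}
CONTENT_BRANCH = {"paragraph", "heading", "preformatted"}
CONTENT = {"text", "image", "break"}


def is_valid(node):
    """Check the containment rules recursively."""
    kind = node["type"]
    children = node.get("children", [])
    if kind in CONTENT:
        return not children  # content nodes have no children
    if kind in CONTENT_BRANCH:
        return all(c["type"] in CONTENT and is_valid(c) for c in children)
    if kind in BRANCH:
        return all(c["type"] not in CONTENT and is_valid(c) for c in children)
    return False


def normalize(node):
    """Wrap runs of content found directly under a branch node in a paragraph."""
    if node["type"] not in BRANCH:
        return node
    fixed, run = [], []
    for child in node.get("children", []):
        if child["type"] in CONTENT:
            run.append(child)
            continue
        if run:
            fixed.append({"type": "paragraph", "children": run})
            run = []
        fixed.append(normalize(child))
    if run:
        fixed.append({"type": "paragraph", "children": run})
    return {**node, "children": fixed}


# An <li> that contains text directly, as in the incoming HTML.
html_like = {
    "type": "list",
    "children": [
        {"type": "listItem", "children": [{"type": "text", "text": "item"}]},
    ],
}
modelled = normalize(html_like)
```

After normalizing, the list item holds a paragraph, and the paragraph holds the text node.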

Reply to "“lines”"

Linear model and not DOM?

2
189.35.191.61 (talkcontribs)

I don't understand why you don't use a DOM model, and why you use such a complex and non-standard "Linear model".

Catrope (talkcontribs)

The editor uses the linear model internally because it's easier to define transactions on. We could've gotten away with using a DOM model or a DOM-like tree instead, but that would have made a future collaborative editing feature a lot harder to implement. It so happens we do actually use a tree, built from the linear model, for some purposes (including rendering and traversal) because it makes more sense to use a tree for those applications.
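
To illustrate the point about transactions, here is a sketch of a flat data array, a transaction expressed as retain/insert/remove steps over offsets, and a tree built from the array afterwards. The exact data format (opening and closing markers such as `{"type": "paragraph"}` and `{"type": "/paragraph"}`, one character per slot) and the operation names are assumptions made for this example.

```python
def apply_transaction(data, ops):
    """Apply retain/insert/remove steps that walk the array left to right."""
    out, pos = [], 0
    for op in ops:
        kind, arg = op
        if kind == "retain":
            out.extend(data[pos:pos + arg])
            pos += arg
        elif kind == "insert":
            out.extend(arg)
        elif kind == "remove":
            pos += arg
        else:
            raise ValueError("unknown operation: " + kind)
    if pos != len(data):
        raise ValueError("transaction does not cover the whole array")
    return out


def build_tree(data):
    """Build a nested tree from the flat array of markers and characters."""
    root = {"type": "document", "children": []}
    stack = [root]
    for item in data:
        if isinstance(item, str):
            stack[-1]["children"].append(item)  # a single character
        elif item["type"].startswith("/"):
            stack.pop()                          # closing marker
        else:
            node = {"type": item["type"], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
    return root


data = [{"type": "paragraph"}, "a", "b", {"type": "/paragraph"}]
# Replace "b" with "X": keep 2 items, insert, drop 1, keep the rest.
tx = [("retain", 2), ("insert", ["X"]), ("remove", 1), ("retain", 1)]
result = apply_transaction(data, tx)
tree = build_tree(result)
```

Because every step is just an offset and a length, two such transactions can be compared and rebased against each other with simple arithmetic, which is much harder to do with paths into a tree.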

We also can't use the input DOM directly because it doesn't have a 1:1 mapping to our conceptual nodes, so there has to be some internal data structure that is different from the input DOM, be it a linear model, a tree, or something else. The points where the 1:1 mapping breaks down are mostly "alien nodes" (things we don't understand and render as an uneditable box; in the DOM, these are usually subtrees or sets of adjacent subtrees rather than a single node) and "meta nodes" (things like categories and magic words; these are <meta>/<link> tags in the DOM, not present in the editor, but still need to be restored in the right place on output).

Reply to "Linear model and not DOM?"

Coping with meta-nodes

Jdforrester (WMF) (talkcontribs)

Our plan for coping with meta-nodes (i.e., non-positional nodes that make page-level changes, but which nevertheless can appear anywhere in the document):

We will have a Meta Linear Model:

  • (ve.dm.Document).meta=[];
  • Transactions & processing
  • Sparsely-indexed array correlating to offsets in the data array
  • Offsets are maintained with splice; insertion splices into meta array with undefined; deletions splice out meta array data and leave meta elements in place.
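
One possible reading of the splice rules above, sketched with plain lists. `None` stands in for `undefined`, the meta array has one slot per offset (so one more slot than the data array has items), and on deletion the meta items from the collapsed offsets are gathered at the deletion point. These details are assumptions of this sketch rather than the planned implementation.

```python
def insert(data, meta, offset, items):
    """Insert items into data and pad the meta array with empty slots."""
    data[offset:offset] = items
    meta[offset:offset] = [None] * len(items)


def remove(data, meta, start, end):
    """Delete data[start:end] but keep any meta items from that range."""
    del data[start:end]
    # Offsets start..end collapse into the single offset `start`.
    kept = [m for slot in meta[start:end + 1] if slot for m in slot]
    meta[start:end + 1] = [kept or None]


data = list("abcd")
meta = [None] * (len(data) + 1)
meta[2] = ["category"]             # a meta item between "b" and "c"

insert(data, meta, 1, ["X", "Y"])  # the meta item moves to offset 4
remove(data, meta, 3, 5)           # delete "b" and "c" around it
```

After both operations the category is still attached to an offset in the document even though the text on either side of it is gone, so it can be restored in place on output.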
Reply to "Coping with meta-nodes"