Wiktionary:Scribunto

This is a Wiktionary policy, guideline or common practices page. This is a draft proposal. It is unofficial, and it is unknown whether it is widely accepted by Wiktionary editors.

Wiktionary supports server-side scripting to generate content for pages, using the Scribunto extension. It is used as a complement to templates, in particular parser functions like {{#if:}}, {{#switch:}} and so on. Scripts are divided into modules located in the Module: namespace, and are written in the Lua programming language.

Getting started

Here are some helpful links to get you started with Lua and Scribunto.

  • Learning Lua - If you're not yet familiar with the language, or with programming in general, this is a good place to start. It doesn't cover any of the parts that are specific to using Lua within a wiki; it covers only "generic" Lua.
  • Scribunto/Lua tutorial - A short tutorial to explain how to use Scribunto/Lua within the wiki.
  • Scribunto/Lua reference manual - A reference manual for Lua as it applies to the Scribunto extension. This also lists Wiki-specific things that don't exist in normal Lua.
  • Official Lua reference manual - A quick reference to the language, for more experienced programmers. Again, this is generic Lua, and does not cover specific details about using it on Wiktionary, but it may have information that the Scribunto-specific manual currently lacks.

Information about using Scribunto on Wiktionary specifically:

How Scribunto interacts with the wiki

A Scribunto module is in effect one large function: it runs from top to bottom, and is expected to return a value. Normally, the return value of a module is a table that maps names to functions, which can then be called from another module or from "wikispace". In theory, a module could return something else: a table of strings, a table containing other tables, or even a single value. However, only a module that returns a table of named functions can be invoked from wikitext; anything else can only be imported and used from within Scribunto itself.
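As a minimal sketch of the usual shape described above, a module builds a table of named functions and hands it back. The page name Module:Example and the function name hello are made up for illustration.

```lua
-- Hypothetical contents of a page such as Module:Example.
local p = {}

-- Each field of p is a function that wikitext can reach by name,
-- for instance {{#invoke:Example|hello}}.
function p.hello(frame)
    return "Hello, world!"
end

-- On the wiki, the page must end with the line `return p`, which hands the
-- table to Scribunto. It is commented out here only so that the sketch can
-- also be pasted into a plain Lua interpreter and extended.
-- return p

print(p.hello())  -- Hello, world!
```

The name p is only a common convention; any local table will do, as long as it is the value the module returns.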

A function in a Scribunto module is called from wikitext as if it were a parser function, using this notation: {{#invoke:moduleName|functionName}}. The function then returns wikitext as its output. This wikitext can contain HTML-style markup (such as <b> and <table> and so on) and wiki-specific markup (such as '''…''' for bolding and [[…]]), but cannot invoke templates, parser functions, magic words or parser extension tags (for example, something like {{!}} will be interpreted as meaning the literal string {{!}}, rather than be expanded out to |). If a module needs to invoke a template or a parser function, it has to use special functions for this purpose, such as the frame object's frame:expandTemplate, frame:callParserFunction and frame:preprocess methods. But it is hoped that these functions will not be needed too often.
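The difference can be sketched as follows, assuming the frame:preprocess method described in the Scribunto reference manual. The function names p.literal and p.expanded are hypothetical, and fakeFrame is only a stand-in so that the snippet runs outside the wiki; on a real page, Scribunto supplies the frame.

```lua
local p = {}

-- Returns plain wikitext. The braces are not expanded after the module
-- returns, so the page would show the literal text {{!}}.
function p.literal(frame)
    return "'''a''' {{!}} '''b'''"
end

-- Asks the frame to expand {{!}} before the result is assembled.
function p.expanded(frame)
    local pipe = frame:preprocess("{{!}}")
    return "'''a''' " .. pipe .. " '''b'''"
end

-- Off-wiki stand-in for the frame object (an assumption for this sketch):
-- it only knows how to expand {{!}} to a pipe character.
local fakeFrame = {}
function fakeFrame:preprocess(text)
    return (string.gsub(text, "{{!}}", "|"))
end

print(p.literal(fakeFrame))   -- '''a''' {{!}} '''b'''
print(p.expanded(fakeFrame))  -- '''a''' | '''b'''
```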

To be useful, the function needs information about the context in which it was invoked, so the Scribunto extension passes it a single argument, customarily named frame. This argument can be used to obtain various key bits of information; in particular, if the parameter paramName=paramValue was passed to the template that is invoking the Scribunto function, then inside the function, frame:getParent().args.paramName will be the string 'paramValue'. (Any numbered/unnamed parameters can be accessed by number; for example, the template's {{{1}}} becomes the module's frame:getParent().args[1].) Even in the absence of the frame parameter, Scribunto code can use mw.getCurrentFrame() to obtain its value; therefore, functions don't actually need to pass frame around to each other.

It is also possible (though not usually necessary) for a template to pass further arguments as part of the invocation, in which case these will be available via frame.args. For example, if the module was invoked using {{#invoke:moduleName|functionName|paramName=paramValue}}, then frame.args.paramName will be the string 'paramValue'. (Numbered/unnamed parameters work analogously.)
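The two argument tables can be compared side by side in a small sketch. The function p.show, the parameter name style and the fake frame objects are all hypothetical; the fakes stand in for a template called as {{MyTemplate|en}} whose text contains {{#invoke:Example|show|style=bold}}.

```lua
local p = {}

-- Hypothetical entry point: reports where each value came from.
function p.show(frame)
    local invokeArgs = frame.args                -- given in {{#invoke:...}}
    local templateArgs = frame:getParent().args  -- given to the template
    return "invoke: " .. tostring(invokeArgs.style)
        .. ", template: " .. tostring(templateArgs[1])
end

-- Off-wiki stand-ins for the frame and its parent (assumptions for this
-- sketch; on the wiki both objects are supplied by Scribunto).
local fakeParent = { args = { "en" } }
local fakeFrame = { args = { style = "bold" } }
function fakeFrame:getParent()
    return fakeParent
end

print(p.show(fakeFrame))  -- invoke: bold, template: en
```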

Debugging and error reporting

When Lua encounters an error in a script, it aborts the script and shows "Module error" in large red, clickable text on the page. Click on this text in order to see what caused the error.

A module error also adds the page to Category:Pages with module errors. When writing modules or converting templates, it is a good idea to check this category to see whether any pages that use your module are triggering errors. It is also possible to trigger errors yourself, using the following:

error("You forgot to supply a required parameter!")

This can be used to check whether a module and its accompanying template(s) are being used correctly, and to show an error to the user otherwise. It is highly recommended that you use this whenever possible, to make your modules more robust and to make it easier to find mistakes.
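A parameter check of this kind might look like the sketch below. The function name p.requireFirst and the helper fakeFrame are hypothetical; fakeFrame only imitates the frame that Scribunto would supply, and pcall is used here just to show off-wiki that the error is raised.

```lua
local p = {}

-- Hypothetical function that insists on a non-empty first template parameter.
function p.requireFirst(frame)
    local args = frame:getParent().args
    if args[1] == nil or args[1] == "" then
        error("You forgot to supply a required parameter!")
    end
    return args[1]
end

-- Off-wiki stand-in: builds a frame whose parent carries the given arguments.
local function fakeFrame(templateArgs)
    local frame = {}
    function frame:getParent()
        return { args = templateArgs }
    end
    return frame
end

print(p.requireFirst(fakeFrame({ "en" })))  -- en

-- pcall catches the error so that the script can inspect it.
local ok, message = pcall(p.requireFirst, fakeFrame({}))
print(ok)  -- false
```

On the wiki there is no need for pcall: the uncaught error is what produces the red error text and the tracking category.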

While you are working on a script, it may occasionally be useful to generate debug messages so that you can see what is going on at particular points in your script. You can do this with the mw.log function:

mw.log("Testing the script. The value of the variable 'a' is : " .. a)

This function will output its argument to the Scribunto debug console if you run the module in the debug console, e.g. by typing p.main() if the function you want to run is named main and takes no arguments. It automatically adds a newline to the end of the message.

The function os.clock can be used for simple benchmarking of a given function. It can be used like this:

function p.foo(frame)
    local start = os.clock()
    -- do whatever the function needs to do here
    mw.log("Function took " .. os.clock() - start .. " seconds.")
    -- return
end

"Frame" and "parent frame"

There are actually two ways that values can be passed to a Scribunto module. The first is to pass them as parameters directly in the module invocation. For example, suppose there is a Lua function LanguageData.getLangName that generates (say) English for en. A template {{langname}} invokes the function and passes its own arguments on, and other pages access the function by writing (e.g.) {{langname|en}}. With this approach, the Lua function needs to access the arguments that were passed to #invoke; to that end, the template and the function might be written like this:

{{#invoke:LanguageData|getLangName|{{{1}}}}}

function LanguageData.getLangName(frame)
    local args = frame.args
    local langCode = args[1]
    local langName = ... -- some code to determine langName
    return langName
end

However, there is another way, which is recommended because it is faster. Every module also has access to its so-called "parent frame", which contains the collection of arguments passed not to the module, but to the template that called it. So rather than invoking the module and passing the values on explicitly, the template invokes the module with no parameters. The module itself can access the parameters that were passed to the template, using the parent frame. The example above would then be written like this:

{{#invoke:LanguageData|getLangName}}

function LanguageData.getLangName(frame)
    local args = frame:getParent().args
    local langCode = args[1]
    local langName = ... -- some code to determine langName
    return langName
end

As you can see, the only real difference is the use of frame:getParent().args to get the arguments of the "parent frame" (i.e., the template-call), rather than the arguments of the module invocation itself.

It's possible to write a function that supports both approaches (for example, by reading frame.args into a variable such as invokeArgs and frame:getParent().args into templateArgs, and then using invokeArgs[1] or templateArgs[1]). This may occasionally be useful for simple functions that can be called more or less intuitively from a template as well as from another module. But for more complicated functions, it's better to write the "main" code in one function, and write another function that can be invoked from a template, which then gathers the parameters and calls the main function.
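The split between a "main" function and a template-facing function could be sketched like this. The names table, the function name getLangNameTemplate and the fake frame are assumptions made for illustration; they are not taken from any existing module.

```lua
local p = {}

-- Hypothetical lookup data; a real module would hold far more entries.
local names = { en = "English", fr = "French" }

-- "Main" function: takes plain Lua values, so other modules can call it.
function p.getLangName(langCode)
    local name = names[langCode]
    if not name then
        error("Unknown language code: " .. tostring(langCode))
    end
    return name
end

-- Entry point for templates: gathers the parameters, then delegates.
function p.getLangNameTemplate(frame)
    local args = frame:getParent().args
    return p.getLangName(args[1])
end

-- Off-wiki stand-in for the frame of a template called with the argument fr.
local fakeFrame = {}
function fakeFrame:getParent()
    return { args = { "fr" } }
end

print(p.getLangName("en"))               -- English
print(p.getLangNameTemplate(fakeFrame))  -- French
```

With this layout, another module can call the main function directly with a language code, without having to construct a frame.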

Note that an empty parameter passed on from a template "counts"; i.e. the template call {{MyTemplate||MySecondArgument}} will cause the condition

if args[1] then
-- <do something>
end

to be satisfied, because in Lua an empty string counts as true (only nil and false count as false). The code

if args[1] and args[1] ~= '' then
-- <do something>
end

on the other hand, will only respond to a non-empty first argument.
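If many parameters need the same treatment, one option is to copy the arguments into a new table and drop the empty strings once, so that the simpler test works afterwards. This is only a sketch: cleanArgs is a hypothetical helper, and it assumes that the argument table can be traversed with pairs(), as the Scribunto reference manual describes for frame.args.

```lua
-- Hypothetical helper: copies an argument table, leaving out empty strings.
local function cleanArgs(args)
    local cleaned = {}
    for key, value in pairs(args) do
        if value ~= "" then
            cleaned[key] = value
        end
    end
    return cleaned
end

-- Stand-in for the arguments of {{MyTemplate||MySecondArgument}}.
local args = cleanArgs({ "", "MySecondArgument" })

print(args[1])  -- nil
print(args[2])  -- MySecondArgument
```

Because the copy can have gaps in its numbering, it should be read by index or with pairs() rather than with the length operator.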

Efficiency

The efficiency of Lua can be checked through the template preview feature: after pressing "preview", right-click on the preview of the page and request the page source. In the page source, search for "NewPP" in order to see how much time the execution of the Lua module took (example: Lua time usage: 0.004s). Search for "served" in order to see how long it took to render the entire page (example: Served by mw1035 in 0.498 secs). This latter time can be used to compare how long it takes to render the same page with and without Lua modules.

Unicode

Lua itself does not understand Unicode; whereas there are more than a million possible Unicode characters, a "string" in Lua is just a sequence of bytes in the range 0–255. (Unfortunately, the Lua documentation refers to these bytes as "characters", but don't be deceived.)

To address this lack, the Scribunto extension does (at least) four things for us:

  • whenever any text is passed into a Lua module (e.g., as a template parameter), the original character-string is transformed into a byte-string using UTF-8. UTF-8 is a variable-width encoding: some Unicode characters are transformed into just a single byte, while others are transformed into two, three, or four bytes.
  • the text returned by a Lua module is interpreted as UTF-8, and transformed back into a Unicode-character string. This means, for example, that if a module receives a bit of text and returns it unmodified, then all will be well. (Technical notes: (1) In the event that the string passed back from Lua is not valid UTF-8, invalid sequences will be replaced by U+FFFD. (2) In addition to being UTF-8-decoded, the returned string is also passed through the NFC normalizing transformation. This was discovered by testing, not by digging in the code, so there may be some details missing here.)
  • the source-code of a Scribunto module is encoded using UTF-8, so we can use Unicode characters inside Lua string literals.
  • the Scribunto extension includes a mw.ustring ("Unicode string") module, which is always available. This module provides UTF-8-aware analogues of Lua's built-in string functions. In essence, the functions in this module allow you to operate on a UTF-8-encoded byte-string as though it were still the original Unicode character-string.

Even so, when using the mw.ustring library, there are some caveats that you need to pay attention to. Although the library is capable of interpreting a sequence of several bytes as a single Unicode character, there may still be more than one Unicode character in a single logical character. For example, although я́ appears to us as a single logical character, it is really encoded as two distinct Unicode characters: the Cyrillic letter я (U+044F) followed by a combining acute accent (U+0301). Therefore, the code mw.ustring.len("я́") will actually return 2, not 1. More subtly, the following will also find a match: mw.ustring.find("я","[я́]"). This happens because the character class given in the second argument to the mw.ustring.find function actually contains two characters; the function treats the base letter and the accent as separate characters, and searches for them individually, finding one of them.
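The gap between bytes, Unicode characters and logical characters can be reproduced in plain Lua, without the mw.ustring library. In the sketch below, countCodePoints is a hypothetical helper that assumes its input is valid UTF-8; on the wiki, mw.ustring.len would be used instead.

```lua
-- "я" followed by a combining acute accent, written as UTF-8 byte escapes:
-- U+044F is the bytes 209 143, and U+0301 is the bytes 204 129.
local accented = "\209\143\204\129"

-- The length operator counts bytes.
print(#accented)  -- 4

-- Counting only the bytes that start a character (everything except the
-- continuation bytes 128-191) gives the number of Unicode characters.
local function countCodePoints(s)
    local _, count = string.gsub(s, "[^\128-\191]", "")
    return count
end

print(countCodePoints(accented))  -- 2
```

So the one logical character occupies four bytes and two Unicode characters, which is why the count is 2 rather than 1.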

Organizing Lua modules

Document Lua modules on a /documentation subpage. The documentation will appear at the top of the module page.

Categories cannot be entered into modules directly. Put the categories on the documentation page, wrapped in <includeonly> tags:

<includeonly>
[[Category:Ukrainian language]]
[[Category:Transliteration modules]]