User:Robert Ullmann/t14

From Wiktionary, the free dictionary

Tbot creates new entries from the translations tables it is updating.

Note that the concept of "translation" as commonly understood is rarely correct. The naïve question "What is the word in X for Y?" is unanswerable in the general case; languages don't work that way. English speak, talk, chat, gossip "translate" to Swahili kunena, kusema, ongea, but not one-to-one. Almost all combinations of those are possible (all but gossip to kunena, which would never be correct). Then there are announce, proclaim, declaim, and further enunciate, and so on.

Even technical terms that one would think are one-to-one often are not: English high voltage corresponds to French haute tension, but English also has high tension.

What we call "Translations" is, and can only be, "words that correspond in some way in the other language."

That said, it is usually useful to create an entry for the FL word, "defining" it as the English word, with the translations gloss. Most of the results are useful, if not entirely correct. Some are amusing; see the original version of אסלה, where apparently throne appears in the database before toilet. Even so, the entry as Tbot created it does give you the definition you wanted (along with the humour).

Languages


The languages that Tbot will create entries for are controlled by the existence of the language category, for example Category:Tbot entries (French). Tbot only creates entries until the category is "full", and then creates no more until someone checks one or more entries and removes the {{tbot entry}} tag.

The limit for each category is set by the limit= parameter in the boilerplate text template for these cats, {{tbotcatboiler}}. The limit may be set to zero (one doesn't want to delete the cat if there are entries).
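The category rule can be sketched in a few lines of Python. This is only an illustration, not Tbot's actual code: the function name `may_create_entry`, the shape of the `categories` mapping, and the reading of "full" as "entry count has reached limit" are all assumptions.

```python
def may_create_entry(language, categories):
    """Decide whether a new entry may be created for `language`.

    `categories` is a hypothetical mapping from category name to a
    (current_entry_count, limit) pair. A language with no category is
    treated as not enabled at all.
    """
    name = "Category:Tbot entries (%s)" % language
    if name not in categories:
        return False          # no category: language is not enabled
    count, limit = categories[name]
    return count < limit      # assumption: "full" means count >= limit


# Example data, invented for illustration only.
categories = {
    "Category:Tbot entries (French)": (100, 100),   # full
    "Category:Tbot entries (Swahili)": (12, 1000),  # room left
    "Category:Tbot entries (Latin)": (3, 0),        # limit=0, cat kept
}
```

Under these assumptions a category with limit=0 never admits new entries, yet it can still hold the entries already in it, which matches the reason given above for not simply deleting the category.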

Some languages may have more reliable translations than others; it may be necessary to set limit=0 for a very unreliable language. The initial set of languages I picked was based mostly on what I think active editors are interested in (including Swahili and Kinyarwanda, with limit=1000 because I want to go through all of them). Some dozen people have already looked at a few.

Scripts


Tbot recognizes words in various scripts, and adds them to {t} and {infl} when creating an entry. The scripts it knows now are Greek, Cyrillic, Armenian, Hebrew, Syriac, Arabic, Devanagari, Bengali, Georgian, and all the CJKV scripts and variants (including Han Extension B on plane 2). For Arabic, it uses fa-Arab, ur-Arab, and pa-Arab as appropriate, also Hayeren for Armenian, and polytonic for Ancient Greek.
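One way such script recognition might work is to look at the Unicode character names, as in the Python sketch below. It is not Tbot's implementation. The names `detect_script` and `script_template`, the grouping of kana and Hangul under "CJKV", and the fallback of returning the plain script label for cases the text above does not name are all assumptions; only fa-Arab, ur-Arab, pa-Arab, Hayeren and polytonic come from the description.

```python
import unicodedata

# First word of the Unicode character name -> script label.
# Grouping kana and Hangul under "CJKV" is an assumption.
SCRIPT_BY_NAME_PREFIX = {
    "GREEK": "Greek", "CYRILLIC": "Cyrillic", "ARMENIAN": "Armenian",
    "HEBREW": "Hebrew", "SYRIAC": "Syriac", "ARABIC": "Arabic",
    "DEVANAGARI": "Devanagari", "BENGALI": "Bengali",
    "GEORGIAN": "Georgian", "CJK": "CJKV", "HIRAGANA": "CJKV",
    "KATAKANA": "CJKV", "HANGUL": "CJKV",
}


def detect_script(word):
    """Return the label of the first recognized script in `word`, or None."""
    for ch in word:
        name = unicodedata.name(ch, "")       # "" for unnamed characters
        prefix = name.split(" ", 1)[0]
        if prefix in SCRIPT_BY_NAME_PREFIX:
            return SCRIPT_BY_NAME_PREFIX[prefix]
    return None                               # e.g. plain Latin text


def script_template(lang_code, word):
    """Pick a script value for `word`, using the variants named above.

    Falling back to the plain script label is a placeholder, not a
    claim about real template names.
    """
    script = detect_script(word)
    if script == "Arabic" and lang_code in ("fa", "ur", "pa"):
        return lang_code + "-Arab"
    if script == "Armenian":
        return "Hayeren"
    if script == "Greek" and lang_code == "grc":   # Ancient Greek
        return "polytonic"
    return script
```

For example, `script_template("fa", "کتاب")` gives `fa-Arab`, while an unaccented Latin word gives `None`, so no script value would be added.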

Issues, current status


(A number of the restrictions are temporary, as a starting point.)

  • Tbot only updates entries that already have {t} templates, or have section references or explicit FL.wikt references that would be good to fix.
  • It only converts a line to use {t} if the (local) word exists, or if it can create it.
  • It doesn't add the template or create an entry if the language is not in the set of 170 that have wikts.
  • It doesn't add language sections to existing entries; this makes it easier to remove bad entries.
  • It only recognizes transliterations for Cyrillic, Arabic, a few in Hebrew etc. It doesn't do this yet for things like Kanji and Hanzi (although it does for kana). If it can't recognize the transliteration, it won't modify the line because it can't know whether the text in parentheses is the transliteration or a qualifier.
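Several of the restrictions above combine into a single per-line decision, which might be sketched in Python as follows. Everything here is an assumption made for illustration: the line shape matched by `LINE_RE`, the function `convert_line` and its parameters, the `transliterate` callable (which returns the expected transliteration, or None when the script is not handled), and the exact output format. The first and fourth restrictions are not modelled.

```python
import re

# Hypothetical line shape: "* Language: [[word]] (parenthetical)"
LINE_RE = re.compile(
    r"^\*\s*(?P<language>[^:]+):\s*\[\[(?P<word>[^\]|]+)\]\]"
    r"(?:\s*\((?P<paren>[^)]*)\))?\s*$"
)


def convert_line(line, code, has_wikt, word_available, transliterate):
    """Return the line rewritten to use {{t}}, or unchanged if unsure.

    has_wikt:       the language is in the set that have wikts
    word_available: the local word exists, or an entry can be created
    transliterate:  hypothetical callable, word -> transliteration or None
    """
    m = LINE_RE.match(line)
    if m is None or not has_wikt or not word_available:
        return line
    word, paren = m.group("word"), m.group("paren")
    parts = ["t", code, word]
    if paren is not None:
        # Text in parentheses may be a transliteration or a qualifier;
        # convert only when it is recognizably the transliteration.
        if transliterate(word) != paren:
            return line
        parts.append("tr=" + paren)
    return "* " + m.group("language").strip() + ": {{" + "|".join(parts) + "}}"


# Toy transliteration table, for the example only.
toy_translit = {"слово": "slovo"}.get
```

With this toy table, `* Russian: [[слово]] (slovo)` would become `* Russian: {{t|ru|слово|tr=slovo}}`, while `* Russian: [[слово]] (archaic)` would be left untouched, since the parenthetical cannot be confirmed as a transliteration.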

Also see User:Tbot/tbot entry.