Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

May 2016


Is there any particular reason why the snowclones are in appendices? I would have thought they fit the main namespace nicely, except the "X" and "Y" in the page titles are a little odd.

For reference, here are all the current snowclone pages:

Note: I edited all the snowclone pages to make them use the normal entry layout. For example, many had an "Origin" section which I renamed to "Etymology". Example diff: link. --Daniel Carrero (talk) 04:40, 2 May 2016 (UTC)

It's precisely because of the X's (and Y's and Z's) that they are in the appendix. --WikiTiki89 17:00, 2 May 2016 (UTC)
Who would look up I am (something), hear me (do something)? I would like to see how this could be made into something that was demonstrably useful. If not, the Appendix is fine. DCDuring TALK 17:27, 2 May 2016 (UTC)


Previous discussion: User talk:Romanophile#Latin

Can we add definitions to forms of various words? This would make the definitions of words more easily accessible, especially when the internet is slower. Bman1230 (talk)

We already give definitions to forms of words. For example, chairs is defined as the plural of chair. —CodeCat 18:54, 2 May 2016 (UTC)

I mean including some info on the derived term's page. Generally, if someone knows what a chair is, they don't need to look up the definition for chairs. Maybe something like this. (Imagine better formatting.) Bman1230 (talk)

That would duplicate all the information on every form-of page, and it would be a nightmare to maintain. —CodeCat 21:39, 2 May 2016 (UTC)
I was thinking maybe there would be some way to load info from the other page; it would probably have to be changed by Wiktionary. Bman1230 (talk)
It would be "easy" enough to substitute plural forms for the singular forms in the definitions, but one would also have to make sure that there was number agreement with any verbs, pronouns, or other nouns that required it. Either AI or extensive tagging would be required, I think. DCDuring TALK 23:35, 2 May 2016 (UTC)
To me it seems that in the majority of cases "An item of furniture used to sit on or in comprising a seat, legs, back, and sometimes arm rests, for use by one person." is a more useful definition than "Plural of chair", despite the disagreement of number. Bman1230 (talk)

Would a mouseover system like this work? Bman1230 (talk)



  1. plural of chair
Not as text. Someone would have to enter the text, and any edit to the entry for chair would make the two entries different. I suppose you could transclude the entry at chair into the title tag, but there are limits to how much you can put in a title tag (I only see up to the first half of item 6 in your example). There are also all kinds of complications, such as multiple etymologies (wound and winded are past tense forms of different words spelled wind, and wounded is the past tense of wound, but not of the wound that is the past tense of wind). That would mean you would have to set things up carefully in the main entry so that only the relevant part would be transcluded, and that setup would be liable to get fouled up whenever someone rearranges the main entry.
The Achilles' heel of anything you could come up with is the inability to predict or control what is done to either or both entries after you've set everything up: people add, delete, rearrange and otherwise mess with just about every aspect of entries all the time. Multiply that by thousands and thousands of main entries, and it quickly becomes impossible to maintain. Chuck Entz (talk) 02:15, 3 May 2016 (UTC)


FYI, I created Appendix:Repetition. It feels to me that this is a concept with verifiable semantic value, like Appendix:Capital letter. That's why I formatted it like an entry, too. --Daniel Carrero (talk) 19:47, 3 May 2016 (UTC)

In linguistics, reduplication is considered a kind of affix, but its meanings are language-specific, not translingual. In Indonesian, for example, reduplication is used to form plurals. In some languages, it's used to indicate a diminutive; in others, a wide variety or a large number. In older Indo-European languages (and Proto-Indo-European itself) reduplication of the initial consonant or consonant cluster of a verb root is used to form the present stem of some verbs as well as the perfect stem of most verbs that have a perfect stem. I like the idea of having an entry for the reduplication morpheme, but I doubt it should have a Translingual section. —Aɴɢʀ (talk) 21:11, 3 May 2016 (UTC)
Also, don't confuse reduplication with lengthening. Daaaaaniel is an example of lengthening of the vowel, which in writing is indicated by repeating a letter. --WikiTiki89 01:59, 4 May 2016 (UTC)
Disclaimer: This is just a first draft, I wouldn't mind having a separate language section for plural-forming use in Indonesian that @Angr mentioned, or explaining any better the difference between lengthening of the vowel and actual reduplication that @Wikitiki89 mentioned. --Daniel Carrero (talk) 03:37, 4 May 2016 (UTC)

Tagging entries missing a headword template

There are still some entries without a headword template that linger around here and there. I would like to tag these for cleanup, using the template {{rfc-head}}. For the bot, I'll use a relatively simple heuristic to determine if a part-of-speech section is missing a template. If the header is not immediately followed, at the start of the next line, by any template, then insert the cleanup template at the start of that line. This may result in false negatives (entries without a headword template that aren't tagged) but it shouldn't give any false positives that I can think of.

To determine which headers are part-of-speech headers, I'll use a precompiled list of all the headers I've come across in a dump, and (painstakingly) split between POS and non-POS headers. If a POS header is used erroneously, like say a Verb header where it's used as something other than a POS header, it will also be tagged as missing a template (false positive), but since we are presumably going to fix the tagged entries manually, the erroneous usage will be noticed during this process.

The bot run will not actually fix any entries, but it should be relatively easy to fix some of the entries with a bot, once they are tagged. —CodeCat 21:07, 5 May 2016 (UTC)

It wouldn't surprise me to find a large number of entries with a file ("[[File:" or "[[Image:") on the line right after the POS header, before the headword template. I consider this suboptimal, but it'd be a false positive in your work, I think. Other than that, this sounds like a good idea. Many other entries put {{wikipedia}} after the POS header, before the headword template, but this won't affect the script you describe. - -sche (discuss) 05:31, 6 May 2016 (UTC)
On a related note, there are a large number of Latvian adjectives with two headword-template lines but only one POS header, as described here, complete with proposed bot fix. - -sche (discuss) 05:31, 6 May 2016 (UTC)
Similar careful logic could be used to revert the erroneous mass insertion of {{l}} with inappropriate language codes. DCDuring TALK 10:57, 6 May 2016 (UTC)
I oppose such tagging since they are fairly easy to identify for anyone being serious about making the list shorter. I generally oppose tagging without fixing, especially in huge volumes. --Dan Polansky (talk) 08:19, 7 May 2016 (UTC)
If done, I suggest it is done "quietly", e.g. adding entries to a category but not having red warning text showing up in the entry. Equinox 11:25, 7 May 2016 (UTC)
I fully support the proposal, with the suggestion that it'd be nice if the script automatically recognized [[File: and [[Image: if possible. --Daniel Carrero (talk) 23:26, 7 May 2016 (UTC)
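For anyone who wants to try the heuristic from this thread on sample wikitext, here is a minimal sketch in Python. It is not CodeCat's bot code: the function name, the tiny POS_HEADERS set, and the skip_files switch (which covers the [[File: and [[Image: case raised above) are all assumptions made for illustration. One more deviation is labeled in the code: when the line after a POS header is itself a header, the tag goes on its own line so the header syntax is not broken.

```python
import re

# Assumption: a tiny illustrative subset. The proposal describes a much
# longer, hand-compiled list of part-of-speech headers.
POS_HEADERS = {"Noun", "Verb", "Adjective", "Adverb", "Proper noun"}

# Matches wikitext headers such as "===Noun===" and captures the title.
HEADER_RE = re.compile(r"^(={2,})\s*(.*?)\s*\1\s*$")

CLEANUP_TAG = "{{rfc-head}}"
FILE_PREFIXES = ("[[File:", "[[Image:")


def tag_missing_headword(wikitext, skip_files=False):
    """Tag POS sections whose header is not directly followed by a template.

    With skip_files=True, lines starting with [[File: or [[Image: right
    after the header are passed over before the check is made.
    """
    result = []
    pending = False  # True while looking for the line that follows a POS header
    for line in wikitext.split("\n"):
        header = HEADER_RE.match(line)
        if pending:
            if skip_files and line.startswith(FILE_PREFIXES):
                result.append(line)
                continue
            if header:
                # Assumption: keep the next header intact by putting the
                # tag on a line of its own.
                result.append(CLEANUP_TAG)
            elif not line.startswith("{{"):
                # As proposed: insert the tag at the start of that line.
                line = CLEANUP_TAG + line
            pending = False
        result.append(line)
        if header and header.group(2) in POS_HEADERS:
            pending = True
    if pending:
        # The POS header was the last line of the page.
        result.append(CLEANUP_TAG)
    return "\n".join(result)


if __name__ == "__main__":
    sample = "==English==\n===Noun===\nchair (plural chairs)\n\n# A seat."
    print(tag_missing_headword(sample))
```

As noted in the thread, a {{wikipedia}} box placed directly after the header counts as "a template", so such an entry is left untagged (a false negative), while a file link is tagged unless skip_files is set.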


vCat is a tool on wmflabs that can graphically display category structures. More specifically it can show all ancestor categories (that is, parents and parents' parents and so on) or all descendants. We could have two links for the generated parents and children images on category pages.

Look what it generates for ka:Sports - descendants, Georgian language - ancestors.

I do not have a very strong opinion about this but others may have. --Giorgi Eufshi (talk) 09:45, 6 May 2016 (UTC)

Gothic words that are attested only in Runic inscriptions

Was wondering whether a Runic or Gothic script entry should be created for Gothic words that are attested only in Runic inscriptions. There are not many words of this sort which have one mostly undisputed reading, but there certainly are some. When I added ᚱᚨᚾᛃᚨ ‎(ranja) a while ago I had not yet found any others of this sort, so I went ahead and added the entry in Runic script, the word only being attested in that script. However, recently I came across 𐌷𐌰𐌹𐌻𐌰𐌲𐍃 ‎(hailags), which is also attested only in one Runic inscription (Wulfila, interestingly, prefers 𐍅𐌴𐌹𐌷𐍃 ‎(weihs) to mean holy - if anyone knows why, let me know!) but was added by another user in the Gothic script. Which are preferable here, Gothic or Runic lemmata? Would be interested to hear your thoughts. — Kleio (t · c) 17:38, 11 May 2016 (UTC)

Anything that's attested can be added, so there's no preference. However, Gothic doesn't currently have Runic listed as one of its scripts, so autodetection won't work. —CodeCat 17:47, 11 May 2016 (UTC)
It does now. --WikiTiki89 17:58, 11 May 2016 (UTC)
Good call! — Kleio (t · c) 18:15, 11 May 2016 (UTC)
But if there is no preference, could you not end up with two lemma entries, or, well, situations like this, where it is inconsistent and feels messy? Seems to me it is best to settle for one lemma (imo for the attested script), with, if the attestation is only in a rare script like Runic, a redirect in the Gothic script à la the romanization of redirects we have for most regular Gothic entries. — Kleio (t · c) 18:15, 11 May 2016 (UTC)
When there are multiple lemmas that represent the same basic word, we call them alternative forms. So we can call the runic lemma an alternative form of the gothic lemma. —CodeCat 18:26, 11 May 2016 (UTC)
But the Gothic script forms are not attested in these listed cases. Should there not, in any case, be a consistent approach to what form in these cases (got words only attested in Runic) should be the main entry, and which should be considered the alternative form? Because if I understand you correctly, whoever creates the lemma for this kind of Runic-only Gothic word would decide whether to make the main entry a Runic or Gothic script one, and the other script would then be considered the alternative form. We would, then, have 𐍂𐌰𐌽𐌾𐌰 ‎(ranja) as an alternative form of the lemma ᚱᚨᚾᛃᚨ ‎(ranja), and ᚺᚨᛁᛚᚨᚷᛊ ‎(hailags) as an alternative form of the lemma 𐌷𐌰𐌹𐌻𐌰𐌲𐍃 ‎(hailags) - a completely opposite way of dealing with this, despite them being both attested only in Runic. I might be a bit OCD about this, but it just seems messy to leave that up to an arbitrary decision by the editor, instead of having a consistent approach. — Kleio (t · c) 18:44, 11 May 2016 (UTC)
  • Chiming in from the sidelines, I agree that messy and inconsistent is undesirable. My 2p would be to have all Gothic lemmata in the Latin script, with notes in the etymologies to indicate if a given term is only attested in Runic. Gothic entries in the Runic script would all be soft redirects to the Latin-script entries, much as we have for pinyin entries for Chinese, or romaji entries for Japanese, etc. We have a possible analogous precedent in the handling of Pali entries, which historically were not written in the Latin script until relatively recently, but for which (I think) all EN WT lemmata are given in the Latin script. ‑‑ Eiríkr Útlendi │Tala við mig 18:55, 11 May 2016 (UTC)
  • Why should we do it for this one non-Latin language and not for all others? Korn [kʰũːɘ̃n] (talk) 19:45, 11 May 2016 (UTC)
  • Admittedly, most contemporary students of Gothic and indeed virtually all books published on Gothic use the Latin script, unlike for example Ancient Greek where the Greek alphabet is consistently used. See also this discussion and the past votes that are linked there. Generally speaking though, the issue of Latin script usage, romanization and so forth seems to be a bit of a mess. — Kleio (t · c) 20:01, 11 May 2016 (UTC)
  • @Korn [kʰũːɘ̃n] -- if, by "this", you mean "we should use the Latin script for all lemmata, for all languages", that presents some real organizational difficulties for some languages. For Japanese, have a look at [[せい]] (sei) -- the sheer number of homophones is overwhelming, and our entry doesn't even include all of the entries applicable to this phonetic rendering. Were we to move all of the applicable lemmata to a single heading under [[sei#Japanese]], we would have a substantial challenge in organizing and presenting all of this information in a useful fashion. This is a big part of why Japan has not retired kanji, the borrowed Chinese characters used in writing: these provide much-needed disambiguation. Written and spoken Japanese can be quite different in terms of style and vocabulary, largely because of the writing system. ‑‑ Eiríkr Útlendi │Tala við mig 01:02, 12 May 2016 (UTC)
  • We should record all languages in the script that they are printed in. That is, for Gothic, the Latin script. I see no reason we should be messing around with manuscript traditions at all.--Prosfilaes (talk) 08:08, 12 May 2016 (UTC)
  • Because that is what the Goths wrote their language in and because we rely on primary sources where we can rather than copies from non-native speakers who take editorial liberty. And I'm not saying we should use Latin for all lemmas; I'm saying that when something gets special treatment it needs a two-way justification: why it is done in this of all cases and why it is not done in the others. Korn [kʰũːɘ̃n] (talk) 13:33, 12 May 2016 (UTC)
  • The Gothic script is what Bishop Ulfilas wrote Gothic in, at least. We should rely on printed copies because printed copies are normalized script-wise and can be reliably transcribed and searched for, whereas Pepys' diary as-is can't, and because printed copies are what the users of Wiktionary are going to be using, not the manuscripts.--Prosfilaes (talk) 07:51, 13 May 2016 (UTC)
  • Ad absurdum: If non-native speakers of Russian started using exclusively romanised versions of Russian books, would that affect your stance on where to lemmatise Russian? (Assuming that it wouldn't, that would be the point for giving the argument for the exception that I asked for.) Korn [kʰũːɘ̃n] (talk) 19:20, 14 May 2016 (UTC)
ps.: Shorthand diary writing is not meant for consumption by others. I hope we can all agree that our decisions on writing should be made on the basis of those texts which were actually meant to be read by people. Korn [kʰũːɘ̃n] (talk) 19:22, 14 May 2016 (UTC)
pps.: We have romanisations. The argument to be made is why they should be lemmas rather than links to Gothic as was used by Goths. Korn [kʰũːɘ̃n] (talk) 08:22, 15 May 2016 (UTC)
  • If the speakers of Russian started using exclusively Latin script for reading Russian, we should record Russian in the Latin script. If Russian in Cyrillic script was found only in museums, and most universities had only examples of Russian in Latin script, then yes, we should record Russian in the Latin script. All of the speakers of Gothic who might use Wiktionary, including all the non-existent native speakers, use Latin to write it.
  • I think editorial liberty is a red herring. Latin transcription of Gothic is letter for letter. Whether we cite Gothic in Latin script or Gothic script makes no difference to the accuracy of the original. Whether or not a text is meant to be read by other people doesn't change the fact that bringing Pepys' Diary in a Latin script changes it more than transliterating any Gothic work.--Prosfilaes (talk) 06:59, 17 May 2016 (UTC)
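The "letter for letter" point can be illustrated with a short Python sketch. The table below follows the usual scholarly romanization of the Gothic alphabet (Unicode block starting at U+10330); the function names are made up for this example, and nothing here covers the Runic-script forms discussed earlier in the thread.

```python
# Latin equivalents in Gothic alphabet order, starting at U+10330.
# None marks the letter used only as the numeral 90, which has no
# Latin letter in the usual romanization.
GOTHIC_BASE = 0x10330
LATIN_LETTERS = [
    "a", "b", "g", "d", "e", "q", "z", "h", "þ", "i", "k", "l", "m",
    "n", "j", "u", "p", None, "r", "s", "t", "w", "f", "x", "ƕ", "o",
]

GOTHIC_TO_LATIN = {
    chr(GOTHIC_BASE + offset): latin
    for offset, latin in enumerate(LATIN_LETTERS)
    if latin is not None
}
LATIN_TO_GOTHIC = {latin: gothic for gothic, latin in GOTHIC_TO_LATIN.items()}


def romanize(word):
    """Replace each Gothic letter with its Latin counterpart."""
    return "".join(GOTHIC_TO_LATIN[letter] for letter in word)


def gothicize(word):
    """Replace each Latin letter with its Gothic counterpart."""
    return "".join(LATIN_TO_GOTHIC[letter] for letter in word)


if __name__ == "__main__":
    # The Gothic-script spelling of hailags, written with escapes.
    hailags = "\U00010337\U00010330\U00010339\U0001033B\U00010330\U00010332\U00010343"
    print(romanize(hailags))
    print(gothicize(romanize(hailags)) == hailags)
```

Because the mapping is one-to-one, converting in either direction loses nothing, which is the sense in which the choice of script for the lemma does not affect the accuracy of a citation.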

French capital letters with diacritics

Is diff correct? If so, the usage note should be changed or removed from all entries; there's no sense changing just one entry. - -sche (discuss) 22:14, 11 May 2016 (UTC)

The usage note should be kept, but modified to provide more information. I don't know the details, but I'm pretty sure the practice varies by country and that the common practice in France of dropping accents on capitals originated from typewriters not having keys for the accented capitals. I wouldn't be surprised if the Academy recommends keeping the accents. --WikiTiki89 22:32, 11 May 2016 (UTC)
It is a general typographic rule to keep the diacritics on capital letters. Some bad newspaper title errors arose from the lack of diacritics, e.g. « UN POLICIER TUE »: is it tue ("A policeman kills") or tué ("A policeman killed")? It only remains a common issue because current French keyboards can't write capitalized letters with diacritics easily (at least on Windows), and some people can't be bothered to learn how to use diacritics properly. — Dakdada 09:25, 12 May 2016 (UTC)
I'll also add that when I started taking French in middle school, our French teacher actually taught us that we were supposed to drop the diacritic on capital letters (although, I personally found it illogical and refused to comply). --WikiTiki89 15:02, 12 May 2016 (UTC)
@Wikitiki89 My own high school French teacher had been telling students that diacritics were optional on capitals. I didn't believe it either. Hillcrest98 (talk) 23:33, 15 May 2016 (UTC)
I've read a novel or two in French that did not contain a single accent on a capital letter, so it's certainly optional. The only diacritic that isn't is the cedilla, and even that gets dropped sometimes. Andrew Sheedy (talk) 04:40, 16 May 2016 (UTC)
Before the age of computers and Unicode fonts, it was usual practice that French Canadian retained diacritics on all-caps, while European French preferred caps without diacritics. I have not kept up with trends since 2000. —Stephen (Talk) 09:17, 22 May 2016 (UTC)
Not only French? The same was long said about Spanish, but I checked the RAE recommendation: only acronyms and initialisms (siglas) don't take the graphic accent (point 7, at the end). Sobreira (talk) 09:31, 27 May 2016 (UTC)

Creating standards for GML

Over here, DCDuring pointed out that I shouldn't just decide to put the lemma of Middle Low German words on their attested rather than normalised form without discussion. To my knowledge, I'm the only current editor of Middle Low German, so discussion honestly didn't occur to me, despite my Wiktionariandom. So here it is, go forth and discuss.
The marking of umlauts happens in the early MLG period with ø, y and slashed u (see also this question), as well as digraphs, but for the longest part of the period is so overwhelmingly absent that the leading authority of the 19th century (Lübben) was stoutly convinced that umlaut didn't occur in the language. The next standard work on the language (Lasch) does prove him wrong, but points out that "ü is hardly/likely not, ö rarely to be taken as an umlaut" ("ü ist wohl kaum..."), and the examples she gives for ö are actually spelled oe, without superscript. So following our conventions for Latin, I figured to change the lemma from e.g. vögen to vogen. Korn [kʰũːɘ̃n] (talk) 23:15, 11 May 2016 (UTC)

Just to make it clear: you would have the lemma entry be at [[vogen]], but with the inflection line and conjugation table on that page showing vögen. Would [[vögen]] be an alternative form entry for vogen? DCDuring TALK 00:42, 12 May 2016 (UTC)
You understood me correctly, yes. The circumflex and the trema are a modern scholarly annotation for clarity, standardly applied like the macron to Latin texts. I would say that anything attestable can be an alternative form, anything unattested cannot. From what I understand from the grammars (I don't have access to corpora myself), the rendition of an umlaut as Ö and Ü was generally unknown in the period, though, and can be expected to be unattestable. Korn [kʰũːɘ̃n] (talk) 13:30, 12 May 2016 (UTC)
Hence your analogizing to how we handle Latin macrons etc. rather than to how we handle German tremas, which is what I thought the analogy would be. If I actually knew anything substantive in this area, I probably would not have resorted to procedure. I suppose a discussion, preferably with more than just the two of us participating, my contribution being minimal, would be good for Wiktionary talk:About Middle Low German to memorialize the decision. DCDuring TALK 15:09, 12 May 2016 (UTC)
I support the lemmas being the non-diacriticized forms. I wish we would do this with Ancient Greek as well. --WikiTiki89 15:15, 12 May 2016 (UTC)
It's this way too in many cases for Middle English, where "u" (= /y/) is nowadays oftentimes written "ü" to clarify the pronunciation, but "ü" was never actually used in Middle English orthography (tmk). Middle English "u" could also represent /u/. I would support the same for gml. Leasnam (talk) 15:39, 13 May 2016 (UTC)

I am copying this debate to Wiktionary talk:About Middle Low German. Please give further input there. Korn [kʰũːɘ̃n] (talk) 19:27, 14 May 2016 (UTC)

Standard layout of adjective tables?

For inflection tables of languages with cases, our normal practice is for the singular to appear in one column, then the plural in another column to the right. For languages with a dual, like Slovene, there are three columns. But when a language also has gendered adjectives, there are two dimensions to the table: number and gender, giving a total of 9 combinations in the case of dual languages, and 6 for most others. There are different ways to lay out adjective inflection tables in this case:

  • Put them all into one row, singular in all genders first, then plural in all genders. This is what we use for Latin, Russian, Polish, German and it tends to get rather wide especially in the Latin case. Not really good for mobile.
  • Put them all into one row, masculine in all numbers first, then feminine in all numbers, etc. This is apparently what we use for Proto-Germanic. The downside to this layout is that it doesn't keep singular and plural forms together, which often resemble each other in different genders.
  • Have number distinguished by row, and gender by column. This is used for Serbo-Croatian and Slovene.
  • Have gender distinguished by row, and number by column. This is essentially three noun inflection tables stacked on top of each other, so it's more consistent in that way. However, for languages with two numbers, you end up with 3 rows and 2 columns, which gives a rather tall table without using the width much. This is definitely the best layout for mobile though, for this reason.

My question is whether there is a preferred layout for these. Specifically, what should be used for the Proto-Indo-European adjective table I intend to develop? PIE has three numbers and three genders, so it would be either a 3x3 table or a table with 9 (!) columns. —CodeCat 20:54, 13 May 2016 (UTC)

I wonder if there's a way to create tables like these with some CSS magic so that the ultimate layout is determined by the browser based on window size. Anyway, I think without space considerations, having cases on one side and everything else on the other makes the most sense to me, because each number/gender combination can sort of be interpreted as its own sub-lemma with its own declension. --WikiTiki89 21:20, 13 May 2016 (UTC)
But would the columns be the numbers, or the genders? —CodeCat 21:29, 13 May 2016 (UTC)
To clarify, if screen space is not a problem, then I would prefer if cases were rows and numbers and genders were columns. If that makes the table too wide, then you can make the numbers sort of independent tables like for Serbo-Croatian and Slovene. --WikiTiki89 21:38, 13 May 2016 (UTC)
So you think the columns should be the genders, then? —CodeCat 22:04, 13 May 2016 (UTC)
Not necessarily. I think the columns should be gender/number combinations. How those combinations are arranged depends on the particular set of combinations that a language has. --WikiTiki89 22:08, 13 May 2016 (UTC)
There's 9 combinations for PIE... —CodeCat 22:27, 13 May 2016 (UTC)
First of all, there should be no standard layout, since different languages have different considerations, and there are differences in lexicographic traditions and educational standards. The community of editors for a specific language should be allowed to decide on the layout that's best for their languages. Consider also that, once you leave the more archaic post-Anatolian Indo-European languages, you're not going to be able to use any of this consistently across languages.
Now, as far as Indo-European, you have no strong lexicographic traditions to deal with, so you can start from scratch. I think your sorting should be gender first, subdivided by number:
  1. The PIE genders are more like separate declension classes, coming as they do from different origins.
  2. Morphologically, gender morphemes (if you can call them that) tend to occur between the root and the number endings, which makes them more basic: you have a derived stem, to which all the number endings are added.
  3. Semantically, gender is more closely tied to the identity of the referent: numbers can be changed by combining referents in groups, but each of those referents will still have the same gender. That makes gender a more basic category: when you're talking about something you will use the same gender for it in all the different forms you use to talk about it, so you will want to have all the forms for that gender in the same place (of course, something feminine can become grammatically masculine by being grouped with something masculine, but it's still intrinsically feminine).
Of course there are also aspects where number is more basic, and there might be some way to minimize or hide the dual columns if they're next to each other, which would save space in the basic display (the dual number is rather secondary, in many ways, and has mostly disappeared in the daughter languages).
As for horizontal vs vertical arrangements: I wonder if there's any way to get the groups of columns to wrap as a block instead of by line? That is, the neuter singular, dual and plural, with all of their cases, would be a separate table that would move below the tables for the masculine and feminine with their respective numbers and cases if the page wasn't wide enough, and the feminine table would move between the masculine and the neuter if the page was only wide enough for one gender block. If you could do that, you would have both the horizontal arrangement for wide screens and the vertical arrangement for narrow screens. In the vertical configuration, it would be somewhat like the arrangement at Sanskrit विशाल (viśāla), to give an arbitrary example. Chuck Entz (talk) 03:05, 14 May 2016 (UTC)
I cast my lot with "let the editors decide". Korn [kʰũːɘ̃n] (talk) 19:46, 14 May 2016 (UTC)
I am an editor. I'm asking others to decide. :p —CodeCat 19:54, 14 May 2016 (UTC)
Wir sind das Editor! Wir sind das Editor! - Rephrasing my stance: I have no problem with a non-uniform layout across different languages and thus think discussion of the respective tables should take place in the communities of those actually working on the language. Coming from a Germanic language, for Germanic languages I prefer this way: With a uniform plural it should be case by row, rest by column. That is male/neuter/female/plural × cases ordered NAGD. Languages with gendered plural should follow the same pattern but have two separate tables for singular/plural. Korn [kʰũːɘ̃n] (talk) 08:38, 15 May 2016 (UTC)
The most commonly spoken languages (e.g. German) have a single plural form for all genders, so they need 4 columns, not 9. How many languages need more than 4 columns? And how many words do we currently cover in such languages? Are the weird cases few or many? -- LA2 (talk) 19:35, 15 May 2016 (UTC)
Many Slavic languages preserve distinct plural forms for each of the genders. Slovene is an extreme case because it has dual forms too, so it needs 9 columns (see dober), but Serbo-Croatian is much more widely spoken and needs 6 columns (dobar). The two Baltic languages have no more neuter gender, but the plurals of the two genders remain distinct, so there are 4 columns (geras). Icelandic, a Germanic language, does not have full syncretism of the genders in plural, so 6 columns remain necessary (góður). —CodeCat 20:36, 15 May 2016 (UTC)
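The column counts quoted in this thread follow from simple arithmetic, and a small Python sketch makes it easy to check them. This is only an illustration: the helper name and the gender_neutral parameter are made up here, the grouping is by number (the first layout in the list above), and nothing in it reflects how the actual table templates are built.

```python
def column_headers(genders, numbers, gender_neutral=()):
    """List (number, gender) column headers, grouped by number.

    Numbers named in gender_neutral get one column shared by all genders,
    as with the German plural.
    """
    headers = []
    for number in numbers:
        if number in gender_neutral:
            headers.append((number, None))
        else:
            headers.extend((number, gender) for gender in genders)
    return headers


MFN = ["masculine", "feminine", "neuter"]

layouts = {
    "German": column_headers(MFN, ["singular", "plural"], gender_neutral=["plural"]),
    "Serbo-Croatian": column_headers(MFN, ["singular", "plural"]),
    "Icelandic": column_headers(MFN, ["singular", "plural"]),
    "Baltic (two genders)": column_headers(["masculine", "feminine"], ["singular", "plural"]),
    "Slovene": column_headers(MFN, ["singular", "dual", "plural"]),
    "Proto-Indo-European": column_headers(MFN, ["singular", "dual", "plural"]),
}

if __name__ == "__main__":
    for language, headers in layouts.items():
        print(f"{language}: {len(headers)} columns")
```

Swapping the two loops would give the gender-first grouping suggested for PIE; the number of columns stays the same, only their order changes.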


I edited the entry my to replace the small list of 4 senses with a link to the more complete list of senses at Appendix:Possessive. See diff. What do you think?

First, I moved User:Msh210/English possessives to Appendix:Possessive and edited the page further. Previous discussions: Wiktionary:Tea room/2011/April#English possessives, Wiktionary:Beer parlour/2016/January#English possessives. What do you think, @msh210?

I'd like to do the same for other entries at some point, like your, etc. --Daniel Carrero (talk) 02:17, 14 May 2016 (UTC)

The Appendix should be a supplement not a replacement. I'd simply revert the elimination of the definitions and add a reference to the Appendix under See also. DCDuring TALK 11:57, 14 May 2016 (UTC)
Whoa. "A possessive used before a noun" doesn't distinguish my from other possessives like your and her. Equinox 11:42, 15 May 2016 (UTC)
I reverted my edit. --Daniel Carrero (talk) 18:26, 16 May 2016 (UTC)

Surprising Homographs?

As I was explaining our redirection policy earlier, I used my favorite example of a word that has the same spelling as an English word but is completely unrelated to it: Indonesian air, which means water. I thought it might be fun to come up with a list of these to use in our documentation. I found a few others to start the list. Can anyone think of more?

  1. air ‎(water) (Indonesian)
  2. ball ‎(organ) (Irish)
  3. beach ‎(bee) (Irish)
  4. bean ‎(woman) (Irish)
  5. fear ‎(man) (Irish)
  6. here ‎(testicle) (Hungarian)
  7. millet ‎(nation) (Turkish)
  8. take ‎(bamboo) (Japanese- Romanization of たけ)
  9. pint ‎(penis) (Low German)
  10. teach ‎(house) (Irish)

Different capitalization and place names:

  1. Gift ‎(poison) (German)
  2. Lizard ‎(peninsula in Cornwall) (English)
  3. Mist ‎(manure) (German)
  4. Speck ‎(bacon) (German)
  5. Split ‎(city in Croatia) (Serbo-Croatian, English)
  6. Sexmoan

Feel free to edit/add to my list. Thanks! Chuck Entz (talk) 05:41, 16 May 2016 (UTC)

  • ball ‎(organ)
  • bean ‎(woman)
  • teach ‎(house)

(All Irish)

  • Gift ‎(poison)
  • Mist ‎(manure)

(Both German) --Catsidhe (verba, facta) 06:28, 16 May 2016 (UTC)

I've added those to my list, with the German ones in a separate list for different capitalizations. @Korn: Thanks for pint. There are traces of the same word in English: see cuckoopint. Chuck Entz (talk) 12:46, 16 May 2016 (UTC)
Wiktionary:Foreign Word of the_Day/2013/April#2, Wiktionary:Foreign Word of the_Day/2013/July#15, Wiktionary:Foreign Word of the_Day/2013/October#26, Wiktionary:Foreign Word of the_Day/2014/April#5, Wiktionary:Foreign Word of the_Day/2014/September#2, Wiktionary:Foreign Word of the_Day/2015/April#3, Wiktionary:Foreign Word of the_Day/2015/November#1. — Ungoliant (falai) 14:10, 16 May 2016 (UTC)
Also one of my personal favourites: Mirandese you ‎(I) (coupled with Danish I ‎(you)). — Ungoliant (falai) 14:11, 16 May 2016 (UTC)
I think Gift and Speck deserve a category of their own, since they are in fact exact cognates with the English homographs, whose meanings diverged quite far. --WikiTiki89 14:37, 16 May 2016 (UTC)
कट ‎(kaṭ) /kəʈ/ sounds like "cut" and has the same meaning. —Aryamanarora (मुझसे बात करो) 18:18, 23 May 2016 (UTC)

OK, my gift (not Gift!) to User:Chuck Entz: a chain reaction of false friends....

meaning Galician
"hand" man
"man" home

As a result, when the first Zara Home shops opened in the company's country of origin, (some) people thought they would be the men's department.

And you may wonder... how to render English "home" into Galician? casa. Galilove... Sobreira (talk) 09:47, 27 May 2016 (UTC)

This reminds me, although these are homophones and not exactly homographs, of this saying in English about Hebrew: [aˈni] is [mi], [mi] is [hu], [hu] is [hi], and [hi] is [ʃi]. --WikiTiki89 14:55, 27 May 2016 (UTC)

Wiktionary:About Akkadian

I just created Wiktionary:About Akkadian, but I do not actually know very much about our practices for Akkadian. Can someone who works with Akkadian help fill in the information? --WikiTiki89 15:41, 16 May 2016 (UTC)

@ObsequiousNewt, JohnC5, DerekWinters, Angr: Pinging people who expressed some knowledge of cuneiform in a recent discussion on Hittite lemmas. --WikiTiki89 19:40, 16 May 2016 (UTC)
I'd love to comment, but the discussion above was never satisfactorily resolved, and most of my concerns with my various proposals were never actually addressed. Since the orthography is similar here, my comments apply also. —ObsequiousNewt (εἴρηκα|πεποίηκα) 15:36, 19 May 2016 (UTC)
As before, I would like to know what to do with determinatives (are they part of the lemma or not?). I don't believe they should be because they have no phonetic realization. Should we have a vote about cuneiform lemmatization? —JohnC5 18:39, 19 May 2016 (UTC)
I believe forms with determinatives should be alternative forms, unless the forms without the determinatives are much less common (OTOH for Ancient Egyptian, I would say determinatives should be part of the lemmas, since they were essentially required for most phonetically spelled words). Whether they have a phonetic realization or not is irrelevant. But anyway, there are a lot more basic things to decide first, such as which dialect to use. I'm biased towards Old Babylonian, because I'm going through Huehnergard's grammar, but most of our transliterations seem to be in later dialects without the final -m. Also, it seems that most of our entries are for logograms, whereas logograms were usually less common than phonetic spellings for most words. --WikiTiki89 18:55, 19 May 2016 (UTC)
I think this discussion would benefit by looking at the example of the Han characters used in Chinese, Japanese, Korean and Vietnamese: they are, like the cuneiforms, a very long-lived mixed logographic/phonemic system adopted and used by unrelated languages. The Chinese and Japanese editors, especially, have had a great deal of experience working with variations on some of these very issues.
While you won't find purely semantic independent characters to correspond to determinatives, the vast majority of the characters contain some combination of recognizable semantic and phonemic elements. A very transparent example is (originally both he and she), and , which was created by replacing the semantic element meaning "person" with one meaning "woman".
As for Sumerograms and other variations in the ways that cuneiforms can be interpreted: in Japanese there are usually a number of readings for any given character, which are classified into a series of etymologically-based named types: the on readings are borrowed from Chinese, with several subdivisions for the topolect and/or period of Chinese the word was borrowed from, and the kun readings are native Japanese.
Of course, with cuneiforms we don't have a corresponding body of native scholarship nor the knowledge of modern speakers to draw from, so they're a lot messier. Chuck Entz (talk) 03:13, 20 May 2016 (UTC)

A new "Welcome" dialog

Hello everyone. This is a heads-up about a change which has just been announced in Tech News: Add the "welcome" dialog (with button to switch) to the wikitext editor.

In a nutshell, later this week this will provide a one-time "Welcome" message in the wikitext editor which explains that anyone can edit, and every improvement helps. The user can then start editing in the wikitext editor right away, or switch to the visual editor. (This is the equivalent of an already existing welcome message for visual editor users, which suggests the option to switch to the wikitext editor. If you have already seen this dialog in the visual editor, you will not see the new one in the wikitext editor.)

  • I want to make sure that, although users will see this dialog only once, they can read it in their language as much as possible. Please read the instructions if you can help with that.
  • I also want to underline that the dialog does not change in any way the current site-wide configuration of the visual editor. Nothing changes permanently for users who chose to hide the visual editor in their Preferences or for those who don't use it anyway, or for wikis where it's still a Beta Feature, or for wikis where certain groups of users don't get the visual editor tab, etc.
    • There is a slight chance that you see a few more questions than usual about the visual editor. Please refer people to the documentation or to the feedback page, and feel free to ping me if you have questions too!
  • Finally, I want to acknowledge that, while not everyone will see that dialog, many of you will. If you're reading this, you are likely not the intended recipient of that one-time dialog, so you may be confused or annoyed by it, and if this is the case, I'm truly sorry about that. This message also saves you from having to explain the same thing over and over again: just point to this section. Please feel free to cross-post this message at other venues on this wiki if you think it will help keep users from being caught by surprise by this change.

If you want to learn more, please see https://phabricator.wikimedia.org/T133800; if you have feedback or think you need to report a bug with the dialog, you can post in that task (or at mediawiki.org if you prefer).

Thanks for your attention and happy editing, Elitre (WMF) 16:47, 16 May 2016 (UTC)

Would it really have been impossible or hard to switch this off for registered and logged-in users? DCDuring TALK 17:16, 16 May 2016 (UTC)
The task says so. I'm also here for a reminder—this wiki features a Single Edit Tab system; if you're not sure you know or remember how that works, you can read the guide (which details, among other things, how to switch between editors from the buttons on the toolbar); you can change your editing settings at any time, by the way. (I had also written a very quick intro to the visual editor, in case anyone is interested). Best, --Elitre (WMF) (talk) 14:36, 17 May 2016 (UTC)

Looking for someone to help with FWOTDs

Hey folks. Due to personal and health issues, I’m unable to spend as much time on Wiktionary as I’d like to. Sooner or later, I won’t have time to keep track of foreign words of the day consistently anymore. For this reason, I really need someone who can share the burden of maintaining the project. While technically anyone who wants to set FWOTDs is free to do so, within the limits of the guidelines, if you want my sanction as an “official” maintainer, you must meet the following criteria:

  • know how to find your way through linguistic literature: half our featured words are pilfered from various articles published in god-forsaken periodicals and magazines, and you will have to be able to find this stuff if we are to prevent FWOTD from being a rotation of major western European languages;
  • have common sense: if you think it would be funny to feature penis or something like that, you’re out. FWOTD is serious motherfuckin’ business!
  • willingness to take the blame: why do you think I’m asking for help anyway? I need someone who can be officially blamed for not noticing my mistakes!

Note that maintaining FWOTD involves a lot more than just picking words to feature and updating the templates. Spontaneous nominations are only enough for 10-20% of all featured words, despite my bias towards choosing them; the rest are words that I find or create myself, and this takes a lot of time. I also have to create and upload images for words with unusual scripts, and keep an eye out for vandalism on featured words during their day of featuring.

If anyone is interested, reply here and I will send you detailed instructions. — Ungoliant (falai) 02:54, 20 May 2016 (UTC)

Those of you who remember the original FWOTD vote will know that I have slacked off to an embarrassing degree, and I feel very guilty about this. If someone who really wants to work on this and has the requisite skills speaks up, I would be happy to let them do it, but I should take up this burden to make up for leaving so much work on Ungoliant's plate. I do have enough time, and I knew how to run it (although I may have forgotten some of the details by now). —Μετάknowledgediscuss/deeds 03:58, 20 May 2016 (UTC)
I'm happy to help out as time permits, but I definitely don't want to be the primary person responsible for FWOTD. —Aɴɢʀ (talk) 14:25, 20 May 2016 (UTC)
Likewise, could for example help to clean up nominated articles, do research etc. Thanks Ungoliant for your work on this, FWOTD is one of the things I enjoy most here, both as a reader and contributor. – Jberkel (talk) 10:02, 21 May 2016 (UTC)
Thank you so much guys! I think you’re already familiar with the technical and regulatory aspects, so I’ll just mention the unwritten rules that I try to follow:
  • hard limit: no more than 2 FWOTDs in the same language per month;
  • soft limit: no more than 1 FWOTD " " " (I’ve only been able to pull this off a few times);
  • prefer featuring other people’s nominations over your own;
  • keep FWOTDs that are in the same language, or in chronological variants (i.e. Spanish and Old Spanish), somewhat far apart;
  • check the history page of entries; if the information was added by someone who you’re not sure is trustworthy, try to check the references and citations to see if they’re accurate;
  • no more than one focus week per month (I’ve only had this option once though);
  • wait for at least a day before featuring words that are posted in the nominations; (sometimes I had to ignore this rule because there were no other options that wouldn’t break the hard limit);
  • add {{was fwotd}} to the page immediately after featuring the word. You are going to forget it otherwise, have no doubt about it;
Thanks again. — Ungoliant (falai) 16:16, 21 May 2016 (UTC)
Thanks, Ungoliant. I'll start tidying up and setting words. @Angr, Jberkel: Please feel free to add words as you wish, or just nominate more if you prefer. I'm fine with being responsible, as long as you guys help out! —Μετάknowledgediscuss/deeds 16:52, 21 May 2016 (UTC)
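
The hard and soft limits in the list above are simple enough to check mechanically. The following is only a minimal Python sketch, not an existing FWOTD tool: the function name, the (date, language) input format, and the sample schedule are all hypothetical, and it covers only the per-language monthly limits, not the other unwritten rules.

```python
from collections import Counter
from datetime import date

HARD_LIMIT = 2  # "no more than 2 FWOTDs in the same language per month"
SOFT_LIMIT = 1  # "no more than 1" where it can be managed


def limit_violations(schedule):
    """Report which (year, month, language) groups exceed a limit.

    `schedule` is an iterable of (date, language) pairs; this input
    format is an assumption made for the sketch.
    """
    counts = Counter((day.year, day.month, language) for day, language in schedule)
    violations = {}
    for key, n in counts.items():
        if n > HARD_LIMIT:
            violations[key] = "hard"
        elif n > SOFT_LIMIT:
            violations[key] = "soft"
    return violations


# Hypothetical schedule, purely for illustration.
sample = [
    (date(2016, 5, 1), "Irish"),
    (date(2016, 5, 9), "Irish"),
    (date(2016, 5, 20), "Irish"),
    (date(2016, 5, 3), "Turkish"),
    (date(2016, 5, 12), "Hungarian"),
    (date(2016, 5, 25), "Hungarian"),
]
print(limit_violations(sample))
```

In this made-up month, Irish would break the hard limit and Hungarian only the soft one; Turkish is within both.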

Stress positioning in Estonian IPA

I changed something in küll only to notice there's the same thing in tool#Estonian. Am I correct that [t'oːl] implies that there is a syllable break between /t/ and /oː/ and hence this IPA practice is wrong? Korn [kʰũːɘ̃n] (talk) 09:35, 21 May 2016 (UTC)

There is no syllable break, but the difficulty for the template (and humans) is knowing where the syllables are broken up. —CodeCat 18:09, 21 May 2016 (UTC)
The stress sign isn't there by accident, it gets triggered by the actual input, which was "k`üll". So I think someone wanted that. Korn [kʰũːɘ̃n] (talk) 18:11, 21 May 2016 (UTC)
That's the pronunciation format used by ÕS, specifically to avoid having to be specific about where the syllable breaks are. The backtick ` indicates an overlong syllable, and is placed before the vowel. Why do we need to know syllable breaks to indicate stress in IPA? The vowel is the nucleus of the syllable, that's where stress should be placed. —CodeCat 18:12, 21 May 2016 (UTC)
See the documentation of {{et-IPA}} for the notation, btw. It's a simplification of what ÕS uses. —CodeCat 18:18, 21 May 2016 (UTC)
We don't need to know it, but in IPA, the character ˈ does not imply a long vowel but primary stress. So what we're currently displaying is not an overlong vowel but a long vowel carrying a primary stress which starts after the initial consonant. So the display ending up with the user doesn't make sense. Korn [kʰũːɘ̃n] (talk) 21:59, 21 May 2016 (UTC)
The overlength is not displayed in IPA because it's not obvious how. It's not just the vowel that gets lengthened, but the syllable coda as well. Even consonant clusters can be lengthened, though I don't know exactly what that entails phonetically. In any case, the feature is suprasegmental: it exists not on the phoneme level but on the syllable level. That said, overlength is always accompanied by stress, so it's always ok to assume that a syllable indicated as overlong is stressed. That's what the template does. —CodeCat 01:24, 22 May 2016 (UTC)
Estonian has initial stress anyway, regardless of initial syllable weight, so that still ends up being uninformative.
A simple way to notate overlength would be doubled length marks for long vowels and geminates (e.g. küll [külːː], tool [toːːlʲ]); it's only clusters that are more of a problem. --Tropylium (talk) 19:41, 13 June 2016 (UTC)
Estonian doesn't always have initial stress, loanwords can have noninitial stress, so it needs to be indicated. —CodeCat 20:27, 13 June 2016 (UTC)
OK, rephrasing: Estonian always has initial stress in native vocabulary, so this is not directly related to overlength; so marking overlength as stress does not actually provide any sensible information. --Tropylium (talk) 03:02, 18 June 2016 (UTC)
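
To make the two notations discussed in this thread concrete, here is a minimal Python sketch. It is not how {{et-IPA}} works; the function names are hypothetical, the input is assumed to be a lowercase word with a backtick placed before the vowel of an overlong first syllable (as in "k`üll"), and the base transcriptions fed to the second function are assumed for illustration. The first function puts the IPA primary stress mark before the syllable onset instead of before the vowel; the second applies the doubled-length-mark idea suggested above.

```python
VOWELS = set("aeiouõäöü")


def stress_before_onset(marked: str) -> str:
    """Replace a first-syllable backtick with ˈ placed before the onset.

    Only the word-initial case is handled, where the onset is simply
    everything that precedes the backtick.
    """
    head, sep, tail = marked.partition("`")
    if not sep:
        return marked  # no overlength mark present
    if any(ch in VOWELS for ch in head):
        raise ValueError("backtick is not in the first syllable")
    return "ˈ" + head + tail


def double_length_marks(ipa: str) -> str:
    """Notate overlength by doubling every length mark in the string."""
    return ipa.replace("ː", "ːː")


print(stress_before_onset("k`üll"))
print(double_length_marks("toːlʲ"))
```

Consonant clusters, which the thread identifies as the harder case, are deliberately left out of this sketch.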

New logo 2

I created Wiktionary:Votes/2016-05/New logo 2, to start in a week. It proposes a derivative of the tile logo for the English Wiktionary logo. A rationale is at Wiktionary talk:Votes/2016-05/New logo 2#Rationale. Let us postpone the start of the vote if required by discussion. --Dan Polansky (talk) 08:01, 22 May 2016 (UTC)

Merge all Prakrits

I think all the Prakrits should be merged into a single language for organizational purposes; do we really need ~5 languages all with the same entry and meaning at 𑀅𑀕𑁆𑀕𑀺 ‎(aggi)? For all intents and purposes, the Prakrits are just dialects. —Aryamanarora (मुझसे बात करो) 18:51, 23 May 2016 (UTC)

We'd need more information to decide this. How different are they? Mutual intelligibility? —CodeCat 19:12, 23 May 2016 (UTC)
[1] (see bottom of page 8, top of page 9) – they are mutually intelligible, but learning a little Sanskrit greatly helped communication. They were similar enough to be used interchangeably in the same works; see Dramatic Prakrits. Of course, there were minor orthographical differences in inflection, but we can settle on Maharashtri Prakrit as a standard (it's the best documented) and build off of it. —Aryamanarora (मुझसे बात करो) 19:28, 23 May 2016 (UTC)
A good analogy is Vulgar Latin, spoken by the common people and thus having many dialects and varying spelling systems. —Aryamanarora (मुझसे बात करो) 19:31, 23 May 2016 (UTC)
How do other sources handle it? I'm reminded of the situation with Ancient Greek, where there are sometimes quite striking differences between dialects (Doric -onti vs Attic -ousi(n)). But for Ancient Greek, Attic is mostly the standard form, except in a few cases (τέσσαρες ‎(téssares), which is apparently not the form of any dialect?). —CodeCat 20:11, 23 May 2016 (UTC)
Maybe not the form of any older dialect, but it is the Koine form (it's in both LXX and NT). —Aɴɢʀ (talk) 21:11, 23 May 2016 (UTC)
@CodeCat: Most dictionaries and grammars focus on Maharashtri Prakrit and detail the Dramatic Prakrits second, and often exclude the lesser Prakrits. We can use {{lb}} to differentiate between dialects. —Aryamanarora (मुझसे बात करो) 23:31, 23 May 2016 (UTC)
  • I don't really have an opinion, but I presume that @-sche would probably like to be made aware of this discussion. —Μετάknowledgediscuss/deeds 04:06, 24 May 2016 (UTC)
Thanks for the ping. I'm more knowledgeable of the other kind of Indian language than this kind. I'm intrigued that Wikipedia's article on w:Prakrit says Ardhamagadhi is the definitive Prakrit, but the literature supports Aryamanarora's statement that it is rather "Maharashtri, which [...] with orthodox Jain scholars generally, is Prakrit proper" (Ramananda Chatterjee, 1927, in The Modern Review, volume 41), "Maharashtri [is] considered the Prakrit par excellence" (Thomas R. Trautmann, 2006, Languages and Nations: The Dravidian Proof in Colonial Madras, ISBN 0520244559). Does the Wikipedia article need to be corrected?
A. C. Woolner (in his 1986 Introduction to Prakrit) says "it may be understood that the different Prakrits were mutually intelligible among the educated"; G. C. Pande (1990, Foundations of Indian Culture) says "the Prakrits were mutually intelligible". - -sche (discuss) 07:27, 24 May 2016 (UTC)
@-sche: I think I understand the discrepancy now; Maharashtri is the main Prakrit of Jainism, Ardhamagadhi is for Hinduism, and Pali is for Buddhism. (Yes, Pali is a Prakrit, but is considered a separate language for sectarian reasons). —Aryamanarora (मुझसे बात करो) 13:46, 24 May 2016 (UTC)

@CodeCat, -sche, Metaknowledge So is this a yes? —Aryamanarora (मुझसे बात करो) 00:14, 25 May 2016 (UTC)

Yes, merge the ones which have "Prakrit" in their names once we decide on a code. Should Pali also be merged, in your view? Authorities have traditionally treated Pali differently from the Prakrits, but for non-linguistic reasons, as you note. - -sche (discuss) 02:49, 25 May 2016 (UTC)
We should leave Pali separate; there are too many entries and Pali has some of its own developments that set it apart from the rest of the Prakrits (multiple scripts, strong East Asian Buddhist influences, etc). —Aryamanarora (मुझसे बात करो) 16:37, 25 May 2016 (UTC)

Also, we could use pra as a language code; it is the collective code in the ISO standard for all Prakrits. —Aryamanarora (मुझसे बात करो) 13:50, 24 May 2016 (UTC)

I'll point out that currently the Prakrit languages are acting as the ancestor languages for several different branches of the Indo-Aryan family (seen here if you scroll way down). We've had some issues in the past of people trying to say words are inherited from Sanskrit when Sanskrit has no direct descendants. If we do merge them, we definitely should have etymology only languages for them. —JohnC5 14:48, 24 May 2016 (UTC)
@JohnC5: Um, (Vedic) Sanskrit is the direct ancestor of all the Indo-Aryan languages; Classical Sanskrit seems to be what you're talking about. Anyway, we definitely need the current codes to remain intact, as many entries reference certain Prakrits (CAT:Hindi terms derived from Sauraseni Prakrit). —Aryamanarora (मुझसे बात करो) 00:14, 25 May 2016 (UTC)
Sorry, yes, Vedic is apparently what I meant. —JohnC5 00:29, 25 May 2016 (UTC)
Is pra a family code? If so, we shouldn't reuse it as a language. —CodeCat 00:43, 25 May 2016 (UTC)
pra is both an ISO-639-5 family code and an ISO-639-2 language code. If we merge the Prakrits, do we still need it as a family code? If not, we could use it as a language code, like nah. Otherwise, how about "inc-pra"? - -sche (discuss) 02:49, 25 May 2016 (UTC)
Both of them would work for me, but pra is shorter, and a family code wouldn't be needed if we merged all of the Prakrits. —Aryamanarora (मुझसे बात करो) 16:37, 25 May 2016 (UTC)
We already have Proto-Indo-Aryan inc-pro for general ancestor needs (and which is marginally distinguishable from Vedic in a few features); I am not sure how much benefit there is in maintaining further ancestor stages. Ardhamagadhi as the ancestor of Eastern IA (Assamese et al.) and Maharashtri as the ancestor of Southern IA (Marathi et al.) is probably at least defensible, but my understanding is that there's not a whole lot of consensus on the genetic classification of the New IA varieties, including also the exact definition of the Eastern and Southern groups. --Tropylium (talk) 08:23, 27 May 2016 (UTC)
@Aryamanarora I would like to weigh in on the matter and say that the Prakrits should not be merged all together. They are independent languages with different grammars, even though they are very similar. The old Sanskrit plays that often incorporated all the Prakrits did so because they knew that their audience was of the class that would have knowledge of the various languages and their differences. There is a reason that these prakrits have been named separately and given individual grammatical treatises by the various Indian grammarians. And, to argue against merging them over mutual intelligibility, Scots is kept as a separate language despite extremely high levels of intelligibility with English. DerekWinters (talk) 21:06, 25 May 2016 (UTC)
@DerekWinters Their grammars are not that different; they have the same cases, numbers, genders, and inflections. The only differences are spelling, e.g. third-person singular indicative in verbs is marked by -aï in Maharashtri but with -adhi in Sauraseni. They are more similar to each other than the Ancient Greek dialects. There were no "old Sanskrit plays"; the plays were all Prakrit (see Dramatic Prakrits), but certain characters spoke different dialects. Finally, it would make entries so much easier if we merged all of them; do we really need 5-6 entries with the same meaning at "aggi" and "hattha"? —Aryamanarora (मुझसे बात करो) 22:41, 25 May 2016 (UTC)
Also, Prakrit was a vernacular; the people who spoke Sanskrit (Brahmins) simply ignored it as a lower-class language; they would have little knowledge of it. —Aryamanarora (मुझसे बात करो) 22:43, 25 May 2016 (UTC)
@Aryamanarora Sorry I meant the old Indian plays (but also, do look at Sanskrit drama, I believe the Mṛcchakatika is quite famous). And also one could say the same about the cases and numbers and all regarding Avadhi, Braj Bhasha, Kannauji, Hindi, etc. yet they are certainly separate languages. And again, we maintain Scots as separate regardless of its similarities to English. And it's not really a valid reason to say it would make the editor's job easier, because nothing is required of the editor. If you wish, all you need add are the Maharashtri prakrit ones, and someone else someday will add the others. But I do maintain that they are indeed separate languages. DerekWinters (talk) 00:51, 26 May 2016 (UTC)
Also I do believe that Magadhi was quite divergent from the other two Dramatic ones. DerekWinters (talk) 00:52, 26 May 2016 (UTC)
Also, regarding the Brahmins and the prakrits being a vernacular, they were thus spoken by the people, which would include a lot, if not all of the Brahmins. Classical Sanskrit was a very artificial register and during the prakrit era was most certainly only taught as a second language. Also, there are numerous grammars on the prakrits by native grammarians, so they certainly were not ignored. DerekWinters (talk) 00:57, 26 May 2016 (UTC)
@DerekWinters All your points are very good, and I realize some of my claims were false. However, I still think we should merge them. This is analogous to the situation of the Ancient Greek dialects, where many dialects diverge from the traditional form (Attic Greek) but ultimately we classify them as one language. Our entries are very well organized as a result; see τέσσαρες ‎(téssares), which is what I think a good unified Prakrit entry would look like. Yes, Magadhi diverges quite a bit, and Gandhari uses a different script, and Elu somehow made it to Sri Lanka. However, they all have such similar characteristics that they would be decently comprehensible among monolingual Prakrit speakers. See the text example at Magadhi Prakrit#Pali and Ardhamāgadhī; even though Pali is a wholly earlier and more divergent stage from Prakrit, the two texts are nicely comparable.
Also, the inflection is more than just similar; it is often the same: See this grammar comparing Sauraseni and Maharashtri declensions of "putta" (son, < Sanskrit पुत्र ‎(putra)). —Aryamanarora (मुझसे बात करो) 01:56, 26 May 2016 (UTC)
@Aryamanarora I see where you are coming from with the inflections, but I believe this may be something like the unification of Chinese. Written, they seem similar (although I would argue that the grammars are much more divergent for the Chinese languages), but spoken, a monolingual speaker of one prakrit would have difficulties understanding the speech of a monolingual of another prakrit. I personally believe this is grounds enough to keep them separate, but I understand if the community doesn't agree. But I must caution, if we are to have entries for unified prakrit, we should have inflection tables for all the varieties attested, and we are likely to have citations for the various varieties. Furthermore, with the phonetic differences among certain words, I believe this would lead to very cluttered and messy entries. I believe that all of this information could be better handled in individual entries. DerekWinters (talk) 02:51, 26 May 2016 (UTC)
@DerekWinters While there would be some difficulty in comprehension, I doubt a monolingual Prakrit speaker wouldn't be able to at least understand the gist of another Prakrit. Literature agrees with me; see -sche's references above. We should definitely make inflection tables for all the Prakrits; we have enough information to do so. The phonetic differences aren't too bad. Mainly, there's a little bit of consonant dropping and sibilant mergers between Prakrits, but IMHO it isn't so bad. I can make some inflection tables right now if needed. —Aryamanarora (मुझसे बात करो) 22:26, 27 May 2016 (UTC)
@Aryamanarora I agree that there are similarities, but we also maintain such differences in several languages here, like Portuguese, Galician, and Fala; Spanish, Asturian, Leonese, and Extremaduran; Persian and Tajik; German and Yiddish, etc. You could definitely argue that they are individual languages, but one could also argue that they are simply dialects of one larger language. And you are correct, a monolingual prakrit speaker would probably understand another prakrit somewhat, especially among the educated, but I do not think that is a fair metric, as the educated would have learned Sanskrit, enabling them significant comprehension of any of its immediate daughter languages. But, regardless, we have no way of truly knowing, and as such I think we should maintain the separation that has been held by the writers of the prakrits. They viewed them as separate, and I believe for decent enough reason. DerekWinters (talk) 02:21, 28 May 2016 (UTC)
@DerekWinters Primary sources aren't always reliable for language distinction; look at modern day Serbo-Croatian, Romanian-Moldovan, Hindi-Urdu, etc. You're right though, we really have no way of knowing. I'll stick with the status quo for Prakrit, and continue to treat them as separate languages. —Aryamanarora (मुझसे बात करो) 15:22, 28 May 2016 (UTC)


Since the proposal at Wiktionary:Beer parlour/2016/March#Etymology section for non-lemmas was inconclusive, I've instead created this template to place in non-lemma etymology sections. The displayed text may need improvement, feel free to propose or make changes. —CodeCat 20:23, 23 May 2016 (UTC)

Perhaps it could say something less jargony like "See etymology on main entry/entries." rather than just "Non-lemma forms." (which wouldn't mean much to most readers) Pengo (talk) 10:45, 24 May 2016 (UTC)
I wholeheartedly concur with Pengo. I like their phrasing as well. —Μετάknowledgediscuss/deeds 07:45, 26 May 2016 (UTC)
I think you could say "See etymology on main entry." as a user has only one in mind. Why wouldn't there be a link to the appropriate L2 section or even the appropriate Etymology section? Presumably there is a language parameter in the template. DCDuring TALK 10:53, 26 May 2016 (UTC)

Rename Category:Fictional abilities to Category:Metaphysical abilities

The title states one half of the proposal. The other half is to move it to Category:Parapsychology. --Lo Ximiendo (talk) 21:22, 26 May 2016 (UTC)

Parapsychology is a real (pseudo)science that investigates actual events; that term would not be applied to deliberately fictional superpowers like those in comic books. Equinox 21:55, 26 May 2016 (UTC)
By the way, "metaphysical" means "beyond physical". And performing a metaphysical ability, such as telepathy, IS a paranormal activity. --Lo Ximiendo (talk) 21:59, 26 May 2016 (UTC)
One of the best places to hide things you don't want people to take seriously is in fiction. --Lo Ximiendo (talk) 22:00, 26 May 2016 (UTC)
@Equinox: Posted a belated reply. --Lo Ximiendo (talk) 10:26, 27 May 2016 (UTC)

Initialisms of proper nouns that wouldn't meet CFI

What are our criteria for including these? Should they be in lemma categories? DTLHS (talk) 23:49, 26 May 2016 (UTC)

Initialisms are lemmas regardless. They are full noun lemmas after all, and can have their own inflections. —CodeCat 00:16, 27 May 2016 (UTC)
They are not SoP; someone unable or too impatient to work it out from context might want to know what they mean. Whether they are truly useful is more questionable, but by that criterion many entries would be in trouble. DCDuring TALK 01:03, 27 May 2016 (UTC)

cuprum from Cyprium or from Κύπρος

Shouldn't we have entries for expressions like aes Cyprium? Cyprus does come from Κύπρος, but cuprum does not come from it directly; it actually comes from aes Cyprium, or at least from Cyprium. I listed cuprum as a derivative at Cyprium. Sobreira (talk) 08:50, 27 May 2016 (UTC)

I see nothing wrong with having an entry for aes Cyprium. It's not SOP, as "Cyprian brass" does not obviously mean "copper". —Aɴɢʀ (talk) 09:14, 27 May 2016 (UTC)
I think, given the difference in the vowel, that cūprum is an older borrowing. —CodeCat 12:36, 27 May 2016 (UTC)

User-friendly reference sheet

I've been here for some years now, and every now and then I encounter a problem and think: How will anyone solve that without having to ask? Now, asking or looking up the help archives is not bad - for us - but we're on the internet and people are not necessarily super willing to invest time to figure out things that seem obscure to them. Wiktionary should be intuitive and easy to start with and not confusing, because I, at least, know some people whose fledgling interest will be instantly destroyed if they think: oh, that looks too complicated/confusing for me.
What I have wished for for years is a clearly visible link on the front page to a how-to containing the most basic information for entry editing:

  • Link to the list of language codes
  • a (curtly!) commented table of every section of an entry, it will probably suffice to just list the non-optional ones and then link to Wiktionary:Entry_layout#Additional_headings
  • quick explanations of how templates work and how to make them
  • explanation of wiki-formatting and how to create tables
  • short overview of namespaces and what you can expect to find there
  • short information that "Category: Language" and "Wiktionary: About Language"-pages exist and one or two sentences what they contain
  • Mentioning of Wiktionary:Discussion_rooms
  • An overview of the most important templates - noted by Jberkel
  • and whatever else absolute basics you can think of

Of course we have most of this information (for template workings, you have to look up MediaWiki, on the other hand), and my proposal here is not meant to imply that all of this information is presented to the user badly. But it's scattered across the respective sections, and it's sometimes stuffed with detailed explanations that are good to have if you want a thorough understanding but that get in the way of an overview of the actual how-to part. And nobody can claim they are absolutely necessary for a new user, because, let's face it, a very good deal of new users probably just hit edit and copy-paste. These detailed explanations should of course be reachable from the reference sheet I'm proposing, but what I want for Wiktionary is that a user can do a single click, take 5 seconds to look and think: Oh, I could do that. I'm willing to make a draft of what I have in mind if you don't heavily oppose my idea. (Draft below.) I'm all ears for opinions and arguments. Korn [kʰũːɘ̃n] (talk) 09:26, 31 May 2016 (UTC)

Great idea, this comes up over and over again, and because of this we're definitely losing contributors. A first step would be to identify the key templates and concepts new users need to be familiar with. I also have some ideas how we could make the documentation more accessible and user-friendly, I hope I get to work on some of them at Wikimania next month (anyone else going to be around by chance?) – Jberkel (talk) 11:10, 31 May 2016 (UTC)
Should Help:How to edit a page not cover this? In any case it would be a good starting point for any new guide such as you propose (and which I do support). — Kleio (t · c) 18:41, 31 May 2016 (UTC)
I think the goal is something shorter, not requiring much reading. If it is feasible, it would be useful. We should make sure that any content added in accord with the reference sheet does not get excessively rough handling from patrollers. Perhaps users should get a free pass if they insert a template saying they were trying to operate in accordance with the reference sheet. Their entries could be reviewed with an eye to tutoring them.
Most good contributions now are additions to or corrections of existing entries: translation, additional definitions, related terms etc. IMO these should be a focus of the reference sheet. DCDuring TALK 19:39, 31 May 2016 (UTC)
Yes, I agree. The current help page does not truly give a clear and concise tutorial to adding information properly to an article, it reads like rather dry documentation. That said, while additions to existing lemmata for some languages (especially English) constitute a majority of constructive edits, not all languages are quite as well represented on Wiktionary. For example, dead languages like Old English, Old Norse and Gothic are three languages I know of that would usually benefit more from additional lemmata than additions to existing ones. This is also true for most minority languages, which similarly often lack basic vocabulary. That may be something to keep in mind. — Kleio (t · c) 22:30, 31 May 2016 (UTC)
I expect that the dead-language folks would tend to be more willing to put up with and more capable of handling some complexity. In any event there is some language-specific knowledge required as well.
But because this is English wiki we tend to get (and need to get) as contributors native English speakers, who are often not very sophisticated in their understanding of language matters and are often monolingual. I think they are important targets for at least one reference sheet. Folks for whom English is a second language seem to typically have at least a bit more language sophistication, but they also have different needs, eg, the need to write definitions (or select glosses) in English, not exclusively relying on old bilingual dictionaries that use words not really suitable for a defining vocabulary for a current dictionary. DCDuring TALK 00:21, 1 June 2016 (UTC)
Unfortunately, languages like Old Norse, Old English and Gothic tend to attract people with agendas, or for whom the languages are just an intellectual game. For instance, we have a French IP user who's very well-versed in a number of difficult languages, but they don't like to be limited by things like attestation and historical authenticity. We have to watch constantly to keep them from adding Gothic and Old Church Slavic translations for things like television and Esperanto.
Then there are the people who find obsolete references like this and this and this, so they feel qualified to create entries and add etymologies. Making things easier for such people to start editing may not be a good idea... Chuck Entz (talk) 03:28, 1 June 2016 (UTC)
People with an agenda and the will to dig for obscure references are people with enough time and passion to get into this anyway. We need to make things look simple and non-threatening for what I always call passers-by. Korn [kʰũːɘ̃n] (talk) 11:18, 1 June 2016 (UTC)

I completely agree with Korn. Even for the experienced user it's oftentimes tedious to find the documentation and keep up with the ever-changing template zoo. Matthias Buchmeier (talk) 20:39, 31 May 2016 (UTC)

  • User:Korn/draft This is the rough sketch of what I was thinking of. Actually, "how to make a core entry" already has too much text for my taste, but I don't think one can subtract any information without creating a lack. Korn [kʰũːɘ̃n] (talk) 10:00, 1 June 2016 (UTC)

June 2016

Colored box around closed votes?

I think it would be useful if we put a colored box around votes after they're closed, the way we do with RfDs when they're archived. Purplebackpack89 13:47, 1 June 2016 (UTC)

It might be nice, but you could overlook that, too, like you overlooked the "Status/Votes" column. An absent-minded mistake by you doesn't mean we have to rearrange everything to make it impossible for you to make the same absent-minded mistake again. Most people would just say "oops- my bad" and let it go. Chuck Entz (talk) 01:29, 2 June 2016 (UTC)
Purplebackpack89 also overlooked the decision at the end of the vote page. --Daniel Carrero (talk) 01:59, 2 June 2016 (UTC)
@Chuck Entz I'm not the dullest tool in the shed. If I make that absent-minded mistake, it's likely others would too. It's unlikely I'd notice something like the whole vote being shaded red, blue or green. @Daniel Carrero I don't think you're seeing the problem. Because the decision is at the bottom instead of the top, it is BEYOND the voting section. Purplebackpack89 16:47, 2 June 2016 (UTC)
Yeah, but it also has the close date near the top of the vote. Maybe you should give the "I'm not the dullest tool in the shed" thing a rest. —Μετάknowledgediscuss/deeds 17:49, 2 June 2016 (UTC)

Closed votes still in "current votes" section

Should votes that are closed be removed from the "current votes" section and put in a "recently closed" section? It seems bad form to have open votes and closed votes both as "current" Purplebackpack89 13:49, 1 June 2016 (UTC)

No. Not worth the extra visual clutter and the extra work. It's hard enough to make things idiot-proof- do we have to make things Purplebackpack89-proof, too?
Please note that I'm not calling you or likening you to an idiot (an idiot would be easier to anticipate). Chuck Entz (talk) 01:43, 2 June 2016 (UTC)
@Chuck Entz They have these fail-safes on Wikipedia. It's not like I'm suggesting anything revolutionary. And you ARE kinda suggesting that mistakes I make are mistakes nobody else could ever conceivably make, while defending a very confusing process. Purplebackpack89 16:48, 2 June 2016 (UTC)
Absent-minded mistakes aren't related to level of intelligence- if anything, people with more going on in their minds are more likely to make them. I also am not saying that you make mistakes that nobody else makes (except with regards to misinterpreting others' intent- but that's different). No, my point was that your response to your mistakes is different: there's no need to find someone or something to blame for an absent-minded mistake- we all make them, and no one would bat an eyelash at your admitting to one. You're not going to be singled out from the midst of the flock and eaten by wolves if you show signs of weakness. Chuck Entz (talk) 02:40, 3 June 2016 (UTC)
This isn't about blame, though, it's about improvement. IMO, there are a lot of ways in which Wiktionary is organized that could be better. This is one of them. Purplebackpack89 04:11, 3 June 2016 (UTC)

Sending thanks

This may seem bloody obvious to some, but…! On the history page you are given the option of thanking an editor. When you choose it, you are asked "Do you want to send public thanks? Yes or No" - the question could be taken as ambiguous. If I choose "No", am I (1) cancelling the thanks, or (2) sending thanks privately? What does "public thanks" mean? Where are they published?   — Saltmarshσυζήτηση-talk 06:38, 2 June 2016 (UTC)

Special:Log/thanks has a list of them. Wyang (talk) 06:46, 2 June 2016 (UTC)
Yes, I wasn't too sure what it meant when I first thanked someone for an edit. I went with "Yes" just in case, but I imagine many newer users are also confused by the ambiguity. Andrew Sheedy (talk) 07:41, 2 June 2016 (UTC)
@Saltmarsh "Send public thanks for this edit?" If you click no, you are cancelling the thanks.
The question probably should be: "Do you still want to send thanks, with the full knowledge that it will be public? (Yes/No)"
Just to make sure, I clicked "Thanks" in your Beer Parlour edit and then clicked No. If you received my thanks, then I was wrong. --Daniel Carrero (talk) 12:40, 2 June 2016 (UTC)
That's a bit long, though. "Really send public thanks?" would suffice. Equinox 13:38, 2 June 2016 (UTC)
Am I missing something, or does the thanks log not in fact indicate the specific edit that was "thanked"? Equinox 13:40, 2 June 2016 (UTC)
If we want to configure this, it seems the message to edit is mediawiki:Thanks-confirmation2.​—msh210 (talk) 14:22, 2 June 2016 (UTC)
It was "Send public thanks for this edit?" and I've changed it to "Send thanks for this edit? It will be public.". How's that?​—msh210 (talk) 15:52, 2 June 2016 (UTC)
I like Equinox's version ("Really send public thanks")--Dixtosa (talk) 15:54, 2 June 2016 (UTC)
Personally, I don't like when software uses the word "really". It sounds too colloquial. --WikiTiki89 16:01, 2 June 2016 (UTC)
(You should see the appalling slang in Office 2013!) Alternatively, we could just reduce it to "Send thanks for this edit?", and document the fact that thanks are public elsewhere. We don't warn about public-ness for other common wiki operations. Equinox 16:57, 2 June 2016 (UTC)
I've seen it, and I don't like it. --WikiTiki89 17:40, 2 June 2016 (UTC)
Equinox's Send thanks for this edit? seems to solve the problem succinctly.   — Saltmarshσυζήτηση-talk 04:59, 3 June 2016 (UTC)
Yeah, but it doesn't solve the problem of notifying the user that the thanks will be public. --WikiTiki89 14:27, 3 June 2016 (UTC)
Why is it necessary to double-check that a user really wants to do what he just said to do? I can understand having that sort of failsafe in place for something potentially damaging, but not for something as innocuous as sending thanks. Can't we just eliminate the message altogether and allow clicking on "thank" to immediately do what it says it does? —Aɴɢʀ (talk) 14:23, 3 June 2016 (UTC)
It's right next to "undo". Really not the kind of situation where you want a slip up. Korn [kʰũːɘ̃n] (talk) 17:16, 3 June 2016 (UTC)

Potential Bot for Adding LSJ and L&S Links to Ancient Greek and Latin Entries

Hello. In the last few days I have edited the L&S and LSJ templates and modules so that links to the dictionaries resolve correctly from the page names, without use of arguments, in a very large proportion of cases. The exceptions mainly involve proper nouns, affixes, non-lemma forms, and alternative spellings, which are not precisely bugs. I have tested a robot called OrphicBot to add LSJ external links to the subset of 4,062 of Wiktionary's approximately 7,000 Ancient Greek entries which are not already linked, which are lemmas, and for which the bare template is tested to produce a valid result. Since, for example, almost all German entries link to the Duden dictionary, it seems consistent to include a link to a freely available dictionary for Greek. I also think it could be quite helpful, since too much of the inconvenience in Hellenistic pursuits is perhaps merely typographical in nature. The Wiktionary Latin section is much more developed, with nearly 30,000 lemma entries, as I recall. If it seems reasonable to others, I would like to do the equivalent there and add links to the L&S dictionary via template where these are not already present. The source code (albeit grossly formatted, and in perhaps a still rough iteration) is linked on the user page of OrphicBot, and a small test run can also be seen in the catalogue of that user's contributions. If these edits seem reasonable to others here, I will put the bot user status question to a vote in the voting area. Thank you. Isomorphyc (talk) 03:04, 3 June 2016 (UTC)

I don't know about L[ewis] & S[hort], but we already have a template {{R:LSJ}} that makes links to Liddell and Scott. The problem is the large number of Ancient Greek entries that don't use it, but instead have merely a link to Wikipedia's article on the Liddell and Scott dictionary. What I'd like a bot to do is go through and change all instances of *[[w:LSJ|LSJ]] to *{{R:LSJ}}, adding any necessary arguments as well. —Aɴɢʀ (talk) 14:28, 3 June 2016 (UTC)
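The replacement described above could be sketched roughly as follows, in Python. This is an illustration only and not OrphicBot's code: the function name `replace_lsj_mentions` is hypothetical, and the sketch handles only a bullet line that consists of the bare `[[w:LSJ|LSJ]]` link, without adding any template arguments.

```python
import re

# A bullet line whose only content is the Wikipedia link [[w:LSJ|LSJ]].
# Lines with extra text after the link are deliberately left alone, since
# they may carry bibliographical detail that needs a manual look.
LSJ_MENTION = re.compile(r"^\*[ \t]*\[\[w:LSJ\|LSJ\]\][ \t]*$", re.MULTILINE)


def replace_lsj_mentions(wikitext):
    """Return (new_wikitext, number_of_replacements)."""
    return LSJ_MENTION.subn("*{{R:LSJ}}", wikitext)
```

Entries where the bare template does not resolve would still need arguments added separately, which this sketch does not attempt.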
Edited for clarity: this is a problem I have felt as well. Here are options by increasing aggressiveness:
1) just add {{R:LSJ}} to External Links where valid, even if an LSJ-mention exists.
2) replace all LSJ-mentions with LSJ-templates where valid; potentially this effaces bibliographical information (negligibly, I think: if an LSJ mention happens to imply the paper dictionary where it differs from the Perseus version, or where it implies the preface rather than the headword entry, for example.) This is close to my preference.
3) move all existing LSJ links to External links for consistency. This is consistent and easy to use, but it destroys far too much bibliographical information.
Additional options/issues:
- Add an additional template to categorise lemmas with no valid entry in LSJ for manual linking. Mostly these are a few hundred non-Attic dialectal spellings and some number of Byzantine words. The former will usually be in LSJ with Attic spellings and the latter will not. A few other examples are prefixes and suffixes. The number is not large and I think this is worth doing.
- I would want to skip over inflected forms; given there are literally millions potentially, to destem and link seems like clutter.
Are there other Greek desiderata that can be addressed?
Isomorphyc (talk) 01:55, 5 June 2016 (UTC)
Hello @JohnC5, @Wikitiki89, @Angr, @Metaknowledge, @Chuck Entz -- thank you all for participating in my small discussion about my robot. I have opened a vote on this topic in the voting area. I would respect any of you if you chose not to support me in this, or to abstain, especially since I am so new here; but I would also be gratified should any of you choose to vote. Naturally I would be exceedingly gratified for any of your support. I hope that my recent activity has given some sense of the types of contributions I like to make to Wiktionary. I would still be very grateful for any further concrete References desiderata; I have posted a few blocks of samples on the user page of User:OrphicBot should it bring anything to mind that anyone might like-- or indeed might not like about the presentation. Thanks. Isomorphyc (talk) 07:27, 16 June 2016 (UTC)
Hi @JohnC5, I've been working on the pronunciations a bit. Does the new robot edit on diff:χρηστότης look worth proceeding with? I'm posting this here mainly so anyone interested knows I am working on this and can object if desired. It seems 1/8th of the grc-ipa-rows usages have all unambiguous vowels (approx. 235), and can be replaced with no arguments. If this looks reasonable I'll proceed with the following steps: 1) test for a, i, u in diphthongs and call grc-IPA with no arguments; 2) test for breves or macrons in head=... and generate arguments; 3) look into finding head=... arguments from LSJ, or possibly flagging ambiguous vowels missing head=... arguments, either from the grc-noun (and similar) or with a robot. Also: I noticed grc-ipa-rows produces unexpected output pretty regularly, so I am not using it to test the correctness of or generate any grc-IPA arguments. Isomorphyc (talk) 18:06, 20 June 2016 (UTC)
@Isomorphyc: The diff for χρηστότης looks good to me. I agree that all unambiguous entries should be changed, and your plan for proceeding seems logical. If we can cut down the number of ambiguous ones to a few hundred, we can fix the rest by hand. —JohnC5 00:07, 21 June 2016 (UTC)
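A rough Python sketch of the "unambiguous vowels" test discussed in this thread is given below. It is an assumption-laden illustration, not the bot's logic: the function names are hypothetical, and the rules (α, ι, υ are ambiguous unless they are part of a diphthong or carry a macron, breve, circumflex, or iota subscript) are a simplification chosen for the example.

```python
import unicodedata

# Assumed list of diphthongs; a diaeresis on the second vowel breaks the pair.
DIPHTHONGS = {"αι", "ει", "οι", "υι", "αυ", "ευ", "ου", "ηυ"}
AMBIGUOUS = set("αιυ")
# Combining macron, breve, circumflex (perispomeni), iota subscript.
LENGTH_MARKS = {"\u0304", "\u0306", "\u0342", "\u0345"}
DIAERESIS = "\u0308"


def split_letters(word):
    """Decompose a word into [base_letter, set_of_combining_marks] pairs."""
    letters = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            if letters:
                letters[-1][1].add(ch)
        else:
            letters.append([ch, set()])
    return letters


def has_ambiguous_vowel(word):
    """True if some α, ι or υ has a length that the spelling does not settle."""
    letters = split_letters(word)
    i = 0
    while i < len(letters):
        base, marks = letters[i]
        if i + 1 < len(letters):
            nxt, nxt_marks = letters[i + 1]
            if base + nxt in DIPHTHONGS and DIAERESIS not in nxt_marks:
                i += 2  # both vowels belong to a diphthong
                continue
        if base in AMBIGUOUS and not (marks & LENGTH_MARKS):
            return True
        i += 1
    return False
```

Under these assumptions, a page name for which `has_ambiguous_vowel` returns False is a candidate for calling grc-IPA with no arguments; anything else would need a head=... value or manual review.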

bot status vote

Planned, running, and recent votes
Ends    Title                            Status/Votes
Jun 17  User:Whymbot for bot status      passed
Jun 27  New logo 2                       support 16, oppose 7, abstain 6
Jun 30  User:OrphicBot for bot status    support 4, oppose 1, abstain 0
Jul 6   Spaces in links                  support 9, oppose 0, abstain 0
Jul 19  label → lb                       support 6, oppose 1, abstain 1

Is no one watching the floating {{votes}} box? Some people are wondering why the vote on User:UT-interwiki-Bot has not been closed out. It appears to have passed a week ago. —Stephen (Talk) 14:43, 3 June 2016 (UTC)

Since you noticed it, could you not have closed it out? Anyway, I just closed it out. --WikiTiki89 14:53, 3 June 2016 (UTC)
That's the issue with having the box require manual updating, which I seem to remember opposing back when @Daniel Carrero instituted it (but I thought he would deal with it). —Μετάknowledgediscuss/deeds 18:11, 3 June 2016 (UTC)
You are referring to Wiktionary:Beer parlour/2016/January#Vote counter. Adding the result in the box was @Benwing2's idea, I just implemented it. In that discussion, I did not even formally support the idea, I "voted" abstain. That is, I don't really care if we have the result in the box or not. --Daniel Carrero (talk) 18:21, 3 June 2016 (UTC)
@Metaknowledge: The problem here is not that the box wasn't updated, but that the vote wasn't closed at all. --WikiTiki89 18:42, 3 June 2016 (UTC)
My mistake. —Μετάknowledgediscuss/deeds 18:49, 3 June 2016 (UTC)
I used to close out most of the votes, but the last time I did it, User:DCDuring began crying corruption! corruption! (or words to that effect), and, try as I might, I was never able to get an explanation of his accusation, so I stopped handling votes. I would not even have mentioned this unnoticed vote here, except that someone asked me to close it out, which I will no longer do. —Stephen (Talk) 18:22, 3 June 2016 (UTC)
I think you are talking about Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit and Wiktionary:Beer parlour/2015/July#Persistent extensions of votes.
If you change your mind and decide to close votes again in the future, it would be fine by me, for what it's worth. --Daniel Carrero (talk) 18:37, 3 June 2016 (UTC)
I have four main objections to past practice on voting:
  1. Votes should rarely be extended and never by initiators of proposals or those who have strong opinions. Exceptions might be made by following some reasonable procedure.
  2. Substantive votes should not be too abundant or complex. Some special procedure might be needed to allow a large number of votes or a complex voting structure.
  3. There should be a minimum number of participants, including abstainers, for a vote to be closed out with an outcome that changes the status quo ante.
  4. "Technical" changes that have broad implications should not occur without a vote.
These are essentially due process objections. Does anyone have any reason why I should not have these objections? DCDuring TALK 18:47, 3 June 2016 (UTC)
You should have listed your objections instead of crying corruption! over and over. At first I thought you were accusing Dan of being corrupt because he likes to extend votes. Eventually it dawned on me that you were accusing me of corruption, but could not imagine what you were referring to. In any case, let’s not rehash this here. As far as I’m concerned, the matter ended a year ago and I’m not interested in reliving it. I’m only explaining to Wikitiki why I did not close out the vote. —Stephen (Talk) 18:57, 3 June 2016 (UTC)

Constructed languages and Foreign words of the day

Per the opinion poll taken alongside the original vote that established the Foreign word of the day project, we have only featured words in attested natural languages. However, part of the purpose of FWOTD is to exhibit the content that we have to offer, which includes excellent coverage of various constructed languages and reconstructed languages. Personally, I would like to see mainspace constructed languages like Esperanto be featured, and also well-referenced reconstructed languages like Proto-Germanic (so nothing in the Appendix namespace would be featured). Of course, they would still need to meet all the requirements that terms normally need to meet. What do you think? —Μετάknowledgediscuss/deeds 21:17, 3 June 2016 (UTC)

I don't mind including words in constructed languages already approved for mainspace (e.g. Esperanto), but I'd be opposed to including words in protolanguages. There's a reason protolanguages aren't in mainspace, and that reason should apply to FWOTD as well. —Aɴɢʀ (talk) 23:07, 3 June 2016 (UTC)
I support the featurability of both. — Ungoliant (falai) 02:23, 4 June 2016 (UTC)
Ditto   — Saltmarshσυζήτηση-talk 04:34, 4 June 2016 (UTC)
I also have no real objection to either being featured. It could be interesting to include appendix-only constructed languages as well, but that might limit our credibility as a serious dictionary in the eyes of some. Andrew Sheedy (talk) 05:38, 4 June 2016 (UTC)
It feels as though these would be mainly of interest to linguists and editors, and less so to mainstream language users and learners. Equinox 05:45, 4 June 2016 (UTC)
I was thinking no more than one a month of each, for that reason. But then again, more than 400,000 people are learning Esperanto on Duolingo, so these clearly aren't so unpopular as you might think. —Μετάknowledgediscuss/deeds 05:54, 4 June 2016 (UTC)

Deprecation tags for language codes

@-sche I've added a bit of code to Module:languages that checks for a deprecation tag on the language data, and includes a tracking template if found. This can be used to easily track down uses of a code that is being phased out, without generating errors everywhere when the code is removed. To use it, just place deprecated = true in the entry in the relevant language data module. Pages that use the code will then appear in Special:WhatLinksHere/Template:tracking/languages/deprecated as well as "Special:WhatLinksHere/Template:tracking/languages/deprecated/(language code)". See Special:WhatLinksHere/Template:tracking/languages/deprecated/dlc for an example where this has been used. I hope it's helpful! —CodeCat 12:35, 4 June 2016 (UTC)
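For readers who want to see the shape of the check, here is a minimal model of it in Python. The real code is Lua in Module:languages; everything below (the `LANGUAGE_DATA` dictionary, the `tracking_pages` function, the `Template:tracking/` prefix constant) is a hypothetical stand-in that only mirrors the behaviour described above.

```python
# Hypothetical stand-in for entries in a language data module.
LANGUAGE_DATA = {
    "dlc": {"deprecated": True},
    "en": {},
}

TRACKING_PREFIX = "Template:tracking/"


def tracking_pages(code, language_data):
    """Return the tracking pages a use of `code` would be listed under.

    A code without the deprecated flag (or an unknown code) yields no tracking.
    """
    data = language_data.get(code, {})
    if not data.get("deprecated"):
        return []
    return [
        TRACKING_PREFIX + "languages/deprecated",
        TRACKING_PREFIX + "languages/deprecated/" + code,
    ]
```

In this model, a page using `dlc` would show up under both the general list and the per-code list, while a page using `en` would be tracked under neither.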

Automatic transliteration for Thai has been disabled for now

Previous discussion: User talk:Wyang#Module:links

I disabled the automatic transliteration for Thai, because Module:th-translit isn't generating the right transliterations. Apparently, the code to generate the correct transliteration is located in Module:th in the getTranslit function, so this needs to be added to the transliteration module so that it generates the correct transliterations. User:Wyang had added workaround code to Module:links instead, but this is inappropriate, especially considering the code to generate a proper transliteration already exists, so I removed it again. Module:th-translit should be modified so that such workarounds are no longer necessary; then the automatic transliteration can be reinstated. —CodeCat 12:59, 4 June 2016 (UTC)

What you are doing is a perfect manifestation of your arrogance, ignorance and mindlessness. "So this needs to be added to the transliteration module so that it generates the correct transliterations." – while Module:th-translit is working perfectly fine with phonetically respelled words. You are suggesting that I should turn a transliteration module into a module that actually parses the entire entry's Wikitext and extracts certain parts of the text, because "this is what a transliteration module is supposed to do". Sigh! So much for Eurocentric hubris on Wiktionary. "I shall break it, and ask you plebs to explain to me why things broke after this." Wyang (talk) 13:23, 4 June 2016 (UTC)
It's supposed to work with as many words as possible, not just phonetically respelled ones. The getTranslit function is capable of generating better transliterations, so this needs to be integrated into Module:th-translit. Right now, Module:th-translit only correctly transliterates a subset of the words that it could, in theory, but adding custom code to Module:links is not the way to fix that. Modifying Module:th-translit is the right way. User:Wikitiki89 even did so yesterday, and you just reverted it. Why? —CodeCat 13:36, 4 June 2016 (UTC)
Because that code does not belong on a transliteration module page. How many times do I need to reiterate that? Wyang (talk) 13:43, 4 June 2016 (UTC)
Yes they do. And they certainly do not belong on Module:links instead. —CodeCat 13:47, 4 June 2016 (UTC)
Which definition of "transliteration" is for this? Wyang (talk) 13:58, 4 June 2016 (UTC)
The same definition we apply across Wiktionary: generating a Latin-script version of a word, that can be understood by people who don't know the script. The accuracy of the transliteration, or its nature (pronunciation or spelling based) is up to the editors of the language and of the transliteration module. However, under no circumstances should a generic language-agnostic module be used to work around a deficiency of the transliteration module. —CodeCat 14:05, 4 June 2016 (UTC)
In that sense Module:th-translit is working perfectly well. It's just that your Module:links failed to take into account the fact that some languages require another level of phonetic respelling extraction, and it is that phonetic respelling, rather than the entry title itself, that needs to be fed to the transliteration modules. Wyang (talk) 14:17, 4 June 2016 (UTC)
Yes, and in those cases, we use the tr= parameter that is available on countless templates. But let's stick with the situation here. You have a function getTranslit that is clearly capable of generating the correct transliteration, albeit that it has to parse the page's content in order to extract it. The method used is completely irrelevant. It is clear that there exists a function that is capable of doing the transliteration better than Module:th-translit is currently doing. Therefore, it seems obvious that this function should be added to Module:th-translit so that its transliterations become more accurate. This is what Wikitiki89 tried to do, so what is your objection against having better transliterations? And why do you insist on putting inappropriate workarounds in Module:links instead? —CodeCat 14:29, 4 June 2016 (UTC)
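For readers unfamiliar with the mechanism mentioned here, the general pattern of a manual tr= value taking precedence over automatic transliteration can be sketched in Python as below. This is not the code of Module:links or of any transliteration module; `choose_transliteration` and its parameters are hypothetical names, and the precedence rule is an assumption made for the illustration.

```python
def choose_transliteration(term, translit_func=None, tr=None):
    """Pick the transliteration to display for a linked term.

    tr            -- manual transliteration supplied by the editor, if any
    translit_func -- the language's automatic transliteration function, if any
    """
    if tr:
        # Assumed rule: a manual value always wins over the automatic one.
        return tr
    if translit_func is None:
        return None  # no automatic transliteration available for this language
    return translit_func(term)
```

The sketch takes no position on where any extraction of a phonetic respelling should live; it only shows how a manual override and an automatic function can coexist.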
Regarding your latest attempt at editing Module:links, the edits are completely unnecessary. This module doesn't have to account for this "phonetic extraction". The transliteration module can perform "phonetic extraction" instead. So please, for the nth time, add it to Module:th-translit and stop edit warring in Module:links. —CodeCat 14:32, 4 June 2016 (UTC)
I just fixed your Module:links, which you again reverted. Module:th-translit is functioning perfectly, given the right inputs. Stop insisting that this belongs at Module:th-translit; it does not. This is not transliteration.
Transliteration is not concerned with representing the sounds of the original, only the characters, ideally accurately and unambiguously. (Wikipedia)
It belongs at Module:links, which is lacking this new functionality of extracting the phonetic respelling to feed into the transliteration module. So for the nth time, please mend your Module:links so that it is fully language-agnostic, not just European language-agnostic. Wyang (talk) 14:49, 4 June 2016 (UTC)
The transliteration module itself should extract this information if it needs it. —CodeCat 14:55, 4 June 2016 (UTC)
Then it is not a module that does transliteration any more. This is exactly why the transliteration module should not be responsible for extracting this. A transliteration module is for transliteration, which is faithfully and systematically converting one writing system to another. Module:th-translit is fully functional at what it does, which is transliteration. A module that tries to extract phonetic respellings is a pronunciation module, which would have to be defined in Module:languages/data2 and have the infrastructure built around it, i.e. mending Module:links. Either way Module:links has to incorporate additional functionalities for non-phonetic languages. Wyang (talk) 15:03, 4 June 2016 (UTC)
I don't care if it doesn't do transliteration according to your narrow idea of what a transliteration is. Nobody else on Wiktionary cares either, I'd bet. What we all care about is that it generates transliterations according to what Wiktionary's idea of transliteration is, and has been for years, not what your idea of it is. —CodeCat 15:07, 4 June 2016 (UTC)
You are arguing whatever you believe in is what Wiktionary believes in, allegedly in opposition to what I believe in. A bit tongue-tied, probably? Wyang (talk) 15:17, 4 June 2016 (UTC)
I have restored automatic Thai transliteration. Remember that what you are doing is against the goal of this project: rather than improving the pages, you are removing information from numerous entries. Wyang (talk) 13:36, 4 June 2016 (UTC)
I have removed it. It's still not fixed. Stop edit warring and reach a consensus first. —CodeCat 13:37, 4 June 2016 (UTC)
Edit warring? Or undoing highly destructive edits to the project? Wyang (talk) 13:39, 4 June 2016 (UTC)
You added unnecessary custom code to Module:links, and when reverted, you keep reinstating it over and over despite a clear lack of agreement. That is edit warring against consensus. Reach a consensus for your edit first, then it can be reinstated. —CodeCat 13:40, 4 June 2016 (UTC)
It has been there for months. You abruptly removed it, causing all the Thai links to malfunction, prompting Thai editors to ask me to look into the problem and restore the original functionality. Can you be even further from the truth? Wyang (talk) 13:43, 4 June 2016 (UTC)
It never should have been added in the first place. Not in a highly visible and widely used language-generic module like Module:links. Language-specific code belongs in language-specific modules. —CodeCat 13:45, 4 June 2016 (UTC)
User:Wyang, again, please reach a consensus for your edit to Module:links rather than forcing the issue. Do not edit war to push your opinion through. Wait until there is a general agreement that your code belongs in the module. —CodeCat 13:53, 4 June 2016 (UTC)
Stop vandalising the page! Your removal simply wiped out thousands of correct Thai transliterations from Wiktionary pages. Where is your protest when I added it back in February? And where is your explanation when you suddenly removed the code 6 days ago? If you would like to maintain the status quo, at least get the version right. Wyang (talk) 13:58, 4 June 2016 (UTC)
Is there a time limit for contesting something? How long ago should an edit be before it's considered an automatic consensual status quo? Do we have a policy for this? I am contesting your edit now, as have two others so far, but you continue to ignore them and push your edit through. That is edit warring against consensus and I wouldn't be surprised if it got you blocked, though I won't be the one to do it because I'm involved in the dispute and people won't like that. —CodeCat 14:01, 4 June 2016 (UTC)
Did you forget that your edit had been reverted twice [38649499][38650974] by someone other than me? Taking out the block card now? A step-up from your threat to disable on my talk page? Four months seem like a much longer time than 6 days. Wyang (talk) 14:07, 4 June 2016 (UTC)
Reverts aren't the only way to contest an edit. But in any case, your edit was reverted first by me, then by Wikitiki, then by me again, then you started edit warring, and Dixtosa has also contested your edit. In comparison, only you and Metaknowledge have supported it. According to our common practice, consensus requires a 67% majority in favour, which is clearly not the case. So your edit has no consensus. —CodeCat 14:09, 4 June 2016 (UTC)
So stop your vandalism. The reason you dare to tackle Thai specifically is you simply don't care. You just don't care about what Thai editors think at all, hence destroying thousands of Thai entries is perfectly justified in your opinion. Wyang (talk) 14:17, 4 June 2016 (UTC)
Please stop using personal attacks. Reverting an edit that has no consensus is not vandalism. Reinstating that edit over ten times despite being notified that your edit has no consensus is vandalism. —CodeCat 14:29, 4 June 2016 (UTC)
Are you denying that your edit effectively eliminates valid Thai transliterations from thousands of entries? Repeatedly removing any one of those thousands of transliterations would lead to someone being blocked. So not vandalism you say? Wyang (talk) 14:34, 4 June 2016 (UTC)
Only for as long as the transliteration module hasn't been fixed to compensate. The fact that you refuse to do so does not suddenly make my reversions vandalism. In fact, you also reverted Wikitiki89's edit to Module:th-translit, which did fix (or attempt to fix) the module. So it appears you are not actually interested in fixing the transliterations. —CodeCat 14:37, 4 June 2016 (UTC)
I have now reinstated User:Wikitiki89's edit to Module:th-translit. Reverting this again would re-break the transliterations, thus doing the exact same thing that you accuse me of doing. So if you revert this too, then I can only assume you are not interested in finding a solution for this problem. —CodeCat 14:41, 4 June 2016 (UTC)
It looks like พลเรือน ‎(pon-lá-rʉʉan) once again has the correct transliteration. Why you reverted the edits by Wikitiki89 that restored this is beyond me. But please do not break it again. —CodeCat 14:45, 4 June 2016 (UTC)
As I said numerous times before, this is not transliteration. It does not belong in a transliteration module. Transliteration is the faithful letter-to-letter correspondence performed between writing systems, which is obviously not the process you and Wikitiki89 would like to see implemented in Module:th-translit. Which is hence something that more properly belongs elsewhere, i.e. at your Module:links. Wyang (talk) 14:49, 4 June 2016 (UTC)
Transliteration on Wiktionary is not the faithful letter-to-letter correspondence, and it never has been. Many languages have non-orthographic transliterations. Hindi, Chinese, Russian, just to name some. You cannot just unilaterally redefine what "transliteration" means on Wiktionary to suit your purposes, and then demand that everyone else accepts your edits to a generic module to work around it. It seems that this isn't a workaround for code, but a workaround for your own mental idea. —CodeCat 14:58, 4 June 2016 (UTC)
Well, there has never been a Module:zh-translit! Because a Chinese-English transliteration system is never possible. Hindi and Russian have two sets of transliteration and pronunciation modules: Module:hi-translit vs Module:hi-IPA, and Module:ru-translit vs Module:ru-pron, with the former doing fairly strict transliteration and the latter IPA interpretation based on transcription. Thai also has two: Module:th-translit vs Module:th-pron. And yet you are suggesting that th-translit should take on the role of the latter. It is never my "own mental idea" - it is what the definition of transliteration is, and it forms the basis for its distinction from "transcription", whether you are willing to accept it or not. Wyang (talk) 15:17, 4 June 2016 (UTC)
  • I do not think any module that is to be invoked in mainspace should EVER take content from the entry and parse it, because the entry can get arbitrarily large and this introduces a very difficult dependency. It is abusing Lua. As for code placement, it is about how you look at *-translit modules. CodeCat views them (a view I share) as general transliteration modules which should work independently (i.e. not necessarily through Module:links). But, again, I disapprove of the parsing part. @Wyang, why don't you just pass them as arguments? --Dixtosa (talk) 13:41, 4 June 2016 (UTC)
  • I apparently disagree. There are huge benefits from the use of parsing, as Wiktionary's system is inherently cumbersome and unsuitable for building a dictionary without parsing. See {{zh-forms}} for an example. Wyang (talk) 13:52, 4 June 2016 (UTC)
    Wait, so you've done this for other languages too?? —CodeCat 13:58, 4 June 2016 (UTC)
    Ignorant - you must be reading European language entries only in this year and a half. Wyang (talk) 14:03, 4 June 2016 (UTC)
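
For readers following the technical point in this sub-thread, here is a minimal Python sketch (the real modules are Lua) of the two approaches being contrasted: scanning an entry's wikitext for a respelling versus receiving the respelling as an explicit argument. The function names, the regular expression, the assumption that a respelling sits in a `{{th-pron|...}}` call, and the toy romanisation table are all hypothetical and made up for illustration; this is not the actual Wiktionary code.

```python
import re
from typing import Optional

# Hypothetical toy table: Thai spelling or respelling -> romanisation.
# The real modules apply rules; this lookup only stands in for them.
TOY_ROMANISATION = {"พน": "pon", "พล": "pol"}


def romanise(respelling: str) -> str:
    """Romanise a (re)spelling, returning an empty string if unknown."""
    return TOY_ROMANISATION.get(respelling, "")


def romanise_by_parsing(entry_wikitext: str) -> str:
    """Approach A: scan a whole entry for a {{th-pron|...}} call.

    The work grows with the size of the entry, and the caller now
    depends on the content of another page.
    """
    match = re.search(r"\{\{th-pron\|([^|}]+)", entry_wikitext)
    return romanise(match.group(1)) if match else ""


def romanise_from_argument(word: str, respelling: Optional[str] = None) -> str:
    """Approach B: the caller passes the respelling explicitly."""
    return romanise(respelling if respelling is not None else word)
```

With approach B the link has to carry the respelling itself, which is more typing for editors but avoids reading another page; approach A saves the typing at the cost of the dependency objected to above.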

Recap for us outsiders: Did I understand correctly that the way Thai editors handled the situation worked but was incompatible with some stuff CodeCat's robots do, so CodeCat changed it to make it comply with his/her robots, which in turn broke it for Thai editors? And now you can't decide which way to go because you do not agree on whether the module should scan the entire entry? Korn [kʰũːɘ̃n] (talk) 15:00, 4 June 2016 (UTC)

No, this has nothing to do with bots. What happened, it seems, was that Wyang insisted that transliteration modules should only give letter-for-letter transliterations. But doing that would generate incorrect transliterations in many entries because Thai script is rather haphazard. So rather than adjusting their dogma - and the transliteration module - they instead made an edit to Module:links, a generic language-agnostic module, to entirely bypass the defective transliteration module. I noticed this code a few months later and removed it; Wikitiki89 then removed it again, and I then removed it several more times. Wikitiki made edits to Module:th-translit and Module:th which fixed the transliterations after removing the Thai-specific code from Module:links had broken them. However, this seemed to go against Wyang's dogma that transliteration modules must transliterate letter-for-letter (even though they don't, and never have, on Wiktionary), so he reverted the edits and again reverted me when I tried to reinstate the fixes Wikitiki made. —CodeCat 15:05, 4 June 2016 (UTC)
The transliteration system for Thai was fully functional from its implementation in February until it was abruptly removed by User:CodeCat six days ago. A bit of investigation led to User:CodeCat's edits, which basically left all Thai transliterations on Wiktionary non-functional. Wyang (talk) 15:07, 4 June 2016 (UTC)
Did you miss the fact that Wikitiki fixed the problem, and you undid his edits? Your undoing broke the transliterations again, but rather than putting Wikitiki's edits back in, you insisted that Module:links be edited to fit your dogma. —CodeCat 15:10, 4 June 2016 (UTC)
  • You both claim that your way produces correct results and the system of the other breaks it. Can you each provide a specific example which works with your system and say how it gets broken by your opponent's method? Korn [kʰũːɘ̃n] (talk) 15:13, 4 June 2016 (UTC)
    Both methods work. However, I object to having extra code in Module:links that handles deficiencies in Module:th-translit, deficiencies that were readily remedied by Wikitiki. The issue seems to be that Wyang dislikes Wikitiki's remedies, but to undo them he has to reinstate the extra code that I object to. I think that problems with Module:th-translit ought to be fixed in that same module, as Wikitiki did, rather than introducing workarounds in another module that has nothing to do with Thai. —CodeCat 15:16, 4 June 2016 (UTC)

For many languages transliteration and transcription/pronunciation are very different concepts, and Thai is one of these languages. One can generate a transliterated outcome for a Thai word (Module:th-translit), but oftentimes this is different from the pronunciation. The core issue here is that Module:links provides no support for these non-phonetic languages, which is why I added the new functionality in the module. Such information does not belong to individual transliteration modules, as this is a widespread linguistic phenomenon and the addition would greatly benefit many non-European languages (for example, Chinese and Japanese). The lack of transcription support in the central linking templates/modules is exactly the reason these languages have been moving away from the standard linking templates, resulting in much confusion and repetition during editing. Wyang (talk) 15:43, 4 June 2016 (UTC)

That's irrelevant. Module:links needs no additional support; transliteration modules (in Wiktionary's sense of the word) are sufficient. If they are not, then you have to show why. So far you have failed to do so, since Wikitiki's edits (which you reverted) proved you wrong: it's perfectly possible for the existing infrastructure to handle Thai. Perhaps you don't want to be proven wrong? —CodeCat 16:31, 4 June 2016 (UTC)

I don't know what's going on. I am a native speaker, and I can only say that direct automatic transliteration of a "Thai word" could never be done because of the complexity and uncertainty of the spelling. That's why we do it on basic syllables (which are more certain); that is also how it is taught in school. --Octahedron80 (talk) 15:23, 4 June 2016 (UTC)

Rather than this constant revert war that's going on: is it not possible to apply the code fixes in one single operation that will make the transliterations continue to work as they did before? Fixing only part of it, while leaving Thai users without useful content, seems like a problem. Equinox 16:39, 4 June 2016 (UTC)
As of right now, things work just fine. Wyang keeps reverting it. —CodeCat 16:41, 4 June 2016 (UTC)
Here's how it started (from User talk:Wyang):

I don't understand it either. Why do those edits change the transliterations, even though none is given in the entry? —CodeCat 20:49, 2 June 2016 (UTC)
Even after looking at the section above, I don't see what this edit does. In fact, it seems like it would break cases that have alt forms. --WikiTiki89 14:29, 3 June 2016 (UTC)
I've undone it until we can establish how the special treatment actually changes anything. —CodeCat 14:35, 3 June 2016 (UTC)
It looks like it, somehow, for some reason, changes the transliteration of พล ‎(pon) between "pol" and "pon". But I have no idea why. I think the problem is with the Thai transliteration module here, not Module:links. —CodeCat 14:37, 3 June 2016 (UTC)
I think I fixed the problem with these edits. --WikiTiki89 18:57, 3 June 2016 (UTC)

This was part of a topic where it had been asked what the code was for, and everyone was waiting for Wyang's response. CodeCat acted without knowing what the consequences would be, without waiting to find out what the code was for. That was clearly wrong, and Wyang was understandably upset. Wikitiki89 helpfully came up with an alternative that seems to work.
This whole episode is painful to watch: we have two strong-minded people who have both done great things for the project, but are now butting heads instead of discussing rationally.
Wyang has a history of coming up with ingenious ways to make our system do things that no one would have thought possible. Our Chinese entries are infinitely better than they were, and they're getting better all the time. There are, however, times when the system gets brought to its knees, as at .
CodeCat is as responsible as anyone for the current template, module and category infrastructure that runs this site. This prodigious work ethic and expertise is, however, marred by a willingness to break things in order to force people to fix things she sees as wrong (case in point: Module:parameters). She also has a tendency to ramrod things through, which has created deep resentment in some quarters that has poisoned a number of discussions on unrelated issues.
On the one hand, we have Wyang, still furious about CodeCat's behavior and unwilling to allow anything that would let her get away with it. On the other hand, we have CodeCat, who has gone into Orwellian DoubleSpeak mode to shift attention from her initial, destructive action, portray Wyang as a dangerous loose cannon and portray herself as an innocent victim.
We need to get past all of that and look at the merits of how we want to structure this. Our architecture isn't set up to handle the use of respellings in transliteration, so Wyang came up with a kludge to work around this. At the moment, the debate seems to be over where to put the kludge, not on whether there's a better way to do this. My question is: can we come up with a way to get the respellings to the modules without having the modules swallow an entry whole and rummage through it to find them (please forgive the mixed metaphor)? Chuck Entz (talk) 20:05, 4 June 2016 (UTC)

Wiktionary does not have a JSON-style dictionary system, which is why there is so much formatting nuisance with the use of different headers, headword templates, reduplicating etymologies, ectopic related terms and unsystematic pronunciation notations. Each word in a language should be defined by a JSON set, containing a series of qualities indicating the nature and relationships of subordination of various parts of the text. All the wikitext in a Wiktionary entry should be generated from scratch from that JSON set, using pre-defined formatting codes which tell the entry how the original core information should be displayed. All the JSON information from entries should be made rapidly indexable to other entries, so that there is no need to repeatedly define what the pronunciation of another word in the etymology is, or what the meanings of that word are.
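
To make the proposal above a little more concrete, here is a rough Python sketch of what such a "JSON set" and a renderer built on it might look like. The field names (`romanisation`, `senses`, `gloss`, `labels`), the placeholder gloss and the output format are assumptions invented for this illustration; nothing like this exists on Wiktionary as far as this discussion shows.

```python
import json

# Hypothetical structured record for one word. All field names are made up,
# and the gloss is a placeholder rather than a real definition.
ENTRY_JSON = """
{
  "word": "พล",
  "lang": "th",
  "romanisation": "pon",
  "senses": [{"gloss": "example gloss", "labels": []}]
}
"""


def render_link(entry: dict) -> str:
    """Build link text from stored data instead of retyping it in every entry."""
    gloss = entry["senses"][0]["gloss"]
    return f'{entry["word"]} ({entry["romanisation"]}, "{gloss}")'


entry = json.loads(ENTRY_JSON)
print(render_link(entry))
```

The point of the sketch is only that the romanisation and gloss are stored once, with the word itself, and any page linking to the word reads them from there.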

What Wiktionary has is a very different system. A system that tends to make people think about "what are we eating tonight" rather than "how can we most efficiently make dinner for the next 20 years". You can create a magnificent, all-encompassing entry for a word in a language if you put into the entry everything that is known on Earth about the word, when in actual fact you should not have to do most of what you did because it can already be found elsewhere in the dictionary and should have been "extracted" rather than "generated or provided de novo". Say you want to link to another word in your perfect entry. Then in the perfect entry on Wiktionary you would have to put in: (1) the word you wish to link to, (2) the transliteration/transcription of that word, (3) the definition of that word, and (4) qualifiers for the definitions (e.g. derogatory, obsolete), although points (2-4) have already been stored in your destination entry. Previously all of (2-4) would have to be provided in the internal link. Things have improved in that point (2) is sometimes no longer necessary, as Module:links will attempt to generate the transliteration from a series of transliteration modules. This is a great leap forward, as we start to realise that some of what we previously wrote was not necessary at all. However, the source of that omitted information (i.e. the regenerable information) is misunderstood. It is not the transliteration modules that are ultimately the source of the regenerable information; rather it is the destination entry where the regenerable information is stored. For languages where transliteration approximates the transcription/transliteration system we use for that language fairly well, this is an acceptable and quite efficient way of regenerating the information, despite the non-zero failure rate (e.g. link in коэффицие́нту ‎(koefficijéntu) to коэффицие́нт ‎(koefficijént)).
But for languages where transliteration approximates the transcription system we use very poorly (Thai etc.), or where transliteration is intrinsically impossible (Chinese, Japanese etc.), our hub of Module:links simply gives up, telling editors of these languages "sorry, there is nothing I can help you with here", when in fact it should have been set up to facilitate the extraction of the phonetic pronunciation in the destination entry. Languages lie on various parts of this transliteration–transcription continuum and it is outright inappropriate to call this process of phonetic extraction "transliteration" for languages that fall towards the transcription half of the continuum (Thai, Chinese, Japanese etc.), as that is an obvious oxymoron, and/or transcription vs transliteration are contrastive concepts for these languages. Mixing these two very different concepts or intentionally confusing them to achieve minimal effort could be very dangerous. Wyang (talk) 01:46, 5 June 2016 (UTC)

Word | Transliteration outcome | Transcription outcome | What should be returned if transliteration is desired | What should be returned if transcription is desired | What should be returned if IPA is desired
--- | --- | --- | --- | --- | ---
พล (Thai) | pol | pon | pol | pon | /pʰon˧/
십육 (Korean) | sib.yug | simnyuk | sib.yug | simnyuk | /ɕʰimɲjuk̚/
བརྒྱད (Tibetan) | brgyad | gyaew | brgyad | gyaew | /cɛʔ˩˧˨/
(Chinese) | none | dòu | nil | dòu | /toʊ̯˥˩/
(Japanese) | none | mizu | nil | mizu | /mizɯᵝ/
ရှည် (Burmese) | hrany | she | hrany | she | /ʃè/


Word | Transliteration outcome | Transcription outcome | What should be returned if transliteration is desired | What should be returned if transcription is desired | What should be returned if IPA is desired
--- | --- | --- | --- | --- | ---
дли́нный (Russian) | dlínnyj | dlínnyj | dlínnyj | dlínnyj | /ˈdlʲinːɨj/
ტორტი (Georgian) | ṭorṭi | ṭorṭi | ṭorṭi | ṭorṭi | /tʼɔrtʼɪ/
κέντρον (Ancient Greek) | kéntron | kéntron | kéntron | kéntron | /kéntron/
^According to the table, transliteration of Thai would be useless and would cause problems with difficult words such as เศรษฐศาสตร์ and รัฐธรรมนูญ. You could try to replace letter by letter, but no one would understand the result. I prefer transcription. --Octahedron80 (talk) 09:46, 5 June 2016 (UTC)
On Wiktionary, the term "transliteration" encompasses transliteration, transcription and general romanization. It's just a historical accident that we call it "transliteration", but it's not transliteration in the strict sense. See Wiktionary:Transliteration and romanization. So it is not an argument that the module can only supply transliterations in the strict sense just because it's called a transliteration module. It's a romanization module, but it's called a transliteration module for historical reasons. —CodeCat 12:35, 5 June 2016 (UTC)
It's not just a historical accident. It is Eurocentrism in Wiktionary at its best. As a consequence of this historical confusion, the central system just assumes that all languages use transliteration as their romanisation method, and Module:links sends words of all languages indiscriminately to their transliteration modules to generate their romanisations. This leaves languages with both transliteration and transcription outcomes unsupported. Thai already has a functioning transliteration module (Module:th-translit), and in addition it also has a transcription module (Module:th). Module:links should relay the 'tr' parameter to the correct place so that it is truly language-agnostic, and this includes distinguishing between the transliterative and transcriptive modules used for a particular language and rendering support to languages that use a transcriptive method of romanisation. Wyang (talk) 12:54, 5 June 2016 (UTC)
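
A minimal Python sketch of the relay described above, assuming a hypothetical per-language registry that records which kind of romanisation links should show by default. The registry layout, the function names and the stub romanisers are illustrative assumptions only (the sample values come from the tables above); the real Module:links and language data modules are written in Lua and are structured differently.

```python
from typing import Callable, Dict, Optional

Romaniser = Callable[[str], Optional[str]]


def th_translit(text: str) -> Optional[str]:
    """Stub letter-for-letter transliteration (toy lookup, not real rules)."""
    return {"พล": "pol"}.get(text)


def th_transcribe(text: str) -> Optional[str]:
    """Stub phonetic transcription (toy lookup, not real rules)."""
    return {"พล": "pon"}.get(text)


def ru_translit(text: str) -> Optional[str]:
    """Stub Russian transliteration (toy lookup, not real rules)."""
    return {"дли́нный": "dlínnyj"}.get(text)


# Hypothetical registry: the romanisers each language offers, and which
# one a link should use when no manual `tr` is supplied.
ROMANISERS: Dict[str, Dict[str, Romaniser]] = {
    "th": {"translit": th_translit, "transcription": th_transcribe},
    "ru": {"translit": ru_translit},
}
DEFAULT_KIND: Dict[str, str] = {"th": "transcription", "ru": "translit"}


def romanise_link(lang: str, text: str, tr: Optional[str] = None) -> Optional[str]:
    """Return a manual `tr` if given, else the language's default romanisation."""
    if tr is not None:
        return tr
    handlers = ROMANISERS.get(lang, {})
    handler = handlers.get(DEFAULT_KIND.get(lang, "translit"))
    return handler(text) if handler else None
```

In this sketch the generic function contains no Thai-specific branch; the choice between transliteration and transcription lives in per-language data, which is one possible reading of "language-agnostic" in the comment above.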
The correct place for transcriptions is Module:th-translit, so there is no need for additional code. Are you suggesting that we set up an entirely separate system to deal with transcriptions as opposed to transliterations, and have separate transcription and transliteration modules for languages? What's the benefit? And if you are so passionate about it, why don't you start a vote to change the current practice of including transcription in transliteration, rather than edit warring over it for days? Right now you have yet to display any kind of consensus for your views. —CodeCat 13:00, 5 June 2016 (UTC)
Never mind, I've done it for you. —CodeCat 13:08, 5 June 2016 (UTC)
For whom? You ignored the points I raised above and therefore completely misunderstood what I was saying. Again I feel like I am talking to someone who did not care to read my comments at all. The answers to your questions are: No and no, and you would not have asked these questions if you had read my replies above. I'm not suggesting that we set up an entirely separate system to deal with transcriptions as opposed to transliterations, nor am I interested in having separate transcription and transliteration modules for any other languages that do not differentiate between the two concepts on a romanisation level. Likewise I am absolutely uninterested in changing the current practice of including transcription in transliteration. Wyang (talk) 13:19, 5 June 2016 (UTC)
Then why do you keep edit warring? All Wikitiki's edits did was change Module:th-translit to supply a transcription. If you are fine with this practice, your edits say otherwise. —CodeCat 13:22, 5 June 2016 (UTC)

Poll: Should there be separate systems for transcription and transliteration?

Currently, both orthographic transliteration and phonetic transcription are subsumed under the term "transliteration" on Wiktionary. Wyang seems to be arguing that we should use "transliteration" only for the strict definition, and have an entirely separate system for transcriptions, allowing them both to exist side by side. Presumably, links and headwords would also show both, if both are available. Do you agree with this change? —CodeCat 13:05, 5 June 2016 (UTC)



  1. Oppose. There is no need for separate systems; this overly complicates things without any obvious benefit. The point of transliteration modules currently is to supply a version of a word in the Latin alphabet, without regard to how closely it maps to the orthographic form of the original language. In other words, they are romanization modules that are called transliteration modules by historical accident. I see no value in being pedantic about the meaning. If we want to display transliteration and transcription side by side whenever applicable, we should be able to demonstrate that users benefit from this information overload. —CodeCat 13:05, 5 June 2016 (UTC)
    What's the use of phonetic transliteration when we already have dedicated Pronunciation section?--Dixtosa (talk) 13:13, 5 June 2016 (UTC)
    The point here is that this involves a change in the status quo. If we want transliterations to be strict transliterations, then we have to change the practices of all languages whose transliteration is not a strict transliteration currently, and make changes to Wiktionary:Transliteration to reflect the new practice. Russian editors have strongly opposed this in the past. @Benwing, Atitarev. —CodeCat 13:19, 5 June 2016 (UTC)

This is the most stupid response I have ever seen. You would not have acted so bizarrely if you were more attentive and respectful, and this includes completely misconstruing my reasoning and thus creating this poll, and abusing your admin rights to block me. I would like to request to have your admin rights reviewed. Wyang (talk) 13:19, 5 June 2016 (UTC)

I blocked you because you keep making edits that have no consensus. This poll is an attempt to establish a consensus, but you continue to revert without awaiting the results of the poll. —CodeCat 13:23, 5 June 2016 (UTC)
Pot, meet kettle; kettle, pot. DCDuring TALK 13:42, 5 June 2016 (UTC)
Aha. Whatever. I'd rather not be compared with this crazy user. Next time I'll just redirect all the Thai complaints to her. Wyang (talk) 13:59, 5 June 2016 (UTC)
  • It seems the status quo ante of Module:links is what CodeCat reverted to, and that therefore CodeCat's edit should be reinstated, but probably not by CodeCat. Wyang should be prevented from reinstating his edits. Wikitiki's edits to Module:th and Module:th-translit should be reinstated and then we should see which Thai entries, if any, display a problem with transliteration or transcription. --Dan Polansky (talk) 20:47, 5 June 2016 (UTC)
Not really. Wyang added the code in February, and CodeCat removed it in May as part of an extensive rework of the module. Wyang was just restoring it under the assumption that it had been removed accidentally. Thai editors have been basing their edits for three months on the presence of that feature. Wyang made his June edit only because Thai editors were complaining about it not working any more. Chuck Entz (talk) 21:21, 5 June 2016 (UTC)
Wyang's February edit cannot be traced to a discussion showing consensus, AFAIK. The edit is now challenged. The status quo ante is the status before the challenged edit. Three months have elapsed between the edit and its challenge, probably because the challenging editor did not notice the edit earlier. Now as before, I propose that CodeCat and Wikitiki edits are reinstated, and that specific problems in Thai entries that are a result of that are clearly stated, including stating at least one Thai entry that has the problem. --Dan Polansky (talk) 21:32, 5 June 2016 (UTC)
The word "consensus" is thrown around here too much. —Aryamanarora (मुझसे बात करो) 23:55, 9 June 2016 (UTC)

Languages of Sweden

The fact that Elfdalian now has an official ISO-639 code reminds me that we have several pages, at least in the Reconstruction: namespace, on which the language names Westrobothnian, Jamtish, and Scanian are used. These languages have neither ISO-639 codes nor Wiktionary-specific ad-hoc codes. What do we want to do with them? Should we make ad-hoc codes (e.g. gmq-vas, gmq-jmk, and gmq-scy) for them? Shall we consider them Regional Swedish dialects? —Aɴɢʀ (talk) 13:13, 4 June 2016 (UTC)

I think Scanian is a dialect rather than a language, I'm not sure about the others. DonnanZ (talk) 13:17, 4 June 2016 (UTC)
I don't agree, at least not the historical Scanian language. Indeed, Scanian has recently been under heavy influence from Standard Swedish and most Scanians today speak the Scanian variety of Standard Swedish due to recent language standardization in Sweden. But Genuine Scanian had its own grammar, sound developments, own vocabulary etc., differing well from Standard Swedish, see for yourself at [2]. Same situation with Jamtish and Westrobothnian.-- 13:27, 4 June 2016 (UTC)
In addition, we list Gutnish (in a similar situation) as a separate language, even though it has come under heavy influence from Standard Swedish and is slowly dying out, in the meanwhile there are projects to revive it ([3]). Furthermore, Swedish Wiktionary uses gmq-bot for Westrobothnian. -- 13:37, 4 June 2016 (UTC)
gmq-bot is fine with me. I only suggested gmq-vas because Linguist List's ad hoc code is swe-vas. Are these three lects as different from Standard Swedish as Elfdalian and Gutnish are? If so, then I'm for giving them their own codes. —Aɴɢʀ (talk) 19:29, 4 June 2016 (UTC)
We've discussed this before. Can anyone come up with links to the previous discussions so we don't have to start from scratch? Chuck Entz (talk) 20:11, 4 June 2016 (UTC)
The discussion was at [4], but was left unresolved. -- 20:27, 4 June 2016 (UTC)
It looks like no one objected to giving all these languages their own codes; the discussion stalled over the truly trivial issue of whether or not to prefix the codes with gmq-. I don't care if we leave the prefix off, but I thought it would confuse the HTML if we did. —Aɴɢʀ (talk) 20:51, 4 June 2016 (UTC)
I've created gmq-bot, gmq-jmk, and gmq-scy, so entries can now be made for those languages, and links to them in the Reconstruction namespace can now use {{l}} instead of bare links. —Aɴɢʀ (talk) 12:35, 6 June 2016 (UTC)
P.S. I'm not touching Category:Scanian Swedish, because I'm not capable of saying what's Scanian language and what's Scanian dialect of Standard Swedish. I leave that for someone who knows these languages. —Aɴɢʀ (talk) 12:58, 6 June 2016 (UTC)
Great, thank you. I'll clean up the links. -- 18:09, 6 June 2016 (UTC)

Parameter in quotation templates for earliest attestation

Should we have a parameter in quotation templates for the earliest attestation that can be found? This is not the same as the earliest quotation that might be in the entry- this would specifically indicate that someone had searched for earlier quotations and found none. It would hopefully be a replacement for {{defdate}}, which I've always disliked since it gives no reference for its claim. It could also categorize by century or by a more granular period of time. DTLHS (talk) 21:36, 4 June 2016 (UTC)
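
As a rough illustration of the categorisation idea in the proposal above, here is a short Python sketch that maps the year of an earliest attestation to a century-based category. The category naming scheme and the function names are hypothetical, and a real implementation would live in a template or Lua module rather than in Python.

```python
def century_of(year: int) -> int:
    """Return the ordinal century for a positive (CE) year, e.g. 1892 -> 19."""
    if year < 1:
        raise ValueError("only CE years are handled in this sketch")
    return (year - 1) // 100 + 1


def ordinal(n: int) -> str:
    """Attach the English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def attestation_category(lang_name: str, year: int) -> str:
    """Build a hypothetical category name for the earliest attestation."""
    return f"{lang_name} terms first attested in the {ordinal(century_of(year))} century"
```

A finer-grained scheme (by decade, say) would only need a different bucketing function in place of `century_of`.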

what does eminant boot agreement mean

what does eminant boot agreement mean

For BOOT, see [5]. The "eminent" might have something to do with eminent domain...? Equinox 08:53, 5 June 2016 (UTC)

Proposal: Desysopping of User:CodeCat

Reason: Abuse of admin rights – misusing her admin power to block the other party of a personal dispute. Block log: [6]. Wyang (talk) 13:28, 5 June 2016 (UTC)

I blocked Wyang to put an end to the continuous edits which forced his point of view without a consensus for that view. We block other editors for such behaviour, so why not Wyang? —CodeCat 13:29, 5 June 2016 (UTC)
Well your edit simply removed thousands of correct Thai transliterations on Wiktionary and caused uproar among our Thai editors, which is why it was reverted. Repeated removal of any one of those thousands of transliterations is sufficient to warrant a block. Wyang (talk) 13:31, 5 June 2016 (UTC)
No it didn't. The edits you've been edit warring on for the past day did not break any entry. Please demonstrate that Wikitiki's edits, which you continued to revert, broke or removed thousands of transliterations. —CodeCat 13:34, 5 June 2016 (UTC)
I have once again reapplied Wikitiki's edits. Please show an entry that is currently broken. —CodeCat 13:35, 5 June 2016 (UTC)
Why have you undone Wikitiki's edits yet again? There is no consensus for having transliteration and transcription separate. You should wait for the poll to finish. —CodeCat 13:37, 5 June 2016 (UTC)
I ask that Wikitiki's edits be restored until 1. it is established that a consensus exists for separating transliteration from transcription, or 2. it is established that Wikitiki's edits break anything. —CodeCat 13:39, 5 June 2016 (UTC)
Nor is it appropriate or does it have consensus. You seem to be in denial of your repeated vandalism – let me refresh your memory: diff, diff, diff, diff. These are the first four of your edits - did they remove useful content en masse? Wyang (talk) 13:40, 5 June 2016 (UTC)
Wikitiki also made that same edit diff, so should he also be blocked? Wikitiki in fact made additional edits to fix the problems caused by this edit, and you then reverted his edits too. —CodeCat 13:43, 5 June 2016 (UTC)
Circumventing the question huh? Did your edits repeatedly remove useful content en masse? Wyang (talk) 13:46, 5 June 2016 (UTC)
No, they did not, once Wikitiki had provided an appropriate fix. Which you then reverted. So again, please demonstrate that Wikitiki's trio of edits to Module:links, Module:th and Module:th-translit broke something, and that it is therefore warranted to desysop me for restoring those edits. You have yet to show even a single entry that was broken by it, yet you continue to revert these edits. —CodeCat 13:47, 5 June 2016 (UTC)
Go to the time points (1) 12:56, 4 June 2016; (2) 13:34, 4 June 2016; (3) 02:22, 4 June 2016 and (4) 01:01, 4 June 2016. Preview the page พลเรือน. Were the Thai romanisations there? Wyang (talk) 13:51, 5 June 2016 (UTC)
Please stop dodging the question. Did Wikitiki's trio of edits break any entries? Please restore his edits and then show us a broken entry. If you can't demonstrate that his edits broke an entry, how can you ask me to be desysopped for restoring them? —CodeCat 13:53, 5 June 2016 (UTC)
Looks like you are unable to answer my question. You did not restore his edits. You restored your edit, which wiped out thousands of Thai transliterations. Wyang (talk) 13:57, 5 June 2016 (UTC)
For the past day, you have been reverting those three edits Wikitiki made, one of which included the edit I also made. I have been trying to restore those edits because there is no consensus for your views and no evidence that those three edits break anything. —CodeCat 13:59, 5 June 2016 (UTC)

Your continued edit warring shows a severe lack of professionalism and responsibility. You both are perfectly aware that edit warring warrants an admin stepping in if the users can't get a hold of themselves. You both seem to be admins and abuse your positions to keep ranting where other users would long have been shut up. (Read: Prevented from editing the entry in question.)
You both continuously accuse the other of having no consensus, but your endless bickering makes it harder and harder for people to get an overview over the situation, and thus makes it more and more difficult for the community to actually reach a consensus. Please keep your hands still for a while so that the rest of the community, or at least those parts who understand the techno babble, can actually debate this matter. Korn [kʰũːɘ̃n] (talk) 15:28, 5 June 2016 (UTC)

  • +1. I can't even figure out what the primary point of contention is. (I agree very strongly with Dixtosa's point above that no module invoked in the mainspace should ever take content from the entry and parse it, though. Seriously, the devs are going to regret ever giving us Lua if we go in that direction.) Can someone please explain the difference between transliteration and transcription, and where they're each used in entries? --Yair rand (talk) 20:38, 5 June 2016 (UTC)
Whether we want to allow modules invoked from the main namespace to parse other entries should be a separate discussion, if anyone wants to start it. I believe the Chinese modules extensively use this paradigm. DTLHS (talk) 21:45, 5 June 2016 (UTC)
The distinction which seems to be being made by those who are making a distinction is : transliteration takes a set of characters and renders them letter-for-letter into another script (in this case, the Latin script), whereas transcription renders the word itself into another script; the difference being that e.g. cannot be 'transliterated' per se, but it can be transcribed (as dòu, IPA: /toʊ̯˥˩/), and that if e.g. พล is transliterated, it is pol, but if it is transcribed, it is pon (in IPA it is /pʰon˧/). In practice, the argument here seems to be (1) not over which of these systems should be used (since I haven't actually seen someone suggest that พล should be rendered pol), but over which word should be used, and (2) not over whether or not a module should parse a page, but over which module should host the code. - -sche (discuss) 21:01, 5 June 2016 (UTC)

Module:links is protected so that only administrators can edit it; this prevents non-admins from editing or edit-warring over it, and it means the edit-war between admins User:CodeCat and User:Wyang is a wheel war. If the two of you continue to wheel-war, I will ask a bureaucrat such as User:Chuck Entz or a global 'crat to make emergency and hopefully temporary desysoppings to stop the war. - -sche (discuss) 21:14, 5 June 2016 (UTC)

I was already considering doing so, but I've been hoping they would start acting like adults without being forced to. Unfortunately, the action has been taking place while I've been offline (I do sleep, occasionally), so I'm left to wonder whether it's over or it's just waiting to flare up again when both are back online. Chuck Entz (talk) 21:36, 5 June 2016 (UTC)
  • A proposal for desysopping amounts to harassment, in my opinion. DonnanZ (talk) 22:04, 5 June 2016 (UTC)
    Preventing such proposals would seem to be creating an untouchable ruling elite... Equinox 22:21, 5 June 2016 (UTC)
Yes, but no one seems to be backing the proposal, so it's not the brightest of ideas, just a desperate measure. DonnanZ (talk) 22:37, 5 June 2016 (UTC)
  • Each party has suggested the other's desysopping (above and at [7]) — and given that both parties are wheel-warring using admin tools/privileges, and that one blocked the other while edit-warring with him (as noted above), following both proposals and emergency-desysopping both may be in order if the warring continues. - -sche (discuss) 22:40, 5 June 2016 (UTC)
  • So blocking the other side of the argument is completely justified and one should not lodge a complaint after such abuse of rights? Ridiculous. Very disappointed in the Wiktionary community; seems to be a place for admin bullies who wilfully block others and maintain their modules without the slightest consideration of the consequences. Will greatly reduce the amount of time spent here. Considering quitting. Wyang (talk) 23:14, 5 June 2016 (UTC)
No one is excusing CodeCat's behavior, but de-sysopping is a very serious step, and one best not considered in the midst of a dispute, unless circumstances demand it. Chuck Entz (talk) 03:37, 6 June 2016 (UTC)

Thai Transliteration Debate Explained (I think)

This all revolves around what Latin text should be used to represent the letters of the Thai script when templates link to a Thai entry. The Thai script is mostly phonemic, but there are exceptions where the same letters can be read as different sounds, depending on the term. A true transliteration always represents the same letter or sequence of letters with the same Latin letter or sequence of letters, no matter how it's pronounced. A transcription represents the sounds of the text.

The transliteration can also be forced to be more like a transcription by using a respelling: a sequence of letters that can only be interpreted as the actual sounds of the term. That would be like spelling cathouse as "cat-houss" so the "th" doesn't get read as a digraph like it is in cathode and the "se" doesn't get read as a "z" like it is in "rouse". The template {{th-pron}} is used in Thai entries to display pronunciations, and the input often has to be respelled to get the right results.

The module that does the linking (Module:links) will show a transliteration for a term in a non-Latin script if we pass it as text using the |tr= parameter. If there's no |tr= parameter, it next checks whether there's a transliteration module listed for the language in our language data modules. If there is, it gets the transliteration from that module. Perhaps I should use quotes here, because we sometimes stray from transliteration to transcription when the sounds depart from the actual letters in odd or unexpected ways.

Thai has a transliteration module listed (Module:th-translit), but this just calls the same module that {{th-pron}} uses (Module:th-pron) - the one that requires respelling to work right.
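The lookup order described above can be sketched roughly as follows. This is a Python illustration only, not the actual Lua code of Module:links; the names LANGUAGE_DATA, TRANSLIT_MODULES and romanise_link, and the toy Cyrillic mapping, are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the language data modules: language code -> settings.
LANGUAGE_DATA: Dict[str, Dict[str, str]] = {
    "ru": {"translit_module": "ru-translit"},
    "en": {},  # Latin-script language: no transliteration module listed
}


def toy_ru_translit(text: str) -> str:
    """Toy letter-for-letter mapping covering only the letters used in the demo."""
    table = {"д": "d", "о": "o", "м": "m"}
    return "".join(table.get(ch, ch) for ch in text)


# Hypothetical registry: module name -> function that romanises a term.
TRANSLIT_MODULES: Dict[str, Callable[[str], str]] = {"ru-translit": toy_ru_translit}


def romanise_link(lang: str, term: str, tr: Optional[str] = None) -> Optional[str]:
    """Return the romanisation to show next to a link, or None if there is none."""
    # 1. A manually supplied tr= value always wins.
    if tr:
        return tr
    # 2. Otherwise see whether the language data lists a transliteration module.
    module_name = LANGUAGE_DATA.get(lang, {}).get("translit_module")
    if module_name is None:
        return None
    # 3. If it does, ask that module for the romanisation.
    return TRANSLIT_MODULES[module_name](term)


print(romanise_link("ru", "дом"))            # falls through to the module
print(romanise_link("ru", "дом", tr="dom"))  # manual value is used as given
print(romanise_link("en", "house"))          # nothing listed for this language
```

The point of contention is what should happen in step 3 for Thai, where the module needs a respelling rather than the entry name to give the right result.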

What happened

Back in February Wyang put code into Module:links that checked for Thai, then called a function in a different module than that used for the transliteration. This function basically checked if there was an entry for the term, and if there was, looked in the source of the entry for the {{th-pron}} wikicode. If it found the template, it took the template's (respelled) parameters and substituted them for the actual spelling of the entry name, then called the same module that the transliteration module did. Whatever the module returned was returned in turn to Module:links (sorry), which used that instead of calling the regular transliteration module.
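As a rough illustration of the page-parsing step (a Python sketch under stated assumptions, not Wyang's actual Lua code): the function name respelling_for_link, the in-memory PAGES dictionary standing in for the wiki, and the regular expression are all hypothetical. The sample respelling is the one quoted for กรรเชียง elsewhere in this discussion.

```python
import re
from typing import Dict

# Matches {{th-pron}} or {{th-pron|a|b|...}} as long as it contains no nested templates.
TH_PRON = re.compile(r"\{\{th-pron((?:\|[^{}|]*)*)\}\}")

# Hypothetical stand-in for the wiki: entry title -> wikitext of the entry.
PAGES: Dict[str, str] = {
    "กรรเชียง": "==Thai==\n===Pronunciation===\n{{th-pron|กัน-เชียง}}\n",
    "กา": "==Thai==\n===Pronunciation===\n{{th-pron}}\n",
}


def respelling_for_link(title: str, pages: Dict[str, str]) -> str:
    """Return the text that should be handed to the romanisation module.

    If the linked entry exists and its {{th-pron}} call carries a respelling,
    use that; otherwise fall back to the entry name itself.
    """
    wikitext = pages.get(title)
    if wikitext is None:          # no such entry: nothing to parse
        return title
    match = TH_PRON.search(wikitext)
    if match is None:             # entry has no {{th-pron}} call
        return title
    params = [p.strip() for p in match.group(1).split("|")[1:]]
    positional = [p for p in params if p and "=" not in p]
    return positional[0] if positional else title


print(respelling_for_link("กรรเชียง", PAGES))  # respelling taken from the entry
print(respelling_for_link("กา", PAGES))        # no respelling given: entry name
print(respelling_for_link("พล", PAGES))        # no entry: entry name
```

Whatever this returns would then be passed to the same romanisation code that {{th-pron}} uses. The objection raised by others is to the page lookup itself, which here is just a dictionary access but on the wiki means loading and scanning another entry.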

Nobody but the Thai editors noticed this for 3 months, until, at the end of May, CodeCat reworked that part of the module and, in the process, removed Wyang's code- perhaps without realizing it had been there. Thai editors asked Wyang why the link transliterations weren't working right anymore, so he put his code back in to fix the problem.

This time, CodeCat noticed the code and couldn't immediately figure out what it did, so she left a message on Wyang's talk page. In the meanwhile she reverted Wyang's edit. Soon after that, Wikitiki89 came up with a compromise that incorporated Wyang's code from Module:links into the Thai transliteration module.

When Wyang responded to the comments on his talk page 11 hours later, he explained his code and the rationale for it in detail, and expressed his annoyance at CodeCat's reverting his edit before finding out what it did.

Having explained himself, he went back and reverted CodeCat's revert to reinstate his edit.

CodeCat then responded by explaining on Wyang's talk page why she thought it was a bad idea to put custom code in Module:links, but then went on to say that the problem was all due to deficiencies in the transliteration module and tell him that his code wouldn't be allowed back until she was convinced it was necessary. She then reverted his revert of her revert of his edit.

If you don't already have a headache from this- it gets worse. They then proceeded to revert-war back and forth, stopping every once in a while to argue and denounce each other angrily (see above). Then CodeCat blocked him for edit-warring- which accomplished nothing, since he immediately unblocked himself. Then Wyang called for CodeCat to be de-sysopped, and CodeCat called for Wyang to be de-sysopped.

The issues

Filtering out the misunderstandings and trash talk, here's what I see the basic core arguments are (my formulation, not theirs):

CodeCat
  1. A general-purpose, high-traffic module like Module:links shouldn't have special cases hardwired into it- language-specific code should go in the language-specific modules.
  2. The transliteration modules aren't just for transliteration- they can provide transcriptions, if that's what's right for the language.
Wyang
  1. Thai and other languages like it need special treatment, because they need transcriptions rather than transliterations.
  2. The version of the modules that CodeCat keeps reverting to isn't the same as his version.
Concerns from others
  1. Modules getting data from entries is a very bad idea.

My 2 cents

I agree more with Wyang's view of the events, but agree more with CodeCat on the substance.

CodeCat was wrong to revert Wyang's edit without knowing what it did. Her response to Wyang was too confrontational and demanding. Her poll wasn't really an accurate reflection of what Wyang was asking for, and the block did nothing but make things worse- much worse. On top of that, her characterization of the dispute is rife with spin and trash talk.

Of course, once the revert-war started, Wyang was a full partner in the mudfight, so I'm not giving him a pass, either.

I think the place to deal with Thai's peculiarities is in the Thai transliteration module, not in Module:links. Is there any module other than Module:links that gets the name of the transliteration module from our language data modules (in this case Module:languages/data2)? If not, we should take the function called by Wyang's code (Module:th.getTranslit) and use it as the basis for the transliteration module that Module:links calls (basically what Wikitiki89 did).

Except... I'm not qualified to say much about the concerns expressed over going to other entries to get data. After thinking about it, I can see why Wyang felt he needed to do it: most people linking to Thai entries know nothing about respelling, so it's unrealistic to require passing it as a parameter, and creating a data module with all the terms needing respelling would be a monumental and possibly fruitless task. Still, I think the module should eliminate as much as possible of the straightforward stuff before resorting to such tactics, in order to keep them to an absolute minimum.

Sorry for the encyclopedic length of this, but I wanted to make sure I didn't miss anything. Chuck Entz (talk) 04:17, 6 June 2016 (UTC)

This is a fairly good summary of the past events. By looking at the Thai frequency list, I think it is safe to say that more than half of the 4000 most commonly used Thai words require some phonetic respelling. This number will only go up if we consider the entire set of Thai words, meaning that only relying on the Thai title linked to is quite hopeless at generating the correct transcription. So it boils down to the problem of whether to analyse the link destination to extract the correct pronunciation, or make it compulsory to supply the romanisation every time. I'm highly biased towards the former as I think page parsing is the best functionality on Wiktionary, and I would imagine the native Thai editors to be not very welcoming to the idea of the latter either.
Regarding transliteration vs transcription, this is an issue that extends to many languages beyond Thai. Tibetan and Burmese are good examples that come to mind. I wrote Module:bo-translit (Tibetan) and Module:my-translit (Burmese) a while back, which form the backend for the Wiktionary transliterations of these two languages. The schemes used are the Wylie transliteration and MLCTS schemes respectively, both of which are transliteration schemes, and transliterated outputs of Tibetan and Burmese texts from these schemes have been used wherever the native script appears, whether it be in a Tibetan or Burmese language entry, in the etymologies of other languages or in translation sections.
The universal use of these transliteration schemes is confusing to many unfamiliar with the languages, especially casual visitors to the site. Consequently, there should be additional transcription modules developed for the two languages, used to generate the appropriate romanisation in some circumstances on Wiktionary. The most important circumstance under which transcriptions are desired is probably in translation sections. At the moment someone looking to say "eight" in Tibetan would be absolutely clueless when the person saw the following result on the page eight:
བརྒྱད ‎(brgyad)
Same with someone trying to say "long" in Burmese:
ရှည် ‎(hrany)
The pronunciations of these two words are /cɛʔ˩˧˨/ (Transcription: gyaew) and /ʃè/ (Transcription: she), which the person reading the pages eight and long would not have guessed if (s)he only stayed on those pages. For other circumstances, such as ordinary inter-entry linking, the use of a transliteration method of romanisation is probably better (especially in etymologies), although the decision is to be made by all active editors. The realisation that romanisations used in translation sections should resemble the pronunciation as much as possible has been present on Wiktionary. Compare the Wikitext in the Russian translation of catheter:
This is despite the fact that there is a Russian transliteration module on Wiktionary, which in this case would generate a correct transliteration but an incorrect transcription outcome. On the whole, the distinction between transliteration/transcription in Western languages is very minor compared to languages of the East, for which no infrastructure for this distinction is provided on Wiktionary at the moment. This is how Module:languages/data2 appears currently:
m["tt"] = {
	canonicalName = "Tatar",
	scripts = {"Cyrl", "Latn", "Arab", "tt-Arab"},
	family = "trk-kip",
	translit_module = "tt-translit",
}
This works well with alphabetic languages. For many languages of the East, the section should be more detailed:
m["bo"] = {
	canonicalName = "Tibetan",
	scripts = {"Tibt"},
	family = "tbq",
	ancestors = {"xct"},
	translit_module = "bo-translit",
	transcript_module = "bo-...",
	transcript_in_links = false, --optional
	transcript_in_translations = true,
}
This is the reason I regarded this problem as a lack of support from the central modules, and did not consider changing Module:th-translit into a transcription module as an appropriate way to tackle this. Wyang (talk) 08:36, 6 June 2016 (UTC)
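To make the proposal concrete, here is a minimal Python sketch of how a central module might choose between the two romanisation modules using flags like those proposed above. It is an assumption-laden illustration, not existing Wiktionary code: the language code "xx", the module names "xx-translit" and "xx-transcript", and the function pick_romanisation_module are hypothetical, and the field names simply mirror the proposed data entry.

```python
from typing import Dict, Optional

# Hypothetical language data mirroring the fields proposed above.
LANGUAGES: Dict[str, dict] = {
    "xx": {
        "translit_module": "xx-translit",
        "transcript_module": "xx-transcript",
        "transcript_in_links": False,
        "transcript_in_translations": True,
    },
    "yy": {
        "translit_module": "yy-translit",  # no transcription module at all
    },
}


def pick_romanisation_module(lang: str, context: str) -> Optional[str]:
    """Choose a module name for the given context ("links" or "translations").

    The transcription module is used only when one is listed and the flag for
    that context is switched on; otherwise the transliteration module is used.
    """
    data = LANGUAGES.get(lang, {})
    wants_transcription = data.get("transcript_in_" + context, False)
    if wants_transcription and "transcript_module" in data:
        return data["transcript_module"]
    return data.get("translit_module")


print(pick_romanisation_module("xx", "translations"))  # xx-transcript
print(pick_romanisation_module("xx", "links"))         # xx-translit
print(pick_romanisation_module("yy", "translations"))  # yy-translit
```

Languages that list only a transliteration module behave exactly as they do now, so a scheme along these lines would be opt-in per language.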
@Wyang: One thing I'm confused about, is if you are planning to use the transcription instead of the transliteration, why do you need a transliteration module? --WikiTiki89 18:21, 6 June 2016 (UTC)
Different languages have different uses of transliteration modules. For Thai, editors have agreed on the use of transcriptions in translation sections and in normal links, although transliteration may be the better option of romanisation of Thai terms in etymologies of other languages, when the module calling Module:links is Module:etymology. For Tibetan and Burmese, transcription should be used in translations, whereas transliteration is the better mode of romanisation in generic links, as there is good one-to-one script correspondence and makes etymologies much more apparent. The modules should be kept and named accordingly for languages where the distinction is important on a romanisation level. Wyang (talk) 00:47, 7 June 2016 (UTC)
@Wyang: Ok, now I understand better what your intentions are. However, I don't think it's a good idea to use different transliteration/transcription systems in different places. This is something the Wiktionary community should agree on as a whole, and not just the Thai editors (and the Tibetan and Burmese editors). The other issue is that parsing a linked-to entry to determine the word's phonetic transcription is a really bad idea for a number of reasons that have already been pointed out in the above discussions. What would be wrong with manually supplying these transcriptions? You can even add the manual transcriptions with a bot, which is similar to what User:Benwing2 did for Russian accent marks. Changing the logic of Module:links is not the right solution to either of these problems. --WikiTiki89 14:21, 7 June 2016 (UTC)
From the experience with parsing in the past one and a half years, I would say that the associated harm is very minimal and benefits are extensive. This is somewhat similar to the case of the deletion of Template talk:str index (used in py-to-ipa then) that I contested about five years ago, well before the advent of this Lua system, and the difference is that the benefit-to-harm ratio in this case is even higher. People were not even that warm to the idea of automatic transliteration back then. The earliest and most important use of parsing is in {{zh-forms}}, and it has resulted in dramatic changes in the way that Chinese entries are formatted. Code is much more succinct, and as a consequence efficiency and productivity have exponentially increased (examples of use: 安眠藥, 暗物質, 報酬遞減定律).
Tools should only be used in situations where they must be. In the case of parsing for transcriptions, it is irrelevant to most of the languages hosted on Wiktionary and therefore most editors on the site. Most people have no experience and will have no experience with this. People tend to show aversion to the unfamiliar, and when the aversive mentality is voiced collectively by similar-minded peers, the disinclination is irrationally amplified and may as well convincingly mask the reality, which may only be visible to those centrally involved. (This may well underlie some political phenomena and explain the difficulty experienced with the Chinese entry format change here.) I would be arguing that new technology should be actively embraced and not feared (Wikipedia:Don't worry about performance). Likewise, transcription should be achieved automatically and people/bots should not have been manually supplying the transcriptions since the infrastructure is fully functional with no demonstrated risks. Even if there are, the focus should be on how to solve it, not on how to disable it.
With regard to the partial change to transcriptive romanisation, I argued for what I consider as appropriate for Tibetan and Burmese and would be happy to hear about other ideas. On a historical note, before the creation of Module:my-translit, most formatted Burmese entries were using the BGN/PCGN system for romanisation, which is a transcription system, and the change to a transliteration system (MLCTS) occurred due to the higher success rate of automation of the latter, which allowed a much wider coverage of romanisation for the Burmese content. It is a decision to be made by Burmese-language editors collectively, and people should have the freedom to choose a practice of romanisation that is most appropriate for the language, with modules using the two modes (transliteration and transcription) of romanisation for this language already recorded in the backend database, and infrastructure in place for determining which system should be used where. For instance, if Burmese uses transcription in links I would still suggest that any calls to Module:links by Module:etymology use the Burmese transliteration module to generate romanisations, as Burmese transcriptions are much less informative for this purpose. Wyang (talk) 08:53, 8 June 2016 (UTC)
You make some good points. I'll need to think about this for a bit. But also note that {{Wikipedia:Don't worry about performance}} does not apply here. The page states "You, as a user, should not worry about site performance. In most cases, there is little you can do to appreciably speed up or slow down the site's servers. The software is, on the whole, designed to prohibit users' actions from slowing it down much." But the concern is not slowing down site performance, but that since the site's performance is protected by time and memory limits, we have frequently seen on Wiktionary these limits being reached and producing errors. Thus, performance is still an issue, even though its consequences do not affect the site's performance overall. --WikiTiki89 14:40, 8 June 2016 (UTC)
So, what happens now? Can we please get rid of the Thai code from Module:links now, or do we need some more edit warring? —CodeCat 12:06, 11 June 2016 (UTC)
Do you have any constructive suggestions? DCDuring TALK 14:25, 11 June 2016 (UTC)
Reinstate Wikitiki's original 3 edits and be done with it. —CodeCat 15:30, 11 June 2016 (UTC)
I note that Wikitiki's comment of three days ago made it seem that he hadn't come to that final conclusion. DCDuring TALK 00:21, 12 June 2016 (UTC)
  • User:Chuck Entz has described the situation very well. User:Wyang has created a working code for Thai transliterations/transcriptions and character sequencing. It is another commendable achievement of his. Few people attempted to work with scripts of such complexity as Thai. The majority of developers think that Thai is simply not transliteratable, even the phonetic respelling. User:CodeCat has broken the code for the reasons she mentioned. So, Thai transliteration modules stopped working and no alternative was offered. Thai editors were left wondering what was going on. User:Wikitiki89 has provided a workaround (later). I don't really know if it's a good fix. It should, of course, be considered, but Wikitiki89 is not sure himself. There could be other solutions, but breaking existing code without really offering a working solution is wrong. It seems CodeCat simply doesn't care about thousands of Thai entries, translations, editors and tremendous work put into this. I fully understand Wyang's frustration. I hope this conflict will be resolved peacefully. I don't want anyone desysopped but I encourage more consideration of other people's work. I'll leave the final technical solution to the people who understand it better. I don't see a huge reason for Module:links not to take some of the work (language-specific customisations) and/or accommodate handling of complex scripts with various levels of possible transliteration/transcription. For example, we capitalise transliterations of Korean proper nouns with a symbol "^" using the module.
  • As for the transliteration/transcription for Thai - a graphical (literal) transliteration for the Thai script is not used anywhere, no Thai dictionary uses non-phonetic transliteration, it would produce nonsensical garbage, even for many words with regular or predictable spellings, just like many English words would if they were transliterated graphically into another script, e.g. "light" (l-i-g-h-t) - Cyrillic лигхт (ligxt). A phonetic Thai transliteration is not only popular but it's also standard. There are various Thai transliteration standards but none of them is graphical (showing sequence of symbols). A graphical spelling can also be provided, please see กรรเชียง (gan-chiiang), which shows the actual orthography (including the phonetic respelling of the term - "กัน-เชียง"). The one adopted here is based on Paiboon publisher of dictionaries, phrasebooks and textbooks. Royal Thai General System of Transcription is also phonetic but not very useful for learners - no tones, no long vowels, etc. --Anatoli T. (обсудить/вклад) 04:27, 14 June 2016 (UTC)

Google Scholar

Can we use Google Scholar for attestation? --Daniel Carrero (talk) 05:16, 7 June 2016 (UTC)

We can use Google Scholar to locate permanently archived journal articles, so I'd say yes. —Aɴɢʀ (talk) 07:28, 7 June 2016 (UTC)
We have traditionally counted it at RFV. —Μετάknowledgediscuss/deeds 08:13, 7 June 2016 (UTC)

Case order in German declension tables (others too probably)

German declension tables are vertically split by case. The cases are ordered nominative, genitive, dative, accusative. This makes no sense to me! It would be better if it was nom, acc, dat, gen:

  1. Conceptually, nominative and accusative are the most fundamental, and then dative is a variation on accusative. Genitive is then its own thing.
  2. The forms of practically everything (articles, adjective declension etc) tend to match in either nom+acc or acc+dat, and sometimes dat+gen. This ordering would place them next to each other.
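One way to check the second point is to count how often neighbouring cells hold the same form under each ordering. The Python sketch below does this for the German definite articles only; the function name adjacent_matches is made up for the illustration, and a fuller comparison would also need adjective and noun endings.

```python
# German definite articles by gender/number and case.
ARTICLES = {
    "masculine": {"nom": "der", "acc": "den", "dat": "dem", "gen": "des"},
    "feminine":  {"nom": "die", "acc": "die", "dat": "der", "gen": "der"},
    "neuter":    {"nom": "das", "acc": "das", "dat": "dem", "gen": "des"},
    "plural":    {"nom": "die", "acc": "die", "dat": "den", "gen": "der"},
}


def adjacent_matches(case_order):
    """Count pairs of neighbouring cases whose article forms are identical."""
    total = 0
    for forms in ARTICLES.values():
        column = [forms[case] for case in case_order]
        total += sum(1 for a, b in zip(column, column[1:]) if a == b)
    return total


print("NGDA:", adjacent_matches(["nom", "gen", "dat", "acc"]))  # NGDA: 1
print("NADG:", adjacent_matches(["nom", "acc", "dat", "gen"]))  # NADG: 4
print("NAGD:", adjacent_matches(["nom", "acc", "gen", "dat"]))  # NAGD: 4
```

For the articles at least, any order that keeps nominative next to accusative brings more identical forms together than the current one does.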

A similar but more minor thing occurs with gender: it's ordered MFN, when usually the masculine and neuter forms are more similar, or sometimes F+N, but rarely M+F.

Why is it in this order? Would people support it being changed? Issues with this I'm imagining:

  1. There's some (stupid!) tradition that it's written in this order.
  2. It'd have to be changed across all languages or none.

This is how it would look the way I'm suggesting.

Fedjmike (talk) 07:44, 7 June 2016 (UTC)

You seem to think traditions are stupid. We have to cleave closely to traditions to be taken seriously as a scholarly work. Admittedly, some German grammarians do have a different order, but I would say that the one we use is probably the most traditional. Changing things up because you like them better is not a convincing argument. —Μετάknowledgediscuss/deeds 08:12, 7 June 2016 (UTC)
Yeah, guilty as charged wrt tradition. But I'm not saying change it because I don't like it, I gave what I think are good reasons for that order. Which sources use the current order, and why? I'd like to at least read about it and understand why they use it. I'm not sure I understand your argument about needing to match tradition; whose approval is Wiktionary trying to get, and why would it matter to them if it were to use a less conventional ordering of cases in tables? Fedjmike (talk) 08:43, 7 June 2016 (UTC)
Switching to nom-acc-dat-gen order has been proposed before a number of times. I am in favor of it. - -sche (discuss) 08:29, 7 June 2016 (UTC)
As am I. Leasnam (talk) 00:02, 10 June 2016 (UTC)
I don't really care which order the cases are in as long as nominative is first, but the advantage to sticking with tradition is that it's what readers will expect. I would be thrown off by an adjective declension table that put the gender columns in the order masculine-neuter-feminine, because over the years I have come to always expect masculine-feminine-neuter, and not just for German but for all languages with those three genders. I have no doubt we would get a lot more complaints about a declension table that put neuter between masculine and feminine than we get about the current order. —Aɴɢʀ (talk) 09:10, 7 June 2016 (UTC)
  • As someone who favours monolithic integrated tables over clear but repetitive tables, I'm also in favour of ordering the tables so that the number of cells is as small as possible. As such I'm giving strong support for NADG and having n/m and f/p next to each other. Korn [kʰũːɘ̃n] (talk) 09:18, 7 June 2016 (UTC)
  • Wikipedia uses the order NAGD (en and de, as well as fr.wikt). However de.wikt uses NGDA, and fr.wikipedia NADG. I am personally more familiar with NADG (I learned German in a French school). All that to say that the order of German declension seems to be far from being cast in stone, so we may as well choose the one that makes the most sense to learn the language. — Dakdada 11:11, 7 June 2016 (UTC)
  • FWIW, my German learning books mostly use NADG (presumably since that's the order that learners come across them). It depends whether we want to go for the scholarly one or the German-as-a-second-language one. Smurrayinchester (talk) 14:04, 7 June 2016 (UTC)
A while ago I proposed using NADG. This is what I find in my German books and it definitely makes the most sense to me. The NGDA order is only done in imitation of Latin. Perhaps this should be voted on. Benwing2 (talk) 01:28, 8 June 2016 (UTC)
For Slovene, the common order is also NGDA but we use NAGD here. For old Germanic languages we seem to use NAGD order, while for modern Icelandic and Faroese we use NADG. I personally find NGDA order to be really annoying and counterintuitive (given that nominative and accusative are the most common cases and often identical) and would favour abandoning it for all IE languages, Latin included. —CodeCat 12:27, 8 June 2016 (UTC)

What's the difference between a journal and a magazine?

We have both {{quote-journal}} and {{quote-magazine}}, with identical parameters. Could we combine these into {{quote-periodical}}? Is there a reason to distinguish journals and magazines, and if so what criteria could be used? DTLHS (talk) 23:59, 8 June 2016 (UTC)

Hmm. To me, a magazine is usually a mainstream popular publication you can find in shops, while a journal (unless we're talking about a personal diary) is usually an academic thing that gets published in volumes and issues. If you look at the APA academic style for citing the two things, there isn't much difference apart from the fact that journals come out in volumes and issues. They don't even require the publisher and city for either of them, despite requiring it for books. Equinox 00:07, 9 June 2016 (UTC)
Periodicals: Agreed that the difference is mostly popular perception and occasionally a title will cross over, such as National Geographic which is certainly scholarly but also available in popular locations such as bookstores and dentists' offices. There's no particular reason to have separate templates and certainly many popular magazines have "volumes" and "issues" amongst those volumes. I agree with rolling them into one and having the other two templates redirect to it. —Justin (koavf)TCM 00:12, 9 June 2016 (UTC)
Okay. Magazines are more a subset of journal than vice versa (I think?), so shall we propose that we keep quote-journal (with volume/issue optional, since some magazines only have a month&year) and drop quote-magazine as redundant? Equinox 00:21, 9 June 2016 (UTC)
An even better idea: call it quote-periodical because "magazines are journals" is open to some debate but "magazines and journals are both periodicals" is not. Equinox 00:22, 9 June 2016 (UTC)
@Smuconlaw Do you have any input here? DTLHS (talk) 02:30, 10 June 2016 (UTC)
Actually, the primary template is {{quote-journal}}; {{quote-magazine}} and {{quote-news}} are just redirects to it. I suppose we used "quote-journal" by analogy to "cite journal" at the English Wikipedia. (According to the OED, a journal is "[a] daily newspaper or other publication; hence, by extension, Any periodical publication containing news or dealing with matters of current interest in any particular sphere", while a magazine is "[a] periodical publication containing articles by various writers; esp. one with stories, articles on general subjects, etc., and illustrated with pictures, or a similar publication prepared for a special-interest readership". A usage note adds: "The use of the word (rather than periodical) typically indicates that the intended audience is not specifically academic.") — SMUconlaw (talk) 02:42, 10 June 2016 (UTC)
Periodical seems the most generic of the candidates and therefore seems the least confusing for new users. But the redirects solve most practical problems. It is only when reading documentation that a user is likely to notice what the "real" template is. DCDuring TALK 10:42, 10 June 2016 (UTC)
I should also add that the template accepts the parameters |journal=, |magazine=, |newspaper=, |periodical= and |work=. — SMUconlaw (talk) 17:39, 10 June 2016 (UTC)
That's handy. But users might expect there to be a parallel in name between the template they want and {{quote-book}}. It wouldn't much inconvenience us to have a few redirects to {{quote-periodical}}, would it? DCDuring TALK 17:59, 10 June 2016 (UTC)
We could create {{quote-periodical}} as a redirect to {{quote-journal}}. It may be a good idea to retain {{quote-journal}} as the primary template for consistency with other Wikimedia projects, as I suspect that many editors work on multiple projects. — SMUconlaw (talk) 00:04, 11 June 2016 (UTC)
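
The alias behaviour described in this thread (one template accepting |journal=, |magazine=, |newspaper=, |periodical= and |work=) can be illustrated with a minimal Python sketch. This is not the actual template code: the function name `resolve_periodical` and the precedence order among the aliases are assumptions made for illustration; only the five parameter names come from the discussion.

```python
# Hypothetical sketch, NOT the real {{quote-journal}} implementation.
# The precedence order of the aliases below is an assumption.
ALIASES = ("journal", "magazine", "newspaper", "periodical", "work")


def resolve_periodical(params):
    """Return the first non-empty value among the alias parameters, or None."""
    for name in ALIASES:
        value = params.get(name, "").strip()
        if value:
            return value
    return None


# Whichever alias the editor used, the caller gets the same field back.
print(resolve_periodical({"magazine": "National Geographic"}))
```

With a single lookup like this, it would matter little which name the "real" template carries, which is the practical point made above about redirects.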

{{hu-verb}} - no links in multi-word entries

Even though {{hu-verb}} is connected to {{head}}, it does not create links for each member of a multi-word entry. I can't figure out why. Can someone please help? It contains only a single line. Thanks. --Panda10 (talk) 12:47, 9 June 2016 (UTC)

Pagename is automatically treated as the argument in |head=. It should be fixed now. Wyang (talk) 12:58, 9 June 2016 (UTC)
Thanks so much! :) --Panda10 (talk) 13:04, 9 June 2016 (UTC)
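
For readers unfamiliar with the behaviour being asked about, here is a rough Python sketch of what "links for each member of a multi-word entry" means when the page name is used as the default headword. It is only an illustration under stated assumptions: the function name `link_words` is hypothetical, the real logic lives in Wiktionary's own modules, and splitting on whitespace alone is a simplification.

```python
# Hypothetical illustration; the real headword code is more elaborate.
def link_words(pagename):
    """Wrap each word of a multi-word page name in wikilink brackets."""
    words = pagename.split()
    if len(words) <= 1:
        # Single-word entries are shown unlinked.
        return pagename
    return " ".join("[[" + word + "]]" for word in words)


print(link_words("részt vesz"))  # [[részt]] [[vesz]]
```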

Phrasebook vs. phrases categories

Is there a way to place phrasebook expressions/sentences only to the phrasebook category and remove them from the phrases category? In the past, I tried to solve this by using {{head|hu|phrasebook}}, but that was changed by a bot to {{head|hu|phrase}}, so it seems this is not accepted. Is there another way? The phrases category is cluttered up with sentences that really belong to the phrasebook only. Thanks. --Panda10 (talk) 15:57, 9 June 2016 (UTC)

Actually, Category:English phrases has 1,776 entries and Category:English phrasebook has 358 entries. Removing all phrasebook entries from the phrases category would mean a change of 20%. Just my opinion: I don't think the phrases category is too cluttered by phrasebook entries, and I don't think removing that 20% would improve it enough to justify the work.
If we had some sort of distinction between "phrases" and "phrasebook", a few examples like how are you and good night would still fit both categories; and hello is both an interjection and part of the phrasebook. (currently, the POS header of good morning is Interjection and that of good night and good afternoon is Phrase, and that of good evening is Noun). --Daniel Carrero (talk) 16:20, 9 June 2016 (UTC)
I see your point. However, the percentage will be different for every language. Also, the 20% for English is true today, but may change in the future. I would still be interested to find out if there is a way to do this within the policies of this wiki. --Panda10 (talk) 16:45, 9 June 2016 (UTC)
More to the point, {{head}} and other headword templates categorise entries by part of speech. "Phrasebook" is certainly not a part of speech. —CodeCat 16:46, 9 June 2016 (UTC)
@CodeCat: Are you saying that phrase is a part of speech? --Panda10 (talk) 11:54, 10 June 2016 (UTC)
Many of our multi-word expressions are not phrases and not constituents. They are sometimes designed to simply be the target of a long list of redirects or to appear at the top of a no-entry search list. Because we never have an explicit "not elsewhere classified/categorized" category, inevitably some category or categories becomes the junk-catching category. In English grammar, "adverb" has long been one such. For us, "interjection" and "phrase" serve similar functions.
"Interjection" is a misnomer as we apply it. How does hello fit our definition of interjection in most of its normal uses? Collins uses "sentence substitute" (read "prosentence" if you'd prefer a technical word) for hello for example.
"Phrase" would benefit from a similar kind of split into one or more categories, though "phrasebook" is not any kind of grammatical category and would probably not be part of a long-term solution. DCDuring TALK 22:44, 9 June 2016 (UTC)
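
The 20% figure quoted earlier in this thread can be checked with a short calculation. The sketch below assumes, as the original estimate does, that every phrasebook entry is also in the phrases category; the function name `share_percent` is made up for illustration.

```python
# Assumes every phrasebook entry is also counted in the phrases category.
def share_percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Entry counts as quoted in the discussion (June 2016).
print(share_percent(358, 1776))  # 20.2
```

The same function can be rerun with the counts for any other language, since, as noted above, the proportion will differ from language to language and over time.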

bot status vote 2

Planned, running, and recent votes
Ends    Title                             Status/Votes
Jun 17  User:Whymbot for bot status       passed
Jun 27  New logo 2                        support 16, oppose 7, abstain 6
Jun 30  User:OrphicBot for bot status     support 4, oppose 1, abstain 0
Jul 6   Spaces in links                   support 9, oppose 0, abstain 0
Jul 19  label → lb                        support 6, oppose 1, abstain 1

Some are asking that the vote on User:RobokoBot be closed out. —Stephen (Talk) 12:47, 10 June 2016 (UTC)

Done --Daniel Carrero (talk) 15:56, 11 June 2016 (UTC)

Category:Form-of templates

It says that the templates are used in the definition line, but some of the templates can be (and, sometimes, can ONLY be) used in the etymology section. This may happen when the source word does not mean the same as the derived word, which makes the link to the source word useless. Am I right? So, I guess we need to add a new parameter to all of the templates to tell them whether they are being used in the etymology section. Also, if this change is implemented, the templates should allow for the source word being in a different language (like {{compound}}). For instance, English lb is an abbreviation of Latin libra. --Dixtosa (talk) 13:43, 11 June 2016 (UTC)

Just found that there are already two versions of the templates for Russian: {{ru-etym abbrev of}}, {{ru-etym acronym of}}, {{ru-etym initialism of}}; and {{ru-abbrev of}}, {{ru-acronym of}}, {{ru-clipping of}}, {{ru-initialism of}}. --Giorgi Eufshi (talk) 06:30, 22 June 2016 (UTC)

Conversation about origin of words

In conversation, when someone says "You're using that word wrong. When the word was first used, the meaning was different." Or: "That word came from (Ancient) Greek and the Greeks used it to mean something else. Therefore we should all use the original (?) meaning invented by the Greeks." What would you say to that person? --Daniel Carrero (talk) 22:24, 11 June 2016 (UTC)

Language changes. I would find some examples where the person, not being a linguist at all, didn't know about the change ("sad" is a good example), and ask them why they are not using the word in its original sense; or why they speak English at all, when it isn't the oldest language ever invented. Equinox 22:26, 11 June 2016 (UTC)
That they are falling for the w:etymological fallacy. Enosh (talk) 15:09, 12 June 2016 (UTC)
That sounds like a perfect Quora question. ("perfect" in the sense that it is perfectly characteristic of Quora). --Dixtosa (talk) 15:19, 12 June 2016 (UTC)
Folk etymology: For that matter, a lot of folk etymologies are just wrong. You could ask the person, "If it turns out that the actual original meaning is [X] instead of [Y], then would you change your behavior...?" and the answer is no. —Justin (koavf)TCM 22:55, 12 June 2016 (UTC)

t:cite-meta author format

While discussing the new template {{R:M&A}}, I noticed the strange fact that we use semicolons to delimit the authors in {{cite-meta}} as opposed to the traditional “A, B, & C”. To that end I created a template {{format list}} which will take parameters and write them out in the normal list format (yes, it could be done more elegantly in Lua, but I wanted to conserve Lua runtime and memory for more important stuff). User:Smuconlaw and User:Isomorphyc then pointed out that there might be some concerns about changing the citation format, so I thought I'd ask here. —JohnC5 02:42, 14 June 2016 (UTC)

I cannot think of a conventional reason to use semicolons rather than commas; I do favour the change suggested by User:JohnC5, though since the template is very widely used, I thought it would be reasonable to ask around a bit first. Isomorphyc (talk) 02:56, 14 June 2016 (UTC)
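
To make the two author formats under discussion concrete, here is a minimal Python sketch contrasting the semicolon-delimited output with the "A, B, & C" style. It is not the code of {{format list}} or {{cite-meta}}; the function names are hypothetical, and the handling of exactly two authors ("A & B") is an assumption, since the thread only shows the three-item pattern.

```python
# Hypothetical sketch; not the actual {{format list}} template logic.
def format_list(items):
    """Join items as 'A, B, & C'; the two-item form 'A & B' is an assumption."""
    items = [item for item in items if item]  # skip empty parameters
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return items[0] + " & " + items[1]
    return ", ".join(items[:-1]) + ", & " + items[-1]


def format_semicolons(items):
    """Join items with semicolons, the style the thread says is in use."""
    return "; ".join(item for item in items if item)


authors = ["A", "B", "C"]
print(format_semicolons(authors))  # A; B; C
print(format_list(authors))        # A, B, & C
```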

Category:English merisms and Category:English lexical doublets

What is the difference between these two categories? Do we need both? Is the definition of merism "a pair of contrasting words" in linguistics? --Panda10 (talk) 13:05, 14 June 2016 (UTC)

Merisms seem, on the face of it, to be a subset of lexical doublets, just as reduplicated doublets are. DCDuring TALK 15:15, 14 June 2016 (UTC)

Wikimania 2016

Draft of a talk about Wiktionary at Wikimania.

Hi, English-speaking Wiktionarians!

Wikimania is an annual meeting to discuss global issues in the Wikiverse. This year Wikimania takes place in Italy, June 22 to 26, and the programme is here. Three nice French contributors plan to be there to talk about Wiktionary! Yes, a talk about our little-known project, given by non-English speakers. Is it not intriguing?

We already mentioned here our proposal in January 2016 and we are now in the process of organizing our slides. We are not ready yet but we want to make the building of the talk as collaborative as our projects. So, feel free to have a look at it and point out every mistake in the language or part you want more details on.

Also, we want to meet you guys! So, if some of you come to Wikimania, please come to our talk or to a meetup later in the evening! Come to have a glass of Italian wine with us and discuss our amazing projects! If you plan to travel to France, for instance to see a football game, tell us, we'll be glad to host you! Noé (talk) 13:55, 14 June 2016 (UTC)

When the quotations are in phonetic transcription

Do we have any established customs regarding what to do with quotations that are written in a phonetic transcription rather than the usual orthography of the language in question? I have a book of Burmese proverbs that writes all the Burmese in transcription, not in Burmese orthography; likewise Die araner mundart has lots of usage examples for Irish written in phonetic transcription rather than conventional orthography. So far, I've just been putting these things in conventional orthography, but that goes against our usual custom of transcribing quotations exactly as they're written in the source. —Aɴɢʀ (talk) 17:35, 14 June 2016 (UTC)

The ideal would be to find a native Burmese/Irish source written in the native orthography and quote from it. --WikiTiki89 17:53, 14 June 2016 (UTC)
I do do that when possible for Irish, but when I'm working through Die araner mundart to make sure we have entries for all the words it lists, it's easier to give the same examples. Also, it's a good source for unalloyed dialectal Irish rather than standard "school" Irish. —Aɴɢʀ (talk) 18:12, 14 June 2016 (UTC)
  • I might leave a note that the orthography of the text doesn't match the standard one, and maybe give both for the Irish example. I found one such time I did that, a while back, at mo'ai. —Μετάknowledgediscuss/deeds 18:07, 14 June 2016 (UTC)
    The difference there is much smaller than what I'm talking about. At aithrí, for example, I just added the usex "Mara ndéanfaidh muid aithrí inár bpeacaí, tá muid ar fad caillte", but what the source actually says is "mar ə ńīnə myȷ æŕī ə n-r̥ bȧkī, tā myȷ əŕ fad kāĺcə". —Aɴɢʀ (talk) 18:12, 14 June 2016 (UTC)
Is it a quotation or a usage example? Shouldn't you cite the source if there is one? DTLHS (talk) 18:16, 14 June 2016 (UTC)
It's a quotation of a usage example. The book I'm using is a reference work about this dialect; volume II is the dictionary, which provides usage examples taken from the author's fieldwork among native speakers. They're sentences that he heard spoken while he was living among Irish speakers, so this book is the only form in which these sentences have been published. And rather than writing them in conventional orthography, he writes them in his own ad-hoc phonetic transcription. —Aɴɢʀ (talk) 18:23, 14 June 2016 (UTC)
I suppose I could format it as a quotation along the following lines:
  • 1899, Franz Nikolaus Finck, Die araner mundart, Marburg: Elwert’sche Verlagsbuchhandlung, vol. II, 28:
    mar ə ńīnə myȷ æŕī ə n-r̥ bȧkī, tā myȷ əŕ fad kāĺcə.
    conventional orthography: Mara ndéanfaidh muid aithrí inár bpeacaí, tá muid ar fad caillte.
    Unless we do penance for our sins, we are all lost.
That would make it clear, wouldn't it? —Aɴɢʀ (talk) 18:30, 14 June 2016 (UTC)
That looks good to me. Maybe you want to make a special quotation template for this if you're going to be citing it a lot. DTLHS (talk) 18:36, 14 June 2016 (UTC)
Yeah, I was thinking about doing that. —Aɴɢʀ (talk) 18:47, 14 June 2016 (UTC)

Abbreviations in etymologies

Sometimes I see etymologies with abbreviations. Example: ferruminate contains "ferruminatus, p.p. of ferruminare".

I don't remember if it was discussed before, but based on Wiktionary:Todo/unhelpful abbreviations, I suppose abbreviations like p.p., q.v., Gr., and so on are disallowed in etymologies. Am I right? --Daniel Carrero (talk) 21:50, 14 June 2016 (UTC)

Since we are not a print dictionary, we don't need to save space. We decided (although I don't know when or where) that it's better not to use abbreviations in etymologies because not everyone will know or be able to guess their meanings. --WikiTiki89 21:57, 14 June 2016 (UTC)

"vernacular" as a label for Russian прост.?

@Atitarev, Cinemantique, Wikitiki89, Wanjuscha, KoreanQuoter Russian-language dictionaries of Russian commonly use the label "прост." for a certain type of colloquial words; "прост." stands for просторе́чный (prostoréčnyj), which literally means "vernacular" or "common speech". This is different from "разг." = разгово́рный (razgovórnyj) = "colloquial". I gather that words labeled прост. are considered lower-register than those labeled разг. For a while I labeled these words as "nonstandard" but this doesn't quite seem right, as "nonstandard" would suggest these words are somewhat proscribed, which I don't think is quite correct. My Russian-English dictionary labels both types of words as simply "colloquial". I've started to label the прост. words using {{lb|ru|vernacular}}, but this label doesn't currently exist so it doesn't usefully categorize them. Should we create a "vernacular" label that categorizes into e.g. CAT:Russian colloquialisms (as {{lb|ru|colloquial}} does) and also maybe into CAT:Russian vernacular speech or similar? If not, how should these be handled? Benwing2 (talk) 03:06, 15 June 2016 (UTC)

I don't think "vernacular" is the best word for it. I'm tempted to say that we should choose in each case between "regional", "dialectal", and "colloquial", whichever fits better. --WikiTiki89 14:51, 15 June 2016 (UTC)
There's no good equivalent of просторе́чие (prostoréčije) in English. If you have to choose between "colloquial" and "non-standard" labels, "colloquial" is probably better, IMO, but not in all cases, and some would disagree. --Anatoli T. (обсудить/вклад) 21:54, 15 June 2016 (UTC)
English also has informal as a label that covers a broad range of registers, but excludes academic papers, legal and government documents, and similar. DCDuring TALK 23:17, 15 June 2016 (UTC)
It may have to do with perceptions of non-standard language in Russian and English. Non-standard language has always been discouraged and people who use it looked down on. Perhaps a good example is ложи́ть (ložítʹ). Someone who uses it without prefixes or reflexive suffixes may be immediately identified as "uneducated", unlike someone in English who says "gonna", "gotta" or "he/she have" (I'm sure there are better examples). There's not always a clear boundary between colloquial and non-standard (as in other languages), e.g. туды́ (tudý) (another example of "просторечие") is often used in jokes. --Anatoli T. (обсудить/вклад) 23:51, 15 June 2016 (UTC)
I see.
Because we have the labels informal ("not suitable for formal speech or writing"), colloquial ("conversational"), non-standard ("scorned by many"), and dialectal ("acceptable only in a region or population") (glosses per my Wiktionary idiolect) available, we should probably avoid using senses of these that overlap with senses of the others. In particular, colloquial has a range of meanings, often overlapping with non-standard, but the "conversational" meaning is distinctive, IMO. DCDuring TALK 00:42, 16 June 2016 (UTC)
@Atitarev OK. I think your point about sounding uneducated is important. In English, educated people will use colloquial or slang speech in a sufficiently informal context but will generally avoid nonstandard speech except when used self-consciously for effect. For this reason we say words like "ain't" and "alls" ("alls you need to do ...") and "drug" (instead of "dragged") and "have dranken" (instead of "have drunk" or "have drank") are nonstandard. ложить sounds like an example of this. But I get the feeling most things that are просторечный are more like colloquialisms or slang. For this reason I'll use "colloquial" from now on for lack of anything better; I'd rather have some way of distinguishing разговорный from просторечный but there may be no help for this. Benwing2 (talk) 04:24, 16 June 2016 (UTC)

{{defdate}} and the Shorter Oxford English Dictionary

We must have thousands of entries that "reference" this dictionary (example abstain), just copying their dates of earliest attestation. This seems like a copyright violation, given that we are just copying their research across a large number of entries. Am I wrong? DTLHS (talk) 21:10, 15 June 2016 (UTC)

Copyright protects the expression of information, not the information itself. It would likely be a breach of copyright to copy a definition from the SOED word for word, but probably not if what is copied is a piece of information such as the date of earliest attestation. — SMUconlaw (talk) 22:16, 15 June 2016 (UTC)

putting an internal link in a translation of an example sentence

Are we allowed to put an internal link in a translation of an example sentence? See 行李 (baggage carousel) and 吹噓 (bragging rights). I think it's useful since it makes it clear that the translation is also a set term in English, but I thought I read somewhere that we shouldn't format it as such. ---> Tooironic (talk) 15:13, 17 June 2016 (UTC)

If you really want to, go ahead. I would only do this in very limited circumstances. --WikiTiki89 15:40, 17 June 2016 (UTC)

Sources for pronunciations of English words

I notice that lots and lots of English words are missing their pronunciations. I'm thinking of trying to write a bot to add them but I need a free source of pronunciations that contains enough detail to map to IPA. Anyone know of such a source? It's not obvious that websters1913.com will work; e.g. for Nation they show Na"tion and for National they show Na"tion*al; whatever those symbols mean they don't seem to indicate the vowel-quality difference in the a in the two words. Benwing2 (talk) 11:28, 18 June 2016 (UTC)

When to use references?

Should we include references only when a term is rare or disputed? Or should we use them whenever possible? The policy pages don't really say. Is "credibility" such an important factor for our goals? If so there are thousands and thousands of entries we could easily reference in English, Spanish, French, German, etc. Ultimateria (talk) 15:25, 18 June 2016 (UTC)

@Ultimateria: I think each page should have at least one external link to a good monolingual dictionary online as far as possible, or in the case of English, OneLook would do. But that is not really a reference to boost credibility but rather an external link to provide further reading. I am sure readers are going to love having great sources one click away. That said, the reader may use the links for verification as well. We should not pester entry creators for failure to add external links; adding is up to the people who want to add them, or actually probably up to bots and similar scripted operations. One risk that I see for us adding external links is that it may confuse readers into thinking that we actually use these external links for verification when we in fact use attesting quotations. That is one of the reasons why I much prefer the "External links" header to "References" for the purpose.
Wiktionary:Votes/bt-2016-06/User:OrphicBot for bot status is currently running and proposes to bot-add certain external links to as many Latin and Ancient Greek entries as possible, which I welcome, despite opposing.
My user User:DPMaid was volume adding external links to multiple languages as documented on its talk page and no one objected so far. (I do not see why anyone would.) --Dan Polansky (talk) 07:56, 19 June 2016 (UTC)
Good point on the header, I'll start using "external links" and try to add them more consistently. Ultimateria (talk) 10:25, 19 June 2016 (UTC)
@Dan Polansky While the choice for the Korean dictionary is good, the Russian {{R:BTS}} is incorrect. It's not linked to Большой толковый словарь but to gramota.ru. Not useful at all. Please undo your edits in Russian entries. I didn't check your other templates. A good bilingual Russian dictionary is [8], if the term can be added to the URL. --Anatoli T. (обсудить/вклад) 12:50, 19 June 2016 (UTC)
@User:Atitarev: The {{R:BTS}} template uses "?bts=x" in the URL to specifically select Большой толковый словарь from all the dicts available at gramota.ru. If you follow the link in one of the pages, e.g. from словарь, and then look at the right, you will see a list of checkboxes, and only the one for Большой толковый словарь will be checked. I really don't see how that is "not useful at all"; it cannot get much more useful than it is. Would it help if I add "at gramota.ru" to the text shown by the template, to warn the reader that the dictionary is hosted by the site in case the reader does not like the site or something? --Dan Polansky (talk) 13:02, 19 June 2016 (UTC)
As an alternative, when I was designing the template, I pondered creating "R:DGR" with the text "word in Russian dictionaries at gramota.ru". That would also show the reader some other dicts, like one featuring synonyms and one featuring antonyms. --Dan Polansky (talk) 13:04, 19 June 2016 (UTC)
I see now what you were trying to do. It didn't open correctly on the mobile version. While gramota.ru has some value for advanced learners or editors, it's better to use a bilingual or multilingual dictionary for a broader audience, IMO. --Anatoli T. (обсудить/вклад) 13:47, 19 June 2016 (UTC)
Monolingual dictionaries are the best ones as far as coverage, depth and unambiguity. They often contain example sentences, which bilingual ones usually do not do. Some argue that bilingual dictionaries are a really bad thing; while that seems to be overstated, many bilingual dictionaries indeed are severely limited, and lead to a lot of unnecessary misunderstanding on the part of their readers. Nonetheless, I do not object to adding some good bilingual dictionaries as external links. However, removing links to good monolingual dictionaries only because they are monolingual would be a real loss for the reader. Anyone seriously interested in a language should read its monolingual dictionaries. --Dan Polansky (talk) 14:09, 19 June 2016 (UTC)
@Anatoli T.: Mobile: I started my mobile device, went to словарь and followed the link. Indeed, the specific page for the word did not show, and instead, I landed at a page not specific to a word, offering "proverka slova", in the proper script, of course. I was trying to play with the URL on my desktop by placing "m." there, to emulate the mobile view, but the servers seem to redirect me to the desktop view. This will be a rather poor experience for the mobile users, and I do not know how to fix it. One thing the mobile user can do is enter the word again into the search field on gramota.ru and get to the sought dictionary. From my experience, sites that offer both mobile and desktop view usually provide some links at the bottom or the top to make it possible for me to switch between the mobile and desktop views regardless of the type of the actual device; I see no such link at gramota.ru. We may hope they will improve this at some point. --Dan Polansky (talk) 10:23, 20 June 2016 (UTC)
@Dan Polansky. It's okay, I guess, nothing can be done. Mobile users could click on the "полная версия" hyperlink ("full version", i.e. desktop version) at the bottom right corner in gramota.ru and get the expected link. --Anatoli T. (обсудить/вклад) 12:40, 20 June 2016 (UTC)
  • Personally, if I was more strict with quality control, I would add an external link to the DRAE on every valid page in Spanish. In reality though, that's never gonna happen. Obviously though, I'd love other users to add references and external links all over the place. --Turnedlessef (talk) 10:38, 20 June 2016 (UTC)

Inconsistency in the treatment of comparatives


The Latin adjective melior is considered a lemma, and so is the French adjective meilleur, whereas the English adjective best is considered a non-lemma form. Why this difference, and especially, why is best considered a non-lemma form? It's not an inflected form of good, is it? — Automatik (talk) 02:43, 19 June 2016 (UTC)

It probably has something to do with the lack of inflected forms for the English term. If you treat meilleur as a form, that makes meilleures a form of a form, which gets confusing. With best, on the other hand, it's always "best"- no matter what it modifies. Chuck Entz (talk) 03:27, 19 June 2016 (UTC)
Also, best is, in fact, an inflected form of good, through the process of w:suppletion, in the same way as лучше ‎(lučše) is the comparative form of хорошо́ ‎(xorošó). Chuck Entz (talk) 03:35, 19 June 2016 (UTC)
meilleur is also a suppletion according to w:suppletion, so it is considered both as lemma and non-lemma form?… — Automatik (talk) 12:29, 19 June 2016 (UTC)
This argumentation doesn't work. We commonly treat participles as forms of verbs, but they have inflections too, including in French. Some languages also have possessive forms for nouns, like Hungarian or Turkish, but we don't treat them as lemmas of their own despite having inflections. I think we should use the same treatment regardless of language, when possible. The consideration I go by is whether you'd expect to find a form in a paper dictionary as a lemma. Participles and comparatives would not normally appear in a paper dictionary, being subsumed under the lemma of the main verb/adjective. So by that reasoning I would not treat them as lemmas on Wiktionary. I would consider nonlemmas that have inflections of their own a "sublemma", a lemma that is part of the paradigm of another lemma. —CodeCat 13:31, 19 June 2016 (UTC)
For sure, the comparative meilleur has a specific entry in French paper dictionaries (under M). Is it the case for best? I don't have any English paper dictionary at home. — Automatik (talk) 13:49, 19 June 2016 (UTC)
That's probably because it's irregular. I wouldn't be surprised if was and were appeared in an English dictionary either. But comparatives generally would not be found there. —CodeCat 15:06, 19 June 2016 (UTC)
Regular comparatives and participles are usually listed within the main lemma entry but without definition in English-language paper dictionaries. The following would be typical:
red adj. Of the color of blood. red·der, red·dest.
walk v. To proceed by placing one foot in front of the other. walks, walked, walk·ing.
Irregular forms, at least those that are alphabetically far removed, would have their own minimal entries, e.g.:
bet·ter comparative of good.
went past tense of go.
Obviously each dictionary is different, but that's sort of typical for paper dictionaries. —Aɴɢʀ (talk) 17:40, 19 June 2016 (UTC)

Redirects to matched pairs


At least ) has a separate sense: used in lists, like "A) milk, B) eggs, C) flour" so it should be kept as a separate entry and also link to ( ).

It seems that (, ), [, ] can be used alone in set builder notation, so I take it all the 4 entries should be kept as well.

I'd like to do this:

  • Delete all senses of ( and ) that are redundant to ( ).
  • Delete all senses of [ and ] that are redundant to [ ].

And finally:

  • Redirect { and } to { }. (unless { or } can be used alone in some sense)


  • If a symbol is only used as part of a matched pair, redirect the symbol to the matched pair.
  • If a symbol is used as part of multiple matched pairs, create the entry for the symbol and link to all matched pairs.
  • If a symbol is used by itself as well as part of a matched pair, create the entry for the symbol and list the individual uses normally, plus link to the matched pair entry.

--Daniel Carrero (talk) 19:08, 20 June 2016 (UTC)

I agree with you on rules 1 and 2. I am also in favor of rule 3; I think we should include a definition-line pointer to the matched-pair entry (for instance using {{only in}}), rather than e.g. just a "See also" link.
- -sche (discuss) 19:49, 20 June 2016 (UTC)
Maybe we ought to extend rule 3 to apply generally to words that appear as part of a larger idiomatic term? For example, include a link among the definitions of give that leads to give up. —CodeCat 21:24, 20 June 2016 (UTC)
When the number of collocations to be linked to is small (especially if it's just one or two), I support that. For punctuation marks and symbols I could see allowing separate {{only in}} lines for each "collocation" even if there are many of them. But for words with a very large number of collocations, like take (take in, take over, take cover, take back, take up, take up for, etc, etc), I can see how some people might think it was better to list them in a collapsible table as is done at present. An alternative might be a template similar to {{only in}} but which allowed an arbitrarily long list of collocations to be linked to all on one line (rather than separate lines), a bit like how {{&lit}} can link to as many constituent parts as necessary. - -sche (discuss) 22:40, 20 June 2016 (UTC)
I favour putting them among the definitions, though. When someone says "give up", the word "give" in that collocation still has a meaning, but that meaning is only apparent in the combination with "up". It's still the word "give", and per our mission statement, if someone wants to know what a word means, they should be able to look it up. It doesn't matter that it's a collocation or idiom, because the person looking it up might not know that. —CodeCat 23:18, 20 June 2016 (UTC)
@CodeCat, -sche: I agree that ) should link to ( ) in a sense line as opposed to a "see also" section or something.
Would we only have collocations of verb + preposition and adverbs? For example, would the full sense line of give look like this one below?
  1. Used in: give away, give back, give in, give off, give out, give over, give up
--Daniel Carrero (talk) 01:40, 22 June 2016 (UTC)
Something like that, yes. If the list gets too long, we could have a separate section to list them instead, but then that definition should be replaced with "used in: see #section" or similar. —CodeCat 16:47, 22 June 2016 (UTC)

level of detail in English pronunciations

Under prodigal, the pronunciation looks like this:

/ˈpɹɑdɪɡəl/, [ˈpʰɹ̥ɑɾɨɡɫ̩]

Besides the fact that this is a specifically American pronunciation without being labeled as such, do we really need the level of detail expressed in [ˈpʰɹ̥ɑɾɨɡɫ̩]? IMO this is hardly going to help most people and will likely scare a lot of them off. Benwing2 (talk) 21:57, 20 June 2016 (UTC)

AFAIK we are supposed to show phonemic and not phonetic pronunciation. Equinox 23:20, 20 June 2016 (UTC)
We can show both, as long as the phonetic pronunciation is clearly labelled as where it's used, register etc. —CodeCat 23:23, 20 June 2016 (UTC)
Like in every conversation on this topic I restate my conviction that the question should never be 'do we need it' but 'does it harm us'. Korn [kʰũːɘ̃n] (talk) 23:44, 20 June 2016 (UTC)
We're supposed to show phonemic, yes, but there's no ban on also showing phonetic. If the information is correct and correctly-labelled and (ideally) verifiable, include it. Average readers have the broad transcription to look at and advanced language learners and others might be interested in the narrow transcriptions. If the phonetic pronunciations become so numerous that they clutter the entry, collapse them. - -sche (discuss) 23:45, 20 June 2016 (UTC)
At the very least they should not be put on the same line. — Dakdada 08:52, 21 June 2016 (UTC)
  • I consider this sort of thing a case of false precision that should be removed. It's a bit like measuring the distance between two cities down to the nearest nanometer. —Aɴɢʀ (talk) 14:25, 21 June 2016 (UTC)
    I agree with Angr. To use language that would satisfy Korn, false precision is harmful. --WikiTiki89 15:25, 21 June 2016 (UTC)
    Which particular features here would be false precision in your opinion? Aspiration, velarization of coda /l/, and, in American English, medial flapping are quite common features of English pronunciation. Voicelessness of glides after voiceless stops does not seem too bad either. [ɨ] for /ɪ/ and syllabic [ɫ̩] for /əl/ seem more dubious, I suppose. --Tropylium (talk) 21:15, 21 June 2016 (UTC)
    Showing both aspiration of the [p] and devoicing of the [ɹ] is redundant. Flapping is common, but optional, in AmEng, so [ɾ] is not the only possibility. The unstressed vowel is not as far back as [ɨ]. And above all, all this information is predictable, so it doesn't need to be shown. There's a reason why paper dictionaries only give phonemic transcription, not phonetic, and saving space isn't it. —Aɴɢʀ (talk) 21:58, 21 June 2016 (UTC)
    Redundancy is not false precision, optionality is not false precision, predictability is not false precision, claiming that the unstressed vowel is never backed to [ɨ] might be false precision. Declension tables are predictable information too. For languages with the right spelling, the pronunciation section itself is redundant since predictable. If you know the rules, you can predict large parts of most languages down from their proto form. Where do we draw the line? That said, I agree that false precision is harmful. But I disagree that there is such a thing as too much precision, if the phænomena are well enough recorded. Korn [kʰũːɘ̃n] (talk) 23:30, 21 June 2016 (UTC)
    I do think it's possible to be too precise -- too much obvious detail will swamp the important things and make it harder to read. Benwing2 (talk) 23:43, 21 June 2016 (UTC)
    BTW we ran into this same issue when giving Russian pronunciation. We don't, for example, indicate that non-palatal [l] is heavily velarized, or the exact quality of [ɨ] (which, for example, has a noticeable on-glide preceding it when following labial consonants), but we do indicate the pronunciation of unstressed /a/ as either [ɐ] or [ə] (the rules for this are somewhat complex and easy to forget). The idea here is to include detail that is likely to help language learners and omit detail that is less helpful (either because it's too precise or because it will already be known). Especially unhelpful IMO is including lots of the more obscure IPA diacritics and other symbols, which few people will be familiar with and fewer still will have any idea how to pronounce correctly. Even using [ɾ] for flapped /d/ and /t/ bothers me a bit -- I would be at least as comfortable using [d], even if it's a slight lie. Benwing2 (talk) 23:53, 21 June 2016 (UTC)
    So false imprecision is better than false precision? I'm shocked. The moment we start entering even one smidgen of false information knowingly is the moment when we can scratch the entire project, because we no longer have the goodwill on which this project runs. And as long as we can collapse, I don't see how we can ever get swamped. We can easily make three labeled levels: Archiphonemic (English), phonemic (USA), phonetic (Working Class Michigan) and hide the phonetic levels if they become too many. In all languages. Korn [kʰũːɘ̃n] (talk) 05:27, 22 June 2016 (UTC)
    Giving the phonemic transcription only is not giving false information, but giving the phonetic transcription falsely implies that all other phonetic renderings are wrong, which is harmful. Pronouncing this word without aspiration/devoicing is unusual (except in certain accents like Indian English) but not incorrect. Pronouncing this word without flapping the /d/ is unusual in North America but not incorrect. Pronouncing this word with [ɪ] rather than [ɨ] is normal and not incorrect. Pronouncing this word with a nonvelarized [l] is unusual in North America but not incorrect. That's why this is false precision: it implies that any deviation is wrong, and it isn't. It's like saying the distance from New York to Boston is 13,495,680 inches: it implies that it's more than 13,495,679 inches but less than 13,495,681 inches, which is absurd. —Aɴɢʀ (talk) 11:08, 22 June 2016 (UTC)
    I would not read an implication that everything else is wrong, and even if that was the case, that issue would be fixable by adding labels, even more pronunciations, and not by removing stuff. Korn [kʰũːɘ̃n] (talk) 11:41, 22 June 2016 (UTC)
    No one else has mentioned that this is how they pronounce it so here I am. It's my exact pronunciation (although added by User:msh210). I don't see the harm. If someone wanted to find the phonetic transcription, where else could they find it besides here? Ultimateria (talk) 21:23, 21 June 2016 (UTC)
    OK, I took the liberty of deleting the excessively detailed pronunciation (and adding UK pronunciation in, hopefully I got it right). If we want to put it back we should have a general policy of how to represent phonemic and phonetic detail. I think something like [ˈpʰɹɑɾɪɡəl] is plenty enough detail. Rules for aspiration and flapping are a bit complicated so it may be useful to show them, but devoicing of [ɹ] is obvious and surely excessive, and the quality of [ɪ] and [ə] (and whether the last syllable has a syllabic l) are too variable to quantify, and all /l/'s are velarized in American English so it's probably not necessary to bother with that -- anyone who cares enough about the exact quality of /l/ will almost surely already know that /l/ is velarized. Benwing2 (talk) 23:41, 21 June 2016 (UTC)
    I don't think the devoicing of [ɹ] is obvious to all non-native speakers, nor do I fully agree with your final comment. When learning other languages, I find very detailed phonetic pronunciations extremely helpful, as I am not always able to pick up the finer subtleties of pronunciation just by listening (and by finer, I mean at least as fine as [ˈpʰɹɑɾɪɡəl], and often more specific). In French, for instance, I'm finding that I've become limited in my ability to improve my accent, because nowhere can I find exact enough phonetic transcriptions of words, and I'm often not able to successfully imitate some of the minutiae of pronunciation that I hear. I'm opposed to removing phonetic pronunciations unless their precision is actually false (as opposed to "unnecessary"), but I do think they should be clearly labelled as such. Andrew Sheedy (talk) 04:23, 22 June 2016 (UTC)
    I've restored the pronunciation, labelled as American, as [ˈpʰɹɑɾɪɡɫ̩]. Seeing as many varieties (including US varieties) of English use both [ɫ] and [l], and some languages contrast them, a narrow transcription should distinguish them. I went with [ɫ̩] rather than [əɫ] because the former is what I've seen more of in other entries, e.g. battle, bottle, petal, fiddle (in the broad transcription of that last one — there it should probably be changed to /əl/). - -sche (discuss) 04:47, 22 June 2016 (UTC)
    I think (a select few) non-native speakers might find the transcription of [ɹ] as [ɹ̥] helpful, but I suppose they would be able to find that information elsewhere. Andrew Sheedy (talk) 05:53, 22 June 2016 (UTC)
    No one seems to have mentioned this yet, but the reason I would label this as false precision is that it is selectively precise. It is precise about some aspects of the pronunciation and imprecise about others. The problem with that is that our readers will think it is a precise transcription and assume that all aspects of it are precise. What aspects are we being imprecise about? First of all, the [ɫ] symbol is an intentionally imprecise symbol and should never be used in precise transcriptions; this symbol is intentionally vague about whether the [l] is velarized or pharyngealized. Secondly, we are missing the actual place of articulation of the /l/, which for most Americans is dental. Thus the last syllable can be given as [l̪̩ˠ]. Next, the articulation of the /ɹ/ is most certainly not simply alveolar. In fact I'm not entirely sure what it is. But after saying this word over and over, I have come to suspect that in my pronunciation of this particular word, it is [ʟ̹ʷ] (a rounded labialized velar lateral approximant) or perhaps [ɣ̞̹ʷ] (a rounded labialized velar approximant); this also seems to velarize the /p/, giving [pˠʰʟ̹̊ʷ] or [pˠʰɣ̞̹̊ʷ] for the initial consonant cluster. Now we encounter another problem, which is that I have no idea whether all GenAm speakers pronounce it that exact way or not, and if not then by using this transcription we would be making the inaccurate claim that they do. I'm not even gonna bother analyzing the vowel qualities and lengths, but just note that those are another missing piece of precision. My guideline would be that if the phonetic transcription is not illustrating some important peculiarity of a word, then it is superfluous and falsely precise. --WikiTiki89 14:43, 22 June 2016 (UTC)
    The phonetic pronunciation is showing the peculiarity of a specific accent. Something I am absolutely looking for in Wiktionary, it's highly interesting information to me and seems to me to be well apt for our pronunciation section. As long as the data given is correct, I don't see the relevance of other pronunciations which diverge more or less from it. They can be added. Assuming that all people in area X pronounce a word exactly the same way is a lack of understanding that's to be fixed by a lecture on linguistics/phonetics, not a dictionary. Korn [kʰũːɘ̃n] (talk) 11:37, 23 June 2016 (UTC)
    ps.: While I'm usually for assuming that the user is not too well acquainted with linguistics and has a short attention span, clearly anyone knowing how to read IPA in the first place has a basic interest in the topic and can be expected to have a basic understanding. If not, add a disclaimer, don't remove information. Korn [kʰũːɘ̃n] (talk) 11:39, 23 June 2016 (UTC)
    No, the only peculiarity of a specific accent that it is showing is the realization of /d/ as [ɾ] (well and the vowel quality of the first syllable's vowel, but that's already given in the phonemic representation and is actually variable within GenAm). The aspiration of /p/ and the darkening of /l/ are universal in English (perhaps with the exception of small dialects that I don't know about?). The devoicing of the /ɹ/ is not something I've ever noticed or paid attention to before, but I suspect that it is not peculiar to GenAm either. The quality of the second syllable's vowel is disputable (I'm not sure what it actually is), and I don't think it is peculiar to GenAm either. The features I mentioned in my previous post, however, such as the dental nature of the /l/ and the precise articulation of the /ɹ/, are peculiar to GenAm (RP has an alveolar /l/ and in this word I suspect the /ɹ/ is simply [ɹ] or [ɻ], and not velar). The vowel length features of GenAm are also completely ignored (the first syllable has a longer vowel than the other two). The features given are not any more important or interesting than the features not given. --WikiTiki89 15:09, 23 June 2016 (UTC)
    Non-velarised L occurs in Northumbria and Ireland. I'm talking about whether this level of pronunciation should be had in general; I have no merit whatsoever to talk about this pronunciation specifically. I'm just saying that, if e.g. GenAm is /bɜrd/, then having New York: [bɜjd] and Some city: [pɚt] seems to me to be within our scope, and desirable. Every phonetic feature which is distinguishing either for or within the dialect should be visible. So l-velarisation should be featured, for, while it is not phonemic anywhere, it is part of what makes Geordie sound like Geordie. When dealing with most German, just having /a/ and /a:/ might be sufficient, but an extra line for northern accents, where /aː/ and /ɑː/ are contrasting phonemes, actually making that difference, and having that line in the first place, is neither superfluous nor false precision, but simply extra service. Can we be on the same page on that? Korn [kʰũːɘ̃n] (talk) 15:56, 23 June 2016 (UTC)
    Yes, I can agree that "New York: [bɜjd]" is useful, because it does not attempt to be overprecise, it is just highlighting a particular feature. --WikiTiki89 17:18, 23 June 2016 (UTC)

"Category:en:Currencies" and "Category:en:Currency"

What's the difference between "Category:en:Currencies" and "Category:en:Currency"? Do we need both? — SMUconlaw (talk) 22:14, 21 June 2016 (UTC)

"Currencies" contains the names of particular currencies, while "Currency" contains terms related to currency that are not necessarily currencies. So there is a difference. —CodeCat 01:01, 22 June 2016 (UTC)
Maybe in theory, but that's not the case with those categories at present. Equinox 02:25, 22 June 2016 (UTC)
In that case, the categories need usage notes, and some reclassification is in order. — SMUconlaw (talk) 07:17, 22 June 2016 (UTC)
We really need some clear contrast made between "set" categories and "topic" categories. Other than the fact that "set" categories tend to have plural names and "topic" categories singular names, I don't know how we're supposed to predict which category is of which type. Category:en:Horses, for example, has a plural name and says "English terms for horses", but in fact its content includes lots of terms that relate to horses in some way but are not terms for horses (behind the bit, equine, gait, etc.). Some of them could be moved to Category:en:Equestrianism, but not all of them. —Aɴɢʀ (talk) 10:50, 22 June 2016 (UTC)
Perhaps rename to a clearer word? These little differences are annoying to someone who, like me, uses a language without plurals. --Octahedron80 (talk) 10:57, 22 June 2016 (UTC)
Wikipedia distinguishes them using the plural in some cases. There's w:Category:Color next to w:Category:Colors. But I do agree that it may make sense to distinguish them more clearly. I just don't know how. Perhaps the simplest solution would be to have Category:Topic:Horses for the topic, or disambiguate the set as Category:Kinds of horses. But then we'd have to do the same for all other categories too, so we might end up with Category:Species of mammals and similar "long" names for all life forms. And the system may not be watertight in any case; someone may still decide to place Stadtkreis in Category:de:Districts of Germany, even if the category may be intended only for the names of actual districts, not terms for specific kinds of districts. Both are sets, but the category would be only intended as one of them. —CodeCat 15:46, 22 June 2016 (UTC)
cat:en:List of colors? --Giorgi Eufshi (talk) 15:54, 22 June 2016 (UTC)
It kind of works, but it also has a connotation to me that implies it's a complete list. Maybe all categories are that way, I don't know, but it feels stronger with "list of". —CodeCat 16:36, 22 June 2016 (UTC)
I don't know if new names are really necessary; it might be sufficient to have more explicit text in the categories themselves. The text currently in CAT:en:Currency and CAT:en:Currencies is pretty good, but maybe they could even say "This is a topic category..." and "This is a set category...". Take CAT:en:Body, which says "English terms for and related to the body and its parts." It seems to be both, as it's both terms for and related to. It should probably be a topic category, with a separate CAT:en:Body parts as the set category. Then, to add to the confusion, there's CAT:en:Anatomy, which I guess is supposed to be just for anatomical technical terms (such as one might learn in anatomy class at university) and not for everyday words, but in practice it's full of everyday words for parts of the body. Maybe we should have a third kind of category, the "technical-term category" and label them as such. I was recently at a loss where to put some language's word for "feather". Not in CAT:Birds, because a feather isn't a kind of bird, and not in CAT:Ornithology because it isn't a technical term. CAT:Body is OK I guess, though it seems odd since the average reader probably expects that to refer to the human body. I notice that feather isn't in any category specific to its primary birdy meaning. —Aɴɢʀ (talk) 17:31, 22 June 2016 (UTC)

I am in favour of fuller usage notes that clearly explain the intended use of a particular category and suggest alternative categories for related words. I just wanted to point out that "This is a topic category ..." and "This is a set category ..." are not clear enough. — SMUconlaw (talk) 21:09, 22 June 2016 (UTC)

People that add categories to entries won't always look at the description on the category page. If they see one language use it, they might add it to their own language entry without checking. I certainly don't look at the descriptions much myself. —CodeCat 21:34, 22 June 2016 (UTC)
I don't look at the descriptions all that much either, but I do look at them when I'm uncertain whether a particular word belongs in a particular category. —Aɴɢʀ (talk) 21:56, 22 June 2016 (UTC)
Nonetheless, it would be useful to define possibly confusing categories on the category pages so that incorrectly categorized words can be spotted and moved. — SMUconlaw (talk) 23:51, 22 June 2016 (UTC)
Perhaps two separate namespaces, like Category (or Topic) and Set (or List)? Functionally they would both operate like categories but the distinction would then be clear(er) even to those who don't look at the cat page. Equinox 22:26, 22 June 2016 (UTC)
Can we actually create new category namespaces? Do we really want to? We can also have Category:Topic:en:Anatomy or similar. But also consider Angr's point that there is a distinction among topical categories between terms related to a topic, and technical terms and jargons used within a field. —CodeCat 00:22, 23 June 2016 (UTC)
Well, one could potentially have a three-way split: "Category:List:en:Religions" (list of religions: Judaism, Islam, etc), "Category:Topic:en:Religion" (words pertaining to religion: god, church, etc), and "Category:Jargon:en:Religion" (or some other word besides "jargon") (for words used chiefly by scholars of religion, like perhaps actual sin). But the last one might be better named "Category:Jargon:en:Theology". In other cases, the Topic and Jargon categories might share a keyword ("...:en:Aviation") while the List category had a different name ("...:en:Aircraft"); that's not a problem, I'm just mentioning it. A related issue is the tendency of people to use labels for all three purposes, making it hard to tell when a sense simply pertains to religion and when only scholars of religion use the sense. - -sche (discuss) 04:11, 23 June 2016 (UTC)
The whole idea of separating topics and sets is likely lost on the vast majority of potential readers, which leads me to believe there's not a lot of benefit in a dual system. Those (such as CodeCat) who want a dual system or even a triple system are quite focused on the minutiae of categorization. They aren't really creating a user-friendly system. Purplebackpack89 15:33, 23 June 2016 (UTC)
I threw the idea out there to see what people would think. I'm only tangentially interested in categories (and would like to see how much use they get, if we could measure such things). But if nobody cared at all, this discussion wouldn't have come up, I suppose. I do think it's worth stating on each category page what it's supposed to achieve, but realise somebody will always add further entries without reading that text. Equinox 16:46, 23 June 2016 (UTC)
  • Merge: Distinction between the two is essentially meaningless. Purplebackpack89 13:36, 23 June 2016 (UTC)